SQUARE: A Benchmark for Research on Computing Crowd Consensus

Overview

SQUARE (Statistical QUality Assurance Robustness Evaluation) is a benchmark for comparative evaluation of consensus methods for human computation / crowdsourcing (i.e., how to generate the best possible answer for each question, given multiple judgments per question). Like any benchmark, SQUARE's goals are to assess the relative benefit of new methods, understand where further research is needed, and measure field progress over time. SQUARE includes benchmark datasets, defined tasks, evaluation metrics, and reference implementations with empirical results for several popular methods.
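
To make the consensus task concrete, the sketch below shows the simplest baseline, majority voting over multiple worker judgments per question. It is illustrative only: it does not use SQUARE's API or datasets, and the function name and toy labels are invented for the example.

    # Illustrative sketch only (not SQUARE's API): majority-vote consensus,
    # the simplest baseline for aggregating multiple crowd judgments per question.
    from collections import Counter

    def majority_vote(judgments):
        """Return the most frequent label for each question.

        judgments: dict mapping question id -> list of worker labels.
        Ties are broken by whichever label Counter counts first.
        """
        return {q: Counter(labels).most_common(1)[0][0]
                for q, labels in judgments.items()}

    # Hypothetical toy data: three workers label two questions.
    crowd_labels = {
        "q1": ["relevant", "relevant", "not_relevant"],
        "q2": ["not_relevant", "not_relevant", "relevant"],
    }
    print(majority_vote(crowd_labels))
    # {'q1': 'relevant', 'q2': 'not_relevant'}

More sophisticated consensus methods (e.g., those that model per-worker reliability) can be compared against such baselines using SQUARE's datasets, tasks, and metrics.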

PAPER: Aashish Sheshadri and Matthew Lease. SQUARE: A Benchmark for Research on Computing Crowd Consensus. In Proceedings of the 1st AAAI Conference on Human Computation (HCOMP), 2013.

CODE: The SQUARE software is released as an open-source library; we welcome community participation and contributions. Download the code.


See also: The Shared Task Challenge at the HCOMP'13 workshop on Crowdsourcing at Scale

This work is supported in part by the National Science Foundation.

For questions and comments, please email the authors.