Overview
SQUARE (Statistical QUality Assurance Robustness Evaluation) is a benchmark for comparative evaluation of consensus methods for human computation / crowdsourcing (i.e., methods that produce the best possible answer for each question, given multiple judgments per question). Like any benchmark, SQUARE aims to assess the relative benefit of new methods, identify where further research is needed, and measure the field's progress over time. SQUARE includes benchmark datasets, defined tasks, evaluation metrics, and reference implementations with empirical results for several popular methods. A toy example of the consensus task appears below.
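To make the consensus task concrete, here is a minimal sketch of the simplest baseline, majority voting over per-question crowd labels. It illustrates the problem SQUARE benchmarks, not SQUARE's own API; the function name and toy data are hypothetical.

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate multiple worker labels per question by taking the most frequent label.

    judgments: dict mapping question id -> list of labels from different workers.
    Returns a dict mapping question id -> consensus label.
    """
    return {qid: Counter(labels).most_common(1)[0][0]
            for qid, labels in judgments.items()}

# Toy example: three workers label two binary relevance questions.
crowd_labels = {
    "q1": ["relevant", "relevant", "non-relevant"],
    "q2": ["non-relevant", "non-relevant", "relevant"],
}
print(majority_vote(crowd_labels))
# {'q1': 'relevant', 'q2': 'non-relevant'}
```

More sophisticated consensus methods (e.g., those modeling per-worker reliability) replace this simple vote; SQUARE's purpose is to compare such methods on common datasets and metrics.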
PAPER: Aashish Sheshadri and Matthew Lease. SQUARE: A Benchmark for Research on Computing Crowd Consensus. In Proceedings of the 1st AAAI Conference on Human Computation (HCOMP), 2013.
- 🏅 HCOMP 2024 Inaugural Test of Time and Impact Award (selected from among papers published in 2013-2014). See the Award Presentation slides.
- See also: Aashish Sheshadri and Matthew Lease. SQUARE: Benchmarking Crowd Consensus at MediaEval. In Proceedings of MediaEval: Crowdsourcing in Multimedia Task, 2013.
CODE: The SQUARE software is released as an open-source library; we welcome community participation and contributions. Download the code.
- October 26, 2015: Version 2.0 of SQUARE is now released! (It lives in a new Git repository to avoid impacting those using Version 1.0.) See the 2.0 page for details of what's new.
See also: the Shared Task Challenge at the HCOMP'13 workshop on Crowdsourcing at Scale.