Datasets
All datasets (link updated October 18, 2024)
Binary Classification
BM - Positive/negative sentiment labels for tweets.
HCB - Relevance judgments for pairs of search queries and Web pages.
RTE - Judgments for textual entailment.
SpamCF - Judgments about whether or not an AMT HIT should be considered a "spam" task.
TEMP - Judgments for temporal ordering of events in text.
WB - Judgments indicating whether or not a waterbird image shows a duck.
WVSCM - Judgments indicating whether or not the face in an image is smiling.
Ordinal Regression
AC2 - Judgments assigning ordinal ratings to websites.
HC - Graded relevance judgments assigning pairs of search queries and Web pages to ordinal categories.
Multiple Choice
WSD - Ternary judgments for selecting the correct sense of a word in a given example usage.