Fig. 1. Unsupervised performance of consensus methods, as measured across nine binary-labeled real datasets. Accuracy is plotted relative to a Majority Vote (MV) baseline. Average performance of each method across all datasets is shown at the right. On the multiple-choice WSD dataset and the multi-class AC2 and HC datasets, results are reported only for DS and ZC.

Fig. 2. Light supervision: results across the original datasets with increasing training set size (10%-90%).

Fig. 3. Full supervision: results across the original datasets with increasing training set size (10%-90%).

Fig. 4. Histogram of worker accuracies across nine of the datasets.

Fig. 5. Histogram of the number of examples labeled per worker.