Leveraging Pattern Recognition Consistency Estimation for Crowdsourcing Data Analysis


Bibliographic Details
Published in:	IEEE Transactions on Human-Machine Systems, 2016-06, Vol. 46 (3), p. 474-480
Main authors:	Shamir, Lior; Diamond, Derek; Wallin, John
Format:	Article
Language:	English
Subjects:	Accuracy; Annotations; Big data; Citizen science; Classifiers; Consistency; Crowdsourcing; Data analysis; Feature extraction; Human; Manuals; Quality; Samples; Scientists; Spirals; Statistical analysis; Statistical methods; Training; Transforms

Description
Summary:	Crowdsourcing is an effective method for analyzing large scientific databases. However, data annotation relies on untrained volunteers, making it difficult to control the quality of the annotation. Here, we propose a method to estimate the consistency of the annotations of human classifiers in citizen science projects. Since the performance of supervised machine learning systems decreases as the level of noise in the data increases, the method is able to rank human annotators by the consistency with which they annotate. Because the method uses the accuracy of an automatic classifier trained with these samples, it does not require ground truth or data annotated by other citizen scientists. The method allows reducing the number of annotations required for each sample by identifying the most efficient data annotators, as well as improving the overall quality of the data by giving higher weights to the classifications of the more consistent data annotators. The proposed method can also be used for improving the citizen science user experience by providing feedback in real time. Experimental results using a large citizen science project, Galaxy Zoo, and a subset of over 1.1 × 10^6 data annotations made by 4000 citizen scientists show a Pearson correlation of 0.966 between the quality estimation provided by the method and the actual performance of the data annotators. The method also demonstrated efficacy in improving the performance of offline statistical consensus methods.
ISSN:	2168-2291, 2168-2305
DOI:	10.1109/THMS.2015.2463082
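
The idea in the summary, scoring an annotator by how well a classifier can learn from that annotator's own labels, can be illustrated with a small sketch. This is not the authors' implementation: the summary does not name the classifier or the image features used, so a nearest-centroid classifier with leave-one-out validation stands in here, and the function names and the toy one-dimensional data are hypothetical.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def consistency_score(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier that is
    trained and tested only on one annotator's own labels.

    Assumption: a stand-in for the classifier used in the paper.
    No ground truth and no other annotators' labels are needed.
    """
    hits = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        # Group every sample except the held-out one by its given label.
        by_class = {}
        for j, (xj, yj) in enumerate(zip(features, labels)):
            if j != i:
                by_class.setdefault(yj, []).append(xj)
        centroids = {c: centroid(rows) for c, rows in by_class.items()}
        # Predict the label whose centroid is closest to the held-out sample.
        predicted = min(centroids, key=lambda c: sq_dist(x, centroids[c]))
        hits += predicted == y
    return hits / len(labels)


# Toy data (hypothetical): two well-separated clusters of 1-D features.
features = [[i / 10] for i in range(10)] + [[2 + i / 10] for i in range(10)]

# One annotator labels every sample in line with the cluster structure.
consistent_labels = [0] * 10 + [1] * 10

# Another annotator contradicts the cluster structure on six samples.
noisy_labels = list(consistent_labels)
for idx in (1, 4, 7, 12, 15, 18):
    noisy_labels[idx] = 1 - noisy_labels[idx]

score_consistent = consistency_score(features, consistent_labels)
score_noisy = consistency_score(features, noisy_labels)
print(score_consistent, score_noisy)
```

Under these assumptions the noisier annotator receives the lower score, which is the ranking signal the summary describes. Such scores could then serve as weights when several annotators' classifications of the same sample are combined.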