Parallel and Streaming Truth Discovery in Large-Scale Quantitative Crowdsourcing



Bibliographic details
Published in:	IEEE Transactions on Parallel and Distributed Systems, 2016-10, Vol. 27 (10), p. 2984-2997
Main authors:	Ouyang, Robin Wentao; Kaplan, Lance M.; Toniolo, Alice; Srivastava, Mani; Norman, Timothy J.
Format:	Article
Language:	eng
Subjects:	Algorithm design and analysis; Algorithms; Batch processing; big data; Crowdsourcing; Datasets; Electronic mail; Encoding; Estimation; Inference algorithms; parallel algorithm; Parallel algorithms; quantitative task; streaming algorithm; truth discovery

Description
Abstract:	To enable reliable crowdsourcing applications, it is of great importance to develop algorithms that can automatically discover the truths from possibly noisy and conflicting claims provided by various information sources. In order to handle crowdsourcing applications involving big or streaming data, a desirable truth discovery algorithm should not only be effective, but also be scalable. However, with respect to quantitative crowdsourcing applications such as object counting and percentage annotation, existing truth discovery algorithms are not simultaneously effective and scalable. They either address truth discovery in categorical crowdsourcing or perform batch processing that does not scale. In this paper, we propose new parallel and streaming truth discovery algorithms for quantitative crowdsourcing applications. Through extensive experiments on real-world and synthetic datasets, we demonstrate that 1) both of them are quite effective, 2) the parallel algorithm can efficiently perform truth discovery on large datasets, and 3) the streaming algorithm processes data incrementally, and it can efficiently perform truth discovery both on large datasets and in data streams.
ISSN:	1045-9219 1558-2183
DOI:	10.1109/TPDS.2016.2515092
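
Illustration:	The abstract does not spell out the proposed algorithms, so the sketch below is not the authors' method. It is a generic batch truth discovery loop for quantitative claims, given only to make the problem statement concrete. It assumes that each claim is a (source, item, value) triple, that a truth is estimated as a reliability-weighted mean, and that a source's weight is the inverse of its mean squared error against the current estimates. The function name `discover_truths` and the parameters `iterations` and `eps` are hypothetical.

```python
from collections import defaultdict


def discover_truths(claims, iterations=20, eps=1e-6):
    """Generic iterative truth discovery for numeric claims (illustrative only).

    claims: list of (source, item, value) triples.
    Returns (truths, weights): estimated value per item, weight per source.
    """
    sources = {source for source, _, _ in claims}
    weights = {source: 1.0 for source in sources}  # start with equal trust
    truths = {}

    for _ in range(iterations):
        # Truth step: weighted mean of the values claimed for each item.
        weighted_sum = defaultdict(float)
        weight_total = defaultdict(float)
        for source, item, value in claims:
            weighted_sum[item] += weights[source] * value
            weight_total[item] += weights[source]
        truths = {item: weighted_sum[item] / weight_total[item]
                  for item in weighted_sum}

        # Weight step: inverse mean squared error of each source's claims.
        squared_error = defaultdict(float)
        claim_count = defaultdict(int)
        for source, item, value in claims:
            squared_error[source] += (value - truths[item]) ** 2
            claim_count[source] += 1
        # eps (an assumed smoothing constant) avoids division by zero.
        weights = {source: 1.0 / (squared_error[source] / claim_count[source] + eps)
                   for source in sources}

    return truths, weights


if __name__ == "__main__":
    # Hypothetical object-counting claims from three workers.
    example = [
        ("a", "img1", 10), ("a", "img2", 20), ("a", "img3", 30),
        ("b", "img1", 11), ("b", "img2", 19), ("b", "img3", 30),
        ("c", "img1", 20), ("c", "img2", 5), ("c", "img3", 50),
    ]
    estimated, trust = discover_truths(example)
    print(estimated)
    print(trust)
```

Every iteration of this loop rescans all claims, which is the kind of batch processing the abstract describes as not scaling. A parallel or streaming variant would have to partition or incrementally update these sums, and how the paper does so is not stated in this record.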