Detecting Low-Quality Workers in QoE Crowdtesting: A Worker Behavior-Based Approach

Bibliographic details
Published in: IEEE Transactions on Multimedia, 2017-03, Vol. 19 (3), p. 530-543
Main authors: Mok, Ricky K. P.; Chang, Rocky K. C.; Li, Weichao
Format: Article
Language: English
Description
Summary: QoE crowdtesting is increasingly popular among researchers for conducting subjective assessments of network services. Experimenters can easily access a huge pool of human subjects through crowdsourcing platforms. Without supervision, however, low-quality workers can threaten the reliability of the assessments. One approach to classifying the quality of workers is to analyze their behavior during the experiments, such as the mouse cursor trajectory. However, existing works analyze the trajectory coarsely and therefore cannot fully extract the embedded information. In this paper, we propose a novel method to detect low-quality workers in QoE crowdtesting by analyzing worker behavior. Our approach is to construct a predictive model using supervised learning algorithms. To label the workers, a quality score is computed by applying existing anti-cheating techniques and human inspection. We define a set of ten worker behavior metrics, which quantify different types of worker behavior, including finer-grained cursor trajectory analysis. A multiclass Naïve Bayes classifier is applied to train a model that predicts the quality of workers from the metrics. We conducted video QoE assessments on Amazon Mechanical Turk and CrowdFlower to collect the worker behavior. Our results show that the error rates of the model trained from four metrics are less than or equal to 30%. We further find that combining the predictions from the four different 5-point Likert scale rating methods can improve the success rate in detecting low-quality workers to around 80%. Finally, our method is 16.5% and 42.9% better in precision and recall, respectively, than CrowdMOS.
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2016.2619901
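
Illustrative sketch: the summary describes training a multiclass Naïve Bayes classifier on worker behavior metrics and then combining predictions from four rating methods. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a Gaussian likelihood per metric, two hypothetical metrics (a normalized cursor path length and a time on task in seconds), synthetic toy data, three hypothetical quality labels, and a simple majority vote as the combination rule; none of these details are given in the record.

```python
import math
from collections import Counter, defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a log-prior and per-metric mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest Gaussian Naive Bayes log-posterior."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


def majority_vote(predictions):
    """Combine several predictions; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Synthetic toy data (hypothetical): [cursor_path_length, time_on_task_s].
X = [
    [0.10, 5.0], [0.20, 6.0], [0.15, 5.5],     # labeled "low"
    [0.50, 20.0], [0.55, 22.0], [0.45, 19.0],  # labeled "medium"
    [0.90, 40.0], [0.95, 42.0], [0.85, 39.0],  # labeled "high"
]
y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

model = fit_gaussian_nb(X, y)
single = predict(model, [0.12, 5.2])

# Hypothetical predictions from four separate rating-method models.
combined = majority_vote(["low", "low", "high", "low"])
print(single, combined)
```

In this sketch, one model would be trained per rating method and the vote taken across the four outputs; a weighted or probability-based combination would be an equally plausible choice.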