Identifying ground truth in opinion spam: an empirical survey based on review psychology

Because it is very harmful, opinion spam, especially that involving untruthful reviews, has attracted much attention in the last decade. However, the lack of annotations, i.e., the ground truth problem, still serves as the key challenge. It is difficult because spammers always deliberately forge the...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Applied intelligence (Dordrecht, Netherlands) Netherlands), 2020-11, Vol.50 (11), p.3554-3569
Hauptverfasser:	Li, Jiandun, Wang, Xiaogang, Yang, Liu, Zhang, Pengpeng, Yang, Dingyu
Format:	Artikel
Sprache:	eng
Schlagworte:	Artificial Intelligence Computer Science Machines Manufacturing Mechanical Engineering Processes
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Because it is very harmful, opinion spam, especially that involving untruthful reviews, has attracted much attention in the last decade. However, the lack of annotations, i.e., the ground truth problem, still serves as the key challenge. It is difficult because spammers always deliberately forge their reviews, which cannot be distinguished even by field experts. Considering the obvious intention of spammers, i.e., to promote or demote an items reputation, the opportunity exists to label them by considering crowd psychology. To date, several studies have applied, verified, and presented helpful evidence, including prior, empirical, heuristic, and simulative pseudo truths. In this paper, after investigating both authentic and deceptive reviewers’ diverse motives, we survey state-of-the-art truth by considering two classical roles, e.g., crowdsourcing and expert spammers. For each role, several topics related to spam attacks either with or without disguising and possible outliers are highlighted. Comparison analyses led to some interesting conclusions: 1) data on professional spammers are more challenging to collect and less reliable than data on crowdsourcing spammers; 2) most linguistic evidences are less reliable than behavioral footprints; 3) abnormal activities are as trustworthy as spamming objectives, while they hardly need any extra support, such as the user profile; and 4) the top reliable facts requiring acceptable effort are deviation , burstiness , grouped spamming , deviation over the threshold , review distribution , opinion proportion and spam cost . Moreover, we introduce several promising directions for future research. In general, this survey may shed light on new angles that can be used to understand review spam and to improve the performance of any anti-spam platforms.
ISSN:	0924-669X 1573-7497
DOI:	10.1007/s10489-020-01764-7