A Multilingual Spam Reviews Detection Based on Pre-Trained Word Embedding and Weighted Swarm Support Vector Machines
Published in: | IEEE Access 2023, Vol. 11, p. 72250-72271 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Online reviews are an important source of information for customers deciding whether to buy products or services, and organizations rely on them as essential feedback. Such information must be reliable, especially since the COVID-19 pandemic, during which quarantine and stay-at-home measures drove a massive increase in online reviews. Not only did the number of reviews grow, but their context and the preferences they express also changed. Spam reviewers have adapted to these changes and improved their deception techniques. Spam reviews are typically misleading, fake, or fraudulent reviews intended to deceive customers, either for profit or to harm competitors. This work therefore presents a Weighted Support Vector Machine (WSVM) combined with Harris Hawks Optimization (HHO) for spam review detection, where HHO optimizes the hyperparameters and the feature weights. Corpora in three languages (English, Spanish, and Arabic) are used as datasets to address the multilingual problem in spam reviews. In addition, a pre-trained word embedding (BERT) is applied alongside three word representation methods (NGram-3, TFIDF, and one-hot encoding). Four experiments were conducted, each addressing a different aspect of the problem. In all experiments, the proposed approach showed excellent results compared with other state-of-the-art algorithms: WSVM-HHO achieved accuracies of 88.163%, 71.913%, 89.565%, and 84.270% on the English, Spanish, Arabic, and multilingual datasets, respectively. Further, an in-depth analysis was conducted to investigate the context of reviews before and after the COVID-19 pandemic. Finally, a new dataset of statistical features was generated and merged with the existing textual features to improve detection performance. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2023.3293641 |
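
The abstract names three word representation methods (NGram-3, TFIDF, and one-hot encoding) and a per-feature weighting step. The following is a minimal sketch of those ideas in plain Python, not the authors' implementation. It assumes that "NGram-3" means word n-grams of length 1 to 3 and that feature weighting means scaling each feature by a weight before classification. All function names, the toy reviews, and the fixed weights are hypothetical; in the paper the weights are chosen by HHO, which is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


def word_ngrams(tokens, n_max=3):
    """Word n-grams for n = 1..n_max (assumed reading of 'NGram-3')."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def build_vocab(docs_terms):
    """Sorted list of all distinct terms across the documents."""
    return sorted({t for terms in docs_terms for t in terms})


def one_hot(terms, vocab):
    """Binary presence vector over the vocabulary."""
    present = set(terms)
    return [1 if t in present else 0 for t in vocab]


def tfidf(docs_terms, vocab):
    """Plain TF-IDF: relative term frequency times log(N / document frequency)."""
    n_docs = len(docs_terms)
    df = Counter(t for terms in docs_terms for t in set(terms))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    rows = []
    for terms in docs_terms:
        counts = Counter(terms)
        rows.append([counts[t] / len(terms) * idf[t] for t in vocab])
    return rows


def apply_weights(vector, weights):
    """Scale each feature by its weight (weights would come from an optimizer)."""
    return [w * x for w, x in zip(weights, vector)]


# Toy data, invented for this example.
reviews = ["great product great price", "buy now great deal"]
docs_terms = [tokenize(r) for r in reviews]
vocab = build_vocab(docs_terms)
tfidf_rows = tfidf(docs_terms, vocab)
weighted_row = apply_weights(tfidf_rows[0], [1.0] * len(vocab))
```

The weighted vectors would then be passed to a classifier such as an SVM. An optimizer in the role the abstract gives to HHO would search over the weight vector and the classifier's hyperparameters, scoring each candidate by validation accuracy.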