Optimizing Sentiment Classification for Arabic Opinion Texts


Bibliographic details
Published in: Cognitive Computation 2021, Vol. 13 (1), p. 164-178
Main authors: Saeed, Radwa M. K.; Rady, Sherine; Gharib, Tarek F.
Format: Article
Language: English
Description
Summary: Product and service reviews guide potential customers, giving them real knowledge about such products and services when making decisions. Sentiment classification is the task of automatically analyzing the opinions expressed in textual reviews. The efficiency of this task depends on the set of representative features extracted from the reviews. The value of the extracted features, however, lies in those that contribute most to the classification process. This is where dimensionality reduction comes in: it eliminates noise and shrinks the high-dimensional feature space while preserving the required accuracy. The Arabic language and its datasets pose inherent challenges. Moreover, most sentiment classification studies that integrate dimensionality reduction have focused on English texts, with only a few conducted for other languages, including Arabic. Massive amounts of Arabic data have been generated owing to the large population of the Arab world, yet these technical gaps remain for the language. This paper proposes a supervised learning approach for sentiment classification of Arabic reviews. The approach uses optimized compact features that combine a well-representative feature set with feature reduction techniques, delivering both high accuracy and time/space savings. The employed feature set includes a triple combination of N-gram features and positive/negative N-gram count features obtained after negation handling. The approach examines two linear transformation methods: principal component analysis (PCA), an unsupervised transformation, and linear discriminant analysis (LDA), a supervised transformation. A spam detection process is run before learning to increase the robustness of the classifier.
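The feature set described in the summary can be illustrated with a minimal sketch. It assumes that the N-grams are unigrams through trigrams, that negation is handled by marking the token after a negation particle, and that polarity counts come from a token-level lexicon lookup. The word lists, the `NOT_` prefix, and every function name below are hypothetical; the record does not say how the paper implements these steps.

```python
from collections import Counter

# Hypothetical toy resources (assumptions, not taken from the paper).
NEGATORS = {"ليس", "لا", "غير", "لم"}   # common Arabic negation particles
POSITIVE = {"جيد", "ممتاز", "رائع"}     # good, excellent, wonderful
NEGATIVE = {"سيء", "رديء", "ممل"}       # bad, poor, boring


def mark_negation(tokens):
    """Drop each negator and prefix the token that follows it with 'NOT_'."""
    marked, negate = [], False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        marked.append("NOT_" + tok if negate else tok)
        negate = False
    return marked


def ngrams(tokens, n):
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def polarity(token):
    """Return +1, -1 or 0; a 'NOT_' prefix flips the lexicon polarity."""
    negated = token.startswith("NOT_")
    base = token[4:] if negated else token
    score = 1 if base in POSITIVE else -1 if base in NEGATIVE else 0
    return -score if negated else score


def extract_features(text, max_n=3):
    """N-gram counts (n = 1..max_n) plus positive and negative term counts."""
    tokens = mark_negation(text.split())
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    scores = [polarity(t) for t in tokens]
    feats["__pos_count__"] = sum(s > 0 for s in scores)
    feats["__neg_count__"] = sum(s < 0 for s in scores)
    return feats
```

Calling `extract_features("الفندق ليس جيد")` ("the hotel is not good") yields the negated unigram `NOT_جيد` and counts it as a negative term. Real Arabic text would also need tokenization and normalization, which this sketch leaves out.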
The proposed approach was evaluated on five Arabic opinion text datasets from different domains and of varying sizes (1.6 K up to 94 K reviews). Experiments were conducted for two-class (positive/negative) and three-class (positive/negative/neutral) sentiment classification problems. Accuracy ranged from 95.5% to 99.8% for the two-class problem and from 92% to 97.3% for the three-class problem. LDA feature reduction outperformed PCA by an average of 4.34% and 3.52% in accuracy and F1 score, respectively...
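A small sketch can show why a supervised projection may separate classes that an unsupervised one mixes. It reads LDA as Fisher's linear discriminant, the usual supervised linear projection, which is an assumption about the paper's method. The 2-D points, the labels, and all function names are invented for illustration and have no connection to the paper's datasets or results.

```python
import math


def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter(points, center):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around center."""
    sxx = sum((p[0] - center[0]) ** 2 for p in points)
    sxy = sum((p[0] - center[0]) * (p[1] - center[1]) for p in points)
    syy = sum((p[1] - center[1]) ** 2 for p in points)
    return sxx, sxy, syy


def unit(v):
    """Scale a 2-D vector to unit length."""
    norm = math.hypot(v[0], v[1])
    return [v[0] / norm, v[1] / norm]


def pca_direction(X):
    """Leading eigenvector of the total scatter matrix; labels are ignored."""
    sxx, sxy, syy = scatter(X, mean(X))
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    if sxy != 0:
        return unit([sxy, lam - sxx])
    return [1.0, 0.0] if sxx >= syy else [0.0, 1.0]


def lda_direction(X, y):
    """Two-class Fisher direction w = Sw^-1 (m1 - m0); uses the labels."""
    c0 = [x for x, label in zip(X, y) if label == 0]
    c1 = [x for x, label in zip(X, y) if label == 1]
    m0, m1 = mean(c0), mean(c1)
    a0, b0, d0 = scatter(c0, m0)
    a1, b1, d1 = scatter(c1, m1)
    a, b, d = a0 + a1, b0 + b1, d0 + d1      # within-class scatter Sw
    det = a * d - b * b                      # assumed non-zero for this data
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    return unit([(d * dx - b * dy) / det, (a * dy - b * dx) / det])


def project(X, w):
    """Project every point onto direction w (2-D down to 1-D)."""
    return [x[0] * w[0] + x[1] * w[1] for x in X]


def separable(z, y):
    """True if a single threshold splits the projected classes."""
    z0 = [v for v, label in zip(z, y) if label == 0]
    z1 = [v for v, label in zip(z, y) if label == 1]
    return max(z0) < min(z1) or max(z1) < min(z0)


# Invented toy data: large spread along x, class difference along y.
X = [[-3, -1.2], [-1, -0.8], [1, -1.2], [3, -0.8],
     [-3, 0.8], [-1, 1.2], [1, 0.8], [3, 1.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

pca_w = pca_direction(X)
lda_w = lda_direction(X, y)
print("PCA separates classes:", separable(project(X, pca_w), y))
print("LDA separates classes:", separable(project(X, lda_w), y))
```

On this toy data PCA follows the high-variance x axis, where the classes overlap, while the Fisher direction follows the y axis, where they differ. The example only illustrates the mechanism; it does not reproduce or explain the accuracy gap reported in the summary.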
ISSN: 1866-9956, 1866-9964
DOI: 10.1007/s12559-020-09771-z