A Generalized Method for Sentiment Analysis across Different Sources

Bibliographic details
Published in: Applied Computational Intelligence and Soft Computing, 2021-12, Vol. 2021, p. 1-8
Main author: Ashir, Abubakar M.
Format: Article
Language: English
Description
Summary: Sentiment analysis is widely used in applications such as online opinion gathering for government policy directives, monitoring of customer and staff satisfaction in corporate bodies, and public tension monitoring in politics and security structures. In recent times, the field has met a new set of challenges, as new algorithms must contend with highly unstructured sources of sentiment expression emanating from online social media fora. In this study, a rule- and lexical-based procedure is proposed together with unsupervised machine learning to implement sentiment analysis with improved generalization ability across different sources. To deal with sources devoid of syntactic and grammatical structure, the approach incorporates a rule-based technique for emoticon detection, word contraction expansion, and noise removal, and lexicon-based text preprocessing that uses lexical features such as part of speech (POS), stop words, and lemmatization for local context analysis. A text is broken into a number of tokens, each representing a sentence, and lexicon-dependent features are then extracted from each token. The features for a given text are merged using a combining function before being used to train a machine learning classifier. The proposed combining functions leverage averaging and information gain concepts. Experimental results with different machine learning classifiers indicate that improved performance, with a great deal of generalization capacity across both structured and nonstructured sources, can be realized. The findings show that carefully designed lexical features reinforce the learning process in unsupervised learning more than using word embeddings alone as features.
Experimental results obtained on the movie review dataset (recall = 74.9%, precision = 70.9%, F1-score = 72.9%, and accuracy = 72.0%) and the Twitter samples dataset (recall = 93.4%, precision = 89.5%, F1-score = 91.4%, and accuracy = 91.1%) show the efficacy of the proposed approach in comparison with other state-of-the-art research studies.
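
The steps named in the summary (rule-based cleanup, sentence-level tokens, lexicon-dependent features, and an averaging combining function) can be illustrated with a minimal Python sketch. This is not the author's implementation. The emoticon map, contraction list, stop-word set, polarity lexicon, the three-element feature vector, and all function names are hypothetical stand-ins chosen for illustration. POS tagging, lemmatization, the information-gain combining function, and the classifier itself are left out.

```python
import re

# Toy resources, all hypothetical; a real system would use full lexicons.
EMOTICONS = {":)": " happy ", ":-)": " happy ", ":(": " sad ", ":-(": " sad "}
CONTRACTIONS = {"can't": "cannot", "don't": "do not", "isn't": "is not", "it's": "it is"}
STOP_WORDS = {"the", "a", "an", "is", "it", "was", "i", "this", "and", "but"}
POLARITY = {"good": 1.0, "great": 1.0, "happy": 1.0,
            "bad": -1.0, "sad": -1.0, "boring": -1.0}


def preprocess(text):
    """Rule-based cleanup: noise removal, emoticons, then contractions."""
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # drop URLs and mentions
    text = text.replace("#", " ")                    # keep the hashtag word only
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, word)          # emoticon -> sentiment word
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)             # expand contractions
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Break a text into tokens, one per sentence."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_features(sentence):
    """Return [positive score, negative score, share of lexicon words]."""
    words = [w for w in re.findall(r"[a-z]+", sentence) if w not in STOP_WORDS]
    scores = [POLARITY[w] for w in words if w in POLARITY]
    positive = sum(s for s in scores if s > 0)
    negative = -sum(s for s in scores if s < 0)
    coverage = len(scores) / len(words) if words else 0.0
    return [positive, negative, coverage]


def combine_average(vectors):
    """Averaging combining function: element-wise mean over sentences."""
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def text_features(text):
    """Full sketch: preprocess, split, extract per sentence, then combine."""
    sentences = split_sentences(preprocess(text))
    return combine_average([sentence_features(s) for s in sentences])
```

For example, `text_features("The movie was great. It's boring!")` returns `[0.5, 0.5, 0.75]`: one positive sentence and one negative sentence average out to a single fixed-length vector, which is what a downstream classifier would receive. The sketch does not handle negation ("not good" still counts as positive), one of the local-context cases that richer lexical features would need to cover.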
ISSN: 1687-9724, 1687-9732
DOI: 10.1155/2021/2529984