Comparative Analysis of Lexicon and Machine Learning Approach for Sentiment Analysis
Published in: | International journal of advanced computer science & applications 2022, Vol.13 (3) |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Sentiment analysis is also known as opinion mining or text analysis. Its fundamental objective is to extract meaningful information from unstructured text using natural language processing, statistical, and linguistic methods. The extracted information is then used to derive qualitative and quantitative results on a scale of ‘positive’, ‘neutral’, or ‘negative’, which together give the overall sentiment. In this research, we worked with both approaches, supervised machine learning and an unsupervised lexicon-based algorithm, for sentiment calculation and model performance. Stochastic gradient descent (SGD) is used to optimize the support vector machine (SVM) and logistic regression classifiers. The AFINN and VADER lexicons are used for the lexicon model. Both TF-IDF and bag-of-words features are used for classification. The dataset is "Trip advisor hotel reviews" and contains around 20k reviews. The data were cleaned and preprocessed before training and evaluation. Each classifier's accuracy is measured using evaluation metrics. With TF-IDF features, the support vector machine is the more accurate of the two machine learning classifiers. The support vector machine reached an accuracy of 95.2 percent with bag-of-words features and 96.3 percent with TF-IDF features. Among the lexicon models, VADER performs better with an accuracy of 88.7 percent, whereas the AFINN lexicon has an accuracy of 86.0 percent. Comparing the supervised and unsupervised lexicon approaches, the support vector machine model with TF-IDF features (96.3 percent accuracy) outperforms the best lexicon model, VADER (88.7 percent accuracy). |
---|---|
ISSN: | 2158-107X 2156-5570 |
DOI: | 10.14569/IJACSA.2022.0130312 |
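
The unsupervised lexicon approach named in the summary can be illustrated with a minimal sketch. It assumes an AFINN-style word list in which each word carries an integer valence and a review's score is the sum of the valences of its words. The entries in `TOY_LEXICON`, the function names, and the zero threshold for the 'neutral' class are all hypothetical choices made for illustration. They are not the real AFINN list and not the authors' implementation.

```python
import re

# Toy AFINN-style lexicon. These entries are hypothetical and for
# illustration only; they are NOT taken from the real AFINN word list.
TOY_LEXICON = {
    "excellent": 3,
    "great": 3,
    "clean": 2,
    "friendly": 2,
    "dirty": -2,
    "rude": -2,
    "terrible": -3,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every token; unknown words contribute 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def classify(text, lexicon=TOY_LEXICON):
    """Map the summed score to a label (assumed threshold: zero)."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(texts, labels, lexicon=TOY_LEXICON):
    """Fraction of texts whose predicted label matches the given label."""
    correct = sum(classify(t, lexicon) == y for t, y in zip(texts, labels))
    return correct / len(texts)


if __name__ == "__main__":
    reviews = [
        "Great location and friendly staff",
        "The room was dirty and the staff were rude.",
    ]
    for review in reviews:
        print(review, "->", lexicon_score(review), classify(review))
```

No training step is involved: the labels are used only to measure accuracy afterwards, which is what makes the lexicon approach unsupervised in contrast to the SVM and logistic regression classifiers.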