Efficient Hotel Rating Prediction from Reviews Using Ensemble Learning Technique
Published in: | Wireless Personal Communications 2024-07, Vol. 137 (2), pp. 1161-1187 |
---|---|
Authors: | , , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | Predicting hotel ratings from reviews involves using natural language processing (NLP) techniques to extract sentiment and other features from text, then applying machine learning (ML) models such as regression or classification to estimate the corresponding rating from these features. This study proposes an ensemble learning approach for predicting hotel ratings from user reviews. By integrating multiple ML algorithms trained on various textual features, including linguistic, semantic, and sentiment-based attributes, our model achieves superior predictive accuracy. The primary aim of this study is to predict user experience ratings with high accuracy. To this end, the research uses an ensemble learning approach known as majority voting. The dataset is first cleaned and then passed through a series of NLP pre-processing steps. The research includes a comparative analysis of various classifiers combined with different embedding methods: seven classifiers are paired with three embedding techniques, namely Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), and Word2Vec. The classifiers are Logistic Regression without Cross Validation (LR), LR with Cross Validation (LRCV), Decision Tree Classifier (DTC), Stochastic Gradient Descent Classifier (SGDC), Random Forest Classifier (RFC), Support Vector Classifier (SVC), and K-Nearest Neighbour (KNN). Our proposed methodology demonstrates higher accuracy and robust performance. Accuracy and frequency are used as performance metrics for assessing and validating both the classifiers and the embedding techniques. According to the simulation results, TF-IDF combined with LRCV achieves an accuracy of 61%. |
---|---|
ISSN: | 0929-6212 1572-834X |
DOI: | 10.1007/s11277-024-11457-w |
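The pre-processing and embedding steps named in the abstract (cleaning, BoW, TF-IDF) can be illustrated with a minimal sketch. This is not the authors' implementation: the stopword list, the regex tokenizer, the toy reviews and the helper names (`preprocess`, `bag_of_words`, `tf_idf`) are hypothetical, and the IDF uses one common smoothed variant, idf(t) = ln((1 + n) / (1 + df(t))) + 1, with L2-normalised rows. Word2Vec is left out because it needs a trained embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real pipelines use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "very", "but", "of", "to"}


def preprocess(review: str) -> list[str]:
    """Lowercase the review, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(docs: list[list[str]], vocab: list[str]) -> list[list[int]]:
    """Bag of Words: raw count of each vocabulary term in each document."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows


def tf_idf(docs: list[list[str]], vocab: list[str]) -> list[list[float]]:
    """TF-IDF with smoothed IDF, ln((1 + n) / (1 + df)) + 1, and L2-normalised rows."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = {term: sum(1 for doc in docs if term in doc) for term in vocab}
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}
    rows = []
    for counts in bag_of_words(docs, vocab):
        weights = [c * idf[term] for c, term in zip(counts, vocab)]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0  # avoid dividing by zero
        rows.append([w / norm for w in weights])
    return rows


# Hypothetical reviews, not taken from the study's dataset.
reviews = [
    "The room was very clean and the staff friendly",
    "Dirty room, rude staff",
    "Clean room but noisy street",
]
docs = [preprocess(r) for r in reviews]
vocab = sorted({t for doc in docs for t in doc})
bow = bag_of_words(docs, vocab)
tfidf = tf_idf(docs, vocab)

print(vocab)
print(bow[0])
print([round(w, 3) for w in tfidf[0]])
```

A term that occurs in every review (here "room") receives the lowest IDF weight, so TF-IDF emphasises words that distinguish one review from another, whereas BoW keeps only raw counts.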
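Majority (hard) voting, the ensemble strategy named in the abstract, combines the label each classifier predicts for a review and keeps the most frequent one. The sketch below assumes every classifier has already produced integer star ratings; the prediction lists and the helper names `majority_vote` and `accuracy` are hypothetical, the numbers are invented for illustration and are not results from the study, and breaking ties in favour of the smallest label is an assumption the abstract does not specify.

```python
from collections import Counter


def majority_vote(predictions_per_model: list[list[int]]) -> list[int]:
    """Hard voting: for each sample, return the label predicted by the most models.

    Ties are broken in favour of the smallest label (an assumption made for this sketch).
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        # Count how many models predicted each label for sample i.
        votes = Counter(model[i] for model in predictions_per_model)
        top = max(votes.values())
        combined.append(min(label for label, count in votes.items() if count == top))
    return combined


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of samples whose predicted rating equals the true rating."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true ratings (1-5 stars) and predictions from three classifiers.
y_true = [5, 4, 1, 3, 5]
predictions = {
    "LR": [5, 4, 2, 3, 4],
    "SVC": [5, 3, 1, 3, 5],
    "RFC": [4, 4, 1, 2, 5],
}

ensemble = majority_vote(list(predictions.values()))
for name, preds in predictions.items():
    print(f"{name}: {accuracy(y_true, preds):.2f}")
print(f"Majority vote: {accuracy(y_true, ensemble):.2f}")
```

In this toy example the three classifiers err on different reviews, so the vote corrects every individual mistake; when classifiers make strongly correlated errors, majority voting gains little over the best single model.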