Efficient Hotel Rating Prediction from Reviews Using Ensemble Learning Technique

Bibliographic Details
Published in:	Wireless personal communications 2024-07, Vol.137 (2), p.1161-1187
Main authors:	Kumar, Mukesh; Kumar, Chhotelal; Kumar, Naween; Kavitha, S.
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Communications Engineering; Computer Communication Networks; Data mining; Decision trees; Embedding; Engineering; Ensemble learning; Machine learning; Natural language processing; Networks; Performance measurement; Predictions; Ratings; Signal, Image and Speech Processing; User experience

Description
Abstract:	Predicting hotel ratings from reviews involves using natural language processing (NLP) techniques to extract sentiment and features from text data, then applying machine learning (ML) models such as regression or classification to estimate the corresponding rating from these features. This study proposes an ensemble learning approach for predicting hotel ratings from user reviews. By integrating multiple ML algorithms trained on various textual features, including linguistic, semantic, and sentiment-based attributes, our model achieves superior predictive accuracy. The primary aim of this study is to predict user experience ratings with high accuracy. To achieve this, the research uses an ensemble learning approach known as majority voting. The dataset is first cleaned and then subjected to a series of pre-processing steps using NLP techniques. The research includes a comparative analysis of various classifiers combined with different embedding methods. Seven classifiers are used alongside three embedding techniques: Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), and Word2Vec. The classifiers are Logistic Regression without Cross Validation (LR), LR with Cross Validation (LRCV), Decision Tree Classifier (DTC), Stochastic Gradient Descent Classifier (SGDC), Random Forest Classifier (RFC), Support Vector Classifier (SVC), and K-Nearest Neighbour (KNN). Our proposed methodology demonstrates higher accuracy and robust performance. Accuracy and frequency are used as performance metrics for assessing and validating both the classifiers and the embedding techniques. According to the simulation results, TF-IDF in combination with LRCV achieves an accuracy of 61%.
ISSN:	0929-6212 1572-834X
DOI:	10.1007/s11277-024-11457-w
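
The abstract names two building blocks, TF-IDF features and majority voting over several classifiers, without showing either. The following is a minimal sketch of both in plain Python, not the authors' implementation. The function names `tf_idf` and `majority_vote`, the toy reviews, the vote values, and the tie-breaking rule are all assumptions made for illustration; the unsmoothed formula tf = count / length, idf = ln(N / df) is one common textbook variant and may differ from what the paper used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed variant: tf = count / len(doc), idf = ln(N / df).
    Library implementations usually add smoothing and normalisation.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most classifiers.

    Ties go to the tied label that appears first in the list
    (an arbitrary choice made for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Toy, made-up reviews (not taken from the paper's dataset).
reviews = [
    "clean room friendly staff".split(),
    "dirty room rude staff".split(),
    "friendly staff great breakfast".split(),
]
features = tf_idf(reviews)

# Hypothetical ratings (1-5) that seven classifiers might output for one review.
votes = [4, 5, 4, 3, 4, 5, 4]
ensemble_rating = majority_vote(votes)
```

In this sketch a term that occurs in every review, such as "staff", receives weight zero, which is the usual motivation for TF-IDF over raw Bag of Words counts. In practice each classifier would be trained on such feature vectors and the vote taken over their predicted ratings.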