An investigation of solutions for handling incomplete online review datasets with missing values

Bibliographic details
Published in: Journal of experimental & theoretical artificial intelligence, 2022-11, Vol. 34 (6), p. 971-987
Main authors: Hu, Ya-Han; Tsai, Chih-Fong
Format: Article
Language: English
Description
Abstract: Online review helpfulness prediction is an important research issue in electronic commerce and data mining. However, the datasets collected for the analysis and prediction of the helpfulness of online reviews often contain missing attribute values, such as reviewer background and rating information. In the related literature, many studies have either used the case deletion approach to remove records containing missing values or imputed the missing values by the mean/mode method. However, none of them considers handling online review datasets directly, without missing value imputation, by decision tree-related techniques. Therefore, in this paper, we investigate the suitability of different types of approaches for solving the incomplete dataset problem of online reviews. Specifically, for missing value imputation, several supervised learning techniques, including MICE, KNN, SVM, and CART, are examined. Moreover, for the direct handling approach without missing value imputation, CART is also applied. The experimental results based on the TripAdvisor dataset for review helpfulness prediction show that handling incomplete online review datasets directly by CART, without imputation, significantly outperforms the other approaches, including case deletion and missing value imputation.
ISSN: 0952-813X, 1362-3079
DOI: 10.1080/0952813X.2021.1948920
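
The two baseline strategies named in the abstract, case deletion and mean/mode imputation, can be illustrated with a minimal sketch. This is not the authors' code: the toy records, the attribute names (`rating`, `reviewer_level`, `helpful_votes`), and both helper functions are hypothetical and chosen only for illustration. The model-based imputers (MICE, KNN, SVM, CART) and the direct handling of missing values by CART are not shown here.

```python
from statistics import mean, mode

# Toy review records with hypothetical values; None marks a missing attribute.
reviews = [
    {"rating": 5, "reviewer_level": "senior", "helpful_votes": 12},
    {"rating": None, "reviewer_level": "junior", "helpful_votes": 3},
    {"rating": 3, "reviewer_level": None, "helpful_votes": 7},
    {"rating": 4, "reviewer_level": "senior", "helpful_votes": 9},
]


def case_deletion(rows):
    """Keep only the rows that have no missing attribute value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_mode_imputation(rows):
    """Fill numeric gaps with the column mean, categorical gaps with the mode."""
    filled = [dict(r) for r in rows]  # copy so the input rows stay untouched
    for col in rows[0]:
        observed = [r[col] for r in rows if r[col] is not None]
        if all(isinstance(v, (int, float)) for v in observed):
            fill = mean(observed)
        else:
            fill = mode(observed)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled


deleted = case_deletion(reviews)
imputed = mean_mode_imputation(reviews)
print(len(deleted), "complete rows kept by case deletion")
print(imputed[1]["rating"], imputed[2]["reviewer_level"])
```

Case deletion shrinks the toy dataset from four records to two, while mean/mode imputation keeps all four at the cost of inserting estimated values; this is the trade-off that motivates comparing these baselines with the other approaches.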