A sampling based sentiment mining approach for e-commerce applications

Bibliographic details
Published in: Information processing & management 2017-01, Vol.53 (1), p.223-236
Main authors: Vinodhini, G, Chandrasekaran, RM
Format: Article
Language: English
Description
Summary:
• We propose three vector models by varying the feature size.
• We propose an integrative approach for imbalanced datasets.
• We analyze the effect of the imbalance ratio in sentiment learning.
• The proposed method performs more accurately than baseline models.

Emerging technologies in online commerce, mobile and customer experience have transformed the retail industry, enabling marketers to boost sales and giving customers the most efficient online shopping. Online reviews significantly influence the purchase decisions of buyers and the marketing strategies employed by vendors in e-commerce. However, the vast number of reviews makes it difficult for customers to mine sentiments from them. To address this problem, a sentiment mining system is needed to organize online reviews automatically into different sentiment orientation categories (e.g. positive/negative). Because positive and negative sentiments are imbalanced in nature, real-time sentiment mining is a challenging machine learning task. The main objective of this research work is to investigate the combined effect of machine learning classifiers and sampling methods on sentiment classification under imbalanced data distributions. A modification to a support vector machine based ensemble algorithm is proposed that incorporates both oversampling and undersampling to improve prediction performance. Extensive experimental comparisons with several other classifiers are carried out to show the effectiveness of the proposed method in terms of the receiver operating characteristic (ROC) curve, the area under the ROC curve, and the geometric mean.
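
The summary names two ideas that a small example may make more concrete: combining oversampling with undersampling, and scoring with the geometric mean. The following is a minimal sketch only, not the authors' algorithm, which this record does not describe. It assumes plain random resampling toward the mean class size and a binary task; the function names `resample_balanced` and `geometric_mean` and the labels `"pos"`/`"neg"` are hypothetical.

```python
import math
import random
from collections import Counter


def resample_balanced(samples, labels, seed=0):
    """Randomly undersample the larger class and oversample the smaller one
    so that both end up at the mean of the two class sizes.
    Assumption: simple random resampling, not the paper's method."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")
    target = sum(len(xs) for xs in by_class.values()) // 2
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            # Undersample: draw without replacement.
            chosen = rng.sample(xs, target)
        else:
            # Oversample: keep every item, then draw extras with replacement.
            chosen = xs + rng.choices(xs, k=target - len(xs))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


def geometric_mean(y_true, y_pred, positive="pos"):
    """Square root of sensitivity times specificity for a binary task."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data: 90 positive and 10 negative review ids (illustrative only).
review_ids = list(range(100))
sentiments = ["pos"] * 90 + ["neg"] * 10
balanced_ids, balanced_labels = resample_balanced(review_ids, sentiments)
class_counts = Counter(balanced_labels)

# A classifier that always predicts the majority class scores 0.0 here,
# even though its plain accuracy on the toy data would be 90%.
majority_gmean = geometric_mean(sentiments, ["pos"] * len(sentiments))
```

The geometric mean is used in this sketch because it drops to zero whenever one class is never predicted correctly, which is the failure mode that accuracy hides on imbalanced data.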
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2016.08.003