Preprocessing of Unstructured Data Using 2D Coiflet Wavelet-Based Optimized Back-Propagation Neural Network for Opinion Mining

Bibliographic details
Published in: Arabian Journal for Science and Engineering (2011), 2023-02, Vol. 48 (2), pp. 2523-2537
Authors: Zakir, H. Mohamed; Jinny, S. Vinila
Format: Article
Language: English
Description
Abstract: Preprocessing is an important part of any opinion mining method because it prepares text reviews for classification. In preprocessing, string matching is crucial for removing unnecessary matched text from the input data. Most contemporary string matching algorithms use character-by-character comparison, which analyzes each character independently and is time-consuming. Furthermore, establishing the degree of similarity between a substring and the pattern text is challenging for approximate matching algorithms that expect a full match. To address these challenges and limitations, we propose an algorithm for effective review preprocessing, namely 'review preprocessing using Coiflet wavelet back-propagation neural network' (RPP-COIF-BPN). RPP-COIF-BPN combines a neural network with an exact string matching technique to filter irrelevant information from the input reviews. The method is driven by the Coiflet wavelet; specifically, the 2D Coiflet transform is performed in both directions to provide more energetic stop-word features, which increases string matching accuracy. Unlike traditional exact pattern matching, the exact match comparison is performed only for the words matched by the BPN network, which considerably reduces pattern matching time. The proposed method achieves 97.53% accuracy and requires significantly less time (2.08 s) than other string matching algorithms. The results indicate that RPP-COIF-BPN-based string matching is effective for preprocessing e-commerce reviews for opinion mining in a short time with high testing accuracy.
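The abstract describes a two-stage pipeline: wavelet features computed along both directions of a 2D signal feed a screening stage, and exact string comparison runs only on the words that pass the screen. The sketch below illustrates that control flow under stated assumptions and is not the authors' RPP-COIF-BPN. The word-to-image encoding (each character's 8 bits as one row of an 8×8 grid, words truncated or padded to 8 characters) is hypothetical. The transform is a one-level separable coif1 DWT with periodic extension, applied along rows and then columns, using the coif1 low-pass coefficients as tabulated in PyWavelets. The trained BPN is replaced by a simple nearest-feature screen with a hypothetical `threshold`. The names `word_to_grid`, `wavelet_feature` and `remove_stop_words` are illustrative only.

```python
import math

# Coiflet-1 (coif1) decomposition low-pass filter, as tabulated in PyWavelets.
COIF1_LO = [
    -0.01565572813546454, -0.0727326195128539, 0.38486484686420286,
    0.8525720202122554, 0.3378976624578092, -0.0727326195128539,
]
# High-pass filter from the alternating-flip relation g[k] = (-1)^k h[L-1-k].
COIF1_HI = [(-1) ** k * COIF1_LO[len(COIF1_LO) - 1 - k] for k in range(len(COIF1_LO))]

GRID = 8  # hypothetical: words are truncated/padded to 8 characters


def dwt1d(signal):
    """One-level periodized DWT: filter, then keep every second output."""
    n = len(signal)
    lo, hi = [], []
    for i in range(0, n, 2):
        lo.append(sum(COIF1_LO[k] * signal[(i + k) % n] for k in range(len(COIF1_LO))))
        hi.append(sum(COIF1_HI[k] * signal[(i + k) % n] for k in range(len(COIF1_HI))))
    return lo, hi


def dwt2d(matrix):
    """Separable 2D DWT: transform every row, then every column of each half."""
    row_lo, row_hi = zip(*(dwt1d(row) for row in matrix))

    def along_columns(half):
        cols = [list(c) for c in zip(*half)]  # transpose to get columns
        lo_cols, hi_cols = zip(*(dwt1d(col) for col in cols))
        # Transpose back so each sub-band is indexed [row][col].
        return [list(r) for r in zip(*lo_cols)], [list(r) for r in zip(*hi_cols)]

    ll, lh = along_columns(row_lo)
    hl, hh = along_columns(row_hi)
    return ll, lh, hl, hh


def word_to_grid(word):
    """Hypothetical encoding: row i holds the low 8 bits of character i."""
    chars = word.lower()[:GRID].ljust(GRID, "\0")
    return [[(ord(c) >> bit) & 1 for bit in range(GRID)] for c in chars]


def wavelet_feature(word):
    """Flattened approximation (LL) sub-band used as the word's feature vector."""
    ll, _, _, _ = dwt2d(word_to_grid(word))
    return [v for row in ll for v in row]


def remove_stop_words(tokens, stop_words, threshold=1e-9):
    stop_set = {w.lower() for w in stop_words}
    stop_feats = [wavelet_feature(w) for w in stop_set]
    kept = []
    for tok in tokens:
        feat = wavelet_feature(tok)
        # Stage 1 (stand-in for the trained BPN): nearest-feature screen.
        is_candidate = any(
            sum((a - b) ** 2 for a, b in zip(feat, sf)) <= threshold
            for sf in stop_feats
        )
        # Stage 2: exact string comparison only for screened candidates.
        if is_candidate and tok.lower() in stop_set:
            continue
        kept.append(tok)
    return kept


if __name__ == "__main__":
    review = "This phone is the best one I have ever owned".split()
    print(remove_stop_words(review, ["this", "is", "the", "i", "have", "ever"]))
```

Because the exact comparison in stage 2 has the final say, a lenient screen can only cost time, never correctness; an overly strict screen would instead let stop words through. In this toy version a plain Python set lookup would already be constant-time, so the sketch shows only where the screen sits in the pipeline, not the speed-up the authors report.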
ISSN: 2193-567X, 1319-8025, 2191-4281
DOI: 10.1007/s13369-022-07285-4