A new online field feature selection algorithm based on streaming data


Bibliographic details
Published in: Journal of ambient intelligence and humanized computing, 2024-02, Vol. 15 (2), pp. 1365-1377
Main authors: Zhang, Zhenjiang; Song, Fuxing; Zhang, Peng; Chao, Han-Chieh; Zhao, Yingsi
Format: Article
Language: English
Online access: Full text
Description
Summary: The rapid development of Internet technology has produced massive amounts of network text data, so classifying this data efficiently has both theoretical significance and practical value. To obtain accurate classification results, the process is divided into two parts. For text representation, this paper proposes an online field feature selection algorithm (OFFS algorithm) based on streaming data, which addresses the low efficiency and high memory consumption of traditional feature selection algorithms. By improving on the vector space model, the new algorithm can select features from the data in real time and quickly generate text vectors. For classifier design, an OFFS-BP neural network text classifier is built on a BP (backpropagation) neural network and the OFFS algorithm. It is suited to distributed parallel computing, reduces training time, and balances computational efficiency against classification accuracy. Finally, the OFFS-BP neural network classifier is implemented on the Spark platform. Experimental results show that the OFFS-BP neural network classifier is better suited to big data environments, with less computation time and higher classification efficiency.
ISSN: 1868-5137, 1868-5145
DOI: 10.1007/s12652-018-0959-0
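
The summary describes selecting features from streaming text and generating text vectors in a vector space model, but this record does not give the details of the OFFS algorithm. The sketch below is therefore not the authors' method. It only illustrates the general idea of single-pass feature selection over a document stream, under stated assumptions: the class name StreamingFeatureSelector is hypothetical, document frequency is used as a stand-in selection criterion, and the vocabulary counter is not memory-bounded.

```python
from collections import Counter
from typing import Iterable, List


class StreamingFeatureSelector:
    """Hypothetical helper (not the OFFS algorithm): tracks document
    frequencies over a stream and uses the top-k terms as features."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.doc_freq: Counter = Counter()  # term -> number of documents containing it
        self.n_docs = 0

    def update(self, tokens: Iterable[str]) -> None:
        # Single pass per document; the document itself is not stored.
        self.doc_freq.update(set(tokens))
        self.n_docs += 1

    def features(self) -> List[str]:
        # Assumed criterion: highest document frequency first,
        # ties broken alphabetically so the order is deterministic.
        ranked = sorted(self.doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
        return [term for term, _ in ranked[: self.k]]

    def vectorize(self, tokens: Iterable[str]) -> List[int]:
        # Term-frequency vector over the currently selected features.
        counts = Counter(tokens)
        return [counts[term] for term in self.features()]


selector = StreamingFeatureSelector(k=3)
stream = [
    "spark runs neural network training".split(),
    "neural network text classifier".split(),
    "text stream feature selection".split(),
]
for doc in stream:
    selector.update(doc)

vector = selector.vectorize("neural network network text".split())
```

Because the selected features are recomputed from the running counts, the vector layout can change as more documents arrive. A real streaming system would need to handle that drift, and would likely bound the size of the counter as well.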