Method of Feature Reduction in Short Text Classification Based on Feature Clustering

One decisive problem of short text classification is the serious dimensional disaster when utilizing a statistics-based approach to construct vector spaces. Here, a feature reduction method is proposed that is based on two-stage feature clustering (TSFC), which is applied to short text classificatio...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Applied sciences 2019-04, Vol.9 (8), p.1578
Hauptverfasser: Li, Fangfang, Yao, Yin, Shi, Jinjing, Mao, Xingliang, Shi, Ronghua
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:One decisive problem of short text classification is the serious dimensional disaster when utilizing a statistics-based approach to construct vector spaces. Here, a feature reduction method is proposed that is based on two-stage feature clustering (TSFC), which is applied to short text classification. Features are semi-loosely clustered by combining spectral clustering with a graph traversal algorithm. Next, intra-cluster feature screening rules are designed to remove outlier feature words, which improves the effect of similar feature clusters. We classify short texts with corresponding similar feature clusters instead of original feature words. Similar feature clusters replace feature words, and the dimension of vector space is significantly reduced. Several classifiers are utilized to evaluate the effectiveness of this method. The results show that the method largely resolves the dimensional disaster and it can significantly improve the accuracy of short text classification.
ISSN:2076-3417
2076-3417
DOI:10.3390/app9081578