Improved Feature Weight Algorithm and Its Application to Text Classification

Bibliographic details
Published in: Mathematical Problems in Engineering, 2016-01, Vol. 2016 (2016), pp. 1-12
Main authors: Hong, Zhiguo; Shang, Wenqian; Shi, Minyong; Shang, Songtao
Format: Article
Language: English
Description
Abstract: Text preprocessing is one of the key problems in pattern recognition and plays an important role in text classification. It has two pivotal steps: feature selection and feature weighting. The preprocessing results directly affect a classifier’s accuracy and performance, so choosing appropriate feature selection and feature weighting algorithms can greatly improve classifier performance. Based on Gini Index theory, this paper proposes an Improved Gini Index algorithm that constructs a new feature selection and feature weighting function. The experimental results show that this algorithm effectively improves classifier performance. The algorithm is also applied to a sensitive information identification system, where it achieves higher precision and recall than traditional algorithms and can effectively identify sensitive information on the Internet.
ISSN: 1024-123X, 1563-5147
DOI: 10.1155/2016/7819626