Chinese Text Auto-Categorization on Petro-Chemical Industrial Processes

There is a huge growth in the amount of documents of corporations in recent years. With this paper we aim to improve classification performance and to support the effective management of massive technical material in the domain-specific field. Taking the field of petro-chemical process as a case, we...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Cybernetics and information technologies : CIT 2016-12, Vol.16 (6), p.69-82
Hauptverfasser: Ni, Jing, Gao, Ge, Chen, Pengyu
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:There is a huge growth in the amount of documents of corporations in recent years. With this paper we aim to improve classification performance and to support the effective management of massive technical material in the domain-specific field. Taking the field of petro-chemical process as a case, we study in detail the influence of parameters on classification accuracy when using Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) Text auto-classification algorithm. Advantages and disadvantages of the two text classification algorithms are presented in the field of petro-chemical processes. Our tests also show that more attention to the professional vocabulary can significantly improve the F1 value of the two algorithms. These results have reference value for the future information classification in related industry fields.
ISSN:1314-4081
1311-9702
1314-4081
DOI:10.1515/cait-2016-0078