Chinese Text Auto-Categorization on Petro-Chemical Industrial Processes
There is a huge growth in the amount of documents of corporations in recent years. With this paper we aim to improve classification performance and to support the effective management of massive technical material in the domain-specific field. Taking the field of petro-chemical process as a case, we...
Gespeichert in:
Veröffentlicht in: | Cybernetics and information technologies : CIT 2016-12, Vol.16 (6), p.69-82 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | There is a huge growth in the amount of documents of corporations in recent years. With this paper we aim to improve classification performance and to support the effective management of massive technical material in the domain-specific field. Taking the field of petro-chemical process as a case, we study in detail the influence of parameters on classification accuracy when using Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) Text auto-classification algorithm. Advantages and disadvantages of the two text classification algorithms are presented in the field of petro-chemical processes. Our tests also show that more attention to the professional vocabulary can significantly improve the F1 value of the two algorithms. These results have reference value for the future information classification in related industry fields. |
---|---|
ISSN: | 1314-4081 1311-9702 1314-4081 |
DOI: | 10.1515/cait-2016-0078 |