Rapid text classification method and device

Bibliographic details
Main authors: XIANG RONGXIN, LIU XIN, WANG LIECHONG, LI DICHENG, HUANG WEI, ZHAO QINGQI
Format: Patent
Language: Chinese; English
Description
Summary: The invention provides a rapid text classification method and device based on a topic model combined with linear discrimination. The topic model, which chains bag-of-words and word-frequency vectors, PCA, linear discrimination, and similarity calculation, is used to quickly and accurately identify the handling department to which new appeal data belongs. The method first preprocesses the obtained historical appeal data, chiefly by cleaning missing values and standardizing the data. It then comprises the following steps: grouping the standardized data according to the handling department to which each record actually belongs; extracting feature words from each department's grouped data using jieba word segmentation; constructing bag-of-words and word-frequency vectors by a statistical method; training the data of each department and the overall data by using a PCA method based on