Corpus classification method and system

Bibliographic details
Main authors: NI HEQIANG, BAI ERWEI, SONG ZHI, YAO SHOUBAI
Format: Patent
Language: Chinese; English
Description
Summary: An embodiment of the invention discloses a corpus classification method and system that combine coarse classification by template matching with fine classification by a fine-tuned pre-trained model. The method comprises: obtaining a coarse-classified corpus from the corpus and a keyword template; constructing a first corpus classification model from the coarse-classified corpus; and obtaining a fine-classified corpus according to a preset requirement and the first corpus classification model. No manual corpus annotation is required, and because the model is iterated continuously to finely classify the training corpus, a high-precision classified corpus can be obtained. This improves classification accuracy while reducing time and labor costs.
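The three steps in the summary (keyword-template coarse classification, a first model built from the coarse labels, then iterative fine classification) can be illustrated with a minimal sketch. Everything specific here is an assumption, not taken from the patent: the templates and example sentences are invented, a small naive Bayes classifier stands in for the fine-tuned pre-trained model, and the "preset requirement" is interpreted as a confidence threshold plus a round limit.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword templates (label -> keywords); not from the patent.
TEMPLATES = {
    "weather": {"rain", "sunny", "forecast"},
    "music": {"song", "play", "album"},
}

# Hypothetical unlabeled corpus used only for illustration.
CORPUS = [
    "will it rain tomorrow",
    "play that song again",
    "sunny forecast for tomorrow",
    "put that album on again",
    "what about tomorrow",
]


def coarse_classify(corpus, templates):
    """Label a sentence only when it matches exactly one keyword template."""
    labeled, unlabeled = [], []
    for text in corpus:
        tokens = set(text.lower().split())
        hits = [label for label, kws in templates.items() if tokens & kws]
        if len(hits) == 1:
            labeled.append((text, hits[0]))
        else:
            unlabeled.append(text)
    return labeled, unlabeled


def train(labeled):
    """Fit multinomial naive Bayes counts (stand-in for fine-tuning a pre-trained model)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the most likely label and its normalized posterior probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in text.lower().split():
            score += math.log((word_counts[label][tok] + 1) / denom)  # add-one smoothing
        log_scores[label] = score
    top = max(log_scores.values())
    probs = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(probs.values())
    best = max(probs, key=probs.get)
    return best, probs[best] / norm


def fine_classify(corpus, templates, threshold=0.6, max_rounds=5):
    """Retrain on the growing labeled set until nothing new passes the threshold."""
    labeled, unlabeled = coarse_classify(corpus, templates)
    if not labeled:
        return labeled, unlabeled  # nothing to train on
    for _ in range(max_rounds):
        model = train(labeled)
        remaining = []
        for text in unlabeled:
            label, confidence = predict(model, text)
            if confidence >= threshold:
                labeled.append((text, label))
            else:
                remaining.append(text)
        if len(remaining) == len(unlabeled):
            break  # no progress in this round
        unlabeled = remaining
    return labeled, unlabeled
```

Calling `fine_classify(CORPUS, TEMPLATES)` returns the labeled pairs and any sentences that never reached the threshold. In a realistic setting the `train` and `predict` functions would be replaced by fine-tuning and inference with a pre-trained language model; the loop structure is the part this sketch is meant to show.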