Vertical industry text classification method based on corpus
Main authors: | , , , , , , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Abstract: | The invention discloses a corpus-based method for classifying text in a vertical industry. A parent corpus for the vertical industry is constructed first. Separate sub-corpora are then built for the different types of text data within that industry, and the words in each sub-corpus are clustered to form a more precise corpus. The similarity between newly added vertical-industry text and the data of each corpus is calculated one by one, and the text is classified on that basis. The method is simple and easy to implement, with good efficiency and performance. |
---|
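The classification step described in the abstract can be illustrated with a minimal Python sketch. The record does not say how similarity is measured or how words are clustered, so the choices here are assumptions: cosine similarity over raw word counts stands in for the similarity calculation, the word-clustering step is not sketched at all, and the function names (`tokenize`, `build_sub_corpora`, `cosine`, `classify`) and the sample "automotive" data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_sub_corpora(labelled_texts):
    """Map each text type (label) to a Counter of its word frequencies."""
    corpora = {}
    for label, text in labelled_texts:
        corpora.setdefault(label, Counter()).update(tokenize(text))
    return corpora


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (assumed metric)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, corpora):
    """Score the text against every sub-corpus in turn; return the best label and all scores."""
    vec = Counter(tokenize(text))
    scores = {label: cosine(vec, counts) for label, counts in corpora.items()}
    return max(scores, key=scores.get), scores


# Hypothetical labelled texts for an imagined automotive vertical.
labelled = [
    ("review", "the engine is quiet and the ride is smooth"),
    ("review", "seats are comfortable but fuel economy is poor"),
    ("recall", "manufacturer recalls vehicles over faulty brake component"),
    ("recall", "safety recall issued for airbag defect"),
]

corpora = build_sub_corpora(labelled)
# The parent corpus is modelled here as the union of all sub-corpora (an assumption).
parent_corpus = sum(corpora.values(), Counter())

label, scores = classify("recall announced for brake defect", corpora)
print(label, scores)
```

In this sketch a new text is assigned to whichever sub-corpus it is most similar to. A fuller version would refine each sub-corpus by clustering its words before scoring, as the abstract describes.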