Weak supervision text classification method in combination with relative position information


Bibliographic details
Main authors: HU LIUHUI, LIU JU, GAN LING, YI AIJUN
Format: Patent
Languages: Chinese; English
Description
Summary: The invention relates to a weakly supervised text classification method that incorporates relative position information and belongs to the field of natural language processing. The method comprises the following steps: S1, inputting initial seed words together with labeled documents of the same categories as the seed words; S2, generating pseudo-labels; S3, training a Transformer text classifier on the generated pseudo-labels; S4, assigning labels to the unlabeled texts with the text classifier; and S5, updating the seed words of each category by a comparative ranking method and returning to step S2 for iterative training. According to the invention, the learning ability of the model and the classification accuracy are improved.
本发明涉及一种结合相对位置信息的弱监督文本分类方法,属于自然语言处理领域,包括以下步骤:S1:输入初始化种子词,以及与初始化种子词同类的为标记文档;S2:生成伪标签;S3:基于生成的伪标签训练Transformer文本分类器;S4:通过文本分类器为未标记的文本分配标签;S5:通过比较排序方法,更新每一个类别的种子词,返回步骤S2进行迭代训练。本发明提升了模型的学习能力,提高了分类的准确率。
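The S1 to S5 loop in the summary can be illustrated with a small, self-contained Python sketch. It is not the patented implementation. The abstract does not say how pseudo-labels are generated, how relative position information enters the model, or how the comparative ranking is computed, so every helper here (`pseudo_label`, `train_classifier`, `predict`, `update_seeds`, `run`), the stopword list, and the toy corpus are hypothetical. In particular, a simple word-frequency profile stands in for the Transformer classifier, relative position information is omitted, and the toy corpus is treated as the unlabeled text of step S4.

```python
from collections import Counter

# Assumption: a tiny stopword list so function words do not dominate the scores.
STOPWORDS = {"the", "a", "in", "for", "has"}

def tokenize(text):
    return [t for t in text.lower().split() if t not in STOPWORDS]

def pseudo_label(docs, seeds):
    """S2 (assumed rule): label a document with the class whose seed words
    occur most often; return None when nothing matches or classes tie."""
    labels = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        scores = {c: sum(counts[w] for w in words) for c, words in seeds.items()}
        best = max(scores, key=scores.get)
        top = scores[best]
        tied = sum(1 for s in scores.values() if s == top) > 1
        labels.append(None if top == 0 or tied else best)
    return labels

def train_classifier(docs, labels):
    """S3 stand-in: one word-frequency profile per class (NOT a Transformer)."""
    profiles = {}
    for doc, label in zip(docs, labels):
        if label is not None:
            profiles.setdefault(label, Counter()).update(tokenize(doc))
    return profiles

def predict(profiles, doc):
    """S4: choose the class whose normalized profile best covers the document."""
    tokens = tokenize(doc)
    def score(c):
        return sum(profiles[c][t] for t in tokens) / sum(profiles[c].values())
    return max(profiles, key=score)

def update_seeds(docs, predicted, seeds, top_k=3):
    """S5 (assumed ranking): rank words by document frequency inside a class
    minus their document frequency in all other classes; keep the top_k."""
    per_class = {c: Counter() for c in seeds}
    for doc, label in zip(docs, predicted):
        per_class[label].update(set(tokenize(doc)))
    new_seeds = {}
    for c in seeds:
        def score(w):
            return per_class[c][w] - sum(per_class[o][w] for o in seeds if o != c)
        ranked = sorted(per_class[c], key=lambda w: (-score(w), w))
        picked = [w for w in ranked if score(w) > 0][:top_k]
        new_seeds[c] = sorted(set(seeds[c]) | set(picked))
    return new_seeds

def run(docs, seeds, iterations=2):
    """Iterate S2 -> S3 -> S4 -> S5, feeding the updated seeds back into S2."""
    predicted = []
    for _ in range(iterations):
        labels = pseudo_label(docs, seeds)
        profiles = train_classifier(docs, labels)
        predicted = [predict(profiles, d) for d in docs]
        seeds = update_seeds(docs, predicted, seeds)
    return seeds, predicted

# S1: hypothetical seed words and a toy corpus.
DOCS = [
    "the team won the football match",
    "the striker scored a goal in the match",
    "the goal keeper saved the team",
    "the new phone has a fast processor",
    "the laptop processor runs the software",
    "new software update for the phone",
]
SEEDS = {"sports": ["football"], "tech": ["phone"]}

if __name__ == "__main__":
    final_seeds, predicted = run(DOCS, SEEDS)
    print(final_seeds)
    print(predicted)
```

In this toy run only three documents match a seed word in the first pass. After one seed update the enlarged seed lists cover all six documents, which is the effect the iterative return to step S2 is meant to produce.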