Generation of text classifier training data
Main authors: | , , |
---|---|
Format: | Patent |
Language: | Chinese ; English |
Summary: | The application relates to generation of text classifier training data. A method includes receiving an input specifying a term of interest in a document of a corpus of documents, and determining a target context embedding representing a target phrase that includes the term of interest and context words located in the document proximate to the term of interest. The method also includes identifying, from the document corpus, a first candidate phrase that is semantically similar to the target phrase and a second candidate phrase that is semantically dissimilar to the target phrase. The method further includes receiving a user input identifying at least a portion of the first candidate phrase as being associated with a first tag and identifying at least a portion of the second candidate phrase as being not associated with the first tag. The method also includes generating tagged training data to train a text classifier based on the user input. |
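The steps named in the summary can be illustrated with a small Python sketch. It is not the patented implementation: the bag-of-words vector is a toy stand-in for a real context embedding, candidates are limited to other occurrences of the same term (an assumption made here for brevity), the user input is simulated with hard-coded decisions, and every function name, the window width, and the `GEOGRAPHY` tag are hypothetical.

```python
import math
from collections import Counter


def context_window(tokens, index, width=2):
    """Return the term at `index` plus up to `width` context words per side."""
    return tokens[max(0, index - width): index + width + 1]


def embed(phrase):
    """Toy stand-in for a context embedding: a bag-of-words count vector."""
    return Counter(word.lower() for word in phrase)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_phrases(corpus, term, width=2):
    """Yield (doc_id, phrase) for every occurrence of `term` in the corpus."""
    for doc_id, text in enumerate(corpus):
        tokens = text.split()
        for i, token in enumerate(tokens):
            if token.lower() == term.lower():
                yield doc_id, context_window(tokens, i, width)


def pick_candidates(corpus, target_doc, term, width=2):
    """Return the target phrase, its most similar and its least similar candidate."""
    candidates = list(candidate_phrases(corpus, term, width))
    target = next(p for d, p in candidates if d == target_doc)
    target_vec = embed(target)
    scored = [(cosine(target_vec, embed(p)), p)
              for d, p in candidates if d != target_doc]
    scored.sort(key=lambda pair: pair[0])
    return target, scored[-1][1], scored[0][1]


def build_training_data(decisions, tag):
    """Turn (phrase, is_associated_with_tag) decisions into tagged examples."""
    return [{"text": " ".join(phrase), "tag": tag, "label": bool(ok)}
            for phrase, ok in decisions]


corpus = [
    "the river bank was muddy after rain",
    "the river bank flooded after rain",
    "the central bank raised interest rates",
]
target, similar, dissimilar = pick_candidates(corpus, target_doc=0, term="bank")

# Simulated user input: the similar phrase carries the tag, the other does not.
training_data = build_training_data(
    [(similar, True), (dissimilar, False)], tag="GEOGRAPHY"
)
```

Showing the user both a similar and a dissimilar phrase yields one positive and one negative example per interaction, which is what lets the tagged data train a classifier rather than only collect positives.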