Generation of text classifier training data

Bibliographic details
Main authors: AMRITE JAIDEV, SKILES ERIK, MCNEILL WILLIAM
Format: Patent
Language: Chinese; English
Description
Summary: The application relates to generation of text classifier training data. A method includes receiving an input specifying a term of interest in a document of a corpus of documents, and determining a target context embedding representing a target phrase that includes the term of interest and context words located in the document proximate to the term of interest. The method also includes identifying, from the corpus, a first candidate phrase that is semantically similar to the target phrase and a second candidate phrase that is semantically dissimilar to the target phrase. The method further includes receiving a user input identifying at least a portion of the first candidate phrase as being associated with a first tag and identifying at least a portion of the second candidate phrase as being not associated with the first tag. The method also includes generating tagged training data to train a text classifier based on the user input.
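
The steps in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the bag-of-words `embed` function is a toy stand-in for a learned context embedding, candidates are restricted to other occurrences of the same term, and all names (`context_window`, `rank_candidates`, `build_training_data`, the `FINANCE` tag, the sample corpus) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def context_window(tokens, index, width=2):
    """Return the term at `index` plus up to `width` context words per side."""
    start = max(0, index - width)
    return tokens[start:index + width + 1]


def embed(phrase_tokens):
    """Toy stand-in for a context embedding: a bag-of-words count vector."""
    return Counter(tok.lower() for tok in phrase_tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(corpus, term, target, width=2):
    """Rank other occurrences of `term` by similarity to the target phrase.

    `target` is a (doc_id, token_index) pair. Returns a list of
    (similarity, doc_id, phrase_tokens), most similar first.
    Assumption: candidates are limited to occurrences of the same term.
    """
    target_doc, target_idx = target
    target_vec = embed(context_window(corpus[target_doc], target_idx, width))
    scored = []
    for doc_id, tokens in enumerate(corpus):
        for i, tok in enumerate(tokens):
            if tok.lower() != term.lower() or (doc_id, i) == target:
                continue
            phrase = context_window(tokens, i, width)
            scored.append((cosine(target_vec, embed(phrase)), doc_id, phrase))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


def build_training_data(tag, tagged_phrases, untagged_phrases):
    """Turn the user's tagging decisions into labeled training examples."""
    data = [{"text": " ".join(p), "tag": tag, "label": 1} for p in tagged_phrases]
    data += [{"text": " ".join(p), "tag": tag, "label": 0} for p in untagged_phrases]
    return data


corpus = [
    "she asked the bank for a loan".split(),
    "he asked the bank for a mortgage".split(),
    "they sat on the river bank at dusk".split(),
]

# The user selects "bank" in document 0 (token index 3) as the term of interest.
ranked = rank_candidates(corpus, "bank", target=(0, 3))
similar, dissimilar = ranked[0], ranked[-1]

# Simulated user input: the similar phrase gets the tag, the dissimilar one does not.
training_data = build_training_data("FINANCE", [similar[2]], [dissimilar[2]])
for example in training_data:
    print(example)
```

Showing the user both a similar and a dissimilar candidate yields one positive and one negative example per round, which is what a classifier needs to learn where the tag's boundary lies. In practice the count-vector embedding would be replaced by a model that captures meaning rather than word overlap.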