CLASSIFIER ASSISTANCE USING DOMAIN-TRAINED EMBEDDING

A classifier may be trained with less than all datasets manually annotated with labels. A small subset of verbatims may be manually labeled with topic labels as seeds. Data augmentations can be used to acquire seed verbatim sets for known topics and to assign temporary pseudo labels to the rest of t...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: HAZEN, Timothy James, CHAH, Niel, PERAUD, Soyoung, ROCHETTE, Alexandre, DESGARENNES, Gabriel Arien, KUMAR, Abhishek
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A classifier may be trained with less than all datasets manually annotated with labels. A small subset of verbatims may be manually labeled with topic labels as seeds. Data augmentations can be used to acquire seed verbatim sets for known topics and to assign temporary pseudo labels to the rest of the verbatims based on their vector space proximity to the labeled seed verbatims. The training may involve classification epochs during which embeddings are updated with the assumption that the pseudo labels are ground-truth labels. The training may also involve labeling epochs during which the updated embeddings are used to update the vectors corresponding to the verbatims, and pseudo labels are updated based on updated vector coordinates in the vector space. As the training process progresses through the epochs, the embeddings will converge.