CLASSIFIER ASSISTANCE USING DOMAIN-TRAINED EMBEDDING
Main authors: | , , , , , |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | A classifier may be trained with less than all datasets manually annotated with labels. A small subset of verbatims may be manually labeled with topic labels as seeds. Data augmentations can be used to acquire seed verbatim sets for known topics and to assign temporary pseudo labels to the rest of the verbatims based on their vector space proximity to the labeled seed verbatims. The training may involve classification epochs during which embeddings are updated with the assumption that the pseudo labels are ground-truth labels. The training may also involve labeling epochs during which the updated embeddings are used to update the vectors corresponding to the verbatims, and pseudo labels are updated based on updated vector coordinates in the vector space. As the training process progresses through the epochs, the embeddings will converge. |
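
The alternation between classification epochs and labeling epochs described in the summary can be illustrated with a minimal sketch. This is not the patented method. It assumes each verbatim is already a fixed numeric feature vector, stands in a linear projection `W` for the "embedding", uses a softmax classifier `C`, and measures proximity as Euclidean distance to the centroid of each topic's seeds. All function names, parameters, and hyperparameters below are hypothetical, and the data augmentation step is omitted.

```python
import numpy as np


def assign_pseudo_labels(vectors, seed_idx, seed_labels, n_topics):
    """Labeling step: give each vector the topic of its nearest seed centroid.

    Assumes every topic has at least one seed (hypothetical simplification).
    """
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    centroids = np.stack([
        vectors[seed_idx[seed_labels == k]].mean(axis=0)
        for k in range(n_topics)
    ])
    # Pairwise distances, shape (n_verbatims, n_topics).
    dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    labels[seed_idx] = seed_labels  # manual seed labels are never overwritten
    return labels


def classification_epoch(features, labels, W, C, lr=0.1):
    """One gradient step on softmax cross-entropy, treating the current
    pseudo labels as if they were ground truth."""
    n_topics = C.shape[1]
    Z = features @ W                      # vectors in the embedding space
    logits = Z @ C
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad_logits = (probs - np.eye(n_topics)[labels]) / len(labels)
    grad_C = Z.T @ grad_logits
    grad_W = features.T @ (grad_logits @ C.T)
    return W - lr * grad_W, C - lr * grad_C


def train(features, seed_idx, seed_labels, n_topics,
          dim=2, rounds=10, steps=20, seed=0):
    """Alternate classification epochs and labeling epochs until the
    pseudo labels stop changing or `rounds` is exhausted."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(features.shape[1], dim))
    C = rng.normal(scale=0.1, size=(dim, n_topics))
    labels = assign_pseudo_labels(features @ W, seed_idx, seed_labels, n_topics)
    for _ in range(rounds):
        for _ in range(steps):            # classification epochs
            W, C = classification_epoch(features, labels, W, C)
        # Labeling epoch: re-embed the verbatims, then refresh pseudo labels.
        new_labels = assign_pseudo_labels(
            features @ W, seed_idx, seed_labels, n_topics)
        if np.array_equal(new_labels, labels):
            break                         # pseudo labels have stabilized
        labels = new_labels
    return W, C, labels
```

Calling `train(features, seed_idx, seed_labels, n_topics)` returns the learned projection, the classifier weights, and the final pseudo labels. Stopping when the pseudo labels no longer change is one simple proxy for the convergence the summary mentions; the source does not specify a stopping criterion.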