Method for Filtering and Semi-Automatically Labeling Training Data
Format: Patent
Language: English
Abstract: A method is provided for efficiently providing sentiment or other manual labels for textual training data. The method includes using an embedding model to project acquired user text to an embedding vector in an embedding space. Distances (e.g., based on cosine similarity) between this embedding vector and the embedding vectors determined for a plurality of already-labeled user text training examples are then determined. The label of the already-labeled user text with the shortest distance is prospectively applied to the acquired user text and presented to a user for approval. The user can approve the prospectively applied label, in which case the newly acquired text is added to the training data, with that label associated with it, for later use in training a language model. Alternatively, the user can decline the prospectively applied label and apply an alternative label to the newly acquired text.
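The labeling loop described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `toy_embed` is a hypothetical bag-of-words stand-in for the embedding model, the names `TrainingSet`, `propose_label` and `review` are illustrative, and the human approval step is simulated by a callback that returns `None` to approve the proposed label or a replacement label to decline it. Because cosine similarity grows as vectors get closer, the "shortest distance" example is taken to be the one with the highest similarity (cosine distance = 1 - similarity).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for the embedding model: a sparse bag-of-words count vector."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return {word: float(n) for word, n in counts.items()}


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; higher means closer (distance = 1 - similarity)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as unrelated to everything
    return dot / (norm_a * norm_b)


@dataclass
class LabeledExample:
    text: str
    label: str
    vector: Vector  # cached embedding, so each example is embedded only once


@dataclass
class TrainingSet:
    embed: Callable[[str], Vector]
    examples: List[LabeledExample] = field(default_factory=list)

    def add(self, text: str, label: str) -> None:
        self.examples.append(LabeledExample(text, label, self.embed(text)))

    def propose_label(self, text: str) -> Tuple[str, float]:
        """Label of the most similar (shortest-distance) labeled example, plus its similarity."""
        if not self.examples:
            raise ValueError("need at least one labeled example to propose a label")
        query = self.embed(text)
        score, best = max(
            ((cosine_similarity(query, ex.vector), ex) for ex in self.examples),
            key=lambda pair: pair[0],  # on ties, max() keeps the earliest example
        )
        return best.label, score

    def review(self, text: str, reviewer: Callable[[str, str], Optional[str]]) -> str:
        """Ask the reviewer to approve the proposal (return None) or supply another label."""
        proposed, _ = self.propose_label(text)
        alternative = reviewer(text, proposed)
        final = proposed if alternative is None else alternative
        self.add(text, final)  # the newly labeled text joins the training data
        return final


# Usage: seed a small labeled set, then label new text with a simulated reviewer.
training = TrainingSet(embed=toy_embed)
training.add("I love this product, it works great", "positive")
training.add("Terrible support, I want a refund", "negative")

print(training.propose_label("works great, love it")[0])  # positive
print(training.review("works great, love it", lambda text, proposed: None))  # approved: positive
print(training.review("great price but it broke after a day", lambda text, proposed: "negative"))  # declined: negative
print(len(training.examples))  # 4
```

Caching each example's embedding means only the newly acquired text is embedded per query. A production system would more plausibly use a learned sentence-embedding model and an approximate nearest-neighbor index once the training set grows large, but the approve-or-replace flow would stay the same.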