Method for Filtering and Semi-Automatically Labeling Training Data

Bibliographic Details
Main authors:	Vohra, Quaizar; Marquez, Orlando; Parmar, Jignesh; Daruru, Srivatsava; Hasan Hashmi, Ziaul; Bechard, Patrice; Purkayastha, Shounak; Madamala, Anil; Parikh, Soham; Nguyen, Olivier; Tiwari, Mitul
Format:	Patent
Language:	English
Keywords:	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Abstract:	A method is provided for efficiently providing sentiment labels or other manual labels for textual training data. The method includes using an embedding model to project acquired user text to an embedding vector in an embedding space. Distances (e.g., derived from cosine similarities) between this embedding vector and the embedding vectors determined for a plurality of already-labeled user text training examples are then determined. The already-labeled example with the shortest distance to the acquired text is identified, and its label is prospectively applied to the acquired user text and presented to a user for approval. If the user approves the prospectively applied label, the newly acquired text is added to the training data with that label associated therewith, for later use in training a language model. Alternatively, the user can decline the prospectively applied label and apply an alternative label to the newly acquired text.
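The labeling loop summarized in the abstract can be sketched as a nearest-neighbor lookup followed by an approve-or-override step. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not name an embedding model, so the hypothetical `toy_embed` stands in for one with a sparse bag-of-words vector, and `TrainingSet`, `propose_label`, `review` and the `ask_user` callback are illustrative names only. Distance is taken as cosine distance (1 minus cosine similarity), so the shortest distance corresponds to the most similar already-labeled example.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for the embedding model: a sparse bag-of-words
    vector (token -> count). A real system would call a trained text encoder."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity; 0.0 means same direction, 1.0 means no overlap."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat empty text as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


@dataclass
class LabeledExample:
    text: str
    label: str
    embedding: Counter  # computed once when the example is added


@dataclass
class TrainingSet:
    examples: List[LabeledExample] = field(default_factory=list)

    def add(self, text: str, label: str) -> None:
        self.examples.append(LabeledExample(text, label, toy_embed(text)))

    def propose_label(self, text: str) -> Optional[str]:
        """Return the label of the nearest already-labeled example, if any."""
        if not self.examples:
            return None
        query = toy_embed(text)
        nearest = min(self.examples,
                      key=lambda ex: cosine_distance(query, ex.embedding))
        return nearest.label


# Hypothetical reviewer callback: ask_user(text, proposed_label) returns None
# to approve the proposal, or a different label string to decline it.
ReviewFn = Callable[[str, Optional[str]], Optional[str]]


def review(training: TrainingSet, text: str, ask_user: ReviewFn) -> str:
    """Propose a label, let the user approve or override it, then store it."""
    proposed = training.propose_label(text)
    override = ask_user(text, proposed)
    final = proposed if override is None else override
    if final is None:
        raise ValueError("no labeled examples yet; the user must supply a label")
    training.add(text, final)  # the labeled text joins the training data
    return final


if __name__ == "__main__":
    training = TrainingSet()
    training.add("the checkout was fast and easy", "positive")
    training.add("my order arrived broken and late", "negative")

    # Simulated reviewer that approves every proposed label.
    print(review(training, "fast easy checkout", lambda text, proposed: None))
    # -> positive
```

In this sketch ties are broken by insertion order (`min` returns the first minimum); the abstract does not say how ties are resolved. Embeddings of the labeled examples are computed once and cached, so each new text costs one embedding call plus a linear scan over the training set.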