DATASET GENERATION USING LARGE LANGUAGE MODELS

Disclosed are systems and techniques that may generate datasets for training task-oriented dialogue systems. The techniques include generating natural language queries by selecting a template query, sampling one or more tokens from a data store of domain-specific tokens, modifying the selected templ...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Nagaraju, Divija, Parisien, Christopher
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Disclosed are systems and techniques that may generate datasets for training task-oriented dialogue systems. The techniques include generating natural language queries by selecting a template query, sampling one or more tokens from a data store of domain-specific tokens, modifying the selected template query using the one or more sampled tokens to generate a query prompt, and using a natural language generative machine-learning model to generate, based on the query prompt, a respective natural language query of the subset of the plurality of natural language queries, and causing the generated plurality of natural language queries to be provided to a machine-learning model training engine configured to train, using the generated plurality of natural language queries, a conversational machine-learning model to perform a domain-specific conversational task.