MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Extending the capabilities of Large Language Models (LLMs) with functions or
tools for environment interaction has led to the emergence of the agent
paradigm. In industry, training an LLM is not always feasible because of the
scarcity of domain data, legal holds on proprietary customer data, rapidly
changing business requirements, and the need to prototype new assistants.
Agents provide an elegant solution to the above by relying on the zero-shot
reasoning abilities of the underlying LLM and utilizing tools to explore and
reason over customer data and respond to user requests. However, there are two
concerns here: (I) acquiring large-scale customer queries for agent testing is
time-consuming, and (II) high reliance on the tool call sequence (or
trajectory) followed by the agent to respond to user queries may lead to
unexpected or incorrect behavior. To address these concerns, we propose MAG-V, a
multi-agent framework to first generate a dataset of questions that mimic
customer queries; and second, reverse-engineer alternate questions from the
responses for trajectory verification. Initial results indicate that our
synthetic data can improve agent performance on actual customer queries.
Furthermore, our trajectory verification methodology, inspired by distant
supervision and using traditional machine learning (ML) models, outperforms a
GPT-4o judge baseline by 11% accuracy and matches the performance of a GPT-4
judge on our constructed dataset. Overall, our approach is a step towards
unifying diverse task agents into a cohesive framework for achieving an aligned
objective. |
---|---|
DOI: | 10.48550/arxiv.2412.04494 |