An Interactive Agent Foundation Model
Format: | Article |
---|---|
Language: | English |
Abstract: | The development of artificial intelligence systems is transitioning from
creating static, task-specific models to dynamic, agent-based systems capable
of performing well in a wide range of applications. We propose an Interactive
Agent Foundation Model that uses a novel multi-task agent training paradigm for
training AI agents across a wide range of domains, datasets, and tasks. Our
training paradigm unifies diverse pre-training strategies, including visual
masked auto-encoders, language modeling, and next-action prediction, enabling a
versatile and adaptable AI framework. We demonstrate the performance of our
framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
Our model demonstrates its ability to generate meaningful and contextually
relevant outputs in each area. The strength of our approach lies in its
generality, leveraging a variety of data sources such as robotics sequences,
gameplay data, large-scale video datasets, and textual information for
effective multimodal and multi-task learning. Our approach provides a promising
avenue for developing generalist, action-taking, multimodal systems. |
---|---|
DOI: | 10.48550/arxiv.2402.05929 |