RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents
Format: Article
Language: English
Abstract: Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in current decision-making processes, an innate human behavior, continues to pose significant challenges. Addressing this, we propose the Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness: it achieves state-of-the-art (SOTA) performance in textual scenarios and notably enhances multimodal LLM agents' performance on embodied tasks. These results highlight RAP's potential to advance the functionality and applicability of LLM agents in complex, real-world applications.
DOI: 10.48550/arxiv.2402.03610
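
The abstract outlines the core mechanism of RAP: store past experiences, retrieve those most relevant to the current situation and context, and condition the agent's plan generation on them. Below is a minimal, hypothetical sketch of that idea, assuming a text-only setting, a crude lexical-overlap retriever in place of a learned one, and a stubbed llm() call; all names, fields, and signatures are illustrative assumptions and do not come from the paper.

```python
# Hypothetical RAP-style sketch (not the authors' code): keep a memory of past
# experiences, retrieve the ones most similar to the current situation and
# context, and condition plan generation on them.
from dataclasses import dataclass, field


@dataclass
class Experience:
    observation: str  # what the agent perceived when this plan was executed
    task: str         # task / context description at that time
    plan: str         # the plan that was carried out
    outcome: str      # e.g. "success" or a short failure note


def lexical_similarity(a: str, b: str) -> float:
    """Crude token-overlap score standing in for a learned embedding retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))


def llm(prompt: str) -> str:
    """Stub standing in for any (multimodal) LLM backend."""
    return "<plan proposed by the LLM for>\n" + prompt


@dataclass
class RetrievalAugmentedPlanner:
    memory: list[Experience] = field(default_factory=list)

    def retrieve(self, observation: str, task: str, k: int = 3) -> list[Experience]:
        """Rank stored experiences by similarity to the current situation/context."""
        query = f"{task} {observation}"
        ranked = sorted(
            self.memory,
            key=lambda e: lexical_similarity(query, f"{e.task} {e.observation}"),
            reverse=True,
        )
        return ranked[:k]

    def plan(self, observation: str, task: str) -> str:
        """Build a planning prompt that includes the retrieved experiences."""
        examples = "\n".join(
            f"- situation: {e.observation} | plan: {e.plan} | outcome: {e.outcome}"
            for e in self.retrieve(observation, task)
        )
        prompt = (
            f"Task: {task}\n"
            f"Current observation: {observation}\n"
            f"Relevant past experiences:\n{examples}\n"
            "Propose the next plan:"
        )
        return llm(prompt)

    def record(self, observation: str, task: str, plan: str, outcome: str) -> None:
        """Store an executed plan and its outcome for future retrieval."""
        self.memory.append(Experience(observation, task, plan, outcome))


if __name__ == "__main__":
    planner = RetrievalAugmentedPlanner()
    planner.record("kitchen counter, mug visible", "put the mug in the sink",
                   "pick up mug; walk to sink; place mug in sink", "success")
    print(planner.plan("kitchen counter, plate visible", "put the plate in the sink"))
```

In a multimodal setting, the observation string would be replaced by image features or an image-conditioned caption, and the lexical retriever by a similarity over learned embeddings; the overall retrieve-then-plan loop stays the same.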