MobA: A Two-Level Agent System for Efficient Mobile Task Automation
Current mobile assistants are limited by dependence on system APIs or struggle with complex user instructions and diverse interfaces due to restricted comprehension and decision-making abilities. To address these challenges, we propose MobA, a novel Mobile phone Agent powered by multimodal large lan...
Gespeichert in:
Hauptverfasser: | , , , , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Current mobile assistants are limited by dependence on system APIs or
struggle with complex user instructions and diverse interfaces due to
restricted comprehension and decision-making abilities. To address these
challenges, we propose MobA, a novel Mobile phone Agent powered by multimodal
large language models that enhances comprehension and planning capabilities
through a sophisticated two-level agent architecture. The high-level Global
Agent (GA) is responsible for understanding user commands, tracking history
memories, and planning tasks. The low-level Local Agent (LA) predicts detailed
actions in the form of function calls, guided by sub-tasks and memory from the
GA. Integrating a Reflection Module allows for efficient task completion and
enables the system to handle previously unseen complex tasks. MobA demonstrates
significant improvements in task execution efficiency and completion rate in
real-life evaluations, underscoring the potential of MLLM-empowered mobile
assistants. |
---|---|
DOI: | 10.48550/arxiv.2410.13757 |