Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks

Bibliographic details
Published in: arXiv.org 2024-11
Main authors: Fourney, Adam; Bansal, Gagan; Mozannar, Hussein; Tan, Cheng; Salinas, Eduardo; Zhu, Erkang; Niedtner, Friederike; Proebsting, Grace; Bassman, Griffin; Gerrits, Jack; Alber, Jacob; Chang, Peter; Loynd, Ricky; West, Robert; Dibia, Victor; Awadallah, Ahmed; Kamar, Ece; Hosn, Rafah; Amershi, Saleema
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Fourney, Adam
Bansal, Gagan
Mozannar, Hussein
Tan, Cheng
Salinas, Eduardo
Zhu, Erkang
Niedtner, Friederike
Proebsting, Grace
Bassman, Griffin
Gerrits, Jack
Alber, Jacob
Chang, Peter
Loynd, Ricky
West, Robert
Dibia, Victor
Awadallah, Ahmed
Kamar, Ece
Hosn, Rafah
Amershi, Saleema
description Modern AI agents, driven by advances in large foundation models, promise to enhance our productivity and transform our lives by augmenting our knowledge and capabilities. To achieve this vision, AI agents must effectively plan, perform multi-step reasoning and actions, respond to novel observations, and recover from errors to successfully complete complex tasks across a wide range of scenarios. In this work, we introduce Magentic-One, a high-performing open-source agentic system for solving such tasks. Magentic-One uses a multi-agent architecture in which a lead agent, the Orchestrator, plans, tracks progress, and re-plans to recover from errors. Throughout task execution, the Orchestrator directs other specialized agents to perform tasks as needed, such as operating a web browser, navigating local files, or writing and executing Python code. We show that Magentic-One achieves performance statistically competitive with the state of the art on three diverse and challenging agentic benchmarks: GAIA, AssistantBench, and WebArena. Magentic-One achieves these results without modifying core agent capabilities or how the agents collaborate, demonstrating progress toward generalist agentic systems. Moreover, Magentic-One's modular design allows agents to be added to or removed from the team without additional prompt tuning or training, easing development and making the system extensible to future scenarios. We provide an open-source implementation of Magentic-One, and we include AutoGenBench, a standalone tool for agentic evaluation. AutoGenBench provides built-in controls for repetition and isolation to run agentic benchmarks in a rigorous and contained manner, which is important when agents' actions have side effects. Magentic-One, AutoGenBench, and detailed empirical performance evaluations of Magentic-One, including ablations and error analysis, are available at https://aka.ms/magentic-one
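The orchestration pattern described in the abstract (a lead Orchestrator that plans, delegates steps to specialized agents, tracks progress, and re-plans when progress stalls) can be illustrated with a minimal Python sketch. It is not the Magentic-One or AutoGen API: the names `Agent`, `Orchestrator`, `planner`, `max_stalls`, and `max_replans`, as well as the toy agents, are hypothetical, and the language-model calls that would do the actual planning and acting are replaced by plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str]  # (agent name, instruction); hypothetical plan format


@dataclass
class Agent:
    # Hypothetical specialized agent: returns a result, or None if it made no progress.
    name: str
    act: Callable[[str], Optional[str]]


@dataclass
class Orchestrator:
    agents: Dict[str, Agent]                     # team members; add or remove freely
    planner: Callable[[str, int], List[Step]]    # (task, attempt) -> plan
    max_stalls: int = 2                          # retries per step before declaring a stall
    max_replans: int = 3                         # how many times to re-plan before giving up
    log: List[str] = field(default_factory=list)

    def run(self, task: str) -> bool:
        for attempt in range(self.max_replans + 1):
            plan = self.planner(task, attempt)   # initial plan, or a re-plan on later attempts
            self.log.append(f"plan {attempt}: {[name for name, _ in plan]}")
            for agent_name, step in plan:
                result = self._try_step(agent_name, step)
                if result is None:               # no progress: abandon this plan
                    self.log.append(f"stalled at {step!r}; re-planning")
                    break
                self.log.append(f"{agent_name} -> {result}")
            else:
                return True                      # every step in the plan succeeded
        return False                             # gave up after max_replans re-plans

    def _try_step(self, agent_name: str, step: str) -> Optional[str]:
        # Delegate one step, retrying up to max_stalls times.
        for _ in range(self.max_stalls):
            result = self.agents[agent_name].act(step)
            if result is not None:
                return result
        return None


def web_surfer(step: str) -> Optional[str]:
    # Toy stand-in for a browsing agent: a "blocked" site never loads.
    return None if "blocked" in step else f"fetched {step}"


def coder(step: str) -> Optional[str]:
    # Toy stand-in for a code-writing/executing agent.
    return f"ran code for {step}"


def planner(task: str, attempt: int) -> List[Step]:
    # The first plan relies on a blocked source; the re-plan switches to a mirror.
    source = "blocked-site" if attempt == 0 else "mirror-site"
    return [("WebSurfer", source), ("Coder", f"summarize {task}")]


orch = Orchestrator(
    agents={"WebSurfer": Agent("WebSurfer", web_surfer), "Coder": Agent("Coder", coder)},
    planner=planner,
)
succeeded = orch.run("population data")
print(succeeded)  # True
for line in orch.log:
    print(line)
```

Because the team is just a dictionary of agents handed to the Orchestrator, adding or removing a member does not change the control loop, which loosely mirrors the modularity claim in the abstract; in the real system, planning and progress checks are made by a language model rather than by fixed rules like these.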
format Article
date 2024-11-07
publisher Cornell University Library, arXiv.org (Ithaca)
rights 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_3126160123
source Free E-Journals
subjects Ablation
Benchmarks
Error analysis
Modular design
Multiagent systems
Open source software
Performance evaluation
Task complexity
title Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T03%3A19%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Magentic-One:%20A%20Generalist%20Multi-Agent%20System%20for%20Solving%20Complex%20Tasks&rft.jtitle=arXiv.org&rft.au=Fourney,%20Adam&rft.date=2024-11-07&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3126160123%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3126160123&rft_id=info:pmid/&rfr_iscdi=true