GCS: Graph-based Coordination Strategy for Multi-Agent Reinforcement Learning

Many real-world scenarios involve a team of agents that have to coordinate their policies to achieve a shared goal. Previous studies mainly focus on decentralized control to maximize a common reward and barely consider the coordination among control policies, which is critical in dynamic and complicated environments. In this work, we propose factorizing the joint team policy into a graph generator and graph-based coordinated policy to enable coordinated behaviours among agents. The graph generator adopts an encoder-decoder framework that outputs directed acyclic graphs (DAGs) to capture the underlying dynamic decision structure. We also apply the DAGness-constrained and DAG depth-constrained optimization in the graph generator to balance efficiency and performance. The graph-based coordinated policy exploits the generated decision structure. The graph generator and coordinated policy are trained simultaneously to maximize the discounted return. Empirical evaluations on Collaborative Gaussian Squeeze, Cooperative Navigation, and Google Research Football demonstrate the superiority of the proposed method.
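
Note on the method: the abstract refers to DAGness-constrained and DAG depth-constrained optimization without spelling out the formulation. A commonly used acyclicity ("DAGness") penalty is the NOTEARS term h(A) = tr(exp(A ∘ A)) − d, which is zero exactly when the weighted adjacency matrix A encodes a directed acyclic graph. The Python sketch below is illustrative only: the function names, the depth measure, and the use of this particular penalty are assumptions, not details taken from the paper.

import numpy as np
from scipy.linalg import expm

def dagness_penalty(adj: np.ndarray) -> float:
    # NOTEARS-style acyclicity penalty: h(A) = tr(exp(A * A)) - d.
    # It equals 0 exactly when `adj` is the (weighted) adjacency matrix of a DAG,
    # and grows with the strength of any directed cycles, so it can act as a
    # soft constraint while training a graph generator.
    d = adj.shape[0]
    return float(np.trace(expm(adj * adj)) - d)  # adj * adj is the elementwise square

def dag_depth(adj: np.ndarray) -> int:
    # Length of the longest directed path (the "decision depth") of a DAG
    # given as a 0/1 adjacency matrix; bounding it limits how many agents
    # must decide strictly after one another.
    d = adj.shape[0]
    depth, reach = 0, (adj > 0).astype(float)
    for _ in range(d):
        if not reach.any():
            break
        depth += 1
        reach = (reach @ adj > 0).astype(float)  # paths that are one edge longer
    return depth

# Example: a 3-agent chain 0 -> 1 -> 2 is acyclic (penalty ~0) with depth 2.
chain = np.array([[0., 1., 0.],
                  [0., 0., 1.],
                  [0., 0., 0.]])
print(dagness_penalty(chain), dag_depth(chain))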

Bibliographic Details
Published in: arXiv.org, 2022-01
Main authors: Ruan, Jingqing; Du, Yali; Xiong, Xuantang; Xing, Dengpeng; Li, Xiyun; Meng, Linghui; Zhang, Haifeng; Wang, Jun; Xu, Bo
Format: Article
Language: English (eng)
Subjects: Coders; Coordination; Decentralized control; Encoders-Decoders; Graph theory; Multiagent systems; Optimization; Policies
Online access: Full text
EISSN: 2331-8422