GCS: Graph-based Coordination Strategy for Multi-Agent Reinforcement Learning

Many real-world scenarios involve a team of agents that have to coordinate their policies to achieve a shared goal. Previous studies mainly focus on decentralized control to maximize a common reward and barely consider the coordination among control policies, which is critical in dynamic and complicated environments. In this work, we propose factorizing the joint team policy into a graph generator and graph-based coordinated policy to enable coordinated behaviours among agents. The graph generator adopts an encoder-decoder framework that outputs directed acyclic graphs (DAGs) to capture the underlying dynamic decision structure. We also apply the DAGness-constrained and DAG depth-constrained optimization in the graph generator to balance efficiency and performance. The graph-based coordinated policy exploits the generated decision structure. The graph generator and coordinated policy are trained simultaneously to maximize the discounted return. Empirical evaluations on Collaborative Gaussian Squeeze, Cooperative Navigation, and Google Research Football demonstrate the superiority of the proposed method.
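
Note on the method: the abstract refers to DAGness-constrained and DAG depth-constrained optimization without spelling out the formulation. A commonly used acyclicity ("DAGness") penalty is the NOTEARS term h(A) = tr(exp(A ∘ A)) − d, which is zero exactly when the weighted adjacency matrix A encodes a directed acyclic graph. The Python sketch below is illustrative only: the function names, the depth measure, and the use of this particular penalty are assumptions, not details taken from the paper.

import numpy as np
from scipy.linalg import expm

def dagness_penalty(adj: np.ndarray) -> float:
    # NOTEARS-style acyclicity penalty: h(A) = tr(exp(A * A)) - d.
    # It equals 0 exactly when `adj` is the (weighted) adjacency matrix of a DAG,
    # and grows with the strength of any directed cycles, so it can act as a
    # soft constraint while training a graph generator.
    d = adj.shape[0]
    return float(np.trace(expm(adj * adj)) - d)  # adj * adj is the elementwise square

def dag_depth(adj: np.ndarray) -> int:
    # Length of the longest directed path (the "decision depth") of a DAG
    # given as a 0/1 adjacency matrix; bounding it limits how many agents
    # must decide strictly after one another.
    d = adj.shape[0]
    depth, reach = 0, (adj > 0).astype(float)
    for _ in range(d):
        if not reach.any():
            break
        depth += 1
        reach = (reach @ adj > 0).astype(float)  # paths that are one edge longer
    return depth

# Example: a 3-agent chain 0 -> 1 -> 2 is acyclic (penalty ~0) with depth 2.
chain = np.array([[0., 1., 0.],
                  [0., 0., 1.],
                  [0., 0., 0.]])
print(dagness_penalty(chain), dag_depth(chain))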

Bibliographic Details
Published in: arXiv.org, 2022-01
Main authors: Ruan, Jingqing; Du, Yali; Xiong, Xuantang; Xing, Dengpeng; Li, Xiyun; Meng, Linghui; Zhang, Haifeng; Wang, Jun; Xu, Bo
Format: Article
Language: English (eng)
Subjects: Coders; Coordination; Decentralized control; Encoders-Decoders; Graph theory; Multiagent systems; Optimization; Policies
Online access: Full text
EISSN: 2331-8422