GCS: Graph-based Coordination Strategy for Multi-Agent Reinforcement Learning
Many real-world scenarios involve a team of agents that have to coordinate their policies to achieve a shared goal. Previous studies mainly focus on decentralized control to maximize a common reward and barely consider the coordination among control policies, which is critical in dynamic and complicated environments. In this work, we propose factorizing the joint team policy into a graph generator and graph-based coordinated policy to enable coordinated behaviours among agents. The graph generator adopts an encoder-decoder framework that outputs directed acyclic graphs (DAGs) to capture the underlying dynamic decision structure. We also apply the DAGness-constrained and DAG depth-constrained optimization in the graph generator to balance efficiency and performance. The graph-based coordinated policy exploits the generated decision structure. The graph generator and coordinated policy are trained simultaneously to maximize the discounted return. Empirical evaluations on Collaborative Gaussian Squeeze, Cooperative Navigation, and Google Research Football demonstrate the superiority of the proposed method.
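The abstract mentions DAGness-constrained and DAG depth-constrained optimization in the graph generator. The sketch below is illustrative only and is not the authors' code: it shows a standard NOTEARS-style acyclicity measure (zero exactly when the adjacency matrix encodes a DAG) and a longest-path depth computation, the two quantities such constraints would penalize. The function names `dagness_penalty` and `dag_depth` and the NumPy/SciPy implementation are assumptions, not taken from the paper.

```python
import numpy as np
from scipy.linalg import expm


def dagness_penalty(adj: np.ndarray) -> float:
    """NOTEARS-style acyclicity measure h(A) = tr(exp(A ∘ A)) - d.

    h(A) is zero exactly when the weighted adjacency matrix A encodes a DAG;
    larger values indicate "more cyclic" structure, so it can serve as a soft
    DAGness constraint when training a graph generator.
    """
    d = adj.shape[0]
    return float(np.trace(expm(adj * adj)) - d)


def dag_depth(adj: np.ndarray) -> int:
    """Longest path length (depth) of the graph given by adjacency matrix adj.

    A shallower DAG lets more agents decide in parallel, which is the kind of
    efficiency/performance trade-off a depth constraint would target.
    """
    d = adj.shape[0]
    edges = (adj > 0).astype(int)
    power = edges.copy()
    depth = 0
    while power.any() and depth < d:  # cap at d to guard against cycles
        depth += 1
        power = ((power @ edges) > 0).astype(int)
    return depth


if __name__ == "__main__":
    # 3-agent chain 0 -> 1 -> 2: acyclic, longest path of length 2.
    A = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])
    print(dagness_penalty(A))  # ~0.0 for a DAG
    print(dag_depth(A))        # 2
```

Intuitively, driving `dagness_penalty` toward zero keeps the generated decision structure acyclic, while bounding `dag_depth` limits how many agents must act strictly in sequence.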
Saved in:
Published in: | arXiv.org 2022-01 |
---|---|
Main authors: | Ruan, Jingqing; Du, Yali; Xiong, Xuantang; Xing, Dengpeng; Li, Xiyun; Meng, Linghui; Zhang, Haifeng; Wang, Jun; Xu, Bo |
Format: | Article |
Language: | eng |
Keywords: | Coders; Coordination; Decentralized control; Encoders-Decoders; Graph theory; Multiagent systems; Optimization; Policies |
Online access: | Full text |
creator | Ruan, Jingqing; Du, Yali; Xiong, Xuantang; Xing, Dengpeng; Li, Xiyun; Meng, Linghui; Zhang, Haifeng; Wang, Jun; Xu, Bo |
description | Many real-world scenarios involve a team of agents that have to coordinate their policies to achieve a shared goal. Previous studies mainly focus on decentralized control to maximize a common reward and barely consider the coordination among control policies, which is critical in dynamic and complicated environments. In this work, we propose factorizing the joint team policy into a graph generator and graph-based coordinated policy to enable coordinated behaviours among agents. The graph generator adopts an encoder-decoder framework that outputs directed acyclic graphs (DAGs) to capture the underlying dynamic decision structure. We also apply the DAGness-constrained and DAG depth-constrained optimization in the graph generator to balance efficiency and performance. The graph-based coordinated policy exploits the generated decision structure. The graph generator and coordinated policy are trained simultaneously to maximize the discounted return. Empirical evaluations on Collaborative Gaussian Squeeze, Cooperative Navigation, and Google Research Football demonstrate the superiority of the proposed method. |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2022-01 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2621110676 |
source | Free E-Journals |
subjects | Coders; Coordination; Decentralized control; Encoders-Decoders; Graph theory; Multiagent systems; Optimization; Policies |
title | GCS: Graph-based Coordination Strategy for Multi-Agent Reinforcement Learning |