Learning to Share in Multi-Agent Reinforcement Learning

In this paper, we study the problem of networked multi-agent reinforcement learning (MARL), where a number of agents are deployed as a partially connected network and each interacts only with nearby agents. Networked MARL requires all agents to make decisions in a decentralized manner to optimize a...

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Yi, Yuxuan, Li, Ge, Wang, Yaowei, Lu, Zongqing
Format: Article
Language: eng
Subjects: Computer Science - Learning; Computer Science - Multiagent Systems
Online Access: Order full text
creator Yi, Yuxuan
Li, Ge
Wang, Yaowei
Lu, Zongqing
description In this paper, we study the problem of networked multi-agent reinforcement learning (MARL), where a number of agents are deployed as a partially connected network and each interacts only with nearby agents. Networked MARL requires all agents to make decisions in a decentralized manner to optimize a global objective with restricted communication between neighbors over the network. Inspired by the fact that sharing plays a key role in humans' learning of cooperation, we propose LToS, a hierarchically decentralized MARL framework that enables agents to learn to dynamically share reward with neighbors so as to encourage agents to cooperate on the global objective through collectives. For each agent, the high-level policy learns how to share reward with neighbors to decompose the global objective, while the low-level policy learns to optimize the local objective induced by the high-level policies in the neighborhood. The two policies form a bi-level optimization and learn alternately. We empirically demonstrate that LToS outperforms existing methods in both social dilemma and networked MARL scenarios across scales. (A toy sketch of this bi-level reward-sharing scheme appears after the record below.)
doi_str_mv 10.48550/arxiv.2112.08702
format Article
identifier DOI: 10.48550/arxiv.2112.08702
language eng
recordid cdi_arxiv_primary_2112_08702
source arXiv.org
subjects Computer Science - Learning
Computer Science - Multiagent Systems
title Learning to Share in Multi-Agent Reinforcement Learning
url https://arxiv.org/abs/2112.08702
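
Purely for illustration, here is a minimal, self-contained sketch of the bi-level idea summarized in the description above: each agent keeps a high-level policy that decides where its environmental reward goes within its neighborhood, and a low-level policy that acts on the reward it actually receives, with the two levels updated alternately. Everything in the sketch (the ring network, the LToSAgent class, the env_rewards dilemma, the bandit-style REINFORCE updates, and the one-step credit assignment for the sharing policy) is a hypothetical simplification for exposition; it is not the authors' implementation, which presumably uses deep networks and a different training scheme.

```python
# Hypothetical toy sketch of the LToS idea (high-level reward sharing + low-level acting);
# not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 6
N_ACTIONS = 2  # 0 = defect, 1 = cooperate
# Ring network: each agent's neighborhood is itself plus its two ring neighbors.
NEIGHBORS = {i: [i, (i - 1) % N_AGENTS, (i + 1) % N_AGENTS] for i in range(N_AGENTS)}


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


def reinforce_grad(choice, probs):
    """Score-function gradient of log softmax probability at the sampled choice."""
    g = -probs.copy()
    g[choice] += 1.0
    return g


class LToSAgent:
    """One agent: a high-level reward-sharing policy and a low-level acting policy."""

    def __init__(self, agent_id):
        self.neighbors = NEIGHBORS[agent_id]
        self.share_logits = np.zeros(len(self.neighbors))   # high-level parameters
        self.action_logits = np.zeros(N_ACTIONS)             # low-level parameters

    def sample_share(self):
        """High level: choose which neighbor (possibly itself) receives this agent's reward."""
        probs = softmax(self.share_logits)
        return rng.choice(len(self.neighbors), p=probs), probs

    def sample_action(self):
        """Low level: choose an environment action."""
        probs = softmax(self.action_logits)
        return rng.choice(N_ACTIONS, p=probs), probs


def env_rewards(actions):
    """Toy neighborhood dilemma: cooperating costs 0.2 but pays 0.3 to each ring neighbor."""
    rewards = np.zeros(N_AGENTS)
    for i, a in enumerate(actions):
        if a == 1:
            rewards[i] -= 0.2
            rewards[(i - 1) % N_AGENTS] += 0.3
            rewards[(i + 1) % N_AGENTS] += 0.3
    return rewards


agents = [LToSAgent(i) for i in range(N_AGENTS)]
LR_LOW, LR_HIGH = 0.2, 0.05
prev_shares = None

for step in range(20000):
    acts = [ag.sample_action() for ag in agents]
    env_r = env_rewards([a for a, _ in acts])

    # High-level update (alternating steps): credit the previous sharing decisions with the
    # neighborhood return observed after the low-level updates they induced -- a crude, noisy
    # one-step stand-in for the bi-level optimization described in the abstract.
    if prev_shares is not None and step % 2 == 0:
        for i, ag in enumerate(agents):
            k, probs = prev_shares[i]
            ag.share_logits += LR_HIGH * env_r[ag.neighbors].sum() * reinforce_grad(k, probs)

    # High level acts: each agent routes this step's environmental reward to one neighbor,
    # so each agent's local (low-level) objective is the reward it receives.
    shares = [ag.sample_share() for ag in agents]
    received = np.zeros(N_AGENTS)
    for i, ag in enumerate(agents):
        k, _ = shares[i]
        received[ag.neighbors[k]] += env_r[i]

    # Low-level update: REINFORCE on the received (shared) reward.
    for i, ag in enumerate(agents):
        a, probs = acts[i]
        ag.action_logits += LR_LOW * received[i] * reinforce_grad(a, probs)

    prev_shares = shares

coop = np.mean([softmax(ag.action_logits)[1] for ag in agents])
print(f"average cooperation probability after training: {coop:.2f}")
```

In this toy dilemma, defecting is individually optimal under the raw environmental reward, so cooperation can only emerge to the extent that the high-level sharing policies route reward back toward cooperating neighbors; that dependence is the bi-level structure the abstract describes, here compressed into a noisy single-step approximation.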