Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization

Offline reinforcement learning (RL), which learns policies from offline datasets without environment interaction, has received considerable attention in recent years. Compared with the rich literature in the single-agent case, offline multi-agent RL is still a relatively underexplored area. Most existing methods directly apply offline RL ingredients in the multi-agent setting without fully leveraging the decomposable problem structure, leading to less satisfactory performance in complex tasks. We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization. OMAC adopts a coupled value factorization scheme that decomposes the global value function into local and shared components, and also maintains credit assignment consistency between the state-value and Q-value functions. Moreover, OMAC performs in-sample learning on the decomposed local state-value functions, which implicitly conducts the max-Q operation at the local level while avoiding the distributional shift caused by evaluating out-of-distribution actions. Based on comprehensive evaluations on offline multi-agent StarCraft II micro-management tasks, we demonstrate the superior performance of OMAC over state-of-the-art offline multi-agent RL methods.
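The abstract only sketches the factorization at a high level. As a rough illustration of the general idea, the sketch below shows what a coupled factorization with shared mixing weights and in-sample (expectile) value learning could look like in PyTorch; the class name, the linear mixing form, and the expectile_loss helper are assumptions made for illustration, not OMAC's actual implementation.

    # Hypothetical sketch only -- NOT the paper's code. Assumes a linear mixing
    # Q_tot(s, a) = sum_i w_i(s) * Q_i(o_i, a_i) + b(s), with the SAME weights reused
    # for V_tot(s) = sum_i w_i(s) * V_i(o_i) + b(s), as one way to keep credit
    # assignment consistent between the Q-value and state-value functions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CoupledFactorization(nn.Module):
        def __init__(self, n_agents, obs_dim, act_dim, state_dim, hidden=64):
            super().__init__()
            # Local Q_i(o_i, a_i) and V_i(o_i); parameters shared across agents for brevity.
            self.q_local = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, 1))
            self.v_local = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, 1))
            # Shared, state-conditioned mixing weights (kept non-negative) and bias.
            self.mix_w = nn.Linear(state_dim, n_agents)
            self.mix_b = nn.Linear(state_dim, 1)

        def forward(self, state, obs, acts):
            # state: [B, state_dim], obs: [B, n_agents, obs_dim], acts: [B, n_agents, act_dim]
            q_i = self.q_local(torch.cat([obs, acts], dim=-1)).squeeze(-1)  # [B, n_agents]
            v_i = self.v_local(obs).squeeze(-1)                             # [B, n_agents]
            w = F.softplus(self.mix_w(state))                               # non-negative credit weights
            b = self.mix_b(state).squeeze(-1)
            q_tot = (w * q_i).sum(dim=-1) + b  # global Q from local Q's
            v_tot = (w * v_i).sum(dim=-1) + b  # global V reuses the SAME weights
            return q_tot, v_tot, q_i, v_i

    def expectile_loss(q_i, v_i, tau=0.7):
        # IQL-style expectile regression: fits V_i toward an implicit max of Q_i over
        # dataset actions only, so no out-of-distribution actions are ever evaluated.
        diff = q_i.detach() - v_i
        weight = torch.abs(tau - (diff < 0).float())
        return (weight * diff.pow(2)).mean()

In this sketch, the non-negative shared weights make each agent's contribution to the global value monotone, so a local greedy action choice is consistent with the global one, and reusing the same weights for Q_tot and V_tot is one plausible reading of the "credit assignment consistency" the abstract mentions.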

Bibliographic Details
Main Authors: Wang, Xiangsen; Zhan, Xianyuan
Format: Article
Language: English
Subjects: Computer Science - Learning; Computer Science - Multiagent Systems
Online Access: Full text via arXiv (https://arxiv.org/abs/2306.08900)
DOI: 10.48550/arxiv.2306.08900
Published: 2023-06-15
Source: arXiv.org