Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization

Offline reinforcement learning (RL), which learns policies from offline datasets without environment interaction, has received considerable attention in recent years. Compared with the rich literature in the single-agent case, offline multi-agent RL is still a relatively underexplored area. Most existing methods directly apply offline RL ingredients in the multi-agent setting without fully leveraging the decomposable problem structure, leading to less satisfactory performance in complex tasks. We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization. OMAC adopts a coupled value factorization scheme that decomposes the global value function into local and shared components, and also maintains credit assignment consistency between the state-value and Q-value functions. Moreover, OMAC performs in-sample learning on the decomposed local state-value functions, which implicitly conducts the max-Q operation at the local level while avoiding the distributional shift caused by evaluating out-of-distribution actions. Based on comprehensive evaluations on offline multi-agent StarCraft II micro-management tasks, we demonstrate the superior performance of OMAC over state-of-the-art offline multi-agent RL methods.
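The abstract only sketches the factorization at a high level. As a rough illustration of the general idea, the sketch below shows what a coupled factorization with shared mixing weights and in-sample (expectile) value learning could look like in PyTorch; the class name, the linear mixing form, and the expectile_loss helper are assumptions made for illustration, not OMAC's actual implementation.

    # Hypothetical sketch only -- NOT the paper's code. Assumes a linear mixing
    # Q_tot(s, a) = sum_i w_i(s) * Q_i(o_i, a_i) + b(s), with the SAME weights reused
    # for V_tot(s) = sum_i w_i(s) * V_i(o_i) + b(s), as one way to keep credit
    # assignment consistent between the Q-value and state-value functions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CoupledFactorization(nn.Module):
        def __init__(self, n_agents, obs_dim, act_dim, state_dim, hidden=64):
            super().__init__()
            # Local Q_i(o_i, a_i) and V_i(o_i); parameters shared across agents for brevity.
            self.q_local = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, 1))
            self.v_local = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, 1))
            # Shared, state-conditioned mixing weights (kept non-negative) and bias.
            self.mix_w = nn.Linear(state_dim, n_agents)
            self.mix_b = nn.Linear(state_dim, 1)

        def forward(self, state, obs, acts):
            # state: [B, state_dim], obs: [B, n_agents, obs_dim], acts: [B, n_agents, act_dim]
            q_i = self.q_local(torch.cat([obs, acts], dim=-1)).squeeze(-1)  # [B, n_agents]
            v_i = self.v_local(obs).squeeze(-1)                             # [B, n_agents]
            w = F.softplus(self.mix_w(state))                               # non-negative credit weights
            b = self.mix_b(state).squeeze(-1)
            q_tot = (w * q_i).sum(dim=-1) + b  # global Q from local Q's
            v_tot = (w * v_i).sum(dim=-1) + b  # global V reuses the SAME weights
            return q_tot, v_tot, q_i, v_i

    def expectile_loss(q_i, v_i, tau=0.7):
        # IQL-style expectile regression: fits V_i toward an implicit max of Q_i over
        # dataset actions only, so no out-of-distribution actions are ever evaluated.
        diff = q_i.detach() - v_i
        weight = torch.abs(tau - (diff < 0).float())
        return (weight * diff.pow(2)).mean()

In this sketch, the non-negative shared weights make each agent's contribution to the global value monotone, so a local greedy action choice is consistent with the global one, and reusing the same weights for Q_tot and V_tot is one plausible reading of the "credit assignment consistency" the abstract mentions.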

Bibliographic Details
Main Authors: Wang, Xiangsen; Zhan, Xianyuan
Format: Article
Language: English
Subjects: Computer Science - Learning; Computer Science - Multiagent Systems
Online Access: Full text via arXiv (https://arxiv.org/abs/2306.08900)
DOI: 10.48550/arxiv.2306.08900
Published: 2023-06-15
Source: arXiv.org