Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization

This letter focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. In this letter, agents are assumed to have access to an oracle with exact policy evaluation and seek to maximize their respective independent rewards. Each individual's reward is assumed to depend on the actions of all agents in the multi-agent system, leading to a game between agents. All agents make decisions under a policy with bounded rationality, which is enforced by the introduction of entropy regularization. In practice, a smaller regularization implies that agents are more rational and behave closer to Nash policies. On the other hand, with larger regularization agents tend to act randomly, which ensures more exploration. We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although regularization assumptions prevent the QRE from approximating a Nash equilibrium (NE), our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) as a verification of our theoretical analysis.
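
For orientation, here is a brief LaTeX sketch of the standard entropy-regularized formulation the abstract refers to. The notation (r_i for agent i's reward, tau for the regularization weight, eta for the step size) and the multiplicative NPG update are the usual softmax-parameterization forms, given here as assumptions for illustration rather than the letter's exact statements.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Illustrative notation only: r_i, tau, eta are assumed symbols, not the letter's.
Each agent $i$ maximizes its entropy-regularized expected reward against the
other agents' policies $\pi_{-i}$:
\begin{equation}
  V_i^{\tau}(\pi_i,\pi_{-i})
  = \mathbb{E}_{a \sim (\pi_i,\pi_{-i})}\!\left[ r_i(a) \right]
  + \tau\, \mathcal{H}(\pi_i),
\end{equation}
where $\tau > 0$ weights the Shannon entropy $\mathcal{H}(\pi_i)$. A quantal
response equilibrium (QRE) $\pi^{\star}$ satisfies, for every agent $i$ and
action $a_i$,
\begin{equation}
  \pi_i^{\star}(a_i) \;\propto\;
  \exp\!\left( \tfrac{1}{\tau}\,
  \mathbb{E}_{a_{-i} \sim \pi_{-i}^{\star}}\!\left[ r_i(a_i, a_{-i}) \right] \right),
\end{equation}
and the independent NPG iteration with step size $\eta$ takes the multiplicative form
\begin{equation}
  \pi_i^{(t+1)}(a_i) \;\propto\;
  \pi_i^{(t)}(a_i)^{\,1-\eta\tau}
  \exp\!\left( \eta\,
  \mathbb{E}_{a_{-i} \sim \pi_{-i}^{(t)}}\!\left[ r_i(a_i, a_{-i}) \right] \right),
\end{equation}
whose fixed point is the QRE above; linear convergence means the distance to the
QRE contracts by a constant factor per iteration once $\tau$ is large enough.
\end{document}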

Bibliographic Details
Published in: IEEE Control Systems Letters, 2024, Vol. 8, p. 1217-1222
Main authors: Sun, Youbang; Liu, Tao; Kumar, P. R.; Shahrampour, Shahin
Format: Article
Language: English
Subjects: Approximation algorithms; Convergence; Entropy; Game theory; Games; Gradient methods; Multi-agent reinforcement learning; Nash equilibrium; Natural policy gradient; Quantal response equilibrium; Reinforcement learning
Online access: Order full text
container_end_page 1222
container_issue
container_start_page 1217
container_title IEEE control systems letters
container_volume 8
creator Sun, Youbang
Liu, Tao
Kumar, P. R.
Shahrampour, Shahin
description This letter focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. In this letter, agents are assumed to have access to an oracle with exact policy evaluation and seek to maximize their respective independent rewards. Each individual's reward is assumed to depend on the actions of all agents in the multi-agent system, leading to a game between agents. All agents make decisions under a policy with bounded rationality, which is enforced by the introduction of entropy regularization. In practice, a smaller regularization implies that agents are more rational and behave closer to Nash policies. On the other hand, with larger regularization agents tend to act randomly, which ensures more exploration. We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although regularization assumptions prevent the QRE from approximating a Nash equilibrium (NE), our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) as a verification of our theoretical analysis.
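
The description field above repeats the abstract; as a hedged illustration of the algorithm it describes, the following minimal Python sketch runs independent entropy-regularized NPG on a two-player identical-interest (cooperative) matrix game and checks the quantal-response fixed-point condition. The payoff matrices, temperature tau, step size eta, and iteration count are arbitrary assumptions made for this example, not the letter's experimental setup.

import numpy as np

# Hypothetical example: a 2x2 identical-interest (cooperative) matrix game.
# Player 1 gets R1[a1, a2]; player 2 gets R2[a1, a2]. These payoffs are
# illustrative assumptions, not the games studied in the letter.
R1 = np.array([[1.0, 0.0], [0.0, 1.0]])
R2 = R1.copy()

tau = 1.0   # entropy-regularization weight (larger -> more random play)
eta = 0.1   # step size
T = 200     # number of iterations

# Non-uniform starting policies so the dynamics have somewhere to go.
pi1 = np.array([0.9, 0.1])
pi2 = np.array([0.2, 0.8])

def npg_step(pi, q):
    # Entropy-regularized NPG under softmax parameterization:
    # pi_new(a) proportional to pi(a)^(1 - eta*tau) * exp(eta * q(a)).
    logits = (1.0 - eta * tau) * np.log(pi) + eta * q
    w = np.exp(logits - logits.max())   # subtract max for numerical stability
    return w / w.sum()

for t in range(T):
    q1 = R1 @ pi2      # player 1's expected reward per action vs. pi2
    q2 = R2.T @ pi1    # player 2's expected reward per action vs. pi1
    pi1, pi2 = npg_step(pi1, q1), npg_step(pi2, q2)

def quantal_response(q):
    # Softmax with temperature tau; at a QRE each policy equals this map
    # applied to its expected rewards under the other player's policy.
    w = np.exp(q / tau - (q / tau).max())
    return w / w.sum()

gap = (np.abs(pi1 - quantal_response(R1 @ pi2)).sum()
       + np.abs(pi2 - quantal_response(R2.T @ pi1)).sum())
print("final policies:", pi1, pi2)
print("QRE fixed-point gap:", gap)   # should be near zero after T iterations

The gap shrinking geometrically in the number of iterations mirrors the linear rate the letter establishes; note that the fixed point itself depends on tau, which is why the QRE coincides with a Nash equilibrium only in the limit of vanishing regularization.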
doi_str_mv 10.1109/LCSYS.2024.3410149
format Article
fullrecord Machine-readable record export (Primo/IEEE Xplore); recoverable details: publisher IEEE; CODEN ICSLBO; EISSN 2475-1456; 6 pages; full text at https://ieeexplore.ieee.org/document/10549978; author ORCIDs 0000-0001-7879-5315, 0000-0003-2494-8552, 0000-0003-3093-8510, 0000-0003-0389-5367
fulltext fulltext_linktorsrc
identifier ISSN: 2475-1456
ispartof IEEE control systems letters, 2024, Vol.8, p.1217-1222
issn 2475-1456
2475-1456
language eng
recordid cdi_ieee_primary_10549978
source IEEE Electronic Library (IEL)
subjects Approximation algorithms
Convergence
Entropy
Game theory
Games
Gradient methods
multi-agent reinforcement learning
Nash equilibrium
natural policy gradient
quantal response equilibrium
Reinforcement learning
title Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T18%3A10%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Linear%20Convergence%20of%20Independent%20Natural%20Policy%20Gradient%20in%20Games%20With%20Entropy%20Regularization&rft.jtitle=IEEE%20control%20systems%20letters&rft.au=Sun,%20Youbang&rft.date=2024&rft.volume=8&rft.spage=1217&rft.epage=1222&rft.pages=1217-1222&rft.issn=2475-1456&rft.eissn=2475-1456&rft.coden=ICSLBO&rft_id=info:doi/10.1109/LCSYS.2024.3410149&rft_dat=%3Ccrossref_RIE%3E10_1109_LCSYS_2024_3410149%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10549978&rfr_iscdi=true