Score-Based Equilibrium Learning in Multi-Player Finite Games with Imperfect Information

Real-world games, which concern imperfect information, multiple players, and simultaneous moves, are less frequently discussed in the existing literature of game theory. While reinforcement learning (RL) provides a general framework to extend the game theoretical algorithms, the assumptions that gua...

Detailed description

Bibliographic details
Main authors:	Lu, Runyu; Zhu, Yuanheng; Zhao, Dongbin
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Science and Game Theory
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Lu, Runyu ; Zhu, Yuanheng ; Zhao, Dongbin
description	Real-world games, which concern imperfect information, multiple players, and simultaneous moves, are less frequently discussed in the existing literature of game theory. While reinforcement learning (RL) provides a general framework to extend the game theoretical algorithms, the assumptions that guarantee their convergence towards Nash equilibria may no longer hold in real-world games. Starting from the definition of the Nash distribution, we construct a continuous-time dynamic named imperfect-information exponential-decay score-based learning (IESL) to find approximate Nash equilibria in games with the above-mentioned features. Theoretical analysis demonstrates that IESL yields equilibrium-approaching policies in imperfect information simultaneous games with the basic assumption of concavity. Experimental results show that IESL manages to find approximate Nash equilibria in four canonical poker scenarios and significantly outperforms three other representative algorithms in 3-player Leduc poker, manifesting its equilibrium-finding ability even in practical sequential games. Furthermore, related to the concept of game hypomonotonicity, a trade-off between the convergence of the IESL dynamic and the ultimate NashConv of the convergent policies is observed from the perspectives of both theory and experiment.
doi_str_mv	10.48550/arxiv.2306.00350
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2306_00350</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2306_00350</sourcerecordid><originalsourceid>FETCH-LOGICAL-a670-924b8a535efd1269d1d44ca3477bf1376822529246de8bbb53407212ee7a07933</originalsourceid><addsrcrecordid>eNotz81Og0AUhuHZuDDVC3Dl3AA4vwwstWkrCUYTu3BHzsBBTwJDHUDt3avV1bd58yUPY1dSpCa3VtxA_KKPVGmRpUJoK87Zy3MzRkzuYMKWb94X6slHWgZeIcRA4ZVT4A9LP1Py1MMRI99SoBn5Dgac-CfNb7wcDhg7bGZehm6MA8w0hgt21kE_4eX_rth-u9mv75PqcVeub6sEMieSQhmfg9UWu1aqrGhla0wD2jjnO6ldlitl1U-VtZh77602wimpEB0IV2i9Ytd_tydafYg0QDzWv8T6RNTfpL5LXw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Score-Based Equilibrium Learning in Multi-Player Finite Games with Imperfect Information</title><source>arXiv.org</source><creator>Lu, Runyu ; Zhu, Yuanheng ; Zhao, Dongbin</creator><creatorcontrib>Lu, Runyu ; Zhu, Yuanheng ; Zhao, Dongbin</creatorcontrib><description>Real-world games, which concern imperfect information, multiple players, and simultaneous moves, are less frequently discussed in the existing literature of game theory. While reinforcement learning (RL) provides a general framework to extend the game theoretical algorithms, the assumptions that guarantee their convergence towards Nash equilibria may no longer hold in real-world games. Starting from the definition of the Nash distribution, we construct a continuous-time dynamic named imperfect-information exponential-decay score-based learning (IESL) to find approximate Nash equilibria in games with the above-mentioned features. Theoretical analysis demonstrates that IESL yields equilibrium-approaching policies in imperfect information simultaneous games with the basic assumption of concavity. 
Experimental results show that IESL manages to find approximate Nash equilibria in four canonical poker scenarios and significantly outperforms three other representative algorithms in 3-player Leduc poker, manifesting its equilibrium-finding ability even in practical sequential games. Furthermore, related to the concept of game hypomonotonicity, a trade-off between the convergence of the IESL dynamic and the ultimate NashConv of the convergent policies is observed from the perspectives of both theory and experiment.</description><identifier>DOI: 10.48550/arxiv.2306.00350</identifier><language>eng</language><subject>Computer Science - Computer Science and Game Theory</subject><creationdate>2023-06</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2306.00350$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2306.00350$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Lu, Runyu</creatorcontrib><creatorcontrib>Zhu, Yuanheng</creatorcontrib><creatorcontrib>Zhao, Dongbin</creatorcontrib><title>Score-Based Equilibrium Learning in Multi-Player Finite Games with Imperfect Information</title><description>Real-world games, which concern imperfect information, multiple players, and simultaneous moves, are less frequently discussed in the existing literature of game theory. While reinforcement learning (RL) provides a general framework to extend the game theoretical algorithms, the assumptions that guarantee their convergence towards Nash equilibria may no longer hold in real-world games. 
Starting from the definition of the Nash distribution, we construct a continuous-time dynamic named imperfect-information exponential-decay score-based learning (IESL) to find approximate Nash equilibria in games with the above-mentioned features. Theoretical analysis demonstrates that IESL yields equilibrium-approaching policies in imperfect information simultaneous games with the basic assumption of concavity. Experimental results show that IESL manages to find approximate Nash equilibria in four canonical poker scenarios and significantly outperforms three other representative algorithms in 3-player Leduc poker, manifesting its equilibrium-finding ability even in practical sequential games. Furthermore, related to the concept of game hypomonotonicity, a trade-off between the convergence of the IESL dynamic and the ultimate NashConv of the convergent policies is observed from the perspectives of both theory and experiment.</description><subject>Computer Science - Computer Science and Game Theory</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz81Og0AUhuHZuDDVC3Dl3AA4vwwstWkrCUYTu3BHzsBBTwJDHUDt3avV1bd58yUPY1dSpCa3VtxA_KKPVGmRpUJoK87Zy3MzRkzuYMKWb94X6slHWgZeIcRA4ZVT4A9LP1Py1MMRI99SoBn5Dgac-CfNb7wcDhg7bGZehm6MA8w0hgt21kE_4eX_rth-u9mv75PqcVeub6sEMieSQhmfg9UWu1aqrGhla0wD2jjnO6ldlitl1U-VtZh77602wimpEB0IV2i9Ytd_tydafYg0QDzWv8T6RNTfpL5LXw</recordid><startdate>20230601</startdate><enddate>20230601</enddate><creator>Lu, Runyu</creator><creator>Zhu, Yuanheng</creator><creator>Zhao, Dongbin</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230601</creationdate><title>Score-Based Equilibrium Learning in Multi-Player Finite Games with Imperfect Information</title><author>Lu, Runyu ; Zhu, Yuanheng ; Zhao, 
Dongbin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a670-924b8a535efd1269d1d44ca3477bf1376822529246de8bbb53407212ee7a07933</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computer Science and Game Theory</topic><toplevel>online_resources</toplevel><creatorcontrib>Lu, Runyu</creatorcontrib><creatorcontrib>Zhu, Yuanheng</creatorcontrib><creatorcontrib>Zhao, Dongbin</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lu, Runyu</au><au>Zhu, Yuanheng</au><au>Zhao, Dongbin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Score-Based Equilibrium Learning in Multi-Player Finite Games with Imperfect Information</atitle><date>2023-06-01</date><risdate>2023</risdate><abstract>Real-world games, which concern imperfect information, multiple players, and simultaneous moves, are less frequently discussed in the existing literature of game theory. While reinforcement learning (RL) provides a general framework to extend the game theoretical algorithms, the assumptions that guarantee their convergence towards Nash equilibria may no longer hold in real-world games. Starting from the definition of the Nash distribution, we construct a continuous-time dynamic named imperfect-information exponential-decay score-based learning (IESL) to find approximate Nash equilibria in games with the above-mentioned features. Theoretical analysis demonstrates that IESL yields equilibrium-approaching policies in imperfect information simultaneous games with the basic assumption of concavity. 
Experimental results show that IESL manages to find approximate Nash equilibria in four canonical poker scenarios and significantly outperforms three other representative algorithms in 3-player Leduc poker, manifesting its equilibrium-finding ability even in practical sequential games. Furthermore, related to the concept of game hypomonotonicity, a trade-off between the convergence of the IESL dynamic and the ultimate NashConv of the convergent policies is observed from the perspectives of both theory and experiment.</abstract><doi>10.48550/arxiv.2306.00350</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2306.00350
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2306_00350
source	arXiv.org
subjects	Computer Science - Computer Science and Game Theory
title	Score-Based Equilibrium Learning in Multi-Player Finite Games with Imperfect Information
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T02%3A42%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Score-Based%20Equilibrium%20Learning%20in%20Multi-Player%20Finite%20Games%20with%20Imperfect%20Information&rft.au=Lu,%20Runyu&rft.date=2023-06-01&rft_id=info:doi/10.48550/arxiv.2306.00350&rft_dat=%3Carxiv_GOX%3E2306_00350%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true