Finding and Certifying (Near-)Optimal Strategies in Black-Box Extensive-Form Games
Often -- for example in war games, strategy video games, and financial simulations -- the game is given to us only as a black-box simulator in which we can play it. In these settings, since the game may have unknown nature action distributions (from which we can only obtain samples) and/or be too large to expand fully, it can be difficult to compute strategies with guarantees on exploitability. Recent work \cite{Zhang20:Small} resulted in a notion of certificate for extensive-form games that allows exploitability guarantees while not expanding the full game tree. However, that work assumed that the black box could sample or expand arbitrary nodes of the game tree at any time, and that a series of exact game solves (via, for example, linear programming) can be conducted to compute the certificate. Each of those two assumptions severely restricts the practical applicability of that method. In this work, we relax both of the assumptions. We show that high-probability certificates can be obtained with a black box that can do nothing more than play through games, using only a regret minimizer as a subroutine. As a bonus, we obtain an equilibrium-finding algorithm with $\tilde O(1/\sqrt{T})$ convergence rate in the extensive-form game setting that does not rely on a sampling strategy with lower-bounded reach probabilities (which MCCFR assumes). We demonstrate experimentally that, in the black-box setting, our methods are able to provide nontrivial exploitability guarantees while expanding only a small fraction of the game tree.
Saved in:
Main authors: | Zhang, Brian Hu ; Sandholm, Tuomas |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Science and Game Theory |
Online access: | Order full text |
creator | Zhang, Brian Hu ; Sandholm, Tuomas |
description | Often -- for example in war games, strategy video games, and financial
simulations -- the game is given to us only as a black-box simulator in which
we can play it. In these settings, since the game may have unknown nature
action distributions (from which we can only obtain samples) and/or be too
large to expand fully, it can be difficult to compute strategies with
guarantees on exploitability. Recent work \cite{Zhang20:Small} resulted in a
notion of certificate for extensive-form games that allows exploitability
guarantees while not expanding the full game tree. However, that work assumed
that the black box could sample or expand arbitrary nodes of the game tree at
any time, and that a series of exact game solves (via, for example, linear
programming) can be conducted to compute the certificate. Each of those two
assumptions severely restricts the practical applicability of that method. In
this work, we relax both of the assumptions. We show that high-probability
certificates can be obtained with a black box that can do nothing more than
play through games, using only a regret minimizer as a subroutine. As a bonus,
we obtain an equilibrium-finding algorithm with $\tilde O(1/\sqrt{T})$
convergence rate in the extensive-form game setting that does not rely on a
sampling strategy with lower-bounded reach probabilities (which MCCFR assumes).
We demonstrate experimentally that, in the black-box setting, our methods are
able to provide nontrivial exploitability guarantees while expanding only a
small fraction of the game tree. |
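The abstract states that high-probability certificates can be obtained "using only a regret minimizer as a subroutine." The paper's own algorithm is not reproduced in this record; as a minimal sketch of what such a subroutine looks like, here is standard regret matching in self-play on rock-paper-scissors (a toy zero-sum matrix game chosen purely for illustration, not from the paper), whose average strategies converge to equilibrium at roughly the $O(1/\sqrt{T})$ rate the abstract mentions. All names below (`regret_matching`, `self_play`, `exploitability`) are illustrative assumptions.

```python
import numpy as np

# Payoff matrix for rock-paper-scissors from the row player's view.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def regret_matching(cum_regret):
    """Map cumulative regrets to a strategy (uniform if no positive regret)."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    n = len(cum_regret)
    return pos / total if total > 0 else np.full(n, 1.0 / n)

def self_play(T=20000):
    """Run regret-matching self-play; return the players' average strategies."""
    n = A.shape[0]
    # Small asymmetric initialization so play does not start at the fixed point.
    r1 = np.array([1.0, 0.0, 0.0])  # cumulative regrets, row player
    r2 = np.zeros(n)                # cumulative regrets, column player
    s1 = np.zeros(n)                # accumulated strategies (for averaging)
    s2 = np.zeros(n)
    for _ in range(T):
        x = regret_matching(r1)
        y = regret_matching(r2)
        s1 += x
        s2 += y
        u1 = A @ y        # row player's value for each pure action vs y
        u2 = -(x @ A)     # column player's value for each pure action vs x
        r1 += u1 - x @ u1  # regret update: action value minus realized value
        r2 += u2 - y @ u2
    return s1 / T, s2 / T

def exploitability(x, y):
    """Total best-response gain against (x, y); zero exactly at equilibrium."""
    return (A @ y).max() - (x @ A).min()
```

Running `self_play()` and checking `exploitability` on the returned averages shows the gap shrinking as $T$ grows, matching the standard regret-matching bound of about $\Delta\sqrt{n/T}$ per player. The paper's contribution, per the abstract, is turning such regret-minimizing play into high-probability exploitability *certificates* in the black-box extensive-form setting, which this toy sketch does not attempt.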
doi_str_mv | 10.48550/arxiv.2009.07384 |
format | Article |
date | 2020-09-15 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
oa | free_for_read |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2009.07384 |
language | eng |
recordid | cdi_arxiv_primary_2009_07384 |
source | arXiv.org |
subjects | Computer Science - Computer Science and Game Theory |
title | Finding and Certifying (Near-)Optimal Strategies in Black-Box Extensive-Form Games |
url | https://arxiv.org/abs/2009.07384 |