Path Planning Problems with Side Observations-When Colonels Play Hide-and-Seek

Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently l...

Bibliographic details
Main authors: Vu, Dong Quan; Loiseau, Patrick; Silva, Alonso; Tran-Thanh, Long
Format: Article
Language: eng
creator Vu, Dong Quan
Loiseau, Patrick
Silva, Alonso
Tran-Thanh, Long
description Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently learn in these games. In this work, we show that the online CB and HS games can be cast as path planning problems with side-observations (SOPPP): at each stage, a learner chooses a path on a directed acyclic graph and suffers the sum of losses that are adversarially assigned to the corresponding edges; and she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus some others). We propose a novel algorithm, EXP3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP without requiring any auxiliary oracle. We provide an expected-regret bound of EXP3-OE in SOPPP matching the order of the best benchmark in the literature. Moreover, we introduce additional assumptions on the observability model under which we can further improve the regret bounds of EXP3-OE. We illustrate the benefit of using EXP3-OE in SOPPP by applying it to the online CB and HS games.
doi_str_mv 10.48550/arxiv.1905.11151
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1905.11151
language eng
recordid cdi_arxiv_primary_1905_11151
source arXiv.org
subjects Computer Science - Computer Science and Game Theory
Computer Science - Learning
title Path Planning Problems with Side Observations-When Colonels Play Hide-and-Seek
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T17%3A07%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Path%20Planning%20Problems%20with%20Side%20Observations-When%20Colonels%20Play%20Hide-and-Seek&rft.au=Vu,%20Dong%20Quan&rft.date=2019-05-27&rft_id=info:doi/10.48550/arxiv.1905.11151&rft_dat=%3Carxiv_GOX%3E1905_11151%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
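
The description field above states the SOPPP interaction in words: the learner picks a path on a directed acyclic graph, suffers the sum of the losses on its edges, and then observes the losses on the chosen edges plus some others. A minimal Python sketch of one such stage follows. It illustrates only the feedback protocol and is not EXP3-OE. The names `path_edges`, `play_round`, and `side_obs`, the toy graph, and the loss values are all hypothetical and do not come from the record.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def path_edges(path: List[str]) -> List[Edge]:
    """Return the consecutive edges traversed by a node path."""
    return list(zip(path, path[1:]))


def play_round(
    path: List[str],
    losses: Dict[Edge, float],
    side_obs: Dict[Edge, Set[Edge]],
) -> Tuple[float, Dict[Edge, float]]:
    """Simulate one SOPPP stage for a fixed path.

    losses: loss the adversary assigned to each edge at this stage.
    side_obs: hypothetical observability map (an assumption of this
        sketch); choosing edge e also reveals the edges in side_obs[e].
    """
    chosen = path_edges(path)
    # The learner suffers the sum of the losses on the chosen edges.
    suffered = sum(losses[e] for e in chosen)
    # Semi-bandit feedback: every chosen edge is observed ...
    revealed = set(chosen)
    # ... plus the side observations that those edges trigger.
    for e in chosen:
        revealed |= side_obs.get(e, set())
    feedback = {e: losses[e] for e in revealed}
    return suffered, feedback


# Toy DAG with two source-to-sink paths: s -> a -> t and s -> b -> t.
losses = {("s", "a"): 0.25, ("a", "t"): 0.5, ("s", "b"): 0.75, ("b", "t"): 0.0}
# Assumed observability: playing edge (s, a) also reveals edge (s, b).
side_obs = {("s", "a"): {("s", "b")}}

suffered, feedback = play_round(["s", "a", "t"], losses, side_obs)
print(suffered)
print(sorted(feedback))
```

In this toy stage the learner sees the loss of one edge it did not traverse, while the loss on `("b", "t")` stays hidden. A learning algorithm for this setting would use such extra observations when updating its estimates.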