Approximate information maximization for bandit games


Bibliographic Details
Main authors: Barbier-Chebbah, Alex, Vestergaard, Christian L, Masson, Jean-Baptiste, Boursier, Etienne
Format: Article
Language: eng
creator Barbier-Chebbah, Alex
Vestergaard, Christian L
Masson, Jean-Baptiste
Boursier, Etienne
description Entropy maximization and free energy minimization are general physical principles for modeling the dynamics of various physical systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigation in random environments using information maximization (Vergassola et al., 2007). Building on this principle, we propose a new class of bandit algorithms that maximize an approximation to the information of a key variable within the system. To this end, we develop an approximate analytical, physics-based representation of the entropy to forecast the information gain of each action and greedily choose the action with the largest information gain. This method yields strong performance in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. Owing to its ability to encompass the system's properties in a global physical functional, this approach can be efficiently adapted to more complex bandit settings, calling for further investigation of information maximization approaches for multi-armed bandit problems.
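The greedy information-maximization idea in the abstract can be illustrated with a minimal sketch for the two-armed Gaussian case: maintain a Gaussian posterior per arm, forecast how much each hypothetical pull would shrink the entropy of the best-arm belief, and pull the arm with the largest forecast gain. This is NOT the authors' algorithm (the paper derives an analytical approximation of the entropy; the Monte Carlo forecast and all function names below are illustrative assumptions).

```python
import math
import random

def best_arm_entropy(mu, var):
    """Entropy (nats) of the belief that arm 0 beats arm 1,
    under independent Gaussian posteriors N(mu[i], var[i])."""
    d = (mu[0] - mu[1]) / math.sqrt(var[0] + var[1])
    p = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))  # P(arm 0 is best)
    p = min(max(p, 1e-12), 1 - 1e-12)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def expected_entropy_after_pull(arm, mu, var, counts, sigma2, rng, n_mc=64):
    """Monte Carlo forecast (an illustrative stand-in for the paper's
    analytical approximation) of the best-arm entropy after one more
    hypothetical observation of `arm`."""
    total = 0.0
    for _ in range(n_mc):
        # Draw a plausible reward from the posterior predictive.
        r = rng.gauss(mu[arm], math.sqrt(var[arm] + sigma2))
        n = counts[arm] + 1
        mu2, var2 = list(mu), list(var)
        mu2[arm] = (mu[arm] * counts[arm] + r) / n
        var2[arm] = sigma2 / n
        total += best_arm_entropy(mu2, var2)
    return total / n_mc

def info_max_two_armed(true_means, horizon, sigma2=1.0, seed=0):
    rng = random.Random(seed)
    counts = [1, 1]
    mu = [rng.gauss(m, math.sqrt(sigma2)) for m in true_means]  # one pull each
    var = [sigma2, sigma2]
    for _ in range(horizon - 2):
        h_now = best_arm_entropy(mu, var)
        # Greedily pull the arm whose observation is forecast to shrink
        # the best-arm entropy the most (largest information gain).
        gains = [h_now - expected_entropy_after_pull(a, mu, var, counts, sigma2, rng)
                 for a in (0, 1)]
        a = 0 if gains[0] >= gains[1] else 1
        r = rng.gauss(true_means[a], math.sqrt(sigma2))
        counts[a] += 1
        mu[a] += (r - mu[a]) / counts[a]
        var[a] = sigma2 / counts[a]
    return counts

counts = info_max_two_armed([1.0, 0.0], horizon=200)
```

Pure entropy-reduction greed like this can over-explore the suboptimal arm; balancing that tendency is exactly what makes the paper's approximate functional and its optimality proof non-trivial.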
doi_str_mv 10.48550/arxiv.2310.12563
format Article
identifier DOI: 10.48550/arxiv.2310.12563
language eng
recordid cdi_arxiv_primary_2310_12563
source arXiv.org
subjects Computer Science - Learning
Machine Learning
Statistics
Statistics - Machine Learning
title Approximate information maximization for bandit games