Two-Armed Bandit Problem and Batch Version of the Mirror Descent Algorithm

Bibliographic details
Published in:	Automation and remote control 2022-08, Vol.83 (8), p.1288-1307
Main authors:	Kolnogorov, A. V.; Nazin, A. V.; Shiyan, D. N.
Format:	Article
Language:	English
Subjects:	Algorithms; Calculus of Variations and Optimal Control; Optimization; Computer-Aided Engineering (CAD, CAE) and Design; Control; Data processing; Mathematical Game Theory and Applications; Mathematics; Mathematics and Statistics; Mechanical Engineering; Mechatronics; Minimax technique; Packets (communication); Parallel processing; Risk; Robotics; Systems Theory
Online access:	Full text

container_end_page	1307
container_issue	8
container_start_page	1288
container_title	Automation and remote control
container_volume	83
creator	Kolnogorov, A. V.; Nazin, A. V.; Shiyan, D. N.
description	We consider the minimax setup for the two-armed bandit problem as applied to data processing when there are two alternative processing methods with different, a priori unknown efficiencies. One should determine the more efficient method and ensure that it is predominantly applied. To this end, we use the mirror descent algorithm (MDA). It is well known that the corresponding minimax risk has the order of $N^{1/2}$, where $N$ is the amount of processed data, and that this bound is order sharp. We propose a batch version of the MDA which allows data to be processed in packets; this is especially important when parallel data processing is available, since the processing time is then determined by the number of batches rather than by the total amount of data. Unexpectedly, it has turned out that the batch version behaves differently from the ordinary one even if the number of packets is large. Moreover, the batch version provides a considerably lower minimax risk; i.e., it substantially improves the control performance. We explain this result by considering another batch modification of the MDA whose behavior and minimax risk are both close to those of the ordinary version. Our estimates use invariant descriptions of the algorithms based on Gaussian approximations of income in the batches of data in the domain of “close” distributions and are obtained by Monte Carlo simulation.
doi_str_mv	10.1134/S0005117922080100
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2715131766</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2715131766</sourcerecordid><originalsourceid>FETCH-LOGICAL-c198t-1d422fc8fdb0a25f26c035f7a08492a4e8f10765242ffd0f75ff04d9a78ec2cd3</originalsourceid><addsrcrecordid>eNp1kM1LAzEQxYMoWKt_gLeA59WZZHeTPdZq_aCiYPW6pPlot3Q3NUkR_3u3VPAgnoaZ93tv4BFyjnCJyPOrVwAoEEXFGEhAgAMywBJkxoGzQzLYydlOPyYnMa4AEIHxAXmcffpsFFpr6LXqTJPoS_DztW1pv_WnpJf03YbY-I56R9PS0qcmBB_ojY3adomO1gsfmrRsT8mRU-toz37mkLxNbmfj-2z6fPcwHk0zjZVMGZqcMaelM3NQrHCs1MALJxTIvGIqt9IhiLJgOXPOgBOFc5CbSglpNdOGD8nFPncT_MfWxlSv_DZ0_cuaCSyQoyjLnsI9pYOPMVhXb0LTqvBVI9S7yuo_lfUetvfEnu0WNvwm_2_6BjJaa7w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2715131766</pqid></control><display><type>article</type><title>Two-Armed Bandit Problem and Batch Version of the Mirror Descent Algorithm</title><source>Springer Nature - Complete Springer Journals</source><creator>Kolnogorov, A. V. ; Nazin, A. V. ; Shiyan, D. N.</creator><creatorcontrib>Kolnogorov, A. V. ; Nazin, A. V. ; Shiyan, D. N.</creatorcontrib><description>We consider the minimax setup for the two-armed bandit problem as applied to data processing if there are two alternative processing methods with different a priori unknown efficiencies. One should determine the most efficient method and provide its predominant application. To this end, we use the mirror descent algorithm (MDA). It is well known that the corresponding minimax risk has the order of $N^{1/2}$, where $N$ is the amount of processed data, and this bound is order sharp. We propose a batch version of the MDA which allows processing data by packets; this is especially important if parallel data processing can be provided. In this case, the processing time is determined by the number of batches rather than the total amount of data. 
Unexpectedly, it has turned out that the batch version behaves unlike the ordinary one even if the number of packets is large. Moreover, the batch version provides a considerably lower minimax risk; i.e., it substantially improves the control performance. We explain this result by considering another batch modification of the MDA whose behavior is close to the behavior of the ordinary version and the minimax risk is close as well. Our estimates use invariant descriptions of the algorithms based on Gaussian approximations of income in the batches of data in the domain of “close” distributions and are obtained by Monte-Carlo simulation.</description><identifier>ISSN: 0005-1179</identifier><identifier>EISSN: 1608-3032</identifier><identifier>DOI: 10.1134/S0005117922080100</identifier><language>eng</language><publisher>Moscow: Pleiades Publishing</publisher><subject>Algorithms ; CAE) and Design ; Calculus of Variations and Optimal Control; Optimization ; Computer-Aided Engineering (CAD ; Control ; Data processing ; Mathematical Game Theory and Applications ; Mathematics ; Mathematics and Statistics ; Mechanical Engineering ; Mechatronics ; Minimax technique ; Packets (communication) ; Parallel processing ; Risk ; Robotics ; Systems Theory</subject><ispartof>Automation and remote control, 2022-08, Vol.83 (8), p.1288-1307</ispartof><rights>Pleiades Publishing, Ltd. 2022</rights><rights>Pleiades Publishing, Ltd. 
2022.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c198t-1d422fc8fdb0a25f26c035f7a08492a4e8f10765242ffd0f75ff04d9a78ec2cd3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1134/S0005117922080100$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1134/S0005117922080100$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Kolnogorov, A. V.</creatorcontrib><creatorcontrib>Nazin, A. V.</creatorcontrib><creatorcontrib>Shiyan, D. N.</creatorcontrib><title>Two-Armed Bandit Problem and Batch Version of the Mirror Descent Algorithm</title><title>Automation and remote control</title><addtitle>Autom Remote Control</addtitle><description>We consider the minimax setup for the two-armed bandit problem as applied to data processing if there are two alternative processing methods with different a priori unknown efficiencies. One should determine the most efficient method and provide its predominant application. To this end, we use the mirror descent algorithm (MDA). It is well known that the corresponding minimax risk has the order of $N^{1/2}$, where $N$ is the amount of processed data, and this bound is order sharp. We propose a batch version of the MDA which allows processing data by packets; this is especially important if parallel data processing can be provided. In this case, the processing time is determined by the number of batches rather than the total amount of data. Unexpectedly, it has turned out that the batch version behaves unlike the ordinary one even if the number of packets is large. Moreover, the batch version provides a considerably lower minimax risk; i.e., it substantially improves the control performance. 
We explain this result by considering another batch modification of the MDA whose behavior is close to the behavior of the ordinary version and the minimax risk is close as well. Our estimates use invariant descriptions of the algorithms based on Gaussian approximations of income in the batches of data in the domain of “close” distributions and are obtained by Monte-Carlo simulation.</description><subject>Algorithms</subject><subject>CAE) and Design</subject><subject>Calculus of Variations and Optimal Control; Optimization</subject><subject>Computer-Aided Engineering (CAD</subject><subject>Control</subject><subject>Data processing</subject><subject>Mathematical Game Theory and Applications</subject><subject>Mathematics</subject><subject>Mathematics and Statistics</subject><subject>Mechanical Engineering</subject><subject>Mechatronics</subject><subject>Minimax technique</subject><subject>Packets (communication)</subject><subject>Parallel processing</subject><subject>Risk</subject><subject>Robotics</subject><subject>Systems Theory</subject><issn>0005-1179</issn><issn>1608-3032</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp1kM1LAzEQxYMoWKt_gLeA59WZZHeTPdZq_aCiYPW6pPlot3Q3NUkR_3u3VPAgnoaZ93tv4BFyjnCJyPOrVwAoEEXFGEhAgAMywBJkxoGzQzLYydlOPyYnMa4AEIHxAXmcffpsFFpr6LXqTJPoS_DztW1pv_WnpJf03YbY-I56R9PS0qcmBB_ojY3adomO1gsfmrRsT8mRU-toz37mkLxNbmfj-2z6fPcwHk0zjZVMGZqcMaelM3NQrHCs1MALJxTIvGIqt9IhiLJgOXPOgBOFc5CbSglpNdOGD8nFPncT_MfWxlSv_DZ0_cuaCSyQoyjLnsI9pYOPMVhXb0LTqvBVI9S7yuo_lfUetvfEnu0WNvwm_2_6BjJaa7w</recordid><startdate>20220801</startdate><enddate>20220801</enddate><creator>Kolnogorov, A. V.</creator><creator>Nazin, A. V.</creator><creator>Shiyan, D. 
N.</creator><general>Pleiades Publishing</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20220801</creationdate><title>Two-Armed Bandit Problem and Batch Version of the Mirror Descent Algorithm</title><author>Kolnogorov, A. V. ; Nazin, A. V. ; Shiyan, D. N.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c198t-1d422fc8fdb0a25f26c035f7a08492a4e8f10765242ffd0f75ff04d9a78ec2cd3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>CAE) and Design</topic><topic>Calculus of Variations and Optimal Control; Optimization</topic><topic>Computer-Aided Engineering (CAD</topic><topic>Control</topic><topic>Data processing</topic><topic>Mathematical Game Theory and Applications</topic><topic>Mathematics</topic><topic>Mathematics and Statistics</topic><topic>Mechanical Engineering</topic><topic>Mechatronics</topic><topic>Minimax technique</topic><topic>Packets (communication)</topic><topic>Parallel processing</topic><topic>Risk</topic><topic>Robotics</topic><topic>Systems Theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kolnogorov, A. V.</creatorcontrib><creatorcontrib>Nazin, A. V.</creatorcontrib><creatorcontrib>Shiyan, D. N.</creatorcontrib><collection>CrossRef</collection><jtitle>Automation and remote control</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kolnogorov, A. V.</au><au>Nazin, A. V.</au><au>Shiyan, D. 
N.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Two-Armed Bandit Problem and Batch Version of the Mirror Descent Algorithm</atitle><jtitle>Automation and remote control</jtitle><stitle>Autom Remote Control</stitle><date>2022-08-01</date><risdate>2022</risdate><volume>83</volume><issue>8</issue><spage>1288</spage><epage>1307</epage><pages>1288-1307</pages><issn>0005-1179</issn><eissn>1608-3032</eissn><abstract>We consider the minimax setup for the two-armed bandit problem as applied to data processing if there are two alternative processing methods with different a priori unknown efficiencies. One should determine the most efficient method and provide its predominant application. To this end, we use the mirror descent algorithm (MDA). It is well known that the corresponding minimax risk has the order of $N^{1/2}$, where $N$ is the amount of processed data, and this bound is order sharp. We propose a batch version of the MDA which allows processing data by packets; this is especially important if parallel data processing can be provided. In this case, the processing time is determined by the number of batches rather than the total amount of data. Unexpectedly, it has turned out that the batch version behaves unlike the ordinary one even if the number of packets is large. Moreover, the batch version provides a considerably lower minimax risk; i.e., it substantially improves the control performance. We explain this result by considering another batch modification of the MDA whose behavior is close to the behavior of the ordinary version and the minimax risk is close as well. Our estimates use invariant descriptions of the algorithms based on Gaussian approximations of income in the batches of data in the domain of “close” distributions and are obtained by Monte-Carlo simulation.</abstract><cop>Moscow</cop><pub>Pleiades Publishing</pub><doi>10.1134/S0005117922080100</doi><tpages>20</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0005-1179
ispartof	Automation and remote control, 2022-08, Vol.83 (8), p.1288-1307
issn	0005-1179 1608-3032
language	eng
recordid	cdi_proquest_journals_2715131766
source	Springer Nature - Complete Springer Journals
subjects	Algorithms; Calculus of Variations and Optimal Control; Optimization; Computer-Aided Engineering (CAD, CAE) and Design; Control; Data processing; Mathematical Game Theory and Applications; Mathematics; Mathematics and Statistics; Mechanical Engineering; Mechatronics; Minimax technique; Packets (communication); Parallel processing; Risk; Robotics; Systems Theory
title	Two-Armed Bandit Problem and Batch Version of the Mirror Descent Algorithm
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T22%3A36%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Two-Armed%20Bandit%20Problem%20and%20Batch%20Version%20of%20the%20Mirror%20Descent%20Algorithm&rft.jtitle=Automation%20and%20remote%20control&rft.au=Kolnogorov,%20A.%20V.&rft.date=2022-08-01&rft.volume=83&rft.issue=8&rft.spage=1288&rft.epage=1307&rft.pages=1288-1307&rft.issn=0005-1179&rft.eissn=1608-3032&rft_id=info:doi/10.1134/S0005117922080100&rft_dat=%3Cproquest_cross%3E2715131766%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2715131766&rft_id=info:pmid/&rfr_iscdi=true
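The abstract describes the batch MDA only at the level of ideas. As a rough illustration of what processing a two-armed bandit "by packets" with mirror descent can look like, the Python sketch below runs entropic mirror descent in its lazy (dual-averaging) form: the arm probabilities are a softmax of cumulative importance-weighted income estimates, and they are updated once per batch rather than once per item, so all items in a batch could be processed in parallel. This is not the authors' MDA. The Gaussian incomes with known standard deviation `sigma`, the step size `eta0 / sqrt(t * batch_size)`, the uniform-exploration floor `gamma`, and the function names `batch_mirror_descent_regret` and `monte_carlo_regret` are hypothetical choices made only for this example. The regret is estimated by plain Monte Carlo averaging over independent runs.

```python
import numpy as np


def batch_mirror_descent_regret(means, n_batches, batch_size, eta0=1.0,
                                sigma=1.0, rng=None):
    """Hypothetical sketch: entropic mirror descent (lazy / dual-averaging form)
    on a two-armed bandit with Gaussian incomes, updated once per batch.
    Returns the realized regret N * max(means) - total income,
    where N = n_batches * batch_size."""
    rng = np.random.default_rng() if rng is None else rng
    means = np.asarray(means, dtype=float)
    scores = np.zeros(2)  # cumulative importance-weighted income estimates per arm
    total_income = 0.0
    for t in range(1, n_batches + 1):
        n_seen = t * batch_size                   # items processed once this batch is done
        eta = eta0 / np.sqrt(n_seen)              # assumed step-size schedule
        gamma = min(0.5, 1.0 / np.sqrt(n_seen))   # assumed uniform-exploration floor
        # Entropic mirror map: probabilities are a softmax of the scaled scores.
        z = eta * scores
        p = np.exp(z - z.max())                   # shift by the max for numerical stability
        p = (1.0 - gamma) * p / p.sum() + gamma / 2.0
        # Every item in the batch uses the same p, so the batch can run in parallel.
        arms = rng.choice(2, size=batch_size, p=p)
        incomes = rng.normal(means[arms], sigma)
        total_income += incomes.sum()
        # One mirror-descent step per batch: add unbiased estimates of each arm's batch income.
        for k in range(2):
            scores[k] += incomes[arms == k].sum() / p[k]
    return n_batches * batch_size * means.max() - total_income


def monte_carlo_regret(means, n_batches, batch_size, n_runs=200, seed=0):
    """Average realized regret over independent simulated runs."""
    rng = np.random.default_rng(seed)
    runs = [batch_mirror_descent_regret(means, n_batches, batch_size, rng=rng)
            for _ in range(n_runs)]
    return float(np.mean(runs))
```

For a fixed total `N = n_batches * batch_size`, setting `batch_size=1` gives the ordinary one-item-per-step version, so calls such as `monte_carlo_regret([1.0, 0.9], 1000, 1)` and `monte_carlo_regret([1.0, 0.9], 50, 20)` can be compared directly. Whether this toy variant reproduces the effects reported in the article is not claimed; those results rest on the authors' own algorithms and their Gaussian-approximation setup.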