Online Network Revenue Management Using Thompson Sampling

Thompson sampling is a randomized Bayesian machine learning method, whose original motivation was to sequentially evaluate treatments in clinical trials. In recent years, this method has drawn wide attention, as Internet companies have successfully implemented it for online ad display. In “Online ne...

Detailed description

Bibliographic details
Published in:	Operations research 2018-11, Vol.66 (6), p.1586-1602
Main authors:	Ferreira, Kris Johnson; Simchi-Levi, David; Wang, He
Format:	Article
Language:	eng
Subjects:	Algorithms; Analysis; Contextual Areas; Demand; Demand (Economics); Demand analysis; demand learning; dynamic pricing; Exploitation; Exploration; Influence; Inventory; Inventory control; Machine learning; Management; multiarmed bandit; Operations research; Parameters; Prices; Pricing; Resource management; Retailing industry; Revenue; Revenue management; Sampling; Social networks; Thompson sampling; Tradeoffs
Online access:	Full text

container_end_page	1602
container_issue	6
container_start_page	1586
container_title	Operations research
container_volume	66
creator	Ferreira, Kris Johnson; Simchi-Levi, David; Wang, He
description	Thompson sampling is a randomized Bayesian machine learning method whose original motivation was to sequentially evaluate treatments in clinical trials. In recent years, this method has drawn wide attention, as Internet companies have successfully implemented it for online ad display. In “Online network revenue management using Thompson sampling,” K. Ferreira, D. Simchi-Levi, and H. Wang propose using Thompson sampling for a revenue management problem where the demand function is unknown. A main challenge in adopting Thompson sampling for revenue management is that the original method does not incorporate inventory constraints. However, the authors show that Thompson sampling can be naturally combined with a linear program formulation to include inventory constraints. The result is a dynamic pricing algorithm that incorporates domain knowledge and has strong theoretical performance guarantees as well as promising numerical performance results. Interestingly, the authors demonstrate that Thompson sampling achieves poor performance when it does not take into account domain knowledge. Finally, the proposed dynamic pricing algorithm is highly flexible and is applicable in a range of industries, from airlines and internet advertising all the way to online retailing. We consider a price-based network revenue management problem in which a retailer aims to maximize revenue from multiple products with limited inventory over a finite selling season. As is common in practice, we assume the demand function contains unknown parameters that must be learned from sales data. In the presence of these unknown demand parameters, the retailer faces a trade-off commonly referred to as the “exploration-exploitation trade-off.” Toward the beginning of the selling season, the retailer may offer several different prices to try to learn demand at each price (“exploration” objective). Over time, the retailer can use this knowledge to set a price that maximizes revenue throughout the remainder of the selling season (“exploitation” objective). We propose a class of dynamic pricing algorithms that builds on the simple, yet powerful, machine learning technique known as “Thompson sampling” to address the challenge of balancing the exploration-exploitation trade-off in the presence of inventory constraints. Our algorithms have both strong theoretical performance guarantees and promising numerical performance results when compared with other algorithms developed for similar settings. Moreover, we show how our algorithms can be extended for use in general multiarmed bandit problems with resource constraints as well as in applications in other revenue management settings and beyond. The online appendix is available at https://doi.org/10.1287/opre.2018.1755.
doi_str_mv	10.1287/opre.2018.1755
format	Article
fulltext	fulltext
identifier	ISSN: 0030-364X
ispartof	Operations research, 2018-11, Vol.66 (6), p.1586-1602
issn	0030-364X; 1526-5463
language	eng
recordid	cdi_gale_infotracmisc_A569756835
source	Jstor Complete Legacy; Informs; EBSCO Business Source Complete
subjects	Algorithms; Analysis; Contextual Areas; Demand; Demand (Economics); Demand analysis; demand learning; dynamic pricing; Exploitation; Exploration; Influence; Inventory; Inventory control; Machine learning; Management; multiarmed bandit; Operations research; Parameters; Prices; Pricing; Resource management; Retailing industry; Revenue; Revenue management; Sampling; Social networks; Thompson sampling; Tradeoffs
title	Online Network Revenue Management Using Thompson Sampling
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T20%3A11%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_infor&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Online%20Network%20Revenue%20Management%20Using%20Thompson%20Sampling&rft.jtitle=Operations%20research&rft.au=Ferreira,%20Kris%20Johnson&rft.date=2018-11-01&rft.volume=66&rft.issue=6&rft.spage=1586&rft.epage=1602&rft.pages=1586-1602&rft.issn=0030-364X&rft.eissn=1526-5463&rft_id=info:doi/10.1287/opre.2018.1755&rft_dat=%3Cgale_infor%3EA569756835%3C/gale_infor%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2177124042&rft_id=info:pmid/&rft_galeid=A569756835&rft_jstor_id=48748620&rfr_iscdi=true
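
The description in this record says that Thompson sampling can be combined with a linear program to respect inventory constraints. The following is only a minimal sketch of that idea, not the authors' algorithm: it assumes a single product, Bernoulli demand with Beta(1, 1) priors, and a fixed per-period inventory budget of `inventory / periods`. The function names `solve_pricing_lp` and `thompson_pricing` are hypothetical, and the tiny linear program is solved by enumerating its vertices so that no external solver is needed.

```python
import random


def solve_pricing_lp(prices, demand, inventory_rate):
    """Maximize sum_k prices[k] * demand[k] * x[k]
    subject to sum_k demand[k] * x[k] <= inventory_rate,
               sum_k x[k] <= 1, and x >= 0.
    With only two constraints, every vertex has at most two nonzero
    entries, so enumerating single prices and price pairs is enough."""
    n = len(prices)
    revenue = [p * d for p, d in zip(prices, demand)]
    best_value, best_x = 0.0, [0.0] * n

    def consider(x):
        nonlocal best_value, best_x
        value = sum(r * xi for r, xi in zip(revenue, x))
        if value > best_value + 1e-12:
            best_value, best_x = value, x

    # Vertices that use a single price.
    for k in range(n):
        x = [0.0] * n
        x[k] = 1.0 if demand[k] <= inventory_rate else inventory_rate / demand[k]
        consider(x)

    # Vertices that mix two prices with both constraints tight.
    for j in range(n):
        for k in range(j + 1, n):
            if abs(demand[j] - demand[k]) < 1e-12:
                continue
            xj = (inventory_rate - demand[k]) / (demand[j] - demand[k])
            if 0.0 <= xj <= 1.0:
                x = [0.0] * n
                x[j], x[k] = xj, 1.0 - xj
                consider(x)
    return best_x


def thompson_pricing(prices, true_demand, inventory, periods, seed=0):
    """Simulate one selling season; returns (revenue, units_sold).
    true_demand[k] is the purchase probability at prices[k]; it is
    used only by the simulator and is hidden from the learner."""
    rng = random.Random(seed)
    n = len(prices)
    successes = [1] * n  # Beta(1, 1) prior parameters (an assumption)
    failures = [1] * n
    rate = inventory / periods  # per-period inventory budget (an assumption)
    remaining, revenue = inventory, 0.0

    for _ in range(periods):
        if remaining == 0:
            break
        # Step 1: sample a demand estimate for each price from the posterior.
        sampled = [rng.betavariate(successes[k], failures[k]) for k in range(n)]
        # Step 2: solve the LP using the sampled demand.
        x = solve_pricing_lp(prices, sampled, rate)
        # Step 3: offer price k with probability x[k]; with the leftover
        # probability offer nothing in this period.
        u, cumulative, chosen = rng.random(), 0.0, None
        for k in range(n):
            cumulative += x[k]
            if u < cumulative:
                chosen = k
                break
        if chosen is None:
            continue
        # Step 4: observe the sale outcome and update the posterior.
        if rng.random() < true_demand[chosen]:
            successes[chosen] += 1
            remaining -= 1
            revenue += prices[chosen]
        else:
            failures[chosen] += 1
    return revenue, inventory - remaining
```

A call such as `thompson_pricing([10.0, 20.0, 30.0], [0.8, 0.5, 0.2], inventory=20, periods=100)` runs one simulated season with made-up prices and purchase probabilities. The linear program is what lets the sketch randomize between prices so that expected sales per period stay within the inventory budget, which plain Thompson sampling would not do.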