Online Learning and Pricing for Service Systems with Reusable Resources

We consider a price-based revenue management problem with finite reusable resources over a finite time horizon T. Customers arrive following a price-dependent Poisson process, and each customer requests one unit of c homogeneous reusable resources. If there is an available unit, the customer gets se...

Bibliographic details
Published in:	Operations research 2022-11, Vol.72 (3)
Main authors:	Jia, Huiwen; Shi, Cong; Shen, Siqian
Format:	Article
Language:	eng
Subjects:	coupling analysis; learning; MATHEMATICS AND COMPUTING; multiarmed bandit; pricing; reusable resources; service systems
Online access:	Full text

container_end_page
container_issue	3
container_start_page
container_title	Operations research
container_volume	72
creator	Jia, Huiwen; Shi, Cong; Shen, Siqian
description	We consider a price-based revenue management problem with finite reusable resources over a finite time horizon T. Customers arrive following a price-dependent Poisson process, and each customer requests one unit of c homogeneous reusable resources. If there is an available unit, the customer gets served within a price-dependent exponentially distributed service time; otherwise, the customer waits in a queue until the next available unit. In this paper, we assume that the firm does not know how the arrival and service rates depend on posted prices, and thus it makes adaptive pricing decisions in each period based only on past observations to maximize the cumulative revenue. Given a discrete price set with cardinality P, we propose two online learning algorithms, termed batch upper confidence bound (BUCB) and batch Thompson sampling (BTS), and prove that the cumulative regret upper bound is $\tilde{O}(\sqrt{PT})$, which matches the regret lower bound. In establishing the regret, we bound the transient system performance upon price changes via a novel coupling argument, and also generalize bandits to accommodate subexponential rewards. Here, we also extend our approach to models with balking and reneging customers and discuss a continuous price setting. Our numerical experiments demonstrate the efficacy of the proposed BUCB and BTS algorithms.
format	Article
fullrecord	<record><control><sourceid>osti</sourceid><recordid>TN_cdi_osti_scitechconnect_1995050</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1995050</sourcerecordid><originalsourceid>FETCH-osti_scitechconnect_19950503</originalsourceid><addsrcrecordid>eNqNyrsKwjAUgOEgCtbLOwT3wqlpAp3FyyAo1sGtxHhqIzWBnFTx7VXwAZz-b_h7LMnkXKUyV6LPEgABqVD5achGRDcAKKSSCVvvXGsd8i3q4Ky7cu0ufB-s-br2gZcYHtYgL18U8U78aWPDD9iRPrf4AfkuGKQJG9S6JZz-Omaz1fK42KSeoq3I2IimMd45NLHKikKCBPHX9AaHfjzp</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Online Learning and Pricing for Service Systems with Reusable Resources</title><source>INFORMS PubsOnLine</source><creator>Jia, Huiwen ; Shi, Cong ; Shen, Siqian</creator><creatorcontrib>Jia, Huiwen ; Shi, Cong ; Shen, Siqian ; Univ. of Michigan, Ann Arbor, MI (United States)</creatorcontrib><description>We consider a price-based revenue management problem with finite reusable resources over a finite time horizon T. Customers arrive following a price-dependent Poisson process, and each customer requests one unit of c homogeneous reusable resources. If there is an available unit, the customer gets served within a price-dependent exponentially distributed service time; otherwise, the customer waits in a queue until the next available unit. In this paper, we assume that the firm does not know how the arrival and service rates depend on posted prices, and thus it makes adaptive pricing decisions in each period based only on past observations to maximize the cumulative revenue. Given a discrete price set with cardinality P, we propose two online learning algorithms, termed batch upper confidence bound (BUCB) and batch Thompson sampling (BTS), and prove that the cumulative regret upper bound is O˜(√PT) , which matches the regret lower bound. 
In establishing the regret, we bound the transient system performance upon price changes via a novel coupling argument, and also generalize bandits to accommodate subexponential rewards. Here, we also extend our approach to models with balking and reneging customers and discuss a continuous price setting. Our numerical experiments demonstrate the efficacy of the proposed BUCB and BTS algorithms.</description><identifier>ISSN: 0030-364X</identifier><identifier>EISSN: 1526-5463</identifier><language>eng</language><publisher>United States: Institute for Operations Research and the Management Sciences (INFORMS)</publisher><subject>coupling analysis ; learning ; MATHEMATICS AND COMPUTING ; multiarmed bandit ; pricing ; reusable resources ; service systems</subject><ispartof>Operations research, 2022-11, Vol.72 (3)</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000000226339278 ; 000000022854163X ; 0000000335643391</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,776,780,881</link.rule.ids><backlink>$$Uhttps://www.osti.gov/servlets/purl/1995050$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Jia, Huiwen</creatorcontrib><creatorcontrib>Shi, Cong</creatorcontrib><creatorcontrib>Shen, Siqian</creatorcontrib><creatorcontrib>Univ. of Michigan, Ann Arbor, MI (United States)</creatorcontrib><title>Online Learning and Pricing for Service Systems with Reusable Resources</title><title>Operations research</title><description>We consider a price-based revenue management problem with finite reusable resources over a finite time horizon T. Customers arrive following a price-dependent Poisson process, and each customer requests one unit of c homogeneous reusable resources. 
If there is an available unit, the customer gets served within a price-dependent exponentially distributed service time; otherwise, the customer waits in a queue until the next available unit. In this paper, we assume that the firm does not know how the arrival and service rates depend on posted prices, and thus it makes adaptive pricing decisions in each period based only on past observations to maximize the cumulative revenue. Given a discrete price set with cardinality P, we propose two online learning algorithms, termed batch upper confidence bound (BUCB) and batch Thompson sampling (BTS), and prove that the cumulative regret upper bound is O˜(√PT) , which matches the regret lower bound. In establishing the regret, we bound the transient system performance upon price changes via a novel coupling argument, and also generalize bandits to accommodate subexponential rewards. Here, we also extend our approach to models with balking and reneging customers and discuss a continuous price setting. 
Our numerical experiments demonstrate the efficacy of the proposed BUCB and BTS algorithms.</description><subject>coupling analysis</subject><subject>learning</subject><subject>MATHEMATICS AND COMPUTING</subject><subject>multiarmed bandit</subject><subject>pricing</subject><subject>reusable resources</subject><subject>service systems</subject><issn>0030-364X</issn><issn>1526-5463</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNqNyrsKwjAUgOEgCtbLOwT3wqlpAp3FyyAo1sGtxHhqIzWBnFTx7VXwAZz-b_h7LMnkXKUyV6LPEgABqVD5achGRDcAKKSSCVvvXGsd8i3q4Ky7cu0ufB-s-br2gZcYHtYgL18U8U78aWPDD9iRPrf4AfkuGKQJG9S6JZz-Omaz1fK42KSeoq3I2IimMd45NLHKikKCBPHX9AaHfjzp</recordid><startdate>20221110</startdate><enddate>20221110</enddate><creator>Jia, Huiwen</creator><creator>Shi, Cong</creator><creator>Shen, Siqian</creator><general>Institute for Operations Research and the Management Sciences (INFORMS)</general><scope>OIOZB</scope><scope>OTOTI</scope><orcidid>https://orcid.org/0000000226339278</orcidid><orcidid>https://orcid.org/000000022854163X</orcidid><orcidid>https://orcid.org/0000000335643391</orcidid></search><sort><creationdate>20221110</creationdate><title>Online Learning and Pricing for Service Systems with Reusable Resources</title><author>Jia, Huiwen ; Shi, Cong ; Shen, Siqian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-osti_scitechconnect_19950503</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>coupling analysis</topic><topic>learning</topic><topic>MATHEMATICS AND COMPUTING</topic><topic>multiarmed bandit</topic><topic>pricing</topic><topic>reusable resources</topic><topic>service systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jia, Huiwen</creatorcontrib><creatorcontrib>Shi, Cong</creatorcontrib><creatorcontrib>Shen, 
Siqian</creatorcontrib><creatorcontrib>Univ. of Michigan, Ann Arbor, MI (United States)</creatorcontrib><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><jtitle>Operations research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jia, Huiwen</au><au>Shi, Cong</au><au>Shen, Siqian</au><aucorp>Univ. of Michigan, Ann Arbor, MI (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Online Learning and Pricing for Service Systems with Reusable Resources</atitle><jtitle>Operations research</jtitle><date>2022-11-10</date><risdate>2022</risdate><volume>72</volume><issue>3</issue><issn>0030-364X</issn><eissn>1526-5463</eissn><abstract>We consider a price-based revenue management problem with finite reusable resources over a finite time horizon T. Customers arrive following a price-dependent Poisson process, and each customer requests one unit of c homogeneous reusable resources. If there is an available unit, the customer gets served within a price-dependent exponentially distributed service time; otherwise, the customer waits in a queue until the next available unit. In this paper, we assume that the firm does not know how the arrival and service rates depend on posted prices, and thus it makes adaptive pricing decisions in each period based only on past observations to maximize the cumulative revenue. Given a discrete price set with cardinality P, we propose two online learning algorithms, termed batch upper confidence bound (BUCB) and batch Thompson sampling (BTS), and prove that the cumulative regret upper bound is O˜(√PT) , which matches the regret lower bound. In establishing the regret, we bound the transient system performance upon price changes via a novel coupling argument, and also generalize bandits to accommodate subexponential rewards. 
Here, we also extend our approach to models with balking and reneging customers and discuss a continuous price setting. Our numerical experiments demonstrate the efficacy of the proposed BUCB and BTS algorithms.</abstract><cop>United States</cop><pub>Institute for Operations Research and the Management Sciences (INFORMS)</pub><orcidid>https://orcid.org/0000000226339278</orcidid><orcidid>https://orcid.org/000000022854163X</orcidid><orcidid>https://orcid.org/0000000335643391</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0030-364X
ispartof	Operations research, 2022-11, Vol.72 (3)
issn	0030-364X 1526-5463
language	eng
recordid	cdi_osti_scitechconnect_1995050
source	INFORMS PubsOnLine
subjects	coupling analysis; learning; MATHEMATICS AND COMPUTING; multiarmed bandit; pricing; reusable resources; service systems
title	Online Learning and Pricing for Service Systems with Reusable Resources
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T00%3A22%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-osti&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Online%20Learning%20and%20Pricing%20for%20Service%20Systems%20with%20Reusable%20Resources&rft.jtitle=Operations%20research&rft.au=Jia,%20Huiwen&rft.aucorp=Univ.%20of%20Michigan,%20Ann%20Arbor,%20MI%20(United%20States)&rft.date=2022-11-10&rft.volume=72&rft.issue=3&rft.issn=0030-364X&rft.eissn=1526-5463&rft_id=info:doi/&rft_dat=%3Costi%3E1995050%3C/osti%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
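
The setting in the description field (a queue with c reusable units, price-dependent Poisson arrivals and exponential service, and a learner that holds one price per batch) can be illustrated with a small simulation. The sketch below is not the BUCB algorithm from the article: the record gives no batch construction, confidence radius, or constants, so it uses a generic UCB1-style index over batches. The names `simulate_batch`, `batch_ucb`, `lam_fn`, `mu_fn`, and `bonus_scale` are hypothetical, as are the demand and service curves in the usage lines. It also assumes that revenue is collected when service starts and that every customer in service is served at the rate of the currently posted price.

```python
import math
import random


def simulate_batch(price, lam, mu, c, duration, in_service, queue, rng):
    """Run an M/M/c queue at one fixed price for `duration` time units.

    Returns (revenue, in_service, queue); the state carries over to the
    next batch. Assumption: a customer pays `price` when service starts.
    """
    t, revenue = 0.0, 0.0
    while True:
        rate = lam + mu * in_service          # total rate of the next event
        t += rng.expovariate(rate)
        if t > duration:                      # batch ends before the event
            return revenue, in_service, queue
        if rng.random() < lam / rate:         # event is an arrival
            if in_service < c:
                in_service += 1
                revenue += price
            else:
                queue += 1                    # all units busy: customer waits
        elif queue > 0:                       # completion, next in line starts
            queue -= 1
            revenue += price
        else:                                 # completion, unit becomes idle
            in_service -= 1


def batch_ucb(prices, lam_fn, mu_fn, c, n_batches, batch_len,
              bonus_scale=1.0, seed=0):
    """Generic UCB1-style price selection, one price per batch (a sketch)."""
    rng = random.Random(seed)
    counts = [0] * len(prices)                # batches played per price
    means = [0.0] * len(prices)               # mean revenue rate per price
    in_service, queue, total = 0, 0, 0.0
    for b in range(1, n_batches + 1):
        if b <= len(prices):
            k = b - 1                         # try every price once first
        else:
            k = max(
                range(len(prices)),
                key=lambda i: means[i]
                + bonus_scale * math.sqrt(2.0 * math.log(b) / counts[i]),
            )
        p = prices[k]
        rev, in_service, queue = simulate_batch(
            p, lam_fn(p), mu_fn(p), c, batch_len, in_service, queue, rng
        )
        counts[k] += 1
        means[k] += (rev / batch_len - means[k]) / counts[k]
        total += rev
    return total, counts, means


# Usage with made-up linear demand and a constant service rate.
total, counts, means = batch_ucb(
    prices=[1.0, 2.0, 3.0],
    lam_fn=lambda p: 4.0 - p,
    mu_fn=lambda p: 1.0,
    c=2,
    n_batches=200,
    batch_len=20.0,
)
print(counts, [round(m, 2) for m in means])
```

Holding the price fixed within a batch is what lets the revenue rate of a batch serve as a single noisy reward for that price. The carried-over queue state is the transient effect that the abstract says the authors control with a coupling argument; this sketch simply ignores it.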