Online Matching: A Real-time Bandit System for Large-scale Recommendations

Bibliographic Details
Published in: arXiv.org, 2023-07
Main authors: Yi, Xinyang; Shao-Chuan, Wang; He, Ruining; Chandrasekaran, Hariharan; Wu, Charles; Heldt, Lukasz; Hong, Lichan; Chen, Minmin; Chi, Ed H
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Algorithms; Closed loops; Computer architecture; Deep learning; Distance learning; Machine learning; Matching; Mathematical models; Multi-armed bandit problems; Parameters; Real time; Recommender systems; User experience
Online access: Full text
Description

The last decade has witnessed many successes of deep learning-based models for industry-scale recommender systems. These models are typically trained offline in a batch manner. While effective in capturing users' past interactions with recommendation platforms, batch learning suffers from long model-update latency and is vulnerable to system biases, making it hard to adapt to distribution shift and to explore new items or user interests. Although online learning-based approaches (e.g., multi-armed bandits) have demonstrated promising theoretical results in tackling these challenges, their practical real-time implementation in large-scale recommender systems remains limited. First, the scalability of online approaches in serving massive online traffic while ensuring timely updates of bandit parameters poses a significant challenge. Additionally, exploring uncertainty in recommender systems can easily result in an unfavorable user experience, highlighting the need for intricate strategies that effectively balance the trade-off between exploitation and exploration. In this paper, we introduce Online Matching: a scalable closed-loop bandit system that learns from users' direct feedback on items in real time. We present a hybrid "offline + online" approach for constructing this system, accompanied by a comprehensive exposition of the end-to-end system architecture. We propose Diag-LinUCB -- a novel extension of the LinUCB algorithm -- to enable distributed updates of bandit parameters in a scalable and timely manner. We conduct live experiments on YouTube and show that Online Matching enhances fresh-content discovery and item exploration on the platform.
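The abstract names Diag-LinUCB only at a high level. As a rough illustration of why a diagonal variant of LinUCB suits distributed real-time updates, here is a minimal sketch assuming (as the name suggests) that the full per-arm d x d matrix A of standard LinUCB is approximated by its diagonal; the record does not give the paper's exact update rule, and the class and parameter names below (DiagLinUCBArm, alpha, a_diag) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


class DiagLinUCBArm:
    """Per-arm state for a diagonal LinUCB variant (illustrative sketch).

    Standard LinUCB keeps A = I + sum(x x^T) per arm and scores contexts with
    theta^T x + alpha * sqrt(x^T A^{-1} x), where theta = A^{-1} b. Keeping
    only the diagonal of A (an assumption about Diag-LinUCB, not a confirmed
    detail) turns every update into an elementwise addition: cheap,
    order-independent, and mergeable across distributed workers.
    """

    def __init__(self, dim: int, alpha: float = 1.0):
        self.alpha = alpha           # exploration strength
        self.a_diag = np.ones(dim)   # diagonal of A, initialized to the identity
        self.b = np.zeros(dim)       # running sum of reward-weighted features

    def ucb_score(self, x: np.ndarray) -> float:
        """Optimistic score for a context/feature vector x."""
        theta = self.b / self.a_diag                  # ridge estimate under diagonal A
        bonus = np.sqrt(np.sum(x * x / self.a_diag))  # sqrt(x^T A^{-1} x) with diagonal A
        return float(theta @ x + self.alpha * bonus)

    def update(self, x: np.ndarray, reward: float) -> None:
        """Fold in one observed (features, reward) pair; commutative updates."""
        self.a_diag += x * x
        self.b += reward * x


# Hypothetical usage: score all arms for a context and update the chosen one.
arms = [DiagLinUCBArm(dim=8) for _ in range(100)]
x = np.random.rand(8)
chosen = max(range(len(arms)), key=lambda i: arms[i].ucb_score(x))
arms[chosen].update(x, reward=1.0)
```

Because both update terms are elementwise sums, shards could accumulate (x * x, reward * x) statistics locally and merge them by simple addition, with no matrix inverse to synchronize; this is one plausible reading of how a diagonal form enables "distributed updates of bandit parameters in a scalable and timely manner."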