Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective

We present a novel podcast recommender system deployed at industrial scale. This system successfully optimizes personal listening journeys that unfold over months for hundreds of millions of listeners. In deviating from the pervasive industry practice of optimizing machine learning algorithms for short-term proxy metrics, the system substantially improves long-term performance in A/B tests. The paper offers insights into how our methods cope with attribution, coordination, and measurement challenges that usually hinder such long-term optimization. To contextualize these practical insights within a broader academic framework, we turn to reinforcement learning (RL). Using the language of RL, we formulate a comprehensive model of users' recurring relationships with a recommender system. Then, within this model, we identify our approach as a policy improvement update to a component of the existing recommender system, enhanced by tailored modeling of value functions and user-state representations. Illustrative offline experiments suggest this specialized modeling reduces data requirements by as much as a factor of 120,000 compared to black-box approaches.
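The abstract describes a policy-improvement update built on value functions over user-state representations. The Python sketch below is purely illustrative and is not taken from the paper: all names (UserState, LinearValueFunction, short_term_score, next_state) and features are assumptions. It shows one generic way an existing short-term ranker could be re-scored by a one-step lookahead, r(s, a) + gamma * V(s'), using a value function fit offline over a hand-crafted user state.

# Illustrative sketch only; hypothetical names and features throughout.
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class UserState:
    # Hypothetical hand-crafted user-state representation.
    days_active_last_28: float
    avg_minutes_per_day: float
    distinct_shows_followed: float

    def features(self) -> np.ndarray:
        return np.array([1.0,  # bias term
                         self.days_active_last_28,
                         self.avg_minutes_per_day,
                         self.distinct_shows_followed])


class LinearValueFunction:
    """V(s) ~ w . phi(s): estimated long-term value of a user state, fit offline."""

    def __init__(self, weights: np.ndarray):
        self.weights = weights

    def __call__(self, state: UserState) -> float:
        return float(self.weights @ state.features())


def improved_ranking(state: UserState,
                     candidates: List[str],
                     short_term_score,  # existing proxy-metric scorer (callable)
                     next_state,        # predicts the user state after a recommendation
                     value_fn: LinearValueFunction,
                     gamma: float = 0.99) -> List[str]:
    """One-step policy improvement: re-rank by r(s, a) + gamma * V(s')."""
    def q(show: str) -> float:
        return short_term_score(state, show) + gamma * value_fn(next_state(state, show))
    return sorted(candidates, key=q, reverse=True)


# Example usage with hypothetical stand-ins for the production models.
state = UserState(days_active_last_28=12, avg_minutes_per_day=35.0, distinct_shows_followed=4)
value_fn = LinearValueFunction(np.array([0.0, 0.2, 0.05, 0.3]))
ranked = improved_ranking(
    state,
    candidates=["show_a", "show_b", "show_c"],
    short_term_score=lambda s, show: {"show_a": 0.9, "show_b": 0.7, "show_c": 0.4}[show],
    next_state=lambda s, show: s,  # placeholder: assume the state is unchanged
    value_fn=value_fn,
)

Fitting a value function offline from logged listening journeys, rather than learning a full black-box policy end to end, is one plausible reading of how the tailored modeling described in the abstract could reduce data requirements; the paper itself should be consulted for the actual method.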

Bibliographic details
Published in: arXiv.org, 2024-07
Main authors: Maystre, Lucas; Russo, Daniel; Zhao, Yu
Format: Article
Language: English
Subjects: Algorithms; Optimization; Recommender systems
Online access: Full text
identifier EISSN: 2331-8422
source Free E-Journals
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T08%3A11%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Optimizing%20Audio%20Recommendations%20for%20the%20Long-Term:%20A%20Reinforcement%20Learning%20Perspective&rft.jtitle=arXiv.org&rft.au=Lucas%20Maystre&rft.date=2024-07-27&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2774362740%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2774362740&rft_id=info:pmid/&rfr_iscdi=true