Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective

We present a novel podcast recommender system deployed at industrial scale. This system successfully optimizes personal listening journeys that unfold over months for hundreds of millions of listeners. In deviating from the pervasive industry practice of optimizing machine learning algorithms for short-term proxy metrics, the system substantially improves long-term performance in A/B tests. The paper offers insights into how our methods cope with attribution, coordination, and measurement challenges that usually hinder such long-term optimization. To contextualize these practical insights within a broader academic framework, we turn to reinforcement learning (RL). Using the language of RL, we formulate a comprehensive model of users' recurring relationships with a recommender system. Then, within this model, we identify our approach as a policy improvement update to a component of the existing recommender system, enhanced by tailored modeling of value functions and user-state representations. Illustrative offline experiments suggest this specialized modeling reduces data requirements by as much as a factor of 120,000 compared to black-box approaches.
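The abstract describes a policy-improvement update built on value functions over user-state representations. The Python sketch below is purely illustrative and is not taken from the paper: all names (UserState, LinearValueFunction, short_term_score, next_state) and features are assumptions. It shows one generic way an existing short-term ranker could be re-scored by a one-step lookahead, r(s, a) + gamma * V(s'), using a value function fit offline over a hand-crafted user state.

# Illustrative sketch only; hypothetical names and features throughout.
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class UserState:
    # Hypothetical hand-crafted user-state representation.
    days_active_last_28: float
    avg_minutes_per_day: float
    distinct_shows_followed: float

    def features(self) -> np.ndarray:
        return np.array([1.0,  # bias term
                         self.days_active_last_28,
                         self.avg_minutes_per_day,
                         self.distinct_shows_followed])


class LinearValueFunction:
    """V(s) ~ w . phi(s): estimated long-term value of a user state, fit offline."""

    def __init__(self, weights: np.ndarray):
        self.weights = weights

    def __call__(self, state: UserState) -> float:
        return float(self.weights @ state.features())


def improved_ranking(state: UserState,
                     candidates: List[str],
                     short_term_score,  # existing proxy-metric scorer (callable)
                     next_state,        # predicts the user state after a recommendation
                     value_fn: LinearValueFunction,
                     gamma: float = 0.99) -> List[str]:
    """One-step policy improvement: re-rank by r(s, a) + gamma * V(s')."""
    def q(show: str) -> float:
        return short_term_score(state, show) + gamma * value_fn(next_state(state, show))
    return sorted(candidates, key=q, reverse=True)


# Example usage with hypothetical stand-ins for the production models.
state = UserState(days_active_last_28=12, avg_minutes_per_day=35.0, distinct_shows_followed=4)
value_fn = LinearValueFunction(np.array([0.0, 0.2, 0.05, 0.3]))
ranked = improved_ranking(
    state,
    candidates=["show_a", "show_b", "show_c"],
    short_term_score=lambda s, show: {"show_a": 0.9, "show_b": 0.7, "show_c": 0.4}[show],
    next_state=lambda s, show: s,  # placeholder: assume the state is unchanged
    value_fn=value_fn,
)

Fitting a value function offline from logged listening journeys, rather than learning a full black-box policy end to end, is one plausible reading of how the tailored modeling described in the abstract could reduce data requirements; the paper itself should be consulted for the actual method.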

Bibliographic details
Published in: arXiv.org, 2024-07
Main authors: Maystre, Lucas; Russo, Daniel; Zhao, Yu
Format: Article
Language: English
Subjects: Algorithms; Optimization; Recommender systems
Online access: Full text
identifier EISSN: 2331-8422
source Free E-Journals
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T08%3A11%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Optimizing%20Audio%20Recommendations%20for%20the%20Long-Term:%20A%20Reinforcement%20Learning%20Perspective&rft.jtitle=arXiv.org&rft.au=Lucas%20Maystre&rft.date=2024-07-27&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2774362740%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2774362740&rft_id=info:pmid/&rfr_iscdi=true