Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective
We present a novel podcast recommender system deployed at industrial scale. This system successfully optimizes personal listening journeys that unfold over months for hundreds of millions of listeners. In deviating from the pervasive industry practice of optimizing machine learning algorithms for short-term proxy metrics, the system substantially improves long-term performance in A/B tests. The paper offers insights into how our methods cope with attribution, coordination, and measurement challenges that usually hinder such long-term optimization. To contextualize these practical insights within a broader academic framework, we turn to reinforcement learning (RL). Using the language of RL, we formulate a comprehensive model of users' recurring relationships with a recommender system. Then, within this model, we identify our approach as a policy improvement update to a component of the existing recommender system, enhanced by tailored modeling of value functions and user-state representations. Illustrative offline experiments suggest this specialized modeling reduces data requirements by as much as a factor of 120,000 compared to black-box approaches.
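For orientation, the policy-improvement view the abstract refers to can be sketched with standard RL notation; the symbols below (user state s summarizing a listener's history, recommendation action a, reward r, discount factor γ, policy π) are generic textbook shorthand and are not taken from the paper itself.

V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s \right],
\qquad
\pi'(s) \in \operatorname*{arg\,max}_{a} \, \mathbb{E}\left[ r(s, a) + \gamma\, V^{\pi}(s') \right].

Read this way, the system described in the abstract estimates a long-horizon value function V^{\pi} under the incumbent recommender policy and uses it to improve one component of that policy, rather than optimizing a short-term proxy metric directly.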
Saved in:
Published in: | arXiv.org 2024-07 |
---|---|
Main authors: | Lucas Maystre, Daniel Russo, Yu Zhao |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Optimization; Recommender systems |
Online access: | Full text |
container_title | arXiv.org |
---|---|
creator | Lucas Maystre; Russo, Daniel; Zhao, Yu |
description | We present a novel podcast recommender system deployed at industrial scale. This system successfully optimizes personal listening journeys that unfold over months for hundreds of millions of listeners. In deviating from the pervasive industry practice of optimizing machine learning algorithms for short-term proxy metrics, the system substantially improves long-term performance in A/B tests. The paper offers insights into how our methods cope with attribution, coordination, and measurement challenges that usually hinder such long-term optimization. To contextualize these practical insights within a broader academic framework, we turn to reinforcement learning (RL). Using the language of RL, we formulate a comprehensive model of users' recurring relationships with a recommender system. Then, within this model, we identify our approach as a policy improvement update to a component of the existing recommender system, enhanced by tailored modeling of value functions and user-state representations. Illustrative offline experiments suggest this specialized modeling reduces data requirements by as much as a factor of 120,000 compared to black-box approaches. |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-07 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2774362740 |
source | Free E-Journals |
subjects | Algorithms; Optimization; Recommender systems |
title | Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T08%3A11%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Optimizing%20Audio%20Recommendations%20for%20the%20Long-Term:%20A%20Reinforcement%20Learning%20Perspective&rft.jtitle=arXiv.org&rft.au=Lucas%20Maystre&rft.date=2024-07-27&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2774362740%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2774362740&rft_id=info:pmid/&rfr_iscdi=true |