MODRL-TA:A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search
Traffic allocation is a process of redistributing natural traffic to products by adjusting their positions in the post-search phase, aimed at effectively fostering merchant growth, precisely meeting customer demands, and ensuring the maximization of interests across various parties within e-commerce...
Saved in:
Published in: | arXiv.org 2024-07 |
---|---|
Main authors: | Cheng, Peng ; Wang, Huimu ; Zhao, Jinyuan ; Wang, Yihao ; Xu, Enqiang ; Zhao, Yu ; Xiao, Zhuojian ; Wang, Songlin ; Tang, Guoyu ; Liu, Lin ; Xu, Sulong |
Format: | Article |
Language: | eng |
Subjects: | Algorithms ; Augmentation systems ; Data augmentation ; Deep learning ; Electronic commerce ; Entropy (Information theory) ; Machine learning ; Searching |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Cheng, Peng ; Wang, Huimu ; Zhao, Jinyuan ; Wang, Yihao ; Xu, Enqiang ; Zhao, Yu ; Xiao, Zhuojian ; Wang, Songlin ; Tang, Guoyu ; Liu, Lin ; Xu, Sulong |
description | Traffic allocation is the process of redistributing natural traffic to products by adjusting their positions in the post-search phase, aimed at effectively fostering merchant growth, precisely meeting customer demands, and ensuring the maximization of interests across various parties within e-commerce platforms. Existing methods based on learning to rank neglect the long-term value of traffic allocation, whereas reinforcement learning approaches suffer from balancing multiple objectives and the difficulties of cold starts within real-world data environments. To address the aforementioned issues, this paper proposes a multi-objective deep reinforcement learning framework consisting of multi-objective Q-learning (MOQ), a decision fusion algorithm (DFM) based on the cross-entropy method (CEM), and a progressive data augmentation system (PDA). Specifically, MOQ constructs ensemble RL models, each dedicated to an objective such as click-through rate or conversion rate. These models individually determine the position of items as actions, aiming to estimate the long-term value of multiple objectives from an individual perspective. DFM then dynamically adjusts the weights among objectives to maximize long-term value, addressing the temporal dynamics of objective preferences in e-commerce scenarios. Initially, PDA trains MOQ with simulated data from offline logs. As experiments progress, it strategically integrates real user interaction data, ultimately replacing the simulated dataset to alleviate distributional shift and the cold-start problem. Experimental results on real-world online e-commerce systems demonstrate the significant improvements of MODRL-TA, which has been successfully deployed on an e-commerce search platform. |
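The record describes DFM only at a high level: a cross-entropy-method search that adjusts the weights combining per-objective Q-value estimates. As an illustration of the general technique named, here is a minimal sketch of a CEM weight search; the function name `cem_fuse_weights`, the `score_fn` interface, the Gaussian sampling with simplex normalization, and all hyperparameters are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def cem_fuse_weights(q_values, score_fn, n_iter=20, pop_size=64,
                     elite_frac=0.2, seed=0):
    """Cross-entropy-method search for objective-fusion weights (sketch).

    q_values : (n_objectives, n_items) per-objective Q-value estimates.
    score_fn : maps a combined score vector (n_items,) to a scalar value
               to be maximized (e.g. estimated long-term value of the
               resulting ranking).
    Returns nonnegative weights over objectives that sum to 1.
    """
    rng = np.random.default_rng(seed)
    n_obj = q_values.shape[0]
    mu = np.full(n_obj, 1.0 / n_obj)      # start from uniform weights
    sigma = np.full(n_obj, 0.5)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iter):
        # Sample candidate weight vectors and project onto the simplex.
        samples = np.abs(rng.normal(mu, sigma, size=(pop_size, n_obj)))
        samples /= samples.sum(axis=1, keepdims=True)
        # Score each candidate by the value of its fused ranking scores.
        scores = np.array([score_fn(w @ q_values) for w in samples])
        # Refit the sampling distribution to the elite fraction.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu / mu.sum()
```

Given per-objective Q estimates for a candidate result list, the returned weights could fuse them into a single ranking score; the paper's actual DFM may differ in parameterization, constraints, and the value estimate it optimizes.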
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-07 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3083764421 |
source | Freely Accessible Journals |
subjects | Algorithms ; Augmentation systems ; Data augmentation ; Deep learning ; Electronic commerce ; Entropy (Information theory) ; Machine learning ; Searching |
title | MODRL-TA:A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T02%3A48%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=MODRL-TA:A%20Multi-Objective%20Deep%20Reinforcement%20Learning%20Framework%20for%20Traffic%20Allocation%20in%20E-Commerce%20Search&rft.jtitle=arXiv.org&rft.au=Cheng,%20Peng&rft.date=2024-07-22&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3083764421%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3083764421&rft_id=info:pmid/&rfr_iscdi=true |