MODRL-TA:A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search
Traffic allocation is a process of redistributing natural traffic to products by adjusting their positions in the post-search phase, aimed at effectively fostering merchant growth, precisely meeting customer demands, and ensuring the maximization of interests across various parties within e-commerce...
Saved in:
Published in: | arXiv.org 2024-07 |
---|---|
Main authors: | Cheng, Peng ; Wang, Huimu ; Zhao, Jinyuan ; Wang, Yihao ; Xu, Enqiang ; Zhao, Yu ; Xiao, Zhuojian ; Wang, Songlin ; Tang, Guoyu ; Liu, Lin ; Xu, Sulong |
Format: | Article |
Language: | eng |
Subjects: | Algorithms ; Augmentation systems ; Data augmentation ; Deep learning ; Electronic commerce ; Entropy (Information theory) ; Machine learning ; Searching |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Cheng, Peng ; Wang, Huimu ; Zhao, Jinyuan ; Wang, Yihao ; Xu, Enqiang ; Zhao, Yu ; Xiao, Zhuojian ; Wang, Songlin ; Tang, Guoyu ; Liu, Lin ; Xu, Sulong |
description | Traffic allocation is the process of redistributing natural traffic to products by adjusting their positions in the post-search phase, aimed at effectively fostering merchant growth, precisely meeting customer demands, and ensuring the maximization of interests across various parties within e-commerce platforms. Existing methods based on learning to rank neglect the long-term value of traffic allocation, whereas reinforcement learning approaches suffer from balancing multiple objectives and the difficulties of cold starts within real-world data environments. To address the aforementioned issues, this paper proposes a multi-objective deep reinforcement learning framework consisting of multi-objective Q-learning (MOQ), a decision fusion algorithm (DFM) based on the cross-entropy method (CEM), and a progressive data augmentation system (PDA). Specifically, MOQ constructs ensemble RL models, each dedicated to an objective such as click-through rate or conversion rate. These models individually determine the position of items as actions, aiming to estimate the long-term value of multiple objectives from an individual perspective. DFM then dynamically adjusts the weights among objectives to maximize long-term value, addressing the temporal dynamics of objective preferences in e-commerce scenarios. Initially, PDA trains MOQ with simulated data from offline logs. As experiments progress, it strategically integrates real user interaction data, ultimately replacing the simulated dataset to alleviate distributional shift and the cold-start problem. Experimental results on real-world online e-commerce systems demonstrate the significant improvements of MODRL-TA, which has been successfully deployed on an e-commerce search platform. |
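The record describes DFM only at a high level: a cross-entropy-method search that adjusts the weights combining per-objective Q-value estimates. As an illustration of the general technique named, here is a minimal sketch of a CEM weight search; the function name `cem_fuse_weights`, the `score_fn` interface, the Gaussian sampling with simplex normalization, and all hyperparameters are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def cem_fuse_weights(q_values, score_fn, n_iter=20, pop_size=64,
                     elite_frac=0.2, seed=0):
    """Cross-entropy-method search for objective-fusion weights (sketch).

    q_values : (n_objectives, n_items) per-objective Q-value estimates.
    score_fn : maps a combined score vector (n_items,) to a scalar value
               to be maximized (e.g. estimated long-term value of the
               resulting ranking).
    Returns nonnegative weights over objectives that sum to 1.
    """
    rng = np.random.default_rng(seed)
    n_obj = q_values.shape[0]
    mu = np.full(n_obj, 1.0 / n_obj)      # start from uniform weights
    sigma = np.full(n_obj, 0.5)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iter):
        # Sample candidate weight vectors and project onto the simplex.
        samples = np.abs(rng.normal(mu, sigma, size=(pop_size, n_obj)))
        samples /= samples.sum(axis=1, keepdims=True)
        # Score each candidate by the value of its fused ranking scores.
        scores = np.array([score_fn(w @ q_values) for w in samples])
        # Refit the sampling distribution to the elite fraction.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu / mu.sum()
```

Given per-objective Q estimates for a candidate result list, the returned weights could fuse them into a single ranking score; the paper's actual DFM may differ in parameterization, constraints, and the value estimate it optimizes.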
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-07 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3083764421 |
source | Freely Accessible Journals |
subjects | Algorithms ; Augmentation systems ; Data augmentation ; Deep learning ; Electronic commerce ; Entropy (Information theory) ; Machine learning ; Searching |
title | MODRL-TA:A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T02%3A48%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=MODRL-TA:A%20Multi-Objective%20Deep%20Reinforcement%20Learning%20Framework%20for%20Traffic%20Allocation%20in%20E-Commerce%20Search&rft.jtitle=arXiv.org&rft.au=Cheng,%20Peng&rft.date=2024-07-22&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3083764421%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3083764421&rft_id=info:pmid/&rfr_iscdi=true |