Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models

Industrial systems such as recommender systems and online advertising have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-r...

Bibliographic details
Published in:	arXiv.org 2023-10
Main authors:	Song, Jinbo; Huang, Ruoran; Wang, Xinyang; Huang, Wei; Yu, Qian; Chen, Mingming; Yao, Yafei; Fan, Chaosheng; Peng, Changping; Lin, Zhangang; Hu, Jinghe; Shao, Jingping
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Information Retrieval; Computer Science - Learning; Matching; Neural networks; Ranking; Recommender systems; Regularization
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Song, Jinbo; Huang, Ruoran; Wang, Xinyang; Huang, Wei; Yu, Qian; Chen, Mingming; Yao, Yafei; Fan, Chaosheng; Peng, Changping; Lin, Zhangang; Hu, Jinghe; Shao, Jingping
description	Industrial systems such as recommender systems and online advertising have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly suffer from the sample selection bias (SSB) problem because they ignore the entire-chain data dependence, resulting in sub-optimal performance. In this paper, we rethink the pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from all of the cascaded stages to effectively alleviate the SSB problem. In addition, we design a fine-grained neural structure named ECMM to further improve pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict the result of each stage, and introduce a sub-networking routing strategy with $L_0$ regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, achieving a better trade-off between efficiency and effectiveness.
doi_str_mv	10.48550/arxiv.2310.08039
format	Article
fullrecord	<record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2310_08039</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2876762186</sourcerecordid><originalsourceid>FETCH-LOGICAL-a526-75979093aaa6401295687a13c5532c0140cbe630ba4f44f51a46682e5e32c5a3</originalsourceid><addsrcrecordid>eNotj9tKw0AQhhdBsNQ-gFcGvN46e954J6EeMKJY78M03bSpOdTdVOzbu229Guabn-H_CLliMJVWKbhF_1v_TLmIACyI9IyMuBCMWsn5BZmEsAEArg1XSozIy4cb1nX3VXerJEe_cjSU2Ljk3Tvq8cTn-zC49i6ZdUMdcbnGuksy34dAl317WF77pWvCJTmvsAlu8j_HZP4w-8yeaP72-Jzd5xQV19So1KSQCkTUEhhPlbYGmShjH14Ck1AunBawQFlJWSmGUmvLnXLxrFCMyfXp61G02Pq6Rb8vDsLFUTgmbk6Jre-_dy4Mxabf-S5WKrg12mjOrBZ_FlBW6A</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2876762186</pqid></control><display><type>article</type><title>Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Song, Jinbo ; Huang, Ruoran ; Wang, Xinyang ; Huang, Wei ; Yu, Qian ; Chen, Mingming ; Yao, Yafei ; Fan, Chaosheng ; Peng, Changping ; Lin, Zhangang ; Hu, Jinghe ; Shao, Jingping</creator><creatorcontrib>Song, Jinbo ; Huang, Ruoran ; Wang, Xinyang ; Huang, Wei ; Yu, Qian ; Chen, Mingming ; Yao, Yafei ; Fan, Chaosheng ; Peng, Changping ; Lin, Zhangang ; Hu, Jinghe ; Shao, Jingping</creatorcontrib><description>Industrial systems such as recommender systems and online advertising, have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly endure sample selection bias (SSB) problem owing to ignoring the entire-chain data dependence, resulting in sub-optimal performances. 
In this paper, we rethink pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from the whole cascaded stages to effectively alleviate SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict for each stage result, and introduce the sub-networking routing strategy with $L0$ regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, which achieves better trade-off between efficiency and effectiveness.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2310.08039</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Computer Science - Artificial Intelligence ; Computer Science - Information Retrieval ; Computer Science - Learning ; Matching ; Neural networks ; Ranking ; Recommender systems ; Regularization</subject><ispartof>arXiv.org, 2023-10</ispartof><rights>2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,780,881,27904</link.rule.ids><backlink>$$Uhttps://doi.org/10.1145/3511808.3557683$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.48550/arXiv.2310.08039$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Song, Jinbo</creatorcontrib><creatorcontrib>Huang, Ruoran</creatorcontrib><creatorcontrib>Wang, Xinyang</creatorcontrib><creatorcontrib>Huang, Wei</creatorcontrib><creatorcontrib>Yu, Qian</creatorcontrib><creatorcontrib>Chen, Mingming</creatorcontrib><creatorcontrib>Yao, Yafei</creatorcontrib><creatorcontrib>Fan, Chaosheng</creatorcontrib><creatorcontrib>Peng, Changping</creatorcontrib><creatorcontrib>Lin, Zhangang</creatorcontrib><creatorcontrib>Hu, Jinghe</creatorcontrib><creatorcontrib>Shao, Jingping</creatorcontrib><title>Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models</title><title>arXiv.org</title><description>Industrial systems such as recommender systems and online advertising, have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly endure sample selection bias (SSB) problem owing to ignoring the entire-chain data dependence, resulting in sub-optimal performances. 
In this paper, we rethink pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from the whole cascaded stages to effectively alleviate SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict for each stage result, and introduce the sub-networking routing strategy with $L0$ regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, which achieves better trade-off between efficiency and effectiveness.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Information Retrieval</subject><subject>Computer Science - Learning</subject><subject>Matching</subject><subject>Neural networks</subject><subject>Ranking</subject><subject>Recommender systems</subject><subject>Regularization</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotj9tKw0AQhhdBsNQ-gFcGvN46e954J6EeMKJY78M03bSpOdTdVOzbu229Guabn-H_CLliMJVWKbhF_1v_TLmIACyI9IyMuBCMWsn5BZmEsAEArg1XSozIy4cb1nX3VXerJEe_cjSU2Ljk3Tvq8cTn-zC49i6ZdUMdcbnGuksy34dAl317WF77pWvCJTmvsAlu8j_HZP4w-8yeaP72-Jzd5xQV19So1KSQCkTUEhhPlbYGmShjH14Ck1AunBawQFlJWSmGUmvLnXLxrFCMyfXp61G02Pq6Rb8vDsLFUTgmbk6Jre-_dy4Mxabf-S5WKrg12mjOrBZ_FlBW6A</recordid><startdate>20231012</startdate><enddate>20231012</enddate><creator>Song, Jinbo</creator><creator>Huang, Ruoran</creator><creator>Wang, 
Xinyang</creator><creator>Huang, Wei</creator><creator>Yu, Qian</creator><creator>Chen, Mingming</creator><creator>Yao, Yafei</creator><creator>Fan, Chaosheng</creator><creator>Peng, Changping</creator><creator>Lin, Zhangang</creator><creator>Hu, Jinghe</creator><creator>Shao, Jingping</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20231012</creationdate><title>Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models</title><author>Song, Jinbo ; Huang, Ruoran ; Wang, Xinyang ; Huang, Wei ; Yu, Qian ; Chen, Mingming ; Yao, Yafei ; Fan, Chaosheng ; Peng, Changping ; Lin, Zhangang ; Hu, Jinghe ; Shao, Jingping</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a526-75979093aaa6401295687a13c5532c0140cbe630ba4f44f51a46682e5e32c5a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Information Retrieval</topic><topic>Computer Science - Learning</topic><topic>Matching</topic><topic>Neural networks</topic><topic>Ranking</topic><topic>Recommender systems</topic><topic>Regularization</topic><toplevel>online_resources</toplevel><creatorcontrib>Song, Jinbo</creatorcontrib><creatorcontrib>Huang, Ruoran</creatorcontrib><creatorcontrib>Wang, Xinyang</creatorcontrib><creatorcontrib>Huang, Wei</creatorcontrib><creatorcontrib>Yu, Qian</creatorcontrib><creatorcontrib>Chen, Mingming</creatorcontrib><creatorcontrib>Yao, 
Yafei</creatorcontrib><creatorcontrib>Fan, Chaosheng</creatorcontrib><creatorcontrib>Peng, Changping</creatorcontrib><creatorcontrib>Lin, Zhangang</creatorcontrib><creatorcontrib>Hu, Jinghe</creatorcontrib><creatorcontrib>Shao, Jingping</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Song, Jinbo</au><au>Huang, Ruoran</au><au>Wang, Xinyang</au><au>Huang, Wei</au><au>Yu, Qian</au><au>Chen, Mingming</au><au>Yao, Yafei</au><au>Fan, Chaosheng</au><au>Peng, Changping</au><au>Lin, Zhangang</au><au>Hu, Jinghe</au><au>Shao, Jingping</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain 
Models</atitle><jtitle>arXiv.org</jtitle><date>2023-10-12</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Industrial systems such as recommender systems and online advertising, have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly endure sample selection bias (SSB) problem owing to ignoring the entire-chain data dependence, resulting in sub-optimal performances. In this paper, we rethink pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from the whole cascaded stages to effectively alleviate SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict for each stage result, and introduce the sub-networking routing strategy with $L0$ regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, which achieves better trade-off between efficiency and effectiveness.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2310.08039</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2023-10
issn	2331-8422
language	eng
recordid	cdi_arxiv_primary_2310_08039
source	arXiv.org; Free E- Journals
subjects	Computer Science - Artificial Intelligence; Computer Science - Information Retrieval; Computer Science - Learning; Matching; Neural networks; Ranking; Recommender systems; Regularization
title	Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T07%3A35%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Rethinking%20Large-scale%20Pre-ranking%20System:%20Entire-chain%20Cross-domain%20Models&rft.jtitle=arXiv.org&rft.au=Song,%20Jinbo&rft.date=2023-10-12&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2310.08039&rft_dat=%3Cproquest_arxiv%3E2876762186%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2876762186&rft_id=info:pmid/&rfr_iscdi=true