MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks

We study the deep recall model problem in industrial web search: given a user query, retrieve the hundreds of most relevant documents from billions of candidates. The common framework is to train two encoding models based on neural embedding, which learn the distributed representations of q...
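
The two-encoder matching described in the abstract, together with the idea of enriching a document with its co-click neighbours, can be illustrated with a minimal sketch. This is not the authors' implementation: the vectors are hand-written stand-ins for encoder outputs, the function names `aggregate_neighbours` and `retrieve` are hypothetical, and a single dot-product attention is assumed in place of the two-factor attention mechanism mentioned in the record.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_neighbours(doc_vec, neighbour_vecs):
    """Mix a document vector with its co-click neighbours.

    Assumption: plain dot-product attention with the document as the
    query; the document itself is included among the candidates.
    """
    candidates = [doc_vec] + list(neighbour_vecs)
    weights = softmax([dot(doc_vec, c) for c in candidates])
    return [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(doc_vec))
    ]


def retrieve(query_vec, doc_vecs, k):
    """Return the ids of the k documents with the highest inner product."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: dot(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = {"a": [0.0, 1.0], "b": [0.2, 0.8]}
    # Document "a" has one co-click neighbour that lies close to the query.
    enriched = dict(docs, a=aggregate_neighbours(docs["a"], [[1.0, 0.0]]))
    print(retrieve(query, docs, 1), retrieve(query, enriched, 1))
```

Because aggregation happens only on the document side, the enriched document vectors could be computed offline, and the query path stays a single encoder call plus a nearest-neighbour lookup.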

Detailed description

Bibliographic details
Published in: arXiv.org 2020-07
Main authors: Zhang, Yusi, Liu, Chuanjie, Luo, Angen, Xue, Hui, Shan, Xuan, Luo, Yuxiang, Xia, Yiqian, Yuanchi Yan, Wang, Haidong
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Zhang, Yusi
Liu, Chuanjie
Luo, Angen
Xue, Hui
Shan, Xuan
Luo, Yuxiang
Xia, Yiqian
Yuanchi Yan
Wang, Haidong
description We study the deep recall model problem in industrial web search: given a user query, retrieve the hundreds of most relevant documents from billions of candidates. The common framework is to train two encoding models based on neural embedding, which learn the distributed representations of queries and documents separately and match them in the latent semantic space. However, all the existing encoding models leverage only the information of the document itself, which is often not sufficient in practice when matching with query terms, especially for hard tail queries. In this work we aim to leverage additional information for each document from its co-click neighbours to help document retrieval. The challenges include how to effectively extract information and eliminate noise when involving co-click information in a deep model while meeting the demands of billion-scale data size for real-time online inference. To handle the noise in co-click relations, we first propose a web-scale Multi-Intention Co-click document Graph (MICG), which builds the co-click connections between documents at the click-intention level rather than the document level. We then present MIRA, an encoding framework based on BERT and graph attention networks, which leverages a two-factor attention mechanism to aggregate neighbours. To meet online latency requirements, we involve neighbour information only on the document side, which avoids the time-consuming query neighbour search in real-time serving. We conduct extensive offline experiments on both a public dataset and a private web-scale dataset from two major commercial search engines, demonstrating the effectiveness and scalability of the proposed method compared with several baselines. A further case study reveals that co-click relations mainly help improve web search quality in two respects: key concept enhancement and query term complementation.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2420333003</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2420333003</sourcerecordid><originalsourceid>FETCH-proquest_journals_24203330033</originalsourceid><addsrcrecordid>eNqNjMsKwjAURIMgKNp_uOA6EJP6wJ34wIK6KIJLieFWojGpeejvW8UPcDMDZw7TIl0uxJBOc847JAvhyhjj4wkfjUSX1LuinM9gi0_08qLtBXbJRE0LG9FG7SwsHFVGqxsUtnL-Lr9QWzjimQYlDcLSqXRvbCgxeo1PaSCFz9USsYY9Jt-QPcaX87fQJ-1KmoDZr3tksF4dFhtae_dIGOLp6pK3zXTiOWdCCNbEf9YbOkpJ3g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2420333003</pqid></control><display><type>article</type><title>MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks</title><source>Free E- Journals</source><creator>Zhang, Yusi ; Liu, Chuanjie ; Luo, Angen ; Xue, Hui ; Shan, Xuan ; Luo, Yuxiang ; Xia, Yiqian ; Yuanchi Yan ; Wang, Haidong</creator><creatorcontrib>Zhang, Yusi ; Liu, Chuanjie ; Luo, Angen ; Xue, Hui ; Shan, Xuan ; Luo, Yuxiang ; Xia, Yiqian ; Yuanchi Yan ; Wang, Haidong</creatorcontrib><description>We study the problem of deep recall model in industrial web search, which is, given a user query, retrieve hundreds of most relevance documents from billions of candidates. The common framework is to train two encoding models based on neural embedding which learn the distributed representations of queries and documents separately and match them in the latent semantic space. However, all the exiting encoding models only leverage the information of the document itself, which is often not sufficient in practice when matching with query terms, especially for the hard tail queries. In this work we aim to leverage the additional information for each document from its co-click neighbour to help document retrieval. 
The challenges include how to effectively extract information and eliminate noise when involving co-click information in deep model while meet the demands of billion-scale data size for real time online inference. To handle the noise in co-click relations, we firstly propose a web-scale Multi-Intention Co-click document Graph(MICG) which builds the co-click connections between documents on click intention level but not on document level. Then we present an encoding framework MIRA based on Bert and graph attention networks which leverages a two-factor attention mechanism to aggregate neighbours. To meet the online latency requirements, we only involve neighbour information in document side, which can save the time-consuming query neighbor search in real time serving. We conduct extensive offline experiments on both public dataset and private web-scale dataset from two major commercial search engines demonstrating the effectiveness and scalability of the proposed method compared with several baselines. And a further case study reveals that co-click relations mainly help improve web search quality from two aspects: key concept enhancing and query term complementary.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Artificial neural networks ; Datasets ; Information retrieval ; Queries ; Real time ; Search engines</subject><ispartof>arXiv.org, 2020-07</ispartof><rights>2020. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Zhang, Yusi</creatorcontrib><creatorcontrib>Liu, Chuanjie</creatorcontrib><creatorcontrib>Luo, Angen</creatorcontrib><creatorcontrib>Xue, Hui</creatorcontrib><creatorcontrib>Shan, Xuan</creatorcontrib><creatorcontrib>Luo, Yuxiang</creatorcontrib><creatorcontrib>Xia, Yiqian</creatorcontrib><creatorcontrib>Yuanchi Yan</creatorcontrib><creatorcontrib>Wang, Haidong</creatorcontrib><title>MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks</title><title>arXiv.org</title><description>We study the problem of deep recall model in industrial web search, which is, given a user query, retrieve hundreds of most relevance documents from billions of candidates. The common framework is to train two encoding models based on neural embedding which learn the distributed representations of queries and documents separately and match them in the latent semantic space. However, all the exiting encoding models only leverage the information of the document itself, which is often not sufficient in practice when matching with query terms, especially for the hard tail queries. In this work we aim to leverage the additional information for each document from its co-click neighbour to help document retrieval. The challenges include how to effectively extract information and eliminate noise when involving co-click information in deep model while meet the demands of billion-scale data size for real time online inference. 
To handle the noise in co-click relations, we firstly propose a web-scale Multi-Intention Co-click document Graph(MICG) which builds the co-click connections between documents on click intention level but not on document level. Then we present an encoding framework MIRA based on Bert and graph attention networks which leverages a two-factor attention mechanism to aggregate neighbours. To meet the online latency requirements, we only involve neighbour information in document side, which can save the time-consuming query neighbor search in real time serving. We conduct extensive offline experiments on both public dataset and private web-scale dataset from two major commercial search engines demonstrating the effectiveness and scalability of the proposed method compared with several baselines. And a further case study reveals that co-click relations mainly help improve web search quality from two aspects: key concept enhancing and query term complementary.</description><subject>Artificial neural networks</subject><subject>Datasets</subject><subject>Information retrieval</subject><subject>Queries</subject><subject>Real time</subject><subject>Search engines</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNjMsKwjAURIMgKNp_uOA6EJP6wJ34wIK6KIJLieFWojGpeejvW8UPcDMDZw7TIl0uxJBOc847JAvhyhjj4wkfjUSX1LuinM9gi0_08qLtBXbJRE0LG9FG7SwsHFVGqxsUtnL-Lr9QWzjimQYlDcLSqXRvbCgxeo1PaSCFz9USsYY9Jt-QPcaX87fQJ-1KmoDZr3tksF4dFhtae_dIGOLp6pK3zXTiOWdCCNbEf9YbOkpJ3g</recordid><startdate>20200703</startdate><enddate>20200703</enddate><creator>Zhang, Yusi</creator><creator>Liu, Chuanjie</creator><creator>Luo, Angen</creator><creator>Xue, Hui</creator><creator>Shan, Xuan</creator><creator>Luo, Yuxiang</creator><creator>Xia, Yiqian</creator><creator>Yuanchi Yan</creator><creator>Wang, Haidong</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20200703</creationdate><title>MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks</title><author>Zhang, Yusi ; Liu, Chuanjie ; Luo, Angen ; Xue, Hui ; Shan, Xuan ; Luo, Yuxiang ; Xia, Yiqian ; Yuanchi Yan ; Wang, Haidong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_24203330033</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Artificial neural networks</topic><topic>Datasets</topic><topic>Information retrieval</topic><topic>Queries</topic><topic>Real time</topic><topic>Search engines</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Yusi</creatorcontrib><creatorcontrib>Liu, Chuanjie</creatorcontrib><creatorcontrib>Luo, Angen</creatorcontrib><creatorcontrib>Xue, Hui</creatorcontrib><creatorcontrib>Shan, Xuan</creatorcontrib><creatorcontrib>Luo, Yuxiang</creatorcontrib><creatorcontrib>Xia, Yiqian</creatorcontrib><creatorcontrib>Yuanchi Yan</creatorcontrib><creatorcontrib>Wang, Haidong</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community 
College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Yusi</au><au>Liu, Chuanjie</au><au>Luo, Angen</au><au>Xue, Hui</au><au>Shan, Xuan</au><au>Luo, Yuxiang</au><au>Xia, Yiqian</au><au>Yuanchi Yan</au><au>Wang, Haidong</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks</atitle><jtitle>arXiv.org</jtitle><date>2020-07-03</date><risdate>2020</risdate><eissn>2331-8422</eissn><abstract>We study the problem of deep recall model in industrial web search, which is, given a user query, retrieve hundreds of most relevance documents from billions of candidates. The common framework is to train two encoding models based on neural embedding which learn the distributed representations of queries and documents separately and match them in the latent semantic space. However, all the exiting encoding models only leverage the information of the document itself, which is often not sufficient in practice when matching with query terms, especially for the hard tail queries. In this work we aim to leverage the additional information for each document from its co-click neighbour to help document retrieval. 
The challenges include how to effectively extract information and eliminate noise when involving co-click information in deep model while meet the demands of billion-scale data size for real time online inference. To handle the noise in co-click relations, we firstly propose a web-scale Multi-Intention Co-click document Graph(MICG) which builds the co-click connections between documents on click intention level but not on document level. Then we present an encoding framework MIRA based on Bert and graph attention networks which leverages a two-factor attention mechanism to aggregate neighbours. To meet the online latency requirements, we only involve neighbour information in document side, which can save the time-consuming query neighbor search in real time serving. We conduct extensive offline experiments on both public dataset and private web-scale dataset from two major commercial search engines demonstrating the effectiveness and scalability of the proposed method compared with several baselines. And a further case study reveals that co-click relations mainly help improve web search quality from two aspects: key concept enhancing and query term complementary.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2020-07
issn 2331-8422
language eng
recordid cdi_proquest_journals_2420333003
source Free E- Journals
subjects Artificial neural networks
Datasets
Information retrieval
Queries
Real time
Search engines
title MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T00%3A49%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=MIRA:%20Leveraging%20Multi-Intention%20Co-click%20Information%20in%20Web-scale%20Document%20Retrieval%20using%20Deep%20Neural%20Networks&rft.jtitle=arXiv.org&rft.au=Zhang,%20Yusi&rft.date=2020-07-03&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2420333003%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2420333003&rft_id=info:pmid/&rfr_iscdi=true