Generative Retrieval with Semantic Tree-Structured Item Identifiers via Contrastive Learning

The retrieval phase is a vital component in recommendation systems, requiring the model to be effective and efficient. Recently, generative retrieval has become an emerging paradigm for document retrieval, showing notable performance. These methods enjoy merits like being end-to-end differentiable,...

Bibliographic details
Main authors:	Si, Zihua; Sun, Zhongxiang; Chen, Jiale; Chen, Guozhang; Zang, Xiaoxue; Zheng, Kai; Song, Yang; Zhang, Xiao; Xu, Jun; Gai, Kun
Format:	Article
Language:	eng
Subjects:	Computer Science - Information Retrieval

creator	Si, Zihua; Sun, Zhongxiang; Chen, Jiale; Chen, Guozhang; Zang, Xiaoxue; Zheng, Kai; Song, Yang; Zhang, Xiao; Xu, Jun; Gai, Kun
description	The retrieval phase is a vital component of recommendation systems and requires the model to be both effective and efficient. Recently, generative retrieval has emerged as a paradigm for document retrieval, showing notable performance. These methods have merits such as being end-to-end differentiable, which suggests their viability in recommendation. However, they fall short in efficiency and effectiveness for large-scale recommendation. To achieve both, this paper introduces a generative retrieval framework, SEATER, which learns SEmAntic Tree-structured item identifiERs via contrastive learning. Specifically, an encoder-decoder model extracts user interests from historical behaviors and retrieves candidates via tree-structured item identifiers. SEATER devises a balanced k-ary tree structure of item identifiers, allocating semantic space to each token individually. This strategy maintains semantic consistency within the same level, while distinct levels correspond to different semantic granularities. The structure also keeps inference speed consistent and fast for all items. Given the tree structure, SEATER learns the identifier tokens' semantics, hierarchical relationships, and inter-token dependencies. To achieve this, two contrastive learning tasks are combined with the generation task to optimize both the model and the identifiers. The InfoNCE loss aligns the token embeddings based on their hierarchical positions. The triplet loss ranks similar identifiers in the desired order. In this way, SEATER achieves both efficiency and effectiveness. Extensive experiments on three public datasets and an industrial dataset demonstrate that SEATER significantly outperforms state-of-the-art models.
doi_str_mv	10.48550/arxiv.2309.13375
format	Article
date	2023-09-23
rights	http://creativecommons.org/licenses/by-nc-nd/4.0
sourcetype	Open Access Repository
oa	free_for_read
collection	arXiv Computer Science; arXiv.org
linktorsrc	https://arxiv.org/abs/2309.13375
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2309.13375
language	eng
recordid	cdi_arxiv_primary_2309_13375
source	arXiv.org
subjects	Computer Science - Information Retrieval
title	Generative Retrieval with Semantic Tree-Structured Item Identifiers via Contrastive Learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T19%3A31%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generative%20Retrieval%20with%20Semantic%20Tree-Structured%20Item%20Identifiers%20via%20Contrastive%20Learning&rft.au=Si,%20Zihua&rft.date=2023-09-23&rft_id=info:doi/10.48550/arxiv.2309.13375&rft_dat=%3Carxiv_GOX%3E2309_13375%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
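
The abstract in this record describes a balanced k-ary tree of item identifiers in which every item is reached in the same number of decoding steps. The following Python sketch illustrates only that structural property. The function name `assign_identifiers` and the rule of splitting an ordered item list into near-equal chunks are assumptions made for illustration; the record does not say how the paper actually builds its tree, and this is not the authors' implementation.

```python
from typing import Dict, Sequence, Tuple


def assign_identifiers(items: Sequence[str], k: int) -> Dict[str, Tuple[int, ...]]:
    """Give each item a root-to-leaf token path in a balanced k-ary tree.

    Assumption: items are unique and already ordered so that neighbours are
    semantically close (for example, by a clustering step not shown here).
    The splitting rule is illustrative, not the method from the paper.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if not items:
        return {}

    # Smallest depth such that k**depth leaves can hold every item.
    depth = 1
    while k ** depth < len(items):
        depth += 1

    identifiers: Dict[str, Tuple[int, ...]] = {}

    def split(chunk: Sequence[str], prefix: Tuple[int, ...]) -> None:
        if depth - len(prefix) == 1:
            # Last level: each item in the chunk receives its own leaf token.
            for token, item in enumerate(chunk):
                identifiers[item] = prefix + (token,)
            return
        # Spread the chunk over k children whose sizes differ by at most one.
        base, extra = divmod(len(chunk), k)
        start = 0
        for child in range(k):
            size = base + (1 if child < extra else 0)
            if size:
                split(chunk[start:start + size], prefix + (child,))
            start += size

    split(items, ())
    return identifiers


ids = assign_identifiers([f"item{i}" for i in range(10)], k=3)
for item, path in ids.items():
    print(item, path)
```

With ten items and k = 3, every identifier in this sketch has exactly three tokens, so a decoder would take the same number of steps for any item, which is the property the abstract associates with consistent inference speed.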