Generative Retrieval with Semantic Tree-Structured Item Identifiers via Contrastive Learning

Bibliographic details
Main authors: Si, Zihua; Sun, Zhongxiang; Chen, Jiale; Chen, Guozhang; Zang, Xiaoxue; Zheng, Kai; Song, Yang; Zhang, Xiao; Xu, Jun; Gai, Kun
Format: Article
Language: eng
creator Si, Zihua
Sun, Zhongxiang
Chen, Jiale
Chen, Guozhang
Zang, Xiaoxue
Zheng, Kai
Song, Yang
Zhang, Xiao
Xu, Jun
Gai, Kun
description The retrieval phase is a vital component of recommendation systems and requires models that are both effective and efficient. Generative retrieval has recently emerged as a paradigm for document retrieval and has shown notable performance. Such methods offer advantages such as end-to-end differentiability, which suggests they are viable for recommendation. However, they fall short in both efficiency and effectiveness for large-scale recommendation. To address this, this paper introduces a generative retrieval framework, SEATER, which learns SEmAntic Tree-structured item identifiERs via contrastive learning. Specifically, we employ an encoder-decoder model to extract user interests from historical behaviors and retrieve candidates via tree-structured item identifiers. SEATER builds a balanced k-ary tree of item identifiers and allocates semantic space to each token individually. This design keeps semantics consistent within the same level, while different levels correspond to different semantic granularities. The balanced structure also keeps inference fast and consistent across all items. Given this tree structure, SEATER learns the semantics of identifier tokens, their hierarchical relationships, and the dependencies between tokens. To do so, we combine two contrastive learning tasks with the generation task to optimize both the model and the identifiers: an InfoNCE loss aligns token embeddings according to their hierarchical positions, and a triplet loss ranks similar identifiers in the desired order. In this way, SEATER achieves both efficiency and effectiveness. Extensive experiments on three public datasets and an industrial dataset demonstrate that SEATER significantly outperforms state-of-the-art models.
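The balanced k-ary tree identifiers and the two contrastive losses described above can be illustrated with a small Python sketch. It is not SEATER's implementation: it assumes a precomputed 1-D score per item as a hypothetical stand-in for the semantic grouping the paper learns, and the helper names (balanced_kary_ids, rank_by_prefix, triplet_margin, info_nce) are illustrative only. The two losses are the generic triplet margin and InfoNCE forms; how SEATER chooses positives and negatives by hierarchical position is not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _split_evenly(seq: List[str], k: int) -> List[List[str]]:
    """Cut seq into k contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(seq), k)
    chunks, start = [], 0
    for b in range(k):
        size = q + (1 if b < r else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks


def balanced_kary_ids(item_scores: Dict[str, float], k: int) -> Dict[str, Tuple[int, ...]]:
    """Give every item a root-to-leaf path (one token per level) in a balanced k-ary tree.

    Hypothetical stand-in for semantic clustering: items are ordered by a
    precomputed 1-D score and recursively cut into k near-equal groups.
    """
    if k < 2 or not item_scores:
        raise ValueError("need k >= 2 and at least one item")
    items = sorted(item_scores, key=lambda i: (item_scores[i], i))
    depth = 1
    while k ** depth < len(items):  # smallest depth with k**depth >= N
        depth += 1
    ids: Dict[str, Tuple[int, ...]] = {}

    def assign(group: List[str], prefix: Tuple[int, ...]) -> None:
        if len(prefix) == depth:
            (item,) = group  # even splitting leaves exactly one item per leaf
            ids[item] = prefix
            return
        for branch, sub in enumerate(_split_evenly(group, k)):
            if sub:  # skip empty branches when a group has fewer than k items
                assign(sub, prefix + (branch,))

    assign(items, ())
    return ids


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens two identifiers share (longer = closer in the tree)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def rank_by_prefix(anchor: str, candidates: List[str], ids: Dict[str, Tuple[int, ...]]) -> List[str]:
    """Order candidates so identifiers sharing longer prefixes with the anchor come first."""
    return sorted(candidates, key=lambda c: -shared_prefix_len(ids[anchor], ids[c]))


def triplet_margin(sim_pos: float, sim_neg: float, margin: float = 0.1) -> float:
    """Generic triplet hinge: zero once sim_pos beats sim_neg by at least margin."""
    return max(0.0, margin - sim_pos + sim_neg)


def info_nce(sim_pos: float, sim_negs: Sequence[float], tau: float = 0.1) -> float:
    """Generic InfoNCE: negative log softmax probability of the positive, computed stably."""
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    scores = {f"item{i}": float(i) for i in range(9)}
    ids = balanced_kary_ids(scores, k=3)
    print(ids["item4"])  # (1, 1)
    print(rank_by_prefix("item0", ["item5", "item1", "item3"], ids))  # ['item1', 'item5', 'item3']
```

In this sketch every item receives an identifier of the same length (the smallest depth with k^depth >= N), so generating any item takes the same number of decoding steps; this is one way a balanced tree can give uniform inference cost. Ranking candidates by shared prefix length is an assumed proxy for the "desired order" that the triplet loss enforces.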
doi_str_mv 10.48550/arxiv.2309.13375
format Article
creationdate 2023-09-23
rights http://creativecommons.org/licenses/by-nc-nd/4.0
oa free_for_read
linktorsrc https://arxiv.org/abs/2309.13375
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2309.13375
language eng
recordid cdi_arxiv_primary_2309_13375
source arXiv.org
subjects Computer Science - Information Retrieval
title Generative Retrieval with Semantic Tree-Structured Item Identifiers via Contrastive Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T19%3A31%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generative%20Retrieval%20with%20Semantic%20Tree-Structured%20Item%20Identifiers%20via%20Contrastive%20Learning&rft.au=Si,%20Zihua&rft.date=2023-09-23&rft_id=info:doi/10.48550/arxiv.2309.13375&rft_dat=%3Carxiv_GOX%3E2309_13375%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true