STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM

Traditional recommendation models often rely on unique item identifiers (IDs) to distinguish between items, which can hinder their ability to effectively leverage item content information and generalize to long-tail or cold-start items. Recently, semantic tokenization has been proposed as a promising solution that aims to tokenize each item's semantic representation into a sequence of discrete tokens. In this way, it preserves the item's semantics within these tokens and ensures that semantically similar items are represented by similar tokens. These semantic tokens have become fundamental in training generative recommendation models. However, existing generative recommendation methods typically involve multiple sub-models for embedding, quantization, and recommendation, leading to an overly complex system. In this paper, we propose to streamline the semantic tokenization and generative recommendation process with a unified framework, dubbed STORE, which leverages a single large language model (LLM) for both tasks. Specifically, we formulate semantic tokenization as a text-to-token task and generative recommendation as a token-to-token task, supplemented by a token-to-text reconstruction task and a text-to-token auxiliary task. All these tasks are framed in a generative manner and trained using a single LLM backbone. Extensive experiments have been conducted to validate the effectiveness of our STORE framework across various recommendation tasks and datasets. We will release the source code and configurations for reproducible research.
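The semantic tokenization idea summarized above — compressing each item's embedding into a short sequence of discrete tokens so that semantically similar items receive similar tokens — can be illustrated with a generic residual-quantization sketch. This is not STORE's actual single-LLM procedure (the paper frames tokenization as a text-to-token task); the embeddings and codebooks below are random stand-ins for illustration only.

```python
import numpy as np

def semantic_tokenize(item_embeddings, codebooks):
    """Map each item embedding to a sequence of discrete tokens via
    residual quantization: each token position refines the residual
    left over from the previous, coarser, codebook. The codebooks are
    assumed to be already trained; here they are random placeholders.
    """
    tokens = []
    residual = item_embeddings.astype(np.float64).copy()
    for codebook in codebooks:  # one codebook per token position
        # distance from every item's residual to every codeword
        dists = np.linalg.norm(residual[:, None, :] - codebook[None, :, :], axis=-1)
        ids = dists.argmin(axis=1)          # nearest codeword index
        tokens.append(ids)
        residual -= codebook[ids]           # quantize coarse-to-fine
    # shape: (num_items, num_token_positions)
    return np.stack(tokens, axis=1)

rng = np.random.default_rng(0)
items = rng.normal(size=(8, 16))                       # 8 items, 16-dim embeddings
books = [rng.normal(size=(4, 16)) for _ in range(3)]   # 3 positions, 4 codewords each
codes = semantic_tokenize(items, books)
print(codes.shape)  # (8, 3): each item becomes 3 discrete tokens
```

Because each position quantizes the residual of the previous one, nearby embeddings tend to share their leading (coarse) tokens, which is the property generative recommenders exploit when decoding items token by token.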

Bibliographic Details
Main Authors: Liu, Qijiong; Zhu, Jieming; Fan, Lu; Zhao, Zhou; Wu, Xiao-Ming
Format: Article
Language: English
Online Access: Order full text
DOI: 10.48550/arxiv.2409.07276
Publication Date: 2024-09-11
Rights: http://creativecommons.org/licenses/by/4.0 (open access)
Source: arXiv.org
Subjects: Computer Science - Information Retrieval