STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM

Traditional recommendation models often rely on unique item identifiers (IDs) to distinguish between items, which can hinder their ability to effectively leverage item content information and generalize to long-tail or cold-start items. Recently, semantic tokenization has been proposed as a promising solution that aims to tokenize each item's semantic representation into a sequence of discrete tokens. In this way, it preserves the item's semantics within these tokens and ensures that semantically similar items are represented by similar tokens. These semantic tokens have become fundamental in training generative recommendation models. However, existing generative recommendation methods typically involve multiple sub-models for embedding, quantization, and recommendation, leading to an overly complex system. In this paper, we propose to streamline the semantic tokenization and generative recommendation process with a unified framework, dubbed STORE, which leverages a single large language model (LLM) for both tasks. Specifically, we formulate semantic tokenization as a text-to-token task and generative recommendation as a token-to-token task, supplemented by a token-to-text reconstruction task and a text-to-token auxiliary task. All these tasks are framed in a generative manner and trained using a single LLM backbone. Extensive experiments have been conducted to validate the effectiveness of our STORE framework across various recommendation tasks and datasets. We will release the source code and configurations for reproducible research.
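The semantic tokenization idea summarized above — compressing each item's embedding into a short sequence of discrete tokens so that semantically similar items receive similar tokens — can be illustrated with a generic residual-quantization sketch. This is not STORE's actual single-LLM procedure (the paper frames tokenization as a text-to-token task); the embeddings and codebooks below are random stand-ins for illustration only.

```python
import numpy as np

def semantic_tokenize(item_embeddings, codebooks):
    """Map each item embedding to a sequence of discrete tokens via
    residual quantization: each token position refines the residual
    left over from the previous, coarser, codebook. The codebooks are
    assumed to be already trained; here they are random placeholders.
    """
    tokens = []
    residual = item_embeddings.astype(np.float64).copy()
    for codebook in codebooks:  # one codebook per token position
        # distance from every item's residual to every codeword
        dists = np.linalg.norm(residual[:, None, :] - codebook[None, :, :], axis=-1)
        ids = dists.argmin(axis=1)          # nearest codeword index
        tokens.append(ids)
        residual -= codebook[ids]           # quantize coarse-to-fine
    # shape: (num_items, num_token_positions)
    return np.stack(tokens, axis=1)

rng = np.random.default_rng(0)
items = rng.normal(size=(8, 16))                       # 8 items, 16-dim embeddings
books = [rng.normal(size=(4, 16)) for _ in range(3)]   # 3 positions, 4 codewords each
codes = semantic_tokenize(items, books)
print(codes.shape)  # (8, 3): each item becomes 3 discrete tokens
```

Because each position quantizes the residual of the previous one, nearby embeddings tend to share their leading (coarse) tokens, which is the property generative recommenders exploit when decoding items token by token.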

Bibliographic Details
Main Authors: Liu, Qijiong; Zhu, Jieming; Fan, Lu; Zhao, Zhou; Wu, Xiao-Ming
Format: Article
Language: English
Online Access: Order full text
DOI: 10.48550/arxiv.2409.07276
Publication Date: 2024-09-11
Rights: http://creativecommons.org/licenses/by/4.0 (open access)
Source: arXiv.org
Subjects: Computer Science - Information Retrieval