Improving Out-of-Vocabulary Handling in Recommendation Systems

Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not contain enough informatio...

Bibliographic details
Main authors:	Shiao, William; Ju, Mingxuan; Guo, Zhichun; Chen, Xin; Papalexakis, Evangelos; Zhao, Tong; Shah, Neil; Liu, Yozen
Format:	Article
Language:	eng
Subjects:	Computer Science - Information Retrieval
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Shiao, William; Ju, Mingxuan; Guo, Zhichun; Chen, Xin; Papalexakis, Evangelos; Zhao, Tong; Shah, Neil; Liu, Yozen
description	Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not contain enough information to produce high-quality recommendations. This work focuses on a complementary problem: recommending new users and items unseen (out-of-vocabulary, or OOV) at training time. This setting is known as the inductive setting and is especially problematic for factorization-based models, which rely on encoding only those users/items seen at training time with fixed parameter vectors. Many existing solutions applied in practice are often naive, such as assigning OOV users/items to random buckets. In this work, we tackle this problem and propose approaches that better leverage available user/item features to improve OOV handling at the embedding table level. We discuss general-purpose plug-and-play approaches that are easily applicable to most RS models and improve inductive performance without negatively impacting transductive model performance. We extensively evaluate 9 OOV embedding methods on 5 models across 4 datasets (spanning different domains). One of these datasets is a proprietary production dataset from a prominent RS employed by a large social platform serving hundreds of millions of daily active users. In our experiments, we find that several proposed methods that exploit feature similarity using LSH consistently outperform alternatives on most model-dataset combinations, with the best method showing a mean improvement of 3.74% over the industry standard baseline in inductive performance. We release our code and hope our work helps practitioners make more informed decisions when handling OOV for their RS and further inspires academic research into improving OOV support in RS.
doi_str_mv	10.48550/arxiv.2403.18280
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2403_18280</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2403_18280</sourcerecordid><originalsourceid>FETCH-LOGICAL-a670-da5ffabd2c9d83289b8d4be5d03374e985425b740b73646360ceefcf44eae4f3</originalsourceid><addsrcrecordid>eNotj7FqwzAURbV0KGk_oFP8A3IV68mWl0IJbRMIBJLQ1TxJT0VgScF2QvP3bdJOdzhwOYexp4UoQSslnnH4DueyAiHLha60uGcv63gc8jmkr2J7mnj2_DNbNKceh0uxwuT6Kwqp2JHNMVJyOIWciv1lnCiOD-zOYz_S4__O2P797bBc8c32Y7183XCsG8EdKu_RuMq2TstKt0Y7MKSckLIBarWCSpkGhGlkDbWshSXy1gMQEng5Y_O_15t_dxxC_NXrrh3drUP-AIdwQ6o</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Improving Out-of-Vocabulary Handling in Recommendation Systems</title><source>arXiv.org</source><creator>Shiao, William ; Ju, Mingxuan ; Guo, Zhichun ; Chen, Xin ; Papalexakis, Evangelos ; Zhao, Tong ; Shah, Neil ; Liu, Yozen</creator><creatorcontrib>Shiao, William ; Ju, Mingxuan ; Guo, Zhichun ; Chen, Xin ; Papalexakis, Evangelos ; Zhao, Tong ; Shah, Neil ; Liu, Yozen</creatorcontrib><description>Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not contain enough information to produce high-quality recommendations. This work focuses on a complementary problem: recommending new users and items unseen (out-of-vocabulary, or OOV) at training time. This setting is known as the inductive setting and is especially problematic for factorization-based models, which rely on encoding only those users/items seen at training time with fixed parameter vectors. Many existing solutions applied in practice are often naive, such as assigning OOV users/items to random buckets. 
In this work, we tackle this problem and propose approaches that better leverage available user/item features to improve OOV handling at the embedding table level. We discuss general-purpose plug-and-play approaches that are easily applicable to most RS models and improve inductive performance without negatively impacting transductive model performance. We extensively evaluate 9 OOV embedding methods on 5 models across 4 datasets (spanning different domains). One of these datasets is a proprietary production dataset from a prominent RS employed by a large social platform serving hundreds of millions of daily active users. In our experiments, we find that several proposed methods that exploit feature similarity using LSH consistently outperform alternatives on most model-dataset combinations, with the best method showing a mean improvement of 3.74% over the industry standard baseline in inductive performance. We release our code and hope our work helps practitioners make more informed decisions when handling OOV for their RS and further inspires academic research into improving OOV support in RS.</description><identifier>DOI: 10.48550/arxiv.2403.18280</identifier><language>eng</language><subject>Computer Science - Information Retrieval</subject><creationdate>2024-03</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2403.18280$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2403.18280$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Shiao, William</creatorcontrib><creatorcontrib>Ju, 
Mingxuan</creatorcontrib><creatorcontrib>Guo, Zhichun</creatorcontrib><creatorcontrib>Chen, Xin</creatorcontrib><creatorcontrib>Papalexakis, Evangelos</creatorcontrib><creatorcontrib>Zhao, Tong</creatorcontrib><creatorcontrib>Shah, Neil</creatorcontrib><creatorcontrib>Liu, Yozen</creatorcontrib><title>Improving Out-of-Vocabulary Handling in Recommendation Systems</title><description>Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not contain enough information to produce high-quality recommendations. This work focuses on a complementary problem: recommending new users and items unseen (out-of-vocabulary, or OOV) at training time. This setting is known as the inductive setting and is especially problematic for factorization-based models, which rely on encoding only those users/items seen at training time with fixed parameter vectors. Many existing solutions applied in practice are often naive, such as assigning OOV users/items to random buckets. In this work, we tackle this problem and propose approaches that better leverage available user/item features to improve OOV handling at the embedding table level. We discuss general-purpose plug-and-play approaches that are easily applicable to most RS models and improve inductive performance without negatively impacting transductive model performance. We extensively evaluate 9 OOV embedding methods on 5 models across 4 datasets (spanning different domains). One of these datasets is a proprietary production dataset from a prominent RS employed by a large social platform serving hundreds of millions of daily active users. 
In our experiments, we find that several proposed methods that exploit feature similarity using LSH consistently outperform alternatives on most model-dataset combinations, with the best method showing a mean improvement of 3.74% over the industry standard baseline in inductive performance. We release our code and hope our work helps practitioners make more informed decisions when handling OOV for their RS and further inspires academic research into improving OOV support in RS.</description><subject>Computer Science - Information Retrieval</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj7FqwzAURbV0KGk_oFP8A3IV68mWl0IJbRMIBJLQ1TxJT0VgScF2QvP3bdJOdzhwOYexp4UoQSslnnH4DueyAiHLha60uGcv63gc8jmkr2J7mnj2_DNbNKceh0uxwuT6Kwqp2JHNMVJyOIWciv1lnCiOD-zOYz_S4__O2P797bBc8c32Y7183XCsG8EdKu_RuMq2TstKt0Y7MKSckLIBarWCSpkGhGlkDbWshSXy1gMQEng5Y_O_15t_dxxC_NXrrh3drUP-AIdwQ6o</recordid><startdate>20240327</startdate><enddate>20240327</enddate><creator>Shiao, William</creator><creator>Ju, Mingxuan</creator><creator>Guo, Zhichun</creator><creator>Chen, Xin</creator><creator>Papalexakis, Evangelos</creator><creator>Zhao, Tong</creator><creator>Shah, Neil</creator><creator>Liu, Yozen</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240327</creationdate><title>Improving Out-of-Vocabulary Handling in Recommendation Systems</title><author>Shiao, William ; Ju, Mingxuan ; Guo, Zhichun ; Chen, Xin ; Papalexakis, Evangelos ; Zhao, Tong ; Shah, Neil ; Liu, Yozen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a670-da5ffabd2c9d83289b8d4be5d03374e985425b740b73646360ceefcf44eae4f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Information Retrieval</topic><toplevel>online_resources</toplevel><creatorcontrib>Shiao, 
William</creatorcontrib><creatorcontrib>Ju, Mingxuan</creatorcontrib><creatorcontrib>Guo, Zhichun</creatorcontrib><creatorcontrib>Chen, Xin</creatorcontrib><creatorcontrib>Papalexakis, Evangelos</creatorcontrib><creatorcontrib>Zhao, Tong</creatorcontrib><creatorcontrib>Shah, Neil</creatorcontrib><creatorcontrib>Liu, Yozen</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Shiao, William</au><au>Ju, Mingxuan</au><au>Guo, Zhichun</au><au>Chen, Xin</au><au>Papalexakis, Evangelos</au><au>Zhao, Tong</au><au>Shah, Neil</au><au>Liu, Yozen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving Out-of-Vocabulary Handling in Recommendation Systems</atitle><date>2024-03-27</date><risdate>2024</risdate><abstract>Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not contain enough information to produce high-quality recommendations. This work focuses on a complementary problem: recommending new users and items unseen (out-of-vocabulary, or OOV) at training time. This setting is known as the inductive setting and is especially problematic for factorization-based models, which rely on encoding only those users/items seen at training time with fixed parameter vectors. Many existing solutions applied in practice are often naive, such as assigning OOV users/items to random buckets. In this work, we tackle this problem and propose approaches that better leverage available user/item features to improve OOV handling at the embedding table level. 
We discuss general-purpose plug-and-play approaches that are easily applicable to most RS models and improve inductive performance without negatively impacting transductive model performance. We extensively evaluate 9 OOV embedding methods on 5 models across 4 datasets (spanning different domains). One of these datasets is a proprietary production dataset from a prominent RS employed by a large social platform serving hundreds of millions of daily active users. In our experiments, we find that several proposed methods that exploit feature similarity using LSH consistently outperform alternatives on most model-dataset combinations, with the best method showing a mean improvement of 3.74% over the industry standard baseline in inductive performance. We release our code and hope our work helps practitioners make more informed decisions when handling OOV for their RS and further inspires academic research into improving OOV support in RS.</abstract><doi>10.48550/arxiv.2403.18280</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2403.18280
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2403_18280
source	arXiv.org
subjects	Computer Science - Information Retrieval
title	Improving Out-of-Vocabulary Handling in Recommendation Systems
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T18%3A02%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20Out-of-Vocabulary%20Handling%20in%20Recommendation%20Systems&rft.au=Shiao,%20William&rft.date=2024-03-27&rft_id=info:doi/10.48550/arxiv.2403.18280&rft_dat=%3Carxiv_GOX%3E2403_18280%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true