Exploring and Exploiting Data Heterogeneity in Recommendation

Massive amounts of data are the foundation of data-driven recommendation models. Data heterogeneity, an inherent property of big data, is widespread in real-world recommendation systems and reflects differences in properties among sub-populations. Ignoring the heterogeneity in recommendation...

Bibliographic details
Main authors:	Wang, Zimu; Liu, Jiashuo; Zou, Hao; Zhang, Xingxuan; He, Yue; Liang, Dongxu; Cui, Peng
Format:	Article
Language:	eng
Subjects:	Computer Science - Information Retrieval; Computer Science - Learning

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Wang, Zimu; Liu, Jiashuo; Zou, Hao; Zhang, Xingxuan; He, Yue; Liang, Dongxu; Cui, Peng
description	Massive amounts of data are the foundation of data-driven recommendation models. Data heterogeneity, an inherent property of big data, is widespread in real-world recommendation systems and reflects differences in properties among sub-populations. Ignoring heterogeneity in recommendation data can limit model performance, hurt sub-population robustness, and leave models misled by biases. However, data heterogeneity has not attracted substantial attention in the recommendation community. This motivates us to explore and exploit heterogeneity to address these problems and to assist data analysis. In this work, we focus on two representative categories of heterogeneity in recommendation data, namely heterogeneity of the prediction mechanism and heterogeneity of the covariate distribution, and propose an algorithm that explores heterogeneity through a bilevel clustering method. The uncovered heterogeneity is then exploited for two purposes in recommendation scenarios: prediction with multiple sub-models and support for debiasing. Extensive experiments on real-world data validate the existence of heterogeneity in recommendation data and the effectiveness of exploring and exploiting it.
doi_str_mv	10.48550/arxiv.2305.15431
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2305_15431</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2305_15431</sourcerecordid><originalsourceid>FETCH-LOGICAL-a671-a289a3e79d6430591b2d2927b131e6a68a34e1ef681d6b45a111bd8ba73d5a323</originalsourceid><addsrcrecordid>eNotj8FOwzAQRH3pAZV-ACf8AwlZr-04Bw5VKRSpEhLqPVrjbWWpcSpjofbvSwOn0bvMzBPiAZpaO2OaJ8rn-FMrbEwNRiPcief1-XQcc0wHSSnIiWK54QsVkhsunMcDJ47lImOSn_w1DgOnQCWO6V7M9nT85sV_zsXudb1bbartx9v7armtyLZQkXIdIbddsPp3ugOvgupU6wGBLVlHqBl4bx0E67UhAPDBeWoxGEKFc_H4Vzv97085DpQv_c2jnzzwCiQpQnQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Exploring and Exploiting Data Heterogeneity in Recommendation</title><source>arXiv.org</source><creator>Wang, Zimu ; Liu, Jiashuo ; Zou, Hao ; Zhang, Xingxuan ; He, Yue ; Liang, Dongxu ; Cui, Peng</creator><creatorcontrib>Wang, Zimu ; Liu, Jiashuo ; Zou, Hao ; Zhang, Xingxuan ; He, Yue ; Liang, Dongxu ; Cui, Peng</creatorcontrib><description>Massive amounts of data are the foundation of data-driven recommendation models. As an inherent nature of big data, data heterogeneity widely exists in real-world recommendation systems. It reflects the differences in the properties among sub-populations. Ignoring the heterogeneity in recommendation data could limit the performance of recommendation models, hurt the sub-populational robustness, and make the models misled by biases. However, data heterogeneity has not attracted substantial attention in the recommendation community. Therefore, it inspires us to adequately explore and exploit heterogeneity for solving the above problems and assisting data analysis. 
In this work, we focus on exploring two representative categories of heterogeneity in recommendation data that is the heterogeneity of prediction mechanism and covariate distribution and propose an algorithm that explores the heterogeneity through a bilevel clustering method. Furthermore, the uncovered heterogeneity is exploited for two purposes in recommendation scenarios which are prediction with multiple sub-models and supporting debias. Extensive experiments on real-world data validate the existence of heterogeneity in recommendation data and the effectiveness of exploring and exploiting data heterogeneity in recommendation.</description><identifier>DOI: 10.48550/arxiv.2305.15431</identifier><language>eng</language><subject>Computer Science - Information Retrieval ; Computer Science - Learning</subject><creationdate>2023-05</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2305.15431$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2305.15431$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Wang, Zimu</creatorcontrib><creatorcontrib>Liu, Jiashuo</creatorcontrib><creatorcontrib>Zou, Hao</creatorcontrib><creatorcontrib>Zhang, Xingxuan</creatorcontrib><creatorcontrib>He, Yue</creatorcontrib><creatorcontrib>Liang, Dongxu</creatorcontrib><creatorcontrib>Cui, Peng</creatorcontrib><title>Exploring and Exploiting Data Heterogeneity in Recommendation</title><description>Massive amounts of data are the foundation of data-driven recommendation models. 
As an inherent nature of big data, data heterogeneity widely exists in real-world recommendation systems. It reflects the differences in the properties among sub-populations. Ignoring the heterogeneity in recommendation data could limit the performance of recommendation models, hurt the sub-populational robustness, and make the models misled by biases. However, data heterogeneity has not attracted substantial attention in the recommendation community. Therefore, it inspires us to adequately explore and exploit heterogeneity for solving the above problems and assisting data analysis. In this work, we focus on exploring two representative categories of heterogeneity in recommendation data that is the heterogeneity of prediction mechanism and covariate distribution and propose an algorithm that explores the heterogeneity through a bilevel clustering method. Furthermore, the uncovered heterogeneity is exploited for two purposes in recommendation scenarios which are prediction with multiple sub-models and supporting debias. 
Extensive experiments on real-world data validate the existence of heterogeneity in recommendation data and the effectiveness of exploring and exploiting data heterogeneity in recommendation.</description><subject>Computer Science - Information Retrieval</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8FOwzAQRH3pAZV-ACf8AwlZr-04Bw5VKRSpEhLqPVrjbWWpcSpjofbvSwOn0bvMzBPiAZpaO2OaJ8rn-FMrbEwNRiPcief1-XQcc0wHSSnIiWK54QsVkhsunMcDJ47lImOSn_w1DgOnQCWO6V7M9nT85sV_zsXudb1bbartx9v7armtyLZQkXIdIbddsPp3ugOvgupU6wGBLVlHqBl4bx0E67UhAPDBeWoxGEKFc_H4Vzv97085DpQv_c2jnzzwCiQpQnQ</recordid><startdate>20230521</startdate><enddate>20230521</enddate><creator>Wang, Zimu</creator><creator>Liu, Jiashuo</creator><creator>Zou, Hao</creator><creator>Zhang, Xingxuan</creator><creator>He, Yue</creator><creator>Liang, Dongxu</creator><creator>Cui, Peng</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230521</creationdate><title>Exploring and Exploiting Data Heterogeneity in Recommendation</title><author>Wang, Zimu ; Liu, Jiashuo ; Zou, Hao ; Zhang, Xingxuan ; He, Yue ; Liang, Dongxu ; Cui, Peng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a671-a289a3e79d6430591b2d2927b131e6a68a34e1ef681d6b45a111bd8ba73d5a323</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Information Retrieval</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Wang, Zimu</creatorcontrib><creatorcontrib>Liu, Jiashuo</creatorcontrib><creatorcontrib>Zou, Hao</creatorcontrib><creatorcontrib>Zhang, Xingxuan</creatorcontrib><creatorcontrib>He, Yue</creatorcontrib><creatorcontrib>Liang, Dongxu</creatorcontrib><creatorcontrib>Cui, 
Peng</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wang, Zimu</au><au>Liu, Jiashuo</au><au>Zou, Hao</au><au>Zhang, Xingxuan</au><au>He, Yue</au><au>Liang, Dongxu</au><au>Cui, Peng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Exploring and Exploiting Data Heterogeneity in Recommendation</atitle><date>2023-05-21</date><risdate>2023</risdate><abstract>Massive amounts of data are the foundation of data-driven recommendation models. As an inherent nature of big data, data heterogeneity widely exists in real-world recommendation systems. It reflects the differences in the properties among sub-populations. Ignoring the heterogeneity in recommendation data could limit the performance of recommendation models, hurt the sub-populational robustness, and make the models misled by biases. However, data heterogeneity has not attracted substantial attention in the recommendation community. Therefore, it inspires us to adequately explore and exploit heterogeneity for solving the above problems and assisting data analysis. In this work, we focus on exploring two representative categories of heterogeneity in recommendation data that is the heterogeneity of prediction mechanism and covariate distribution and propose an algorithm that explores the heterogeneity through a bilevel clustering method. Furthermore, the uncovered heterogeneity is exploited for two purposes in recommendation scenarios which are prediction with multiple sub-models and supporting debias. Extensive experiments on real-world data validate the existence of heterogeneity in recommendation data and the effectiveness of exploring and exploiting data heterogeneity in recommendation.</abstract><doi>10.48550/arxiv.2305.15431</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2305.15431
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2305_15431
source	arXiv.org
subjects	Computer Science - Information Retrieval; Computer Science - Learning
title	Exploring and Exploiting Data Heterogeneity in Recommendation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T02%3A28%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploring%20and%20Exploiting%20Data%20Heterogeneity%20in%20Recommendation&rft.au=Wang,%20Zimu&rft.date=2023-05-21&rft_id=info:doi/10.48550/arxiv.2305.15431&rft_dat=%3Carxiv_GOX%3E2305_15431%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true