Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation

In specific domains like fashion, music, and movie recommendation, the multi-faceted features characterizing products and services may influence each customer on online selling platforms differently, paving the way to novel multimodal recommendation models that can learn from such multimodal content. According to the literature, the common multimodal recommendation pipeline involves (i) extracting multimodal features, (ii) refining their high-level representations to suit the recommendation task, (iii) optionally fusing all multimodal features, and (iv) predicting the user-item score. While great effort has been put into designing optimal solutions for (ii-iv), to the best of our knowledge, very little attention has been devoted to exploring procedures for (i). In this respect, the existing literature outlines the large availability of multimodal datasets and the ever-growing number of large models accounting for multimodal-aware tasks, but (at the same time) an unjustified adoption of limited standardized solutions. This motivates us to explore more extensive techniques for stage (i) of the pipeline. To this end, this paper stands as the first attempt to offer a large-scale benchmark for multimodal recommender systems, with a specific focus on multimodal extractors. Specifically, we take advantage of two popular and recent frameworks for multimodal feature extraction and reproducibility in recommendation, Ducho and Elliot, to offer a unified and ready-to-use experimental environment able to run extensive benchmarking analyses leveraging novel multimodal feature extractors. Results, largely validated under different hyper-parameter settings for the chosen extractors, provide important insights on how to train and tune the next generation of multimodal recommendation algorithms.

Bibliographic Details
Main Authors: Attimonelli, Matteo; Danese, Danilo; Di Fazio, Angela; Malitesta, Daniele; Pomo, Claudio; Di Noia, Tommaso
Format: Article
Language: English
Subjects: Computer Science - Information Retrieval
DOI: 10.48550/arxiv.2409.15857
Date: 2024-09-24
Source: arXiv.org
Rights: CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)
Online Access: Full text at https://arxiv.org/abs/2409.15857 (open access)