Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation
creator | Attimonelli, Matteo; Danese, Danilo; Di Fazio, Angela; Malitesta, Daniele; Pomo, Claudio; Di Noia, Tommaso |
description | In specific domains like fashion, music, and movie recommendation, the multi-faceted features characterizing products and services may influence each customer on online selling platforms differently, paving the way for novel multimodal recommendation models that can learn from such multimodal content. According to the literature, the common multimodal recommendation pipeline involves (i) extracting multimodal features, (ii) refining their high-level representations to suit the recommendation task, (iii) optionally fusing all multimodal features, and (iv) predicting the user-item score. While great effort has been put into designing optimal solutions for (ii)-(iv), to the best of our knowledge, very little attention has been devoted to exploring procedures for (i). In this respect, the existing literature points to the wide availability of multimodal datasets and the ever-growing number of large models for multimodal tasks but, at the same time, an unjustified reliance on a small set of standardized extraction solutions. This motivates us to explore more extensive techniques for stage (i) of the pipeline. To this end, this paper stands as the first attempt to offer a large-scale benchmark for multimodal recommender systems, with a specific focus on multimodal feature extractors. Specifically, we take advantage of two popular and recent frameworks for multimodal feature extraction and reproducibility in recommendation, Ducho and Elliot, to offer a unified, ready-to-use experimental environment able to run extensive benchmarking analyses leveraging novel multimodal feature extractors. Results, validated under different hyper-parameter settings for the chosen extractors, provide important insights into how to train and tune the next generation of multimodal recommendation algorithms. |
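For orientation, the four pipeline stages from the abstract can be summarized in code. The following is a minimal, illustrative sketch only: the class, dimensions, and fusion strategy are hypothetical placeholders, not the paper's actual implementation, and stage (i) is assumed to have been run beforehand with a pretrained extractor (e.g., a CNN for images, a language model for text).

```python
import torch
import torch.nn as nn

class MultimodalRecommender(nn.Module):
    """Toy model covering stages (ii)-(iv); stage-(i) features are precomputed."""

    def __init__(self, n_users, n_items, visual_dim, textual_dim, latent_dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, latent_dim)
        self.item_emb = nn.Embedding(n_items, latent_dim)
        # (ii) refine the extracted high-level representations for the task
        self.visual_proj = nn.Linear(visual_dim, latent_dim)
        self.textual_proj = nn.Linear(textual_dim, latent_dim)

    def forward(self, users, items, visual_feats, textual_feats):
        # (iii) fuse the multimodal features (here: a simple average with the ID embedding)
        v = self.visual_proj(visual_feats[items])
        t = self.textual_proj(textual_feats[items])
        item_repr = (self.item_emb(items) + v + t) / 3
        # (iv) predict the user-item score via an inner product
        return (self.user_emb(users) * item_repr).sum(dim=-1)

# Usage with random stand-ins for the stage-(i) features:
model = MultimodalRecommender(n_users=100, n_items=50, visual_dim=2048, textual_dim=768)
scores = model(torch.tensor([0, 1]), torch.tensor([3, 7]),
               torch.randn(50, 2048), torch.randn(50, 768))
```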
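On the reproducibility side, Elliot drives experiments through YAML configuration files that declare datasets, models, and hyper-parameter grids. Assuming Elliot's documented entry point (the configuration path below is a placeholder), launching a benchmarking run looks roughly like this:

```python
# Minimal sketch: run an Elliot experiment from a YAML configuration.
# The path is a placeholder; the YAML file would list the dataset splits,
# the recommendation models, and the hyper-parameter search space.
from elliot.run import run_experiment

run_experiment("config_files/multimodal_benchmark.yml")
```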
doi | 10.48550/arxiv.2409.15857 |
format | Article |
creationdate | 2024-09-24 |
rights | http://creativecommons.org/licenses/by/4.0 |
identifier | DOI: 10.48550/arxiv.2409.15857 |
language | eng |
source | arXiv.org |
subjects | Computer Science - Information Retrieval |
title | Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation |
url | https://arxiv.org/abs/2409.15857 |