Efficient low-rank multi-component fusion with component-specific factors in image-recipe retrieval
Image-Recipe retrieval is the task of retrieving closely related recipes from a collection given a food image and vice versa. The modality gap between images and recipes makes it a challenging task. Recent studies usually focus on learning consistent image and recipe representations to bridge the semantic gap.
Saved in:
Published in: | Multimedia tools and applications, 2024, Vol.83 (2), p.3601-3619 |
Main authors: | Zhao, Wenyu; Zhou, Dong; Cao, Buqing; Zhang, Kai; Chen, Jinjun |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
container_end_page | 3619 |
container_issue | 2 |
container_start_page | 3601 |
container_title | Multimedia tools and applications |
container_volume | 83 |
creator | Zhao, Wenyu; Zhou, Dong; Cao, Buqing; Zhang, Kai; Chen, Jinjun |
description | Image-Recipe retrieval is the task of retrieving closely related recipes from a collection given a food image and vice versa. The modality gap between images and recipes makes it a challenging task. Recent studies usually focus on learning consistent image and recipe representations to bridge the semantic gap. Though existing methods have significantly improved image-recipe retrieval, several challenges remain: 1) Previous studies usually concatenate the textual embeddings of different recipe components directly to generate recipe representations, so only simple rather than complex interactions are considered. 2) They commonly focus on textual feature extraction from recipes; the methods used to extract image features are relatively simple, and most studies rely on the ResNet-50 model. 3) Apart from the retrieval learning loss (for example, triplet loss), several auxiliary loss functions (such as adversarial loss and reconstruction loss) are commonly used to match the recipe and image representations. To address these issues, we introduce a novel Low-rank Multi-component Fusion method with Component-Specific Factors (LMF-CSF) to model the different textual components of a recipe and produce superior textual representations. We also pay attention to image feature extraction: a vision transformer is used to learn better image representations. The enhanced representations from the two modalities are then fed directly into a triplet loss function for image-recipe retrieval learning. Experimental results on the Recipe1M dataset indicate that our LMF-CSF method outperforms the current state-of-the-art image-recipe retrieval baselines. |
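The two ideas summarised in the abstract — low-rank fusion of per-component recipe embeddings with component-specific factors, and a max-margin triplet loss for retrieval — can be sketched as follows. This is a minimal NumPy illustration of the general low-rank multimodal fusion pattern, not the paper's implementation; all dimensions, component names, and hyperparameters are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three recipe components (title, ingredients,
# instructions), each already encoded as a fixed-size vector.
dims = {"title": 16, "ingredients": 32, "instructions": 32}
rank, out_dim = 4, 8  # low-rank r and fused embedding size (illustrative)

# Component-specific factors: each component gets its OWN set of
# rank-r projection matrices, shape (rank, d_component + 1, out_dim).
factors = {name: rng.normal(0.0, 0.1, size=(rank, d + 1, out_dim))
           for name, d in dims.items()}

def low_rank_fuse(components):
    """Fuse component embeddings via a low-rank tensor product.

    Rather than materialising the full outer-product tensor of all
    components, project each 1-appended component through its own
    rank-r factors, multiply elementwise across components, and sum
    over the rank dimension.
    """
    prod = np.ones((rank, out_dim))
    for name, x in components.items():
        x1 = np.append(x, 1.0)  # appended 1 keeps lower-order interaction terms
        prod *= np.einsum("d,rdh->rh", x1, factors[name])
    return prod.sum(axis=0)  # shape (out_dim,)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard max-margin triplet loss on Euclidean distances."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, margin + d_pos - d_neg)

recipe = {name: rng.normal(size=d) for name, d in dims.items()}
fused = low_rank_fuse(recipe)
print(fused.shape)  # (8,)
```

In a full model the fused recipe vector and the image embedding (from a vision transformer, per the abstract) would form the anchor/positive/negative triplets, with the factors learned by backpropagation rather than fixed at random.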
doi_str_mv | 10.1007/s11042-023-15819-7 |
format | Article |
fullrecord | ProQuest/CrossRef record (recordid cdi_proquest_journals_2911130180; pqid 2911130180); publisher: Springer US, New York; rights: The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023; peer reviewed; ORCID: 0000-0002-3310-8347 |
fulltext | fulltext |
identifier | ISSN: 1380-7501 |
ispartof | Multimedia tools and applications, 2024, Vol.83 (2), p.3601-3619 |
issn | 1380-7501; 1573-7721 |
language | eng |
recordid | cdi_proquest_journals_2911130180 |
source | SpringerLink Journals |
subjects | Computer Communication Networks; Computer Science; Data Structures and Information Theory; Feature extraction; Image enhancement; Learning; Multimedia Information Systems; Recipes; Representations; Retrieval; Special Purpose and Application-Based Systems |
title | Efficient low-rank multi-component fusion with component-specific factors in image-recipe retrieval |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T10%3A06%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20low-rank%20multi-component%20fusion%20with%20component-specific%20factors%20in%20image-recipe%20retrieval&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Zhao,%20Wenyu&rft.date=2024&rft.volume=83&rft.issue=2&rft.spage=3601&rft.epage=3619&rft.pages=3601-3619&rft.issn=1380-7501&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-023-15819-7&rft_dat=%3Cproquest_cross%3E2911130180%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2911130180&rft_id=info:pmid/&rfr_iscdi=true |