Double-scale similarity with rich features for cross-modal retrieval

This paper proposes a method named Double-scale Similarity with Rich Features for Cross-modal Retrieval (DSRF) to handle the retrieval task between images and texts. The difficulties of cross-modal retrieval lie in how to establish a good similarity metric and how to obtain rich, accurate semantic features. Most existing approaches map data from different modalities into a common space using category labels and pair relations, which is insufficient to model the complex semantic relationships of multimodal data. A new similarity measurement method (double-scale similarity) is proposed, in which the similarity of multimodal data depends not only on category labels but also on the objects involved. A retrieval result in the same category that does not contain the identical objects is penalized appropriately, while the distance between the correct result and the query is pulled closer. Moreover, a semantic feature extraction framework is designed to provide rich semantic features for the similarity metric. Multiple attention maps are created to focus on local features from different perspectives and obtain numerous semantic features. Unlike other works that accumulate multiple semantic representations for averaging, we use an LSTM with only a forget gate to eliminate the redundancy of repetitive information. Specifically, a forgetting factor is generated for each semantic feature, and a larger forgetting factor removes more of the useless semantic information. We evaluate DSRF on two public benchmarks, where it achieves competitive performance.

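To make the double-scale idea concrete, here is a minimal PyTorch-style sketch of a similarity that scores on both the category scale and the object scale. The function name, the additive penalty form, and all arguments are illustrative assumptions drawn from the abstract, not the authors' implementation.

# Hedged sketch: double-scale similarity. Same-category pairs that share
# no objects are penalized; the exact formulation below is assumed.
import torch
import torch.nn.functional as F

def double_scale_similarity(img_emb, txt_emb, same_category, object_overlap,
                            penalty=0.5):
    # Base similarity between image and text embeddings.
    sim = F.cosine_similarity(img_emb, txt_emb, dim=-1)
    # Pairs that match only at the coarse (category) scale: same label,
    # but no objects in common.
    category_only = same_category & (object_overlap == 0)
    # Penalize those pairs so results sharing objects rank closer to the query.
    return torch.where(category_only, sim - penalty, sim)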
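The "multiple attention maps" step can be pictured as several attention distributions pooling the same set of local features, each from a different perspective. The class below is a hypothetical reading of that idea; the number of maps and the linear scoring layer are assumptions.

import torch
import torch.nn as nn

class MultiViewAttention(nn.Module):
    # M attention maps over N local features -> M semantic feature vectors.
    def __init__(self, dim, num_maps=4):
        super().__init__()
        self.score = nn.Linear(dim, num_maps)  # one attention logit per map

    def forward(self, local_feats):            # local_feats: (batch, N, dim)
        attn = self.score(local_feats).softmax(dim=1)            # (batch, N, M)
        # Each map pools the local features from a different "perspective".
        return torch.einsum('bnm,bnd->bmd', attn, local_feats)   # (batch, M, dim)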
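Finally, the forget-gate-only aggregation can be sketched as a recurrence in which each incoming semantic feature receives its own forgetting factor, and a larger factor discards more of that feature instead of averaging it in. Only the forget-gate-only idea comes from the abstract; the gate parameterization here is an assumption.

import torch
import torch.nn as nn

class ForgetOnlyAggregator(nn.Module):
    # Fuse K semantic features without averaging: a forgetting factor f
    # is generated per feature, and a large f suppresses redundant input.
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, feats):                  # feats: (K, batch, dim)
        state = feats[0]
        for x in feats[1:]:
            f = torch.sigmoid(self.gate(torch.cat([state, x], dim=-1)))
            state = state + (1.0 - f) * x      # large f -> drop more of x
        return state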
Bibliographic Details
Published in: Multimedia systems, 2022, Vol. 28 (5), p. 1767-1777
Main authors: Zhao, Kaiqiang; Wang, Hufei; Zhao, Dexin
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1777
container_issue 5
container_start_page 1767
container_title Multimedia systems
container_volume 28
creator Zhao, Kaiqiang
Wang, Hufei
Zhao, Dexin
description This paper proposes a method named Double-scale Similarity with Rich Features for Cross-modal Retrieval (DSRF) to handle the retrieval task between images and texts. The difficulties of cross-modal retrieval lie in how to establish a good similarity metric and how to obtain rich, accurate semantic features. Most existing approaches map data from different modalities into a common space using category labels and pair relations, which is insufficient to model the complex semantic relationships of multimodal data. A new similarity measurement method (double-scale similarity) is proposed, in which the similarity of multimodal data depends not only on category labels but also on the objects involved. A retrieval result in the same category that does not contain the identical objects is penalized appropriately, while the distance between the correct result and the query is pulled closer. Moreover, a semantic feature extraction framework is designed to provide rich semantic features for the similarity metric. Multiple attention maps are created to focus on local features from different perspectives and obtain numerous semantic features. Unlike other works that accumulate multiple semantic representations for averaging, we use an LSTM with only a forget gate to eliminate the redundancy of repetitive information. Specifically, a forgetting factor is generated for each semantic feature, and a larger forgetting factor removes more of the useless semantic information. We evaluate DSRF on two public benchmarks, where it achieves competitive performance.
doi_str_mv 10.1007/s00530-022-00933-7
format Article
fulltext fulltext
identifier ISSN: 0942-4962
ispartof Multimedia systems, 2022, Vol.28 (5), p.1767-1777
issn 0942-4962
1432-1882
language eng
recordid cdi_proquest_journals_2717709801
source SpringerNature Journals
subjects Computer Communication Networks
Computer Graphics
Computer Science
Cryptology
Data Storage Representation
Feature extraction
Labels
Measurement methods
Multimedia Information Systems
Object recognition
Operating Systems
Redundancy
Regular Article
Retrieval
Semantics
Similarity
title Double-scale similarity with rich features for cross-modal retrieval
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T11%3A08%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Double-scale%20similarity%20with%20rich%20features%20for%20cross-modal%20retrieval&rft.jtitle=Multimedia%20systems&rft.au=Zhao,%20Kaiqiang&rft.date=2022&rft.volume=28&rft.issue=5&rft.spage=1767&rft.epage=1777&rft.pages=1767-1777&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-022-00933-7&rft_dat=%3Cproquest_cross%3E2717709801%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2717709801&rft_id=info:pmid/&rfr_iscdi=true