Double-scale similarity with rich features for cross-modal retrieval

This paper proposes a method named Double-scale Similarity with Rich Features for Cross-modal Retrieval (DSRF) to handle the retrieval task between images and texts. The difficulties of cross-modal retrieval lie in how to establish a good similarity metric and how to obtain rich, accurate semantic features. Most existing approaches map data from different modalities into a common space using category labels and pair relations, which is insufficient to model the complex semantic relationships of multimodal data. A new similarity measurement method (double-scale similarity) is proposed, in which the similarity of multimodal data depends not only on category labels but also on the objects involved. A retrieval result in the same category that does not contain the identical objects is penalized appropriately, while the distance between the correct result and the query is pulled closer. Moreover, a semantic feature extraction framework is designed to provide rich semantic features for the similarity metric. Multiple attention maps are created to focus on local features from different perspectives and obtain numerous semantic features. Unlike other works that accumulate multiple semantic representations for averaging, we use an LSTM with only a forget gate to eliminate the redundancy of repetitive information. Specifically, a forgetting factor is generated for each semantic feature, and a larger forgetting factor removes more of the useless semantic information. We evaluate DSRF on two public benchmarks, where it achieves competitive performance.

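To make the double-scale idea concrete, here is a minimal PyTorch-style sketch of a similarity that scores on both the category scale and the object scale. The function name, the additive penalty form, and all arguments are illustrative assumptions drawn from the abstract, not the authors' implementation.

# Hedged sketch: double-scale similarity. Same-category pairs that share
# no objects are penalized; the exact formulation below is assumed.
import torch
import torch.nn.functional as F

def double_scale_similarity(img_emb, txt_emb, same_category, object_overlap,
                            penalty=0.5):
    # Base similarity between image and text embeddings.
    sim = F.cosine_similarity(img_emb, txt_emb, dim=-1)
    # Pairs that match only at the coarse (category) scale: same label,
    # but no objects in common.
    category_only = same_category & (object_overlap == 0)
    # Penalize those pairs so results sharing objects rank closer to the query.
    return torch.where(category_only, sim - penalty, sim)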
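The "multiple attention maps" step can be pictured as several attention distributions pooling the same set of local features, each from a different perspective. The class below is a hypothetical reading of that idea; the number of maps and the linear scoring layer are assumptions.

import torch
import torch.nn as nn

class MultiViewAttention(nn.Module):
    # M attention maps over N local features -> M semantic feature vectors.
    def __init__(self, dim, num_maps=4):
        super().__init__()
        self.score = nn.Linear(dim, num_maps)  # one attention logit per map

    def forward(self, local_feats):            # local_feats: (batch, N, dim)
        attn = self.score(local_feats).softmax(dim=1)            # (batch, N, M)
        # Each map pools the local features from a different "perspective".
        return torch.einsum('bnm,bnd->bmd', attn, local_feats)   # (batch, M, dim)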
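Finally, the forget-gate-only aggregation can be sketched as a recurrence in which each incoming semantic feature receives its own forgetting factor, and a larger factor discards more of that feature instead of averaging it in. Only the forget-gate-only idea comes from the abstract; the gate parameterization here is an assumption.

import torch
import torch.nn as nn

class ForgetOnlyAggregator(nn.Module):
    # Fuse K semantic features without averaging: a forgetting factor f
    # is generated per feature, and a large f suppresses redundant input.
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, feats):                  # feats: (K, batch, dim)
        state = feats[0]
        for x in feats[1:]:
            f = torch.sigmoid(self.gate(torch.cat([state, x], dim=-1)))
            state = state + (1.0 - f) * x      # large f -> drop more of x
        return state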
Bibliographic Details
Published in: Multimedia systems, 2022, Vol. 28 (5), p. 1767-1777
Main authors: Zhao, Kaiqiang; Wang, Hufei; Zhao, Dexin
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1777
container_issue 5
container_start_page 1767
container_title Multimedia systems
container_volume 28
creator Zhao, Kaiqiang
Wang, Hufei
Zhao, Dexin
description This paper proposes a method named Double-scale Similarity with Rich Features for Cross-modal Retrieval (DSRF) to handle the retrieval task between images and texts. The difficulties of cross-modal retrieval lie in how to establish a good similarity metric and how to obtain rich, accurate semantic features. Most existing approaches map data from different modalities into a common space using category labels and pair relations, which is insufficient to model the complex semantic relationships of multimodal data. A new similarity measurement method (double-scale similarity) is proposed, in which the similarity of multimodal data depends not only on category labels but also on the objects involved. A retrieval result in the same category that does not contain the identical objects is penalized appropriately, while the distance between the correct result and the query is pulled closer. Moreover, a semantic feature extraction framework is designed to provide rich semantic features for the similarity metric. Multiple attention maps are created to focus on local features from different perspectives and obtain numerous semantic features. Unlike other works that accumulate multiple semantic representations for averaging, we use an LSTM with only a forget gate to eliminate the redundancy of repetitive information. Specifically, a forgetting factor is generated for each semantic feature, and a larger forgetting factor removes more of the useless semantic information. We evaluate DSRF on two public benchmarks, where it achieves competitive performance.
doi_str_mv 10.1007/s00530-022-00933-7
format Article
fulltext fulltext
identifier ISSN: 0942-4962
ispartof Multimedia systems, 2022, Vol.28 (5), p.1767-1777
issn 0942-4962
1432-1882
language eng
recordid cdi_proquest_journals_2717709801
source SpringerNature Journals
subjects Computer Communication Networks
Computer Graphics
Computer Science
Cryptology
Data Storage Representation
Feature extraction
Labels
Measurement methods
Multimedia Information Systems
Object recognition
Operating Systems
Redundancy
Regular Article
Retrieval
Semantics
Similarity
title Double-scale similarity with rich features for cross-modal retrieval
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T11%3A08%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Double-scale%20similarity%20with%20rich%20features%20for%20cross-modal%20retrieval&rft.jtitle=Multimedia%20systems&rft.au=Zhao,%20Kaiqiang&rft.date=2022&rft.volume=28&rft.issue=5&rft.spage=1767&rft.epage=1777&rft.pages=1767-1777&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-022-00933-7&rft_dat=%3Cproquest_cross%3E2717709801%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2717709801&rft_id=info:pmid/&rfr_iscdi=true