Double-scale similarity with rich features for cross-modal retrieval
This paper proposes a method named Double-scale Similarity with Rich Features for Cross-modal Retrieval (DSRF) to handle the retrieval task between images and texts. The difficulties of cross-modal retrieval lie in establishing a good similarity metric and obtaining rich, accurate semantic features.
Saved in:
Published in: | Multimedia systems 2022, Vol.28 (5), p.1767-1777 |
---|---|
Main authors: | Zhao, Kaiqiang ; Wang, Hufei ; Zhao, Dexin |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1777 |
---|---|
container_issue | 5 |
container_start_page | 1767 |
container_title | Multimedia systems |
container_volume | 28 |
creator | Zhao, Kaiqiang ; Wang, Hufei ; Zhao, Dexin |
description | This paper proposes a method named Double-scale Similarity with Rich Features for Cross-modal Retrieval (DSRF) to handle the retrieval task between images and texts. The difficulties of cross-modal retrieval lie in establishing a good similarity metric and obtaining rich, accurate semantic features. Most existing approaches map data of different modalities into a common space using category labels and pair relations, which is insufficient to model the complex semantic relationships of multimodal data. A new similarity measurement method (double-scale similarity) is proposed, in which the similarity of multimodal data depends not only on category labels but also on the objects involved. Retrieval results in the same category that do not contain identical objects are penalized appropriately, while the distance between the correct result and the query is further reduced. Moreover, a semantic feature extraction framework is designed to provide rich semantic features for the similarity metric. Multiple attention maps are created to focus on local features from different perspectives and obtain numerous semantic features. Unlike other works that accumulate multiple semantic representations and average them, we use an LSTM with only a forgetting gate to eliminate the redundancy of repeated information. Specifically, a forgetting factor is generated for each semantic feature, and a larger forgetting factor removes useless semantic information. We evaluate DSRF on two public benchmarks, where it achieves competitive performance. |
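The two ideas in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the penalty/reward constants, the gate weights `W`, `b`, and the exact form of the gated update `c = f * c + h` are all assumptions made for the sketch; the paper's actual loss and network are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 4  # feature dimension and number of attention maps (illustrative)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# --- Idea 1: double-scale similarity ------------------------------------
# Similarity is adjusted on two scales: category labels and shared objects.
def double_scale_similarity(base_sim, same_category, shares_objects,
                            penalty=0.2, reward=0.2):
    if same_category and shares_objects:
        return base_sim + reward   # pull the correct result closer
    if same_category:
        return base_sim - penalty  # same category, no shared objects: punish
    return base_sim

# --- Idea 2: forgetting-gate-only fusion of semantic features -----------
W = rng.normal(scale=0.1, size=(D, 2 * D))  # hypothetical learned weights
b = np.zeros(D)

def fuse(features):
    """Fold K semantic feature vectors into one, keeping only a forgetting
    gate: a large forgetting factor suppresses the accumulated state, so
    redundant information is discarded before the next feature is added."""
    c = np.zeros(D)
    for h in features:
        f = sigmoid(W @ np.concatenate([c, h]) + b)  # forgetting factor
        c = f * c + h                                # no input/output gates
    return c

fused = fuse([rng.normal(size=D) for _ in range(K)])
print(round(double_scale_similarity(0.7, True, False), 2))  # 0.5
print(fused.shape)                                          # (8,)
```

The key design point the abstract hints at is that the object scale separates "same category" from "same objects": two results with the same label are no longer interchangeable, which is what a label-only metric cannot express.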
doi_str_mv | 10.1007/s00530-022-00933-7 |
format | Article |
publisher | Springer Berlin Heidelberg, Berlin/Heidelberg |
fulltext | fulltext |
identifier | ISSN: 0942-4962 |
ispartof | Multimedia systems, 2022, Vol.28 (5), p.1767-1777 |
issn | 0942-4962 ; 1432-1882 |
language | eng |
recordid | cdi_proquest_journals_2717709801 |
source | SpringerNature Journals |
subjects | Computer Communication Networks ; Computer Graphics ; Computer Science ; Cryptology ; Data Storage Representation ; Feature extraction ; Labels ; Measurement methods ; Multimedia Information Systems ; Object recognition ; Operating Systems ; Redundancy ; Regular Article ; Retrieval ; Semantics ; Similarity |
title | Double-scale similarity with rich features for cross-modal retrieval |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T11%3A08%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Double-scale%20similarity%20with%20rich%20features%20for%20cross-modal%20retrieval&rft.jtitle=Multimedia%20systems&rft.au=Zhao,%20Kaiqiang&rft.date=2022&rft.volume=28&rft.issue=5&rft.spage=1767&rft.epage=1777&rft.pages=1767-1777&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-022-00933-7&rft_dat=%3Cproquest_cross%3E2717709801%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2717709801&rft_id=info:pmid/&rfr_iscdi=true |