Multi-label modality enhanced attention based self-supervised deep cross-modal hashing

The recent deep cross-modal hashing (DCMH) has achieved superior performance in effective and efficient cross-modal retrieval and thus has drawn increasing attention. Nevertheless, there are still two limitations for most existing DCMH methods: (1) single labels are usually leveraged to measure the...

Detailed description

Bibliographic details
Published in:	Knowledge-based systems 2022-03, Vol.239, p.107927, Article 107927
Main authors:	Zou, Xitao; Wu, Song; Zhang, Nian; Bakker, Erwin M.
Format:	Article
Language:	eng
Subjects:	Attention mechanism; Datasets; Deep cross-modal hashing; Labels; Learning; Modal data; Multi-label semantic learning; Optimization; Representations; Retrieval; Semantics; Similarity; Source code
Online access:	Full text

container_end_page
container_issue
container_start_page	107927
container_title	Knowledge-based systems
container_volume	239
creator	Zou, Xitao; Wu, Song; Zhang, Nian; Bakker, Erwin M.
description	Deep cross-modal hashing (DCMH) has achieved superior performance in effective and efficient cross-modal retrieval and has therefore drawn increasing attention. Nevertheless, most existing DCMH methods have two limitations: (1) single labels are usually used to measure the semantic similarity of cross-modal pairwise instances, neglecting that many cross-modal datasets contain abundant semantic information in their multi-labels; (2) several DCMH methods use multi-labels to supervise the learning of hash functions, but the multi-label feature space is sparse, which leads to suboptimal hash function learning. This paper therefore proposes a multi-label modality enhanced attention-based self-supervised deep cross-modal hashing (MMACH) framework. Specifically, a multi-label modality enhanced attention module is designed to integrate significant features from cross-modal data into the multi-label feature representations, with the aim of making them more complete. Moreover, a multi-label cross-modal triplet loss is defined based on the criterion that the feature representations of cross-modal pairwise instances sharing more categories should preserve higher semantic similarity than those of other instances. To the best of our knowledge, this is the first multi-label cross-modal triplet loss designed for cross-modal retrieval. Extensive experiments on four multi-label cross-modal datasets demonstrate the effectiveness and efficiency of the proposed MMACH, which outperforms several state-of-the-art methods on the task of cross-modal retrieval. The source code of MMACH is available at https://github.com/SWU-CS-MediaLab/MMACH.
doi_str_mv	10.1016/j.knosys.2021.107927
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2638772524</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S095070512101073X</els_id><sourcerecordid>2638772524</sourcerecordid><originalsourceid>FETCH-LOGICAL-c380t-9dcf550840e0c1e669228f8d90a3f6707d8193eb77c16c6ac784b558e3b004603</originalsourceid><addsrcrecordid>eNp9kMtOwzAQRS0EEqXwBywisXYZO4ntbJBQxUsqYgNsLceZUIc0CXZSqX9P0rBmNbqje-dxCLlmsGLAxG21-m7acAgrDpyNLZlxeUIWTElOZQLZKVlAlgKVkLJzchFCBQCcM7Ugn69D3TtamxzraNcWpnb9IcJmaxqLRWT6HpvetU2UmzDqgHVJw9Ch37tJF4hdZH0bAj2Go60JW9d8XZKz0tQBr_7qknw8Pryvn-nm7ellfb-hNlbQ06ywZZqCSgDBMhQi41yVqsjAxKWQIAvFshhzKS0TVhgrVZKnqcI4B0gExEtyM8_tfPszYOh11Q6-GVdqLmIlJU95MrqS2XW81GOpO-92xh80Az0R1JWeCeqJoJ4JjrG7OYbjB3uHXgfrcOLiPNpeF637f8Av1cB75g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2638772524</pqid></control><display><type>article</type><title>Multi-label modality enhanced attention based self-supervised deep cross-modal hashing</title><source>Access via ScienceDirect (Elsevier)</source><creator>Zou, Xitao ; Wu, Song ; Zhang, Nian ; Bakker, Erwin M.</creator><creatorcontrib>Zou, Xitao ; Wu, Song ; Zhang, Nian ; Bakker, Erwin M.</creatorcontrib><description>The recent deep cross-modal hashing (DCMH) has achieved superior performance in effective and efficient cross-modal retrieval and thus has drawn increasing attention. Nevertheless, there are still two limitations for most existing DCMH methods: (1) single labels are usually leveraged to measure the semantic similarity of cross-modal pairwise instances while neglecting that many cross-modal datasets contain abundant semantic information among multi-labels. (2) several DCMH methods utilized the multi-labels to supervise the learning of hash functions. 
Nevertheless, the feature space of multi-labels suffers the weakness of sparse, resulting in sub-optimization for the hash functions learning. Thus, this paper proposed a multi-label modality enhanced attention-based self-supervised deep cross-modal hashing (MMACH) framework. Specifically, a multi-label modality enhanced attention module is designed to integrate the significant features from cross-modal data into multi-labels feature representations, aiming to improve its completion. Moreover, a multi-label cross-modal triplet loss is defined based on the criterion that the feature representations of cross-modal pairwise instances with more common categories should preserve higher semantic similarity than other instances. To the best of our knowledge, the multi-label cross-modal triplet loss is the first time designed for cross-modal retrieval. Extensive experiments on four multi-label cross-modal datasets demonstrate the effectiveness and efficiency of our proposed MMACH. Moreover, the MMACH also achieved superior performance and outperformed several state-of-the-art methods on the task of cross-modal retrieval. The source code of MMACH is available at https://github.com/SWU-CS-MediaLab/MMACH.</description><identifier>ISSN: 0950-7051</identifier><identifier>EISSN: 1872-7409</identifier><identifier>DOI: 10.1016/j.knosys.2021.107927</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Attention mechanism ; Datasets ; Deep cross-modal hashing ; Labels ; Learning ; Modal data ; Multi-label semantic learning ; Optimization ; Representations ; Retrieval ; Semantics ; Similarity ; Source code</subject><ispartof>Knowledge-based systems, 2022-03, Vol.239, p.107927, Article 107927</ispartof><rights>2021 Elsevier B.V.</rights><rights>Copyright Elsevier Science Ltd. 
Mar 5, 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c380t-9dcf550840e0c1e669228f8d90a3f6707d8193eb77c16c6ac784b558e3b004603</citedby><cites>FETCH-LOGICAL-c380t-9dcf550840e0c1e669228f8d90a3f6707d8193eb77c16c6ac784b558e3b004603</cites><orcidid>0000-0003-1916-7719</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.knosys.2021.107927$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Zou, Xitao</creatorcontrib><creatorcontrib>Wu, Song</creatorcontrib><creatorcontrib>Zhang, Nian</creatorcontrib><creatorcontrib>Bakker, Erwin M.</creatorcontrib><title>Multi-label modality enhanced attention based self-supervised deep cross-modal hashing</title><title>Knowledge-based systems</title><description>The recent deep cross-modal hashing (DCMH) has achieved superior performance in effective and efficient cross-modal retrieval and thus has drawn increasing attention. Nevertheless, there are still two limitations for most existing DCMH methods: (1) single labels are usually leveraged to measure the semantic similarity of cross-modal pairwise instances while neglecting that many cross-modal datasets contain abundant semantic information among multi-labels. (2) several DCMH methods utilized the multi-labels to supervise the learning of hash functions. Nevertheless, the feature space of multi-labels suffers the weakness of sparse, resulting in sub-optimization for the hash functions learning. Thus, this paper proposed a multi-label modality enhanced attention-based self-supervised deep cross-modal hashing (MMACH) framework. 
Specifically, a multi-label modality enhanced attention module is designed to integrate the significant features from cross-modal data into multi-labels feature representations, aiming to improve its completion. Moreover, a multi-label cross-modal triplet loss is defined based on the criterion that the feature representations of cross-modal pairwise instances with more common categories should preserve higher semantic similarity than other instances. To the best of our knowledge, the multi-label cross-modal triplet loss is the first time designed for cross-modal retrieval. Extensive experiments on four multi-label cross-modal datasets demonstrate the effectiveness and efficiency of our proposed MMACH. Moreover, the MMACH also achieved superior performance and outperformed several state-of-the-art methods on the task of cross-modal retrieval. The source code of MMACH is available at https://github.com/SWU-CS-MediaLab/MMACH.</description><subject>Attention mechanism</subject><subject>Datasets</subject><subject>Deep cross-modal hashing</subject><subject>Labels</subject><subject>Learning</subject><subject>Modal data</subject><subject>Multi-label semantic learning</subject><subject>Optimization</subject><subject>Representations</subject><subject>Retrieval</subject><subject>Semantics</subject><subject>Similarity</subject><subject>Source 
code</subject><issn>0950-7051</issn><issn>1872-7409</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kMtOwzAQRS0EEqXwBywisXYZO4ntbJBQxUsqYgNsLceZUIc0CXZSqX9P0rBmNbqje-dxCLlmsGLAxG21-m7acAgrDpyNLZlxeUIWTElOZQLZKVlAlgKVkLJzchFCBQCcM7Ugn69D3TtamxzraNcWpnb9IcJmaxqLRWT6HpvetU2UmzDqgHVJw9Ch37tJF4hdZH0bAj2Go60JW9d8XZKz0tQBr_7qknw8Pryvn-nm7ellfb-hNlbQ06ywZZqCSgDBMhQi41yVqsjAxKWQIAvFshhzKS0TVhgrVZKnqcI4B0gExEtyM8_tfPszYOh11Q6-GVdqLmIlJU95MrqS2XW81GOpO-92xh80Az0R1JWeCeqJoJ4JjrG7OYbjB3uHXgfrcOLiPNpeF637f8Av1cB75g</recordid><startdate>20220305</startdate><enddate>20220305</enddate><creator>Zou, Xitao</creator><creator>Wu, Song</creator><creator>Zhang, Nian</creator><creator>Bakker, Erwin M.</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>E3H</scope><scope>F2A</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-1916-7719</orcidid></search><sort><creationdate>20220305</creationdate><title>Multi-label modality enhanced attention based self-supervised deep cross-modal hashing</title><author>Zou, Xitao ; Wu, Song ; Zhang, Nian ; Bakker, Erwin M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c380t-9dcf550840e0c1e669228f8d90a3f6707d8193eb77c16c6ac784b558e3b004603</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Attention mechanism</topic><topic>Datasets</topic><topic>Deep cross-modal hashing</topic><topic>Labels</topic><topic>Learning</topic><topic>Modal data</topic><topic>Multi-label semantic learning</topic><topic>Optimization</topic><topic>Representations</topic><topic>Retrieval</topic><topic>Semantics</topic><topic>Similarity</topic><topic>Source 
code</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zou, Xitao</creatorcontrib><creatorcontrib>Wu, Song</creatorcontrib><creatorcontrib>Zhang, Nian</creatorcontrib><creatorcontrib>Bakker, Erwin M.</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Knowledge-based systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zou, Xitao</au><au>Wu, Song</au><au>Zhang, Nian</au><au>Bakker, Erwin M.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multi-label modality enhanced attention based self-supervised deep cross-modal hashing</atitle><jtitle>Knowledge-based systems</jtitle><date>2022-03-05</date><risdate>2022</risdate><volume>239</volume><spage>107927</spage><pages>107927-</pages><artnum>107927</artnum><issn>0950-7051</issn><eissn>1872-7409</eissn><abstract>The recent deep cross-modal hashing (DCMH) has achieved superior performance in effective and efficient cross-modal retrieval and thus has drawn increasing attention. Nevertheless, there are still two limitations for most existing DCMH methods: (1) single labels are usually leveraged to measure the semantic similarity of cross-modal pairwise instances while neglecting that many cross-modal datasets contain abundant semantic information among multi-labels. 
(2) several DCMH methods utilized the multi-labels to supervise the learning of hash functions. Nevertheless, the feature space of multi-labels suffers the weakness of sparse, resulting in sub-optimization for the hash functions learning. Thus, this paper proposed a multi-label modality enhanced attention-based self-supervised deep cross-modal hashing (MMACH) framework. Specifically, a multi-label modality enhanced attention module is designed to integrate the significant features from cross-modal data into multi-labels feature representations, aiming to improve its completion. Moreover, a multi-label cross-modal triplet loss is defined based on the criterion that the feature representations of cross-modal pairwise instances with more common categories should preserve higher semantic similarity than other instances. To the best of our knowledge, the multi-label cross-modal triplet loss is the first time designed for cross-modal retrieval. Extensive experiments on four multi-label cross-modal datasets demonstrate the effectiveness and efficiency of our proposed MMACH. Moreover, the MMACH also achieved superior performance and outperformed several state-of-the-art methods on the task of cross-modal retrieval. The source code of MMACH is available at https://github.com/SWU-CS-MediaLab/MMACH.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.knosys.2021.107927</doi><orcidid>https://orcid.org/0000-0003-1916-7719</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0950-7051
ispartof	Knowledge-based systems, 2022-03, Vol.239, p.107927, Article 107927
issn	0950-7051 1872-7409
language	eng
recordid	cdi_proquest_journals_2638772524
source	Access via ScienceDirect (Elsevier)
subjects	Attention mechanism; Datasets; Deep cross-modal hashing; Labels; Learning; Modal data; Multi-label semantic learning; Optimization; Representations; Retrieval; Semantics; Similarity; Source code
title	Multi-label modality enhanced attention based self-supervised deep cross-modal hashing
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T22%3A42%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi-label%20modality%20enhanced%20attention%20based%20self-supervised%20deep%20cross-modal%20hashing&rft.jtitle=Knowledge-based%20systems&rft.au=Zou,%20Xitao&rft.date=2022-03-05&rft.volume=239&rft.spage=107927&rft.pages=107927-&rft.artnum=107927&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2021.107927&rft_dat=%3Cproquest_cross%3E2638772524%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2638772524&rft_id=info:pmid/&rft_els_id=S095070512101073X&rfr_iscdi=true
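
The description field of this record states the criterion behind the multi-label cross-modal triplet loss: cross-modal pairs that share more categories should be more similar than pairs that share fewer. The record gives no formula, so what follows is only a minimal illustrative sketch of that ordering criterion, not the MMACH loss itself. The function names, the use of cosine similarity, the constant margin of 0.2, and the averaging over candidate pairs are all assumptions made for illustration.

```python
import math
from itertools import permutations
from typing import Sequence


def shared_label_count(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of categories that are active in both multi-hot label vectors."""
    return sum(1 for x, y in zip(a, b) if x and y)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def multilabel_triplet_loss(
    anchor_feat: Sequence[float],
    anchor_labels: Sequence[int],
    cand_feats: Sequence[Sequence[float]],
    cand_labels: Sequence[Sequence[int]],
    margin: float = 0.2,  # assumed value, not taken from the paper
) -> float:
    """Hinge loss over candidate pairs from the other modality.

    For every ordered pair (i, j) where candidate i shares strictly more
    labels with the anchor than candidate j, penalize the case where
    sim(anchor, i) does not exceed sim(anchor, j) by at least `margin`.
    """
    shared = [shared_label_count(anchor_labels, l) for l in cand_labels]
    sims = [cosine(anchor_feat, f) for f in cand_feats]
    terms = []
    for i, j in permutations(range(len(cand_feats)), 2):
        if shared[i] > shared[j]:
            terms.append(max(0.0, margin + sims[j] - sims[i]))
    # Average over the pairs that actually define an ordering.
    return sum(terms) / len(terms) if terms else 0.0


# Anchor: an image feature; candidates: two text features (toy values).
image_feat, image_labels = [1.0, 0.0], [1, 1, 0]
text_labels = [[1, 1, 0], [1, 0, 0]]  # 2 shared labels vs. 1 shared label

# Ordering respected: the text sharing more labels is also the closer one.
good = multilabel_triplet_loss(image_feat, image_labels,
                               [[1.0, 0.0], [0.0, 1.0]], text_labels)
# Ordering violated: the text sharing fewer labels is the closer one.
bad = multilabel_triplet_loss(image_feat, image_labels,
                              [[0.0, 1.0], [1.0, 0.0]], text_labels)
print(good, bad)
```

In this toy setup the loss is zero when the similarity ranking agrees with the shared-label ranking and positive when it does not. For the actual formulation, consult the article or the linked MMACH repository.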