Multi-label modality enhanced attention based self-supervised deep cross-modal hashing

The recent deep cross-modal hashing (DCMH) has achieved superior performance in effective and efficient cross-modal retrieval and thus has drawn increasing attention. Nevertheless, there are still two limitations for most existing DCMH methods: (1) single labels are usually leveraged to measure the...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Knowledge-based systems 2022-03, Vol.239, p.107927, Article 107927
Hauptverfasser:	Zou, Xitao, Wu, Song, Zhang, Nian, Bakker, Erwin M.
Format:	Artikel
Sprache:	eng
Schlagworte:	Attention mechanism Datasets Deep cross-modal hashing Labels Learning Modal data Multi-label semantic learning Optimization Representations Retrieval Semantics Similarity Source code
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	The recent deep cross-modal hashing (DCMH) has achieved superior performance in effective and efficient cross-modal retrieval and thus has drawn increasing attention. Nevertheless, there are still two limitations for most existing DCMH methods: (1) single labels are usually leveraged to measure the semantic similarity of cross-modal pairwise instances while neglecting that many cross-modal datasets contain abundant semantic information among multi-labels. (2) several DCMH methods utilized the multi-labels to supervise the learning of hash functions. Nevertheless, the feature space of multi-labels suffers the weakness of sparse, resulting in sub-optimization for the hash functions learning. Thus, this paper proposed a multi-label modality enhanced attention-based self-supervised deep cross-modal hashing (MMACH) framework. Specifically, a multi-label modality enhanced attention module is designed to integrate the significant features from cross-modal data into multi-labels feature representations, aiming to improve its completion. Moreover, a multi-label cross-modal triplet loss is defined based on the criterion that the feature representations of cross-modal pairwise instances with more common categories should preserve higher semantic similarity than other instances. To the best of our knowledge, the multi-label cross-modal triplet loss is the first time designed for cross-modal retrieval. Extensive experiments on four multi-label cross-modal datasets demonstrate the effectiveness and efficiency of our proposed MMACH. Moreover, the MMACH also achieved superior performance and outperformed several state-of-the-art methods on the task of cross-modal retrieval. The source code of MMACH is available at https://github.com/SWU-CS-MediaLab/MMACH.
ISSN:	0950-7051 1872-7409
DOI:	10.1016/j.knosys.2021.107927