Dual-supervised attention network for deep cross-modal hashing


Bibliographic Details
Published in: Pattern Recognition Letters, 2019-12, Vol. 128, pp. 333-339
Authors: Peng, Hanyu; He, Junjun; Chen, Shifeng; Wang, Yali; Qiao, Yu
Format: Article
Language: English
Online access: Full text
Description
Abstract:
• We introduce a semantic prediction loss to learn effective cross-modal hash codes.
• We propose a cross-modal attention block to extract more semantically rich cues from cross-modal samples.
• Experiments on three benchmarks show the effectiveness of our proposed method.

Cross-modal hashing has received intensive attention due to its low computation and storage cost in cross-modal retrieval tasks. Most previous cross-modal hashing methods focus mainly on extracting correlated binary codes from pairwise labels, but largely ignore the semantic categories of cross-modal data. Human perception, on the other hand, exploits category information to connect cross-modal samples. Inspired by this fact, we propose to embed category information into hash codes. More specifically, we introduce a semantic prediction loss into our framework to enhance hash codes with category supervision. In addition, there is always a large gap between features from different modalities (e.g. text and images), which can lead cross-modal hashing to link irrelevant features in the retrieval task. To address this issue, this paper proposes the Dual-Supervised Attention Network for Deep Hashing (DSADH), which learns the cross-modal relationship via an elaborately designed attention mechanism. Our cross-modal network applies a cross-modal attention block to efficiently encode rich and relevant features for learning compact hash codes. Extensive experiments on three challenging benchmarks demonstrate that our proposed method significantly improves retrieval results.
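To make the two supervisory ideas in the abstract concrete, the sketch below shows, in PyTorch, one plausible way to combine a cross-modal attention block with a semantic prediction (classification) loss on the hash representation. It is a minimal illustration, not the authors' published DSADH implementation; the layer choices, dimensions, and the names CrossModalAttention and DualSupervisedHashBranch are assumptions made for the example.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    # Re-weights one modality's features using cues from the other modality
    # (a simple channel-wise gating form of cross-modal attention; an assumption,
    # not necessarily the block used in the paper).
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, x, other):
        # x, other: (batch, dim) features of the two modalities (e.g. image, text)
        attn = self.gate(torch.cat([x, other], dim=-1))  # attention weights in (0, 1)
        return x * attn                                  # attended features

class DualSupervisedHashBranch(nn.Module):
    # One branch of a two-branch hashing network: attended features are mapped to a
    # relaxed binary code, and a classifier predicts the category from that code.
    def __init__(self, in_dim, code_len, num_classes):
        super().__init__()
        self.attend = CrossModalAttention(in_dim)
        self.hash_layer = nn.Linear(in_dim, code_len)
        self.classifier = nn.Linear(code_len, num_classes)  # semantic prediction head

    def forward(self, feat, other_feat):
        attended = self.attend(feat, other_feat)
        code = torch.tanh(self.hash_layer(attended))  # continuous relaxation of {-1, +1} bits
        logits = self.classifier(code)                # category prediction used for supervision
        return code, logits

# Toy usage: random 512-d image/text features and 10 hypothetical categories.
img_feat, txt_feat = torch.randn(8, 512), torch.randn(8, 512)
labels = torch.randint(0, 10, (8,))
branch = DualSupervisedHashBranch(in_dim=512, code_len=64, num_classes=10)
code, logits = branch(img_feat, txt_feat)
semantic_loss = nn.CrossEntropyLoss()(logits, labels)  # pushes category information into the code

In such a design, the semantic prediction loss would be combined with a pairwise similarity loss and a binarization penalty during training; the classification head is only needed for supervision and can be discarded at retrieval time, when the sign of the code serves as the binary hash.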
ISSN: 0167-8655
eISSN: 1872-7344
DOI: 10.1016/j.patrec.2019.08.032