Deep Hierarchy-Aware Proxy Hashing With Self-Paced Learning for Cross-Modal Retrieval

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, 2024-11, Vol. 36 (11), pp. 5926-5939
Authors: Huo, Yadong; Qin, Qibing; Zhang, Wenfeng; Huang, Lei; Nie, Jie
Format: Article
Language: English
Online access: Order full text
Description
Abstract: Owing to its low storage cost and high retrieval efficiency, hashing is widely applied in both academia and industry and offers an attractive solution to cross-modal similarity retrieval. However, most existing supervised cross-modal hashing methods treat the fixed-level semantic affinity defined by manual labels as the supervisory signal guiding hash learning; this affinity captures only a small subset of the complex semantic relations between multi-modal samples, which impedes hash-function learning and degrades the resulting hash codes. In this paper, a novel deep cross-modal hashing framework, Deep Hierarchy-aware Proxy Hashing (DHaPH), is proposed that learns shared hierarchical proxies to construct the semantic hierarchy in a data-driven manner, thereby capturing accurate fine-grained semantic relationships and achieving small intra-class scatter and large inter-class scatter. Specifically, by regarding the hierarchical proxies as learnable ancestors, a novel hierarchy-aware proxy loss is designed to model the latent semantic hierarchical structures across modalities without prior hierarchy knowledge, such that similar samples share the same Lowest Common Ancestor (LCA) while dissimilar samples have different LCAs. Meanwhile, to adequately capture the valuable semantic information in hard pairs, a multi-modal self-paced loss is introduced into cross-modal hashing to reweight multi-modal pairs dynamically, enabling the model to gradually focus on hard pairs while simultaneously learning universal patterns from multi-modal pairs. Extensive experiments on three public benchmark databases demonstrate that the proposed DHaPH framework outperforms the compared baselines under different evaluation metrics.
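The abstract's self-paced reweighting idea can be sketched in a few lines. The snippet below is an illustrative, hypothetical NumPy sketch, not the paper's actual loss: it binarizes embeddings into hash codes, scores each sample against a class proxy via cosine distance, and applies a soft self-paced weight governed by an "age" parameter `lam` (growing `lam` over training admits progressively harder pairs). All function names, the soft-weighting form, and the toy data are assumptions for illustration.

```python
import numpy as np

def binarize(z):
    # Relax real-valued embeddings into {-1, +1} hash codes via sign.
    return np.where(z >= 0, 1.0, -1.0)

def proxy_loss(z, proxies, y):
    # Illustrative per-sample loss: 1 - cosine similarity between an
    # embedding and the proxy of its class (not the paper's exact loss).
    z_n = z / np.linalg.norm(z, axis=1, keepdims=True)
    p_n = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    return 1.0 - np.sum(z_n * p_n[y], axis=1)

def self_paced_weights(losses, lam):
    # Soft self-paced weights: easy samples (loss well below lam) get
    # weight near 1; samples harder than lam are down-weighted to 0.
    return np.clip(1.0 - losses / lam, 0.0, 1.0)

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))        # toy image-modality embeddings
proxies = rng.normal(size=(3, 16))  # one learnable proxy per class
y = rng.integers(0, 3, size=8)      # toy class labels

codes = binarize(z)
losses = proxy_loss(z, proxies, y)
w = self_paced_weights(losses, lam=1.0)
weighted = float(np.sum(w * losses) / max(float(np.sum(w)), 1e-8))
```

In a full training loop, `lam` would be scheduled upward each epoch so that the model first fits easy pairs and only later attends to hard ones, mirroring the "gradually focus on hard pairs" behavior the abstract describes.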
ISSN: 1041-4347
eISSN: 1558-2191
DOI: 10.1109/TKDE.2024.3401050