Deep Ranking Distribution Preserving Hashing for Robust Multi-Label Cross-Modal Retrieval


Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2024, Vol. 26, pp. 7027-7042
Main authors: Song, Ge; Huang, Kai; Su, Hanwen; Song, Fengyi; Yang, Ming
Format: Article
Language: English
Description
Summary: Deep supervised hashing techniques have exhibited remarkable efficiency in cross-modal retrieval tasks because they transform data from different modalities into compact binary codes that preserve semantic similarity structures. Nonetheless, existing methods often rely on pairwise or triplet relationships within known (or in-distribution) semantics during training, and so fail to capture the comprehensive ranking information inherent in web data that encompasses diverse concepts. In addition, these methods are vulnerable to out-of-distribution (OOD) semantic data when applied in realistic scenarios, resulting in suboptimal performance. In this paper, we propose ranking distribution preserving hashing (RDPH) to address these problems. We present a novel ranking loss, a differentiable surrogate that maximizes the NDCG metric for cross-modal retrieval. This loss incorporates two target ranking distributions derived from the ideal NDCG scores of samples and the cosine similarity of features. These distributions encourage RDPH to generate hash codes that approximate the desired inter-modal and intra-modal ranking distributions. To enhance the robustness of the hash codes against OOD data, RDPH leverages the CLIP paradigm to acquire OOD-resilient intermediate representations. In addition, we use the outlier exposure strategy, constructing auxiliary pseudo-OOD data from known data in feature space, to improve the supervised hash codes' ability to discriminate OOD data. Experiments on three datasets demonstrate that the proposed method achieves state-of-the-art performance on regular retrieval tasks and good results on simulated real-world retrieval tasks.
ISSN: 1520-9210, 1941-0077
DOI:10.1109/TMM.2024.3358995
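
Illustrative note: the summary refers to the NDCG metric for multi-label retrieval with hash codes. The sketch below shows how that metric could be computed for a single query under Hamming-distance ranking. It is not the RDPH loss or the authors' evaluation code. The function names, the choice of relevance as the number of shared labels, and the gain 2^rel - 1 are assumptions made only for illustration.

```python
import math


def hamming_distance(a, b):
    """Count the positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def dcg(relevances):
    """Discounted cumulative gain: gain 2^rel - 1, discount log2(rank + 1)."""
    # enumerate() is 0-based, so position i has 1-based rank i + 1
    # and its discount is log2(i + 2).
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg_for_query(query_code, query_labels, db_codes, db_labels, k=None):
    """NDCG of the Hamming ranking of db_codes for one query (hypothetical helper)."""
    # Assumption: graded relevance = number of labels shared with the query.
    rels = [len(set(query_labels) & set(labels)) for labels in db_labels]
    dists = [hamming_distance(query_code, code) for code in db_codes]
    # Rank database items by ascending Hamming distance (sorted() is stable).
    order = sorted(range(len(db_codes)), key=lambda i: dists[i])
    ranked = [rels[i] for i in order]
    ideal = sorted(rels, reverse=True)
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example: an image query against three text items (all values made up).
query_code = [1, 0, 1, 1]
query_labels = {"dog", "beach"}
db_labels = [{"dog", "beach"}, {"dog"}, {"car"}]

good_codes = [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0]]  # closest code is most relevant
bad_codes = [[0, 1, 0, 0], [1, 0, 1, 0], [1, 0, 1, 1]]   # closest code is least relevant

good_score = ndcg_for_query(query_code, query_labels, good_codes, db_labels)
bad_score = ndcg_for_query(query_code, query_labels, bad_codes, db_labels)
```

In this toy setup the first set of codes ranks items in order of shared labels and the second reverses that order, so the first scores higher. Sorting makes the metric non-differentiable with respect to the codes, which is why a training objective needs a differentiable surrogate of the kind the summary mentions.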