Deep Neighborhood-aware Proxy Hashing with Uniform Distribution Constraint for Cross-modal Retrieval

Detailed description

Bibliographic details
Published in:	ACM Transactions on Multimedia Computing, Communications, and Applications, 2024-03, Vol. 20 (6), p. 1-23, Article 169
Main authors:	Huo, Yadong; Qin, Qibing; Dai, Jiangyan; Zhang, Wenfeng; Huang, Lei; Wang, Chengduan
Format:	Article
Language:	English
Subjects:	Information retrieval; Information systems; Multimedia and multimodal retrieval
Online access:	Full text

Description
Abstract:	Cross-modal retrieval methods based on hashing have gained significant attention in both academic and industrial research. Deep learning techniques have played a crucial role in advancing supervised cross-modal hashing methods, leading to significant practical improvements. Despite these achievements, current deep cross-modal hashing still has some underexplored limitations. Specifically, most existing deep hashing methods use pair-wise or triplet-wise strategies that promote inter-class separation by calculating the relative similarities between samples. This weakens the intra-class compactness of data from different modalities and can produce ambiguous neighborhoods. In this article, the Deep Neighborhood-aware Proxy Hashing (DNPH) framework is proposed to learn a discriminative embedding space in which the original neighborhood relations are preserved. By introducing learnable shared category proxies, the neighborhood-aware proxy loss projects the heterogeneous data into a unified common embedding, in which each sample is pulled closer to its corresponding category proxy and pushed away from the other proxies, yielding small within-class scatter and large between-class scatter. To enhance the quality of the resulting binary codes, a uniform distribution constraint is developed so that each hash bit independently obeys the discrete uniform distribution. In addition, a discrimination loss is designed to preserve the modality-specific semantic information of samples. Extensive experiments on three benchmark datasets show that the proposed DNPH framework achieves comparable or better performance than state-of-the-art cross-modal retrieval methods. The code implementation of the DNPH framework is available at https://github.com/QinLab-WFU/OUR-DNPH.
ISSN:	1551-6857 1551-6865
DOI:	10.1145/3643639
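
Illustrative sketch: the abstract describes two ideas that a small example can make concrete, namely pulling a sample toward its category proxy while pushing it away from the other proxies, and keeping each hash bit uniformly distributed. The Python below is a minimal sketch under stated assumptions and is not the DNPH implementation. The function names, the cosine-softmax form of the proxy loss, the `scale` value, the single-label setting, and the bit-mean penalty are all assumptions chosen for illustration; the paper's actual losses may differ. The penalty only checks that each bit is marginally balanced, not that bits are independent of one another.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def proxy_loss(embedding, label, proxies, scale=8.0):
    """Assumed proxy loss: softmax cross-entropy over scaled cosine
    similarities between one embedding and every category proxy.
    It is small when the embedding is close to proxies[label] and
    far from the remaining proxies."""
    logits = [scale * cosine(embedding, p) for p in proxies]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def bit_balance_penalty(codes):
    """Assumed uniformity penalty for {-1, +1} codes: the mean over bits
    of the squared per-bit average. It is 0 when every bit is +1 for
    exactly half of the samples."""
    n = len(codes)
    k = len(codes[0])
    means = [sum(code[j] for code in codes) / n for j in range(k)]
    return sum(mu * mu for mu in means) / k


# Two hypothetical category proxies in a 2-D embedding space.
proxies = [[1.0, 0.0], [0.0, 1.0]]
near = proxy_loss([0.9, 0.1], 0, proxies)  # sample close to its own proxy
far = proxy_loss([0.1, 0.9], 0, proxies)   # sample close to the wrong proxy

balanced = [[1, -1], [-1, 1]]  # each bit is +1 for half of the samples
skewed = [[1, 1], [1, 1]]      # every bit is always +1
```

In this toy setting `near` is smaller than `far`, and the penalty is zero for `balanced` and positive for `skewed`. In a cross-modal setup the same proxies would be shared by the image and text encoders, so both modalities are drawn toward the same category anchors.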