Semi-supervised cross-modal hashing with joint hyperboloid mapping

Bibliographic details
Published in: Knowledge-based systems, 2024-11, Vol. 304, p. 112547, Article 112547
Main authors: Fu, Hao; Gu, Guanghua; Dou, Yiyang; Li, Zhuoyi; Zhao, Yao
Format: Article
Language: English
Description
Abstract: By using a small amount of label information to achieve favorable performance, semi-supervised methods are more practical in real-world application scenarios. However, existing semi-supervised cross-modal retrieval methods mainly focus on preserving similarities and learning more consistent hash codes yet overlook the importance of constructing a joint abstract space shared by multi-modal embeddings. In this paper, we propose a novel Semi-supervised Cross-modal Hashing with Joint Hyperboloid Mapping (SCH-JHM). Firstly, we present a diffusion-based teacher model in SCH-JHM to learn the generalized semantic knowledge and output the pseudo-labels for unlabeled data. Secondly, SCH-JHM establishes a five-tuple plane, resembling an hourglass, for each retrieval task based on the queries, positive pairs, negative pairs, semi-supervised positive pairs, and semi-supervised negative pairs included in the semi-supervised cross-modal retrieval task. Furthermore, it projects the 12 tasks from the image, text, video, and audio modalities into a joint hyperboloid space. Finally, the student model in SCH-JHM is employed to explore the latent semantic relevance between filtered heterogeneous entities, which can be considered as a supervised process. Comprehensive experiments compared with state-of-the-art methods on three widely used datasets verify the effectiveness of our proposed approach.
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2024.112547