Cross-Modal Generation and Pair Correlation Alignment Hashing
Cross-modal hashing is an effective cross-modal retrieval approach because of its low storage and high efficiency. However, most existing methods mainly utilize pre-trained networks to extract modality-specific features, while ignore the position information and lack information interaction between...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on intelligent transportation systems 2023-03, Vol.24 (3), p.3018-3026 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Cross-modal hashing is an effective cross-modal retrieval approach because of its low storage and high efficiency. However, most existing methods mainly utilize pre-trained networks to extract modality-specific features, while ignore the position information and lack information interaction between different modalities. To address those problems, in this paper, we propose a novel approach, named cross-modal generation and pair correlation alignment hashing (CMGCAH), which introduces transformer to exploit position information and utilizes cross-modal generative adversarial networks (GAN) to boost cross-modal information interaction. Concretely, a cross-modal interaction network based on conditional generative adversarial network and pair correlation alignment networks are proposed to generate cross-modal common representations. On the other hand, a transformer-based feature extraction network (TFEN) is designed to exploit position information, which can be propagated to text modality and enforce the common representation to be semantically consistent. Experiments are performed on widely used datasets with text-image modalities, and results show that the proposed method achieved competitive performance compared with many existing methods. |
---|---|
ISSN: | 1524-9050 1558-0016 |
DOI: | 10.1109/TITS.2022.3221787 |