Text-assisted attention-based cross-modal hashing

Bibliographic details
Published in:	International Journal of Multimedia Information Retrieval 2024-03, Vol.13 (1), p.3, Article 3
Main authors:	Yuan, Xiang; Shan, Shihao; Huo, Yuwen; Jiang, Junkai; Wu, Song
Format:	Article
Language:	English
Subjects:	Codes; Computer Science; Data Mining and Knowledge Discovery; Database Management; Datasets; Deep learning; Image enhancement; Image Processing and Computer Vision; Information retrieval; Information Storage and Retrieval; Information Systems Applications (incl. Internet); Methods; Modal data; Multimedia; Multimedia computer applications; Multimedia Information Systems; Neural networks; Regular Paper; Semantics; Source code
Online access:	Full text

Description
Abstract:	As one of the hottest research topics in multimedia information retrieval, cross-modal hashing has drawn widespread attention in the past decades. A key challenge for this task is how to minimize the semantic gap between heterogeneous data and accurately calculate the similarity of cross-modal data. One paradigm for tackling this problem is to map the features of multi-modal data into a common space. However, such approaches lack inter-modal information interaction and may not achieve satisfactory results. To overcome this problem, we propose a novel text-assisted attention-based cross-modal hashing (TAACH) method. First, TAACH relies on LabelNet supervision to guide the learning of hash functions for each modality. In addition, a novel text-assisted attention mechanism in TAACH densely integrates text features into image features, perceiving their spatial correlation and enhancing the consistency of image and text knowledge. Extensive experiments on four benchmark datasets show the effectiveness of TAACH, which also achieves competitive performance compared with state-of-the-art methods. The source code is available at https://github.com/SWU-CS-MediaLab/TAACH.
ISSN:	2192-6611 2192-662X
DOI:	10.1007/s13735-023-00311-7
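
The abstract names two ideas that a small example may make more concrete: letting text features guide attention over image features, and comparing the resulting binary hash codes across modalities. The sketch below is a deliberately simplified illustration in plain Python and is not the TAACH implementation. Every function name, the single-vector attention scheme, the sign-based binarization, and the toy data are assumptions made for this example; the authors' actual model is in the repository linked above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def text_assisted_attention(regions, text_vec):
    """Hypothetical simplification: a single text vector scores each image
    region (scaled dot product), and the regions are averaged using the
    resulting attention weights."""
    scale = math.sqrt(len(text_vec))
    weights = softmax([dot(r, text_vec) / scale for r in regions])
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


def to_hash_code(features):
    """Binarize real-valued features to a {-1, +1} code by sign
    (an assumed, commonly used binarization step)."""
    return [1 if f >= 0 else -1 for f in features]


def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code, database_codes):
    """Indices of database codes, nearest first (ties keep input order)."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))


# Toy data (made up for illustration): two image regions, one paired text vector.
regions = [[1.0, -1.0, 0.5, -0.5], [-1.0, 1.0, -0.5, 0.5]]
text_vec = [2.0, -2.0, 1.0, -1.0]

# Image query: attend over regions with the text vector, then binarize.
image_code = to_hash_code(text_assisted_attention(regions, text_vec))

# Text database: three already-binarized codes.
text_codes = [[1, -1, 1, -1], [-1, 1, -1, 1], [1, -1, -1, -1]]
ranking = rank_by_hamming(image_code, text_codes)
print(image_code, ranking)
```

In this toy setup the first region aligns with the text vector and dominates the attended feature, so the image code matches the first text code exactly. Ranking by Hamming distance is what makes hashing attractive for retrieval: comparing short binary codes is far cheaper than comparing real-valued feature vectors.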