Self-supervised learning-based weight adaptive hashing for fast cross-modal retrieval

Bibliographic details
Published in:	Signal, Image and Video Processing, 2021-06, Vol.15 (4), p.673-680
Main authors:	Li, Yifan; Wang, Xuan; Qi, Shuhan; Huang, Chengkai; Jiang, Zoe L.; Liao, Qing; Guan, Jian; Zhang, Jiajia
Format:	Article
Language:	English
Subjects:	Computer Imaging; Computer Science; Datasets; Feature extraction; Image Processing and Computer Vision; Multimedia Information Systems; Original Paper; Pattern Recognition and Graphics; Regularization; Retrieval; Self-supervised learning; Signal, Image and Speech Processing; Supervised learning; Vision; Weight loss
Online access:	Full text

Description
Abstract:	Owing to its low storage cost and fast search speed, hashing is widely used in cross-modal retrieval. However, two crucial bottlenecks remain. First, suitably large datasets for multimodal data are lacking. Second, imbalanced instances degrade the accuracy of the retrieval system. In this paper, we propose an end-to-end self-supervised learning-based weight adaptive hashing method for cross-modal retrieval. To address the dataset restriction, we use self-supervised learning to extract fine-grained features directly from labels and use them to supervise the hash learning of the other modalities. To overcome the problem of imbalanced instances, we design an adaptive weight loss that flexibly adjusts the weight of training samples according to their proportions. In addition, we use a binary approximation regularization term to reduce the regularization error. Experiments on the MIRFLICKR-25K and NUS-WIDE datasets show that our method improves performance by 3% compared to other methods.
ISSN:	1863-1703 1863-1711
DOI:	10.1007/s11760-019-01534-0
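
Illustrative sketch:	The abstract names two loss components, an adaptive weight loss and a binary approximation regularization term, without giving their formulas. The Python sketch below shows one plausible form of each, purely as an illustration. It is not the formulation used in the paper: the inverse-frequency weighting, the squared-distance penalty, and the names `adaptive_weights`, `binary_approx_penalty`, `weighted_loss`, and `lam` are all assumptions introduced here.

```python
import numpy as np


def adaptive_weights(labels):
    """Per-sample weights that grow as a sample's labels get rarer.

    labels: (n, c) binary multi-label matrix.
    Assumption: inverse label frequency, normalised to a mean of 1.
    """
    labels = np.asarray(labels, dtype=float)
    freq = labels.mean(axis=0)               # proportion of samples per label
    inv = 1.0 / np.maximum(freq, 1e-12)      # rarer label -> larger value
    counts = np.maximum(labels.sum(axis=1), 1.0)
    w = (labels @ inv) / counts              # average over a sample's labels
    return w / w.mean()


def binary_approx_penalty(codes):
    """Mean squared gap between real-valued codes and their signs.

    Assumption: a squared-distance penalty pulling codes toward {-1, +1}.
    """
    codes = np.asarray(codes, dtype=float)
    return float(np.mean((codes - np.sign(codes)) ** 2))


def weighted_loss(per_sample_loss, labels, codes, lam=0.1):
    """Weighted mean of per-sample losses plus the binarisation penalty."""
    w = adaptive_weights(labels)
    data_term = float(np.mean(w * np.asarray(per_sample_loss, dtype=float)))
    return data_term + lam * binary_approx_penalty(codes)
```

In this sketch a sample carrying a label held by a quarter of the data receives a larger weight than one carrying a label held by three quarters of it, and codes that are already exactly -1 or +1 incur no penalty.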