Narrowing the variance of variational cross-encoder for cross-modal hashing
Published in: Multimedia Systems 2023-12, Vol. 29 (6), p. 3421-3430
Main authors: , , ,
Format: Article
Language: eng
Subjects:
Online access: Full text
Abstract: Cross-modal hashing, which embeds data into binary codes, is an efficient tool for retrieving heterogeneous but correlated multimedia data. In real applications, the query set is much larger than the training set and the queries may be dissimilar to the training data, which exposes the limited generalization of deterministic models such as cross-encoders and autoencoders. In this paper, we design a variational cross-encoder (VCE), a generative model, to tackle this problem. At the bottleneck layer, the VCE outputs distributions parameterized by means and variances. Because the VCE can generate diversified data from noise, the proposed model performs better on test data. Ideally, each distribution should describe a category of data, and samples from that distribution should generate data in the same category. Under this expectation, the means and variances can be used as real-valued codes for the input data. In practice, however, the generated data generally do not belong to the same category as the input data. Hence, we add a penalty term on the variance output of the VCE and use the means as real-valued codes for further generating hash codes. Experiments on three widely used datasets validate the effectiveness of our method.
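The abstract describes a variational bottleneck whose variance output is penalized so that the means can serve as real-valued codes, which are then binarized into hash codes. The NumPy sketch below illustrates that idea only under stated assumptions: the linear encoder weights (`W_mu`, `W_logvar`) stand in for the paper's unspecified deep network, and the mean-variance penalty is an assumed form, not the paper's actual loss term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration, not from the paper)
n_samples, in_dim, code_dim = 4, 8, 16

# Hypothetical linear encoder weights standing in for a deep network
W_mu = rng.standard_normal((in_dim, code_dim))
W_logvar = rng.standard_normal((in_dim, code_dim))

def encode(x):
    """Bottleneck outputs: a mean and a log-variance per code dimension."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (standard reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def variance_penalty(logvar):
    """Penalty shrinking the variances toward zero; the paper's exact
    form is not given here, so the mean variance is assumed."""
    return float(np.mean(np.exp(logvar)))

def hash_codes(mu):
    """Use the means as real-valued codes and binarize them to +/-1."""
    return np.where(mu >= 0.0, 1, -1)

x = rng.standard_normal((n_samples, in_dim))
mu, logvar = encode(x)
z = reparameterize(mu, logvar)      # stochastic code used for generation
codes = hash_codes(mu)              # deterministic hash bits from the means
penalty = variance_penalty(logvar)  # term added to the training loss
```

With the variance penalty driving `exp(logvar)` toward zero, the stochastic samples `z` collapse toward the means `mu`, which is why the means alone can be used as the codes fed to the binarization step.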
ISSN: 0942-4962, 1432-1882
DOI: 10.1007/s00530-023-01161-3