Double-scale similarity with rich features for cross-modal retrieval
Saved in:

| Field | Value |
|---|---|
| Published in | Multimedia Systems, 2022, Vol. 28 (5), p. 1767–1777 |
| Main authors | , , |
| Format | Article |
| Language | eng |
| Subjects | |
| Online access | Full text |
Abstract: This paper proposes a method named Double-scale Similarity with Rich Features for Cross-modal Retrieval (DSRF) to handle the retrieval task between images and texts. The difficulties of cross-modal retrieval lie in establishing a good similarity metric and obtaining rich, accurate semantic features. Most existing approaches map data from different modalities into a common space using category labels and pair relations, which is insufficient to model the complex semantic relationships of multimodal data. A new similarity measurement method (double-scale similarity) is proposed, in which the similarity of multimodal data depends not only on category labels but also on the objects involved. Retrieval results in the same category but without identical objects are penalized appropriately, while the distance between the correct result and the query is drawn closer. Moreover, a semantic-feature extraction framework is designed to provide rich semantic features for the similarity metric. Multiple attention maps are created to focus on local features from different perspectives and obtain numerous semantic features. Unlike other works that accumulate multiple semantic representations and average them, we use an LSTM with only a forgetting gate to eliminate the redundancy of repeated information. Specifically, a forgetting factor is generated for each semantic feature, and a larger forgetting-factor coefficient removes more of the useless semantic information. We evaluate DSRF on two public benchmarks, on which it achieves competitive performance.
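The two mechanisms the abstract describes, a double-scale similarity that also considers object overlap and an aggregation of semantic features using only a forgetting gate, can be sketched roughly as follows. This is a minimal illustrative sketch: the function names, the additive margin, and the gate parameterization are assumptions for exposition, not the paper's actual formulation.

```python
import math

def double_scale_similarity(base_sim, same_category, shared_objects, margin=0.2):
    """Adjust a base similarity score on two scales: category and objects.
    A same-category result that shares no objects with the query is
    penalized, while one that does share objects is pulled closer.
    The additive margin and its value are illustrative assumptions."""
    if same_category:
        return base_sim + margin if shared_objects else base_sim - margin
    return base_sim

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forget_gate_aggregate(features, w, b=0.0):
    """Fuse a sequence of semantic feature vectors with a forgetting gate
    only (no input/output gates), instead of averaging them.  Each step
    computes a forgetting factor f in (0, 1) from the current input; a
    larger factor decays more of the running memory, so redundant or
    useless information is progressively removed.  The gate parameters
    w (per-dimension weights) and b (a scalar bias) are hypothetical
    placeholders, not learned values from the paper."""
    memory = [0.0] * len(features[0])
    for x in features:
        f = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        memory = [f * m + (1.0 - f) * xi for m, xi in zip(memory, x)]
    return memory
```

For example, `double_scale_similarity(0.5, same_category=True, shared_objects=False)` lowers the score of a same-category match that lacks the query's objects, and `forget_gate_aggregate` fuses several attention-derived feature vectors into one memory vector without plain averaging.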
ISSN: 0942-4962, 1432-1882
DOI: 10.1007/s00530-022-00933-7