Combination subspace graph learning for cross-modal retrieval
Published in: Alexandria Engineering Journal, 2020-06, Vol. 59 (3), pp. 1333–1343
Main authors:
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: In this paper, a novel supervised cross-modal retrieval method, combination subspace graph learning (CSGL), is proposed, which primarily concentrates on cross retrieval between images and texts, i.e., images retrieving texts (I2T) and texts retrieving images (T2I). To project multimodal data from a low-level feature space into a latent common subspace, most classical methods learn the projective matrix separately for each mode, ignoring the consistency between different modes. Graph regularization is added to our objective function to preserve the structure of the original data in the projected space. Furthermore, to avoid suboptimal solutions during optimization, we use a collaborative learning strategy to obtain the projective matrix directly, which unites all the modes for a better projection. In general, CSGL takes advantage of the semantic information and the original distribution of the images and texts to obtain a more discriminative projection, which is learned jointly rather than individually. Experimental results on three benchmark datasets, Wikipedia, Pascal Sentence, and INRIA-Websearch, show that the proposed method outperforms state-of-the-art methods.
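To make the shape of such a formulation concrete, the following is a minimal illustrative sketch of a graph-regularized common-subspace objective of the kind the abstract describes; the symbols X_v, P_v, Y, L_v and the weights λ, β are assumptions for illustration, not the paper's exact formulation:

```latex
% Illustrative (assumed) graph-regularized joint objective for two modalities:
%   X_v : feature matrix of modality v (v = 1 images, v = 2 texts)
%   P_v : projection matrix of modality v into the common subspace
%   Y   : shared semantic label matrix
%   L_v : graph Laplacian built from modality v's features
\[
\min_{P_1, P_2}\;
  \sum_{v=1}^{2} \bigl\lVert P_v^{\top} X_v - Y \bigr\rVert_F^{2}
  \;+\; \lambda \sum_{v=1}^{2} \operatorname{tr}\!\bigl(P_v^{\top} X_v L_v X_v^{\top} P_v\bigr)
  \;+\; \beta \sum_{v=1}^{2} \bigl\lVert P_v \bigr\rVert_F^{2}
\]
```

In a sketch of this form, the first term couples both modalities by aligning their projections with the shared semantic labels Y, the trace term penalizes projections that break the neighborhood structure encoded by each modality's graph Laplacian, and the Frobenius-norm term controls the scale of the projection matrices.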
ISSN: 1110-0168, 2090-2670
DOI: 10.1016/j.aej.2020.02.034