Deep cross-view autoencoder network for multi-view learning

Bibliographic details
Published in: Multimedia Tools and Applications, 2022-07, Vol. 81 (17), p. 24645-24664
Main authors: Mi, Jian-Xun; Fu, Chang-Qing; Chen, Tao; Gou, Tingting
Format: Article
Language: English
Description
Abstract: In many real-world applications, a growing number of objects are captured from varying viewpoints or by different sensors, which creates an urgent demand for recognizing objects across distinct heterogeneous views. Although significant progress has been made recently, heterogeneous recognition (cross-view recognition) in multi-view learning remains challenging because of the complex correlations among views. Multi-view subspace learning is an effective solution that attempts to obtain a common representation for downstream computations. Most previous methods first extract features and then maximize correlation to relate the different views; this two-step procedure leads to performance deterioration. To overcome this drawback, we propose a deep cross-view autoencoder network (DCVAE) that extracts the features of different views and establishes the correlation between views in one step, jointly handling view-specific information, view correlation, and consistency. Specifically, DCVAE contains a self-reconstruction module, a newly designed cross-view reconstruction module, and a consistency constraint module. Self-reconstruction preserves view-specific information, cross-view reconstruction transfers information from one view to another, and the consistency constraint makes the representations of different views more consistent. The proposed model can discover the complex correlation embedded in multi-view data and integrate heterogeneous views into a latent common representation subspace. Furthermore, 2D embeddings of the learned common representation subspace demonstrate that the consistency constraint is effective, and cross-view classification experiments verify the superior performance of DCVAE in the two-view scenario.
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-022-12636-2
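
Illustrative sketch: the abstract names three modules (self-reconstruction, cross-view reconstruction, and a consistency constraint) but gives no formulas. The Python sketch below shows one way such terms could be combined into a single objective for two views. Everything in it is an assumption made for illustration and is not taken from the paper: the linear encoders and decoders, the squared-error form of each term, the weights alpha and beta, and the function name dcvae_style_loss.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def dcvae_style_loss(x1, x2, enc1, enc2, dec1, dec2, alpha=1.0, beta=1.0):
    """Hypothetical two-view objective with three terms.

    enc1/enc2 and dec1/dec2 are linear maps (assumed; the actual
    network is deep). alpha and beta are assumed weighting factors.
    """
    # Encode each view into the shared latent space.
    z1 = matvec(enc1, x1)
    z2 = matvec(enc2, x2)

    # Self-reconstruction: each view is rebuilt from its own code.
    self_rec = sq_dist(matvec(dec1, z1), x1) + sq_dist(matvec(dec2, z2), x2)

    # Cross-view reconstruction: each view is rebuilt from the other's code.
    cross_rec = sq_dist(matvec(dec2, z1), x2) + sq_dist(matvec(dec1, z2), x1)

    # Consistency: codes of the same object from both views should agree.
    consistency = sq_dist(z1, z2)

    total = self_rec + alpha * cross_rec + beta * consistency
    return {
        "self": self_rec,
        "cross": cross_rec,
        "consistency": consistency,
        "total": total,
    }
```

Minimizing all three terms in one objective corresponds to the one-step idea described in the abstract, in contrast to extracting features first and correlating them afterwards. In practice the linear maps would be replaced by trainable deep networks.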