Large-Margin Multi-Modal Deep Learning for RGB-D Object Recognition

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2015-11, Vol. 17 (11), p. 1887-1898
Main Authors: Wang, Anran, Lu, Jiwen, Cai, Jianfei, Cham, Tat-Jen, Wang, Gang
Format: Article
Language: English
Description
Abstract: Most existing feature learning-based methods for RGB-D object recognition either combine RGB and depth data in an undifferentiated manner from the outset, or learn features from color and depth separately; neither approach adequately exploits the distinct characteristics of the two modalities or the shared relationship between them. In this paper, we propose a general CNN-based multi-modal learning framework for RGB-D object recognition. We first construct deep CNN layers for color and depth separately, and then connect them with a carefully designed multi-modal layer. This layer is designed not only to discover the most discriminative features for each modality, but also to harness the complementary relationship between the two modalities. The results of the multi-modal layer are back-propagated to update the parameters of the CNN layers, and the multi-modal feature learning and back-propagation are performed iteratively until convergence. Experimental results on two widely used RGB-D object datasets show that our method for general multi-modal learning achieves comparable performance to state-of-the-art methods specifically designed for RGB-D data.
ISSN:1520-9210
1941-0077
DOI:10.1109/TMM.2015.2476655
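
The abstract outlines a two-stream design: separate CNN layers per modality feeding a shared multi-modal layer whose gradients are back-propagated into both streams. Below is a minimal illustrative sketch of that two-stream pattern, assuming PyTorch and invented layer sizes; the paper's actual multi-modal layer, its large-margin objective, and its training schedule are not reproduced here, and the simple concatenation-plus-linear fusion merely stands in for them.

import torch
import torch.nn as nn

class TwoStreamRGBD(nn.Module):
    """Illustrative two-stream network: one CNN per modality, fused before the classifier."""

    def __init__(self, num_classes: int = 51):  # hypothetical class count
        super().__init__()

        def stream() -> nn.Sequential:
            # Small per-modality CNN; depths and widths are invented for illustration.
            return nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )

        self.rgb_stream = stream()
        self.depth_stream = stream()  # assumes depth is encoded as a 3-channel image
        # Stand-in for the paper's multi-modal layer: concatenate the two
        # modality features and mix them with a fully connected layer.
        self.fusion = nn.Sequential(nn.Linear(64 + 64, 128), nn.ReLU())
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        fused = self.fusion(
            torch.cat([self.rgb_stream(rgb), self.depth_stream(depth)], dim=1)
        )
        return self.classifier(fused)

# Gradients from the fused head flow back into both streams, mirroring the
# joint update of the per-modality CNN parameters described in the abstract.
model = TwoStreamRGBD()
logits = model(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))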