Multi-dimensional Residual Dense Attention Network for Stereo Matching

Bibliographic Details
Published in: IEEE Access 2019-01, Vol. 7, p. 1-1
Main Authors: Zhang, Guanghui; Zhu, Dongchen; Shi, Wenjun; Ye, Xiaoqing; Li, Jiamao; Zhang, Xiaolin
Format: Article
Language: English
Online Access: Full Text
Description
Abstract: Very deep convolutional neural networks (CNNs) have recently achieved great success in stereo matching. It remains highly desirable to learn robust feature maps that improve matching in ill-posed regions such as weakly textured regions, reflective surfaces, and repetitive patterns. We therefore propose an end-to-end Multi-dimensional Residual Dense Attention Network (MRDA-Net) in this paper, focusing on more comprehensive pixel-wise feature extraction. The proposed network consists of two parts: a 2D residual dense attention net for feature extraction and a 3D convolutional attention net for matching. The 2D residual dense attention net uses a dense network structure to fully exploit the hierarchical features from preceding convolutional layers, and a residual network structure to fuse low-level structural information with high-level semantic information. Its 2D attention module adaptively recalibrates channel-wise features so that informative features are emphasized. The 3D convolutional attention net further extends the attention mechanism to matching: a stacked hourglass module extracts multi-scale context information as well as geometry information, and a novel 3D attention module aggregates hierarchical sub-cost volumes adaptively rather than manually, yielding a comprehensively recalibrated cost volume for more accurate disparity computation. Experiments demonstrate that our approach achieves state-of-the-art accuracy on the Scene Flow, KITTI 2012, and KITTI 2015 stereo datasets.
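The record does not include the paper's code, but the channel-wise recalibration the abstract attributes to the 2D attention module can be illustrated with a common realization of that idea: squeeze-and-excitation style gating, where spatially pooled channel statistics pass through a small bottleneck MLP whose sigmoid outputs rescale each channel. The function and weight shapes below are illustrative assumptions, not MRDA-Net's actual layers; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel recalibration (illustrative).

    feat: (C, H, W) feature map
    w1:   (C // r, C) bottleneck weight (reduction ratio r)
    w2:   (C, C // r) expansion weight
    """
    # Squeeze: global average pooling over the spatial dims -> (C,)
    squeezed = feat.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid -> per-channel gates in (0, 1)
    hidden = np.maximum(w1 @ squeezed, 0.0)
    gates = sigmoid(w2 @ hidden)
    # Recalibrate: scale each channel map by its gate
    return feat * gates[:, None, None]

C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
out = channel_attention(feat, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because the gates lie strictly in (0, 1), the output keeps the shape and sign of the input while attenuating less informative channels; in a trained network, w1 and w2 would be learned so that informative channels receive gates near 1.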
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2911618