Attention based multilayer feature fusion convolutional neural network for unsupervised monocular depth estimation

Bibliographic Details
Published in: Neurocomputing (Amsterdam) 2021-01, Vol. 423, p. 343-352
Authors: Lei, Zeyu, Wang, Yan, Li, Zijian, Yang, Junyao
Format: Article
Language: English
Online Access: Full text
Description

Summary: Predicting depth from a single image is an essential problem in scene understanding, and deep learning shows great potential in this area. Unsupervised methods use the image reconstruction loss as the supervisory signal, which gives them broad applicability. However, most methods produce depth estimates that are not accurate enough for demanding autonomous-driving scenarios, which limits their application. These methods are often based on the fully convolutional neural network, the most commonly used architecture for image-to-image tasks. In this paper, aiming to optimize the depth estimation network architecture, we propose two networks that fuse the features of different encoding layers for monocular depth estimation: a multilayer information fusion U-Net (FU-Net) and a more lightweight variant (LFU-Net). To improve the efficiency of feature fusion, we propose a hybrid attention mechanism to optimize our network and name the result AgFU-Net. We compare our networks with other improvements of U-Net, and the results show that ours are more efficient. In addition, the loss function is fine-tuned for the unsupervised depth estimation algorithm. Our improvements achieve results comparable with state-of-the-art unsupervised monocular depth prediction methods on the KITTI benchmark.
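The record does not specify the exact form of AgFU-Net's hybrid attention mechanism. As an illustration only, the following NumPy sketch shows one common way to attention-gate a U-Net skip connection before feature fusion (an additive attention gate that re-weights encoder features using decoder context); all weights, shapes, and function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, w_x, w_g, w_psi):
    """Additive attention gate over a U-Net skip connection (illustrative).

    x : encoder (skip) features, shape (C, H, W)
    g : decoder (gating) features, shape (C, H, W)
    Returns x re-weighted by a per-pixel attention map in (0, 1).
    """
    # 1x1 "convolutions" written as channel-mixing einsums
    q = np.einsum('oc,chw->ohw', w_x, x) + np.einsum('oc,chw->ohw', w_g, g)
    q = np.maximum(q, 0.0)                                # ReLU
    alpha = sigmoid(np.einsum('oc,chw->ohw', w_psi, q))   # (1, H, W) map
    return x * alpha                                      # gated skip features

# Toy tensors standing in for one encoder/decoder level
C, H, W = 4, 8, 8
x = rng.standard_normal((C, H, W))
g = rng.standard_normal((C, H, W))
w_x = rng.standard_normal((C, C)) * 0.1
w_g = rng.standard_normal((C, C)) * 0.1
w_psi = rng.standard_normal((1, C)) * 0.1

y = attention_gate(x, g, w_x, w_g, w_psi)
print(y.shape)  # same shape as x; each position scaled by a weight in (0, 1)
```

In a multilayer fusion network of this kind, the gated features `y` would then be concatenated or summed with the decoder features instead of the raw skip features, so that the decoder attends only to the encoder activations relevant at that scale.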
ISSN: 0925-2312
1872-8286
DOI: 10.1016/j.neucom.2020.11.002