PCTDepth: Exploiting Parallel CNNs and Transformer via Dual Attention for Monocular Depth Estimation


Bibliographic details
Published in: Neural Processing Letters 2024-02, Vol. 56 (2), p. 73, Article 73
Main authors: Xia, Chenxing, Duan, Xiuzhen, Gao, Xiuju, Ge, Bin, Li, Kuan-Ching, Fang, Xianjin, Zhang, Yan, Yang, Ke
Format: Article
Language: English
Description
Abstract: Monocular depth estimation (MDE) has made great progress with the development of convolutional neural networks (CNNs). However, these approaches suffer from inherent shortsightedness because they rely on insufficient feature-based reasoning. To this end, we propose an effective parallel CNN and Transformer model for MDE via dual attention (PCTDepth). Specifically, we use a two-stream backbone to extract features, where ResNet and Swin Transformer are used to obtain local detail features and global long-range dependencies, respectively. Furthermore, a hierarchical fusion module (HFM) is designed to actively exchange beneficial information during intermediate fusion so that the two representations complement each other. Finally, a dual attention module is applied to each fused feature in the decoder stage to improve the accuracy of the model by enhancing inter-channel correlations and focusing on relevant spatial locations. Comprehensive experiments on the KITTI dataset demonstrate that the proposed model consistently outperforms other state-of-the-art methods.
ISSN: 1573-773X, 1370-4621
DOI:10.1007/s11063-024-11524-0
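
The abstract describes a dual attention step that reweights a fused feature map along its channels and across its spatial locations. The sketch below illustrates that general idea only; it is not the module from the article. It assumes a parameter-free gate (a sigmoid of an average) and a sequential channel-then-spatial order, neither of which the abstract specifies. All function names are hypothetical, and the feature map is a plain nested list indexed as [channel][row][column].

```python
import math


def sigmoid(x):
    """Logistic function, used here as a simple gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by a gate derived from its global average.

    Assumption: no learned weights; a trained model would normally
    pass the pooled value through small learned layers first.
    """
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each location by a gate derived from the mean across channels."""
    c = len(feat)
    h, w = len(feat[0]), len(feat[0][0])
    gates = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * gates[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def dual_attention(feat):
    """Apply channel attention, then spatial attention (assumed order)."""
    return spatial_attention(channel_attention(feat))
```

Calling `dual_attention` on a small nested list, for example a 2-channel 2x2 map, returns a map of the same shape in which every value has been scaled by two gates between 0 and 1. A trainable version would replace the fixed averages with learned projections.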