Digging into depth-adaptive structure for guided depth super-resolution

Bibliographic Details
Published in: Displays 2024-09, Vol. 84, p. 102752, Article 102752
Main Authors: Hou, Yue; Nie, Lang; Lin, Chunyu; Guo, Baoqing; Zhao, Yao
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Depth maps captured by current depth cameras have a lower resolution than RGB images, making guided depth super-resolution (GDSR) a prominent research topic. Existing methods usually transfer the structure information of RGB images to guide the restoration of depth maps. However, due to the inherent modality gap, these approaches are prone to introducing spurious edges into the result, a problem known as "RGB texture over-transfer". Accurate feature representation and selective utilization of RGB structure are therefore two key challenges for GDSR. In this paper, we dig into depth-adaptive structure to address these issues. We first design a Hybrid Encoder that incorporates the advantages of different network architectures into a unified feature extractor, striking a balance between model efficiency and performance while providing comprehensive high-level semantics. Subsequently, we leverage the high-frequency parts of depth maps to optimize those of RGB images through a cross-modal attention mechanism, effectively filtering out unreasonable components of the redundant textures and yielding depth-adaptive structural features. Finally, we integrate Discrete Cosine Transform (DCT) operations for feature reconstruction, enhancing model interpretability; together with the aforementioned modules, this forms the complete DASNet. Experimental results demonstrate that our DASNet achieves quality improvements in both depth maps and synthesized views.

Highlights:
• We introduce a novel Hybrid Encoder that amalgamates different network architectures for multi-stage feature extraction, enabling a robust and comprehensive feature representation.
• We present a Feature Adjustment and Fusion Module (FAFM) that optimizes the high-frequency signals of the RGB image by bridging them with the high-frequency signals of the depth map via a cross-modal attention mechanism, thus focusing on the depth-related structure in RGB.
• Experimental results show that our DASNet obtains quality improvements on depth maps and synthesized views compared with state-of-the-art methods.
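The core idea behind the FAFM described in the abstract, using the depth map's high-frequency content to decide which RGB high-frequency structure is transferred, can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the high-pass filter (average-pool subtraction), the module name CrossModalHFAttention, the head count, and the residual fusion are all assumptions made for illustration, and the paper's actual FAFM and DCT-based reconstruction are not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

def high_frequency(feat: torch.Tensor) -> torch.Tensor:
    # Crude high-pass filter (an assumption): subtract a blurred, low-frequency copy of the features.
    low = F.avg_pool2d(feat, kernel_size=3, stride=1, padding=1)
    return feat - low

class CrossModalHFAttention(nn.Module):
    # Depth high-frequencies act as queries over RGB high-frequencies (keys/values),
    # so only RGB structure consistent with depth edges is passed on.
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, depth_feat: torch.Tensor, rgb_feat: torch.Tensor) -> torch.Tensor:
        # Both feature maps are assumed to share the shape (B, C, H, W).
        b, c, h, w = depth_feat.shape
        q = high_frequency(depth_feat).flatten(2).transpose(1, 2)   # (B, H*W, C) queries from depth
        kv = high_frequency(rgb_feat).flatten(2).transpose(1, 2)    # (B, H*W, C) keys/values from RGB
        filtered, _ = self.attn(q, kv, kv)                          # depth-guided selection of RGB structure
        filtered = filtered.transpose(1, 2).reshape(b, c, h, w)
        return depth_feat + self.proj(filtered)                     # depth-adaptive structural features

For instance, CrossModalHFAttention(64)(depth_feat, rgb_feat) applied to two (B, 64, H, W) feature maps returns refined depth features of the same shape; the residual connection keeps the original depth content while only the attention-filtered RGB structure is added.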
ISSN: 0141-9382
DOI: 10.1016/j.displa.2024.102752