Absolute Monocular Depth Estimation on Robotic Visual and Kinematics Data via Self-Supervised Learning


Published in: IEEE Transactions on Automation Science and Engineering, 2024-06, p. 1-14
Main authors: Wei, Ruofeng; Li, Bin; Zhong, Fangxun; Mo, Hangjie; Dou, Qi; Liu, Yun-Hui; Sun, Dong
Format: Article
Language: English
Description
Summary: Accurate estimation of absolute depth from a monocular endoscope is a fundamental task for automatic navigation systems in robotic surgery. Previous works rely solely on uni-modal data (i.e., monocular images) and can therefore only estimate depth up to an arbitrary scale relative to the real world. In this paper, we present a novel framework, SADER, which explores vision and robot kinematics to estimate high-quality absolute depth for monocular surgical scenes. To jointly learn the multi-modal data, we introduce a self-distillation-based two-stage training policy in the framework. In the first stage, a boosting depth module based on a vision transformer is proposed to improve the relative depth estimation network, which is trained in a self-supervised manner. Then, we develop an algorithm to automatically compute the scale from robot kinematics. By coupling the scale and relative depth data, pseudo absolute depth labels are generated for all images. In the second stage, we re-train the network with a 3D loss supervised by the pseudo labels. To make our method generalize to different endoscopes, the learning of endoscopic intrinsics is integrated into the network. In addition, we conducted cadaver experiments to collect new surgical depth estimation data on robotic laparoscopy for evaluation. Experimental results on the public SCARED data and the cadaver data demonstrate that SADER outperforms previous state-of-the-art methods, even stereo-based ones, with an accuracy error under 1.90 mm, proving the feasibility of our approach to recover absolute depth from monocular inputs.

Note to Practitioners: This paper aims to solve the problem of absolute monocular depth estimation in automatic surgical navigation by leveraging the multi-modal data from the robot-based endoscopic system. Accurate depth perception with real scales of the monocular scene is essential for the control of surgical robots in automatic navigation. However, current methods can only predict the relative depth of the surgical scene using monocular images. In this article, we propose a self-supervised learning-based method to achieve high-quality absolute depth estimation of monocular endoscopic images. It needs neither manual data annotation nor other imaging modalities. The experiments extensively validate the feasibility and high performance of our framework for absolute depth estimation on monocular endoscopes. This absolute depth perception framework can be potentially encapsulated into the automatic navigation system in th
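
The summary states that a scale is computed from robot kinematics and coupled with relative depth to yield pseudo absolute depth labels, but it does not say how the scale is obtained. The following is a minimal Python sketch of one plausible reading, not the SADER algorithm itself: it assumes the scale can be taken as the median ratio between the metric camera translation reported by kinematics and the unscaled translation estimated from images for the same frame pairs. The function names `estimate_scale` and `pseudo_absolute_depth`, and all numeric values, are hypothetical.

```python
import numpy as np


def estimate_scale(kin_translations, vis_translations, eps=1e-8):
    """Median ratio of metric (kinematic) to unscaled (visual) translation length.

    Assumption for illustration only: both inputs are (N, 3) arrays holding
    the camera translation between the same N frame pairs.
    """
    kin = np.linalg.norm(np.asarray(kin_translations, dtype=float), axis=1)
    vis = np.linalg.norm(np.asarray(vis_translations, dtype=float), axis=1)
    valid = vis > eps  # skip frame pairs with (almost) no visual motion
    if not np.any(valid):
        raise ValueError("no frame pair with usable visual motion")
    return float(np.median(kin[valid] / vis[valid]))


def pseudo_absolute_depth(relative_depth, scale):
    """Couple the scale with a relative depth map to form a pseudo label."""
    return scale * np.asarray(relative_depth, dtype=float)


# Hypothetical values: kinematic translations in mm, visual ones unscaled.
kin = [[10.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 2.5]]
vis = [[0.5, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.125]]
scale = estimate_scale(kin, vis)

relative = np.array([[1.0, 2.0], [0.5, 4.0]])
labels = pseudo_absolute_depth(relative, scale)
```

The median is used here only because it tolerates a few outlier frame pairs; the paper may aggregate differently. In the second training stage described above, maps such as `labels` would serve as the supervision targets.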
ISSN: 1545-5955, 1558-3783
DOI: 10.1109/TASE.2024.3409392