Self-supervised learning of monocular depth and ego-motion estimation for non-rigid scenes in wireless capsule endoscopy videos


Detailed description

Bibliographic details
Published in: Biomedical signal processing and control, 2024-05, Vol. 91, p. 105978, Article 105978
Authors: Liao, Chao, Wang, Chengliang, Wang, Peng, Wu, Hao, Wang, Hongqian
Format: Article
Language: English
Online access: Full text
Description
Abstract: Gastrointestinal (GI) cancers represent the most widespread type of cancer worldwide. Wireless capsule endoscopy (WCE), an innovative, capsule-sized endoscope, has the potential to revolutionize both the diagnosis and treatment of GI cancers as well as other GI diseases by offering patients a less invasive and more comfortable option. Nonetheless, WCE videos frequently display non-rigid transformations and brightness fluctuations, rendering prior simultaneous localization and mapping (SLAM) approaches unfeasible. Depth information can assist in recognizing and monitoring potential obstructions or anomalies when localization is required. In this paper, we present a self-supervised model, SfMLearner-WCE, specifically designed for estimating depth and ego-motion in WCE videos. Our approach incorporates a pose estimation network and a Transformer network with a global self-attention mechanism. To ensure high-quality depth and pose estimation, we propose learnable binary per-pixel masks to eliminate misaligned image regions arising from non-rigid transformations or significant changes in lighting. Additionally, we introduce multi-interval frame sampling to enhance training data diversity, coupled with long-term pose consistency regularization. We present a comprehensive evaluation of the performance of SfMLearner-WCE in comparison with five state-of-the-art self-supervised SLAM methods. Our proposed approach is rigorously assessed on three WCE datasets. The experimental results demonstrate that our approach achieves high-quality depth estimation and high-precision ego-motion estimation for non-rigid scenes in WCE videos, outperforming other self-supervised SLAM methods. In the quantitative evaluation of depth estimation on the ColonDepth dataset, an absolute relative error of 0.232 was observed.
Additionally, in the quantitative assessment of ego-motion estimation on the ColonSim dataset, a translation drift percentage of 43.176% was achieved at a frame rate of 2 frames per second. The experimental analysis conducted in this study offers evidence of the effectiveness and robustness of our proposed method, SfMLearner-WCE, in non-rigid scenes of WCE videos. SfMLearner-WCE assists in enhancing diagnostic efficiency, enabling physicians to navigate and analyze WCE videos more effectively, benefiting patient outcomes. Our code will be released at https://github.com/fisherliaoc/SfMLearner-WCE.
•Transformer improves pose estimation with self-attention mechanism.
•Multiple fr
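The learnable per-pixel masking described in the abstract can be sketched as follows. This is a hypothetical simplification, not the authors' implementation: a photometric reprojection loss between a target frame and a source frame warped into the target view is weighted by a per-pixel mask, and a regularizer penalizes the mask for drifting toward zero so the network cannot trivially mask out every pixel. The function names and the regularization weight are illustrative assumptions.

```python
import numpy as np

def masked_photometric_loss(target, warped, mask, reg_weight=0.05):
    """Photometric reprojection loss weighted by a per-pixel mask.

    target, warped : (H, W) grayscale frames in [0, 1]
    mask           : (H, W) weights in (0, 1]; learnable in the real model,
                     here simply an input. A cross-entropy regularizer toward
                     the all-ones mask prevents the trivial all-zero solution.
    """
    photo = np.abs(target - warped)            # L1 photometric error
    data_term = np.mean(mask * photo)          # down-weight misaligned pixels
    reg_term = -np.mean(np.log(mask + 1e-8))   # penalize masking everything
    return data_term + reg_weight * reg_term

# Toy example: a simulated lighting change in one quadrant gets masked out.
rng = np.random.default_rng(0)
target = rng.random((8, 8))
warped = target.copy()
warped[:4, :4] += 0.5                          # brightness fluctuation
mask = np.ones((8, 8))
mask[:4, :4] = 0.1                             # suppress the corrupted region
loss_masked = masked_photometric_loss(target, warped, mask)
loss_unmasked = masked_photometric_loss(target, warped, np.ones((8, 8)))
```

With the corrupted quadrant down-weighted, the masked loss is smaller than the unmasked one, which is the incentive that lets the mask learn to exclude regions violating the photometric-consistency assumption.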
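Multi-interval frame sampling can likewise be sketched in a few lines. The specific interval set (1, 2, 4) is an assumption for illustration: pairing frames at several temporal gaps exposes the networks to a wider range of apparent motion magnitudes during training.

```python
def multi_interval_pairs(num_frames, intervals=(1, 2, 4)):
    """Enumerate (target, source) frame-index pairs at several temporal gaps.

    intervals: hypothetical choice of gaps; larger gaps yield larger
    apparent camera motion between the paired frames.
    """
    pairs = []
    for gap in intervals:
        for t in range(num_frames - gap):
            pairs.append((t, t + gap))
    return pairs

# For a 6-frame clip: 5 pairs at gap 1, 4 at gap 2, 2 at gap 4.
pairs = multi_interval_pairs(6)
```

Each pair would then feed the standard self-supervised objective, with the pose network chained across gaps to enforce the long-term pose consistency the abstract mentions.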
ISSN: 1746-8094, 1746-8108
DOI: 10.1016/j.bspc.2024.105978