A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video
Main authors: | , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Generating accurate 3D reconstructions from endoscopic video is a promising
avenue for longitudinal radiation-free analysis of sinus anatomy and surgical
outcomes. Several methods for monocular reconstruction have been proposed,
yielding visually pleasing 3D anatomical structures by retrieving relative
camera poses with structure-from-motion-type algorithms and fusing monocular
depth estimates. However, due to the complex properties of the underlying
algorithms and endoscopic scenes, the reconstruction pipeline may perform
poorly or fail unexpectedly. Further, acquiring medical data poses additional
challenges, presenting difficulties in quantitatively benchmarking these
models, understanding failure cases, and identifying critical components that
contribute to their precision. In this work, we perform a quantitative analysis
of a self-supervised approach for sinus reconstruction using endoscopic
sequences paired with optical tracking and high-resolution computed tomography
acquired from nine ex-vivo specimens. Our results show that the generated
reconstructions are in high agreement with the anatomy, yielding an average
point-to-mesh error of 0.91 mm between reconstructions and CT segmentations.
However, in a point-to-point matching scenario, relevant for endoscope tracking
and navigation, we found average target registration errors of 6.58 mm. We
identified that pose and depth estimation inaccuracies contribute equally to
this error and that locally consistent sequences with shorter trajectories
generate more accurate reconstructions. These results suggest that achieving
global consistency between relative camera poses and estimated depths with the
anatomy is essential. In doing so, we can ensure proper synergy between all
components of the pipeline for improved reconstructions that will facilitate
clinical application of this innovative technology. |
---|---|
DOI: | 10.48550/arxiv.2310.14364 |
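
The abstract reports a low point-to-mesh error alongside a much higher target registration error. The following is a minimal sketch, not the authors' evaluation code, of why the two metrics can diverge. It uses a toy flat surface, hypothetical landmark coordinates, and a nearest-sample distance as a rough stand-in for a true point-to-mesh distance; the function names are illustrative assumptions.

```python
import math


def target_registration_error(pred, target):
    """Mean Euclidean distance between corresponding points (paired by index)."""
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


def nearest_point_error(pred, surface_samples):
    """Mean distance from each point to its nearest surface sample.

    Assumption: with a densely sampled surface this approximates a
    point-to-mesh distance; it ignores which surface point is the true match.
    """
    return sum(
        min(math.dist(p, s) for s in surface_samples) for p in pred
    ) / len(pred)


# Toy "anatomy": a flat patch on the plane z = 0, sampled on a 1 mm grid.
surface = [(float(x), float(y), 0.0) for x in range(41) for y in range(41)]

# Hypothetical ground-truth landmarks lying on the surface (in mm).
landmarks = [(10.0, 10.0, 0.0), (20.0, 15.0, 0.0), (12.0, 25.0, 0.0)]

# A reconstruction that has slid 5 mm along the surface in x:
# every point still lies on the surface, but not at its true location.
reconstructed = [(x + 5.0, y, z) for (x, y, z) in landmarks]

tre = target_registration_error(reconstructed, landmarks)
surface_error = nearest_point_error(reconstructed, surface)

print(f"point-to-point (TRE) error: {tre:.2f} mm")
print(f"nearest-surface error:      {surface_error:.2f} mm")
```

In this toy case the nearest-surface error is zero while the point-to-point error is 5 mm, because a shift along the surface keeps every point on the anatomy but moves it away from its true correspondence. This is only an illustration of the distinction between the two metrics, under the stated assumptions.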