Pixel-Wise Recognition for Holistic Surgical Scene Understanding
Main authors: | , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | This paper presents the Holistic and Multi-Granular Surgical Scene
Understanding of Prostatectomies (GraSP) dataset, a curated benchmark that
models surgical scene understanding as a hierarchy of complementary tasks with
varying levels of granularity. Our approach encompasses long-term tasks, such
as surgical phase and step recognition, and short-term tasks, including
surgical instrument segmentation and atomic visual action detection. To
exploit our proposed benchmark, we introduce the Transformers for Actions,
Phases, Steps, and Instrument Segmentation (TAPIS) model, a general
architecture that combines a global video feature extractor with localized
region proposals from an instrument segmentation model to tackle the
multi-granularity of our benchmark. Through extensive experimentation on our
own and alternative benchmarks, we demonstrate TAPIS's versatility and
state-of-the-art performance across different tasks. This work represents a
foundational step forward in Endoscopic Vision, offering a novel framework for
future research towards holistic surgical scene understanding. |
---|---|
DOI: | 10.48550/arxiv.2401.11174 |