Automatic summarization of endoscopic skull base surgical videos through object detection and hidden Markov modeling

Bibliographic Details
Published in: Computerized medical imaging and graphics 2023-09, Vol.108, p.102248-102248, Article 102248
Main authors: King, Daniel, Adidharma, Lingga, Peng, Haonan, Moe, Kris, Li, Yangming, Yang, Zixin, Young, Christopher, Ferreria, Manuel, Humphreys, Ian, Abuzeid, Waleed M., Hannaford, Blake, Bly, Randall A.
Format: Article
Language: English
Online access: Full text
Description
Abstract: Endoscopic endonasal surgery is a medical procedure that uses an endoscopic video camera to view and manipulate a surgical site accessed through the nose. Although these surgeries are video recorded, the videos are seldom reviewed or even saved in patient files because of the size and length of the video file. Editing a video to a manageable size may require viewing 3 h or more of surgical video and manually splicing together the desired segments. We propose a novel multi-stage video summarization procedure that uses deep semantic features, tool detections, and video frame temporal correspondences to create a representative summary. Summarization by our method resulted in a 98.2% reduction in overall video length while preserving 84% of key medical scenes. Furthermore, the resulting summaries contained only 1% of scenes with irrelevant detail such as endoscope lens cleaning, blurry frames, or frames external to the patient. This outperformed leading commercial and open source summarization tools not designed for surgery, which preserved only 57% and 46% of key medical scenes in summaries of similar length and included 36% and 59% of scenes containing irrelevant detail. Experts agreed on average (Likert scale = 4) that the overall quality of the video was adequate to share with peers in its current state.
• A method of automatic summarization of endoscopic surgery videos.
• The application of convolutional neural network frame classifiers towards more accurate summarization.
• A method of detecting shot boundaries based on convolutional features.
• The creation of a tool detection dataset in endoscopic surgery videos and an analysis of several computer vision object detectors applied to that dataset.
• The use of tool presence to identify the surgical stage, and the use of this information to improve summarization accuracy.
ISSN: 0895-6111, 1879-0771
DOI: 10.1016/j.compmedimag.2023.102248