Application of deep learning for semantic segmentation in robotic prostatectomy: Comparison of convolutional neural networks and visual transformers
Published in: Investigative and Clinical Urology 2024, 65(6), pp. 551-558
Main authors:
Format: Article
Language: English
Online access: Full text
Abstract: Semantic segmentation is a fundamental part of the surgical application of deep learning. Traditionally, segmentation in vision tasks has been performed using convolutional neural networks (CNNs), but the transformer architecture has recently been introduced and widely investigated. We aimed to investigate the segmentation performance of deep learning models in robot-assisted radical prostatectomy (RARP) and to identify which architecture is superior for segmentation in robotic surgery.
Intraoperative images were obtained during RARP. The dataset was randomly split into training and validation data. Segmentation of the surgical instruments, bladder, prostate, vas, and seminal vesicle was performed using three CNN models (DeepLabV3, MANet, and U-Net++) and three transformer models (SegFormer, BEiT, and DPT), and their performances were analyzed.
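The abstract does not state how the models were implemented. As one illustration, the sketch below shows how the six compared architectures could be instantiated and how the dataset could be randomly split; the libraries (segmentation_models_pytorch and Hugging Face transformers), the ResNet-50 encoder, the pretrained checkpoints, the split ratio, and the six-class label encoding are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch, NOT the authors' implementation: libraries, encoders,
# checkpoints, and class encoding below are assumptions for illustration.
import torch
from torch.utils.data import random_split
import segmentation_models_pytorch as smp
from transformers import (
    SegformerForSemanticSegmentation,
    BeitForSemanticSegmentation,
    DPTForSemanticSegmentation,
)

# Assumed encoding: background + instruments, bladder, prostate, vas, seminal vesicle
NUM_CLASSES = 6

# CNN-based models (encoder backbone is an assumption, not stated in the paper)
cnn_models = {
    "DeepLabV3": smp.DeepLabV3(encoder_name="resnet50", encoder_weights="imagenet",
                               in_channels=3, classes=NUM_CLASSES),
    "MANet": smp.MAnet(encoder_name="resnet50", encoder_weights="imagenet",
                       in_channels=3, classes=NUM_CLASSES),
    "U-Net++": smp.UnetPlusPlus(encoder_name="resnet50", encoder_weights="imagenet",
                                in_channels=3, classes=NUM_CLASSES),
}

# Transformer-based models (checkpoint names are assumptions for illustration)
transformer_models = {
    "SegFormer": SegformerForSemanticSegmentation.from_pretrained(
        "nvidia/segformer-b2-finetuned-ade-512-512",
        num_labels=NUM_CLASSES, ignore_mismatched_sizes=True),
    "BEiT": BeitForSemanticSegmentation.from_pretrained(
        "microsoft/beit-base-finetuned-ade-640-640",
        num_labels=NUM_CLASSES, ignore_mismatched_sizes=True),
    "DPT": DPTForSemanticSegmentation.from_pretrained(
        "Intel/dpt-large-ade",
        num_labels=NUM_CLASSES, ignore_mismatched_sizes=True),
}

def split_dataset(rarp_dataset, train_fraction=0.8, seed=42):
    """Randomly split the intraoperative image dataset into training and
    validation subsets; `rarp_dataset` and the 80/20 ratio are placeholders."""
    n_train = int(len(rarp_dataset) * train_fraction)
    n_val = len(rarp_dataset) - n_train
    generator = torch.Generator().manual_seed(seed)
    return random_split(rarp_dataset, [n_train, n_val], generator=generator)
```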
The overall segmentation performance during RARP varied across different model architectures. For the CNN models, DeepLabV3 achieved a mean Dice score of 0.938, MANet scored 0.944, and U-Net++ reached 0.930. For the transformer architectures, SegFormer attained a mean Dice score of 0.919, BEiT scored 0.916, and DPT achieved 0.940. The performance of CNN models was superior to that of transformer models in segmenting the prostate, vas, and seminal vesicle.
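For reference, the mean Dice scores reported above follow the standard Dice coefficient, Dice = 2|P ∩ G| / (|P| + |G|) for a predicted mask P and ground-truth mask G. The sketch below is a minimal illustration of that metric; the per-class and per-image averaging scheme and the class ids are assumptions, not details taken from the paper.

```python
# Minimal sketch of the Dice score; averaging scheme and class ids are assumed.
import numpy as np

def dice_score(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient for a pair of binary masks of the same shape."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

def mean_dice(pred_labels: np.ndarray, gt_labels: np.ndarray, class_ids) -> float:
    """Mean Dice over the listed class ids for one predicted/ground-truth label map."""
    scores = [dice_score(pred_labels == c, gt_labels == c) for c in class_ids]
    return float(np.mean(scores))

# Example usage, assuming classes 1-5 encode instruments, bladder, prostate,
# vas, and seminal vesicle:
# mean_dice(prediction, annotation, class_ids=range(1, 6))
```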
Deep learning models provided accurate segmentation of the surgical instruments and anatomical structures observed during RARP. Both CNN and transformer models showed reliable predictions in the segmentation task; however, CNN models may be more suitable than transformer models for organ segmentation and may be more applicable in unusual cases. Further research with large datasets is needed.
ISSN: 2466-0493, 2466-054X
DOI: 10.4111/icu.20240159