Minimal data requirement for realistic endoscopic image generation with Stable Diffusion



Bibliographic details
Published in:	International journal for computer assisted radiology and surgery 2024-03, Vol.19 (3), p.531-539
Authors:	Kaleta, Joanna, Dall’Alba, Diego, Płotka, Szymon, Korzeniowski, Przemysław
Format:	Article
Language:	English
Keywords:	Computer Imaging; Computer Science; Deep learning; Endoscopy; Guidance systems; Health Informatics; Humans; Image processing; Image Processing, Computer-Assisted - methods; Imaging; Medicine; Medicine & Public Health; Original; Original Article; Pattern Recognition and Graphics; Radiology; Surgery; Synthetic data; Vision

Description
Abstract:	Purpose: Computer-assisted surgical systems provide support information to the surgeon, which can improve the execution and overall outcome of the procedure. These systems are based on deep learning models trained on complex data that are challenging to annotate. Generating synthetic data can overcome these limitations, but the domain gap between real and synthetic data must be reduced. Methods: We propose an image-to-image translation method based on a Stable Diffusion model that generates realistic images from synthetic data. Compared with previous work, the proposed method is better suited for clinical application, as it requires much less input data and allows finer control over the generation of details by introducing different variants of supporting control networks. Results: The proposed method is applied to laparoscopic cholecystectomy, using synthetic and real data from public datasets. It achieves a mean Intersection over Union (mIoU) of 69.76%, significantly improving on the baseline (42.21%). Conclusions: The proposed method for translating synthetic images into images with realistic characteristics will enable the training of deep learning methods that generalize well to real-world settings, thereby improving computer-assisted intervention guidance systems.
ISSN:	1861-6429, 1861-6410
DOI:	10.1007/s11548-023-03030-w
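The Results figure above is a mean Intersection over Union score. The record does not say which segmentation model, classes, or averaging protocol produced it, so the following is only a minimal sketch of the generic definition, not the authors' evaluation code. For each class c, IoU is the number of pixels both predicted and labelled as c divided by the number of pixels predicted or labelled as c, and the per-class values are averaged. The function name `mean_iou`, the flattened integer-label input, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Hypothetical helper: mean IoU over flattened per-pixel class labels.

    Assumption: a class absent from both pred and target is skipped,
    so it counts neither as 0 nor as 1 in the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious: List[float] = []
    for c in range(num_classes):
        # Pixel indices predicted as class c and labelled as class c.
        pred_c = {i for i, label in enumerate(pred) if label == c}
        target_c = {i for i, label in enumerate(target) if label == c}
        union = len(pred_c | target_c)
        if union == 0:
            continue  # class c appears in neither map: skip it
        ious.append(len(pred_c & target_c) / union)

    # Average the per-class IoUs; return 0.0 if no class was present at all.
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3)` gives per-class IoUs of 1/3, 2/3 and 1/2, hence a mean of 0.5. Other protocols accumulate intersections and unions over the whole dataset before dividing, or average per image first, and these choices can yield different numbers for the same predictions.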