Deep convolutional neural network for classification of thyroid nodules on ultrasound: Comparison of the diagnostic performance with that of radiologists



Bibliographic details
Published in:	European Journal of Radiology, 2022-07, Vol. 152, Article 110335
Main authors:	Kim, Yeon-Jae; Choi, Yangsean; Hur, Su-Jin; Park, Ki-Sun; Kim, Hyun-Jin; Seo, Minkook; Lee, Min Kyoung; Jung, So-Lyung; Jung, Chan Kwon
Format:	Article
Language:	English
Subjects:	Biopsy, Fine-Needle; Deep Learning; Sensitivity and Specificity; Thyroid Nodule
Online access:	Full text

Description
Summary:
• On ultrasound images, deep learning-trained models demonstrated diagnostic performance comparable to that of radiologists in differentiating malignant from benign thyroid nodules.
• The VGG16 model showed the best diagnostic performance in the internal (AUC, 0.86; sensitivity, 91.8%; specificity, 73.2%) and external (AUC, 0.83; sensitivity, 78.6%; specificity, 76.8%) test sets.
• Deep learning models may help radiologists diagnose thyroid nodules on ultrasound.

This study aimed to train and validate deep learning (DL) models for differentiating malignant from benign thyroid nodules on ultrasound (US) images and to compare their performance with that of radiologists. Images of thyroid nodules in patients who underwent US-guided fine-needle aspiration biopsy at our institution between January 2010 and March 2020 were retrospectively reviewed. Four radiologists independently classified the images. Three image classification DL models (VGG16, VGG19, and ResNet) were trained on the thyroid nodule images. The diagnostic performances of the DL models were calculated for the internal and external datasets and compared with the diagnoses of the four radiologists. Pairwise comparisons of the AUCs between the radiologists and DL models were made using bootstrap-based tests. In total, 15,409 images from 7,321 patients (mean age, 60 ± 13 years; malignant nodules, 20.7%) were randomly grouped into training (n = 12,327) and validation (n = 3,082) sets. Independent internal (n = 432; 197 patients) and external (n = 168; 59 patients) test sets were also acquired. The DL models demonstrated a higher diagnostic performance than the radiologists in the internal test set (AUC, 0.83–0.86 vs. 0.71–0.76, P
ISSN:	0720-048X 1872-7727
DOI:	10.1016/j.ejrad.2022.110335