Deep Learning Pitfall: Impact of Novel Ultrasound Equipment Introduction on Algorithm Performance and the Realities of Domain Adaptation


Published in: Journal of Ultrasound in Medicine, 2022-04, Vol. 41 (4), p. 855-863
Main authors: Blaivas, Michael; Blaivas, Laura N; Tsung, James W
Format: Article
Language: English
Description
Summary:
Objectives: To test how introducing novel ultrasound equipment into a clinical setting affects deep learning (DL) algorithm performance.
Methods: Researchers introduced prospectively obtained inferior vena cava (IVC) videos from a similar patient population using novel ultrasound equipment to challenge a previously validated DL algorithm (trained on a common point of care ultrasound [POCUS] machine) to assess IVC collapse. Twenty‐one new videos were obtained for each novel ultrasound machine. The videos were analyzed for complete collapse by the algorithm and by 2 blinded POCUS experts. Cohen's kappa was calculated for agreement between the 2 POCUS experts and the DL algorithm. Previous testing on new patient data using the same ultrasound equipment showed substantial agreement between algorithm and experts, with Cohen's kappa of 0.78 (95% CI 0.49–1.0) and 0.66 (95% CI 0.31–1.0).
Results: Challenged with higher image quality (IQ) POCUS cart ultrasound videos, algorithm performance declined, with kappa values of 0.31 (95% CI 0.19–0.81) and 0.39 (95% CI 0.11–0.89), showing fair agreement. Algorithm performance plummeted on a lower IQ smartphone device, with kappa values of −0.09 (95% CI −0.95 to 0.76) and 0.09 (95% CI −0.65 to 0.82), showing less agreement than would be expected by chance. The 2 POCUS experts had near perfect agreement with each other regarding IVC collapse, with a kappa value of 0.88 (95% CI 0.64–1.0).
Conclusions: Performance of this previously validated DL algorithm worsened when faced with ultrasound studies from 2 novel ultrasound machines. Performance was much worse on images from a lower IQ hand‐held device than from a superior cart‐based device.
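The agreement statistic named in the summary, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. The following is a minimal sketch of that standard formula, not the authors' analysis code. The function name `cohen_kappa` and the example labels are hypothetical and do not reproduce any study data; confidence intervals are not computed here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label proportions,
    # summed over all labels either rater used.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for illustration only (1 = complete IVC collapse, 0 = not).
expert_labels = [1, 1, 0, 0, 1, 0, 0, 1]
algorithm_labels = [1, 0, 0, 0, 1, 0, 1, 1]
print(f"kappa = {cohen_kappa(expert_labels, algorithm_labels):.2f}")
```

A kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement below chance level.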
ISSN: 0278-4297, 1550-9613
DOI: 10.1002/jum.15765