Artificial intelligence-based detection of paediatric appendicular skeletal fractures: performance and limitations for common fracture types and locations

Bibliographic details
Published in: Pediatric Radiology, January 2024, Vol. 54 (1), pp. 136–145
Main authors: Altmann-Schneider, Irmhild; Kellenberger, Christian J.; Pistorius, Sarah-Maria; Saladin, Camilla; Schäfer, Debora; Arslan, Nidanur; Fischer, Hanna L.; Seiler, Michelle
Format: Article
Language: English
Description
Abstract:
Background: Research into artificial intelligence (AI)-based fracture detection in children is scarce and has disregarded the detection of indirect fracture signs and dislocations.
Objective: To assess the diagnostic accuracy of an existing AI tool for the detection of fractures, indirect fracture signs, and dislocations.
Materials and methods: The diagnostic accuracy of an AI software tool, BoneView (Gleamer, Paris, France), for fracture detection was assessed using paediatric radiology consensus diagnoses as the reference standard. Radiographs from a single emergency department were enrolled retrospectively, going back from December 2021, up to a maximum of 1,000 radiographs per body part. Enrolment criteria were: suspected fracture of the forearm, lower leg, or elbow; age 0–18 years; and radiographs in at least two projections.
Results: Lower leg radiographs showed 607 fractures. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were high (87.5%, 87.5%, 98.3%, and 98.3%, respectively). The detection rate was low for toddler’s fractures, trampoline fractures, and proximal tibial Salter-Harris II fractures. Forearm radiographs showed 1,137 fractures. Sensitivity, specificity, PPV, and NPV were high (92.9%, 98.1%, 98.4%, and 91.7%, respectively). Radial and ulnar bowing fractures were not reliably detected (one of 11 radial bowing fractures and none of seven ulnar bowing fractures were correctly detected). The detection rate was low for styloid process avulsions, proximal radial buckle fractures, and complete olecranon fractures. Elbow radiographs showed 517 fractures. Sensitivity and NPV were moderate (80.5% and 84.7%, respectively), whereas specificity and PPV were high (94.9% and 93.3%, respectively). For joint effusion, sensitivity, specificity, PPV, and NPV were moderate (85.1%, 85.7%, 89.5%, and 80%, respectively). For elbow dislocations, sensitivity and PPV were low (65.8% and 50%, respectively), whereas specificity and NPV were high (97.7% and 98.8%, respectively).
Conclusions: The diagnostic performance of BoneView is promising for forearm and lower leg fractures. However, improvement is mandatory before clinicians can rely solely on AI-based paediatric fracture detection using this software.
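The four accuracy measures reported in the abstract all derive from a 2×2 table of AI output against the consensus reference diagnosis. The sketch below shows one way to compute them. It is an illustration only, not the study's analysis code. The names `ConfusionCounts`, `diagnostic_metrics`, and `detection_rate` are hypothetical, and the example counts are invented because the abstract does not report the underlying tables. Only the bowing-fracture figures (1 of 11 and 0 of 7) come from the abstract.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2x2 counts: AI output versus the consensus reference diagnosis."""
    tp: int  # AI flags a fracture, reference confirms it
    fp: int  # AI flags a fracture, reference finds none
    fn: int  # AI misses a fracture present in the reference
    tn: int  # AI and reference both find no fracture


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV as fractions between 0 and 1."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true fractures that were flagged
        "specificity": _ratio(c.tn, c.tn + c.fp),  # fracture-free cases correctly cleared
        "ppv": _ratio(c.tp, c.tp + c.fp),          # flags that were real fractures
        "npv": _ratio(c.tn, c.tn + c.fn),          # clearances that were truly negative
    }


def detection_rate(detected: int, total: int) -> float:
    """Share of a given fracture type that the tool detected."""
    return _ratio(detected, total)


if __name__ == "__main__":
    # Made-up counts for illustration; they are not the study's data.
    example = ConfusionCounts(tp=90, fp=5, fn=10, tn=95)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.1%}")
    # Bowing-fracture counts as reported in the abstract.
    print(f"radial bowing: {detection_rate(1, 11):.1%}")  # 1 of 11 detected
    print(f"ulnar bowing: {detection_rate(0, 7):.1%}")    # 0 of 7 detected
```

Unlike sensitivity and specificity, PPV and NPV also depend on how common fractures are in the enrolled population. They therefore do not transfer directly to settings with a different fracture prevalence.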
ISSN: 0301-0449, 1432-1998
DOI: 10.1007/s00247-023-05822-3