Artificial intelligence in commercial fracture detection products: a systematic review and meta-analysis of diagnostic test accuracy

Bibliographic details
Published in: Scientific Reports 2024-10, Vol. 14 (1), p. 23053-18, Article 23053
Main authors: Husarek, Julius, Hess, Silvan, Razaeian, Sam, Ruder, Thomas D., Sehmisch, Stephan, Müller, Martin, Liodakis, Emmanouil
Format: Article
Language: English
Description
Abstract: Conventional radiography (CR) is the primary imaging modality for fracture diagnosis. Artificial intelligence (AI) for CR is a rapidly growing field that aims to improve efficiency and diagnostic accuracy. However, the diagnostic performance of commercially available AI fracture detection solutions (CAAI-FDS) for CR in different anatomical regions, their synergy with human assessment, and the influence of industry funding on reported accuracy are unknown. Peer-reviewed diagnostic test accuracy (DTA) studies were identified through a systematic review of PubMed and Embase. Diagnostic performance measures were extracted, in particular for subgroups such as product, type of rater (stand-alone AI, human unaided, human aided), funding, and anatomical region. Pooled measures were obtained with a bivariate random effects model. The impact of rater type was evaluated with a comparative meta-analysis. Seventeen DTA studies of seven CAAI-FDS, analyzing 38,978 x-rays with 8,150 fractures, were included. Stand-alone AI studies (n = 15) evaluated five CAAI-FDS: four with good sensitivities (> 90%) and moderate specificities (80–90%), and one with very poor sensitivity ([…] 95%). Pooled sensitivities were good to excellent, and specificities were moderate to good, in all anatomical regions (n = 7) apart from ribs (n = 4; poor sensitivity / moderate specificity) and spine (n = 4; excellent sensitivity / poor specificity). Funded studies (n = 4) had higher sensitivity (+5%) and lower specificity (−4%) than non-funded studies (n = 11). Sensitivity did not differ significantly between stand-alone AI and human AI-aided ratings (p = 0.316), but specificity was significantly higher in the latter group (p […]
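The abstract reports pooling sensitivity and specificity across DTA studies. The review used a bivariate random effects model, which models logit sensitivity and logit specificity jointly together with their between-study correlation. The sketch below is deliberately simpler and is not the authors' method: it swaps in univariate DerSimonian-Laird random-effects pooling, applied separately to logit sensitivity and logit specificity, and therefore ignores that correlation. The 0.5 continuity correction, all function names, and the 2x2 counts in the example are hypothetical and only illustrate the arithmetic; they are not data from the review.

```python
import math
from typing import List, Sequence, Tuple

# Each hypothetical study is a 2x2 table: (true positives, false negatives, true negatives, false positives).
Table = Tuple[int, int, int, int]


def logit_with_variance(events: int, non_events: int) -> Tuple[float, float]:
    """Logit of a proportion and its approximate variance (assumed 0.5 continuity correction)."""
    a = events + 0.5
    b = non_events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def dersimonian_laird(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Univariate DerSimonian-Laird random-effects pooling; returns (pooled estimate, tau^2)."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance tau^2.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, tau2


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_sens_spec(tables: List[Table]) -> Tuple[float, float]:
    """Pool sensitivity and specificity separately (NOT the bivariate model)."""
    sens = [logit_with_variance(tp, fn) for tp, fn, _, _ in tables]
    spec = [logit_with_variance(tn, fp) for _, _, tn, fp in tables]
    pooled_sens, _ = dersimonian_laird([s[0] for s in sens], [s[1] for s in sens])
    pooled_spec, _ = dersimonian_laird([s[0] for s in spec], [s[1] for s in spec])
    return inv_logit(pooled_sens), inv_logit(pooled_spec)


if __name__ == "__main__":
    hypothetical = [(90, 10, 400, 60), (180, 15, 700, 120), (45, 8, 300, 30)]
    sens, spec = pool_sens_spec(hypothetical)
    print(f"pooled sensitivity ~ {sens:.3f}, pooled specificity ~ {spec:.3f}")
```

Because each pooled value is a weighted average on the logit scale, it always falls between the smallest and largest per-study estimates. A faithful reproduction of the review's analysis would instead fit the bivariate model, for example with dedicated meta-analysis software.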
ISSN: 2045-2322
DOI: 10.1038/s41598-024-73058-8