Artificial intelligence in commercial fracture detection products: a systematic review and meta-analysis of diagnostic test accuracy
Published in: | Scientific Reports 2024-10, Vol.14 (1), p.23053-18, Article 23053 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Conventional radiography (CR) is primarily utilized for fracture diagnosis. Artificial intelligence (AI) for CR is a rapidly growing field aimed at enhancing efficiency and increasing diagnostic accuracy. However, the diagnostic performance of commercially available AI fracture detection solutions (CAAI-FDS) for CR in various anatomical regions, their synergy with human assessment, as well as the influence of industry funding on reported accuracy are unknown. Peer-reviewed diagnostic test accuracy (DTA) studies were identified through a systematic review on PubMed and Embase. Diagnostic performance measures were extracted especially for different subgroups such as product, type of rater (stand-alone AI, human unaided, human aided), funding, and anatomical region. Pooled measures were obtained with a bivariate random effects model. The impact of rater was evaluated with comparative meta-analysis. Seventeen DTA studies of seven CAAI-FDS analyzing 38,978 x-rays with 8,150 fractures were included. Stand-alone AI studies (n = 15) evaluated five CAAI-FDS; four with good sensitivities (> 90%) and moderate specificities (80–90%) and one with very poor sensitivity ( 95%). Pooled sensitivities were good to excellent, and specificities were moderate to good in all anatomical regions (n = 7) apart from ribs (n = 4; poor sensitivity / moderate specificity) and spine (n = 4; excellent sensitivity / poor specificity). Funded studies (n = 4) had higher sensitivity (+ 5%) and lower specificity (-4%) than non-funded studies (n = 11). Sensitivity did not differ significantly between stand-alone AI and human AI-aided ratings (p = 0.316) but specificity was significantly higher in the latter group (p |
---|---|
ISSN: | 2045-2322 |
DOI: | 10.1038/s41598-024-73058-8 |