Can Artificial Intelligence Reliably Report Chest X-Rays?: Radiologist Validation of an Algorithm trained on 2.3 Million X-Rays
Main authors: | , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | Background: Chest X-rays are cost-effective and are the most commonly
performed diagnostic imaging tests ordered by physicians. A clinically validated
AI system that can reliably separate normal from abnormal scans can be
invaluable, particularly in low-resource settings. The aim of this study was to
develop and validate a deep learning system to detect various abnormalities seen
on a chest X-ray. Methods: A deep learning system was trained on 2.3 million
chest X-rays and their corresponding radiology reports to identify various
abnormalities seen on a chest X-ray. The system was tested against (1) the
majority opinion of three radiologists on an independent, retrospectively
collected set of 2,000 X-rays (CQ2000) and (2) radiologist reports on a separate
validation set of 100,000 scans (CQ100k). The primary accuracy measure was the
area under the receiver operating characteristic (ROC) curve (AUC), estimated
separately for each abnormality and for normal versus abnormal scans.
Results: On the CQ2000 dataset, the deep learning system demonstrated an AUC of
0.92 (CI 0.91-0.94) for the detection of abnormal scans. AUCs (CI) for the
detection of individual abnormalities were: blunted costophrenic angle 0.96
(0.94-0.98), cardiomegaly 0.96 (0.94-0.98), cavity 0.95 (0.87-1), consolidation
0.95 (0.92-0.98), fibrosis 0.93 (0.90-0.96), hilar enlargement 0.89 (0.83-0.94),
nodule 0.91 (0.87-0.96), opacity 0.94 (0.93-0.96) and pleural effusion 0.98
(0.97-1). The AUCs were similar on the larger CQ100k dataset, except for the
detection of normal scans, where the AUC was 0.86 (CI 0.85-0.86). Interpretation: Our study
demonstrates that a deep learning algorithm trained on a large, well-labelled
dataset can accurately detect multiple abnormalities on chest X-rays. As these
systems improve in accuracy, applying deep learning to widen the reach of chest
X-ray interpretation and improve reporting efficiency will add tremendous value
in radiology workflows and public health screenings globally. |
---|---|
DOI: | 10.48550/arxiv.1807.07455 |
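The Methods in the abstract combine two evaluation ingredients: a reference label taken as the majority of three radiologists (CQ2000), and an AUC computed separately for each abnormality. The following is a minimal Python sketch of both, assuming binary per-finding radiologist reads and one continuous model score per scan. The function names (`majority_label`, `auc`, `bootstrap_ci`) are hypothetical, and the percentile bootstrap used for the confidence interval is an assumption, because the abstract does not say how its CIs were computed. It illustrates the metrics only and is not the authors' implementation.

```python
import random
from typing import Sequence, Tuple


def majority_label(reads: Sequence[int]) -> int:
    """Reference label from an odd number of binary reads (1 = finding present)."""
    if len(reads) % 2 == 0:
        raise ValueError("majority vote needs an odd number of readers")
    return int(sum(reads) > len(reads) // 2)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney rank-sum formula; tied scores get average ranks."""
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC (assumed method: resample scans with replacement)."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Toy data for one finding, not study data: three reads per scan, one model score per scan.
    reads = [(1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0)]
    labels = [majority_label(r) for r in reads]
    scores = [0.9, 0.3, 0.7, 0.4]
    print(labels, auc(labels, scores), bootstrap_ci(labels, scores, n_boot=200))
```

The rank-sum form is used instead of comparing every positive with every negative because it scales as O(n log n), which matters for a set the size of CQ100k. The same `auc` call would be repeated once per abnormality, and once more with a single normal-versus-abnormal score.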