Can Artificial Intelligence Reliably Report Chest X-Rays?: Radiologist Validation of an Algorithm trained on 2.3 Million X-Rays

Background: Chest X-rays are the most commonly performed, cost-effective diagnostic imaging tests ordered by physicians. A clinically validated AI system that can reliably separate normals from abnormals can be invaluble particularly in low-resource settings. The aim of this study was to develop and...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Putha, Preetham, Tadepalli, Manoj, Reddy, Bhargava, Raj, Tarun, Chiramal, Justy Antony, Govil, Shalini, Sinha, Namita, KS, Manjunath, Reddivari, Sundeep, Jagirdar, Ammar, Rao, Pooja, Warier, Prashant
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer Science - Computer Vision and Pattern Recognition
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Background: Chest X-rays are the most commonly performed, cost-effective diagnostic imaging tests ordered by physicians. A clinically validated AI system that can reliably separate normals from abnormals can be invaluble particularly in low-resource settings. The aim of this study was to develop and validate a deep learning system to detect various abnormalities seen on a chest X-ray. Methods: A deep learning system was trained on 2.3 million chest X-rays and their corresponding radiology reports to identify various abnormalities seen on a Chest X-ray. The system was tested against - 1. A three-radiologist majority on an independent, retrospectively collected set of 2000 X-rays(CQ2000) 2. Radiologist reports on a separate validation set of 100,000 scans(CQ100k). The primary accuracy measure was area under the ROC curve (AUC), estimated separately for each abnormality and for normal versus abnormal scans. Results: On the CQ2000 dataset, the deep learning system demonstrated an AUC of 0.92(CI 0.91-0.94) for detection of abnormal scans, and AUC(CI) of 0.96(0.94-0.98), 0.96(0.94-0.98), 0.95(0.87-1), 0.95(0.92-0.98), 0.93(0.90-0.96), 0.89(0.83-0.94), 0.91(0.87-0.96), 0.94(0.93-0.96), 0.98(0.97-1) for the detection of blunted costophrenic angle, cardiomegaly, cavity, consolidation, fibrosis, hilar enlargement, nodule, opacity and pleural effusion. The AUCs were similar on the larger CQ100k dataset except for detecting normals where the AUC was 0.86(0.85-0.86). Interpretation: Our study demonstrates that a deep learning algorithm trained on a large, well-labelled dataset can accurately detect multiple abnormalities on chest X-rays. As these systems improve in accuracy, applying deep learning to widen the reach of chest X-ray interpretation and improve reporting efficiency will add tremendous value in radiology workflows and public health screenings globally.
DOI:	10.48550/arxiv.1807.07455