Deep Learning to Classify Radiology Free-Text Reports




Bibliographic Details
Published in:	Radiology, March 2018, Vol. 286 (3), pp. 845–852
Main authors:	Chen, Matthew C; Ball, Robyn L; Yang, Lingyao; Moradzadeh, Nathaniel; Chapman, Brian E; Larson, David B; Langlotz, Curtis P; Amrhein, Timothy J; Lungren, Matthew P
Format:	Article
Language:	English
Subject headings:	Algorithms; Humans; Machine Learning; Natural Language Processing; Neural Networks (Computer); Pulmonary Embolism - diagnostic imaging; Radiography, Thoracic - methods; Reproducibility of Results; ROC Curve; Sensitivity and Specificity; Tomography, X-Ray Computed - methods

Description
Abstract:
Purpose: To evaluate the performance of a deep learning convolutional neural network (CNN) model compared with a traditional natural language processing (NLP) model in extracting pulmonary embolism (PE) findings from thoracic computed tomography (CT) reports from two institutions.

Materials and Methods: Contrast material-enhanced CT examinations of the chest performed between January 1, 1998, and January 1, 2016, were selected. Two human radiologists annotated the reports for three categories: the presence, chronicity, and location of PE. The classification performance of a CNN model that uses an unsupervised learning algorithm to obtain vector representations of words was compared with that of the open-source application PeFinder. Sensitivity, specificity, accuracy, and F1 scores were determined for both the CNN model and PeFinder on the internal and external validation sets.

Results: The CNN model demonstrated an accuracy of 99% and an area under the ROC curve of 0.97. On the internal validation report data, the CNN model had a statistically significantly larger F1 score (0.938) than PeFinder (0.867) when classifying findings as either PE positive or PE negative, but no significant difference in sensitivity, specificity, or accuracy was found. On the external validation report data, no statistically significant difference between the performance of the CNN model and PeFinder was found.

Conclusion: A deep learning CNN model can classify radiology free-text reports with accuracy equivalent to or beyond that of an existing traditional NLP model.

RSNA, 2017. Online supplemental material is available for this article.
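The abstract reports a 99% accuracy alongside an F1 score of 0.938, which can look inconsistent at first glance. The minimal Python sketch that follows is not taken from the paper; it only applies the standard confusion-matrix definitions of the four metrics named in the abstract to a binary PE-positive / PE-negative labeling. The `Confusion` class, the `report_metrics` function, and the counts in the usage example are hypothetical illustrations, not study data. The point it illustrates is that when PE-positive reports are a small minority, accuracy is dominated by the many true negatives, while F1 depends only on how well the positive class is handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical helper: counts for a binary PE-positive / PE-negative report classifier."""
    tp: int  # PE-positive reports labeled positive
    fp: int  # PE-negative reports labeled positive
    tn: int  # PE-negative reports labeled negative
    fn: int  # PE-positive reports labeled negative


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising ZeroDivisionError when a denominator is empty.
    return num / den if den else 0.0


def report_metrics(c: Confusion) -> dict[str, float]:
    """Hypothetical helper: standard binary-classification metrics from confusion counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # true-positive rate (recall)
    specificity = _ratio(c.tn, c.tn + c.fp)  # true-negative rate
    precision = _ratio(c.tp, c.tp + c.fp)    # positive predictive value
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN);
    # true negatives do not appear in it at all.
    f1 = _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts: 55 PE-positive and 945 PE-negative reports (not study data).
    m = report_metrics(Confusion(tp=45, fp=5, tn=940, fn=10))
    for name, value in m.items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical counts, accuracy is (45 + 940) / 1000 = 0.985, while F1 is 2·45 / (2·45 + 5 + 10) ≈ 0.857, so the two metrics can diverge noticeably on report sets where PE-positive cases are rare.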
ISSN:	0033-8419 (print), 1527-1315 (online)
DOI:	10.1148/radiol.2017171115