Extraction of radiographic findings from unstructured thoracoabdominal computed tomography reports using convolutional neural network based natural language processing

Bibliographic Details
Published in:	PloS one 2020-07, Vol.15 (7), p.e0236827-e0236827
Main Authors:	Pandey, Mohit; Xu, Zhuoran; Sholle, Evan; Maliakal, Gabriel; Singh, Gurpreet; Fatima, Zahra; Larine, Daria; Lee, Benjamin C; Wang, Jing; van Rosendael, Alexander R; Baskaran, Lohendran; Shaw, Leslee J; Min, James K; Al'Aref, Subhi J
Format:	Article
Language:	English
Subjects:	Accuracy; Age; Aged; Artificial neural networks; Biology and Life Sciences; Cardiology; Cohort Studies; Computed tomography; Computer and Information Sciences; Confidence intervals; Congestive heart failure; Data collection; Electronic Health Records; Feature extraction; Female; Gender; Heart diseases; Heart failure; Heart Failure - diagnostic imaging; Heart Failure - mortality; Heart Failure - pathology; High frequencies; Humans; Image Processing, Computer-Assisted - methods; Language; Learning algorithms; Machine Learning; Male; Medicine; Medicine and Health Sciences; Morbidity; Mortality; Natural Language Processing; Neural networks; Neural Networks, Computer; Patients; Pleural effusion; Prognosis; Radiography, Abdominal - methods; Radiography, Thoracic - methods; Radiology; Regression analysis; Research and Analysis Methods; Social Sciences; Statistical analysis; Survival Rate; Time dependence; Tomography, X-Ray Computed - methods; Unstructured data
Online Access:	Full text

Description
Abstract:	Heart failure (HF) is a major cause of morbidity and mortality. However, much of the clinical data is unstructured, in the form of radiology reports, and data collection and curation are arduous and time-consuming. We used a machine learning (ML)-based natural language processing (NLP) approach to extract clinical terms from unstructured radiology reports. Additionally, we investigated the prognostic value of the extracted data in predicting all-cause mortality (ACM) in HF patients. This observational cohort study used 122,025 thoracoabdominal computed tomography (CT) reports from 11,808 HF patients obtained between 2008 and 2018. A total of 1,560 CT reports were manually annotated for the presence or absence of 14 radiographic findings, in addition to age and gender. A convolutional neural network (CNN) was then trained, validated and tested to determine the presence or absence of these features. Further, the ability of the CNN-extracted features to predict ACM was evaluated using Cox regression analysis. In total, 11,808 CT reports were analyzed from 11,808 patients (mean age 72.8 ± 14.8 years; 52.7% (6,217/11,808) male), of whom 3,107 died during the 10.6-year follow-up. The CNN demonstrated excellent accuracy for retrieval of the 14 radiographic findings, with area under the curve (AUC) ranging from 0.83 to 1.00 (F1 score 0.84-0.97). In the Cox model, the time-dependent AUC for predicting ACM was 0.747 (95% confidence interval [CI] 0.704-0.790) at 30 days. An ML-based NLP approach to unstructured CT reports demonstrates excellent accuracy for the extraction of predetermined radiographic findings and provides prognostic value in HF patients.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0236827
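
Note on the F1 score cited in the abstract: the following is a minimal sketch of how an F1 score for a single present/absent finding can be computed from annotated and predicted labels. It is an illustration only and not the authors' evaluation code; the function name `f1_score` and the label lists are hypothetical and are not study data.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 = finding present and 0 = finding absent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for one finding across eight reports (not study data).
annotated = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]
score = f1_score(annotated, predicted)
print(f"F1 = {score:.2f}")
```

Because F1 ignores true negatives, it is often reported alongside AUC when a finding is rare in the annotated reports.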