Mining peripheral arterial disease cases from narrative clinical notes using natural language processing

Bibliographic Details
Published in:	Journal of vascular surgery 2017-06, Vol.65 (6), p.1753-1761
Main authors:	Afzal, Naveed, PhD; Sohn, Sunghwan, PhD; Abram, Sara, MD; Scott, Christopher G., MS; Chaudhry, Rajeev, MBBS, MPH; Liu, Hongfang, PhD; Kullo, Iftikhar J., MD; Arruda-Olson, Adelaide M., MD, PhD
Format:	Article
Language:	English
Subjects:	Administrative Claims, Healthcare; Algorithms; Ankle Brachial Index; Data Mining - methods; Databases, Factual; Electronic Health Records; Humans; International Classification of Diseases; Lower Extremity - blood supply; Minnesota; Models, Statistical; Natural Language Processing; Peripheral Arterial Disease - classification; Peripheral Arterial Disease - diagnosis; Retrospective Studies; Surgery
Online access:	Full text

Description
Abstract:	Objective: Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm with billing code algorithms, using ankle-brachial index test results as the gold standard. Methods: We compared the performance of the NLP algorithm to (1) results of gold standard ankle-brachial index; (2) previously validated algorithms based on relevant International Classification of Diseases, Ninth Revision diagnostic codes (simple model); and (3) a combination of International Classification of Diseases, Ninth Revision codes with procedural codes (full model). A dataset of 1569 patients with PAD and controls was randomly divided into training (n = 935) and testing (n = 634) subsets. Results: We iteratively refined the NLP algorithm in the training set including narrative note sections, note types, and service types, to maximize its accuracy. In the testing dataset, when compared with both simple and full models, the NLP algorithm had better accuracy (NLP, 91.8%; full model, 81.8%; simple model, 83%; P < .001), positive predictive value (NLP, 92.9%; full model, 74.3%; simple model, 79.9%; P < .001), and specificity (NLP, 92.5%; full model, 64.2%; simple model, 75.9%; P < .001). Conclusions: A knowledge-driven NLP algorithm for automatic ascertainment of PAD cases from clinical notes had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support.
ISSN:	0741-5214, 1097-6809
DOI:	10.1016/j.jvs.2016.11.031
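
Note on the reported metrics: the abstract compares algorithms by accuracy, positive predictive value, and specificity against an ankle-brachial index gold standard. The sketch below shows how those three measures are conventionally computed from per-patient labels. It is an illustration only, not the authors' code: the names `ConfusionCounts` and `count_outcomes` are hypothetical, and the eight toy labels are invented for the example and are not data from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of algorithm output versus the reference standard."""
    tp: int  # algorithm: PAD,    reference: PAD
    fp: int  # algorithm: PAD,    reference: no PAD
    tn: int  # algorithm: no PAD, reference: no PAD
    fn: int  # algorithm: no PAD, reference: PAD


def count_outcomes(predicted: Sequence[bool],
                   reference: Sequence[bool]) -> ConfusionCounts:
    """Tally the four outcome types for paired per-patient labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = fp = tn = fn = 0
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            tp += 1
        elif pred and not ref:
            fp += 1
        elif not pred and not ref:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all patients classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Share of algorithm-positive patients who truly have the disease."""
    return c.tp / (c.tp + c.fp)


def specificity(c: ConfusionCounts) -> float:
    """Share of disease-free patients the algorithm labels as negative."""
    return c.tn / (c.tn + c.fp)


# Invented toy labels (NOT study data): True means "PAD".
reference = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]

counts = count_outcomes(predicted, reference)
print(f"accuracy={accuracy(counts):.3f}")
print(f"ppv={positive_predictive_value(counts):.3f}")
print(f"specificity={specificity(counts):.3f}")
```

The functions do not guard against empty denominators (for example, an algorithm that never predicts PAD), so a real evaluation would need to handle that case explicitly.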