The utility of including pathology reports in improving the computational identification of patients

Background: Celiac disease (CD) is a common autoimmune disorder. Efficient identification of patients may improve chronic management of the disease. Prior studies have shown searching International Classification of Diseases-9 (ICD-9) codes alone is inaccurate for identifying patients with CD. In th...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Journal of pathology informatics 2016, Vol.7 (1), p.46-46, Article 46
Hauptverfasser:	Chen, Wei, Huang, Yungui, Boyle, Brendan, Lin, Simon
Format:	Artikel
Sprache:	eng
Schlagworte:	Celiac classification machine learning natural language processing Original patient registry
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Background: Celiac disease (CD) is a common autoimmune disorder. Efficient identification of patients may improve chronic management of the disease. Prior studies have shown searching International Classification of Diseases-9 (ICD-9) codes alone is inaccurate for identifying patients with CD. In this study, we developed automated classification algorithms leveraging pathology reports and other clinical data in Electronic Health Records (EHRs) to refine the subset population preselected using ICD-9 code (579.0). Materials and Methods: EHRs were searched for established ICD-9 code (579.0) suggesting CD, based on which an initial identification of cases was obtained. In addition, laboratory results for tissue transglutaminse were extracted. Using natural language processing we analyzed pathology reports from upper endoscopy. Twelve machine learning classifiers using different combinations of variables related to ICD-9 CD status, laboratory result status, and pathology reports were experimented to find the best possible CD classifier. Ten-fold cross-validation was used to assess the results. Results: A total of 1498 patient records were used including 363 confirmed cases and 1135 false positive cases that served as controls. Logistic model based on both clinical and pathology report features produced the best results: Kappa of 0.78, F1 of 0.92, and area under the curve (AUC) of 0.94, whereas in contrast using ICD-9 only generated poor results: Kappa of 0.28, F1 of 0.75, and AUC of 0.63. Conclusion: Our automated classification system presented an efficient and reliable way to improve the performance of CD patient identification.
ISSN:	2153-3539 2229-5089 2153-3539
DOI:	10.4103/2153-3539.194838