Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records
Abstract Objectives To develop classification algorithms that accurately identify axial SpA (axSpA) patients in electronic health records, and compare the performance of algorithms incorporating free-text data against approaches using only International Classification of Diseases (ICD) codes. Method...
Published in: | Rheumatology (Oxford, England), 2020-05, Vol.59 (5), p.1059-1065 |
---|---|
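Main authors: | Zhao, Sizheng Steven; Hong, Chuan; Cai, Tianrun; Xu, Chang; Huang, Jie; Ermann, Joerg; Goodson, Nicola J; Solomon, Daniel H; Cai, Tianxi; Liao, Katherine P |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |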
container_end_page | 1065 |
---|---|
container_issue | 5 |
container_start_page | 1059 |
container_title | Rheumatology (Oxford, England) |
container_volume | 59 |
creator | Zhao, Sizheng Steven; Hong, Chuan; Cai, Tianrun; Xu, Chang; Huang, Jie; Ermann, Joerg; Goodson, Nicola J; Solomon, Daniel H; Cai, Tianxi; Liao, Katherine P |
description | Abstract
Objectives
To develop classification algorithms that accurately identify axial SpA (axSpA) patients in electronic health records, and compare the performance of algorithms incorporating free-text data against approaches using only International Classification of Diseases (ICD) codes.
Methods
An enriched cohort of 7853 eligible patients was created from electronic health records of two large hospitals using automated searches (⩾1 ICD codes combined with simple text searches). Key disease concepts from free-text data were extracted using NLP and combined with ICD codes to develop algorithms. We created both supervised regression-based algorithms—on a training set of 127 axSpA cases and 423 non-cases—and unsupervised algorithms to identify patients with high probability of having axSpA from the enriched cohort. Their performance was compared against classifications using ICD codes only.
Results
NLP extracted four disease concepts of high predictive value: ankylosing spondylitis, sacroiliitis, HLA-B27 and spondylitis. The unsupervised algorithm, incorporating both the NLP concept and ICD code for AS, identified the greatest number of patients. By setting the probability threshold to attain 80% positive predictive value, it identified 1509 axSpA patients (mean age 53 years, 71% male). Sensitivity was 0.78, specificity 0.94 and area under the curve 0.93. The two supervised algorithms performed similarly but identified fewer patients. All three outperformed traditional approaches using ICD codes alone (area under the curve 0.80–0.87).
Conclusion
Algorithms incorporating free-text data can accurately identify axSpA patients in electronic health records. Large cohorts identified using these novel methods offer exciting opportunities for future clinical research. |
doi_str_mv | 10.1093/rheumatology/kez375 |
format | Article |
fullrecord | Record type: article. Genre: article (journal). Date: 2020-05-01. ISSN: 1462-0324. EISSN: 1462-0332. DOI: 10.1093/rheumatology/kez375. PMID: 31535693. Publisher: England: Oxford University Press. Total pages: 7. Peer reviewed. Open access: free_for_read (source type: Open Access Repository). ORCID: https://orcid.org/0000-0002-3558-7353. Rights: The Author(s) 2019. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For permissions, please email: journals.permissions@oup.com. Collections: MEDLINE; MEDLINE (Ovid); PubMed; CrossRef; PubMed Central (Full Participant titles). |
fulltext | fulltext |
identifier | ISSN: 1462-0324 |
ispartof | Rheumatology (Oxford, England), 2020-05, Vol.59 (5), p.1059-1065 |
issn | 1462-0324; 1462-0332 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7850056 |
source | MEDLINE; Oxford University Press Journals All Titles (1996-Current); Alma/SFX Local Collection |
subjects | Aged; Algorithms; Area Under Curve; Clinical Science; Cohort Studies; Electronic Health Records - statistics & numerical data; Female; Humans; International Classification of Diseases; Male; Middle Aged; Natural Language Processing; Quality Improvement; Sensitivity and Specificity; Spondylarthritis - classification; Spondylarthritis - epidemiology; Spondylitis, Ankylosing - classification; Spondylitis, Ankylosing - epidemiology |
title | Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T03%3A15%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-oup_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Incorporating%20natural%20language%20processing%20to%20improve%20classification%20of%20axial%20spondyloarthritis%20using%20electronic%20health%20records&rft.jtitle=Rheumatology%20(Oxford,%20England)&rft.au=Zhao,%20Sizheng%20Steven&rft.date=2020-05-01&rft.volume=59&rft.issue=5&rft.spage=1059&rft.epage=1065&rft.pages=1059-1065&rft.issn=1462-0324&rft.eissn=1462-0332&rft_id=info:doi/10.1093/rheumatology/kez375&rft_dat=%3Coup_pubme%3E10.1093/rheumatology/kez375%3C/oup_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/31535693&rft_oup_id=10.1093/rheumatology/kez375&rfr_iscdi=true |
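
The abstract reports that a probability threshold was set to attain 80% positive predictive value, and then gives sensitivity and specificity at that threshold. The following is a minimal Python sketch of how such a cutoff could be chosen on a labelled set; it is not the authors' code. The function names, the toy probabilities and the rule "lowest cutoff that still meets the target" are assumptions made for illustration. The last function is a plain Bayes' rule consistency check, which assumes that the case fraction of the training set (127 of 550) can stand in for prevalence, something the abstract does not state.

```python
from typing import Dict, Optional, Sequence


def confusion_metrics(probs: Sequence[float], labels: Sequence[int],
                      cutoff: float) -> Dict[str, float]:
    """PPV, sensitivity and specificity when p >= cutoff is called a case."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_case = p >= cutoff
        if predicted_case and y == 1:
            tp += 1
        elif predicted_case:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined ratios are reported as NaN instead of raising.
        return num / den if den else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


def threshold_for_ppv(probs: Sequence[float], labels: Sequence[int],
                      target_ppv: float = 0.80) -> Optional[float]:
    """Lowest observed cutoff whose PPV meets the target (hypothetical rule).

    A lower cutoff labels more patients as cases, so the lowest qualifying
    cutoff identifies the most patients. Returns None if no cutoff qualifies.
    """
    best = None
    for cutoff in sorted(set(probs), reverse=True):
        if confusion_metrics(probs, labels, cutoff)["ppv"] >= target_ppv:
            best = cutoff
    return best


def ppv_from_rates(sensitivity: float, specificity: float,
                   prevalence: float) -> float:
    """Bayes' rule: PPV implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Toy predicted probabilities and chart-review labels (invented data).
    probs = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40, 0.30, 0.10]
    labels = [1, 1, 1, 0, 1, 1, 0, 0]
    cutoff = threshold_for_ppv(probs, labels, target_ppv=0.80)
    print(cutoff, confusion_metrics(probs, labels, cutoff))

    # Reported sensitivity 0.78 and specificity 0.94, with the training-set
    # case fraction used as an assumed prevalence.
    print(round(ppv_from_rates(0.78, 0.94, 127 / 550), 3))
```

Under the stated prevalence assumption, the reported sensitivity and specificity imply a PPV close to the 80% target, which suggests the reported figures are mutually consistent. Because PPV is not monotonic in the cutoff, the sketch scans every observed probability instead of stopping at the first cutoff that fails.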