Interactive Cohort Identification of Sleep Disorder Patients Using Natural Language Processing and i2b2

Summary: Nationwide Children’s Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were extracted from semi-structured sleep study reports. The system proved more efficient than the traditiona...

Bibliographic Details
Published in:	Applied clinical informatics 2015-01, Vol.6 (2), p.345-363
Main authors:	Chen, W.; Kowatch, R.; Lin, S.; Splaingard, M.; Huang, Y.
Format:	Article
Language:	eng
Subjects:	Biological Ontologies; Cohort Studies; Data Mining; Humans; Medical Informatics - methods; Natural Language Processing; Research Article; Terminology as Topic; User-Computer Interface
Online access:	Full text

container_end_page	363
container_issue	2
container_start_page	345
container_title	Applied clinical informatics
container_volume	6
creator	Chen, W. ; Kowatch, R. ; Lin, S. ; Splaingard, M. ; Huang, Y.
description	Summary: Nationwide Children’s Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were extracted from semi-structured sleep study reports. The system proved more efficient than the traditional manual chart review method, and it also enabled searches that were not previously possible. Objective: We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents. Methods: We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and an NLP-based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structures and in-domain knowledge. Results: 26,550 concepts were extracted, 99% of them textual. 1.01 million facts were extracted from sleep study documents, including demographic information, sleep study lab results, medications, procedures, and diagnoses, among others. The average accuracy of terminology parsing was over 83% when compared against parsing by experts. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification was reduced from a few weeks to a few seconds. Conclusion: Natural language processing was shown to be powerful for quickly converting large amounts of semi-structured or unstructured clinical data into discrete concepts, which, in combination with intuitive domain-specific ontologies, allow fast and effective interactive cohort identification through the i2b2 platform for research and clinical use. Citation: Chen W, Kowatch R, Lin S, Splaingard M, Huang Y. Interactive cohort identification of sleep disorder patients using natural language processing and i2b2. Appl Clin Inf 2015; 6: 345–363 http://dx.doi.org/10.4338/ACI-2014-11-RA-0106
doi_str_mv	10.4338/ACI-2014-11-RA-0106
format	Article
addtitle	Appl Clin Inform
backlink	https://www.ncbi.nlm.nih.gov/pubmed/26171080
creationdate	2015
date	2015-01-01
eissn	1869-0327
genre	article
linktohtml	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4493335/
linktopdf	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4493335/pdf/
oa	free_for_read
peer_reviewed	true
pmid	26171080
pqid	1696682756
publisher	Germany: Schattauer GmbH
rights	Copyright Schattauer 2015
sourcetype	Open Access Repository
tpages	19
fulltext	fulltext
identifier	ISSN: 1869-0327
ispartof	Applied clinical informatics, 2015-01, Vol.6 (2), p.345-363
issn	1869-0327 1869-0327
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4493335
source	MEDLINE; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects	Biological Ontologies; Cohort Studies; Data Mining; Humans; Medical Informatics - methods; Natural Language Processing; Research Article; Terminology as Topic; User-Computer Interface
title	Interactive Cohort Identification of Sleep Disorder Patients Using Natural Language Processing and i2b2
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T14%3A12%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Interactive%20Cohort%20Identification%20of%20Sleep%20Disorder%20Patients%20Using%20Natural%20Language%20Processing%20and%20i2b2&rft.jtitle=Applied%20clinical%20informatics&rft.au=Chen,%20W.&rft.date=2015-01-01&rft.volume=6&rft.issue=2&rft.spage=345&rft.epage=363&rft.pages=345-363&rft.issn=1869-0327&rft.eissn=1869-0327&rft_id=info:doi/10.4338/ACI-2014-11-RA-0106&rft_dat=%3Cproquest_pubme%3E1696682756%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1696682756&rft_id=info:pmid/26171080&rfr_iscdi=true