Automatically Structuring on Chinese Ultrasound Report of Cerebrovascular Diseases via Natural Language Processing
The current ultrasound reports in Chinese hospitals are mostly written in free-text format. Important clinical information, such as stenosis rate and plaque location, is recorded in long sentences, especially for ultrasound reports of cerebrovascular diseases. They cannot be directly used for furthe...
Published in: | IEEE access 2019, Vol.7, p.89043-89050 |
---|---|
Main authors: | Chen, Pengyu ; Liu, Qiao ; Wei, Lan ; Zhao, Beier ; Jia, Yin ; Lv, Hairong ; Fei, Xiaolu |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 89050 |
---|---|
container_issue | |
container_start_page | 89043 |
container_title | IEEE access |
container_volume | 7 |
creator | Chen, Pengyu ; Liu, Qiao ; Wei, Lan ; Zhao, Beier ; Jia, Yin ; Lv, Hairong ; Fei, Xiaolu |
description | The current ultrasound reports in Chinese hospitals are mostly written in free-text format. Important clinical information, such as stenosis rate and plaque location, is recorded in long sentences, especially in ultrasound reports of cerebrovascular diseases. Because they lack structure and standardization, these reports cannot be used directly for further automatic analysis. The goal of this paper is to assess the feasibility of applying natural language processing technology to automatically extract disease entities and related information, such as the stenosis rate and plaque location, from free-text ultrasound reports of cerebrovascular diseases. A structured model using conditional random fields (CRFs) is first constructed. Then, clause optimization and segmentation are performed at the word level to achieve data structuring. Seven categories of terms, including symptoms, plaque locations, diseases, and degree, in 1980 de-identified ultrasound reports were manually annotated as a training dataset. With this model, 7937 ultrasound reports were automatically processed into structured data within 40 min. The true positive rate of the model for the seven categories of terms is 96%, 94%, 97%, 100%, 100%, 100%, and 97%, respectively. The CRF model can be used in Chinese natural language processing to support unstructured data analysis. Standardized segmentation results can be obtained based on medical ontology libraries. However, real-time processing and scientific annotation remain challenges if intelligent clinical decision making is to be applied in a real-world clinical environment. |
doi_str_mv | 10.1109/ACCESS.2019.2923221 |
format | Article |
eissn | 2169-3536 |
coden | IAECCG |
publisher | Piscataway: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019 |
orcid | https://orcid.org/0000-0001-7498-0249 |
ieee_id | 8736947 |
doaj_id | oai_doaj_org_article_12689a31117c4554897f7b93a216a0ec |
linktohtml | https://ieeexplore.ieee.org/document/8736947 |
oa | free_for_read |
tpages | 8 |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2019, Vol.7, p.89043-89050 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_2455613387 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Annotations ; Carotid arteries ; Conditional random fields ; conditional random fields (CRF) ; Data analysis ; Data models ; Decision making ; Diseases ; Hidden Markov models ; Natural language processing ; Natural language processing (NLP) ; Segmentation ; Sentences ; Signs and symptoms ; Standardization ; Training ; Ultrasonic imaging ; Ultrasound ; ultrasound report ; Unstructured data |
title | Automatically Structuring on Chinese Ultrasound Report of Cerebrovascular Diseases via Natural Language Processing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T18%3A42%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatically%20Structuring%20on%20Chinese%20Ultrasound%20Report%20of%20Cerebrovascular%20Diseases%20via%20Natural%20Language%20Processing&rft.jtitle=IEEE%20access&rft.au=Chen,%20Pengyu&rft.date=2019&rft.volume=7&rft.spage=89043&rft.epage=89050&rft.pages=89043-89050&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2019.2923221&rft_dat=%3Cproquest_doaj_%3E2455613387%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2455613387&rft_id=info:pmid/&rft_ieee_id=8736947&rft_doaj_id=oai_doaj_org_article_12689a31117c4554897f7b93a216a0ec&rfr_iscdi=true |