Automatically Structuring on Chinese Ultrasound Report of Cerebrovascular Diseases via Natural Language Processing

The current ultrasound reports in Chinese hospitals are mostly written in free-text format. Important clinical information, such as stenosis rate and plaque location, is recorded in long sentences, especially for ultrasound reports of cerebrovascular diseases. They cannot be directly used for furthe...


Bibliographic Details
Published in: IEEE access 2019, Vol.7, p.89043-89050
Main authors: Chen, Pengyu; Liu, Qiao; Wei, Lan; Zhao, Beier; Jia, Yin; Lv, Hairong; Fei, Xiaolu
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 89050
container_issue
container_start_page 89043
container_title IEEE access
container_volume 7
creator Chen, Pengyu
Liu, Qiao
Wei, Lan
Zhao, Beier
Jia, Yin
Lv, Hairong
Fei, Xiaolu
description The current ultrasound reports in Chinese hospitals are mostly written in free-text format. Important clinical information, such as stenosis rate and plaque location, is recorded in long sentences, especially for ultrasound reports of cerebrovascular diseases. They cannot be directly used for further automatic analysis due to the lack of structure and standardization. The goal of this paper is to assess the feasibility of applying natural language processing technology to automatically extract disease entities and related information such as the stenosis rate and plaque location from free-text ultrasound reports of cerebrovascular diseases. A structured model using conditional random fields (CRFs) is first constructed. Then, the clause optimizing and segmentation process is performed on a word level to achieve data structuring. Seven categories of terms, including symptoms, plaque locations, diseases, and degree, in 1980 de-identified ultrasound reports were manually annotated as a training dataset. With this model, 7937 ultrasound reports were automatically processed to structure data within 40 min. The true positive rate of the model for each category of terms is 96%, 94%, 97%, 100%, 100%, 100%, and 97%, respectively. The CRF model can be used in Chinese natural language processing to provide support for unstructured data analysis. The standardized segmentation results can be obtained based on medical ontology libraries. However, real-time processing and scientific annotation remain a challenge if intelligent clinical decision making needs to be applied to a real-world clinical environment.
doi_str_mv 10.1109/ACCESS.2019.2923221
format Article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_proquest_journals_2455613387</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8736947</ieee_id><doaj_id>oai_doaj_org_article_12689a31117c4554897f7b93a216a0ec</doaj_id><sourcerecordid>2455613387</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-a6578c2c69053c871568754b29aaf21be34aa18715a9c837621c983034ec80593</originalsourceid><addsrcrecordid>eNpNkVtP3DAQhSNUpCLKL-DFUp9360t8e1yl0CKtCmLLszVrJktWIV7GCRL_Hm-DUP0y1tGcb-w5VXUp-FII7n-smuZqs1lKLvxSeqmkFCfVmRTGL5RW5st_96_VRc57Xo4rkrZnFa2mMT3D2EXo-ze2GWmK40TdsGNpYM1TN2BG9tCPBDlNwyO7x0OikaWWNUi4pfQKOU49EPvZZYSMmb12wP5AoUDP1jDsJtghu6MUMecC_ladttBnvPio59XD9dXf5vdiffvrplmtF7HmblyA0dZFGY3nWkVnhTbO6norPUArxRZVDSCOOvjolDVSRO8UVzVGx7VX59XNzH1MsA8H6p6B3kKCLvwTEu0CUPl4j0FI4zwoIYSNtda187a1W6-gbA44xsL6PrMOlF4mzGPYp4mG8vwgi8EIpZwtXWruipRyJmw_pwoejlmFOatwzCp8ZFVcl7OrQ8RPR8EZX1v1DrSxj28</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2455613387</pqid></control><display><type>article</type><title>Automatically Structuring on Chinese Ultrasound Report of Cerebrovascular Diseases via Natural Language Processing</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Chen, Pengyu ; Liu, Qiao ; Wei, Lan ; Zhao, Beier ; Jia, Yin ; Lv, Hairong ; Fei, Xiaolu</creator><creatorcontrib>Chen, Pengyu ; Liu, Qiao ; Wei, Lan ; Zhao, Beier ; Jia, Yin ; Lv, Hairong ; Fei, Xiaolu</creatorcontrib><description>The current ultrasound reports in Chinese hospitals are mostly written in free-text format. Important clinical information, such as stenosis rate and plaque location, is recorded in long sentences, especially for ultrasound reports of cerebrovascular diseases. 
They cannot be directly used for further automatic analysis due to the lack of structure and standardization. The goal of this paper is to assess the feasibility of applying natural language processing technology to automatically extract disease entities and relate information such as the stenosis rate and plaque location from free-text ultrasound reports of cerebrovascular diseases. A structured model using conditional random fields (CRFs) is first constructed. Then, the clause optimizing and segmentation process is performed on a word level to achieve data structuring. Seven categories of terms, including symptoms, plaque locations, diseases, and degree, in 1980 de-identified ultrasound reports were manually annotated as a training dataset. With this model, 7937 ultrasound reports were automatically processed to structure data within 40 min. The true positive rate of the model for each category of terms is 96%, 94%, 97%, 100%, 100%, 100%, and 97%, respectively. The CRF model can be used in Chinese natural language processing to provide support for unstructured data analysis. The standardized segmentation results can be obtained based on medical ontology libraries. 
However, real-time processing and scientific annotation remain a challenge if intelligent clinical decision making needs to be applied to a real-world clinical environment.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2019.2923221</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Annotations ; Carotid arteries ; Conditional random fields ; conditional random fields (CRF) ; Data analysis ; Data models ; Decision making ; Diseases ; Hidden Markov models ; Natural language processing ; Natural language processing (NLP) ; Segmentation ; Sentences ; Signs and symptoms ; Standardization ; Training ; Ultrasonic imaging ; Ultrasound ; ultrasound report ; Unstructured data</subject><ispartof>IEEE access, 2019, Vol.7, p.89043-89050</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-a6578c2c69053c871568754b29aaf21be34aa18715a9c837621c983034ec80593</citedby><cites>FETCH-LOGICAL-c408t-a6578c2c69053c871568754b29aaf21be34aa18715a9c837621c983034ec80593</cites><orcidid>0000-0001-7498-0249</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8736947$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,777,781,861,2096,4010,27614,27904,27905,27906,54914</link.rule.ids></links><search><creatorcontrib>Chen, Pengyu</creatorcontrib><creatorcontrib>Liu, Qiao</creatorcontrib><creatorcontrib>Wei, Lan</creatorcontrib><creatorcontrib>Zhao, Beier</creatorcontrib><creatorcontrib>Jia, Yin</creatorcontrib><creatorcontrib>Lv, Hairong</creatorcontrib><creatorcontrib>Fei, 
Xiaolu</creatorcontrib><title>Automatically Structuring on Chinese Ultrasound Report of Cerebrovascular Diseases via Natural Language Processing</title><title>IEEE access</title><addtitle>Access</addtitle><description>The current ultrasound reports in Chinese hospitals are mostly written in free-text format. Important clinical information, such as stenosis rate and plaque location, is recorded in long sentences, especially for ultrasound reports of cerebrovascular diseases. They cannot be directly used for further automatic analysis due to the lack of structure and standardization. The goal of this paper is to assess the feasibility of applying natural language processing technology to automatically extract disease entities and relate information such as the stenosis rate and plaque location from free-text ultrasound reports of cerebrovascular diseases. A structured model using conditional random fields (CRFs) is first constructed. Then, the clause optimizing and segmentation process is performed on a word level to achieve data structuring. Seven categories of terms, including symptoms, plaque locations, diseases, and degree, in 1980 de-identified ultrasound reports were manually annotated as a training dataset. With this model, 7937 ultrasound reports were automatically processed to structure data within 40 min. The true positive rate of the model for each category of terms is 96%, 94%, 97%, 100%, 100%, 100%, and 97%, respectively. The CRF model can be used in Chinese natural language processing to provide support for unstructured data analysis. The standardized segmentation results can be obtained based on medical ontology libraries. 
However, real-time processing and scientific annotation remain a challenge if intelligent clinical decision making needs to be applied to a real-world clinical environment.</description><subject>Annotations</subject><subject>Carotid arteries</subject><subject>Conditional random fields</subject><subject>conditional random fields (CRF)</subject><subject>Data analysis</subject><subject>Data models</subject><subject>Decision making</subject><subject>Diseases</subject><subject>Hidden Markov models</subject><subject>Natural language processing</subject><subject>Natural language processing (NLP)</subject><subject>Segmentation</subject><subject>Sentences</subject><subject>Signs and symptoms</subject><subject>Standardization</subject><subject>Training</subject><subject>Ultrasonic imaging</subject><subject>Ultrasound</subject><subject>ultrasound report</subject><subject>Unstructured data</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkVtP3DAQhSNUpCLKL-DFUp9360t8e1yl0CKtCmLLszVrJktWIV7GCRL_Hm-DUP0y1tGcb-w5VXUp-FII7n-smuZqs1lKLvxSeqmkFCfVmRTGL5RW5st_96_VRc57Xo4rkrZnFa2mMT3D2EXo-ze2GWmK40TdsGNpYM1TN2BG9tCPBDlNwyO7x0OikaWWNUi4pfQKOU49EPvZZYSMmb12wP5AoUDP1jDsJtghu6MUMecC_ladttBnvPio59XD9dXf5vdiffvrplmtF7HmblyA0dZFGY3nWkVnhTbO6norPUArxRZVDSCOOvjolDVSRO8UVzVGx7VX59XNzH1MsA8H6p6B3kKCLvwTEu0CUPl4j0FI4zwoIYSNtda187a1W6-gbA44xsL6PrMOlF4mzGPYp4mG8vwgi8EIpZwtXWruipRyJmw_pwoejlmFOatwzCp8ZFVcl7OrQ8RPR8EZX1v1DrSxj28</recordid><startdate>2019</startdate><enddate>2019</enddate><creator>Chen, Pengyu</creator><creator>Liu, Qiao</creator><creator>Wei, Lan</creator><creator>Zhao, Beier</creator><creator>Jia, Yin</creator><creator>Lv, Hairong</creator><creator>Fei, Xiaolu</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-7498-0249</orcidid></search><sort><creationdate>2019</creationdate><title>Automatically Structuring on Chinese Ultrasound Report of Cerebrovascular Diseases via Natural Language Processing</title><author>Chen, Pengyu ; Liu, Qiao ; Wei, Lan ; Zhao, Beier ; Jia, Yin ; Lv, Hairong ; Fei, Xiaolu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-a6578c2c69053c871568754b29aaf21be34aa18715a9c837621c983034ec80593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Annotations</topic><topic>Carotid arteries</topic><topic>Conditional random fields</topic><topic>conditional random fields (CRF)</topic><topic>Data analysis</topic><topic>Data models</topic><topic>Decision making</topic><topic>Diseases</topic><topic>Hidden Markov models</topic><topic>Natural language processing</topic><topic>Natural language processing (NLP)</topic><topic>Segmentation</topic><topic>Sentences</topic><topic>Signs and symptoms</topic><topic>Standardization</topic><topic>Training</topic><topic>Ultrasonic imaging</topic><topic>Ultrasound</topic><topic>ultrasound report</topic><topic>Unstructured data</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Pengyu</creatorcontrib><creatorcontrib>Liu, Qiao</creatorcontrib><creatorcontrib>Wei, Lan</creatorcontrib><creatorcontrib>Zhao, Beier</creatorcontrib><creatorcontrib>Jia, Yin</creatorcontrib><creatorcontrib>Lv, Hairong</creatorcontrib><creatorcontrib>Fei, Xiaolu</creatorcontrib><collection>IEEE All-Society Periodicals Package 
(ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chen, Pengyu</au><au>Liu, Qiao</au><au>Wei, Lan</au><au>Zhao, Beier</au><au>Jia, Yin</au><au>Lv, Hairong</au><au>Fei, Xiaolu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Automatically Structuring on Chinese Ultrasound Report of Cerebrovascular Diseases via Natural Language Processing</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2019</date><risdate>2019</risdate><volume>7</volume><spage>89043</spage><epage>89050</epage><pages>89043-89050</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>The current ultrasound reports in Chinese hospitals are mostly written in free-text format. Important clinical information, such as stenosis rate and plaque location, is recorded in long sentences, especially for ultrasound reports of cerebrovascular diseases. 
They cannot be directly used for further automatic analysis due to the lack of structure and standardization. The goal of this paper is to assess the feasibility of applying natural language processing technology to automatically extract disease entities and relate information such as the stenosis rate and plaque location from free-text ultrasound reports of cerebrovascular diseases. A structured model using conditional random fields (CRFs) is first constructed. Then, the clause optimizing and segmentation process is performed on a word level to achieve data structuring. Seven categories of terms, including symptoms, plaque locations, diseases, and degree, in 1980 de-identified ultrasound reports were manually annotated as a training dataset. With this model, 7937 ultrasound reports were automatically processed to structure data within 40 min. The true positive rate of the model for each category of terms is 96%, 94%, 97%, 100%, 100%, 100%, and 97%, respectively. The CRF model can be used in Chinese natural language processing to provide support for unstructured data analysis. The standardized segmentation results can be obtained based on medical ontology libraries. However, real-time processing and scientific annotation remain a challenge if intelligent clinical decision making needs to be applied to a real-world clinical environment.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2019.2923221</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0001-7498-0249</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2019, Vol.7, p.89043-89050
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2455613387
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Annotations
Carotid arteries
Conditional random fields
conditional random fields (CRF)
Data analysis
Data models
Decision making
Diseases
Hidden Markov models
Natural language processing
Natural language processing (NLP)
Segmentation
Sentences
Signs and symptoms
Standardization
Training
Ultrasonic imaging
Ultrasound
ultrasound report
Unstructured data
title Automatically Structuring on Chinese Ultrasound Report of Cerebrovascular Diseases via Natural Language Processing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T18%3A42%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatically%20Structuring%20on%20Chinese%20Ultrasound%20Report%20of%20Cerebrovascular%20Diseases%20via%20Natural%20Language%20Processing&rft.jtitle=IEEE%20access&rft.au=Chen,%20Pengyu&rft.date=2019&rft.volume=7&rft.spage=89043&rft.epage=89050&rft.pages=89043-89050&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2019.2923221&rft_dat=%3Cproquest_doaj_%3E2455613387%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2455613387&rft_id=info:pmid/&rft_ieee_id=8736947&rft_doaj_id=oai_doaj_org_article_12689a31117c4554897f7b93a216a0ec&rfr_iscdi=true