Automatically Structuring on Chinese Ultrasound Report of Cerebrovascular Diseases via Natural Language Processing

Bibliographic Details
Published in:	IEEE Access 2019, Vol. 7, p. 89043-89050
Main authors:	Chen, Pengyu; Liu, Qiao; Wei, Lan; Zhao, Beier; Jia, Yin; Lv, Hairong; Fei, Xiaolu
Format:	Article
Language:	English
Subjects:	Annotations; Carotid arteries; Conditional random fields (CRF); Data analysis; Data models; Decision making; Diseases; Hidden Markov models; Natural language processing (NLP); Segmentation; Sentences; Signs and symptoms; Standardization; Training; Ultrasonic imaging; Ultrasound; Ultrasound report; Unstructured data
Online access:	Full text

Description
Summary:	The current ultrasound reports in Chinese hospitals are mostly written in free-text format. Important clinical information, such as stenosis rate and plaque location, is recorded in long sentences, especially in ultrasound reports of cerebrovascular diseases. These reports cannot be used directly for further automatic analysis because they lack structure and standardization. The goal of this paper is to assess the feasibility of applying natural language processing technology to automatically extract disease entities and related information, such as the stenosis rate and plaque location, from free-text ultrasound reports of cerebrovascular diseases. A structured model using conditional random fields (CRFs) is first constructed. Clause optimization and segmentation are then performed at the word level to structure the data. Seven categories of terms, including symptoms, plaque locations, diseases, and degree, were manually annotated in 1,980 de-identified ultrasound reports to form a training dataset. With this model, 7,937 ultrasound reports were automatically processed into structured data within 40 minutes. The true positive rates of the model for the seven categories of terms are 96%, 94%, 97%, 100%, 100%, 100%, and 97%, respectively. The CRF model can be used in Chinese natural language processing to support unstructured data analysis. Standardized segmentation results can be obtained based on medical ontology libraries. However, real-time processing and scientific annotation remain challenges if intelligent clinical decision making is to be applied in a real-world clinical environment.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2019.2923221