Chinese clinical named entity recognition with variant neural structures based on BERT methods

Bibliographic Details
Published in:	Journal of Biomedical Informatics, 2020-07, Vol. 107, p. 103422, Article 103422
Main authors:	Li, Xiangyang; Zhang, Huan; Zhou, Xiao-Hua
Format:	Article
Language:	English
Subjects:	BERT; Clinical named entity recognition; Computer Science; Computer Science, Interdisciplinary Applications; CRF; Life Sciences & Biomedicine; LSTM; Medical Informatics; Science & Technology; Technology
Online access:	Full text

Description
Abstract:
• Obtain a pre-trained BERT model of Chinese clinical records, publicly available to the community.
• Incorporate dictionary features and radical features into a deep learning model, BERT + BiLSTM + CRF.
• Outperform all other methods on the CCKS-2017 and CCKS-2018 clinical named entity recognition datasets.

Clinical Named Entity Recognition (CNER) is a critical task that aims to identify and classify clinical terms in electronic medical records. In recent years, deep neural networks have achieved significant success in CNER. However, these methods require large-scale, high-quality labeled clinical data, which is challenging and expensive to obtain, especially for Chinese clinical records. To tackle the Chinese CNER task, we pre-train a BERT model on unlabeled Chinese clinical records, which lets the model leverage domain-specific knowledge from unlabeled text. Different layers such as Long Short-Term Memory (LSTM) and Conditional Random Field (CRF) are used to extract the text features and decode the predicted tags, respectively. In addition, we propose a new strategy to incorporate dictionary features into the model. Radical features of Chinese characters are used to improve model performance as well. To the best of our knowledge, our ensemble model outperforms state-of-the-art models, achieving an 89.56% strict F1 score on the CCKS-2018 dataset and a 91.60% F1 score on the CCKS-2017 dataset.
ISSN:	1532-0464 1532-0480
DOI:	10.1016/j.jbi.2020.103422
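
The abstract reports a "strict" F1 score without defining it. A minimal sketch of one common reading follows, assuming that a predicted entity counts as correct only when its start, end, and type all match a gold entity exactly, and assuming BIO-style tags. The function names and the SYMPTOM/DRUG labels are hypothetical and are not taken from the article or from the CCKS evaluation scripts.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, type) spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # A stray "I-" tag with no open entity is ignored (assumption).
    return spans


def strict_f1(gold: List[str], pred: List[str]) -> float:
    """F1 over entities, counting only exact boundary-and-type matches."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the second entity is predicted one character short,
# so under strict matching it earns no credit.
gold = ["B-SYMPTOM", "I-SYMPTOM", "O", "B-DRUG", "I-DRUG", "I-DRUG"]
pred = ["B-SYMPTOM", "I-SYMPTOM", "O", "B-DRUG", "I-DRUG", "O"]
print(strict_f1(gold, pred))
```

Under this definition a partially overlapping prediction is penalized twice, once as a false positive and once as a false negative, which is why strict scores tend to be lower than relaxed, overlap-based ones.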