De-identification of medical records using conditional random fields and long short-term memory networks

Bibliographic details
Published in:	Journal of biomedical informatics 2017-11, Vol. 75, pp. S43-S53
Main authors:	Jiang, Zhipeng; Zhao, Chao; He, Bin; Guan, Yi; Jiang, Jingchi
Format:	Article
Language:	English
Subjects:	Conditional random fields; De-identification; Long short-term memory networks; Protected health information
Online access:	Full text

Description
Highlights:
• The described LSTM model attains an F1 measure of 0.8986 in the CEGS N-GRID 2016 Shared Task.
• The LSTM-based model attains a higher F1 measure than the CRF-based model.
• Accurate sentence detection and tokenization can significantly improve performance.
Abstract:	The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing focuses on the de-identification of psychiatric evaluation records. This paper describes our team's two participating systems, based on conditional random fields (CRFs) and long short-term memory networks (LSTMs). A pre-processing module performed sentence detection and tokenization before de-identification. For CRFs, rich manually extracted features were used to train the model. For LSTMs, a character-level bi-directional LSTM network was applied to represent tokens and classify the tag of each token, after which a decoding layer was stacked to decode the most probable protected health information (PHI) terms. The LSTM-based system attained an i2b2 strict micro-F1 measure of 0.8986, which was higher than that of the CRF-based system.
ISSN:	1532-0464 1532-0480
DOI:	10.1016/j.jbi.2017.10.003
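
The abstract reports a strict micro-F1 measure. As a rough illustration of what such a score measures, the sketch below counts a predicted PHI span as correct only when its document, character offsets, and PHI type all match a gold span exactly, and pools the counts over all documents before computing F1. This is a simplified assumption about the metric, not the official i2b2 evaluation script; the `Span` layout, the function name `strict_micro_f1`, and the sample spans are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span layout: (document id, start offset, end offset, PHI type).
Span = Tuple[str, int, int, str]


def strict_micro_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> float:
    """Micro-averaged F1 where a prediction counts only on an exact match.

    Assumption: "strict" means document, offsets, and PHI type must all agree.
    Counts are pooled over every document before precision/recall are computed.
    """
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Made-up example spans, for illustration only.
gold = [
    ("doc1", 0, 10, "NAME"),
    ("doc1", 25, 35, "DATE"),
    ("doc2", 5, 12, "CITY"),
]
pred = [
    ("doc1", 0, 10, "NAME"),   # exact match
    ("doc1", 25, 34, "DATE"),  # end offset differs by one, so it does not count
    ("doc2", 5, 12, "CITY"),   # exact match
    ("doc2", 40, 44, "AGE"),   # spurious prediction
]
score = strict_micro_f1(gold, pred)
print(f"{score:.4f}")  # precision 2/4, recall 2/3, F1 = 4/7
```

Under this strict definition a boundary error of a single character costs both a false positive and a false negative, which is one plausible reason the highlights stress accurate sentence detection and tokenization.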