Defining a state-of-the-art POS-tagging environment for Brazilian Portuguese clinical texts
Published in: | Research on Biomedical Engineering 2020-09, Vol.36 (3), p.267-276 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Purpose
Natural language processing techniques are essential for unlocking patients’ data from electronic health records. An important NLP task is the ability to recognize morphosyntactic information from the texts, a process called part-of-speech (POS) tagging. Currently, neural network architectures are the state-of-the-art method, although there is a lack of studies exploiting this approach within Brazilian Portuguese clinical texts. The objective of this study is to define a state-of-the-art POS-tagging environment for Brazilian Portuguese clinical texts.
Methods
We reviewed multiple neural network-based POS-tagging algorithms and selected the Flair tool for its exceptional performance in the journalistic domain, as there is no algorithm specific to Portuguese clinical texts. We normalized the available corpora from multiple domains (two journalistic, one biomedical, one clinical, and a new corpus combining all three domains). Flair was trained on each corpus, yielding five models, each of which was evaluated on all domains.
Results
The clinical model achieved 92.39% accuracy (previous POS-tagging clinical work reached 91.5%); the biomedical model achieved 97.9% accuracy. All the models were assessed on their own test set.
Conclusion
We developed a new state-of-the-art modeling environment for POS tagging of Brazilian Portuguese clinical texts and achieved results comparable to those of other state-of-the-art studies in journalistic contexts. |
---|---|
ISSN: | 2446-4732 2446-4740 |
DOI: | 10.1007/s42600-020-00067-7 |
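
The accuracy figures quoted in the abstract can be read as token-level accuracy, that is, the share of tokens whose predicted tag matches the gold tag. This is the usual convention for POS tagging, but the record does not spell out the metric, so treat it as an assumption. The following minimal Python sketch shows that computation; the function name `token_accuracy`, the tag labels, and the toy data are hypothetical and are not taken from the article or from Flair.

```python
from typing import Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]
) -> float:
    """Return the share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence needs one predicted tag per gold tag")
        # Count positions where the predicted tag agrees with the gold tag.
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Hypothetical gold and predicted tags for two short sentences (toy data only).
gold_tags = [["N", "V", "N"], ["N", "ADJ"]]
pred_tags = [["N", "V", "ADJ"], ["N", "ADJ"]]

accuracy = token_accuracy(gold_tags, pred_tags)
print(f"{accuracy:.2%}")  # 4 of 5 tags match, so this prints 80.00%
```

Under this reading, scoring one model on every domain's test set amounts to calling the same function once per test set with that model's predictions.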