SNER: Semi-supervised Named Entity Recognition for Large Volume of Diabetes Data

Bibliographic details
Published in: IEEE Journal of Biomedical and Health Informatics, 2024-06, Vol. PP, pp. 1-14
Main authors: Zuo, Jingyi, Qian, Qijie, Liu, Yun, Lu, Shan, Li, Bo, Guo, Yongan
Format: Article
Language: English
Description
Abstract: The medical literature and records on diabetes provide crucial resources for diabetes prevention and treatment. However, extracting entities from these textual diabetes data is important but challenging. Named entity recognition (NER), a cornerstone technology of natural language processing, has been well studied in the general medical field. However, effective NER methods for diabetes data are still lacking. In the real world, there are three main challenges: 1) the large volume of diabetes-related data to be processed, 2) the lack of labeled data, and 3) the high cost of manual labeling. To mitigate these challenges, this paper proposes SNER, a novel NER method based on semi-supervised learning, for diabetes data processing. It exploits large amounts of unlabeled data to compensate for the lack of labeled data. Specifically, it filters the predicted labels based on their confidence and uncertainty scores to reduce the noise entering the model, and divides the retained labels into positive pseudo-labels and negative pseudo-labels. It also makes reasoned use of the negative pseudo-labels to improve the effectiveness of pseudo-label training. Experiments on two public diabetes datasets show that SNER outperforms existing state-of-the-art models.
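The abstract does not say how the confidence and uncertainty scores are computed or how negative pseudo-labels enter training. The sketch below illustrates one common way to do both, in the style of uncertainty-aware pseudo-label selection (UPS; Rizve et al., ICLR 2021); it should not be read as SNER's actual algorithm. It assumes K stochastic predictions per token (for example, from dropout left on at inference), takes the mean class probability as confidence and its standard deviation as uncertainty, and uses hypothetical thresholds `tau_p`, `kappa_p`, `tau_n`, and `kappa_n`. All class and function names are illustrative.

```python
import math
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class TokenPseudoLabels:
    positive: Optional[int]                              # confident class index, or None
    negatives: List[int] = field(default_factory=list)   # classes the token is confidently NOT


def select_pseudo_labels(mc_probs: List[List[float]],
                         tau_p: float = 0.9, kappa_p: float = 0.05,
                         tau_n: float = 0.05, kappa_n: float = 0.01) -> TokenPseudoLabels:
    """Select pseudo-labels for one token from K stochastic predictions.

    mc_probs holds K probability vectors over the same label set, e.g. from
    K forward passes with dropout enabled (an assumption, not from the paper).
    Thresholds are hypothetical defaults.
    """
    num_classes = len(mc_probs[0])
    # Confidence: mean probability of each class across the K passes.
    conf = [mean([p[c] for p in mc_probs]) for c in range(num_classes)]
    # Uncertainty: population standard deviation of that probability.
    unc = [pstdev([p[c] for p in mc_probs]) for c in range(num_classes)]

    best = max(range(num_classes), key=lambda c: conf[c])
    # Positive pseudo-label: the top class must be both confident and stable.
    positive = best if conf[best] >= tau_p and unc[best] <= kappa_p else None
    # Negative pseudo-labels: other classes that are reliably improbable.
    negatives = [c for c in range(num_classes)
                 if c != best and conf[c] <= tau_n and unc[c] <= kappa_n]
    return TokenPseudoLabels(positive, negatives)


def pseudo_label_loss(probs: List[float], labels: TokenPseudoLabels,
                      eps: float = 1e-12) -> float:
    """Loss for one token given the model's current class probabilities."""
    loss = 0.0
    if labels.positive is not None:
        # Positive pseudo-label: ordinary cross-entropy, -log p_c.
        loss -= math.log(probs[labels.positive] + eps)
    for c in labels.negatives:
        # Negative pseudo-label ("not class c"): -log(1 - p_c) pushes p_c toward 0.
        loss -= math.log(1.0 - probs[c] + eps)
    return loss


if __name__ == "__main__":
    confident = [[0.95, 0.03, 0.02], [0.93, 0.04, 0.03], [0.96, 0.02, 0.02]]
    ambiguous = [[0.60, 0.38, 0.02], [0.20, 0.78, 0.02], [0.50, 0.48, 0.02]]
    print(select_pseudo_labels(confident))  # TokenPseudoLabels(positive=0, negatives=[1, 2])
    print(select_pseudo_labels(ambiguous))  # TokenPseudoLabels(positive=None, negatives=[2])
```

In the ambiguous case no positive label passes the filter, yet class 2 still yields a negative pseudo-label. This is the kind of extra training signal that negative pseudo-labels can recover from tokens that positive-only filtering would discard.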
ISSN:2168-2194
2168-2208
DOI:10.1109/JBHI.2024.3412716