SNER: Semi-supervised Named Entity Recognition for Large Volume of Diabetes Data

Bibliographic Details
Published in:	IEEE journal of biomedical and health informatics 2024-06, Vol.PP, p.1-14
Main authors:	Zuo, Jingyi; Qian, Qijie; Liu, Yun; Lu, Shan; Li, Bo; Guo, Yongan
Format:	Article
Language:	eng
Subjects:	Bioinformatics; Diabetes; Dictionaries; Logic gates; Long short term memory; medical text; named entity recognition; pseudo-label; semi-supervised learning; Task analysis; Training

Description
Abstract:	The medical literature and records on diabetes provide crucial resources for diabetes prevention and treatment. However, extracting entities from this textual diabetes data is important but challenging. Named entity recognition (NER), a cornerstone technology of natural language processing, has been well studied in the general medical field, yet effective NER methods for diabetes data are still lacking. In practice there are three challenges: 1) the large volume of diabetes-related data to be processed, 2) the lack of labeled data, and 3) the high cost of manual labeling. To mitigate these challenges, this paper proposes SNER, a novel NER method based on semi-supervised learning, for diabetes data processing. It uses large amounts of unlabeled data to compensate for the lack of labeled data. Specifically, it filters the predicted labels by their confidence and uncertainty scores to reduce the noise entering the model, and divides them into positive pseudo-labels and negative pseudo-labels. It also makes use of the negative pseudo-labels to improve the effectiveness of pseudo-label training. Experiments on two public diabetes datasets show that SNER outperforms existing state-of-the-art models.
ISSN:	2168-2194, 2168-2208
DOI:	10.1109/JBHI.2024.3412716
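
Illustrative note: the abstract describes filtering predicted labels by confidence and uncertainty and splitting them into positive and negative pseudo-labels. The record gives no formulas, so the following is only a minimal sketch of that general idea, not the authors' implementation. It assumes that uncertainty is estimated as the standard deviation of a class probability over several stochastic forward passes, and the function name `select_pseudo_labels` and the thresholds `pos_conf`, `neg_conf`, and `max_unc` are hypothetical.

```python
from statistics import mean, pstdev


def select_pseudo_labels(prob_samples, pos_conf=0.9, neg_conf=0.05, max_unc=0.05):
    """Split one token's predictions into positive and negative pseudo-labels.

    prob_samples: list of probability vectors for the same token, one per
    stochastic forward pass (an assumed way to obtain an uncertainty score).
    Returns (positive_class_or_None, list_of_negative_classes).
    All thresholds are hypothetical values chosen for illustration.
    """
    num_classes = len(prob_samples[0])
    positive = None
    negatives = []
    for c in range(num_classes):
        scores = [p[c] for p in prob_samples]
        confidence = mean(scores)      # average probability of class c
        uncertainty = pstdev(scores)   # spread across the forward passes
        if uncertainty > max_unc:
            continue                   # too noisy: use as neither kind of label
        if confidence >= pos_conf:
            positive = c               # confidently "the token IS class c"
        elif confidence <= neg_conf:
            negatives.append(c)        # confidently "the token is NOT class c"
    return positive, negatives


# A token the model agrees on across passes, and one it does not.
stable = [[0.95, 0.03, 0.02], [0.93, 0.04, 0.03], [0.96, 0.02, 0.02]]
unstable = [[0.90, 0.05, 0.05], [0.30, 0.60, 0.10], [0.60, 0.30, 0.10]]

print(select_pseudo_labels(stable))    # (0, [1, 2])
print(select_pseudo_labels(unstable))  # (None, [])
```

In this sketch a token with unstable predictions contributes no pseudo-labels at all, which is one simple way to keep noisy predictions out of training; how SNER actually scores uncertainty and uses the negative pseudo-labels is described in the full text of the paper, not in this record.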