Comparison of named entity recognition methodologies in biomedical documents

Bibliographic details
Published in:	Biomedical engineering online 2018-11, Vol.17 (Suppl 2), p.158-158, Article 158
Main authors:	Song, Hye-Jeong; Jo, Byeong-Cheol; Park, Chan-Young; Kim, Jong-Dae; Kim, Yu-Seop
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial intelligence; Artificial neural networks; Base sequence; Biomedical named entity recognition (Bio NER); Biomedical Research; Conditional random fields (CRFs); Construction costs; Data Mining - methods; Deoxyribonucleic acid; Dictionaries; DNA; Documentation; Embedding; Language; Learning; Learning algorithms; Linguistics; Machine learning; Natural language processing; Neural networks; Neural Networks, Computer; Performance evaluation; Proteins; Recognition; Recurrent neural network (RNN); Ribonucleic acid; RNA; Teaching methods; Word embedding
Online access:	Full text

Description
Abstract:	Biomedical named entity recognition (Bio-NER) is a fundamental task in handling biomedical text terms, such as RNA, protein, cell type, cell line, and DNA. Bio-NER is one of the most elementary and core tasks in biomedical knowledge discovery from texts. The system described here is developed using the BioNLP/NLPBA 2004 shared task. Experiments are conducted on a training and evaluation set provided by the task organizers. Our results show that, compared with a baseline having a 70.09% F1 score, the RNN Jordan- and Elman-type algorithms have F1 scores of approximately 60.53% and 58.80%, respectively. When we use CRF as the machine learning algorithm, CCA, GloVe, and Word2Vec have F1 scores of 72.73%, 72.74%, and 72.82%, respectively. By using word embeddings constructed through unsupervised learning, the time and cost required to construct the learning data can be saved.
ISSN:	1475-925X
DOI:	10.1186/s12938-018-0573-6