ArRaNER: A novel named entity recognition model for biomedical literature documents

Bibliographic details
Published in:	The Journal of supercomputing 2022, Vol.78 (14), p.16498-16511
Main authors:	Ramachandran, R., Arutchelvan, K.
Format:	Article
Language:	English
Subjects:	Artificial intelligence; Compilers; Computer architecture; Computer Science; Data mining; Datasets; Documents; Domains; Encyclopedias; Interpreters; Machine learning; Machine learning in Intelligent Autonomous Systems; Model accuracy; Natural language processing; Pharmaceuticals; Processor Architectures; Programming Languages; Recognition
Online access:	Full text

Description
Summary:	Developments in advanced innovations have prompted the generation of an immense amount of digital information. The data deluge contains hidden information that is difficult to extract. In the biomedical domain, the development of technology has caused the production of voluminous data. Processing these voluminous textual data is referred to as ‘biomedical content mining’. Emerging artificial intelligence (AI) models play a major role in the automation of Pharma 4.0. In AI, natural language processing (NLP) plays a dynamic role in extracting knowledge from biomedical documents. Research articles published by scientists and researchers contain an enormous amount of hidden information. Most of the original and peer-reviewed articles are indexed in PubMed. Extracting meaningful information from a large number of literature documents is very difficult for human beings. This research aims to extract the named entities of literature documents available in the life science domain. A high-level architecture is proposed along with a novel named entity recognition (NER) model. The model is built using rule-based machine learning (ML). The proposed ArRaNER model produced better accuracy and was also able to identify more entities. The NER model was tested on two different datasets: a PubMed dataset and a Wikipedia talk dataset. The ArRaNER model obtains an accuracy of 83.42% on the PubMed articles and 77.65% on the Wikipedia articles.
ISSN:	0920-8542, 1573-0484
DOI:	10.1007/s11227-022-04527-y