Weighted Co-Occurrence Bio-term Graph for Unsupervised Word Sense Disambiguation in the Biomedical Domain

Bibliographic details
Published in:	IEEE access 2023-01, Vol.11, p.1-1
Main authors:	Zhang, Zhenling; Jia, Yangli; Zhang, Xiangliang; Papadopoulou, Maria; Roche, Christophe
Format:	Article
Language:	eng
Keywords:	Abstracts; Algorithms; Biological system modeling; Biomedical informatics; Biomedical Natural language processing; Bit error rate; Corpus linguistics; Domains; Natural language processing; Neural networks; Personalised PageRank algorithm; Search algorithms; Task analysis; Transformers; Unified medical language system; Unified modeling language; Word sense disambiguation
Online access:	Full text

Description
Abstract:	Word Sense Disambiguation (WSD) is a significant and challenging task for text understanding and processing. This paper presents an unsupervised approach based on a weighted co-occurrence bio-term graph (WCOTG) for performing WSD in the biomedical domain. The graph is automatically created from biomedical terms that are extracted from a corpus of downloaded scientific abstracts. Two kinds of weights are introduced on the links of the bio-term graph and are taken as important factors in the disambiguation process. A modified Personalised PageRank (PPR) algorithm is used to perform WSD. When evaluated on the NLM-WSD and MSH-WSD test datasets and an acronym test set, the method outperforms the widely used unsupervised methods addressing the same problem, and its average result is almost equal to that of the BlueBERT_LE-based method. Unlike that method, ours requires no additional enhancement or training of BERT-based models. Comparative experiments validate the positive effect of link weights on disambiguation efficiency. Finally, statistical experiments on the relation among system accuracy, the number of medical abstracts in the corpus, and the corresponding extracted terms suggest a suitable minimum corpus size when resources are limited.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2023.3272056
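
Illustrative note: the abstract describes disambiguation by running Personalised PageRank over a weighted co-occurrence term graph. The sketch below shows only the general idea, using the standard (unmodified) weighted PPR with a single edge weight. It is not the authors' modified algorithm and does not reproduce their two kinds of link weights. All function names, terms, and weights here are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_graph(weighted_edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in weighted_edges:
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return dict(graph)


def personalized_pagerank(graph, seeds, damping=0.85, iterations=100):
    """Standard weighted PPR by power iteration; restarts go only to seed nodes."""
    seed_set = {s for s in seeds if s in graph}
    if not seed_set:
        raise ValueError("no seed term occurs in the graph")
    restart = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in graph}
    rank = dict(restart)
    # Total edge weight per node, used to normalise outgoing transitions.
    out_weight = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in graph}
        for n, nbrs in graph.items():
            for m, w in nbrs.items():
                new_rank[m] += damping * rank[n] * w / out_weight[n]
        rank = new_rank
    return rank


def disambiguate(graph, context_terms, candidate_senses):
    """Pick the candidate sense with the highest PPR score given the context."""
    rank = personalized_pagerank(graph, context_terms)
    return max(candidate_senses, key=lambda s: rank.get(s, 0.0))


# Hypothetical toy data: weights stand in for co-occurrence strength.
EDGES = [
    ("common cold", "virus", 3.0),
    ("common cold", "cough", 2.0),
    ("virus", "cough", 1.0),
    ("cold temperature", "frostbite", 3.0),
    ("cold temperature", "hypothermia", 2.0),
    ("cough", "hypothermia", 0.5),
]
GRAPH = build_graph(EDGES)
SENSES = ["common cold", "cold temperature"]

print(disambiguate(GRAPH, ["virus", "cough"], SENSES))
print(disambiguate(GRAPH, ["frostbite", "hypothermia"], SENSES))
```

In this sketch the context terms around an ambiguous word act as the restart (personalisation) nodes, so rank mass concentrates on the senses that are most strongly linked to that context. Heavier edges pass on a larger share of a node's rank, which is one simple way link weights can influence the outcome.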