Domain specific word embeddings for natural language processing in radiology

Detailed description

Bibliographic details
Published in:	Journal of biomedical informatics 2021-01, Vol.113, p.103665-103665, Article 103665
Main authors:	Chen, Timothy L., Emerling, Max, Chaudhari, Gunvant R., Chillakuru, Yeshwant R., Seo, Youngho, Vu, Thienkhai H., Sohn, Jae Ho
Format:	Article
Language:	English
Subjects:	Analogy completion; Computer Science; Computer Science, Interdisciplinary Applications; Life Sciences & Biomedicine; Machine Learning; Medical Informatics; Multi-label classification; Natural Language Processing; Radiology; Science & Technology; Semantics; Technology; Unified Medical Language System; Word embeddings
Online access:	Full text

Description
Abstract:
• Radiopaedia can be used as a domain-specific corpus in radiology NLP tasks.
• Domain-specific embeddings offer comparable performance on analogy completion.
• Domain-specific embeddings did significantly better on multi-label classification.
• The source code, embeddings, and analogy dataset are publicly released.

There has been increasing interest in machine learning based natural language processing (NLP) methods in radiology; however, models have often used word embeddings trained on general web corpora due to the lack of a radiology-specific corpus. We examined the potential of Radiopaedia to serve as a general radiology corpus to produce radiology-specific word embeddings that could be used to enhance performance on an NLP task on radiological text. Embeddings of dimension 50, 100, 200, and 300 were trained on articles collected from Radiopaedia using a GloVe algorithm and evaluated on analogy completion. A shallow neural network using input from either our trained embeddings or pre-trained Wikipedia 2014 + Gigaword 5 (WG) embeddings was used to label the Radiopaedia articles. Labeling performance was evaluated based on exact match accuracy and Hamming loss. McNemar’s test with continuity correction and the Benjamini-Hochberg correction, and a 5×2 cross validation paired two-tailed t-test, were used to assess statistical significance. For accuracy in the analogy task, 50-dimensional (50-D) Radiopaedia embeddings outperformed WG embeddings on tumor origin analogies (p
ISSN:	1532-0464 1532-0480
DOI:	10.1016/j.jbi.2020.103665
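
Note on the evaluation metrics: the abstract names exact match accuracy and Hamming loss as the measures of multi-label labeling performance. The following is a minimal sketch of how these two metrics are commonly computed, not the authors' released code. The function names and the toy label matrices are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def exact_match_accuracy(y_true: Sequence[Sequence[int]],
                         y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose full predicted label vector equals the true one."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = 0
    total = 0
    for t, p in zip(y_true, y_pred):
        if len(t) != len(p):
            raise ValueError("each sample needs the same number of labels")
        wrong += sum(1 for a, b in zip(t, p) if a != b)
        total += len(t)
    return wrong / total


# Hypothetical toy data: 4 articles, 3 possible labels each (1 = label applies).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 0]]

print(exact_match_accuracy(y_true, y_pred))  # 2 of 4 rows match exactly
print(hamming_loss(y_true, y_pred))          # 3 of 12 label slots are wrong
```

Exact match accuracy is the stricter of the two, since a single wrong label makes the whole sample count as a miss, while Hamming loss gives partial credit per label. That difference is a plausible reason to report both.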