Neural Language Models with Distant Supervision to Identify Major Depressive Disorder from Clinical Notes
Main authors: | |
---|---|
Format: | Article |
Language: | English |
Summary: | Major depressive disorder (MDD) is a prevalent psychiatric disorder that is
associated with a significant healthcare burden worldwide. Phenotyping of MDD can
support early diagnosis and may consequently offer significant advantages in
patient management. Prior research has extracted MDD phenotypes from
structured Electronic Health Records (EHR) or from electroencephalographic
(EEG) data using traditional machine learning models.
However, MDD phenotypic information is also documented in free-text EHR data,
such as clinical notes. While clinical notes may provide more accurate
phenotyping information, natural language processing (NLP) algorithms must be
developed to abstract such information. Recent advances in NLP have produced
state-of-the-art neural language models such as Bidirectional Encoder
Representations from Transformers (BERT), a transformer-based
model that can be pre-trained on a corpus of unlabeled text and then
fine-tuned on specific tasks. However, such neural language models have been
underutilized in clinical NLP tasks due to the lack of large training datasets.
In the literature, researchers have utilized the distant supervision paradigm
to train machine learning models on clinical text classification tasks to
mitigate the shortage of annotated training data. It is still unknown
whether the paradigm is effective for neural language models. In this paper, we
propose to leverage neural language models in a distant supervision
paradigm to identify MDD phenotypes from clinical notes. The experimental
results indicate that our proposed approach is effective in identifying MDD
phenotypes and that Bio-Clinical BERT, a BERT model specialized for clinical
data, achieved the best performance compared with conventional machine
learning models. |
---|---|
DOI: | 10.48550/arxiv.2104.09644 |
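
The distant supervision paradigm mentioned in the summary can be illustrated with a minimal sketch: rules assign noisy labels to unannotated notes, and a classifier is then trained on those labels instead of manual annotations. Everything below is an assumption made for illustration. The keyword rules, the example notes, and the names `weak_label` and `NaiveBayes` are hypothetical and are not taken from the paper, and a tiny naive Bayes classifier stands in for the fine-tuned BERT models the authors evaluate.

```python
import math
import re
from collections import Counter

# Hypothetical labeling rules (assumption): not the rules used in the paper.
POSITIVE_PATTERNS = [r"\bmajor depressive disorder\b", r"\bmdd\b"]
NEGATION_PATTERN = r"\b(no|denies|without|negative for)\b[^.]*\b(depress\w*|mdd)\b"


def weak_label(note):
    """Return a noisy, rule-derived label: 1 for an MDD mention, otherwise 0."""
    text = note.lower()
    if re.search(NEGATION_PATTERN, text):
        return 0
    return int(any(re.search(p, text) for p in POSITIVE_PATTERNS))


def tokenize(note):
    """Split a note into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", note.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes; a stand-in for a fine-tuned language model."""

    def fit(self, notes, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for note, label in zip(notes, labels):
            self.word_counts[label].update(tokenize(note))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, note):
        total = sum(self.class_counts.values())
        scores = {}
        for c in (0, 1):
            n_words = sum(self.word_counts[c].values())
            # Log prior with add-one smoothing over the two classes.
            score = math.log((self.class_counts[c] + 1) / (total + 2))
            for tok in tokenize(note):
                if tok in self.vocab:
                    # Log likelihood with add-one smoothing over the vocabulary.
                    score += math.log(
                        (self.word_counts[c][tok] + 1) / (n_words + len(self.vocab))
                    )
            scores[c] = score
        return max(scores, key=scores.get)


# Made-up, unannotated example notes (assumption).
notes = [
    "Patient diagnosed with major depressive disorder, started sertraline.",
    "History of MDD with recurrent episodes.",
    "Patient denies depression; here for knee pain.",
    "Follow-up for hypertension, blood pressure controlled.",
]
labels = [weak_label(n) for n in notes]  # Step 1: rules produce the training labels.
model = NaiveBayes().fit(notes, labels)  # Step 2: train on the weak labels only.
prediction = model.predict("Recurrent episodes of low mood, started sertraline.")
```

The point of the second step is that the trained model can pick up on context words that co-occur with the rule matches (here, for instance, "sertraline" or "recurrent"), so it may label notes that none of the rules cover. Whether that generalization holds in practice depends on the quality of the rules and the model.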