Classifying Unstructured Clinical Notes via Automatic Weak Supervision
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Healthcare providers usually record detailed notes of the clinical care
delivered to each patient for clinical, research, and billing purposes. Due to
the unstructured nature of these narratives, providers employ dedicated staff
to assign diagnostic codes to patients' diagnoses using the International
Classification of Diseases (ICD) coding system. This manual process is not only
time-consuming but also costly and error-prone. Prior work demonstrated
potential utility of Machine Learning (ML) methodology in automating this
process, but it has relied on large quantities of manually labeled data to
train the models. Additionally, diagnostic coding systems evolve with time,
which makes traditional supervised learning strategies unable to generalize
beyond local applications. In this work, we introduce a general
weakly-supervised text classification framework that learns from class-label
descriptions only, without the need to use any human-labeled documents. It
leverages the linguistic domain knowledge stored within pre-trained language
models and the data programming framework to assign code labels to individual
texts. We demonstrate the efficacy and flexibility of our method by comparing
it to state-of-the-art weak text classifiers across four real-world text
classification datasets, in addition to assigning ICD codes to medical notes in
the publicly available MIMIC-III database. |
---|---|
DOI: | 10.48550/arxiv.2206.12088 |