Mining heterogeneous clinical notes by multi-modal latent topic model

Latent knowledge can be extracted from the electronic notes that are recorded during patient encounters with the health system. Using these clinical notes to decipher a patient's underlying comorbidites, symptom burdens, and treatment courses is an ongoing challenge. Latent topic model as an ef...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:PloS one 2021-04, Vol.16 (4), p.e0249622-e0249622
Hauptverfasser: Wen, Zhi, Nair, Pratheeksha, Deng, Chih-Ying, Lu, Xing Han, Moseley, Edward, George, Naomi, Lindvall, Charlotta, Li, Yue
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Latent knowledge can be extracted from the electronic notes that are recorded during patient encounters with the health system. Using these clinical notes to decipher a patient's underlying comorbidites, symptom burdens, and treatment courses is an ongoing challenge. Latent topic model as an efficient Bayesian method can be used to model each patient's clinical notes as "documents" and the words in the notes as "tokens". However, standard latent topic models assume that all of the notes follow the same topic distribution, regardless of the type of note or the domain expertise of the author (such as doctors or nurses). We propose a novel application of latent topic modeling, using multi-note topic model (MNTM) to jointly infer distinct topic distributions of notes of different types. We applied our model to clinical notes from the MIMIC-III dataset to infer distinct topic distributions over the physician and nursing note types. Based on manual assessments made by clinicians, we observed a significant improvement in topic interpretability using MNTM modeling over the baseline single-note topic models that ignore the note types. Moreover, our MNTM model led to a significantly higher prediction accuracy for prolonged mechanical ventilation and mortality using only the first 48 hours of patient data. By correlating the patients' topic mixture with hospital mortality and prolonged mechanical ventilation, we identified several diagnostic topics that are associated with poor outcomes. Because of its elegant and intuitive formation, we envision a broad application of our approach in mining multi-modality text-based healthcare information that goes beyond clinical notes. Code available at https://github.com/li-lab-mcgill/heterogeneous_ehr.
ISSN:1932-6203
1932-6203
DOI:10.1371/journal.pone.0249622