Using Primary Care Clinical Text Data and Natural Language Processing to Identify Indicators of COVID-19 in Toronto, Canada
The objective of this study was to investigate whether a rule-based natural language processing (NLP) system, applied to primary care clinical text data, could be used to monitor COVID-19 viral activity in Toronto, Canada. We employed a retrospective cohort design. We included primary care patients...
Gespeichert in:
Veröffentlicht in: | PLOS digital health 2022-12, Vol.1 (12), p.e0000150-e0000150 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The objective of this study was to investigate whether a rule-based natural language processing (NLP) system, applied to primary care clinical text data, could be used to monitor COVID-19 viral activity in Toronto, Canada. We employed a retrospective cohort design. We included primary care patients with a clinical encounter between January 1, 2020 and December 31, 2020 at one of 44 participating clinical sites. During the study timeframe, Toronto first experienced a COVID-19 outbreak between March-2020 and June-2020; followed by a second viral resurgence from October-2020 through December-2020. We used an expert derived dictionary, pattern matching tools and contextual analyzer to classify primary care documents as 1) COVID-19 positive, 2) COVID-19 negative, or 3) unknown COVID-19 status. We applied the COVID-19 biosurveillance system across three primary care electronic medical record text streams: 1) lab text, 2) health condition diagnosis text and 3) clinical notes. We enumerated COVID-19 entities in the clinical text and estimated the proportion of patients with a positive COVID-19 record. We constructed a primary care COVID-19 NLP-derived time series and investigated its correlation with independent/external public health series: 1) lab confirmed COVID-19 cases, 2) COVID-19 hospitalizations, 3) COVID-19 ICU admissions, and 4) COVID-19 intubations. A total of 196,440 unique patients were observed over the study timeframe, of which 4,580 (2.3%) had at least one positive COVID-19 document in their primary care electronic medical record. Our NLP-derived COVID-19 time series describing the temporal dynamics of COVID-19 positivity status over the study timeframe demonstrated a pattern/trend which strongly mirrored that of other external public health series under investigation. We conclude that primary care text data passively collected from electronic medical record systems represent a high quality, low-cost source of information for monitoring/surveilling COVID-19 impacts on community health. |
---|---|
ISSN: | 2767-3170 2767-3170 |
DOI: | 10.1371/journal.pdig.0000150 |