1380. Machine Learning-based Estimation of Unconfirmed COVID-19 Cases from a 10,000-Household Survey in Gilgit-Baltistan, Pakistan

Bibliographic details
Published in: Open Forum Infectious Diseases, 2023-11, Vol. 10 (Supplement_2)
Authors: Farrar, Daniel S; Pell, Lisa G; Muhammad, Yasin; Hafiz, Sher; Erdman, Lauren; Bassani, Diego G; Tanner, Zachary; Ahmed, Imran; Muhammad, Karim; Madhani, Falak; Paracha, Shariq; Khan, Masood Ali; Soofi, Sajid B; Taljaard, Monica; Spitzer, Rachel; Abu Fadaleh, Sarah M; Bhutta, Zulfiqar A; Morris, Shaun
Format: Article
Language: English
Description
Abstract:

Background: Robust estimates of COVID-19 prevalence during the pandemic are scarce, particularly in settings with limited SARS-CoV-2 testing. Gilgit-Baltistan (GB) is a remote region of Pakistan where healthcare access is limited by underdeveloped facility and road infrastructure. We leveraged a large household survey to describe the burden of confirmed and unconfirmed COVID-19 in GB.

Methods: We conducted a cross-sectional survey in GB from June to August 2021 during the baseline phase of a cluster randomized trial. Households were randomly selected using a stratified, two-stage sampling design. Data on SARS-CoV-2 testing, healthcare worker (HCW) diagnoses made without testing, symptoms, and outcomes since March 2020 were self-reported for all household members. "Confirmed/probable" COVID-19 was defined as a positive test, an HCW diagnosis of COVID-19, or an HCW diagnosis of pneumonia with a COVID-19-positive contact. Using machine learning (ML) and bootstrap validation, we developed a symptom-based diagnostic model to differentiate confirmed/probable infections from those with negative SARS-CoV-2 tests (Fig. 1). We applied this model to untested respondents to estimate the total prevalence of COVID-19.

Figure 1. Workflow diagram for machine learning analysis. auROC = area under the receiver operating characteristic curve; CI = confidence interval; LR = logistic regression; RF = random forest; SVM = support vector machines; XGB = eXtreme Gradient Boosting.

Results: Data were collected from 77,924 people in 10,264 households. Overall, 314 had confirmed/probable COVID-19, 3,263 had negative tests, and 74,347 were untested. SARS-CoV-2 testing was less common in females than in males (38 vs. 58 tests per 1,000 people) and in children than in adults (17 vs. 76 tests per 1,000 people).
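The abstract names bootstrap validation but does not spell out the procedure. As an illustration only, the Python sketch below shows one common way to attach a 95% CI to an auROC: compute the statistic in its Mann-Whitney form (the probability that a random positive case outscores a random negative one) and take percentiles of its distribution over bootstrap resamples. The function names, the percentile method, the resample count, and the toy labels and scores are all assumptions; in the study the scores would come from a fitted classifier such as the XGB model, and the authors' actual bootstrap scheme may differ.

```python
import random


def auroc(labels, scores):
    """auROC via the Mann-Whitney statistic: share of (positive, negative)
    pairs where the positive case scores higher; ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap CI (assumed method)."""
    point = auroc(labels, scores)  # raises early if a class is missing
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample respondents with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that lack one of the classes
            stats.append(auroc(y, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, (lo, hi)


# Hypothetical toy data: 1 = confirmed/probable, 0 = tested negative;
# scores stand in for a classifier's predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
scores = [0.91, 0.75, 0.40, 0.35, 0.10, 0.55, 0.20, 0.80, 0.05, 0.30]
est, (lo, hi) = bootstrap_auroc_ci(labels, scores, n_boot=500)
print(f"auROC {est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The pairwise loop is quadratic and only suits small examples; a rank-based formula gives the same value faster on survey-sized data.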
Using an extreme gradient boosting (XGB) model, the area under the receiver operating characteristic curve was 0.92 (95% confidence interval [CI] 0.90–0.93), sensitivity was 0.81 (CI 0.75–0.85), and specificity was 0.88 (CI 0.85–0.90). With this model, the total estimated number of cases was 8–17 times the number of individuals with positive tests (Fig. 2). The ratio of estimated to confirmed cases was higher for children (90–213 times) and females (13–25 times).

Figure 2. Estimation of probable and possible COVID-19 cases, overall and by age and sex. Confirmed COVID-19 indicates individuals with positive SARS-CoV-2 tests; probable COVID-19 includes HCW diagnoses of COVID-19 and positive predictions from the machine learning an…
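Once each untested respondent has a predicted probability, applying the model and comparing the result with the number of positive tests reduces to counting. The Python sketch below is a hypothetical illustration of that arithmetic, together with how sensitivity and specificity follow from confusion-matrix counts. The 0.5 decision threshold, the function names, and every count are invented for the example, and the sketch applies no correction for misclassification, which the abstract does not describe.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def estimated_cases(
    n_positive_tests: int,
    n_hcw_diagnosed: int,
    untested_probs: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> int:
    """Known cases plus untested respondents the model classifies as positive."""
    n_predicted = sum(1 for p in untested_probs if p >= threshold)
    return n_positive_tests + n_hcw_diagnosed + n_predicted


def ratio_to_confirmed(n_estimated: int, n_positive_tests: int) -> float:
    """How many times larger the estimate is than the count of positive tests."""
    if n_positive_tests <= 0:
        raise ValueError("ratio is undefined without any positive tests")
    return n_estimated / n_positive_tests


# Hypothetical toy counts for one subgroup (not study data).
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=90, fp=10)
total = estimated_cases(n_positive_tests=10, n_hcw_diagnosed=5,
                        untested_probs=[0.9, 0.7, 0.2, 0.5, 0.1])
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.80, specificity=0.90
print(f"estimated cases={total}, ratio={ratio_to_confirmed(total, 10):.1f}x")  # estimated cases=18, ratio=1.8x
```

Because the untested group is large (74,347 people here), even a small false-positive rate can add many predicted cases, so totals computed this way are sensitive to the chosen threshold.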
ISSN: 2328-8957
DOI: 10.1093/ofid/ofad500.1217