MACHINE LEARNING CLASSIFICATION OF THE EPIDEMIOLOGIC STAGES OF INFLAMMATORY BOWEL DISEASE ACROSS GEOGRAPHY AND TIME
Published in: | Inflammatory bowel diseases 2024-01, Vol.30 (Supplement_1), p.S00-S00 |
---|---|
Main authors: | , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Summary: | Abstract
BACKGROUND
Epidemiologic stages of inflammatory bowel disease (IBD) have been proposed: 1. Emergence (low incidence and prevalence); 2. Acceleration in Incidence (rapidly rising incidence, low prevalence); and 3. Compounding Prevalence (stabilizing incidence, rapidly rising prevalence). To date, these stages have been theoretical without quantified definitions of incidence and prevalence.
AIM
To use machine learning to determine incidence and prevalence ranges corresponding to the epidemiologic stages and provide stage classifications across time for global regions.
METHODS
We built a supervised random forest classifier in R to determine epidemiologic stages of IBD from population-based studies (n=340), a subset derived from a systematic review on the incidence and prevalence of IBD. A labelled training data set comprising rates of incidence and prevalence of Crohn’s disease (CD) and ulcerative colitis (UC) extracted from the systematic review was used to predict classifications of stage 1, stage 2, or stage 3 for each region, stratified by decade (1960–2019). Model accuracy was measured using a blind validation data set. The validated model was then used to predict stage classifications for regions in the data set. Interquartile ranges for incidence and prevalence of CD and UC were calculated on the random forest output, and the distributions were compared using negative binomial regression.
RESULTS
The random forest’s classification accuracy on the blinded validation data was 93.7% (95% CI: 90.6, 96.1), indicating an appropriate model fit and performance. Significant differences between all stages for the incidence and prevalence of CD and UC (p |
---|---|
ISSN: | 1078-0998 1536-4844 |
DOI: | 10.1093/ibd/izae020.072 |
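
The abstract reports validation accuracy with a 95% confidence interval but does not say how the interval was computed. The following is a minimal Python sketch of one common approach, the Wilson score interval for a proportion. It is not the authors' R code: the function names `accuracy` and `wilson_interval` are illustrative, and the stage labels are made-up values, not data from the study.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def wilson_interval(correct, total, z=1.959964):
    """Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for a 95% interval.
    Assumption: the source does not state which interval method was used.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical stage labels (1, 2, 3) for ten validation records.
y_true = [1, 1, 2, 2, 3, 3, 3, 2, 1, 3]
y_pred = [1, 1, 2, 3, 3, 3, 3, 2, 1, 3]

acc = accuracy(y_true, y_pred)
n_correct = sum(t == p for t, p in zip(y_true, y_pred))
low, high = wilson_interval(n_correct, len(y_true))
print(f"accuracy = {acc:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With only ten records the interval is wide. A narrow interval like the one reported requires a much larger validation set.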