A multi-resolution ensemble model of three decision-tree-based algorithms to predict daily NO₂ concentration in France 2005-2022

Bibliographic details
Published in: Environmental research 2024-05, p.119241
Main authors: Barbalat, Guillaume; Hough, Ian; Dorman, Michael; Lepeule, Johanna; Kloog, Itai
Format: Article
Language: English
Description
Summary: Understanding and managing the health effects of nitrogen dioxide (NO₂) requires high-resolution spatiotemporal exposure maps. Here, we developed a multi-stage, multi-resolution ensemble model that predicts daily NO₂ concentration across continental France from 2005 to 2022. Innovations of this work include the computation of daily predictions at a 200 m resolution in large urban areas and the use of a spatiotemporal blocking procedure to avoid data leakage and ensure fair performance estimation. Predictions were obtained after three cascading stages of modeling: (1) predicting NO₂ total column density from Ozone Monitoring Instrument satellite data; (2) predicting daily NO₂ concentrations at a 1 km spatial resolution using a large set of potential predictors, such as the predictions obtained from stage 1, land-cover data and road traffic data; and (3) predicting residuals from the stage 2 models at a 200 m resolution in large urban areas. The latter two stages used a generalized additive model to ensemble the predictions of three decision-tree algorithms (random forest, extreme gradient boosting and categorical boosting). Cross-validated performances of our ensemble models were overall very good, with a ten-fold cross-validated R² of 0.83 for the 1 km model and 0.69 for the 200 m model. All three basis learners contributed to the ensemble predictions to various degrees depending on time and space. In sum, our multi-stage approach was able to predict daily NO₂ concentrations with a relatively low error. Ensembling the predictions maximizes the chance of obtaining accurate values if one basis learner fails in a specific area or at a particular time, by relying on the other learners. To the best of our knowledge, this is the first study aiming to predict NO₂ concentrations in France with such a high spatiotemporal resolution, large spatial extent, and long temporal coverage. Exposure estimates are available to investigate NO₂ health effects in epidemiological studies.
ISSN: 1096-0953
DOI: 10.1016/j.envres.2024.119241
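
The summary mentions a spatiotemporal blocking procedure used to avoid data leakage during cross-validation, but this record does not describe how the blocks were built. The following is only a minimal sketch of the general idea, not the authors' procedure: observations are grouped into blocks by a spatial grid cell and a calendar month, and each whole block is assigned to a single fold. The function names (`block_id`, `blocked_folds`), the 50 km cell size, the monthly time blocks and the projected coordinates in metres are all illustrative assumptions.

```python
from collections import defaultdict
from datetime import date


def block_id(x_m, y_m, day, cell_m=50_000):
    """Return a (grid cell, calendar month) key for one observation.

    The 50 km cell size and monthly blocks are hypothetical choices,
    not values taken from the study.
    """
    cell = (int(x_m // cell_m), int(y_m // cell_m))
    month = (day.year, day.month)
    return cell, month


def blocked_folds(observations, n_folds=10):
    """Assign whole spatiotemporal blocks to folds.

    observations: list of (x_m, y_m, day) tuples.
    Returns a list of n_folds lists of observation indices. No block is
    ever split across folds, so nearby same-month observations cannot
    sit on both the training and the validation side.
    """
    blocks = defaultdict(list)
    for i, (x_m, y_m, day) in enumerate(observations):
        blocks[block_id(x_m, y_m, day)].append(i)

    folds = [[] for _ in range(n_folds)]
    # Place the largest blocks first, each into the currently smallest fold,
    # to keep fold sizes roughly balanced.
    for key in sorted(blocks, key=lambda k: (-len(blocks[k]), k)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(blocks[key])
    return folds


if __name__ == "__main__":
    # Five made-up observations: (x in metres, y in metres, date)
    obs = [
        (1000.0, 2000.0, date(2010, 1, 5)),
        (1500.0, 2500.0, date(2010, 1, 20)),
        (120000.0, 2000.0, date(2010, 1, 5)),
        (1000.0, 2000.0, date(2010, 2, 5)),
        (1200.0, 100.0, date(2010, 2, 9)),
    ]
    for k, fold in enumerate(blocked_folds(obs, n_folds=3)):
        print(k, sorted(fold))
```

In a sketch like this, each fold in turn would serve as the validation set while the models are trained on the remaining folds. Purely random fold assignment would instead let observations from the same place and period appear on both sides of the split, which tends to inflate performance estimates.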