A multi-resolution ensemble model of three decision-tree-based algorithms to predict daily NO₂ concentration in France 2005-2022

Understanding and managing the health effects of Nitrogen Dioxide (NO₂) requires high resolution spatiotemporal exposure maps. Here, we developed a multi-stage multi-resolution ensemble model that predicts daily NO₂ concentration across continental France from 2005 to 2022. Innovations of this work i...

Bibliographic details
Published in: Environmental research 2024-05, p.119241
Main authors: Barbalat, Guillaume; Hough, Ian; Dorman, Michael; Lepeule, Johanna; Kloog, Itai
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page 119241
container_title Environmental research
container_volume
creator Barbalat, Guillaume
Hough, Ian
Dorman, Michael
Lepeule, Johanna
Kloog, Itai
description Understanding and managing the health effects of Nitrogen Dioxide (NO₂) requires high resolution spatiotemporal exposure maps. Here, we developed a multi-stage multi-resolution ensemble model that predicts daily NO₂ concentration across continental France from 2005 to 2022. Innovations of this work include the computation of daily predictions at a 200 m resolution in large urban areas and the use of a spatio-temporal blocking procedure to avoid data leakage and ensure fair performance estimation. Predictions were obtained after three cascading stages of modeling: (1) predicting NO₂ total column density from Ozone Monitoring Instrument satellite; (2) predicting daily NO₂ concentrations at a 1 km spatial resolution using a large set of potential predictors such as predictions obtained from stage 1, land-cover and road traffic data; and (3) predicting residuals from stage 2 models at a 200 m resolution in large urban areas. The latter two stages used a generalized additive model to ensemble predictions of three decision-tree algorithms (random forest, extreme gradient boosting and categorical boosting). Cross-validated performances of our ensemble models were overall very good, with a ten-fold cross-validated R² for the 1 km model of 0.83, and of 0.69 for the 200 m model. All three basis learners participated in the ensemble predictions to various degrees depending on time and space. In sum, our multi-stage approach was able to predict daily NO₂ concentrations with a relatively low error. Ensembling the predictions maximizes the chance of obtaining accurate values if one basis learner fails in a specific area or at a particular time, by relying on the other learners. To the best of our knowledge, this is the first study aiming to predict NO₂ concentrations in France with such a high spatiotemporal resolution, large spatial extent, and long temporal coverage. Exposure estimates are available to investigate NO₂ health effects in epidemiological studies.
doi_str_mv 10.1016/j.envres.2024.119241
format Article
pmid 38810827
date 2024-05-27
publisher Netherlands
rights Copyright © 2024. Published by Elsevier Inc.
peer_reviewed true
fulltext fulltext
identifier EISSN: 1096-0953
ispartof Environmental research, 2024-05, p.119241
issn 1096-0953
language eng
recordid cdi_pubmed_primary_38810827
source Elsevier ScienceDirect Journals
title A multi-resolution ensemble model of three decision-tree-based algorithms to predict daily NO₂ concentration in France 2005-2022
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T02%3A38%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pubmed&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20multi-resolution%20ensemble%20model%20of%20three%20decision-tree-based%20algorithms%20to%20predict%20daily%20NO%202%20concentration%20in%20France%202005-2022&rft.jtitle=Environmental%20research&rft.au=Barbalat,%20Guillaume&rft.date=2024-05-27&rft.spage=119241&rft.pages=119241-&rft.eissn=1096-0953&rft_id=info:doi/10.1016/j.envres.2024.119241&rft_dat=%3Cpubmed%3E38810827%3C/pubmed%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/38810827&rfr_iscdi=true
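
The description in this record mentions a spatio-temporal blocking procedure used to avoid data leakage in ten-fold cross-validation, but the record does not say how the blocks were built. The following is only a minimal sketch of the general idea, not the authors' procedure: observations are grouped into space-time blocks, and every observation in a block is sent to the same fold. The function names, the 50 km cell size, and the 30-day window are hypothetical choices made for illustration.

```python
import hashlib


def block_id(x_km, y_km, day, cell_km=50.0, window_days=30):
    """Map a point to a space-time block.

    x_km, y_km: projected coordinates in kilometres (assumed input format).
    day: integer day index since the start of the study period.
    cell_km and window_days are illustrative values, not taken from the paper.
    """
    return (int(x_km // cell_km), int(y_km // cell_km), int(day // window_days))


def assign_folds(observations, n_folds=10, cell_km=50.0, window_days=30):
    """Return one fold index per observation; a whole block shares one fold."""
    folds = []
    for x_km, y_km, day in observations:
        block = block_id(x_km, y_km, day, cell_km, window_days)
        # Hash the block key so the fold assignment is deterministic across runs.
        digest = hashlib.sha256(repr(block).encode("utf-8")).digest()
        folds.append(int.from_bytes(digest[:4], "big") % n_folds)
    return folds


if __name__ == "__main__":
    # Two nearby points in the same month, plus one point far to the east.
    obs = [(10.0, 20.0, 3), (12.5, 49.9, 29), (510.0, 20.0, 3)]
    print(assign_folds(obs))
```

With this grouping, a model is never evaluated on an observation whose block also contributed training data. Observations in neighbouring blocks can still land in different folds, so a real procedure may add buffers or larger blocks.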