Imputing environmental impact missing data of the industrial sector for Chinese cities: A machine learning approach


Bibliographic details
Published in: Environmental Impact Assessment Review, 2023-05, Vol. 100, Article 107050
Main authors: Chen, Xi; Shuai, Chenyang; Zhao, Bu; Zhang, Yu; Li, Kaijian
Format: Article
Language: English
Description
Abstract: Data are the lifeblood of evidence-based decision-making and the raw material for accountability. Regularly collecting data to evaluate industrial consumption and pollution at the city level is not an easy task: it requires a significant investment of institutional and financial resources and engagement with a vast number of local governments. Although the Chinese government has put extensive human and financial resources into data collection, substantial data gaps remain. This study compared two traditional linear models and four machine learning models for computationally estimating the missing data of six industrial consumption and pollution indicators (responses) for 701 cities from 2006 to 2018, using ten predictors. The results showed that the decision-tree-based extreme gradient boosting model performed best among the six models. Across the six responses, the median coefficient of determination (R²) ranged from 0.85 to 0.94, and the median root mean squared error ranged from 8.5 to 17,776. This study provides high-quality, detailed data for industrial environmental analysis of Chinese cities. In addition, given its good generalization ability, the extreme gradient boosting model could be adapted to impute missing data for other environmental variables, other sectors, and even smaller spatial scales.
Highlights:
• Missing data of six industrial consumption and pollution indicators of Chinese cities were estimated.
• Two traditional linear models and four machine learning models were developed and tested.
• The median coefficient of determination and root mean squared error were obtained.
• The extreme gradient boosting model performed best among all six models.
• Missing data records were successfully recovered using the extreme gradient boosting model.
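The abstract describes imputation as supervised regression: a model is trained on records where an indicator is observed and then predicts that indicator where it is missing, with R² and root mean squared error (RMSE) as evaluation metrics. The following is a minimal sketch of that workflow under stated assumptions, not the authors' code. The study's six responses, ten predictors and model settings are not reproduced. Instead of an extreme gradient boosting library, it uses plain gradient boosting with depth-1 regression trees ("stumps") in pure Python; extreme gradient boosting additionally uses a regularized objective and second-order gradient information. All function names (`fit_stump`, `fit_boosting`, `impute`, etc.), the hyperparameters (`n_rounds=100`, `lr=0.1`) and the toy data are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical sketch, NOT the study's pipeline: plain gradient boosting with
# depth-1 trees stands in for extreme gradient boosting.
# A stump is (feature index, threshold, left value, right value).
Stump = Tuple[int, float, float, float]
Model = Tuple[float, List[Stump], float]  # (base prediction, stumps, learning rate)


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Pick the single split that minimises the squared error of the residuals r."""
    best: Optional[Stump] = None
    best_sse = math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue  # a split must send at least one row each way
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (j, t, lm, rm)
    if best is None:  # no valid split (all rows identical): constant correction
        m = sum(r) / len(r)
        best = (0, math.inf, m, m)
    return best


def predict_stump(s: Stump, row: List[float]) -> float:
    j, t, left_value, right_value = s
    return left_value if row[j] <= t else right_value


def fit_boosting(X: List[List[float]], y: List[float],
                 n_rounds: int = 100, lr: float = 0.1) -> Model:
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 1/2 * (y - f)^2 with respect to f.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, residuals)
        stumps.append(s)
        pred = [pi + lr * predict_stump(s, row) for pi, row in zip(pred, X)]
    return base, stumps, lr


def predict(model: Model, row: List[float]) -> float:
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: List[float], y_pred: List[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def impute(X: List[List[float]], y: List[Optional[float]]) -> Tuple[List[float], Model]:
    """Train on rows whose response is observed, then predict the missing (None) entries."""
    observed = [i for i, v in enumerate(y) if v is not None]
    model = fit_boosting([X[i] for i in observed], [y[i] for i in observed])
    filled = [v if v is not None else predict(model, row) for row, v in zip(X, y)]
    return filled, model


# Hypothetical toy data: two predictors, one response, every fifth value missing.
X = [[float(i), float(i % 3)] for i in range(30)]
y_full = [2.0 * i + 5.0 * (i % 3) for i in range(30)]
y = [None if i % 5 == 0 else v for i, v in enumerate(y_full)]
filled, model = impute(X, y)
```

In practice, R² and RMSE should be computed on observed records held out from training (for example, by cross-validation) rather than on the training rows, because in-sample scores overstate imputation accuracy.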
ISSN: 0195-9255, 1873-6432
DOI: 10.1016/j.eiar.2023.107050