Which model to choose? Performance comparison of statistical and machine learning models in predicting PM2.5 from high-resolution satellite aerosol optical depth

Bibliographic details
Published in:	Atmospheric environment (1994) 2022-08, Vol.282, p.119164, Article 119164
Main authors:	Kulkarni, Padmavati; Sreekanth, V.; Upadhya, Adithi R.; Gautam, Hrishikesh Chandra
Format:	Article
Language:	English
Subjects:	Machine learning; MODIS-MAIAC; Regression

Description
Summary:	The mathematical solution for estimating surface fine particulate matter (PM2.5) from columnar aerosol optical depth (AOD) includes complex variables and involves a number of assumptions. Hence, researchers tend to use training-based models to predict PM2.5 from AOD. Here, we integrated regulatory composite PM2.5 measurements, high-resolution satellite AOD, reanalysis meteorological parameters, and a few other auxiliary parameters to train ten different regression models. The performance of these models (seven statistical and three machine learning) was evaluated and inter-compared to identify the best-performing model. The accuracy of the model-predicted PM2.5 was quantified using the coefficient of determination (R²), mean absolute bias (MAB), normalized root mean square error (NRMSE), and other relevant regression coefficients. The models' performance on unseen data was investigated using 10-fold cross-validation (CV) and leave-one-station-out CV (LOOCV). For this exercise, we considered the case of NCT-Delhi because of: (i) the availability of dense regulatory PM2.5 measurements, (ii) the possibility of understanding the model performance over a large range of PM2.5 (the daily mean PM2.5 values ranged between ∼4 and 492 μg m⁻³ during the study period), and (iii) the scope for better understanding the influence of extreme meteorological conditions (e.g., the ambient surface temperature varies between ∼5 and 40 °C during a calendar year) on the AOD-PM2.5 relationship. All the models were trained using data collected for the year 2019 (a non-COVID year). Among the models under investigation, the machine learning (ML) models performed better, with R², MAB, and NRMSE values for the CV exercises ranging between 0.88 and 0.93, 14.1 and 18.2 μg m⁻³, and 0.18 and 0.23, respectively. The generalizability of the results obtained in this study is discussed.
•Ten models were investigated for their accuracy in predicting PM2.5 from AOD.
•Models included linear mixed-effects, Random Forest, Deep Learning, etc.
•Machine learning models performed better than statistical models.
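
The summary names three accuracy metrics (R², MAB, NRMSE) and two validation schemes (10-fold CV and leave-one-station-out CV) without defining them. The following is a minimal Python sketch of one common way to compute them, not the authors' implementation. Several choices here are assumptions: R² is taken as 1 - SS_res/SS_tot (the paper may instead use the squared correlation coefficient), MAB is taken as the mean absolute difference between predicted and observed values, NRMSE is normalized by the mean observed value, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mean_absolute_bias(obs, pred):
    """Mean absolute difference between predictions and observations."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def nrmse(obs, pred):
    """RMSE divided by the mean observation (normalization is an assumption)."""
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return rmse / (sum(obs) / len(obs))


def k_fold_splits(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k interleaved folds.

    Sample i goes to fold i % k; shuffle the data beforehand in real use.
    """
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def leave_one_station_out_splits(station_ids):
    """Yield (station, train_idx, test_idx), holding out one station at a time."""
    by_station = defaultdict(list)
    for idx, sid in enumerate(station_ids):
        by_station[sid].append(idx)
    for station in sorted(by_station):
        test = by_station[station]
        train = [i for sid, idxs in by_station.items() if sid != station for i in idxs]
        yield station, train, test
```

In the station-wise scheme, every sample from the held-out monitoring station is excluded from training, so the score reflects how well a model transfers to a location it has never seen; random 10-fold CV instead mixes samples from all stations across folds.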
ISSN:	1352-2310 1873-2844
DOI:	10.1016/j.atmosenv.2022.119164