Mapping soil arsenic pollution at a brownfield site using satellite hyperspectral imagery and machine learning

Bibliographic details
Published in: The Science of the total environment 2023-01, Vol. 857, p. 159387-159387, Article 159387
Main authors: Jia, Xiyue; Hou, Deyi
Format: Article
Language: eng
Online access: Full text
Description
Summary: Heavy metal contamination is ubiquitous in brownfields. Traditional site investigation employs geostatistical interpolation methods (GIMs) to predict the distribution of soil pollutants after soil sampling and chemical analysis. However, the heterogeneity of soil pollution in brownfields invalidates the assumptions of GIMs and undermines the accuracy of soil investigation. In the present study, a satellite hyperspectral image processing and machine learning method was developed to map arsenic pollution at a brownfield site. To eliminate the noise caused by atmospheric factors and increase the efficiency of spectral data, 1.3 million spectral indexes (SIs) were constructed, and 1171 of them were selected for their high correlation with soil arsenic. Five machine learning models, i.e., Random Forest (RF), ExtraTrees, Adaptive Boosting, Extreme Gradient Trees, and Gradient Descent Boosting Trees (GDB), were built to predict soil arsenic. RF performed best (r = 0.78), reducing prediction error by 30 % compared with traditional GIMs. RF also maintained relatively high accuracy (r = 0.56) when the sampling grid was coarsened to 100 m, exceeding the accuracy of GIMs under a 50 m sampling grid (r = 0.42). This indicates that the proposed method can provide more accurate results with fewer sampling points, and therefore at lower investigation cost. The second derivative was the most efficient preprocessing method for removing spectral noise, and normalized difference (ND) was the most reliable spectral index construction strategy. Based on uncertainty analysis, the heterogeneity of soil arsenic distribution was considered the most influential factor causing prediction errors.
This study demonstrates that machine learning based on satellite visible and near-infrared (VNIR) reflectance spectroscopy is a promising approach to mapping soil arsenic contamination at brownfield sites with high accuracy and low cost.
Highlights:
• 1.3 million spectral indexes were constructed.
• RF was the best model (r = 0.78).
• RF reduces prediction error by 30 % compared to Kriging.
• Normalized difference was the most effective spectral index construction strategy.
• Satellite hyperspectral imagery can be used to monitor soil pollution at industrial sites.
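
The summary names normalized difference (ND) as the index construction strategy and correlation with soil arsenic as the selection criterion, but gives no formulas or code. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the common definition ND = (R_i - R_j) / (R_i + R_j) over band pairs and Pearson's r as the correlation measure. The function names, the `min_abs_r` threshold, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def normalized_difference(r_i, r_j):
    """ND index for two band reflectances: (r_i - r_j) / (r_i + r_j)."""
    total = r_i + r_j
    return (r_i - r_j) / total if total != 0 else 0.0


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_nd_indexes(spectra, arsenic, min_abs_r=0.5):
    """Build one ND index per band pair and keep those with |r| >= min_abs_r.

    spectra: list of per-sample reflectance lists (same band count each).
    arsenic: measured soil arsenic concentration per sample.
    Returns a list of ((band_i, band_j), r), strongest correlation first.
    """
    n_bands = len(spectra[0])
    selected = []
    for i, j in combinations(range(n_bands), 2):
        index_values = [normalized_difference(s[i], s[j]) for s in spectra]
        r = pearson_r(index_values, arsenic)
        if abs(r) >= min_abs_r:
            selected.append(((i, j), r))
    selected.sort(key=lambda item: abs(item[1]), reverse=True)
    return selected


# Hypothetical toy data: 5 samples x 4 bands, not taken from the study.
spectra = [
    [0.10, 0.20, 0.30, 0.40],
    [0.12, 0.18, 0.33, 0.38],
    [0.15, 0.21, 0.29, 0.45],
    [0.11, 0.25, 0.35, 0.41],
    [0.14, 0.19, 0.31, 0.43],
]
arsenic = [12.0, 15.5, 30.2, 9.8, 22.1]  # mg/kg, made up for illustration

selected = select_nd_indexes(spectra, arsenic, min_abs_r=0.5)
for (i, j), r in selected:
    print(f"ND(band {i}, band {j}): r = {r:+.2f}")
```

In a workflow like the one summarized above, the retained index values would then serve as input features for a regression model such as a random forest. That step is omitted here to keep the sketch dependency-free.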
ISSN: 0048-9697, 1879-1026
DOI: 10.1016/j.scitotenv.2022.159387