Comparative analysis of machine learning algorithms for identifying cobalt contamination in soil using spectroscopy

Bibliographic details
Published in: Journal of Environmental Chemical Engineering, 2024-10, Vol. 12 (5), p. 113328, Article 113328
Main authors: Zhou, Nana; Hu, Tao; Wu, Mengting; Chen, Qiusong; Qi, Chongchong
Format: Article
Language: English
Description
Abstract: Cobalt (Co) has been recognized by the United Nations Environment Programme as one of the most hazardous elements; however, it has received limited attention in previous studies on identifying heavy metal contamination, and those studies have been limited to small, site-scale datasets and few machine learning algorithms. To fill this research gap, this study combined eight machine learning algorithms with visible and near-infrared (Vis-NIR) reflectance spectroscopy to develop a large-scale model for classifying Co content in soil. In total, 18,675 topsoil samples were used to train and validate the models. Spectral preprocessing, principal component analysis (PCA), and hyperparameter tuning were used to improve the performance of the tested models, which was evaluated using multiple indicators. The optimal model was then applied to the United States soil spectral dataset. The results show that eXtreme Gradient Boosting (XGB) performed best on both the training and testing sets, with an area under the curve (AUC) of 0.901 on the training set and 0.904 on the testing set. Applying XGB revealed that Utah, Arizona, New Mexico, North Dakota, Arkansas, Mississippi, and Alabama were at higher risk of Co contamination. In summary, XGB can effectively identify areas of potential Co contamination. The approach presented here not only fills a gap in the large-scale identification of Co contamination but can also greatly assist environmental managers with risk assessment and reclamation strategies.

Highlights:
• Soil Co content was identified from soil spectra using machine learning (ML).
• A comprehensive comparison of eight ML algorithms was conducted.
• The largest dataset, consisting of 18,675 soil samples, was employed.
• XGB performed best and could be used for large-scale Co identification.
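The abstract names spectral preprocessing, PCA, and AUC-based evaluation but does not give the exact methods or code. The following is a minimal sketch in Python with NumPy of those three stages, under stated assumptions: standard normal variate (SNV) is an assumed preprocessing choice, `n_components=10` and the synthetic 200 x 50 "spectra" with random labels are hypothetical stand-ins, and the XGB classifier itself is omitted (in practice a gradient-boosting model, for example the xgboost package's `XGBClassifier` with tuned hyperparameters, would be fitted on the PC scores). It is not the authors' implementation.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate (assumed preprocessing): scale each spectrum (row)
    to zero mean and unit standard deviation. Constant rows would divide by zero."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def pca_fit_transform(x: np.ndarray, n_components: int):
    """PCA via SVD of the column-centred matrix.

    Returns (scores, components, column_means, explained_variance_ratio)."""
    centre = x.mean(axis=0)
    xc = x - centre
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    components = vt[:n_components]
    explained = s**2 / np.sum(s**2)
    return xc @ components.T, components, centre, explained[:n_components]


def roc_auc(y_true, scores) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    # Builds an n_pos x n_neg comparison matrix, so memory grows quadratically.
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Hypothetical synthetic data: 200 "spectra" x 50 bands with random binary labels.
rng = np.random.default_rng(0)
X = rng.random((200, 50))
y = rng.integers(0, 2, size=200)

# In a real train/test split, fit PCA on the training set only and reuse
# `components` and `centre` to project the test set.
scores_pc, comps, centre, evr = pca_fit_transform(snv(X), n_components=10)
auc = roc_auc(y, scores_pc[:, 0])
print(f"explained variance (first 3 PCs): {evr[:3].round(3)}, AUC of PC1: {auc:.3f}")
```

The rank-based AUC is the area under the ROC curve for binary labels, the same quantity a library routine such as scikit-learn's `roc_auc_score` reports. Because the labels here are random, the AUC of the first component is expected to fall near 0.5; a trained classifier's scores would take the place of `scores_pc[:, 0]`.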
ISSN: 2213-3437
DOI: 10.1016/j.jece.2024.113328