Feature engineering for improved machine-learning-aided studying heavy metal adsorption on biochar

Bibliographic details
Published in: Journal of Hazardous Materials, 2024-03, Vol. 466, p. 133442, Article 133442
Authors: Shen, Tian; Peng, Haoyi; Yuan, Xingzhong; Liang, Yunshan; Liu, Shengqiang; Wu, Zhibin; Leng, Lijian; Qin, Pufeng
Format: Article
Language: English
Description
Abstract: Due to the broad interest in using biochar from biomass pyrolysis to adsorb heavy metals (HMs) from wastewater, many researchers have recently adopted machine learning (ML) to predict the adsorption capacity (η) of HMs on biochar. However, previous studies focused mainly on developing different ML algorithms to increase predictive performance, and no study had examined feature engineering as a way to enhance predictive performance and improve model interpretability and generalizability. Here, using a dataset widely used in previous ML studies, biochar features were engineered by calculating the elemental compositions of biochar on a mole basis. This improved predictive performance and yielded a test R² of 0.997 for the gradient boosting regression (GBR) model. The elemental ratio feature (H-O-2N)/C, representing the H linked to C (a site that is not active toward HMs), was proposed for the first time to help interpret the GBR model. The (H-O-2N)/C ratio and the pH of biochar played essential roles in replacing cation exchange capacity (CEC) for predicting η. Moreover, expanding the coverage of the variables by adding cases from the literature improved the generalizability of the model, and further validation using cases without CEC and specific surface area (R² = 0.78) and adsorption experiment results (R² = 0.72) showed that the ML model performed satisfactorily. Future studies in this area may consider algorithm innovation, better description of variables, and broader coverage of variables to further increase the model's generalizability.
Highlights:
• C, H, O, and N on a mole basis greatly improved ML predictive performance.
• (H-O-2N)/C, representing H connected to C (a non-active site), improved model interpretability.
• Cation exchange capacity (CEC) can be predicted and replaced by other biochar indices.
• Expanding the coverage of variables by adding data improved model generalizability.
• Validation with literature cases (R² = 0.78) and adsorption experiments (R² = 0.72) was satisfactory.
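The mole-basis feature engineering and the (H-O-2N)/C ratio described in the abstract can be illustrated with a short sketch. It assumes that the raw C, H, O, and N contents are given in wt% (mass per 100 g of biochar) and that converting them to a mole basis means dividing each by its standard atomic mass; the class and function names (`BiocharComposition`, `to_mole_basis`, `h_on_c_ratio`) are hypothetical and are not taken from the paper. The authors' exact preprocessing may differ.

```python
from dataclasses import dataclass

# Standard atomic masses in g/mol (conventional rounded values).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007}


@dataclass
class BiocharComposition:
    """Hypothetical container: elemental contents of one biochar sample in wt%."""
    c: float
    h: float
    o: float
    n: float


def to_mole_basis(comp: BiocharComposition) -> dict[str, float]:
    """Convert wt% to mol of each element per 100 g of biochar (assumed convention)."""
    return {
        "C": comp.c / ATOMIC_MASS["C"],
        "H": comp.h / ATOMIC_MASS["H"],
        "O": comp.o / ATOMIC_MASS["O"],
        "N": comp.n / ATOMIC_MASS["N"],
    }


def h_on_c_ratio(comp: BiocharComposition) -> float:
    """Molar (H - O - 2N)/C ratio.

    Subtracting O and 2N from H is read here as removing H assumed to be bonded
    to O and N, leaving H linked to C; this interpretation is an assumption.
    """
    mol = to_mole_basis(comp)
    if mol["C"] <= 0:
        raise ValueError("carbon content must be positive")
    return (mol["H"] - mol["O"] - 2 * mol["N"]) / mol["C"]


# Illustrative sample values (not from the paper's dataset).
sample = BiocharComposition(c=60.0, h=3.0, o=20.0, n=1.0)
print(round(h_on_c_ratio(sample), 3))
```

For these illustrative values the molar ratio is positive (about 0.317), whereas the same expression evaluated on raw wt% gives (3 - 20 - 2)/60 ≈ -0.317, which cannot be read as an amount of C-bonded H. This is one reason a mole basis is a natural choice for atom-counting features like this one.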
ISSN: 0304-3894, 1873-3336
DOI: 10.1016/j.jhazmat.2024.133442