Feature engineering for improved machine-learning-aided studying heavy metal adsorption on biochar
Published in: | Journal of hazardous materials 2024-03, Vol.466, p.133442, Article 133442 |
---|---|
Main authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Because of the broad interest in using biochar from biomass pyrolysis to adsorb heavy metals (HMs) from wastewater, many researchers have recently adopted machine learning (ML) to predict the adsorption capacity (η) of HMs on biochar. However, previous studies focused mainly on developing different ML algorithms to increase predictive performance, and none examined feature engineering as a way to enhance predictive performance and improve model interpretability and generalizability. Here, based on a dataset widely used in previous ML studies, biochar features were engineered (elemental compositions were calculated on a mole basis) to improve predictive performance, achieving a test R² of 0.997 for the gradient boosting regression (GBR) model. The elemental ratio feature (H-O-2N)/C, representing H sites linked to C (sites not active toward HMs), was proposed for the first time to help interpret the GBR model. The (H-O-2N)/C ratio and the pH of biochar played essential roles in replacing cation exchange capacity (CEC) for predicting η. Moreover, expanding the coverage of variables by adding cases from references improved the generalizability of the model, and further validation using cases without CEC and specific surface area (R² 0.78) and adsorption experimental results (R² 0.72) showed that the ML model performs satisfactorily. Future studies in this area may consider algorithm innovation, better description of variables, and wider coverage of variables to further increase the model's generalizability.
• Expressing C, H, O, and N on a mole basis greatly improved machine learning predictive performance.
• (H-O-2N)/C, representing H connected with C (non-active site), improved model interpretability.
• Cation exchange capacity (CEC) can be predicted and replaced by other biochar indices.
• Expanding the coverage of variables by adding data improved model generalizability.
• Validation against references (R² 0.78) and adsorption experiments (R² 0.72) was satisfactory. |
---|---|
ISSN: | 0304-3894, 1873-3336 |
DOI: | 10.1016/j.jhazmat.2024.133442 |
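The feature engineering described in the summary, expressing C, H, O, and N on a mole basis and forming the ratio (H-O-2N)/C, can be illustrated with a minimal sketch. It assumes the elemental contents are given as mass percentages and uses standard atomic masses; the function names and the sample values are hypothetical and are not taken from the article or its dataset, and the authors' exact preprocessing (for example, dry or ash-free basis) is not stated in this record.

```python
# Standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007}


def to_mole_basis(wt_percent):
    """Convert elemental mass percentages to mol of each element per 100 g of biochar.

    Assumption: wt_percent maps "C", "H", "O", "N" to mass percentages.
    """
    return {el: wt_percent[el] / ATOMIC_MASS[el] for el in ATOMIC_MASS}


def h_o_2n_over_c(wt_percent):
    """Return the molar ratio (H - O - 2N)/C named in the summary."""
    n = to_mole_basis(wt_percent)
    if n["C"] == 0:
        raise ValueError("carbon content must be non-zero")
    return (n["H"] - n["O"] - 2 * n["N"]) / n["C"]


# Hypothetical biochar composition in wt%, for illustration only.
sample = {"C": 70.0, "H": 3.0, "O": 20.0, "N": 1.0}
moles = to_mole_basis(sample)
ratio = h_o_2n_over_c(sample)
print({el: round(v, 3) for el, v in moles.items()}, round(ratio, 3))
```

In a modelling workflow, the mole-basis values and the ratio would be appended as extra columns alongside variables such as biochar pH before fitting a regressor; which columns the authors actually used is not specified here.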