Machine learning for modeling N2O emissions from wastewater treatment plants: Aligning model performance, complexity, and interpretability

Bibliographic details
Published in: Water research (Oxford), 2023-10, Vol. 245, p. 120667, Article 120667
Main authors: Khalil, Mostafa; AlSayed, Ahmed; Liu, Yang; Vanrolleghem, Peter A.
Format: Article
Language: English
Description
Summary:
• A novel definition of the best machine model for N2O soft sensors was established.
• Multivariate outlier detection considered special characteristics of wastewater.
• Feature reduction balanced data acquisition, computation time, and model accuracy.
• Boosted decision trees showed better overall performance than a deep neural network.
• Interpretable ML is a step towards guided decisions for N2O emissions mitigation.

Nitrous oxide (N2O) emissions may account for up to 80 % of a wastewater treatment plant's (WWTP) total carbon footprint. Given the complexity of the pathways involved, mechanistic models still often fail to depict the process dynamics of N2O emissions precisely. Data-driven prediction methods hold substantial potential as an alternative. However, a comprehensive approach is still lacking, which impedes full-scale application. This study therefore develops a comprehensive approach for using machine learning to perform online process modeling of N2O emissions. The approach is tested on a long-term N2O emission dataset from a full-scale WWTP. Uniquely, the proposed approach emphasizes not only model accuracy but also model complexity, computational speed, and interpretability, equipping operators with the insights needed for informed corrective actions. Algorithms with varying levels of complexity and interpretability were considered, including k-Nearest Neighbors (kNN), decision trees, ensemble learning models, and deep neural networks (DNN). Furthermore, a parametric multivariate outlier removal method was adjusted to account for the statistical distributions of the data, significantly reducing data loss.
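
The abstract does not say which parametric multivariate outlier method was used or how it was adjusted. As a generic illustration only, the sketch below flags points by squared Mahalanobis distance against a chi-square cutoff, a common parametric choice. The data are synthetic, the function names are hypothetical, and the distribution-specific adjustment mentioned above is not reproduced.

```python
import math
import random


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T S^-1 d, using the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def remove_outliers(points, coverage=0.975):
    """Keep points whose squared distance is within the chi-square cutoff."""
    # With 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
    cutoff = -2.0 * math.log(1.0 - coverage)
    d2 = mahalanobis_sq_2d(points)
    return [p for p, d in zip(points, d2) if d <= cutoff]


rng = random.Random(0)
# Synthetic stand-in for two correlated sensor signals (assumption, not plant data).
clean = []
for _ in range(200):
    a = rng.gauss(0.0, 1.0)
    clean.append((a, 0.8 * a + rng.gauss(0.0, 0.3)))
faults = [(6.0, -6.0), (-5.0, 7.0)]  # two gross sensor faults

kept = remove_outliers(clean + faults)
print(f"kept {len(kept)} of {len(clean) + len(faults)} points")
```

A plain Gaussian cutoff like this one tends to discard many valid points when signals are skewed, which is the kind of data loss the adjusted method is reported to reduce.
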
By employing an effective feature selection methodology, a trade-off between data acquisition, model performance, and complexity was found. The number of features was reduced by 40 %, decreasing data collection cost, model complexity, and computational burden without a significant effect on modeling accuracy. The best performing models are kNN (R2 = 0.88), AdaBoost (R2 = 0.94), and DNN (R2 = 0.90). The models' feature importance was analyzed and compared with process knowledge to test interpretability and to guide N2O mitigation decisions.
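
To make the R2 metric and the feature-reduction trade-off concrete, here is a minimal sketch that scores a small kNN regressor on all features and on a reduced subset. It is not the authors' pipeline: the data are synthetic, the helper names are hypothetical, and the 5-to-3 feature cut is chosen only to mirror the 40 % figure.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=5):
    """Average the targets of the k training rows closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def evaluate(X, y, columns, n_train=300):
    """Fit on the first n_train rows, score on the rest, using only `columns`."""
    sub = [[row[c] for c in columns] for row in X]
    preds = [knn_predict(sub[:n_train], y[:n_train], x) for x in sub[n_train:]]
    return r2_score(y[n_train:], preds)


rng = random.Random(0)
# Synthetic stand-in data: 5 features, only the first 3 drive the target.
X = [[rng.uniform(0, 1) for _ in range(5)] for _ in range(400)]
y = [2 * r[0] + r[1] - 1.5 * r[2] + rng.gauss(0, 0.05) for r in X]

r2_full = evaluate(X, y, columns=[0, 1, 2, 3, 4])
r2_reduced = evaluate(X, y, columns=[0, 1, 2])  # 40 % fewer features
print(f"R2 all features: {r2_full:.2f}, R2 reduced: {r2_reduced:.2f}")
```

Here the dropped columns are pure noise by construction, so removing them costs no accuracy. On real plant data the uninformative features are not known in advance, which is why a selection method is needed.
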
ISSN: 0043-1354, 1879-2448
DOI: 10.1016/j.watres.2023.120667