Developing stacking ensemble models for multivariate contamination detection in water distribution systems

Bibliographic details
Published in: The Science of the total environment 2022-07, Vol.828, p.154284-154284, Article 154284
Main authors: Li, Zilin; Zhang, Chi; Liu, Haixing; Zhang, Chao; Zhao, Mengke; Gong, Qiang; Fu, Guangtao
Format: Article
Language: English
Description
Abstract: This study presents a new stacking ensemble model for contamination event detection using multiple water quality parameters. The stacking model consists of a number of machine learning base predictors and a meta-predictor. It is trained using cross-validation to capture different features in multiple water quality parameters and is then used for water quality predictions. For each water quality parameter, the residuals between predicted and measured data are classified to identify anomalies, with thresholds derived from the sequential model-based optimization method and detection probabilities updated using Bayesian analysis. Alarms derived from individual water quality parameters are fused to enhance the anomaly signals and improve the detection accuracy. The proposed stacking-based method is evaluated using a data set of six water quality parameters from a real water distribution system with randomly simulated events. The stacking-based method detected 2496 events out of a total of 2500 events without a false alarm. The results show that the stacking method outperforms an artificial neural network (ANN) benchmark method in contamination event detection: it has a higher true positive rate, a lower false positive rate and a higher F1 score than the ANN method. This implies that the stacking method shows great promise for detecting contamination events in water distribution systems.

• New stacking ensemble model is developed for contamination event detection in water distribution systems.
• Sequential model-based optimization provides appropriate thresholds for anomaly identification.
• Bayesian sequential analysis and alarm fusion of multivariate water quality parameters reduce false alarm rate.
• The stacking method shows high true positive rate and F1 score, and outperforms traditional neural networks.
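
The detection stage described in the abstract (residual classification, Bayesian updating of detection probabilities, alarm fusion) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the update formula, the fusion rule, or any parameter values, so everything below is an assumption made for illustration: the standard Bayes' rule update driven by an assumed true positive rate (`tpr`) and false positive rate (`fpr`), the clipping bounds, the fixed `threshold` (which the paper instead derives by sequential model-based optimization), and the voting rule in `fuse_alarms`. The predictions are taken as given; the stacking model itself is not reproduced here.

```python
def classify_residuals(measured, predicted, threshold):
    """Flag time steps whose absolute residual exceeds the threshold."""
    return [abs(m - p) > threshold for m, p in zip(measured, predicted)]


def bayes_update(prob, is_outlier, tpr, fpr):
    """Apply Bayes' rule once to the event probability.

    tpr and fpr are assumed characteristics of the residual classifier.
    """
    if is_outlier:
        num = tpr * prob
        den = tpr * prob + fpr * (1.0 - prob)
    else:
        num = (1.0 - tpr) * prob
        den = (1.0 - tpr) * prob + (1.0 - fpr) * (1.0 - prob)
    return num / den


def event_probabilities(flags, tpr=0.9, fpr=0.1, prior=0.01,
                        floor=1e-3, ceil=0.999):
    """Update the event probability sequentially over a series of flags.

    The probability is clipped to [floor, ceil] (an assumed safeguard)
    so that it can still react after a long run of identical flags.
    """
    probs = []
    prob = prior
    for flag in flags:
        prob = bayes_update(prob, flag, tpr, fpr)
        prob = min(max(prob, floor), ceil)
        probs.append(prob)
    return probs


def fuse_alarms(prob_series, alarm_level=0.7, min_votes=2):
    """Hypothetical fusion rule: alarm when at least min_votes parameters
    have an event probability at or above alarm_level at the same step."""
    fused = []
    for step in zip(*prob_series):
        votes = sum(p >= alarm_level for p in step)
        fused.append(votes >= min_votes)
    return fused


if __name__ == "__main__":
    # Toy data for one parameter: the last three measurements deviate.
    measured = [1.0, 1.0, 1.5, 1.6, 1.5]
    predicted = [1.0, 1.0, 1.0, 1.0, 1.0]
    flags = classify_residuals(measured, predicted, threshold=0.2)
    print(flags)
    print(event_probabilities(flags))
```

In this sketch a single outlier barely moves the event probability, while consecutive outliers raise it step by step, which is the usual motivation for sequential updating as a way to suppress isolated false alarms.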
ISSN: 0048-9697, 1879-1026
DOI: 10.1016/j.scitotenv.2022.154284