Machine learning and explainable artificial intelligence for the prevention of waterborne cryptosporidiosis and giardiosis

Bibliographic details
Published in: Water Research (Oxford), 2024-09, Vol. 262, p. 122110, Article 122110
Main authors: Ligda, Panagiota, Mittas, Nikolaos, Kyzas, George Z., Claerebout, Edwin, Sotiraki, Smaragda
Format: Article
Language: English
Description
Abstract:
• Explainable AI as a “toolbox” in early warning systems for waterborne outbreaks.
• Random Forest for modelling Cryptosporidium.
• Extreme Gradient Boosting and Support Vector Regression for modelling Giardia.
• Different combinations of biotic/abiotic markers were informative in each model.

Cryptosporidium and Giardia are important parasitic protozoa because of their zoonotic potential and impact on human health, and they have often caused waterborne disease outbreaks. Detecting (oo)cysts in water matrices is challenging and extremely costly, so only a few countries have legislated for regular monitoring of drinking water for their presence. Several attempts have been made to investigate the association between the presence of such (oo)cysts in water and other biotic or abiotic factors, with inconclusive findings. In this regard, the aim of this study was to develop a holistic approach leveraging Machine Learning (ML) and eXplainable Artificial Intelligence (XAI) techniques to provide empirical evidence on the presence and prediction of Cryptosporidium oocysts and Giardia cysts in water samples. To meet this objective, we first modelled the complex relationship between Cryptosporidium and Giardia (oo)cysts and a set of parasitological, microbiological, physicochemical and meteorological parameters via a model-agnostic meta-learner algorithm, which offers flexibility in selecting the ML model that performs the fitting task. Based on this generic approach, four well-known ML candidates were empirically evaluated in terms of their predictive capabilities. The best-performing algorithms were then further examined with XAI techniques to gain meaningful insights into the explainability and interpretability of the derived solutions.
The findings reveal that Random Forest achieves the highest predictive performance when the objective is to predict both contamination and contamination intensity with Cryptosporidium oocysts in a given water sample, with meteorological/physicochemical and microbiological markers, respectively, being informative. For predicting contamination with Giardia, eXtreme Gradient Boosting with physicochemical parameters was the most efficient algorithm, whereas Support Vector Regression, which takes into account both microbiological and meteorological markers, was more efficient for evaluating contamination intensity with cysts. The results of th
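The abstract does not describe the meta-learner or the specific XAI techniques in detail. As an illustration only, the stdlib-only Python sketch below shows the general idea of a model-agnostic comparison: any candidate object with `fit`/`predict` can be plugged into the same k-fold cross-validation loop, and the best-scoring candidate is then inspected with permutation feature importance, one common model-agnostic explainability technique. The toy candidates (a mean baseline and a k-nearest-neighbours regressor), the synthetic data and all names (`cross_val_rmse`, `permutation_importance`, `candidates`) are hypothetical stand-ins; this is not the authors' pipeline and does not implement Random Forest, XGBoost or SVR.

```python
import random
import statistics
from typing import Callable, Dict, List

Matrix = List[List[float]]


class MeanRegressor:
    """Hypothetical baseline candidate: always predicts the training mean."""

    def fit(self, X: Matrix, y: List[float]) -> "MeanRegressor":
        self.mean_ = statistics.fmean(y)
        return self

    def predict(self, X: Matrix) -> List[float]:
        return [self.mean_ for _ in X]


class KNNRegressor:
    """Hypothetical candidate: averages targets of the k nearest training rows."""

    def __init__(self, k: int = 5) -> None:
        self.k = k

    def fit(self, X: Matrix, y: List[float]) -> "KNNRegressor":
        self.X_, self.y_ = [list(r) for r in X], list(y)
        return self

    def predict(self, X: Matrix) -> List[float]:
        preds = []
        for row in X:
            # Squared Euclidean distance to every training row, nearest first.
            dists = sorted(
                (sum((a - b) ** 2 for a, b in zip(row, train)), target)
                for train, target in zip(self.X_, self.y_)
            )
            preds.append(statistics.fmean(t for _, t in dists[: self.k]))
        return preds


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def cross_val_rmse(make_model: Callable, X: Matrix, y: List[float],
                   folds: int = 5, seed: int = 0) -> float:
    """Model-agnostic k-fold CV: works for any factory whose model has fit/predict."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        scores.append(rmse([y[i] for i in test], model.predict([X[i] for i in test])))
    return statistics.fmean(scores)


def permutation_importance(model, X: Matrix, y: List[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean rise in RMSE when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    base = rmse(y, model.predict(X))
    importances = []
    for j in range(len(X[0])):
        deltas = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            deltas.append(rmse(y, model.predict(X_perm)) - base)
        importances.append(statistics.fmean(deltas))
    return importances


# Synthetic toy data (hypothetical): the target depends on feature 0 only,
# feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(120)]
y = [2.0 * row[0] + data_rng.gauss(0, 0.5) for row in X]

candidates: Dict[str, Callable] = {"mean": MeanRegressor, "knn": lambda: KNNRegressor(k=5)}
scores = {name: cross_val_rmse(make, X, y) for name, make in candidates.items()}
best = min(scores, key=scores.get)
imp = permutation_importance(candidates[best]().fit(X, y), X, y)
print(scores, "best:", best, "importances:", imp)
```

Adding another candidate only requires adding a factory to `candidates`; in practice one would use established implementations such as scikit-learn estimators, which follow the same `fit`/`predict` convention with `fit` returning the estimator itself.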
ISSN: 0043-1354, 1879-2448
DOI: 10.1016/j.watres.2024.122110