Source identification and prediction of nitrogen and phosphorus pollution of Lake Taihu by an ensemble machine learning technique

Bibliographic details
Published in: Frontiers of environmental science & engineering 2023-05, Vol.17 (5), p.55, Article 55
Main authors: Hu, Yirong, Du, Wenjie, Yang, Cheng, Wang, Yang, Huang, Tianyin, Xu, Xiaoyi, Li, Wenwei
Format: Article
Language: English
Description
Summary:
● A machine learning model was used to identify lake nutrient pollution sources.
● XGBoost model showed the best performance for lake water quality prediction.
● Model feature size was reduced by screening the key features with the MIC method.
● TN and TP concentrations of Lake Taihu are mainly affected by endogenous sources.
● Next-month lake TN and TP concentrations were predicted accurately.

Effective control of lake eutrophication necessitates a full understanding of the complicated nitrogen and phosphorus pollution sources, for which mathematical modeling is commonly adopted. In contrast to the conventional knowledge-based models that usually perform poorly due to insufficient knowledge of pollutant geochemical cycling, we employed an ensemble machine learning (ML) model to identify the key nitrogen and phosphorus sources of lakes. Six ML models were developed based on 13 years of historical data of Lake Taihu's water quality, environmental input, and meteorological conditions, among which the XGBoost model stood out as the best model for total nitrogen (TN) and total phosphorus (TP) prediction. The results suggest that the lake TN is mainly affected by the endogenous load and inflow river water quality, while the lake TP is predominantly from endogenous sources. The prediction of the lake TN and TP concentration changes in response to these key feature variations suggests that endogenous source control is a highly desirable option for lake eutrophication control. Finally, one-month-ahead prediction of lake TN and TP concentrations (R² of 0.85 and 0.95, respectively) was achieved based on this model with sliding time window lengths of 9 and 6 months, respectively. Our work demonstrates the great potential of using ensemble ML models for lake pollution source tracking and prediction, which may provide valuable references for early warning and rational control of lake eutrophication.
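The summary mentions one-month-ahead prediction from sliding time windows, scored by R². The record does not describe how the authors built their windows or which features they used, so the following is only a minimal sketch of the general idea. The helper names `make_windows` and `r_squared` are hypothetical, and the monthly values are made up for illustration; they are not data from the study.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a monthly series into (features, target) pairs.

    Each feature row holds `window` consecutive months; the target is
    the value of the month immediately after that window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    features, targets = [], []
    for start in range(len(series) - window):
        features.append(list(series[start:start + window]))
        targets.append(series[start + window])
    return features, targets


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly TN concentrations (mg/L), invented for illustration only.
tn_monthly = [2.1, 2.4, 2.8, 3.0, 2.6, 2.2, 1.9, 1.8, 2.0, 2.3, 2.7, 2.9]

# A 9-month window, matching the window length the summary reports for TN.
X, y = make_windows(tn_monthly, window=9)
```

Each row of `X` could then be passed to any regressor (the summary names XGBoost as the best performer), with `r_squared` comparing held-out targets against the model's predictions.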
ISSN:2095-2201
2095-221X
DOI:10.1007/s11783-023-1655-7