Spatial-temporal distribution of labeled set bias remote sensing estimation: An implication for supervised machine learning in water quality monitoring


Bibliographic details
Published in: International Journal of Applied Earth Observation and Geoinformation, 2024-07, Vol. 131, p. 103959, Article 103959
Main authors: Zhou, Yadong; Li, Wen; Cao, Xiaoyu; He, Boayin; Feng, Qi; Yang, Fan; Liu, Hui; Kutser, Tiit; Xu, Min; Xiao, Fei; Geng, Xueer; Yu, Kai; Du, Yun
Format: Article
Language: English
Description
Summary:
• An increasing temporal gap between sampling and imaging results in greater uncertainty in the SML-derived WQP concentration.
• The estimation errors of OAPs increase faster than those of nOAPs.
• SML conducted on imbalanced training sets provides smooth estimates.

Supervised machine learning (SML) has become a crucial tool for estimating water quality parameters (WQPs) from satellite images. Its effectiveness relies heavily on synchronised in situ datasets covering diverse water bodies. However, collecting such datasets is time-consuming, resulting in temporal gaps between sampling and imaging. In addition, the in situ dataset may be imbalanced. These imperfections could introduce uncertainties into SML-derived models, compromising the accuracy of the WQP estimates. Using in situ data collected automatically every four hours, the study estimates both optically active parameters (OAPs) and non-optically active parameters (nOAPs) in the Middle Reaches of the Yangtze River (MRYR) as an example to illustrate the importance of this challenge in freshwater remote sensing. The investigation was then extended to estimating OAPs and nOAPs in the lakes of Wuhan from manual sampling measurements, thereby bridging theoretical insights with real-world applications. Using four ML algorithms, the SML-based models for each WQP were calibrated on in situ datasets with different spatio-temporal distributions. The results demonstrated that precision decreased with increasing time gaps, and that most nOAPs (COD, TP, TN, pH, and DO) were more robust to the time gap than the OAPs (turbidity, Secchi depth, Chl-a, and algae density). The mean absolute percentage errors (MAPEs) of these nOAPs were as follows: for all models, pH MAPEs
ISSN: 1569-8432, 1872-826X
DOI: 10.1016/j.jag.2024.103959