Predictive modelling benchmark of nitrate Vulnerable Zones at a regional scale based on Machine learning and remote sensing
Published in: | Journal of hydrology (Amsterdam) 2021-12, Vol.603, p.127092, Article 127092 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: |
• Predictive modelling benchmark of Nitrate Vulnerable Zones using Random Forest.
• Extrinsic driving forces to groundwater were selected as environmental predictors.
• Phenological features derived from remote sensing were included as novel features.
• Feature selection methods revealed good performance predicting nitrate pollution.
• Phenology and manure production from livestock farms were the most important features.
Nitrate leaching losses from arable land into groundwater were a main driver in designating Nitrate Vulnerable Zones (NVZs) under the Nitrates Directive, with a view to improving their water quality. Despite this, developing common strategies for effective water quality control in these areas remains a challenge in the European Union. This paper evaluates the performance of the Random Forest (RF) machine learning algorithm combined with Feature Selection (FS) techniques in predicting nitrate pollution in NVZ groundwater bodies in Andalusia, Spain, across different periods and using updated environmental features. A set of forty-four features extrinsic to groundwater bodies was used as environmental predictors, with the aim of making this methodology exportable to other regions. Phenological features obtained through remote-sensing techniques were included to measure the dynamics of agricultural activity. In addition, other dynamic features derived from weather and livestock effluents were included to analyse seasonal and interannual changes in nitrate pollution. Three feature stacks and two nitrate databases were used in the predictive modelling: Period 1 (2009), with 321 nitrate samples for training; Period 2 (2010), with 282 nitrate samples for validation and initial spatial prediction; and Period 3 (2017), to assess the changes in the probability of groundwater nitrate content exceeding 50 mg/L. Random Forest was used as a wrapper with four sequential search methods: sequential backward selection (SBS), sequential forward selection (SFS), sequential forward floating selection (SFFS) and sequential backward floating selection (SBFS).
Among all the Feature Selection methods applied, Random Forest with SFS performed best (overall accuracy = 0.891 with six predictor features) and linked the highest probability of nitrate pollution with three dynamic features: the Normalized Difference Vegetation Index (NDVI) base level, the NDVI value at the end of the growing season and the accumulated manure production of livestock farms; and three static |
---|---|
ISSN: | 0022-1694 1879-2707 |
DOI: | 10.1016/j.jhydrol.2021.127092 |
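
The abstract describes wrapper-style feature selection, in which a search strategy such as sequential forward selection (SFS) proposes feature subsets and a model scores each one. The following is a minimal sketch of the SFS search loop only, not the authors' implementation. The feature names, the `TOY_GAIN` table and the `toy_score` function are hypothetical stand-ins for illustration; in a setting like the one the abstract describes, the scoring function would presumably be a validated Random Forest accuracy instead. The floating variants (SFFS, SBFS) add a conditional removal or re-inclusion step that is not shown here.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> Tuple[List[str], float]:
    """Greedy SFS: start from an empty subset and repeatedly add the single
    feature that raises `score` the most; stop when nothing improves it."""
    selected: List[str] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Score the current subset extended by each remaining candidate.
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        cand_score, cand = max(scored, key=lambda pair: pair[0])
        if cand_score <= best_score:
            break  # no candidate improves the score, so stop early
        selected.append(cand)
        best_score = cand_score
    return selected, best_score


# Hypothetical per-feature gains (assumption, not data from the article).
TOY_GAIN = {
    "ndvi_base": 0.30,
    "ndvi_end_season": 0.20,
    "manure": 0.25,
    "noise": 0.01,
}


def toy_score(subset: FrozenSet[str]) -> float:
    """Stand-in for model accuracy: summed gains minus a size penalty."""
    return sum(TOY_GAIN[f] for f in subset) - 0.05 * len(subset)


selected, best = sequential_forward_selection(
    list(TOY_GAIN), toy_score, max_features=4
)
print(selected, round(best, 2))
```

With this toy scorer the search stops before adding the uninformative `noise` feature, because its gain does not cover the per-feature penalty. Sequential backward selection follows the same pattern in reverse: it starts from the full set and removes the least useful feature at each step.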