Synthetic rainfall data generator development through decentralised model training

Bibliographic details
Published in:	Journal of Hydrology (Amsterdam), 2022-09, Vol. 612, p. 128210, Article 128210
Main authors:	Welten, Sascha; Holt, Adrian; Hofmann, Julian; Schelter, Lennart; Klopries, Elena-Maria; Wintgens, Thomas; Decker, Stefan
Format:	Article
Language:	English
Subjects:	Data generation; Distributed analytics; Distributed data; Hydrological data
Online access:	Full text

Description
Abstract:	Recent heavy rainfall-induced flood events, for example in Germany, Australia and the USA, have highlighted the relevance of countermeasures in saving human lives and preventing property damage. Newly introduced ML-based flood forecasting methods rely on high-intensity synthetic rainfall events due to the sparsity of their real counterparts. Such synthetic data instances can be produced by precipitation generators trained in an adversarial setting on historical rainfall data. Capturing processes for rainfall data are often highly distributed, with multiple radar stations contributing to a centralised data set. However, data centralisation entails challenges regarding data-stream logistics, data locality, and memory overhead. Distributed Analytics (DA) aims to overcome these challenges through decentralised model training by bringing the algorithm to the data instead of vice versa. In this work, we present a feasibility study evaluating the applicability of DA to hydrological data. As an example use case, we choose the decentralised training of rainfall data generators. We introduce a rainfall generator training procedure relying on Generative Adversarial Networks (GANs) and evaluate two DA algorithms: Federated Learning (FL) and Cyclic Institutional Incremental Learning (CIIL). We compare the resulting training outcomes with the centralised model training (CL) approach and find that CIIL performed similarly to CL but was less stable, while FL outperformed CL by 7.5%. We conclude that the proven feasibility of FL in our simulated distributed setting lays the groundwork for utilising this approach in realistic environments at larger scale while overcoming potential privacy concerns or logistical challenges in the setting of centralised analytics.
• Novel approach to data model training in the domain of hydrology.
• Decentralised model training avoids centralising large volumes of distributed data.
• Comparison of two decentralised approaches with centralised model training.
• Decentralised training creates competitive models and is a valuable option.
ISSN:	0022-1694 1879-2707
DOI:	10.1016/j.jhydrol.2022.128210
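
Illustrative sketch (not part of the catalog record or the article): the abstract contrasts two decentralised training schemes, FL and CIIL, without showing how they differ mechanically. The Python below is a minimal sketch of that difference under loudly stated assumptions. It swaps the paper's GAN for a toy linear model y = w*x + b, uses noise-free made-up data, and all names (`local_train`, `federated_learning`, `cyclic_incremental_learning`, `sites`) are hypothetical. FL is sketched here as sample-weighted averaging of locally trained weights, and CIIL as a single model handed from site to site in a cycle; the article's actual procedures may differ in detail.

```python
from typing import List, Tuple

Weights = List[float]                # [slope, intercept] of a toy linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave their site


def local_train(weights: Weights, data: Dataset,
                lr: float = 0.1, epochs: int = 5) -> Weights:
    """Run a few epochs of full-batch gradient descent on one site's data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_learning(sites: List[Dataset], rounds: int = 30) -> Weights:
    """FL sketch: every site trains from the same global weights, then a
    coordinator averages the results, weighted by local sample count."""
    global_weights: Weights = [0.0, 0.0]
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        local_results = [local_train(global_weights, d) for d in sites]
        global_weights = [
            sum(len(d) * res[i] for d, res in zip(sites, local_results)) / total
            for i in range(len(global_weights))
        ]
    return global_weights


def cyclic_incremental_learning(sites: List[Dataset], cycles: int = 30) -> Weights:
    """CIIL sketch: one model visits the sites in a fixed order, training on
    each in turn, and the whole tour is repeated for several cycles."""
    weights: Weights = [0.0, 0.0]
    for _ in range(cycles):
        for d in sites:
            weights = local_train(weights, d)
    return weights


def make_site(xs: List[float]) -> Dataset:
    """Made-up, noise-free samples of y = 2x + 1 (illustration only)."""
    return [(x, 2.0 * x + 1.0) for x in xs]


# Three hypothetical sites, each holding a different local sample.
sites = [
    make_site([-1.0, 0.0, 1.0]),
    make_site([-2.0, 0.0, 2.0]),
    make_site([-1.5, -0.5, 0.5, 1.5]),
]

print("FL   weights:", federated_learning(sites))
print("CIIL weights:", cyclic_incremental_learning(sites))
```

In both schemes only model weights move between sites, which is the "bring the algorithm to the data" idea named in the abstract. Because the toy data are noise-free and every site shares the same underlying relation, both schemes end up close to the same parameters here; this says nothing about the relative performance or stability reported in the article.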