Probing the limit of hydrologic predictability with the Transformer network

Bibliographic Details
Published in: Journal of hydrology (Amsterdam), 2024-06, Vol. 637, p. 131389, Article 131389
Authors: Liu, Jiangtao; Bian, Yuchen; Lawson, Kathryn; Shen, Chaopeng
Format: Article
Language: English
Online access: Full text
Description
Abstract:
• First time a Transformer achieved the same performance as LSTM on the CAMELS dataset.
• An unmodified vanilla Transformer could not reach the LSTM's performance.
• A non-recurrent connection was added to the Transformer, supporting parallelism.
• Transformers may have scale advantages for larger datasets.
• LSTMs and Transformers are likely nearing the prediction limits of the dataset.

For a number of years since their introduction to hydrology, recurrent neural networks such as long short-term memory (LSTM) networks have proven remarkably difficult to surpass in terms of daily hydrograph metrics on community-shared benchmarks. Outside of hydrology, Transformers have become the model of choice for sequential prediction tasks, making them a natural architecture to investigate for hydrologic applications. Here, we first show that a vanilla (basic) Transformer architecture is not competitive with LSTM on the widely benchmarked CAMELS streamflow dataset, lagging especially on high-flow metrics, perhaps due to its lack of memory mechanisms. However, a recurrence-free variant of the Transformer obtained mixed comparisons with LSTM, producing very slightly higher Kling-Gupta efficiency (KGE) coefficients, along with other metrics. The lack of advantages for the vanilla Transformer network is linked to the nature of hydrologic processes. Additionally, like LSTM, the Transformer can merge multiple meteorological forcing datasets to improve model performance. The modified Transformer therefore represents a rare architecture that is competitive with LSTM in rigorous benchmarks. Valuable lessons were learned: (1) the basic Transformer architecture is not suitable for hydrologic modeling; (2) the recurrence-free modification is beneficial, so future work should continue to test such modifications; and (3) the performance of state-of-the-art models may be close to the prediction limits of the dataset. As a non-recurrent model, the Transformer may have scale advantages for learning from bigger datasets and storing knowledge. This work lays the groundwork for future explorations into pretraining models, serving as a foundational benchmark that underscores the potential benefits of such models in hydrology.
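
The benchmark comparisons above are reported primarily in terms of the Kling-Gupta efficiency (KGE). For reference, a minimal sketch of the standard KGE computation (Gupta et al., 2009) follows; the function and array names (kge, sim, obs) are illustrative assumptions, not taken from the paper.

import numpy as np

def kge(sim, obs):
    # Kling-Gupta efficiency: KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation between simulation and observation
    alpha = sim.std() / obs.std()     # ratio of standard deviations (variability error)
    beta = sim.mean() / obs.mean()    # ratio of means (bias error)
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

A perfect simulation gives KGE = 1; larger errors in correlation, variability, or bias pull the score further below 1.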
ISSN: 0022-1694
DOI: 10.1016/j.jhydrol.2024.131389