Impact of deep learning-based dropout on shallow neural networks applied to stream temperature modelling
Published in: Earth-Science Reviews, 2020-02, Vol. 201, p. 103076, Article 103076
Main authors: , ,
Format: Article
Language: English
Online access: Full text
Abstract: Although the applicability of deep learning in various fields of the earth sciences is rapidly increasing, shallow multilayer-perceptron neural networks remain widely used for regression problems. Despite many clear distinctions between deep and shallow neural networks, some techniques developed for deep learning may help improve shallow models. Dropout, a simple approach to avoiding overfitting by randomly skipping some nodes in a network during each training iteration, is among the methodological features that made deep learning networks successful. In this study we review dropout methods and show empirically that, when used together with early stopping, dropout and its variant dropconnect can improve the performance of shallow multilayer-perceptron neural networks. Shallow neural networks are applied to streamwater temperature modelling problems in six catchments, based on air temperature, river discharge and the declination of the Sun. We found that when training of a particular neural network architecture that includes at least a few hidden nodes is repeated many times, dropout reduces the number of models that perform poorly on testing data, and hence improves the mean performance. If the number of inputs or hidden nodes is very low, dropout only disturbs training. However, nodes need to be dropped out with a much lower probability than in deep neural networks (about 1%, instead of the 10–50% typical for deep learning), because the network contains far fewer nodes. Larger dropout probabilities hinder convergence of the training algorithm and lead to poor results on both calibration and testing data. Dropconnect turned out to be slightly more effective than dropout.
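To make the abstract's distinction concrete, the sketch below (not the authors' code) shows a forward pass of a single-hidden-layer MLP in which either whole hidden nodes (dropout) or individual weights (dropconnect) are zeroed with a small probability during training. The layer sizes, variable names and the value p = 0.01 are illustrative assumptions loosely based on the abstract.

```python
# Minimal sketch: dropout vs. dropconnect in a shallow (one-hidden-layer) MLP.
# All names, shapes and p = 0.01 are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2, p=0.01, mode="dropout", training=True):
    """One forward pass with optional dropout or dropconnect.

    x  : (n_inputs,)  input vector, e.g. air temperature, discharge, solar declination
    W1 : (n_hidden, n_inputs)  input-to-hidden weights
    W2 : (1, n_hidden)         hidden-to-output weights
    p  : probability of dropping a node (dropout) or a single weight (dropconnect)
    """
    if training and mode == "dropconnect":
        # Dropconnect: zero individual weights with probability p,
        # rescaling the survivors so the expected activation is unchanged.
        mask = (rng.random(W1.shape) >= p).astype(float)
        W1 = W1 * mask / (1.0 - p)

    h = np.tanh(W1 @ x + b1)             # hidden-layer activations

    if training and mode == "dropout":
        # Dropout: zero whole hidden nodes with probability p (inverted dropout).
        mask = (rng.random(h.shape) >= p).astype(float)
        h = h * mask / (1.0 - p)

    return W2 @ h + b2                    # linear output, e.g. water temperature estimate


# Tiny usage example with made-up shapes (3 inputs, 8 hidden nodes).
n_in, n_hid = 3, 8
W1 = rng.normal(scale=0.5, size=(n_hid, n_in))
b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(1, n_hid))
b2 = np.zeros(1)
x = np.array([12.3, 45.0, 0.2])

print(forward(x, W1, b1, W2, b2, p=0.01, mode="dropout"))
print(forward(x, W1, b1, W2, b2, p=0.01, mode="dropconnect"))
```

With only a handful of hidden nodes, even p = 0.01 removes a node in a noticeable fraction of iterations, which is consistent with the abstract's point that the much larger probabilities common in deep learning would cripple training of such small networks.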
ISSN: 0012-8252, 1872-6828
DOI: 10.1016/j.earscirev.2019.103076