An evaluation framework for input variable selection algorithms for environmental data-driven models

Bibliographic details
Published in: Environmental modelling & software : with environment data news, 2014-12, Vol. 62, p. 33-51
Main authors: Galelli, Stefano; Humphrey, Greer B.; Maier, Holger R.; Castelletti, Andrea; Dandy, Graeme C.; Gibbs, Matthew S.
Format: Article
Language: English
Description
Summary: Input Variable Selection (IVS) is an essential step in the development of data-driven models and is particularly relevant in environmental modelling. While new methods for identifying important model inputs continue to emerge, each has its own advantages and limitations and no single method is best suited to all datasets and modelling purposes. Rigorous evaluation of new and existing input variable selection methods would allow the effectiveness of these algorithms to be properly identified in various circumstances. However, such evaluations are largely neglected due to the lack of guidelines or precedent to facilitate consistent and standardised assessment. In this paper, a new framework is proposed for the evaluation and inter-comparison of IVS methods which takes into account: (1) a wide range of dataset properties that are relevant to real world environmental data, (2) assessment criteria selected to highlight algorithm suitability in different situations of interest, and (3) a website for sharing data, algorithms and results (http://ivs4em.deib.polimi.it/). The framework is demonstrated on four IVS algorithms commonly used in environmental modelling studies and twenty-six datasets exhibiting different typical properties of environmental data. The main aim at this stage is to demonstrate the application of the proposed evaluation framework, rather than provide a definitive answer as to which of these algorithms has the best overall performance. Nevertheless, the results indicate interesting differences in the algorithms' performance that have not been identified previously.

• A framework for the evaluation of input variable selection algorithms is proposed.
• The framework consists of assessment criteria and twenty-six datasets.
• The framework is supported by a dedicated website (http://ivs4em.deib.polimi.it).
• Four popular IVS algorithms are considered for evaluation purposes.
ISSN: 1364-8152, 1873-6726
DOI: 10.1016/j.envsoft.2014.08.015