Estimation of missing Ellenberg Indicator Values for tree species in South-eastern Europe: a comparison of methods

Bibliographic details
Published in:	Ecological Indicators, 2024-03, Vol. 160, Article 111851
Main authors:	Leccese, Letizia; Fanelli, Giuliano; Cambria, Vito Emanuele; Massimi, Marco; Attorre, Fabio; Alfò, Marco; Aćić, Svetlana; Bergmeier, Erwin; Čarni, Andraž; Cuk, Mirjana; Custerevska, Renata; Dimopoulos, Panayotis; Hoda, Petrit; Mullaj, Alfred; Šilc, Urban; Skvorc, Zeljko; Stancic, Zvjezdana; Dajic Stevanovic, Zora; Tzonev, Rossen; Vassilev, Kiril; Malatesta, Luca; De Sanctis, Michele
Format:	Article
Language:	English
Subjects:	Biodiversity informatics; Bioindication; Missing values; Plant indicators; Vegetation databases; Vegetation ecology
Online access:	Full text

Description
Abstract:
• Estimating missing Ellenberg Indicator Values (EIV) could help plant ecology studies.
• We tested and compared several methods for estimating missing EIV from existing data.
• Multiple Linear Regression and k-Nearest Neighbour performed better than the others.
• Statistical methods are more effective than imputation based on expert knowledge.
• This approach would greatly facilitate monitoring species with unknown EIV.

Ellenberg indicator values (EIV) are widely used in vegetation ecology, but the values for many species in Southeastern Europe are not available due to incomplete knowledge of their ecology; it is therefore of paramount importance to estimate missing values in existing databases. The entire EIV set for a single species can be missing, or a single EIV can be missing for a species for which other indicator values are available. Our aim here is to provide a simple method to impute missing values for species that have missing data in one or more EIVs. For this purpose, we adopt a multiple imputation procedure and compare a number of imputation methods on the basis of two datasets: i) “indices”, the set of 9 Ellenberg indicators taken from the literature, available for 10,824 species, and ii) “vegetation”, a set describing the physical and climatic characteristics (Light, Temperature, Continentality, Soil moisture, Nitrogen, Soil pH, Hemeroby index, Humidity, Organic_matter) of 29,935 relevés from Southeastern Europe where at least one tree species is present. The imputation methods we considered are: k-Nearest Neighbour, multiple linear regression (with or without collinearity correction), Reprediction Algorithm, Weighted Averaging (WA) and Weighted Averaging Partial Least Squares (WAPLS) regression. The different methods of imputation were compared by looking at the output produced and its deviation from the “true” observed values for a set of species with known EIVs.
We considered a set of species with known EIVs and applied multiple imputation using the methods above; as measures of performance we adopted the mean squared error (MSE) estimate and expert judgement of ecological consistency. Models based on regression and k-Nearest Neighbour seem to outperform the others. By contrast, the Reprediction Algorithm in its different forms produced less satisfactory results. Imputation of missing values is generally based on expert knowledge or on some variant of weighted averaging (also known as Hill’s method). Here we show that other methods m
ISSN:	1470-160X 1872-7034
DOI:	10.1016/j.ecolind.2024.111851
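
The evaluation idea described in the abstract (hide EIVs that are actually known, impute them, and score the result by MSE) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`knn_impute`, `weighted_average_impute`, `holdout_mse`), the distance measure used for the nearest-neighbour search, and all input layouts are hypothetical choices made for this example, and the article's multiple imputation procedure, regression models, and WAPLS are not reproduced here.

```python
import numpy as np


def mse(true, pred):
    """Mean squared error between known and imputed values."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.mean((true - pred) ** 2))


def knn_impute(values, k=3):
    """Fill NaNs in a species-by-indicator table (assumed layout).

    For each missing cell, donors are the other species that have that
    indicator observed. Distance to a donor is the mean squared difference
    over the indicators observed in both species (an assumption of this
    sketch). The cell is filled with the mean of the k closest donors.
    """
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    for i, row in enumerate(values):
        for j in np.where(np.isnan(row))[0]:
            candidates = []
            for d, donor in enumerate(values):
                if d == i or np.isnan(donor[j]):
                    continue
                shared = ~np.isnan(row) & ~np.isnan(donor)
                if not shared.any():
                    continue  # nothing to compare on
                dist = np.mean((row[shared] - donor[shared]) ** 2)
                candidates.append((dist, donor[j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                filled[i, j] = np.mean([v for _, v in candidates[:k]])
    return filled


def weighted_average_impute(abundance, plot_values):
    """Weighted averaging: one indicator value per species.

    abundance:   (n_plots, n_species) cover or abundance per relevé
    plot_values: (n_plots,) environmental value of each relevé
    Each species gets the abundance-weighted mean of the plot values;
    species absent from every plot get NaN.
    """
    abundance = np.asarray(abundance, dtype=float)
    plot_values = np.asarray(plot_values, dtype=float)
    totals = abundance.sum(axis=0)
    weighted = (abundance * plot_values[:, None]).sum(axis=0)
    return np.divide(weighted, totals,
                     out=np.full_like(weighted, np.nan), where=totals > 0)


def holdout_mse(values, hide_mask, k=3):
    """Hide known cells, impute them with k-NN, and score by MSE."""
    values = np.asarray(values, dtype=float)
    hide_mask = np.asarray(hide_mask, dtype=bool)
    masked = values.copy()
    masked[hide_mask] = np.nan
    imputed = knn_impute(masked, k=k)
    return mse(values[hide_mask], imputed[hide_mask])
```

Running `holdout_mse` with several values of `k`, or swapping `knn_impute` for another imputer, gives the kind of method-by-method MSE comparison the abstract describes; the expert judgement of ecological consistency mentioned there has no counterpart in this sketch.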