Correcting for bias in distribution modelling for rare species using citizen science data

Bibliographic details
Published in: Diversity & Distributions, April 2018, Vol. 24 (3/4), pp. 460-472
Authors: Robinson, Orin J.; Ruiz-Gutierrez, Viviana; Fink, Daniel
Format: Article
Language: English
Description
Abstract
Aim: To improve the accuracy of inferences on habitat associations and distribution patterns of rare species by combining machine learning, spatial filtering and resampling to address the class imbalance and spatial bias of large volumes of citizen science data.

Innovation: Modelling rare species' distributions is a pressing challenge for conservation and applied research. Often, a large number of surveys are required before enough detections occur to model the distributions of rare species accurately, resulting in a data set with a high proportion of non-detections (i.e. class imbalance). Citizen science data can provide a cost-effective source of surveys but are likely to suffer from class imbalance. Citizen science data also suffer from spatial bias, likely arising from preferential sampling. To correct for class imbalance and spatial bias, we used spatial filtering to under-sample the majority class (non-detections) while retaining all of the limited information in the minority class (detections). We investigated the use of spatial under-sampling with randomForest models and compared it with common approaches for imbalanced data: the synthetic minority oversampling technique (SMOTE), weighted random forest and balanced random forest models. Model accuracy was assessed using kappa, Brier score and AUC. We demonstrate the method by evaluating habitat associations and seasonal distribution patterns using citizen science data for a rare species, the tricoloured blackbird (Agelaius tricolor).

Main Conclusions: Spatial under-sampling increased the accuracy of each model and outperformed the approach typically used to direct under-sampling in the SMOTE algorithm. Our approach is the first to characterize the winter distribution and movement of tricoloured blackbirds.
Our results show that tricoloured blackbirds are positively associated with grassland, pasture and wetland habitats, and negatively associated with high elevations or evergreen forests during both winter and breeding seasons. The seasonal differences in distribution indicate that individuals move to the coast during the winter, as suggested by historical accounts.
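The spatial under-sampling step summarised in the abstract (keep every detection, thin the non-detections spatially) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes checklist locations in projected coordinates, a square grid whose cell size is a free parameter, and retention of one randomly chosen non-detection per cell. The names `Checklist`, `grid_cell` and `spatial_undersample` are hypothetical. The thinned data would then be passed to a classifier such as a random forest, which is not shown here.

```python
import math
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Checklist:
    """One survey record (hypothetical structure, not from the paper)."""
    x: float          # easting in projected units (assumed)
    y: float          # northing in projected units (assumed)
    detected: bool    # True if the species was reported


def grid_cell(record, cell_size):
    """Index of the square grid cell that contains a record."""
    return (math.floor(record.x / cell_size), math.floor(record.y / cell_size))


def spatial_undersample(records, cell_size, max_per_cell=1, seed=None):
    """Keep all detections (minority class) and at most `max_per_cell`
    randomly chosen non-detections (majority class) per grid cell."""
    rng = random.Random(seed)
    detections = [r for r in records if r.detected]

    # Group non-detections by grid cell so densely surveyed areas are thinned.
    cells = defaultdict(list)
    for r in records:
        if not r.detected:
            cells[grid_cell(r, cell_size)].append(r)

    thinned = []
    for key in sorted(cells):  # sorted so a fixed seed gives a reproducible sample
        group = cells[key]
        thinned.extend(rng.sample(group, min(max_per_cell, len(group))))
    return detections + thinned


if __name__ == "__main__":
    data = [
        Checklist(0.5, 0.5, True),
        Checklist(0.2, 0.3, False),   # cell (0, 0)
        Checklist(0.8, 0.9, False),   # cell (0, 0)
        Checklist(1.5, 0.5, False),   # cell (1, 0)
        Checklist(1.6, 0.4, False),   # cell (1, 0)
        Checklist(5.0, 5.0, False),   # cell (5, 5)
    ]
    sample = spatial_undersample(data, cell_size=1.0, seed=42)
    print(len(sample), sum(r.detected for r in sample))  # prints: 4 1
```

Because every detection is kept, the resulting class ratio depends on how many grid cells contain non-detections: a coarser `cell_size` discards more non-detections and yields a more balanced training set, at the cost of more information lost from the majority class.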
ISSN: 1366-9516, 1472-4642
DOI: 10.1111/ddi.12698