Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models
Predictive performance is important to many applications of species distribution models (SDMs). The SDM ‘ensemble’ approach, which combines predictions across different modelling methods, is believed to improve predictive performance, and is used in many recent SDM studies. Here, we aim to compare t...
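Main authors: | Hao, Tianxiao ; Elith, Jane ; Lahoz‐Monfort, José J. ; Guillera‐Arroita, Gurutzeta |
---|---|
Format: | Dataset |
Language: | eng |
Subjects: | BIOMOD ; consensus forecast ; model performance ; model tuning ; spatial blocking |
Online access: | Order full text |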
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Hao, Tianxiao ; Elith, Jane ; Lahoz‐Monfort, José J. ; Guillera‐Arroita, Gurutzeta |
description | Predictive performance is important to many applications of species
distribution models (SDMs). The SDM ‘ensemble’ approach, which combines
predictions across different modelling methods, is believed to improve
predictive performance, and is used in many recent SDM studies. Here, we
aim to compare the predictive performance of ensemble species distribution
models to that of individual models, using a large presence-absence
dataset of eucalypt tree species. To test model performance, we divided
our dataset into calibration and evaluation folds using two spatial
blocking strategies (checkerboard-pattern and latitudinal slicing). We
calibrated and cross-validated all models within the calibration folds,
using both repeated random division of data (a common approach) and
spatial blocking. Ensembles were built using the software package
‘biomod2’, with standard (“untuned”) settings. Boosted regression tree
(BRT) models were also fitted to the same data, tuned according to
published procedures. We then used evaluation folds to compare ensembles
against both their component untuned individual models, and against the
BRTs. We used area under the receiver-operating characteristic curve (AUC)
and log-likelihood for assessing model performance. In all our tests,
ensemble models performed well, but not consistently better than their
component untuned individual models or tuned BRTs across all tests.
Moreover, choosing untuned individual models with best cross-validation
performance also yielded good external performance, with blocked
cross-validation proving better suited for this choice, in this study,
than repeated random cross-validation. The latitudinal slice test was only
possible for four species; this showed some individual models, and
particularly the tuned one, performing better than ensembles. This study
shows no particular benefit to using ensembles over individual tuned
models. It also suggests that further robust testing of performance is
required for situations where models are used to predict to distant places
or environments. |
doi_str_mv | 10.5061/dryad.tqjq2bvv2 |
format | Dataset |
fullrecord | <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_5061_dryad_tqjq2bvv2</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_5061_dryad_tqjq2bvv2</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_5061_dryad_tqjq2bvv23</originalsourceid><addsrcrecordid>eNqVjkESwUAQRWdjobC27QsgobiAohzAfqqT6dAqM5NMt-D2BGVv9Rev_qtnzDTP5utsky9ceqCba3tpl0XXLYfmdiRRDie4nUnPlICCkC9qAh8d1XWPWABdh0HxRPEqUMUEHu_sWXrcJHJcKncEDaUX9BhKgliBNFQyCTgWTVxclWP4eGVsBhXWQpPvjsxivztuDzOHiiUr2Saxx_SweWb7dPtOt7_01f-PJ2TTW7U</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models</title><source>DataCite</source><creator>Hao, Tianxiao ; Elith, Jane ; Lahoz‐Monfort, José J. ; Guillera‐Arroita, Gurutzeta</creator><creatorcontrib>Hao, Tianxiao ; Elith, Jane ; Lahoz‐Monfort, José J. ; Guillera‐Arroita, Gurutzeta</creatorcontrib><description>Predictive performance is important to many applications of species
distribution models (SDMs). The SDM ‘ensemble’ approach, which combines
predictions across different modelling methods, is believed to improve
predictive performance, and is used in many recent SDM studies. Here, we
aim to compare the predictive performance of ensemble species distribution
models to that of individual models, using a large presence-absence
dataset of eucalypt tree species. To test model performance, we divided
our dataset into calibration and evaluation folds using two spatial
blocking strategies (checkerboard-pattern and latitudinal slicing). We
calibrated and cross-validated all models within the calibration folds,
using both repeated random division of data (a common approach) and
spatial blocking. Ensembles were built using the software package
‘biomod2’, with standard (“untuned”) settings. Boosted regression tree
(BRT) models were also fitted to the same data, tuned according to
published procedures. We then used evaluation folds to compare ensembles
against both their component untuned individual models, and against the
BRTs. We used area under the receiver-operating characteristic curve (AUC)
and log-likelihood for assessing model performance. In all our tests,
ensemble models performed well, but not consistently better than their
component untuned individual models or tuned BRTs across all tests.
Moreover, choosing untuned individual models with best cross-validation
performance also yielded good external performance, with blocked
cross-validation proving better suited for this choice, in this study,
than repeated random cross-validation. The latitudinal slice test was only
possible for four species; this showed some individual models, and
particularly the tuned one, performing better than ensembles. This study
shows no particular benefit to using ensembles over individual tuned
models. It also suggests that further robust testing of performance is
required for situations where models are used to predict to distant places
or environments.</description><identifier>DOI: 10.5061/dryad.tqjq2bvv2</identifier><language>eng</language><publisher>Dryad</publisher><subject>BIOMOD ; consensus forecast ; model performance ; model tuning ; spatial blocking</subject><creationdate>2020</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-8387-5739 ; 0000-0002-8706-0326 ; 0000-0002-0845-7035 ; 0000-0003-4363-1956</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,1894</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.5061/dryad.tqjq2bvv2$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Hao, Tianxiao</creatorcontrib><creatorcontrib>Elith, Jane</creatorcontrib><creatorcontrib>Lahoz‐Monfort, José J.</creatorcontrib><creatorcontrib>Guillera‐Arroita, Gurutzeta</creatorcontrib><title>Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models</title><description>Predictive performance is important to many applications of species
distribution models (SDMs). The SDM ‘ensemble’ approach, which combines
predictions across different modelling methods, is believed to improve
predictive performance, and is used in many recent SDM studies. Here, we
aim to compare the predictive performance of ensemble species distribution
models to that of individual models, using a large presence-absence
dataset of eucalypt tree species. To test model performance, we divided
our dataset into calibration and evaluation folds using two spatial
blocking strategies (checkerboard-pattern and latitudinal slicing). We
calibrated and cross-validated all models within the calibration folds,
using both repeated random division of data (a common approach) and
spatial blocking. Ensembles were built using the software package
‘biomod2’, with standard (“untuned”) settings. Boosted regression tree
(BRT) models were also fitted to the same data, tuned according to
published procedures. We then used evaluation folds to compare ensembles
against both their component untuned individual models, and against the
BRTs. We used area under the receiver-operating characteristic curve (AUC)
and log-likelihood for assessing model performance. In all our tests,
ensemble models performed well, but not consistently better than their
component untuned individual models or tuned BRTs across all tests.
Moreover, choosing untuned individual models with best cross-validation
performance also yielded good external performance, with blocked
cross-validation proving better suited for this choice, in this study,
than repeated random cross-validation. The latitudinal slice test was only
possible for four species; this showed some individual models, and
particularly the tuned one, performing better than ensembles. This study
shows no particular benefit to using ensembles over individual tuned
models. It also suggests that further robust testing of performance is
required for situations where models are used to predict to distant places
or environments.</description><subject>BIOMOD</subject><subject>consensus forecast</subject><subject>model performance</subject><subject>model tuning</subject><subject>spatial blocking</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2020</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNqVjkESwUAQRWdjobC27QsgobiAohzAfqqT6dAqM5NMt-D2BGVv9Rev_qtnzDTP5utsky9ceqCba3tpl0XXLYfmdiRRDie4nUnPlICCkC9qAh8d1XWPWABdh0HxRPEqUMUEHu_sWXrcJHJcKncEDaUX9BhKgliBNFQyCTgWTVxclWP4eGVsBhXWQpPvjsxivztuDzOHiiUr2Saxx_SweWb7dPtOt7_01f-PJ2TTW7U</recordid><startdate>20200312</startdate><enddate>20200312</enddate><creator>Hao, Tianxiao</creator><creator>Elith, Jane</creator><creator>Lahoz‐Monfort, José J.</creator><creator>Guillera‐Arroita, Gurutzeta</creator><general>Dryad</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0000-0002-8387-5739</orcidid><orcidid>https://orcid.org/0000-0002-8706-0326</orcidid><orcidid>https://orcid.org/0000-0002-0845-7035</orcidid><orcidid>https://orcid.org/0000-0003-4363-1956</orcidid></search><sort><creationdate>20200312</creationdate><title>Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models</title><author>Hao, Tianxiao ; Elith, Jane ; Lahoz‐Monfort, José J. 
; Guillera‐Arroita, Gurutzeta</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_5061_dryad_tqjq2bvv23</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2020</creationdate><topic>BIOMOD</topic><topic>consensus forecast</topic><topic>model performance</topic><topic>model tuning</topic><topic>spatial blocking</topic><toplevel>online_resources</toplevel><creatorcontrib>Hao, Tianxiao</creatorcontrib><creatorcontrib>Elith, Jane</creatorcontrib><creatorcontrib>Lahoz‐Monfort, José J.</creatorcontrib><creatorcontrib>Guillera‐Arroita, Gurutzeta</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Hao, Tianxiao</au><au>Elith, Jane</au><au>Lahoz‐Monfort, José J.</au><au>Guillera‐Arroita, Gurutzeta</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models</title><date>2020-03-12</date><risdate>2020</risdate><abstract>Predictive performance is important to many applications of species
distribution models (SDMs). The SDM ‘ensemble’ approach, which combines
predictions across different modelling methods, is believed to improve
predictive performance, and is used in many recent SDM studies. Here, we
aim to compare the predictive performance of ensemble species distribution
models to that of individual models, using a large presence-absence
dataset of eucalypt tree species. To test model performance, we divided
our dataset into calibration and evaluation folds using two spatial
blocking strategies (checkerboard-pattern and latitudinal slicing). We
calibrated and cross-validated all models within the calibration folds,
using both repeated random division of data (a common approach) and
spatial blocking. Ensembles were built using the software package
‘biomod2’, with standard (“untuned”) settings. Boosted regression tree
(BRT) models were also fitted to the same data, tuned according to
published procedures. We then used evaluation folds to compare ensembles
against both their component untuned individual models, and against the
BRTs. We used area under the receiver-operating characteristic curve (AUC)
and log-likelihood for assessing model performance. In all our tests,
ensemble models performed well, but not consistently better than their
component untuned individual models or tuned BRTs across all tests.
Moreover, choosing untuned individual models with best cross-validation
performance also yielded good external performance, with blocked
cross-validation proving better suited for this choice, in this study,
than repeated random cross-validation. The latitudinal slice test was only
possible for four species; this showed some individual models, and
particularly the tuned one, performing better than ensembles. This study
shows no particular benefit to using ensembles over individual tuned
models. It also suggests that further robust testing of performance is
required for situations where models are used to predict to distant places
or environments.</abstract><pub>Dryad</pub><doi>10.5061/dryad.tqjq2bvv2</doi><orcidid>https://orcid.org/0000-0002-8387-5739</orcidid><orcidid>https://orcid.org/0000-0002-8706-0326</orcidid><orcidid>https://orcid.org/0000-0002-0845-7035</orcidid><orcidid>https://orcid.org/0000-0003-4363-1956</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.5061/dryad.tqjq2bvv2 |
ispartof | |
issn | |
language | eng |
recordid | cdi_datacite_primary_10_5061_dryad_tqjq2bvv2 |
source | DataCite |
subjects | BIOMOD ; consensus forecast ; model performance ; model tuning ; spatial blocking |
title | Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T02%3A47%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Hao,%20Tianxiao&rft.date=2020-03-12&rft_id=info:doi/10.5061/dryad.tqjq2bvv2&rft_dat=%3Cdatacite_PQ8%3E10_5061_dryad_tqjq2bvv2%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
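
The description in this record names three evaluation ideas: checkerboard-pattern spatial blocking, AUC, and log-likelihood. The Python sketch below is only a rough illustration of those ideas and is not the authors' code, which the description says was built with the biomod2 package. The function names, the `cell_size` parameter, and the toy records are hypothetical and chosen for illustration. The log-likelihood shown is the plain Bernoulli form for presence-absence data, which is an assumption about the metric rather than something the record specifies.

```python
import math


def checkerboard_fold(x, y, cell_size):
    """Assign a point to fold 0 or 1 by the parity of its grid cell.

    Neighbouring cells alternate folds, like the squares of a checkerboard.
    """
    col = math.floor(x / cell_size)
    row = math.floor(y / cell_size)
    return (col + row) % 2


def auc(labels, scores):
    """Rank-based AUC: the share of presence/absence pairs in which the
    presence has the higher score (ties count as half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bernoulli_log_likelihood(labels, probs, eps=1e-12):
    """Sum of log-probabilities assigned to the observed outcomes.

    Probabilities are clipped to [eps, 1 - eps] so log(0) cannot occur.
    """
    total = 0.0
    for l, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if l == 1 else math.log(1.0 - p)
    return total


if __name__ == "__main__":
    # Hypothetical toy records: (x, y, observed presence, predicted probability)
    records = [
        (0.2, 0.3, 1, 0.8),
        (1.4, 0.1, 0, 0.3),
        (0.6, 1.7, 1, 0.7),
        (1.2, 1.9, 0, 0.2),
        (2.5, 1.1, 1, 0.4),
        (3.3, 0.4, 0, 0.6),
    ]
    # Treat fold 1 as the held-out evaluation fold.
    held_out = [r for r in records if checkerboard_fold(r[0], r[1], 1.0) == 1]
    labels = [r[2] for r in held_out]
    probs = [r[3] for r in held_out]
    print("AUC:", auc(labels, probs))
    print("log-likelihood:", bernoulli_log_likelihood(labels, probs))
```

Holding out whole grid cells rather than random points keeps evaluation data spatially separated from calibration data, which is the motivation for blocking described in the record.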