Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models

Predictive performance is important to many applications of species distribution models (SDMs). The SDM ‘ensemble’ approach, which combines predictions across different modelling methods, is believed to improve predictive performance, and is used in many recent SDM studies. Here, we aim to compare t...

Detailed description

Bibliographic details
Published in:	Ecography (Copenhagen), 2020-04, Vol.43 (4), p.549-558
Main authors:	Hao, Tianxiao; Elith, Jane; Lahoz‐Monfort, José J.; Guillera‐Arroita, Gurutzeta
Format:	Article
Language:	eng
Subjects:	BIOMOD; block cross-validation; Calibration; consensus forecast; Datasets; Geographical distribution; model performance; Model testing; model tuning; Modelling; Performance prediction; Regression analysis; Slicing; spatial autocorrelation; spatial blocking; Species
Online access:	Full text

container_end_page	558
container_issue	4
container_start_page	549
container_title	Ecography (Copenhagen)
container_volume	43
creator	Hao, Tianxiao; Elith, Jane; Lahoz‐Monfort, José J.; Guillera‐Arroita, Gurutzeta
description	Predictive performance is important to many applications of species distribution models (SDMs). The SDM ‘ensemble’ approach, which combines predictions across different modelling methods, is believed to improve predictive performance, and is used in many recent SDM studies. Here, we aim to compare the predictive performance of ensemble species distribution models to that of individual models, using a large presence–absence dataset of eucalypt tree species. To test model performance, we divided our dataset into calibration and evaluation folds using two spatial blocking strategies (checkerboard‐pattern and latitudinal slicing). We calibrated and cross‐validated all models within the calibration folds, using both repeated random division of data (a common approach) and spatial blocking. Ensembles were built using the software package ‘biomod2’, with standard (‘untuned’) settings. Boosted regression tree (BRT) models were also fitted to the same data, tuned according to published procedures. We then used evaluation folds to compare ensembles against both their component untuned individual models, and against the BRTs. We used area under the receiver‐operating characteristic curve (AUC) and log‐likelihood for assessing model performance. In all our tests, ensemble models performed well, but not consistently better than their component untuned individual models or tuned BRTs across all tests. Moreover, choosing untuned individual models with best cross‐validation performance also yielded good external performance, with blocked cross‐validation proving better suited for this choice, in this study, than repeated random cross‐validation. The latitudinal slice test was only possible for four species; this showed some individual models, and particularly the tuned one, performing better than ensembles. This study shows no particular benefit to using ensembles over individual tuned models. It also suggests that further robust testing of performance is required for situations where models are used to predict to distant places or environments.
doi_str_mv	10.1111/ecog.04890
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2384847339</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2384847339</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3373-2bb1f4982473f207fef400957b122b75f5f305dcbb729a86ed7c6d37094ca7763</originalsourceid><addsrcrecordid>eNp9kD1PwzAQhi0EEqWw8AsssSEFznYSxyOqyodUqUuZI8c5F6MkDnba0n9PQpm55Yb30Xu6h5BbBg9snEc0fvsAaaHgjMxYDpBAVshzMgMFeSIzBZfkKsZPAMZVXszIYYNxcN2WHj5w-MBAsYvYVg3S1tfYNFPkItX1XneD3qLfRWp9oK3-dq2LU9wHrJ0Z3B5pj2EMW90ZpN7S2KNxGGnt4hBctRuc70698ZpcWN1EvPnbc_L-vNwsXpPV-uVt8bRKjBBSJLyqmE1VwVMpLAdp0aYAKpMV47ySmc2sgKw2VSW50kWOtTR5LSSo1GgpczEnd6fePviv3fhr-el3oRtPllwUaTH2CjVS9yfKBB9jQFv2wbU6HEsG5SS2nMSWv2JHmJ3gg2vw-A9ZLhfrF8azVIgfys59lQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2384847339</pqid></control><display><type>article</type><title>Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models</title><source>Wiley Online Library Open Access</source><source>DOAJ Directory of Open Access Journals</source><source>Wiley Online Library Journals Frontfile Complete</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Hao, Tianxiao ; Elith, Jane ; Lahoz‐Monfort, José J. ; Guillera‐Arroita, Gurutzeta</creator><creatorcontrib>Hao, Tianxiao ; Elith, Jane ; Lahoz‐Monfort, José J. ; Guillera‐Arroita, Gurutzeta</creatorcontrib><description>Predictive performance is important to many applications of species distribution models (SDMs). The SDM ‘ensemble’ approach, which combines predictions across different modelling methods, is believed to improve predictive performance, and is used in many recent SDM studies. 
Here, we aim to compare the predictive performance of ensemble species distribution models to that of individual models, using a large presence–absence dataset of eucalypt tree species. To test model performance, we divided our dataset into calibration and evaluation folds using two spatial blocking strategies (checkerboard‐pattern and latitudinal slicing). We calibrated and cross‐validated all models within the calibration folds, using both repeated random division of data (a common approach) and spatial blocking. Ensembles were built using the software package ‘biomod2’, with standard (‘untuned’) settings. Boosted regression tree (BRT) models were also fitted to the same data, tuned according to published procedures. We then used evaluation folds to compare ensembles against both their component untuned individual models, and against the BRTs. We used area under the receiver‐operating characteristic curve (AUC) and log‐likelihood for assessing model performance. In all our tests, ensemble models performed well, but not consistently better than their component untuned individual models or tuned BRTs across all tests. Moreover, choosing untuned individual models with best cross‐validation performance also yielded good external performance, with blocked cross‐validation proving better suited for this choice, in this study, than repeated random cross‐validation. The latitudinal slice test was only possible for four species; this showed some individual models, and particularly the tuned one, performing better than ensembles. This study shows no particular benefit to using ensembles over individual tuned models. 
It also suggests that further robust testing of performance is required for situations where models are used to predict to distant places or environments.</description><identifier>ISSN: 0906-7590</identifier><identifier>EISSN: 1600-0587</identifier><identifier>DOI: 10.1111/ecog.04890</identifier><language>eng</language><publisher>Oxford, UK: Blackwell Publishing Ltd</publisher><subject>BIOMOD ; block cross-validation ; Calibration ; consensus forecast ; Datasets ; Geographical distribution ; model performance ; Model testing ; model tuning ; Modelling ; Performance prediction ; Regression analysis ; Slicing ; spatial autocorrelation ; spatial blocking ; Species</subject><ispartof>Ecography (Copenhagen), 2020-04, Vol.43 (4), p.549-558</ispartof><rights>2020 The Authors. Ecography published by John Wiley & Sons Ltd on behalf of Nordic Society Oikos</rights><rights>Ecography © 2020 Nordic Society Oikos</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3373-2bb1f4982473f207fef400957b122b75f5f305dcbb729a86ed7c6d37094ca7763</citedby><cites>FETCH-LOGICAL-c3373-2bb1f4982473f207fef400957b122b75f5f305dcbb729a86ed7c6d37094ca7763</cites><orcidid>0000-0002-8387-5739 ; 0000-0002-8706-0326 ; 0000-0003-4363-1956 ; 0000-0002-0845-7035</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2Fecog.04890$$EPDF$$P50$$Gwiley$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2Fecog.04890$$EHTML$$P50$$Gwiley$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,1411,11541,27901,27902,45550,45551,46027,46451</link.rule.ids></links><search><creatorcontrib>Hao, Tianxiao</creatorcontrib><creatorcontrib>Elith, Jane</creatorcontrib><creatorcontrib>Lahoz‐Monfort, José 
J.</creatorcontrib><creatorcontrib>Guillera‐Arroita, Gurutzeta</creatorcontrib><title>Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models</title><title>Ecography (Copenhagen)</title><description>Predictive performance is important to many applications of species distribution models (SDMs). The SDM ‘ensemble’ approach, which combines predictions across different modelling methods, is believed to improve predictive performance, and is used in many recent SDM studies. Here, we aim to compare the predictive performance of ensemble species distribution models to that of individual models, using a large presence–absence dataset of eucalypt tree species. To test model performance, we divided our dataset into calibration and evaluation folds using two spatial blocking strategies (checkerboard‐pattern and latitudinal slicing). We calibrated and cross‐validated all models within the calibration folds, using both repeated random division of data (a common approach) and spatial blocking. Ensembles were built using the software package ‘biomod2’, with standard (‘untuned’) settings. Boosted regression tree (BRT) models were also fitted to the same data, tuned according to published procedures. We then used evaluation folds to compare ensembles against both their component untuned individual models, and against the BRTs. We used area under the receiver‐operating characteristic curve (AUC) and log‐likelihood for assessing model performance. In all our tests, ensemble models performed well, but not consistently better than their component untuned individual models or tuned BRTs across all tests. Moreover, choosing untuned individual models with best cross‐validation performance also yielded good external performance, with blocked cross‐validation proving better suited for this choice, in this study, than repeated random cross‐validation. 
The latitudinal slice test was only possible for four species; this showed some individual models, and particularly the tuned one, performing better than ensembles. This study shows no particular benefit to using ensembles over individual tuned models. It also suggests that further robust testing of performance is required for situations where models are used to predict to distant places or environments.</description><subject>BIOMOD</subject><subject>block cross-validation</subject><subject>Calibration</subject><subject>consensus forecast</subject><subject>Datasets</subject><subject>Geographical distribution</subject><subject>model performance</subject><subject>Model testing</subject><subject>model tuning</subject><subject>Modelling</subject><subject>Performance prediction</subject><subject>Regression analysis</subject><subject>Slicing</subject><subject>spatial autocorrelation</subject><subject>spatial blocking</subject><subject>Species</subject><issn>0906-7590</issn><issn>1600-0587</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>24P</sourceid><recordid>eNp9kD1PwzAQhi0EEqWw8AsssSEFznYSxyOqyodUqUuZI8c5F6MkDnba0n9PQpm55Yb30Xu6h5BbBg9snEc0fvsAaaHgjMxYDpBAVshzMgMFeSIzBZfkKsZPAMZVXszIYYNxcN2WHj5w-MBAsYvYVg3S1tfYNFPkItX1XneD3qLfRWp9oK3-dq2LU9wHrJ0Z3B5pj2EMW90ZpN7S2KNxGGnt4hBctRuc70698ZpcWN1EvPnbc_L-vNwsXpPV-uVt8bRKjBBSJLyqmE1VwVMpLAdp0aYAKpMV47ySmc2sgKw2VSW50kWOtTR5LSSo1GgpczEnd6fePviv3fhr-el3oRtPllwUaTH2CjVS9yfKBB9jQFv2wbU6HEsG5SS2nMSWv2JHmJ3gg2vw-A9ZLhfrF8azVIgfys59lQ</recordid><startdate>202004</startdate><enddate>202004</enddate><creator>Hao, Tianxiao</creator><creator>Elith, Jane</creator><creator>Lahoz‐Monfort, José J.</creator><creator>Guillera‐Arroita, Gurutzeta</creator><general>Blackwell Publishing Ltd</general><general>John Wiley & Sons, 
Inc</general><scope>24P</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SN</scope><scope>7SS</scope><scope>C1K</scope><orcidid>https://orcid.org/0000-0002-8387-5739</orcidid><orcidid>https://orcid.org/0000-0002-8706-0326</orcidid><orcidid>https://orcid.org/0000-0003-4363-1956</orcidid><orcidid>https://orcid.org/0000-0002-0845-7035</orcidid></search><sort><creationdate>202004</creationdate><title>Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models</title><author>Hao, Tianxiao ; Elith, Jane ; Lahoz‐Monfort, José J. ; Guillera‐Arroita, Gurutzeta</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3373-2bb1f4982473f207fef400957b122b75f5f305dcbb729a86ed7c6d37094ca7763</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>BIOMOD</topic><topic>block cross-validation</topic><topic>Calibration</topic><topic>consensus forecast</topic><topic>Datasets</topic><topic>Geographical distribution</topic><topic>model performance</topic><topic>Model testing</topic><topic>model tuning</topic><topic>Modelling</topic><topic>Performance prediction</topic><topic>Regression analysis</topic><topic>Slicing</topic><topic>spatial autocorrelation</topic><topic>spatial blocking</topic><topic>Species</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hao, Tianxiao</creatorcontrib><creatorcontrib>Elith, Jane</creatorcontrib><creatorcontrib>Lahoz‐Monfort, José J.</creatorcontrib><creatorcontrib>Guillera‐Arroita, Gurutzeta</creatorcontrib><collection>Wiley Online Library Open Access</collection><collection>CrossRef</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Environmental Sciences and Pollution Management</collection><jtitle>Ecography (Copenhagen)</jtitle></facets><delivery><delcategory>Remote 
Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hao, Tianxiao</au><au>Elith, Jane</au><au>Lahoz‐Monfort, José J.</au><au>Guillera‐Arroita, Gurutzeta</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models</atitle><jtitle>Ecography (Copenhagen)</jtitle><date>2020-04</date><risdate>2020</risdate><volume>43</volume><issue>4</issue><spage>549</spage><epage>558</epage><pages>549-558</pages><issn>0906-7590</issn><eissn>1600-0587</eissn><abstract>Predictive performance is important to many applications of species distribution models (SDMs). The SDM ‘ensemble’ approach, which combines predictions across different modelling methods, is believed to improve predictive performance, and is used in many recent SDM studies. Here, we aim to compare the predictive performance of ensemble species distribution models to that of individual models, using a large presence–absence dataset of eucalypt tree species. To test model performance, we divided our dataset into calibration and evaluation folds using two spatial blocking strategies (checkerboard‐pattern and latitudinal slicing). We calibrated and cross‐validated all models within the calibration folds, using both repeated random division of data (a common approach) and spatial blocking. Ensembles were built using the software package ‘biomod2’, with standard (‘untuned’) settings. Boosted regression tree (BRT) models were also fitted to the same data, tuned according to published procedures. We then used evaluation folds to compare ensembles against both their component untuned individual models, and against the BRTs. We used area under the receiver‐operating characteristic curve (AUC) and log‐likelihood for assessing model performance. 
In all our tests, ensemble models performed well, but not consistently better than their component untuned individual models or tuned BRTs across all tests. Moreover, choosing untuned individual models with best cross‐validation performance also yielded good external performance, with blocked cross‐validation proving better suited for this choice, in this study, than repeated random cross‐validation. The latitudinal slice test was only possible for four species; this showed some individual models, and particularly the tuned one, performing better than ensembles. This study shows no particular benefit to using ensembles over individual tuned models. It also suggests that further robust testing of performance is required for situations where models are used to predict to distant places or environments.</abstract><cop>Oxford, UK</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1111/ecog.04890</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-8387-5739</orcidid><orcidid>https://orcid.org/0000-0002-8706-0326</orcidid><orcidid>https://orcid.org/0000-0003-4363-1956</orcidid><orcidid>https://orcid.org/0000-0002-0845-7035</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0906-7590
ispartof	Ecography (Copenhagen), 2020-04, Vol.43 (4), p.549-558
issn	0906-7590; 1600-0587
language	eng
recordid	cdi_proquest_journals_2384847339
source	Wiley Online Library Open Access; DOAJ Directory of Open Access Journals; Wiley Online Library Journals Frontfile Complete; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	BIOMOD; block cross-validation; Calibration; consensus forecast; Datasets; Geographical distribution; model performance; Model testing; model tuning; Modelling; Performance prediction; Regression analysis; Slicing; spatial autocorrelation; spatial blocking; Species
title	Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-16T07%3A32%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Testing%20whether%20ensemble%20modelling%20is%20advantageous%20for%20maximising%20predictive%20performance%20of%20species%20distribution%20models&rft.jtitle=Ecography%20(Copenhagen)&rft.au=Hao,%20Tianxiao&rft.date=2020-04&rft.volume=43&rft.issue=4&rft.spage=549&rft.epage=558&rft.pages=549-558&rft.issn=0906-7590&rft.eissn=1600-0587&rft_id=info:doi/10.1111/ecog.04890&rft_dat=%3Cproquest_cross%3E2384847339%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2384847339&rft_id=info:pmid/&rfr_iscdi=true
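
As an illustration of the evaluation design summarised in the description field above, the following Python sketch shows one way to assign sites to checkerboard spatial blocks and to score predictions with AUC and log-likelihood. It is not the authors' code: the study built its ensembles with biomod2, whereas here the ensemble is assumed to be a plain unweighted mean of the individual models' predictions, and the function names, the block size and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence


def checkerboard_fold(x: float, y: float, block_size: float) -> int:
    """Return fold 0 or 1 from the parity of the grid cell containing (x, y).

    Assumption: square blocks of side `block_size` in the same units as x and y.
    """
    col = math.floor(x / block_size)
    row = math.floor(y / block_size)
    return (col + row) % 2


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a presence outranks an absence (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_log_likelihood(labels: Sequence[int], probs: Sequence[float],
                        eps: float = 1e-12) -> float:
    """Mean Bernoulli log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for lab, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if lab == 1 else math.log(1.0 - p)
    return total / len(labels)


def mean_ensemble(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean across models for each site (an assumed ensemble rule)."""
    return [sum(site) / len(site) for site in zip(*predictions)]


if __name__ == "__main__":
    # Made-up evaluation-fold data, for illustration only.
    labels = [1, 0, 1, 0]
    model_a = [0.9, 0.2, 0.6, 0.4]
    model_b = [0.7, 0.4, 0.8, 0.1]
    ensemble = mean_ensemble([model_a, model_b])
    for name, preds in [("a", model_a), ("b", model_b), ("ensemble", ensemble)]:
        print(name, auc(labels, preds), mean_log_likelihood(labels, preds))
```

Scoring each individual model and the ensemble on the same held-out fold, with both a ranking metric (AUC) and a calibration-sensitive one (log-likelihood), mirrors the kind of comparison the abstract describes; a latitudinal slice could be sketched analogously by splitting sites on a latitude threshold.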