Data quantity is more important than its spatial bias for predictive species distribution modelling

Biological records are often the data of choice for training predictive species distribution models (SDMs), but spatial sampling bias is pervasive in biological records data at multiple spatial scales and is thought to impair the performance of SDMs. We simulated presences and absences of virtual sp...

Bibliographic details
Published in: PeerJ (San Francisco, CA), 2020-11, Vol.8, p.e10411-e10411, Article e10411
Main authors: Gaul, Willson; Sadykova, Dinara; White, Hannah J; Leon-Sanchez, Lupe; Caplat, Paul; Emmerson, Mark C; Yearsley, Jon M
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e10411
container_issue
container_start_page e10411
container_title PeerJ (San Francisco, CA)
container_volume 8
creator Gaul, Willson
Sadykova, Dinara
White, Hannah J
Leon-Sanchez, Lupe
Caplat, Paul
Emmerson, Mark C
Yearsley, Jon M
description Biological records are often the data of choice for training predictive species distribution models (SDMs), but spatial sampling bias is pervasive in biological records data at multiple spatial scales and is thought to impair the performance of SDMs. We simulated presences and absences of virtual species as well as the process of recording these species to evaluate the effect on species distribution model prediction performance of (1) spatial bias in training data, (2) sample size (the average number of observations per species), and (3) the choice of species distribution modelling method. Our approach is novel in quantifying and applying real-world spatial sampling biases to simulated data. Spatial bias in training data decreased species distribution model prediction performance, but sample size and the choice of modelling method were more important than spatial bias in determining the prediction performance of species distribution models.
doi_str_mv 10.7717/peerj.10411
format Article
fullrecord <record><control><sourceid>gale_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7703440</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A642951768</galeid><sourcerecordid>A642951768</sourcerecordid><originalsourceid>FETCH-LOGICAL-c507t-75127bd8f7b5e62c3a21b65ac0e4c17afdd7fd31f60e50bd1707607576310ed23</originalsourceid><addsrcrecordid>eNptkk1r3DAQhk1paUKaU-9FUCiFslt92JJ9KYT0EwK9tGchS-PdWWzJkeRA_n2UTZrulkoHiZlnXmmGt6peM7pWiqmPM0DcrRmtGXtWnXIm1aoVTff84H5Snae0o2W1XNJWvKxOhBCMK9mdVvazyYZcL8ZnzLcEE5lCBILTHGIuQZK3xhPMiaTZZDQj6dEkMoRI5ggObcYbKDmwCIk4TDliv2QMvgg5GEf0m1fVi8GMCc4fz7Pq99cvvy6_r65-fvtxeXG1sg1VeaWa8qfetYPqG5DcCsNZLxtjKdSWKTM4pwYn2CApNLR3TFElqWqUFIyC4-Ks-vSgOy_9BM6Cz9GMeo44mXirg0F9nPG41Ztwo5Wioq5pEXj_KBDD9QIp6wmTLU0YD2FJmteKUt52XBb07T_oLizRl_YKJeuOt1Syv9TGjKDRD6G8a-9F9YWsedcwJdtCrf9Dle1gQhs8DFjiRwXvDgq2YMa8TWHcjz0dgx8eQBtDShGGp2Ewqu_9o_f-0Xv_FPrN4fye2D9uEXeFGMDD</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2464928061</pqid></control><display><type>article</type><title>Data quantity is more important than its spatial bias for predictive species distribution modelling</title><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><creator>Gaul, Willson ; Sadykova, Dinara ; White, Hannah J ; Leon-Sanchez, Lupe ; Caplat, Paul ; Emmerson, Mark C ; Yearsley, Jon M</creator><creatorcontrib>Gaul, Willson ; Sadykova, Dinara ; White, Hannah J ; Leon-Sanchez, Lupe ; Caplat, Paul ; Emmerson, Mark C ; Yearsley, Jon M</creatorcontrib><description>Biological records are often the data of choice for training predictive species distribution models (SDMs), but spatial sampling bias is pervasive in biological records data at multiple spatial scales and is thought to impair the performance of SDMs. 
We simulated presences and absences of virtual species as well as the process of recording these species to evaluate the effect on species distribution model prediction performance of (1) spatial bias in training data, (2) sample size (the average number of observations per species), and (3) the choice of species distribution modelling method. Our approach is novel in quantifying and applying real-world spatial sampling biases to simulated data. Spatial bias in training data decreased species distribution model prediction performance, but sample size and the choice of modelling method were more important than spatial bias in determining the prediction performance of species distribution models.</description><identifier>ISSN: 2167-8359</identifier><identifier>EISSN: 2167-8359</identifier><identifier>DOI: 10.7717/peerj.10411</identifier><identifier>PMID: 33312769</identifier><language>eng</language><publisher>United States: PeerJ. Ltd</publisher><subject>Analysis ; Biogeography ; Datasets ; Ecologists ; Ecology ; Econometrics ; Geographical distribution ; Geospatial data ; Investigations ; Predictions ; Sample size ; Sampling ; Simulation ; Species ; Taxonomy ; Variables ; Zoology</subject><ispartof>PeerJ (San Francisco, CA), 2020-11, Vol.8, p.e10411-e10411, Article e10411</ispartof><rights>2020 Gaul et al.</rights><rights>COPYRIGHT 2020 PeerJ. Ltd.</rights><rights>2020 Gaul et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2020 Gaul et al. 
2020 Gaul et al.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c507t-75127bd8f7b5e62c3a21b65ac0e4c17afdd7fd31f60e50bd1707607576310ed23</citedby><cites>FETCH-LOGICAL-c507t-75127bd8f7b5e62c3a21b65ac0e4c17afdd7fd31f60e50bd1707607576310ed23</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703440/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703440/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33312769$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Gaul, Willson</creatorcontrib><creatorcontrib>Sadykova, Dinara</creatorcontrib><creatorcontrib>White, Hannah J</creatorcontrib><creatorcontrib>Leon-Sanchez, Lupe</creatorcontrib><creatorcontrib>Caplat, Paul</creatorcontrib><creatorcontrib>Emmerson, Mark C</creatorcontrib><creatorcontrib>Yearsley, Jon M</creatorcontrib><title>Data quantity is more important than its spatial bias for predictive species distribution modelling</title><title>PeerJ (San Francisco, CA)</title><addtitle>PeerJ</addtitle><description>Biological records are often the data of choice for training predictive species distribution models (SDMs), but spatial sampling bias is pervasive in biological records data at multiple spatial scales and is thought to impair the performance of SDMs. 
We simulated presences and absences of virtual species as well as the process of recording these species to evaluate the effect on species distribution model prediction performance of (1) spatial bias in training data, (2) sample size (the average number of observations per species), and (3) the choice of species distribution modelling method. Our approach is novel in quantifying and applying real-world spatial sampling biases to simulated data. Spatial bias in training data decreased species distribution model prediction performance, but sample size and the choice of modelling method were more important than spatial bias in determining the prediction performance of species distribution models.</description><subject>Analysis</subject><subject>Biogeography</subject><subject>Datasets</subject><subject>Ecologists</subject><subject>Ecology</subject><subject>Econometrics</subject><subject>Geographical distribution</subject><subject>Geospatial data</subject><subject>Investigations</subject><subject>Predictions</subject><subject>Sample 
size</subject><subject>Sampling</subject><subject>Simulation</subject><subject>Species</subject><subject>Taxonomy</subject><subject>Variables</subject><subject>Zoology</subject><issn>2167-8359</issn><issn>2167-8359</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNptkk1r3DAQhk1paUKaU-9FUCiFslt92JJ9KYT0EwK9tGchS-PdWWzJkeRA_n2UTZrulkoHiZlnXmmGt6peM7pWiqmPM0DcrRmtGXtWnXIm1aoVTff84H5Snae0o2W1XNJWvKxOhBCMK9mdVvazyYZcL8ZnzLcEE5lCBILTHGIuQZK3xhPMiaTZZDQj6dEkMoRI5ggObcYbKDmwCIk4TDliv2QMvgg5GEf0m1fVi8GMCc4fz7Pq99cvvy6_r65-fvtxeXG1sg1VeaWa8qfetYPqG5DcCsNZLxtjKdSWKTM4pwYn2CApNLR3TFElqWqUFIyC4-Ks-vSgOy_9BM6Cz9GMeo44mXirg0F9nPG41Ztwo5Wioq5pEXj_KBDD9QIp6wmTLU0YD2FJmteKUt52XBb07T_oLizRl_YKJeuOt1Syv9TGjKDRD6G8a-9F9YWsedcwJdtCrf9Dle1gQhs8DFjiRwXvDgq2YMa8TWHcjz0dgx8eQBtDShGGp2Ewqu_9o_f-0Xv_FPrN4fye2D9uEXeFGMDD</recordid><startdate>20201127</startdate><enddate>20201127</enddate><creator>Gaul, Willson</creator><creator>Sadykova, Dinara</creator><creator>White, Hannah J</creator><creator>Leon-Sanchez, Lupe</creator><creator>Caplat, Paul</creator><creator>Emmerson, Mark C</creator><creator>Yearsley, Jon M</creator><general>PeerJ. 
Ltd</general><general>PeerJ, Inc</general><general>PeerJ Inc</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7XB</scope><scope>88I</scope><scope>8FE</scope><scope>8FH</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>LK8</scope><scope>M2P</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20201127</creationdate><title>Data quantity is more important than its spatial bias for predictive species distribution modelling</title><author>Gaul, Willson ; Sadykova, Dinara ; White, Hannah J ; Leon-Sanchez, Lupe ; Caplat, Paul ; Emmerson, Mark C ; Yearsley, Jon M</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c507t-75127bd8f7b5e62c3a21b65ac0e4c17afdd7fd31f60e50bd1707607576310ed23</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Analysis</topic><topic>Biogeography</topic><topic>Datasets</topic><topic>Ecologists</topic><topic>Ecology</topic><topic>Econometrics</topic><topic>Geographical distribution</topic><topic>Geospatial data</topic><topic>Investigations</topic><topic>Predictions</topic><topic>Sample size</topic><topic>Sampling</topic><topic>Simulation</topic><topic>Species</topic><topic>Taxonomy</topic><topic>Variables</topic><topic>Zoology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gaul, Willson</creatorcontrib><creatorcontrib>Sadykova, Dinara</creatorcontrib><creatorcontrib>White, Hannah J</creatorcontrib><creatorcontrib>Leon-Sanchez, Lupe</creatorcontrib><creatorcontrib>Caplat, Paul</creatorcontrib><creatorcontrib>Emmerson, 
Mark C</creatorcontrib><creatorcontrib>Yearsley, Jon M</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Science Database</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>PeerJ (San Francisco, CA)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gaul, Willson</au><au>Sadykova, Dinara</au><au>White, Hannah J</au><au>Leon-Sanchez, Lupe</au><au>Caplat, Paul</au><au>Emmerson, Mark C</au><au>Yearsley, Jon 
M</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Data quantity is more important than its spatial bias for predictive species distribution modelling</atitle><jtitle>PeerJ (San Francisco, CA)</jtitle><addtitle>PeerJ</addtitle><date>2020-11-27</date><risdate>2020</risdate><volume>8</volume><spage>e10411</spage><epage>e10411</epage><pages>e10411-e10411</pages><artnum>e10411</artnum><issn>2167-8359</issn><eissn>2167-8359</eissn><abstract>Biological records are often the data of choice for training predictive species distribution models (SDMs), but spatial sampling bias is pervasive in biological records data at multiple spatial scales and is thought to impair the performance of SDMs. We simulated presences and absences of virtual species as well as the process of recording these species to evaluate the effect on species distribution model prediction performance of (1) spatial bias in training data, (2) sample size (the average number of observations per species), and (3) the choice of species distribution modelling method. Our approach is novel in quantifying and applying real-world spatial sampling biases to simulated data. Spatial bias in training data decreased species distribution model prediction performance, but sample size and the choice of modelling method were more important than spatial bias in determining the prediction performance of species distribution models.</abstract><cop>United States</cop><pub>PeerJ. Ltd</pub><pmid>33312769</pmid><doi>10.7717/peerj.10411</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2167-8359
ispartof PeerJ (San Francisco, CA), 2020-11, Vol.8, p.e10411-e10411, Article e10411
issn 2167-8359
2167-8359
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7703440
source DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central
subjects Analysis
Biogeography
Datasets
Ecologists
Ecology
Econometrics
Geographical distribution
Geospatial data
Investigations
Predictions
Sample size
Sampling
Simulation
Species
Taxonomy
Variables
Zoology
title Data quantity is more important than its spatial bias for predictive species distribution modelling
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T23%3A39%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Data%20quantity%20is%20more%20important%20than%20its%20spatial%20bias%20for%20predictive%20species%20distribution%20modelling&rft.jtitle=PeerJ%20(San%20Francisco,%20CA)&rft.au=Gaul,%20Willson&rft.date=2020-11-27&rft.volume=8&rft.spage=e10411&rft.epage=e10411&rft.pages=e10411-e10411&rft.artnum=e10411&rft.issn=2167-8359&rft.eissn=2167-8359&rft_id=info:doi/10.7717/peerj.10411&rft_dat=%3Cgale_pubme%3EA642951768%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2464928061&rft_id=info:pmid/33312769&rft_galeid=A642951768&rfr_iscdi=true