Prevalence dependence in model goodness measures with special emphasis on true skill statistics
Published in: | Ecology and evolution 2017-02, Vol.7 (3), p.863-872 |
---|---|
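Main authors: | Somodi, Imelda; Lepesi, Nikolett; Botta‐Dukát, Zoltán |
Format: | Article |
Language: | eng |
Subjects: | Cohen's kappa; Data structures; Datasets; Dependence; Discrimination; Ecological monitoring; Ecology; Introduced species; Kappa coefficient; model performance; Original Research; predictive models; sample size; species distribution models; Statistical methods; Statistics |
Online access: | Full text |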
container_end_page | 872 |
---|---|
container_issue | 3 |
container_start_page | 863 |
container_title | Ecology and evolution |
container_volume | 7 |
creator | Somodi, Imelda; Lepesi, Nikolett; Botta‐Dukát, Zoltán |
description | It has long been a concern that performance measures of species distribution models react to attributes of the modeled entity arising from the input data structure rather than to model performance. The study of Allouche et al. (Journal of Applied Ecology, 43, 1223, 2006), which identified the true skill statistics (TSS) as independent of prevalence, therefore had a great impact. However, empirical experience has called this claim into question. We searched for technical reasons behind these observations. We explored possible sources of prevalence dependence in TSS, including sampling constraints and species characteristics, which influence the calculation of TSS. We also examined whether the widespread practice of using the maximum TSS for comparisons among species introduces a prevalence effect. We found that the design of Allouche et al. (2006) was flawed, but that TSS is indeed independent of prevalence if model predictions are binary and the strict set of assumptions usually applied in methodological studies holds. However, once realistic sources of prevalence dependence are taken into account, effects appear even in binary calculations. Furthermore, in the widespread approach of using maximum TSS for continuous predictions, taking the maximum alone induces prevalence dependence for small but realistic sample sizes. Prevalence differences therefore need to be taken into account when model comparisons are based on discrimination capacity. The sources we identified can serve as a checklist for controlling such comparisons, so that true discrimination capacity is compared rather than artefacts arising from data structure, species characteristics, or the calculation of the comparison measure (here TSS).
It has long been a concern that performance measures of species distribution models (SDMs) react to attributes of the modeled entity arising from the input data structure (including the proportion of presences among all records, known as prevalence) rather than to model performance. The true skill statistics (TSS) has been promoted as unaffected by changes in prevalence; however, experience has called this into question. We therefore examined possible causes of the prevalence dependence observed for TSS, while also extending the theory of prevalence dependence in general. |
doi_str_mv | 10.1002/ece3.2654 |
format | Article |
fullrecord | Publisher: John Wiley & Sons, Inc (England); journal abbreviation: Ecol Evol; PMID: 28168023; rights: 2017 The Authors, published by John Wiley & Sons Ltd under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/); peer reviewed; free to read; full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5288248/ (HTML), https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5288248/pdf/ (PDF); PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/28168023 |
fulltext | fulltext |
identifier | ISSN: 2045-7758 |
ispartof | Ecology and evolution, 2017-02, Vol.7 (3), p.863-872 |
issn | 2045-7758 (ISSN and eISSN) |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5288248 |
source | Wiley Online Library Open Access; DOAJ Directory of Open Access Journals; Wiley Online Library Journals Frontfile Complete; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central |
subjects | Cohen's kappa; Data structures; Datasets; Dependence; Discrimination; Ecological monitoring; Ecology; Introduced species; Kappa coefficient; model performance; Original Research; predictive models; sample size; species distribution models; Statistical methods; Statistics |
title | Prevalence dependence in model goodness measures with special emphasis on true skill statistics |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T06%3A32%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Prevalence%20dependence%20in%20model%20goodness%20measures%20with%20special%20emphasis%20on%20true%20skill%20statistics&rft.jtitle=Ecology%20and%20evolution&rft.au=Somodi,%20Imelda&rft.date=2017-02&rft.volume=7&rft.issue=3&rft.spage=863&rft.epage=872&rft.pages=863-872&rft.issn=2045-7758&rft.eissn=2045-7758&rft_id=info:doi/10.1002/ece3.2654&rft_dat=%3Cproquest_pubme%3E1868339976%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2290622006&rft_id=info:pmid/28168023&rfr_iscdi=true |
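The abstract's statement that TSS is prevalence-independent for binary predictions under strict assumptions can be illustrated numerically. The sketch below is not taken from the paper: it assumes a model whose sensitivity and specificity stay fixed while prevalence changes, builds the expected 2×2 confusion matrix, and compares TSS (sensitivity + specificity - 1) with Cohen's kappa, which appears among the record's subjects and is known to vary with prevalence. The helper names `tss`, `cohens_kappa` and `confusion_matrix`, and the values n = 1000, sensitivity 0.8 and specificity 0.9, are hypothetical.

```python
"""Sketch: TSS vs. Cohen's kappa at fixed sensitivity and specificity.

Assumption (illustrative, not from the record): binary predictions whose
sensitivity and specificity stay constant while prevalence changes.
"""


def tss(tp: float, fp: float, fn: float, tn: float) -> float:
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def cohens_kappa(tp: float, fp: float, fn: float, tn: float) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column marginals of the 2x2 table.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1.0 - expected)


def confusion_matrix(n: int, prevalence: float, sens: float, spec: float):
    """Expected (tp, fp, fn, tn) for n sites with the given rates."""
    presences = n * prevalence
    absences = n - presences
    tp = sens * presences
    fn = presences - tp
    tn = spec * absences
    fp = absences - tn
    return tp, fp, fn, tn


if __name__ == "__main__":
    for prevalence in (0.05, 0.2, 0.5):
        cm = confusion_matrix(n=1000, prevalence=prevalence, sens=0.8, spec=0.9)
        print(f"prevalence={prevalence:.2f}  TSS={tss(*cm):.3f}  "
              f"kappa={cohens_kappa(*cm):.3f}")
```

Under these assumptions TSS stays at 0.7 for every prevalence, whereas kappa falls from 0.7 at prevalence 0.5 to roughly 0.39 at prevalence 0.05. The abstract notes that realistic sources of prevalence dependence, such as sampling constraints and species characteristics, break this invariance even for binary predictions, so the fixed-rate assumption is the part to question in real data.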
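The abstract also states that using the maximum TSS over thresholds for continuous predictions induces prevalence dependence in small samples. A minimal Monte Carlo sketch of one such mechanism follows, under illustrative assumptions rather than the authors' simulation design: scores are pure noise, so the model has no real discrimination capacity, yet the maximised TSS is positive and its average depends on how many presences the sample contains. The function names and parameter values (`n=100`, 100 replicates, fixed seed) are hypothetical.

```python
"""Sketch: maximum TSS over thresholds when scores carry no information.

Assumptions (illustrative, not the authors' design): continuous scores drawn
independently of occurrence, a fixed sample size n, and TSS maximised over
every observed score used as a threshold.
"""
import random


def max_tss(scores, labels):
    """Largest sensitivity + specificity - 1 over thresholds 'score >= t'."""
    n_pres = sum(labels)
    n_abs = len(labels) - n_pres
    best = -1.0
    for t in sorted(set(scores)):
        # Sites with score >= t are predicted present, the rest absent.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        best = max(best, tp / n_pres + tn / n_abs - 1.0)
    return best


def mean_null_max_tss(n, prevalence, reps, seed=1):
    """Average maximum TSS over `reps` datasets with uninformative scores."""
    rng = random.Random(seed)
    n_pres = round(n * prevalence)
    labels = [1] * n_pres + [0] * (n - n_pres)
    total = 0.0
    for _ in range(reps):
        # Scores are uniform noise, unrelated to the labels.
        scores = [rng.random() for _ in range(n)]
        total += max_tss(scores, labels)
    return total / reps


if __name__ == "__main__":
    for prevalence in (0.05, 0.2, 0.5):
        value = mean_null_max_tss(n=100, prevalence=prevalence, reps=100)
        print(f"prevalence={prevalence:.2f}  mean max TSS={value:.3f}")
```

In this null setting the maximised TSS equals the one-sided two-sample Kolmogorov-Smirnov statistic between presence and absence scores, whose typical size scales with sqrt((n1 + n2) / (n1 * n2)) and is therefore dominated by the smaller group. With 100 sites, five presences give the best threshold far more chance fluctuation to exploit than fifty, so the low-prevalence average should come out clearly higher even though no model has any skill.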