Pathologies of Between-Groups Principal Components Analysis in Geometric Morphometrics
Published in: | Evolutionary biology 2019-12, Vol.46 (4), p.271-302 |
---|---|
Author: | Bookstein, Fred L. |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | 302 |
---|---|
container_issue | 4 |
container_start_page | 271 |
container_title | Evolutionary biology |
container_volume | 46 |
creator | Bookstein, Fred L. |
description | Good empirical applications of geometric morphometrics (GMM) typically involve several times more variables than specimens, a situation the statistician refers to as “high p/n,” where p is the count of variables and n the count of specimens. This note calls your attention to two predictable catastrophic failures of one particular multivariate statistical technique, between-groups principal components analysis (bgPCA), in this high-p/n setting. The more obvious pathology is this: when applied to the patternless (null) model of p identically distributed Gaussians over groups of the same size, both bgPCA and its algebraic equivalent, partial least squares (PLS) analysis against group, necessarily generate the appearance of huge equilateral group separations that are fictitious (absent from the statistical model). When specimen counts by group vary greatly or when any group includes fewer than about ten specimens, an even worse failure of the technique obtains: the smaller the group, the more likely a bgPCA is to fictitiously identify that group as the end-member of one of its derived axes. For these two reasons, when used in GMM and other high-p/n settings the bgPCA method very often leads to invalid or insecure biological inferences. This paper demonstrates and quantifies these and other pathological outcomes both for patternless models and for models with one or two valid factors, then offers suggestions for how GMM practitioners should protect themselves against the consequences for inference of these lamentably predictable misrepresentations. The bgPCA method should never be used unskeptically—it is always untrustworthy, never authoritative—and whenever it appears in partial support of any biological inference it must be accompanied by a wide range of diagnostic plots and other challenges, many of which are presented here for the first time. |
doi_str_mv | 10.1007/s11692-019-09484-8 |
format | Article |
publisher | New York: Springer US |
rights | © The Author(s) 2019; COPYRIGHT 2019 Springer. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). |
abbreviated_title | Evol Biol |
orcid | https://orcid.org/0000-0003-2716-8471 |
page_count | 32 |
review_status | peer reviewed |
fulltext | fulltext |
identifier | ISSN: 0071-3260 |
ispartof | Evolutionary biology, 2019-12, Vol.46 (4), p.271-302 |
issn | 0071-3260 1934-2845 |
language | eng |
recordid | cdi_proquest_journals_2311218214 |
source | SpringerNature Journals |
subjects | Animal Genetics and Genomics Biomedical and Life Sciences Developmental Biology Ecology Evolutionary Biology Focal Reviews Human Genetics Life Sciences Mathematical models Morphometry Principal components analysis Statistical analysis |
title | Pathologies of Between-Groups Principal Components Analysis in Geometric Morphometrics |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T20%3A46%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Pathologies%20of%20Between-Groups%20Principal%20Components%20Analysis%20in%20Geometric%20Morphometrics&rft.jtitle=Evolutionary%20biology&rft.au=Bookstein,%20Fred%20L.&rft.date=2019-12-01&rft.volume=46&rft.issue=4&rft.spage=271&rft.epage=302&rft.pages=271-302&rft.issn=0071-3260&rft.eissn=1934-2845&rft_id=info:doi/10.1007/s11692-019-09484-8&rft_dat=%3Cgale_proqu%3EA717808025%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2311218214&rft_id=info:pmid/&rft_galeid=A717808025&rfr_iscdi=true |
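The description field above makes two concrete, checkable claims: under a null model of p identically distributed Gaussians, bgPCA manufactures large group separations when p greatly exceeds the group sizes, and a small group tends to become the end-member of a derived axis. The Python sketch below illustrates both effects under stated assumptions; it is not the author's code. It assumes bgPCA is computed as a PCA (via NumPy's SVD) of the unweighted group means centred on the grand mean, followed by projection of every specimen onto the first g − 1 axes. The helper names (`bgpca_scores`, `separation_ratio`, `null_ratio`, `end_member_group`), the group sizes, the values of p, the random seed, and the "separation ratio" statistic (smallest distance between group centroids divided by the pooled within-group SD of the scores) are hypothetical choices made for illustration.

```python
# Illustrative sketch with hypothetical helpers; NOT the paper's implementation.
# Assumption: bgPCA = SVD of grand-mean-centred group means, then projection.
import numpy as np


def bgpca_scores(X, groups):
    """Project every specimen onto the principal axes of the group means.

    X: (n, p) data matrix; groups: length-n array of group labels.
    Returns (scores, labels); scores has one column per bgPCA axis (g - 1).
    """
    labels = np.unique(groups)
    grand_mean = X.mean(axis=0)
    # Group means centred on the grand mean: a (g, p) matrix of rank <= g - 1.
    means = np.vstack([X[groups == lab].mean(axis=0) for lab in labels]) - grand_mean
    # Right singular vectors of that matrix serve as the bgPCA axes.
    _, _, vt = np.linalg.svd(means, full_matrices=False)
    axes = vt[: len(labels) - 1].T  # (p, g - 1)
    return (X - grand_mean) @ axes, labels


def centroids(scores, groups, labels):
    """Per-group mean of the bgPCA scores, one row per group."""
    return np.vstack([scores[groups == lab].mean(axis=0) for lab in labels])


def separation_ratio(scores, groups, labels):
    """Smallest centroid-to-centroid distance divided by the pooled
    within-group SD of the scores (pooled over groups and axes)."""
    cents = centroids(scores, groups, labels)
    resid = np.vstack([scores[groups == lab] - c for lab, c in zip(labels, cents)])
    dof = (len(scores) - len(labels)) * scores.shape[1]
    pooled_sd = np.sqrt((resid ** 2).sum() / dof)
    g = len(labels)
    dists = [np.linalg.norm(cents[i] - cents[j]) for i in range(g) for j in range(i + 1, g)]
    return min(dists) / pooled_sd


def null_ratio(p, sizes=(10, 10, 10), seed=0):
    """Separation ratio for pure noise: p i.i.d. N(0, 1) variables, no group effect."""
    rng = np.random.default_rng(seed)
    groups = np.repeat(np.arange(len(sizes)), sizes)
    X = rng.standard_normal((len(groups), p))
    scores, labels = bgpca_scores(X, groups)
    return separation_ratio(scores, groups, labels)


def end_member_group(p, sizes=(30, 30, 5), seed=0):
    """Index of the group whose centroid is most extreme on bgPCA axis 1 (null data)."""
    rng = np.random.default_rng(seed)
    groups = np.repeat(np.arange(len(sizes)), sizes)
    X = rng.standard_normal((len(groups), p))
    scores, labels = bgpca_scores(X, groups)
    axis1 = centroids(scores, groups, labels)[:, 0]
    # abs(): the sign of an SVD axis is arbitrary.
    return int(np.argmax(np.abs(axis1)))


if __name__ == "__main__":
    for p in (10, 100, 1000):
        print(f"p = {p:4d}: min centroid distance / within-group SD = {null_ratio(p):.1f}")
    print("end-member group on axis 1 (sizes 30, 30, 5):", end_member_group(1000))
```

Because the simulated data contain no group effect at all, any separation the ratio reports is an artifact. A rough calculation suggests why it grows with p: in noise alone, the difference between two group means of m specimens has squared length about 2p/m, so centroid distances scale like √(p/m), while for Gaussian data the within-group residuals are independent of the group means, so their scatter along axes built from those means stays near 1. With unequal sizes, the small group's mean is the noisiest, so it tends to dominate the first axis.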