Comparative Performance of Popular Methods for Hybrid Detection using Genomic Data

Abstract Interspecific hybridization is an important evolutionary phenomenon that generates genetic variability in a population and fosters species diversity in nature. The availability of large genome scale data sets has revolutionized hybridization studies to shift from the observation of the pres...

Bibliographic Details
Published in: Systematic biology 2021-08, Vol.70 (5), p.891-907
Main Authors: Kong, Sungsik; Kubatko, Laura S
Format: Article
Language: eng
Online Access: Full text
container_end_page 907
container_issue 5
container_start_page 891
container_title Systematic biology
container_volume 70
creator Kong, Sungsik
Kubatko, Laura S
description Interspecific hybridization is an important evolutionary phenomenon that generates genetic variability in a population and fosters species diversity in nature. The availability of large genome-scale data sets has revolutionized hybridization studies to shift from the observation of the presence or absence of hybrids to the investigation of the genomic constitution of hybrids and their genome-specific evolutionary dynamics. Although a handful of methods have been proposed in an attempt to identify hybrids, accurate detection of hybridization from genomic data remains a challenging task. In addition to methods that infer phylogenetic networks or that utilize pairwise divergence, site-pattern-frequency-based and population genetic clustering approaches are popularly used in practice, though the performance of these methods under different hybridization scenarios has not been extensively examined. Here, we use simulated data to comparatively evaluate the performance of four tools that are commonly used to infer hybridization events: the site-pattern-frequency-based methods HyDe and the $D$-statistic (i.e., the ABBA-BABA test) and the population clustering approaches structure and ADMIXTURE. We consider single hybridization scenarios that vary in the time of hybridization and the amount of incomplete lineage sorting (ILS) for different proportions of parental contributions ($\gamma$); introgressive hybridization; multiple hybridization scenarios; and a mixture of ancestral and recent hybridization scenarios. We focus on the statistical power to detect hybridization and the false discovery rate (FDR) for comparisons of the $D$-statistic and HyDe, and the accuracy of the estimates of $\gamma$ as measured by the mean squared error for HyDe, structure, and ADMIXTURE. Both HyDe and the $D$-statistic are powerful for detecting hybridization in all scenarios except those with high ILS, although the $D$-statistic often has an unacceptably high FDR. The estimates of $\gamma$ in HyDe are impressively robust and accurate whereas structure and ADMIXTURE sometimes fail to identify hybrids, particularly when the proportional parental contributions are asymmetric (i.e., when $\gamma$ is close to 0). Moreover, the posterior distribution estimated using structure exhibits multimodality in many scenarios, making interpretation difficult. Our results provide guidance in selecting appropriate methods for identifying hybrid populations from genomic data. [ABBA-BABA test; A
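The abstract compares methods by three quantities: the $D$-statistic, the false discovery rate (FDR), and the mean squared error (MSE) of the $\gamma$ estimates. The sketch below is not taken from the article. It illustrates the textbook definitions only, assuming $D = (n_{ABBA} - n_{BABA}) / (n_{ABBA} + n_{BABA})$, FDR as the fraction of reported detections that are false, and MSE as the average squared deviation from the true $\gamma$. All function names and the example numbers are hypothetical, and the article's exact evaluation procedure may differ.

```python
def d_statistic(n_abba: int, n_baba: int) -> float:
    """Patterson's D from ABBA and BABA site-pattern counts (textbook form)."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites observed")
    return (n_abba - n_baba) / total


def false_discovery_rate(false_positives: int, true_positives: int) -> float:
    """Fraction of reported detections that are false (standard definition)."""
    detections = false_positives + true_positives
    if detections == 0:
        return 0.0  # convention assumed here: no detections, no false discoveries
    return false_positives / detections


def mean_squared_error(estimates: list[float], true_gamma: float) -> float:
    """Average squared deviation of gamma estimates from the simulated truth."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return sum((g - true_gamma) ** 2 for g in estimates) / len(estimates)


if __name__ == "__main__":
    # Hypothetical counts and estimates, chosen only to exercise the functions.
    print(d_statistic(120, 80))
    print(false_discovery_rate(5, 15))
    print(mean_squared_error([0.4, 0.6], 0.5))
```

Under the null hypothesis of no hybridization, ABBA and BABA patterns are expected in equal numbers, so $D$ is expected to be near zero; in practice a significance test is applied to the observed value rather than reading $D$ directly.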
doi_str_mv 10.1093/sysbio/syaa092
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2475532855</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/sysbio/syaa092</oup_id><sourcerecordid>2475532855</sourcerecordid><originalsourceid>FETCH-LOGICAL-c369t-cd11e67ece3c3e90f946e7b5f3bf98d80c5a0f397eeb8bbba279e575b85750893</originalsourceid><addsrcrecordid>eNqFkDFPwzAQRi0EoqWwMiKPMATsOHbiEbXQIhVRIZDYItu5gFESBztB6r8nVQory30n3btveAidU3JNiWQ3YRu0dUMoRWR8gKaUpCLKmHg73O2CRZzydIJOQvgkhFLB6TGaMJaQRLB4ip7nrm6VV539BrwBXzpfq8YAdiXeuLavlMeP0H24IuDhhldb7W2BF9CB6axrcB9s846X0LjaGrxQnTpFR6WqApztc4Ze7-9e5qto_bR8mN-uI8OE7CJTUAoiBQPMMJCklImAVPOS6VJmRUYMV6RkMgXQmdZaxakEnnKdDYNkks3Q5djbevfVQ-jy2gYDVaUacH3I4yTlnMXZMGboekSNdyF4KPPW21r5bU5JvvOYjx7zvcfh4WLf3esaij_8V9wAXI2A69v_yn4AiRmAPg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2475532855</pqid></control><display><type>article</type><title>Comparative Performance of Popular Methods for Hybrid Detection using Genomic Data</title><source>Oxford University Press Journals All Titles (1996-Current)</source><source>Alma/SFX Local Collection</source><creator>Kong, Sungsik ; Kubatko, Laura S</creator><contributor>Hahn, Matthew</contributor><creatorcontrib>Kong, Sungsik ; Kubatko, Laura S ; Hahn, Matthew</creatorcontrib><description>Abstract Interspecific hybridization is an important evolutionary phenomenon that generates genetic variability in a population and fosters species diversity in nature. The availability of large genome scale data sets has revolutionized hybridization studies to shift from the observation of the presence or absence of hybrids to the investigation of the genomic constitution of hybrids and their genome-specific evolutionary dynamics. Although a handful of methods have been proposed in an attempt to identify hybrids, accurate detection of hybridization from genomic data remains a challenging task. 
In addition to methods that infer phylogenetic networks or that utilize pairwise divergence, site pattern frequency based and population genetic clustering approaches are popularly used in practice, though the performance of these methods under different hybridization scenarios has not been extensively examined. Here, we use simulated data to comparatively evaluate the performance of four tools that are commonly used to infer hybridization events: the site pattern frequency based methods HyDe and the $D$-statistic (i.e., the ABBA-BABA test) and the population clustering approaches structure and ADMIXTURE. We consider single hybridization scenarios that vary in the time of hybridization and the amount of incomplete lineage sorting (ILS) for different proportions of parental contributions ($\gamma$); introgressive hybridization; multiple hybridization scenarios; and a mixture of ancestral and recent hybridization scenarios. We focus on the statistical power to detect hybridization and the false discovery rate (FDR) for comparisons of the $D$-statistic and HyDe, and the accuracy of the estimates of $\gamma$ as measured by the mean squared error for HyDe, structure, and ADMIXTURE. Both HyDe and the $D$-statistic are powerful for detecting hybridization in all scenarios except those with high ILS, although the $D$-statistic often has an unacceptably high FDR. The estimates of $\gamma$ in HyDe are impressively robust and accurate whereas structure and ADMIXTURE sometimes fail to identify hybrids, particularly when the proportional parental contributions are asymmetric (i.e., when $\gamma$ is close to 0). Moreover, the posterior distribution estimated using structure exhibits multimodality in many scenarios, making interpretation difficult. Our results provide guidance in selecting appropriate methods for identifying hybrid populations from genomic data. 
[ABBA-BABA test; ADMIXTURE; hybridization; HyDe; introgression; Patterson’s $D$-statistic; Structure.]</description><identifier>ISSN: 1063-5157</identifier><identifier>EISSN: 1076-836X</identifier><identifier>DOI: 10.1093/sysbio/syaa092</identifier><identifier>PMID: 33404632</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><ispartof>Systematic biology, 2021-08, Vol.70 (5), p.891-907</ispartof><rights>The Author(s) 2021. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: journals.permissions@oup.com 2021</rights><rights>The Author(s) 2021. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c369t-cd11e67ece3c3e90f946e7b5f3bf98d80c5a0f397eeb8bbba279e575b85750893</citedby><cites>FETCH-LOGICAL-c369t-cd11e67ece3c3e90f946e7b5f3bf98d80c5a0f397eeb8bbba279e575b85750893</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1584,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33404632$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Hahn, Matthew</contributor><creatorcontrib>Kong, Sungsik</creatorcontrib><creatorcontrib>Kubatko, Laura S</creatorcontrib><title>Comparative Performance of Popular Methods for Hybrid Detection using Genomic Data</title><title>Systematic biology</title><addtitle>Syst Biol</addtitle><description>Abstract Interspecific hybridization is an important evolutionary phenomenon that generates genetic variability in a population and fosters species diversity in 
nature. The availability of large genome scale data sets has revolutionized hybridization studies to shift from the observation of the presence or absence of hybrids to the investigation of the genomic constitution of hybrids and their genome-specific evolutionary dynamics. Although a handful of methods have been proposed in an attempt to identify hybrids, accurate detection of hybridization from genomic data remains a challenging task. In addition to methods that infer phylogenetic networks or that utilize pairwise divergence, site pattern frequency based and population genetic clustering approaches are popularly used in practice, though the performance of these methods under different hybridization scenarios has not been extensively examined. Here, we use simulated data to comparatively evaluate the performance of four tools that are commonly used to infer hybridization events: the site pattern frequency based methods HyDe and the $D$-statistic (i.e., the ABBA-BABA test) and the population clustering approaches structure and ADMIXTURE. We consider single hybridization scenarios that vary in the time of hybridization and the amount of incomplete lineage sorting (ILS) for different proportions of parental contributions ($\gamma$); introgressive hybridization; multiple hybridization scenarios; and a mixture of ancestral and recent hybridization scenarios. We focus on the statistical power to detect hybridization and the false discovery rate (FDR) for comparisons of the $D$-statistic and HyDe, and the accuracy of the estimates of $\gamma$ as measured by the mean squared error for HyDe, structure, and ADMIXTURE. Both HyDe and the $D$-statistic are powerful for detecting hybridization in all scenarios except those with high ILS, although the $D$-statistic often has an unacceptably high FDR. 
The estimates of $\gamma$ in HyDe are impressively robust and accurate whereas structure and ADMIXTURE sometimes fail to identify hybrids, particularly when the proportional parental contributions are asymmetric (i.e., when $\gamma$ is close to 0). Moreover, the posterior distribution estimated using structure exhibits multimodality in many scenarios, making interpretation difficult. Our results provide guidance in selecting appropriate methods for identifying hybrid populations from genomic data. [ABBA-BABA test; ADMIXTURE; hybridization; HyDe; introgression; Patterson’s $D$-statistic; Structure.]</description><issn>1063-5157</issn><issn>1076-836X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNqFkDFPwzAQRi0EoqWwMiKPMATsOHbiEbXQIhVRIZDYItu5gFESBztB6r8nVQory30n3btveAidU3JNiWQ3YRu0dUMoRWR8gKaUpCLKmHg73O2CRZzydIJOQvgkhFLB6TGaMJaQRLB4ip7nrm6VV539BrwBXzpfq8YAdiXeuLavlMeP0H24IuDhhldb7W2BF9CB6axrcB9s846X0LjaGrxQnTpFR6WqApztc4Ze7-9e5qto_bR8mN-uI8OE7CJTUAoiBQPMMJCklImAVPOS6VJmRUYMV6RkMgXQmdZaxakEnnKdDYNkks3Q5djbevfVQ-jy2gYDVaUacH3I4yTlnMXZMGboekSNdyF4KPPW21r5bU5JvvOYjx7zvcfh4WLf3esaij_8V9wAXI2A69v_yn4AiRmAPg</recordid><startdate>20210811</startdate><enddate>20210811</enddate><creator>Kong, Sungsik</creator><creator>Kubatko, Laura S</creator><general>Oxford University Press</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20210811</creationdate><title>Comparative Performance of Popular Methods for Hybrid Detection using Genomic Data</title><author>Kong, Sungsik ; Kubatko, Laura 
S</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c369t-cd11e67ece3c3e90f946e7b5f3bf98d80c5a0f397eeb8bbba279e575b85750893</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kong, Sungsik</creatorcontrib><creatorcontrib>Kubatko, Laura S</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Systematic biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kong, Sungsik</au><au>Kubatko, Laura S</au><au>Hahn, Matthew</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Comparative Performance of Popular Methods for Hybrid Detection using Genomic Data</atitle><jtitle>Systematic biology</jtitle><addtitle>Syst Biol</addtitle><date>2021-08-11</date><risdate>2021</risdate><volume>70</volume><issue>5</issue><spage>891</spage><epage>907</epage><pages>891-907</pages><issn>1063-5157</issn><eissn>1076-836X</eissn><abstract>Abstract Interspecific hybridization is an important evolutionary phenomenon that generates genetic variability in a population and fosters species diversity in nature. The availability of large genome scale data sets has revolutionized hybridization studies to shift from the observation of the presence or absence of hybrids to the investigation of the genomic constitution of hybrids and their genome-specific evolutionary dynamics. Although a handful of methods have been proposed in an attempt to identify hybrids, accurate detection of hybridization from genomic data remains a challenging task. 
In addition to methods that infer phylogenetic networks or that utilize pairwise divergence, site pattern frequency based and population genetic clustering approaches are popularly used in practice, though the performance of these methods under different hybridization scenarios has not been extensively examined. Here, we use simulated data to comparatively evaluate the performance of four tools that are commonly used to infer hybridization events: the site pattern frequency based methods HyDe and the $D$-statistic (i.e., the ABBA-BABA test) and the population clustering approaches structure and ADMIXTURE. We consider single hybridization scenarios that vary in the time of hybridization and the amount of incomplete lineage sorting (ILS) for different proportions of parental contributions ($\gamma$); introgressive hybridization; multiple hybridization scenarios; and a mixture of ancestral and recent hybridization scenarios. We focus on the statistical power to detect hybridization and the false discovery rate (FDR) for comparisons of the $D$-statistic and HyDe, and the accuracy of the estimates of $\gamma$ as measured by the mean squared error for HyDe, structure, and ADMIXTURE. Both HyDe and the $D$-statistic are powerful for detecting hybridization in all scenarios except those with high ILS, although the $D$-statistic often has an unacceptably high FDR. The estimates of $\gamma$ in HyDe are impressively robust and accurate whereas structure and ADMIXTURE sometimes fail to identify hybrids, particularly when the proportional parental contributions are asymmetric (i.e., when $\gamma$ is close to 0). Moreover, the posterior distribution estimated using structure exhibits multimodality in many scenarios, making interpretation difficult. Our results provide guidance in selecting appropriate methods for identifying hybrid populations from genomic data. 
[ABBA-BABA test; ADMIXTURE; hybridization; HyDe; introgression; Patterson’s $D$-statistic; Structure.]</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>33404632</pmid><doi>10.1093/sysbio/syaa092</doi><tpages>17</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1063-5157
ispartof Systematic biology, 2021-08, Vol.70 (5), p.891-907
issn 1063-5157
1076-836X
language eng
recordid cdi_proquest_miscellaneous_2475532855
source Oxford University Press Journals All Titles (1996-Current); Alma/SFX Local Collection
title Comparative Performance of Popular Methods for Hybrid Detection using Genomic Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T20%3A12%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparative%20Performance%20of%20Popular%20Methods%20for%20Hybrid%20Detection%20using%20Genomic%20Data&rft.jtitle=Systematic%20biology&rft.au=Kong,%20Sungsik&rft.date=2021-08-11&rft.volume=70&rft.issue=5&rft.spage=891&rft.epage=907&rft.pages=891-907&rft.issn=1063-5157&rft.eissn=1076-836X&rft_id=info:doi/10.1093/sysbio/syaa092&rft_dat=%3Cproquest_cross%3E2475532855%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2475532855&rft_id=info:pmid/33404632&rft_oup_id=10.1093/sysbio/syaa092&rfr_iscdi=true