Localizing and Classifying Adaptive Targets with Trend Filtered Regression

Identifying genomic locations of natural selection from sequence data is an ongoing challenge in population genetics. Current methods utilizing information combined from several summary statistics typically assume no correlation of summary statistics regardless of the genomic location from which the...

Detailed description

Bibliographic details
Published in: Molecular biology and evolution 2019-02, Vol.36 (2), p.252-270
Main authors: Mughal, Mehreen R; DeGiorgio, Michael
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 270
container_issue 2
container_start_page 252
container_title Molecular biology and evolution
container_volume 36
creator Mughal, Mehreen R
DeGiorgio, Michael
description Identifying genomic locations of natural selection from sequence data is an ongoing challenge in population genetics. Current methods utilizing information combined from several summary statistics typically assume no correlation of summary statistics regardless of the genomic location from which they are calculated. However, due to linkage disequilibrium, summary statistics calculated at nearby genomic positions are highly correlated. We introduce an approach termed Trendsetter that accounts for the similarity of statistics calculated from adjacent genomic regions through trend filtering, while reducing the effects of multicollinearity through regularization. Our penalized regression framework has high power to detect sweeps, is capable of classifying sweep regions as either hard or soft, and can be applied to other selection scenarios as well. We find that Trendsetter is robust to both extensive missing data and strong background selection, and has comparable power to similar current approaches. Moreover, the model learned by Trendsetter can be viewed as a set of curves modeling the spatial distribution of summary statistics in the genome. Application to human genomic data revealed positively selected regions previously discovered such as LCT in Europeans and EDAR in East Asians. We also identified a number of novel candidates and show that populations with greater relatedness share more sweep signals.
doi_str_mv 10.1093/molbev/msy205
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6409434</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2130304871</sourcerecordid><originalsourceid>FETCH-LOGICAL-c387t-19079704eedef37a36966eb9930bb5bc79e920be1a5bc906515a555500de602f3</originalsourceid><addsrcrecordid>eNpVkE1LAzEQhoMotlaPXmWPXtZONtmkuQilWD8oCFLPIbs720b2oybbSv31prQWzWUyzJN3wkPINYU7CooN67bKcDOs_TaB9IT0acpkTCVVp6QPMtw5sFGPXHj_AUA5F-Kc9BgwNRI86ZOXWZubyn7bZhGZpogmlfHelttdPy7MqrMbjObGLbDz0ZftltHcYeCmturQYRG94cJheNI2l-SsNJXHq0MdkPfpw3zyFM9eH58n41mcs5HsYqpAKgkcscCSScOEEgIzpRhkWZrlUqFKIENqQqNApDQ1aTgABQpISjYg9_vc1Tqrscix6Zyp9MrZ2ritbo3V_yeNXepFu9GCg-KMh4DbQ4BrP9foO11bn2NVmQbbtdcJDX6AjyQNaLxHc9d677A8rqGgd_713r_e-w_8zd-_Helf4ewHGfmEqg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2130304871</pqid></control><display><type>article</type><title>Localizing and Classifying Adaptive Targets with Trend Filtered Regression</title><source>Oxford Journals Open Access Collection</source><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><source>Free Full-Text Journals in Chemistry</source><creator>Mughal, Mehreen R ; DeGiorgio, Michael</creator><creatorcontrib>Mughal, Mehreen R ; DeGiorgio, Michael</creatorcontrib><description>Identifying genomic locations of natural selection from sequence data is an ongoing challenge in population genetics. Current methods utilizing information combined from several summary statistics typically assume no correlation of summary statistics regardless of the genomic location from which they are calculated. However, due to linkage disequilibrium, summary statistics calculated at nearby genomic positions are highly correlated. 
We introduce an approach termed Trendsetter that accounts for the similarity of statistics calculated from adjacent genomic regions through trend filtering, while reducing the effects of multicollinearity through regularization. Our penalized regression framework has high power to detect sweeps, is capable of classifying sweep regions as either hard or soft, and can be applied to other selection scenarios as well. We find that Trendsetter is robust to both extensive missing data and strong background selection, and has comparable power to similar current approaches. Moreover, the model learned by Trendsetter can be viewed as a set of curves modeling the spatial distribution of summary statistics in the genome. Application to human genomic data revealed positively selected regions previously discovered such as LCT in Europeans and EDAR in East Asians. We also identified a number of novel candidates and show that populations with greater relatedness share more sweep signals.</description><identifier>ISSN: 0737-4038</identifier><identifier>EISSN: 1537-1719</identifier><identifier>DOI: 10.1093/molbev/msy205</identifier><identifier>PMID: 30398642</identifier><language>eng</language><publisher>United States: Oxford University Press</publisher><subject>Computer Simulation ; Discoveries ; Genetic Techniques ; Genetics, Population - methods ; Genome, Human ; Humans ; Machine Learning ; Models, Genetic ; Regression Analysis ; Software</subject><ispartof>Molecular biology and evolution, 2019-02, Vol.36 (2), p.252-270</ispartof><rights>The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. 
2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c387t-19079704eedef37a36966eb9930bb5bc79e920be1a5bc906515a555500de602f3</citedby><cites>FETCH-LOGICAL-c387t-19079704eedef37a36966eb9930bb5bc79e920be1a5bc906515a555500de602f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6409434/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6409434/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30398642$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Mughal, Mehreen R</creatorcontrib><creatorcontrib>DeGiorgio, Michael</creatorcontrib><title>Localizing and Classifying Adaptive Targets with Trend Filtered Regression</title><title>Molecular biology and evolution</title><addtitle>Mol Biol Evol</addtitle><description>Identifying genomic locations of natural selection from sequence data is an ongoing challenge in population genetics. Current methods utilizing information combined from several summary statistics typically assume no correlation of summary statistics regardless of the genomic location from which they are calculated. However, due to linkage disequilibrium, summary statistics calculated at nearby genomic positions are highly correlated. We introduce an approach termed Trendsetter that accounts for the similarity of statistics calculated from adjacent genomic regions through trend filtering, while reducing the effects of multicollinearity through regularization. 
Our penalized regression framework has high power to detect sweeps, is capable of classifying sweep regions as either hard or soft, and can be applied to other selection scenarios as well. We find that Trendsetter is robust to both extensive missing data and strong background selection, and has comparable power to similar current approaches. Moreover, the model learned by Trendsetter can be viewed as a set of curves modeling the spatial distribution of summary statistics in the genome. Application to human genomic data revealed positively selected regions previously discovered such as LCT in Europeans and EDAR in East Asians. We also identified a number of novel candidates and show that populations with greater relatedness share more sweep signals.</description><subject>Computer Simulation</subject><subject>Discoveries</subject><subject>Genetic Techniques</subject><subject>Genetics, Population - methods</subject><subject>Genome, Human</subject><subject>Humans</subject><subject>Machine Learning</subject><subject>Models, Genetic</subject><subject>Regression Analysis</subject><subject>Software</subject><issn>0737-4038</issn><issn>1537-1719</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpVkE1LAzEQhoMotlaPXmWPXtZONtmkuQilWD8oCFLPIbs720b2oybbSv31prQWzWUyzJN3wkPINYU7CooN67bKcDOs_TaB9IT0acpkTCVVp6QPMtw5sFGPXHj_AUA5F-Kc9BgwNRI86ZOXWZubyn7bZhGZpogmlfHelttdPy7MqrMbjObGLbDz0ZftltHcYeCmturQYRG94cJheNI2l-SsNJXHq0MdkPfpw3zyFM9eH58n41mcs5HsYqpAKgkcscCSScOEEgIzpRhkWZrlUqFKIENqQqNApDQ1aTgABQpISjYg9_vc1Tqrscix6Zyp9MrZ2ritbo3V_yeNXepFu9GCg-KMh4DbQ4BrP9foO11bn2NVmQbbtdcJDX6AjyQNaLxHc9d677A8rqGgd_713r_e-w_8zd-_Helf4ewHGfmEqg</recordid><startdate>20190201</startdate><enddate>20190201</enddate><creator>Mughal, Mehreen R</creator><creator>DeGiorgio, Michael</creator><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20190201</creationdate><title>Localizing and Classifying Adaptive Targets with Trend Filtered Regression</title><author>Mughal, Mehreen R ; DeGiorgio, Michael</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c387t-19079704eedef37a36966eb9930bb5bc79e920be1a5bc906515a555500de602f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Computer Simulation</topic><topic>Discoveries</topic><topic>Genetic Techniques</topic><topic>Genetics, Population - methods</topic><topic>Genome, Human</topic><topic>Humans</topic><topic>Machine Learning</topic><topic>Models, Genetic</topic><topic>Regression Analysis</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mughal, Mehreen R</creatorcontrib><creatorcontrib>DeGiorgio, Michael</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Molecular biology and evolution</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mughal, Mehreen R</au><au>DeGiorgio, Michael</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Localizing and Classifying Adaptive Targets with Trend Filtered Regression</atitle><jtitle>Molecular biology and evolution</jtitle><addtitle>Mol Biol 
Evol</addtitle><date>2019-02-01</date><risdate>2019</risdate><volume>36</volume><issue>2</issue><spage>252</spage><epage>270</epage><pages>252-270</pages><issn>0737-4038</issn><eissn>1537-1719</eissn><abstract>Identifying genomic locations of natural selection from sequence data is an ongoing challenge in population genetics. Current methods utilizing information combined from several summary statistics typically assume no correlation of summary statistics regardless of the genomic location from which they are calculated. However, due to linkage disequilibrium, summary statistics calculated at nearby genomic positions are highly correlated. We introduce an approach termed Trendsetter that accounts for the similarity of statistics calculated from adjacent genomic regions through trend filtering, while reducing the effects of multicollinearity through regularization. Our penalized regression framework has high power to detect sweeps, is capable of classifying sweep regions as either hard or soft, and can be applied to other selection scenarios as well. We find that Trendsetter is robust to both extensive missing data and strong background selection, and has comparable power to similar current approaches. Moreover, the model learned by Trendsetter can be viewed as a set of curves modeling the spatial distribution of summary statistics in the genome. Application to human genomic data revealed positively selected regions previously discovered such as LCT in Europeans and EDAR in East Asians. We also identified a number of novel candidates and show that populations with greater relatedness share more sweep signals.</abstract><cop>United States</cop><pub>Oxford University Press</pub><pmid>30398642</pmid><doi>10.1093/molbev/msy205</doi><tpages>19</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0737-4038
ispartof Molecular biology and evolution, 2019-02, Vol.36 (2), p.252-270
issn 0737-4038
1537-1719
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6409434
source Oxford Journals Open Access Collection; MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry
subjects Computer Simulation
Discoveries
Genetic Techniques
Genetics, Population - methods
Genome, Human
Humans
Machine Learning
Models, Genetic
Regression Analysis
Software
title Localizing and Classifying Adaptive Targets with Trend Filtered Regression
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T08%3A32%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Localizing%20and%20Classifying%20Adaptive%20Targets%20with%20Trend%20Filtered%20Regression&rft.jtitle=Molecular%20biology%20and%20evolution&rft.au=Mughal,%20Mehreen%20R&rft.date=2019-02-01&rft.volume=36&rft.issue=2&rft.spage=252&rft.epage=270&rft.pages=252-270&rft.issn=0737-4038&rft.eissn=1537-1719&rft_id=info:doi/10.1093/molbev/msy205&rft_dat=%3Cproquest_pubme%3E2130304871%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2130304871&rft_id=info:pmid/30398642&rfr_iscdi=true