An accurate method for identifying recent recombinants from unaligned sequences

Abstract. Motivation: Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is thus of major interest. However, current methods for detecting recombinants are primarily designed for aligned sequences. Thus, they struggle with analyses of highly...

Detailed description

Bibliographic details
Published in: Bioinformatics, 2022-03, Vol. 38 (7), p. 1823-1829
Main authors: Feng, Qian; Tiedje, Kathryn E; Ruybal-Pesántez, Shazia; Tonkin-Hill, Gerry; Duffy, Michael F; Day, Karen P; Shim, Heejung; Chan, Yao-Ban
Format: Article
Language: eng
Subjects: Original Papers
Online access: Full text
container_end_page 1829
container_issue 7
container_start_page 1823
container_title Bioinformatics
container_volume 38
creator Feng, Qian
Tiedje, Kathryn E
Ruybal-Pesántez, Shazia
Tonkin-Hill, Gerry
Duffy, Michael F
Day, Karen P
Shim, Heejung
Chan, Yao-Ban
description Abstract. Motivation: Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is thus of major interest. However, current methods for detecting recombinants are primarily designed for aligned sequences. Thus, they struggle with analyses of highly diverse genes, such as the var genes of the malaria parasite Plasmodium falciparum, which are known to diversify primarily through recombination. Results: We introduce an algorithm to detect recent recombinant sequences from a dataset without a full multiple alignment. Our algorithm can handle thousands of gene-length sequences without the need for a reference panel. We demonstrate the accuracy of our algorithm through extensive numerical simulations; in particular, it maintains its effectiveness in the presence of insertions and deletions. We apply our algorithm to a dataset of 17 335 DBLα types in var genes from Ghana, observing that sequences belonging to the same ups group or domain subclass recombine amongst themselves more frequently, and that non-recombinant DBLα types are more conserved than recombinant ones. Availability and implementation: Source code is freely available at https://github.com/qianfeng2/detREC_program. Supplementary information: Supplementary data are available at Bioinformatics online.
doi_str_mv 10.1093/bioinformatics/btac012
format Article
pmid 35025988
date 2022-03-28
publisher England: Oxford University Press
tpages 7
rights The Author(s) (2022). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
oa free_for_read
toplevel peer_reviewed; online_resources
orcidid https://orcid.org/0000-0003-4301-8545; https://orcid.org/0000-0002-1375-9310; https://orcid.org/0000-0002-0495-179X
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8963311/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8963311/pdf/
backlink https://www.ncbi.nlm.nih.gov/pubmed/35025988
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2022-03, Vol.38 (7), p.1823-1829
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8963311
source Oxford Journals Open Access Collection; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection
subjects Original Papers
title An accurate method for identifying recent recombinants from unaligned sequences
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T19%3A30%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20accurate%20method%20for%20identifying%20recent%20recombinants%20from%20unaligned%20sequences&rft.jtitle=Bioinformatics&rft.au=Feng,%20Qian&rft.date=2022-03-28&rft.volume=38&rft.issue=7&rft.spage=1823&rft.epage=1829&rft.pages=1823-1829&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btac012&rft_dat=%3Cproquest_pubme%3E2620082347%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2620082347&rft_id=info:pmid/35025988&rft_oup_id=10.1093/bioinformatics/btac012&rfr_iscdi=true