Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic

HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole-genome sequencing allows other genes to be used. We aimed to determine which gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the P...

Detailed description

Bibliographic details
Published in: Scientific reports, 2016-12, Vol.6 (1), p.39489-39489, Article 39489
Main authors: Yebra, Gonzalo; Hodcroft, Emma B.; Ragonnet-Cronin, Manon L.; Pillay, Deenan; Brown, Andrew J. Leigh; Fraser, Christophe; Kellam, Paul; de Oliveira, Tulio; Dennis, Ann; Hoppe, Anne; Kityo, Cissy; Frampton, Dan; Ssemwanga, Deogratius; Tanser, Frank; Keshani, Jagoda; Lingappa, Jairam; Herbeck, Joshua; Wawer, Maria; Essex, Max; Cohen, Myron S.; Paton, Nicholas; Ratmann, Oliver; Kaleebu, Pontiano; Hayes, Richard; Fidler, Sarah; Quinn, Thomas; Novitsky, Vladimir; Haywards, Andrew; Nastouli, Eleni; Morris, Steven; Clark, Duncan; Kozlakidis, Zisis
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 39489
container_issue 1
container_start_page 39489
container_title Scientific reports
container_volume 6
creator Yebra, Gonzalo
Hodcroft, Emma B.
Ragonnet-Cronin, Manon L.
Pillay, Deenan
Brown, Andrew J. Leigh
Fraser, Christophe
Kellam, Paul
de Oliveira, Tulio
Dennis, Ann
Hoppe, Anne
Kityo, Cissy
Frampton, Dan
Ssemwanga, Deogratius
Tanser, Frank
Keshani, Jagoda
Lingappa, Jairam
Herbeck, Joshua
Wawer, Maria
Essex, Max
Cohen, Myron S.
Paton, Nicholas
Ratmann, Oliver
Kaleebu, Pontiano
Hayes, Richard
Fidler, Sarah
Quinn, Thomas
Novitsky, Vladimir
Haywards, Andrew
Nastouli, Eleni
Morris, Steven
Clark, Duncan
Kozlakidis, Zisis
description HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole-genome sequencing allows other genes to be used. We aimed to determine which gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies with those of the corresponding true trees using CompareTree. The accuracy of the trees increased significantly with the length of the sequences used, with the gag-pol-env datasets performing best and the gag and partial pol sequences performing worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially with the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.
doi_str_mv 10.1038/srep39489
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5180198</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1899447358</sourcerecordid><originalsourceid>FETCH-LOGICAL-c438t-1a0b168a71fa308a6d6d959a49c8137e6df3da2ef675f8b88ba60f787d622d263</originalsourceid><addsrcrecordid>eNplkV1LHTEQhkNRqlgv-gdKoDdW2DZfu5vcCCKtCoI3tbchJ5k9RrLJNtkVzr9v5NjDqc3NBObhmRlehD5S8pUSLr-VDBNXQqp36JgR0TaMM3aw9z9Cp6U8kfpapgRV79ERk4RIJdpj5B6Kj2scweSwwcMSQrOGmEbAN7e_cIHfC0QL2JnZYD9OOT1DwdPjJqSKbXAGm2KZ82JnnyL2ERtc_LgEM4PDMHkHo7cf0OFgQoHT13qCHn58_3l109zdX99eXd41VnA5N9SQFe2k6elgOJGmc51TrTJCWUl5D50buDMMhq5vB7mScmU6MvSydx1jjnX8BF1svdOyGsFZiHM2QU_ZjyZvdDJe_9uJ_lGv07NuqSRUySo4exXkVC8vsx59sRCCiZCWoqlsWS85EaSin9-gT2nJsZ5XKaWE6Hn7IvyypWxOpSY17JahRL_Ep3fxVfbT_vY78m9YFTjfAqW24hry3sj_bH8A4Iyl7w</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1899447358</pqid></control><display><type>article</type><title>Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic</title><source>Springer Open Access</source><source>MEDLINE</source><source>Nature Free</source><source>DOAJ Directory of Open Access Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><source>Free Full-Text Journals in Chemistry</source><source>EZB Electronic Journals Library</source><creator>Yebra, Gonzalo ; Hodcroft, Emma B. ; Ragonnet-Cronin, Manon L. ; Pillay, Deenan ; Brown, Andrew J. Leigh ; Fraser, Christophe ; Kellam, Paul ; de Oliveira, Tulio ; Dennis, Ann ; Hoppe, Anne ; Kityo, Cissy ; Frampton, Dan ; Ssemwanga, Deogratius ; Tanser, Frank ; Keshani, Jagoda ; Lingappa, Jairam ; Herbeck, Joshua ; Wawer, Maria ; Essex, Max ; Cohen, Myron S. 
; Paton, Nicholas ; Ratmann, Oliver ; Kaleebu, Pontiano ; Hayes, Richard ; Fidler, Sarah ; Quinn, Thomas ; Novitsky, Vladimir ; Haywards, Andrew ; Nastouli, Eleni ; Morris, Steven ; Clark, Duncan ; Kozlakidis, Zisis</creator><creatorcontrib>Yebra, Gonzalo ; Hodcroft, Emma B. ; Ragonnet-Cronin, Manon L. ; Pillay, Deenan ; Brown, Andrew J. Leigh ; Fraser, Christophe ; Kellam, Paul ; de Oliveira, Tulio ; Dennis, Ann ; Hoppe, Anne ; Kityo, Cissy ; Frampton, Dan ; Ssemwanga, Deogratius ; Tanser, Frank ; Keshani, Jagoda ; Lingappa, Jairam ; Herbeck, Joshua ; Wawer, Maria ; Essex, Max ; Cohen, Myron S. ; Paton, Nicholas ; Ratmann, Oliver ; Kaleebu, Pontiano ; Hayes, Richard ; Fidler, Sarah ; Quinn, Thomas ; Novitsky, Vladimir ; Haywards, Andrew ; Nastouli, Eleni ; Morris, Steven ; Clark, Duncan ; Kozlakidis, Zisis ; PANGEA_HIV Consortium ; ICONIC Project</creatorcontrib><description>HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes ( gag - pol - env, gag - pol, gag, pol, env and partial pol ) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree’s using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag - pol - env datasets showing the best performance and gag and partial pol sequences showing the worst. 
The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.</description><identifier>ISSN: 2045-2322</identifier><identifier>EISSN: 2045-2322</identifier><identifier>DOI: 10.1038/srep39489</identifier><identifier>PMID: 28008945</identifier><language>eng</language><publisher>London: Nature Publishing Group UK</publisher><subject>631/114/739 ; 631/181/757 ; 692/699/255/1901 ; Cohort Studies ; Coverage ; Epidemics ; Epidemiology ; Genes ; Genes, env ; Genes, gag ; Genes, pol ; Genome, Viral ; Genomes ; HIV - genetics ; HIV Infections - epidemiology ; HIV Infections - virology ; Humanities and Social Sciences ; Humans ; Likelihood Functions ; Molecular Epidemiology ; multidisciplinary ; Nucleotide sequence ; Phylogeny ; Pol gene ; Regression Analysis ; Reproducibility of Results ; Sampling ; Science ; South Africa ; Trees ; United Kingdom</subject><ispartof>Scientific reports, 2016-12, Vol.6 (1), p.39489-39489, Article 39489</ispartof><rights>The Author(s) 2016</rights><rights>Copyright Nature Publishing Group Dec 2016</rights><rights>Copyright © 2016, The Author(s) 2016 The 
Author(s)</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c438t-1a0b168a71fa308a6d6d959a49c8137e6df3da2ef675f8b88ba60f787d622d263</citedby><cites>FETCH-LOGICAL-c438t-1a0b168a71fa308a6d6d959a49c8137e6df3da2ef675f8b88ba60f787d622d263</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5180198/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5180198/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,27903,27904,41099,42168,51554,53769,53771</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28008945$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Yebra, Gonzalo</creatorcontrib><creatorcontrib>Hodcroft, Emma B.</creatorcontrib><creatorcontrib>Ragonnet-Cronin, Manon L.</creatorcontrib><creatorcontrib>Pillay, Deenan</creatorcontrib><creatorcontrib>Brown, Andrew J. 
Leigh</creatorcontrib><creatorcontrib>Fraser, Christophe</creatorcontrib><creatorcontrib>Kellam, Paul</creatorcontrib><creatorcontrib>de Oliveira, Tulio</creatorcontrib><creatorcontrib>Dennis, Ann</creatorcontrib><creatorcontrib>Hoppe, Anne</creatorcontrib><creatorcontrib>Kityo, Cissy</creatorcontrib><creatorcontrib>Frampton, Dan</creatorcontrib><creatorcontrib>Ssemwanga, Deogratius</creatorcontrib><creatorcontrib>Tanser, Frank</creatorcontrib><creatorcontrib>Keshani, Jagoda</creatorcontrib><creatorcontrib>Lingappa, Jairam</creatorcontrib><creatorcontrib>Herbeck, Joshua</creatorcontrib><creatorcontrib>Wawer, Maria</creatorcontrib><creatorcontrib>Essex, Max</creatorcontrib><creatorcontrib>Cohen, Myron S.</creatorcontrib><creatorcontrib>Paton, Nicholas</creatorcontrib><creatorcontrib>Ratmann, Oliver</creatorcontrib><creatorcontrib>Kaleebu, Pontiano</creatorcontrib><creatorcontrib>Hayes, Richard</creatorcontrib><creatorcontrib>Fidler, Sarah</creatorcontrib><creatorcontrib>Quinn, Thomas</creatorcontrib><creatorcontrib>Novitsky, Vladimir</creatorcontrib><creatorcontrib>Haywards, Andrew</creatorcontrib><creatorcontrib>Nastouli, Eleni</creatorcontrib><creatorcontrib>Morris, Steven</creatorcontrib><creatorcontrib>Clark, Duncan</creatorcontrib><creatorcontrib>Kozlakidis, Zisis</creatorcontrib><creatorcontrib>PANGEA_HIV Consortium</creatorcontrib><creatorcontrib>ICONIC Project</creatorcontrib><title>Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic</title><title>Scientific reports</title><addtitle>Sci Rep</addtitle><addtitle>Sci Rep</addtitle><description>HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. 
We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes ( gag - pol - env, gag - pol, gag, pol, env and partial pol ) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree’s using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag - pol - env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.</description><subject>631/114/739</subject><subject>631/181/757</subject><subject>692/699/255/1901</subject><subject>Cohort Studies</subject><subject>Coverage</subject><subject>Epidemics</subject><subject>Epidemiology</subject><subject>Genes</subject><subject>Genes, env</subject><subject>Genes, gag</subject><subject>Genes, pol</subject><subject>Genome, Viral</subject><subject>Genomes</subject><subject>HIV - genetics</subject><subject>HIV Infections - epidemiology</subject><subject>HIV Infections - virology</subject><subject>Humanities and Social Sciences</subject><subject>Humans</subject><subject>Likelihood Functions</subject><subject>Molecular Epidemiology</subject><subject>multidisciplinary</subject><subject>Nucleotide sequence</subject><subject>Phylogeny</subject><subject>Pol gene</subject><subject>Regression Analysis</subject><subject>Reproducibility of Results</subject><subject>Sampling</subject><subject>Science</subject><subject>South 
Africa</subject><subject>Trees</subject><subject>United Kingdom</subject><issn>2045-2322</issn><issn>2045-2322</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>C6C</sourceid><sourceid>EIF</sourceid><sourceid>BENPR</sourceid><recordid>eNplkV1LHTEQhkNRqlgv-gdKoDdW2DZfu5vcCCKtCoI3tbchJ5k9RrLJNtkVzr9v5NjDqc3NBObhmRlehD5S8pUSLr-VDBNXQqp36JgR0TaMM3aw9z9Cp6U8kfpapgRV79ERk4RIJdpj5B6Kj2scweSwwcMSQrOGmEbAN7e_cIHfC0QL2JnZYD9OOT1DwdPjJqSKbXAGm2KZ82JnnyL2ERtc_LgEM4PDMHkHo7cf0OFgQoHT13qCHn58_3l109zdX99eXd41VnA5N9SQFe2k6elgOJGmc51TrTJCWUl5D50buDMMhq5vB7mScmU6MvSydx1jjnX8BF1svdOyGsFZiHM2QU_ZjyZvdDJe_9uJ_lGv07NuqSRUySo4exXkVC8vsx59sRCCiZCWoqlsWS85EaSin9-gT2nJsZ5XKaWE6Hn7IvyypWxOpSY17JahRL_Ep3fxVfbT_vY78m9YFTjfAqW24hry3sj_bH8A4Iyl7w</recordid><startdate>20161223</startdate><enddate>20161223</enddate><creator>Yebra, Gonzalo</creator><creator>Hodcroft, Emma B.</creator><creator>Ragonnet-Cronin, Manon L.</creator><creator>Pillay, Deenan</creator><creator>Brown, Andrew J. 
Leigh</creator><creator>Fraser, Christophe</creator><creator>Kellam, Paul</creator><creator>de Oliveira, Tulio</creator><creator>Dennis, Ann</creator><creator>Hoppe, Anne</creator><creator>Kityo, Cissy</creator><creator>Frampton, Dan</creator><creator>Ssemwanga, Deogratius</creator><creator>Tanser, Frank</creator><creator>Keshani, Jagoda</creator><creator>Lingappa, Jairam</creator><creator>Herbeck, Joshua</creator><creator>Wawer, Maria</creator><creator>Essex, Max</creator><creator>Cohen, Myron S.</creator><creator>Paton, Nicholas</creator><creator>Ratmann, Oliver</creator><creator>Kaleebu, Pontiano</creator><creator>Hayes, Richard</creator><creator>Fidler, Sarah</creator><creator>Quinn, Thomas</creator><creator>Novitsky, Vladimir</creator><creator>Haywards, Andrew</creator><creator>Nastouli, Eleni</creator><creator>Morris, Steven</creator><creator>Clark, Duncan</creator><creator>Kozlakidis, Zisis</creator><general>Nature Publishing Group UK</general><general>Nature Publishing Group</general><scope>C6C</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88A</scope><scope>88E</scope><scope>88I</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M2P</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20161223</creationdate><title>Using nearly full-genome HIV sequence data improves phylogeny 
reconstruction in a simulated epidemic</title><author>Yebra, Gonzalo ; Hodcroft, Emma B. ; Ragonnet-Cronin, Manon L. ; Pillay, Deenan ; Brown, Andrew J. Leigh ; Fraser, Christophe ; Kellam, Paul ; de Oliveira, Tulio ; Dennis, Ann ; Hoppe, Anne ; Kityo, Cissy ; Frampton, Dan ; Ssemwanga, Deogratius ; Tanser, Frank ; Keshani, Jagoda ; Lingappa, Jairam ; Herbeck, Joshua ; Wawer, Maria ; Essex, Max ; Cohen, Myron S. ; Paton, Nicholas ; Ratmann, Oliver ; Kaleebu, Pontiano ; Hayes, Richard ; Fidler, Sarah ; Quinn, Thomas ; Novitsky, Vladimir ; Haywards, Andrew ; Nastouli, Eleni ; Morris, Steven ; Clark, Duncan ; Kozlakidis, Zisis</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c438t-1a0b168a71fa308a6d6d959a49c8137e6df3da2ef675f8b88ba60f787d622d263</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>631/114/739</topic><topic>631/181/757</topic><topic>692/699/255/1901</topic><topic>Cohort Studies</topic><topic>Coverage</topic><topic>Epidemics</topic><topic>Epidemiology</topic><topic>Genes</topic><topic>Genes, env</topic><topic>Genes, gag</topic><topic>Genes, pol</topic><topic>Genome, Viral</topic><topic>Genomes</topic><topic>HIV - genetics</topic><topic>HIV Infections - epidemiology</topic><topic>HIV Infections - virology</topic><topic>Humanities and Social Sciences</topic><topic>Humans</topic><topic>Likelihood Functions</topic><topic>Molecular Epidemiology</topic><topic>multidisciplinary</topic><topic>Nucleotide sequence</topic><topic>Phylogeny</topic><topic>Pol gene</topic><topic>Regression Analysis</topic><topic>Reproducibility of Results</topic><topic>Sampling</topic><topic>Science</topic><topic>South Africa</topic><topic>Trees</topic><topic>United Kingdom</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yebra, Gonzalo</creatorcontrib><creatorcontrib>Hodcroft, Emma 
B.</creatorcontrib><creatorcontrib>Ragonnet-Cronin, Manon L.</creatorcontrib><creatorcontrib>Pillay, Deenan</creatorcontrib><creatorcontrib>Brown, Andrew J. Leigh</creatorcontrib><creatorcontrib>Fraser, Christophe</creatorcontrib><creatorcontrib>Kellam, Paul</creatorcontrib><creatorcontrib>de Oliveira, Tulio</creatorcontrib><creatorcontrib>Dennis, Ann</creatorcontrib><creatorcontrib>Hoppe, Anne</creatorcontrib><creatorcontrib>Kityo, Cissy</creatorcontrib><creatorcontrib>Frampton, Dan</creatorcontrib><creatorcontrib>Ssemwanga, Deogratius</creatorcontrib><creatorcontrib>Tanser, Frank</creatorcontrib><creatorcontrib>Keshani, Jagoda</creatorcontrib><creatorcontrib>Lingappa, Jairam</creatorcontrib><creatorcontrib>Herbeck, Joshua</creatorcontrib><creatorcontrib>Wawer, Maria</creatorcontrib><creatorcontrib>Essex, Max</creatorcontrib><creatorcontrib>Cohen, Myron S.</creatorcontrib><creatorcontrib>Paton, Nicholas</creatorcontrib><creatorcontrib>Ratmann, Oliver</creatorcontrib><creatorcontrib>Kaleebu, Pontiano</creatorcontrib><creatorcontrib>Hayes, Richard</creatorcontrib><creatorcontrib>Fidler, Sarah</creatorcontrib><creatorcontrib>Quinn, Thomas</creatorcontrib><creatorcontrib>Novitsky, Vladimir</creatorcontrib><creatorcontrib>Haywards, Andrew</creatorcontrib><creatorcontrib>Nastouli, Eleni</creatorcontrib><creatorcontrib>Morris, Steven</creatorcontrib><creatorcontrib>Clark, Duncan</creatorcontrib><creatorcontrib>Kozlakidis, Zisis</creatorcontrib><creatorcontrib>PANGEA_HIV Consortium</creatorcontrib><creatorcontrib>ICONIC Project</creatorcontrib><collection>Springer Open Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 
2016)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Science Database</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Scientific reports</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yebra, Gonzalo</au><au>Hodcroft, Emma B.</au><au>Ragonnet-Cronin, Manon L.</au><au>Pillay, Deenan</au><au>Brown, Andrew J. Leigh</au><au>Fraser, Christophe</au><au>Kellam, Paul</au><au>de Oliveira, Tulio</au><au>Dennis, Ann</au><au>Hoppe, Anne</au><au>Kityo, Cissy</au><au>Frampton, Dan</au><au>Ssemwanga, Deogratius</au><au>Tanser, Frank</au><au>Keshani, Jagoda</au><au>Lingappa, Jairam</au><au>Herbeck, Joshua</au><au>Wawer, Maria</au><au>Essex, Max</au><au>Cohen, Myron S.</au><au>Paton, Nicholas</au><au>Ratmann, Oliver</au><au>Kaleebu, Pontiano</au><au>Hayes, Richard</au><au>Fidler, Sarah</au><au>Quinn, Thomas</au><au>Novitsky, Vladimir</au><au>Haywards, Andrew</au><au>Nastouli, Eleni</au><au>Morris, Steven</au><au>Clark, Duncan</au><au>Kozlakidis, Zisis</au><aucorp>PANGEA_HIV Consortium</aucorp><aucorp>ICONIC Project</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic</atitle><jtitle>Scientific reports</jtitle><stitle>Sci Rep</stitle><addtitle>Sci Rep</addtitle><date>2016-12-23</date><risdate>2016</risdate><volume>6</volume><issue>1</issue><spage>39489</spage><epage>39489</epage><pages>39489-39489</pages><artnum>39489</artnum><issn>2045-2322</issn><eissn>2045-2322</eissn><abstract>HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes ( gag - pol - env, gag - pol, gag, pol, env and partial pol ) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. 
We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree’s using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag - pol - env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.</abstract><cop>London</cop><pub>Nature Publishing Group UK</pub><pmid>28008945</pmid><doi>10.1038/srep39489</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2045-2322
ispartof Scientific reports, 2016-12, Vol.6 (1), p.39489-39489, Article 39489
issn 2045-2322
2045-2322
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5180198
source Springer Open Access; MEDLINE; Nature Free; DOAJ Directory of Open Access Journals; PubMed Central; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry; EZB Electronic Journals Library
subjects 631/114/739
631/181/757
692/699/255/1901
Cohort Studies
Coverage
Epidemics
Epidemiology
Genes
Genes, env
Genes, gag
Genes, pol
Genome, Viral
Genomes
HIV - genetics
HIV Infections - epidemiology
HIV Infections - virology
Humanities and Social Sciences
Humans
Likelihood Functions
Molecular Epidemiology
multidisciplinary
Nucleotide sequence
Phylogeny
Pol gene
Regression Analysis
Reproducibility of Results
Sampling
Science
South Africa
Trees
United Kingdom
title Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T20%3A34%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Using%20nearly%20full-genome%20HIV%20sequence%20data%20improves%20phylogeny%20reconstruction%20in%20a%20simulated%20epidemic&rft.jtitle=Scientific%20reports&rft.au=Yebra,%20Gonzalo&rft.aucorp=PANGEA_HIV%20Consortium&rft.date=2016-12-23&rft.volume=6&rft.issue=1&rft.spage=39489&rft.epage=39489&rft.pages=39489-39489&rft.artnum=39489&rft.issn=2045-2322&rft.eissn=2045-2322&rft_id=info:doi/10.1038/srep39489&rft_dat=%3Cproquest_pubme%3E1899447358%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1899447358&rft_id=info:pmid/28008945&rfr_iscdi=true
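
The sub-sampling design summarised in the description field (six gene combinations, four sampling depths, 100 replicates per case) can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function names, the sequence identifiers, the fixed seed, and the choice of uniform random sampling without replacement are all assumptions, since the record does not say how the replicates were drawn.

```python
import itertools
import random

# Values taken from the description field of this record.
GENE_SETS = ["gag-pol-env", "gag-pol", "gag", "pol", "env", "partial pol"]
DEPTHS = [1.00, 0.60, 0.20, 0.05]
N_REPLICATES = 100


def target_size(n_sequences, depth):
    """Number of sequences kept at a given sampling depth (rounded; an assumption)."""
    return round(n_sequences * depth)


def subsample_ids(all_ids, depth, rng):
    """Draw sequence IDs uniformly at random, without replacement (hypothetical scheme)."""
    return sorted(rng.sample(all_ids, target_size(len(all_ids), depth)))


def build_design(all_ids, seed=0):
    """Yield (gene_set, depth, replicate, ids) for every cell of the design."""
    rng = random.Random(seed)
    for gene_set, depth in itertools.product(GENE_SETS, DEPTHS):
        for replicate in range(N_REPLICATES):
            yield gene_set, depth, replicate, subsample_ids(all_ids, depth, rng)


if __name__ == "__main__":
    # Hypothetical IDs standing in for the 4662 simulated sequences.
    ids = [f"seq{i:04d}" for i in range(4662)]
    for depth in DEPTHS:
        print(depth, target_size(len(ids), depth))
```

Each yielded cell would then correspond to one alignment from which a maximum-likelihood tree is built and compared with the true tree. Tree building and comparison are deliberately left out here, because the record gives no command-line details for RAxML or CompareTree.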