VIRify: An integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models

Bibliographic details
Published in: PLoS computational biology 2023-08, Vol. 19 (8), p. e1011422
Main authors: Rangel-Pineros, Guillermo; Almeida, Alexandre; Beracochea, Martin; Sakharova, Ekaterina; Marz, Manja; Reyes Muñoz, Alejandro; Hölzer, Martin; Finn, Robert D
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page e1011422
container_issue 8
container_start_page e1011422
container_title PLoS computational biology
container_volume 19
creator Rangel-Pineros, Guillermo
Almeida, Alexandre
Beracochea, Martin
Sakharova, Ekaterina
Marz, Manja
Reyes Muñoz, Alejandro
Hölzer, Martin
Finn, Robert D
description The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterisation of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually-curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses, and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.
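The description states that VIRify uses manually curated profile HMMs as taxonomic markers to classify viral contigs at ranks from genus to family. The sketch below is a hypothetical illustration of how per-protein marker hits could be aggregated into one contig-level call; it is not VIRify's published algorithm. The function name `classify_contig`, the input shape (protein ID mapped to the lineage of its best marker hit), the rank order and the 0.2 support threshold are all assumptions chosen for illustration, and the taxon names are placeholders.

```python
"""Hypothetical sketch: consensus taxonomic call for one viral contig from
per-protein marker-HMM hits. Not VIRify's actual implementation."""
from collections import Counter
from typing import Mapping, Optional, Tuple

# Assumed rank order, most specific first; the record only says that
# classifications range from genus to family.
RANKS: Tuple[str, ...] = ("genus", "subfamily", "family")


def classify_contig(
    protein_hits: Mapping[str, Mapping[str, str]],
    n_proteins: int,
    min_fraction: float = 0.2,  # assumed threshold, not from the paper
) -> Optional[Tuple[str, str]]:
    """Return (rank, taxon) for the most specific rank whose most common taxon
    is supported by at least `min_fraction` of ALL proteins on the contig.

    `protein_hits` maps a protein ID to the lineage of its best marker hit,
    e.g. {"family": "FamilyX", "genus": "GenusA"}. Proteins without an
    informative hit are simply absent. Returns None if no rank qualifies.
    """
    if n_proteins <= 0:
        raise ValueError("n_proteins must be positive")
    for rank in RANKS:
        # Count one vote per protein whose hit is informative at this rank.
        votes = Counter(
            lineage[rank] for lineage in protein_hits.values() if lineage.get(rank)
        )
        if not votes:
            continue
        # On ties, most_common keeps the taxon that was counted first.
        taxon, count = votes.most_common(1)[0]
        if count / n_proteins >= min_fraction:
            return rank, taxon
    return None


# Placeholder example: three annotated proteins on a hypothetical contig.
hits = {
    "contig1_1": {"family": "FamilyX", "genus": "GenusA"},
    "contig1_2": {"family": "FamilyX", "genus": "GenusA"},
    "contig1_3": {"family": "FamilyX"},
}
print(classify_contig(hits, n_proteins=12))  # ('family', 'FamilyX')
print(classify_contig(hits, n_proteins=8))   # ('genus', 'GenusA')
```

With 12 proteins on the contig, the genus vote (2 of 12) falls below the threshold, so the call falls back to family (3 of 12); with 8 proteins the genus vote (2 of 8) is sufficient. Normalising by all proteins rather than only annotated ones is a deliberately conservative choice in this sketch: a contig with few marker hits is classified at a coarser rank or not at all.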
doi_str_mv 10.1371/journal.pcbi.1011422
format Article
contributor Ouzounis, Christos A.
publisher United States: Public Library of Science
identifier PMID: 37639475
rights Copyright: © 2023 Rangel-Pineros et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
orcidid 0000-0003-2907-3265
0000-0003-3472-3736
0000-0001-8626-2148
0000-0003-3848-4330
0000-0001-7090-8717
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491390/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491390/pdf/
backlink https://www.ncbi.nlm.nih.gov/pubmed/37639475
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2023-08, Vol.19 (8), p.e1011422-e1011422
issn 1553-7358
1553-734X
language eng
recordid cdi_plos_journals_2865519722
source MEDLINE; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central; Public Library of Science (PLoS)
subjects Annotations
Assemblies
Biodiversity
Biology and Life Sciences
Classification
Computer and Information Sciences
Computer applications
Datasets
Eukaryota
Eukaryotic Cells
Genome, Viral - genetics
Genomes
Hidden Markov models
Host-virus relationships
Humans
Identification and classification
Markov chains
Metagenome - genetics
Metagenomics
Microbiota
Microorganisms
Oceans
Physical sciences
Pipeline design
Prophages
Proteins
Sequences
Taxonomy
Viral genetics
Viruses
title VIRify: An integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T16%3A07%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=VIRify:%20An%20integrated%20detection,%20annotation%20and%20taxonomic%20classification%20pipeline%20using%20virus-specific%20protein%20profile%20hidden%20Markov%20models&rft.jtitle=PLoS%20computational%20biology&rft.au=Rangel-Pineros,%20Guillermo&rft.date=2023-08-01&rft.volume=19&rft.issue=8&rft.spage=e1011422&rft.epage=e1011422&rft.pages=e1011422-e1011422&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1011422&rft_dat=%3Cgale_plos_%3EA763760400%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2865519722&rft_id=info:pmid/37639475&rft_galeid=A763760400&rft_doaj_id=oai_doaj_org_article_e1d8683e840c49e5a3bfa6198610a74e&rfr_iscdi=true