Virus classification in 60-dimensional protein space

• Our method uses protein sequences to classify viruses efficiently and quickly.
• We compare classification accuracy rates using proteomes with those using genomes of viruses.
• Our approach uses the natural graphical representation to reliably infer viral phylogeny.
Due to vast sequen...

Bibliographic details
Published in: Molecular phylogenetics and evolution 2016-06, Vol.99, p.53-62
Main authors: Li, Yongkun, Tian, Kun, Yin, Changchuan, He, Rong Lucy, Yau, Stephen S.-T.
Format: Article
Language: eng
Online access: Full text
container_end_page 62
container_issue
container_start_page 53
container_title Molecular phylogenetics and evolution
container_volume 99
creator Li, Yongkun
Tian, Kun
Yin, Changchuan
He, Rong Lucy
Yau, Stephen S.-T.
description • Our method uses protein sequences to classify viruses efficiently and quickly.
• We compare classification accuracy rates using proteomes with those using genomes of viruses.
• Our approach uses the natural graphical representation to reliably infer viral phylogeny.
Due to vast sequence divergence among different viral groups, sequence alignment is not directly applicable to genome-wide comparative analysis of viruses. Alignment-free methods for whole-genome comparison and phylogenetic tree reconstruction have therefore attracted growing attention. Among alignment-free methods, the recently proposed “Natural Vector (NV) representation” has been used successfully to study the phylogeny of multi-segmented viruses based on a 12-dimensional genome space derived from the nucleotide sequence structure. However, whether proteomes are preferable to genomes for determining viral phylogeny has not been investigated in depth. As the translated products of genes, proteins directly form the shape of viral structure and are vital for all metabolic pathways. In this study, using the NV representation of protein sequences together with the Hausdorff distance, which is suited to comparing point sets, we construct a 60-dimensional protein space to analyze the evolutionary relationships of 4021 viruses from their whole proteomes in the current NCBI Reference Sequence Database (RefSeq). We also take advantage of the previously developed natural graphical representation to recover viral phylogeny. Our results demonstrate that the proposed method is efficient and accurate for classifying viruses. For Baltimore II viruses, for example, the accuracy rates of our predictions are as high as 95.9% for family labels, 95.7% for subfamily labels and 96.5% for genus labels. Finally, we find that proteomes lead to better viral classification when reliable protein sequences are abundant. In other cases, the accuracy rates using proteomes are still comparable to those obtained using genomes.
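The description above names two ingredients: a natural vector per protein sequence and a Hausdorff distance between the point sets that represent two proteomes. The record does not give the formulas, so the following is only a minimal sketch under stated assumptions: it uses the commonly cited NV definition (per-letter count, mean position, and second central moment scaled by count times sequence length), 1-based positions, and Euclidean distance between vectors. The names `AMINO_ACIDS`, `natural_vector`, and `hausdorff` are illustrative and do not come from the paper.

```python
from collections import defaultdict
from math import dist

# Assumption: the 20 standard amino acids, giving 20 x 3 = 60 dimensions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def natural_vector(seq):
    """Return a 60-dimensional vector: counts, mean positions, scaled moments.

    Assumed definition (not quoted from the record): for each letter k with
    1-based positions p_1..p_nk in a sequence of length n,
        n_k  = number of occurrences,
        mu_k = mean of the positions,
        D2_k = sum((p_i - mu_k)^2) / (n_k * n).
    Letters that do not occur contribute zeros; non-standard letters are
    ignored in the vector but still count toward the length n.
    """
    n = len(seq)
    positions = defaultdict(list)
    for i, aa in enumerate(seq, start=1):
        positions[aa].append(i)

    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        pos = positions.get(aa, [])
        n_k = len(pos)
        if n_k == 0:
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(pos) / n_k
        counts.append(float(n_k))
        means.append(mu)
        moments.append(sum((p - mu) ** 2 for p in pos) / (n_k * n))
    return counts + means + moments


def hausdorff(set_a, set_b):
    """Symmetric Hausdorff distance between two non-empty sets of vectors."""
    def directed(xs, ys):
        # Largest distance from a point in xs to its nearest point in ys.
        return max(min(dist(x, y) for y in ys) for x in xs)

    return max(directed(set_a, set_b), directed(set_b, set_a))


# Toy usage: each "proteome" is a list of made-up protein sequences.
proteome_1 = [natural_vector(s) for s in ("MKVLA", "ACDEA")]
proteome_2 = [natural_vector(s) for s in ("MKVLA", "ACDEG")]
distance = hausdorff(proteome_1, proteome_2)
```

In this sketch a virus is a set of 60-dimensional points, one per protein, so two viruses with different numbers of proteins can still be compared. How the resulting distances are turned into family, subfamily, or genus predictions is not specified in the record and is left out here.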
doi_str_mv 10.1016/j.ympev.2016.03.009
format Article
fulltext fulltext
identifier ISSN: 1055-7903
ispartof Molecular phylogenetics and evolution, 2016-06, Vol.99, p.53-62
issn 1055-7903
1095-9513
language eng
recordid cdi_proquest_miscellaneous_1808672918
source MEDLINE; Access via ScienceDirect (Elsevier)
subjects Amino Acid Sequence
Databases, Protein
Genome, Viral
Hausdorff distance
Natural graphical representation
Natural vector
Phylogeny
Proteome - chemistry
Proteome - genetics
Viral Proteins - chemistry
Virus classification
Viruses - classification
Viruses - genetics
title Virus classification in 60-dimensional protein space
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T19%3A07%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Virus%20classification%20in%2060-dimensional%20protein%20space&rft.jtitle=Molecular%20phylogenetics%20and%20evolution&rft.au=Li,%20Yongkun&rft.date=2016-06&rft.volume=99&rft.spage=53&rft.epage=62&rft.pages=53-62&rft.issn=1055-7903&rft.eissn=1095-9513&rft_id=info:doi/10.1016/j.ympev.2016.03.009&rft_dat=%3Cproquest_cross%3E1787479358%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1787479358&rft_id=info:pmid/26988414&rft_els_id=S105579031630001X&rfr_iscdi=true