Using networks to analyze and visualize the distribution of overlapping genes in virus genomes
Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global c...
Published in: | PLoS pathogens 2022-02, Vol.18 (2), p.e1010331-e1010331 |
---|---|
Main authors: | Muñoz-Baena, Laura; Poon, Art F Y |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e1010331 |
---|---|
container_issue | 2 |
container_start_page | e1010331 |
container_title | PLoS pathogens |
container_volume | 18 |
creator | Muñoz-Baena, Laura; Poon, Art F Y |
description | Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (-0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps. |
doi_str_mv | 10.1371/journal.ppat.1010331 |
format | Article |
fullrecord | <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2640117641</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A695461211</galeid><doaj_id>oai_doaj_org_article_66d9c00919564226bc190594146b85c4</doaj_id><sourcerecordid>A695461211</sourcerecordid><originalsourceid>FETCH-LOGICAL-c661t-cedc4c7160d810f8b4ba4553cdd87dd592fc88cc6817f4364b325fc5e6c0abef3</originalsourceid><addsrcrecordid>eNqVkk1v1DAQhiMEou3CP0AQiQscdvHEjuNckKqKj5UqkIBesRzbSb1k7WA7S8uvx2HTqot6QT7YYz_v65nRZNkzQCvAFbzZuNFb0a-GQcQVIEAYw4PsGMoSLytckYd3zkfZSQgbhAhgoI-zI1wWqCBFfZx9vwjGdrnV8ZfzP0IeXS6S6_VvnXaV70wYRW9SFC91rkyI3jRjNM7mrs3dTvteDMPk0GmrQ25skvgxTKHb6vAke9SKPuin877ILt6_-3b2cXn--cP67PR8KSmFuJRaSSIroEgxQC1rSCNISl4qxSqlyrpoJWNSUgZVSzAlDS7KVpaaSiQa3eJF9mLvO_Qu8Lk1gReUIICKproX2XpPKCc2fPBmK_w1d8LwvxfOd1z4aGSvOaWqlgjVUJeUFAVtJNSorAkQ2rBSkuT1dv5tbLYpdW2jF_2B6eGLNZe8czvOaoSrmiWDV7OBdz9HHSLfmiB13wur3TjljTEjDGGU0Jf_oPdXN1OdSAUY27r0r5xM-SmtS0KhgIla3UOlpfTWSGd1a9L9geD1gSAxUV_FTowh8PXXL__BfjpkyZ6V3oXgdXvbO0B8mu6bIvk03Xye7iR7frfvt6KbccZ_AF-J9b4</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2640117641</pqid></control><display><type>article</type><title>Using networks to analyze and visualize the distribution of overlapping genes in virus genomes</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central Open Access</source><source>Public Library of Science (PLoS) Journals Open Access</source><source>PubMed Central</source><creator>Muñoz-Baena, Laura ; Poon, Art F Y</creator><contributor>Koonin, Eugene V</contributor><creatorcontrib>Muñoz-Baena, Laura ; Poon, Art F Y ; Koonin, Eugene V</creatorcontrib><description>Gene overlap occurs when two or more genes are encoded by the same nucleotides. 
This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (-0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. 
These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.</description><identifier>ISSN: 1553-7374</identifier><identifier>ISSN: 1553-7366</identifier><identifier>EISSN: 1553-7374</identifier><identifier>DOI: 10.1371/journal.ppat.1010331</identifier><identifier>PMID: 35202429</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Biology and Life Sciences ; Comparative analysis ; Comparative studies ; Computer and Information Sciences ; Frames (data processing) ; Genes ; Genes, Overlapping - genetics ; Genetic engineering ; Genome, Viral - genetics ; Genomes ; Graphical representations ; Homology ; Medicine and Health Sciences ; Mutation ; Nucleotides ; Open reading frames ; Open Reading Frames - genetics ; Proteins ; Reading ; Taxonomy ; Viral genetics ; Viruses</subject><ispartof>PLoS pathogens, 2022-02, Vol.18 (2), p.e1010331-e1010331</ispartof><rights>COPYRIGHT 2022 Public Library of Science</rights><rights>2022 Muñoz-Baena, Poon. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2022 Muñoz-Baena, Poon 2022 Muñoz-Baena, Poon</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c661t-cedc4c7160d810f8b4ba4553cdd87dd592fc88cc6817f4364b325fc5e6c0abef3</citedby><cites>FETCH-LOGICAL-c661t-cedc4c7160d810f8b4ba4553cdd87dd592fc88cc6817f4364b325fc5e6c0abef3</cites><orcidid>0000-0002-6120-7211</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8903798/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8903798/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2102,2928,23866,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35202429$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Koonin, Eugene V</contributor><creatorcontrib>Muñoz-Baena, Laura</creatorcontrib><creatorcontrib>Poon, Art F Y</creatorcontrib><title>Using networks to analyze and visualize the distribution of overlapping genes in virus genomes</title><title>PLoS pathogens</title><addtitle>PLoS Pathog</addtitle><description>Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. 
We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (-0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.</description><subject>Biology and Life Sciences</subject><subject>Comparative analysis</subject><subject>Comparative studies</subject><subject>Computer and Information Sciences</subject><subject>Frames (data processing)</subject><subject>Genes</subject><subject>Genes, Overlapping - genetics</subject><subject>Genetic engineering</subject><subject>Genome, Viral - genetics</subject><subject>Genomes</subject><subject>Graphical representations</subject><subject>Homology</subject><subject>Medicine and Health Sciences</subject><subject>Mutation</subject><subject>Nucleotides</subject><subject>Open reading frames</subject><subject>Open Reading Frames - genetics</subject><subject>Proteins</subject><subject>Reading</subject><subject>Taxonomy</subject><subject>Viral 
genetics</subject><subject>Viruses</subject><issn>1553-7374</issn><issn>1553-7366</issn><issn>1553-7374</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqVkk1v1DAQhiMEou3CP0AQiQscdvHEjuNckKqKj5UqkIBesRzbSb1k7WA7S8uvx2HTqot6QT7YYz_v65nRZNkzQCvAFbzZuNFb0a-GQcQVIEAYw4PsGMoSLytckYd3zkfZSQgbhAhgoI-zI1wWqCBFfZx9vwjGdrnV8ZfzP0IeXS6S6_VvnXaV70wYRW9SFC91rkyI3jRjNM7mrs3dTvteDMPk0GmrQ25skvgxTKHb6vAke9SKPuin877ILt6_-3b2cXn--cP67PR8KSmFuJRaSSIroEgxQC1rSCNISl4qxSqlyrpoJWNSUgZVSzAlDS7KVpaaSiQa3eJF9mLvO_Qu8Lk1gReUIICKproX2XpPKCc2fPBmK_w1d8LwvxfOd1z4aGSvOaWqlgjVUJeUFAVtJNSorAkQ2rBSkuT1dv5tbLYpdW2jF_2B6eGLNZe8czvOaoSrmiWDV7OBdz9HHSLfmiB13wur3TjljTEjDGGU0Jf_oPdXN1OdSAUY27r0r5xM-SmtS0KhgIla3UOlpfTWSGd1a9L9geD1gSAxUV_FTowh8PXXL__BfjpkyZ6V3oXgdXvbO0B8mu6bIvk03Xye7iR7frfvt6KbccZ_AF-J9b4</recordid><startdate>20220201</startdate><enddate>20220201</enddate><creator>Muñoz-Baena, Laura</creator><creator>Poon, Art F Y</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>3V.</scope><scope>7QL</scope><scope>7U9</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>COVID</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-6120-7211</orcidid></search><sort><creationdate>20220201</creationdate><title>Using networks to analyze and visualize the distribution of overlapping genes in virus genomes</title><author>Muñoz-Baena, Laura ; Poon, Art F Y</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c661t-cedc4c7160d810f8b4ba4553cdd87dd592fc88cc6817f4364b325fc5e6c0abef3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Biology and Life Sciences</topic><topic>Comparative analysis</topic><topic>Comparative studies</topic><topic>Computer and Information Sciences</topic><topic>Frames (data processing)</topic><topic>Genes</topic><topic>Genes, Overlapping - genetics</topic><topic>Genetic engineering</topic><topic>Genome, Viral - genetics</topic><topic>Genomes</topic><topic>Graphical representations</topic><topic>Homology</topic><topic>Medicine and Health 
Sciences</topic><topic>Mutation</topic><topic>Nucleotides</topic><topic>Open reading frames</topic><topic>Open Reading Frames - genetics</topic><topic>Proteins</topic><topic>Reading</topic><topic>Taxonomy</topic><topic>Viral genetics</topic><topic>Viruses</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Muñoz-Baena, Laura</creatorcontrib><creatorcontrib>Poon, Art F Y</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Virology and AIDS Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>Coronavirus Research Database</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium 
Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS pathogens</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Muñoz-Baena, Laura</au><au>Poon, Art F Y</au><au>Koonin, Eugene V</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Using networks to analyze and visualize the distribution of overlapping genes in virus genomes</atitle><jtitle>PLoS pathogens</jtitle><addtitle>PLoS Pathog</addtitle><date>2022-02-01</date><risdate>2022</risdate><volume>18</volume><issue>2</issue><spage>e1010331</spage><epage>e1010331</epage><pages>e1010331-e1010331</pages><issn>1553-7374</issn><issn>1553-7366</issn><eissn>1553-7374</eissn><abstract>Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. 
Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (-0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>35202429</pmid><doi>10.1371/journal.ppat.1010331</doi><tpages>e1010331</tpages><orcidid>https://orcid.org/0000-0002-6120-7211</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1553-7374 |
ispartof | PLoS pathogens, 2022-02, Vol.18 (2), p.e1010331-e1010331 |
issn | 1553-7374; 1553-7366; 1553-7374 |
language | eng |
recordid | cdi_plos_journals_2640117641 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central Open Access; Public Library of Science (PLoS) Journals Open Access; PubMed Central |
subjects | Biology and Life Sciences; Comparative analysis; Comparative studies; Computer and Information Sciences; Frames (data processing); Genes; Genes, Overlapping - genetics; Genetic engineering; Genome, Viral - genetics; Genomes; Graphical representations; Homology; Medicine and Health Sciences; Mutation; Nucleotides; Open reading frames; Open Reading Frames - genetics; Proteins; Reading; Taxonomy; Viral genetics; Viruses |
title | Using networks to analyze and visualize the distribution of overlapping genes in virus genomes |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T04%3A26%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Using%20networks%20to%20analyze%20and%20visualize%20the%20distribution%20of%20overlapping%20genes%20in%20virus%20genomes&rft.jtitle=PLoS%20pathogens&rft.au=Mu%C3%B1oz-Baena,%20Laura&rft.date=2022-02-01&rft.volume=18&rft.issue=2&rft.spage=e1010331&rft.epage=e1010331&rft.pages=e1010331-e1010331&rft.issn=1553-7374&rft.eissn=1553-7374&rft_id=info:doi/10.1371/journal.ppat.1010331&rft_dat=%3Cgale_plos_%3EA695461211%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2640117641&rft_id=info:pmid/35202429&rft_galeid=A695461211&rft_doaj_id=oai_doaj_org_article_66d9c00919564226bc190594146b85c4&rfr_iscdi=true |
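
The description field says the study calculated the number, length, and frameshift of overlapping ORFs from annotated coordinates. The following is a minimal sketch of how such quantities might be derived for one pair of ORFs; it is not the authors' code. The `ORF` type, the function names, the 1-based inclusive coordinate convention, and the way the frame offset is defined are all assumptions made for illustration, and the record does not specify how the paper labels antisense overlaps such as "-0", so opposite-strand pairs are left unclassified here.

```python
from typing import NamedTuple, Optional


class ORF(NamedTuple):
    """Hypothetical ORF record; coordinates are 1-based and inclusive."""
    start: int   # leftmost genome coordinate
    end: int     # rightmost genome coordinate
    strand: int  # +1 (forward) or -1 (reverse)


def overlap_length(a: ORF, b: ORF) -> int:
    """Number of nucleotides shared by two ORFs (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def same_strand_shift(a: ORF, b: ORF) -> Optional[int]:
    """Reading-frame offset (0, 1 or 2) of b relative to a.

    Assumed definition: distance between the two translation start
    positions, modulo 3, measured in the direction of translation.
    Returns None for opposite-strand pairs, which this sketch does
    not attempt to classify.
    """
    if a.strand != b.strand:
        return None
    if a.strand == 1:
        return (b.start - a.start) % 3
    # On the reverse strand translation begins at the rightmost coordinate.
    return (a.end - b.end) % 3


# Made-up coordinates, not taken from any genome record.
orf_a = ORF(start=1, end=300, strand=1)
orf_b = ORF(start=290, end=500, strand=1)
shared = overlap_length(orf_a, orf_b)
shift = same_strand_shift(orf_a, orf_b)
print(f"overlap: {shared} nt, frame offset: +{shift}")
```

Applying the two functions to every pair of annotated ORFs in a genome would give per-genome counts and length distributions of the kind the description summarizes, under the assumptions stated above.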