Using networks to analyze and visualize the distribution of overlapping genes in virus genomes

Bibliographic details
Published in: PLoS pathogens, 2022-02, Vol. 18 (2), p. e1010331
Main authors: Muñoz-Baena, Laura; Poon, Art F Y
Format: Article
Language: English
container_end_page e1010331
container_issue 2
container_start_page e1010331
container_title PLoS pathogens
container_volume 18
creator Muñoz-Baena, Laura
Poon, Art F Y
description Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) in 12,609 virus reference genomes from the NCBI database. We retrieved the metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, OvRFs tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs is encoded in the same frame on the opposite strand (-0) tend to be longer. Next, we developed a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free, k-mer-based approach to cluster protein-coding sequences by similarity. We connected these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs provide not only a natural visualization scheme but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
doi_str_mv 10.1371/journal.ppat.1010331
format Article
contributor Koonin, Eugene V
pmid 35202429
publisher United States: Public Library of Science
rights 2022 Muñoz-Baena, Poon. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the "License"), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
lds50 peer_reviewed
orcidid https://orcid.org/0000-0002-6120-7211
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8903798/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8903798/pdf/
eissn 1553-7374
identifier ISSN: 1553-7374
ispartof PLoS pathogens, 2022-02, Vol.18 (2), p.e1010331-e1010331
issn 1553-7374
1553-7366
language eng
recordid cdi_plos_journals_2640117641
source MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central Open Access; Public Library of Science (PLoS) Journals Open Access; PubMed Central
subjects Biology and Life Sciences
Comparative analysis
Comparative studies
Computer and Information Sciences
Frames (data processing)
Genes
Genes, Overlapping - genetics
Genetic engineering
Genome, Viral - genetics
Genomes
Graphical representations
Homology
Medicine and Health Sciences
Mutation
Nucleotides
Open reading frames
Open Reading Frames - genetics
Proteins
Reading
Taxonomy
Viral genetics
Viruses
title Using networks to analyze and visualize the distribution of overlapping genes in virus genomes
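The description field summarizes two computational steps: deriving the number, length, and frameshift of overlapping ORFs from annotated ORF coordinates, and linking ORF clusters with directed adjacency and overlap edges. The Python sketch below illustrates both steps under stated assumptions. It is not the authors' implementation. Assumptions: coordinates are 0-based and half-open; the frameshift is taken as (start of downstream ORF - start of upstream ORF) mod 3 and is computed only for same-strand pairs (the paper's -0/-1/-2 antisense convention is not reproduced); the ORF-to-cluster mapping that the k-mer clustering would produce is supplied as a plain dict. The names ORF, overlap_length, sense_frameshift, find_overlaps and build_edges are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, NamedTuple, Optional


class ORF(NamedTuple):
    """One annotated ORF; coordinates assumed 0-based, half-open [start, end)."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlap_length(a: ORF, b: ORF) -> int:
    """Number of nucleotides shared by two ORFs (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def sense_frameshift(a: ORF, b: ORF) -> Optional[int]:
    """Frame offset of b relative to a, (b.start - a.start) mod 3, for same-strand pairs.

    For two '-' strand ORFs whose lengths are multiples of 3 this equals the offset
    measured in the direction of transcription. Opposite-strand (antisense) pairs
    return None: the -0/-1/-2 convention is not reproduced in this sketch.
    """
    if a.strand != b.strand:
        return None
    return (b.start - a.start) % 3


def find_overlaps(orfs: Iterable[ORF]):
    """Yield (a, b, length, shift) for every overlapping pair, with a.start <= b.start."""
    ordered = sorted(orfs, key=lambda o: o.start)
    for a, b in combinations(ordered, 2):
        n = overlap_length(a, b)
        if n > 0:
            yield a, b, n, sense_frameshift(a, b)


def build_edges(genomes: Iterable[List[ORF]], cluster_of: Dict[str, str]):
    """Count directed edges between ORF clusters across genomes.

    adjacency: cluster of an ORF -> cluster of the next ORF along the genome.
    overlap:   the same edge, counted only when those two neighboring ORFs overlap.
    """
    adjacency: Counter = Counter()
    overlap: Counter = Counter()
    for orfs in genomes:
        ordered = sorted(orfs, key=lambda o: o.start)
        for a, b in zip(ordered, ordered[1:]):
            edge = (cluster_of[a.name], cluster_of[b.name])
            adjacency[edge] += 1
            if overlap_length(a, b) > 0:
                overlap[edge] += 1
    return adjacency, overlap


if __name__ == "__main__":
    genome = [
        ORF("A", 0, 300, "+"),
        ORF("B", 290, 500, "+"),
        ORF("C", 600, 900, "-"),
        ORF("D", 880, 1000, "+"),
    ]
    for a, b, n, shift in find_overlaps(genome):
        print(a.name, b.name, n, shift)
    # A B 10 2
    # C D 20 None
```

Keying each Counter by (source cluster, target cluster) keeps edge weights, i.e. how many times each adjacency or overlap was observed across genomes, which is one simple way to feed such a graph into downstream statistics. In this sketch, overlap edges are recorded only between consecutive ORFs, so an ORF that overlaps a non-neighbor is not captured by build_edges.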