Information theoretic perspective on genome clustering

Shannon’s information theoretic perspective of communication helps one to understand the storage and processing of information in one-dimensional sequences. An information theoretic analysis of 937 available completely sequenced prokaryotic genomes and 238 eukaryotic chromosomes is presented. Inform...

Bibliographic details
Published in: Saudi journal of biological sciences 2021-03, Vol.28 (3), p.1867-1889
Main authors: Veluchamy, Alaguraj; Mehta, Preeti; Srividhya, K.V.; Vikram, Hirendra; Govind, M.K.; Gupta, Ramneek; Aziz Bin Dukhyil, Abdul; Abdullah Alharbi, Raed; Abdullah Aloyuni, Saleh; Hassan, Mohamed M.; Krishnaswamy, S.
Format: Article
Language: eng
Online access: Full text
container_end_page 1889
container_issue 3
container_start_page 1867
container_title Saudi journal of biological sciences
container_volume 28
creator Veluchamy, Alaguraj
Mehta, Preeti
Srividhya, K.V.
Vikram, Hirendra
Govind, M.K.
Gupta, Ramneek
Aziz Bin Dukhyil, Abdul
Abdullah Alharbi, Raed
Abdullah Aloyuni, Saleh
Hassan, Mohamed M.
Krishnaswamy, S.
description Shannon’s information theoretic perspective of communication helps one to understand the storage and processing of information in one-dimensional sequences. An information theoretic analysis of 937 completely sequenced prokaryotic genomes and 238 eukaryotic chromosomes is presented. Information content (Id) values were used to cluster these chromosomes. Chargaff’s second parity rule, i.e., compositional self-complementarity, an empirical fact, is observed in all the genomes except that of the proteobacterium Candidatus Hodgkinia cicadicola. High information content, arising out of biased base composition, is found in all 14 chromosomes of Plasmodium falciparum and in two prokaryotic genomes, viz. Buchnera aphidicola str. Cc (Cinara cedri) and Candidatus Carsonella ruddii PV. Despite size and compositional variations, neither prokaryotic nor eukaryotic genomes deviate significantly from an equiprobable and random situation. Eukaryotic chromosomes of an organism tend to have similar informational restraints, as seen when a simple distance-based method is used to cluster them. In eukaryotes, in certain cases, Id values are also similar for the two arms (p and q) of a chromosome. The results of this study confirm that information content can provide insights into the clustering of genomes and the evolution of their messaging strategies. An efficient and robust standalone Perl CGI tool based on this information theory algorithm was created for the analysis of whole genomes and is available at https://github.com/AlagurajVeluchamy/InformationTheory.
doi_str_mv 10.1016/j.sjbs.2020.12.039
format Article
cop AMSTERDAM
lds50 peer_reviewed
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7938122/pdf/
oa free_for_read
orcidid https://orcid.org/0000-0001-5476-6114
https://orcid.org/0000-0002-5349-5794
pmid 33732074
pub Elsevier B.V
rights 2020 The Author(s)
tpages 23
fulltext fulltext
identifier ISSN: 1319-562X
ispartof Saudi journal of biological sciences, 2021-03, Vol.28 (3), p.1867-1889
issn 1319-562X
2213-7106
language eng
recordid cdi_webofscience_primary_000631757500014CitationCount
source Web of Science - Science Citation Index Expanded - 2021; Access via ScienceDirect (Elsevier); EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Biology
Genome arrangement
Genome clustering
Genome evolution
Information theory
Life Sciences & Biomedicine
Life Sciences & Biomedicine - Other Topics
Nucleotide distribution
Original
Science & Technology
Shannon redundancy
title Information theoretic perspective on genome clustering
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T11%3A40%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Information%20theoretic%20perspective%20on%20genome%20clustering&rft.jtitle=Saudi%20journal%20of%20biological%20sciences&rft.au=Veluchamy,%20Alaguraj&rft.date=2021-03-01&rft.volume=28&rft.issue=3&rft.spage=1867&rft.epage=1889&rft.pages=1867-1889&rft.issn=1319-562X&rft.eissn=2213-7106&rft_id=info:doi/10.1016/j.sjbs.2020.12.039&rft_dat=%3Cproquest_webof%3E2502803621%3C/proquest_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2502803621&rft_id=info:pmid/33732074&rft_els_id=S1319562X20307038&rfr_iscdi=true
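
The description above refers to an information content value (Id) per chromosome and to Chargaff’s second parity rule, but this record does not give the formulas. The following is a minimal Python sketch, not the authors' Perl tool: it assumes the simplest single-base measure, the divergence from equiprobability log2(4) − H, where H is the Shannon entropy of the base frequencies, and it checks the parity rule as the gap between A and T (and G and C) frequencies on one strand. The function names are hypothetical, and the published Id may include further terms.

```python
import math
from collections import Counter


def base_frequencies(seq):
    """Relative frequencies of A, C, G, T; other symbols (e.g. N) are ignored."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: counts[b] / total for b in "ACGT"}


def shannon_entropy(freqs):
    """Shannon entropy in bits per base of a frequency table."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def divergence_from_equiprobability(seq):
    """Assumed stand-in for information content: log2(4) minus the entropy.

    It is 0 for a perfectly balanced composition and grows toward 2 bits
    as the base composition becomes more biased.
    """
    return math.log2(4) - shannon_entropy(base_frequencies(seq))


def parity_gaps(seq):
    """Single-strand |f(A) - f(T)| and |f(G) - f(C)|.

    Both gaps are near 0 when Chargaff's second parity rule holds.
    """
    f = base_frequencies(seq)
    return abs(f["A"] - f["T"]), abs(f["G"] - f["C"])
```

A strongly AT-biased sequence yields a larger divergence than a balanced one, which is the sense in which the description links high information content to biased base composition.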