Genome sequence comparison under a new form of tri-nucleotide representation based on bio-chemical properties of nucleotides

• A new tri-nucleotide representation is proposed for genome sequence comparison.
• The representation is non-degenerate and is based on bio-chemical properties of the nucleotides.
• A simple Euclidean distance measure is applied for sequence comparison.
• The method is not dependent on the alignment of the seque...
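The highlights above describe an alignment-free comparison: each sequence is mapped to a vector built from its tri-nucleotides, and vectors are compared by Euclidean distance. The following is a minimal sketch of that general idea only. It assumes plain relative frequencies of the 64 overlapping tri-nucleotides as the vector, which is a common stand-in and not the bio-chemical-property encoding proposed in the paper (this record does not give that encoding). All names (`trinucleotide_vector`, `euclidean_distance`, `distance_matrix`) are hypothetical.

```python
from itertools import product
from math import sqrt

# The 64 possible tri-nucleotides over A, C, G, T, in a fixed order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRINUCLEOTIDES)}


def trinucleotide_vector(seq):
    """Map a DNA sequence of any length to a 64-component vector.

    ASSUMPTION: components are relative frequencies of overlapping
    tri-nucleotides, not the paper's bio-chemical representation.
    """
    seq = seq.upper()
    counts = [0] * len(TRINUCLEOTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip windows containing ambiguous bases (e.g. N)
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [c / total for c in counts]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(sequences):
    """Pairwise distance matrix for a list of sequences; no alignment needed."""
    vectors = [trinucleotide_vector(s) for s in sequences]
    return [[euclidean_distance(u, v) for v in vectors] for u in vectors]
```

Because every sequence maps to 64 components regardless of its length, sequences of unequal lengths can be compared directly. A matrix returned by `distance_matrix` could then be passed to any standard tree-building routine to obtain a phylogeny.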

Detailed description

Bibliographic details
Published in: Gene 2020-03, Vol. 730, p. 144257-144257, Article 144257
Main authors: Das, Subhram, Das, Arijit, Mondal, Bingshati, Dey, Nilanjan, Bhattacharya, D.K., Tibarewala, D.N.
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 144257
container_issue
container_start_page 144257
container_title Gene
container_volume 730
creator Das, Subhram
Das, Arijit
Mondal, Bingshati
Dey, Nilanjan
Bhattacharya, D.K.
Tibarewala, D.N.
description • A new tri-nucleotide representation is proposed for genome sequence comparison. • The representation is non-degenerate and is based on bio-chemical properties of the nucleotides. • A simple Euclidean distance measure is applied for sequence comparison. • The method is not dependent on the alignment of the sequences. • Results of the proposed method are verified for all possible genome sequences. Genetic sequence analysis, classification of genome sequences, and inference of evolutionary relationships between species from their biological sequences are emerging research domains in bioinformatics. Several methods have already been applied to DNA sequence comparison under tri-nucleotide representation. In this paper, a new form of tri-nucleotide representation is proposed for sequence comparison. The comparison does not depend on the alignment of the sequences. The representation takes the bio-chemical properties of the nucleotides into account. The novelty of the method is that sequences of unequal lengths are represented by vectors of the same length, and each tri-nucleotide formed from a given sequence has a unique representation. The proposed method is validated on several data sets of mammals, viruses and bacteria. Its results are further compared with those obtained by the probabilistic method, the natural vector method, the Fourier power spectrum method, the multiple encoding vector method, and the feature frequency profiles method. The method produces accurate phylogeny in all cases. Its time complexity is also shown to be lower than that of the methods compared.
doi_str_mv 10.1016/j.gene.2019.144257
format Article
fullrecord <record><control><sourceid>proquest_webof</sourceid><recordid>TN_cdi_proquest_miscellaneous_2317963879</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0378111919309163</els_id><sourcerecordid>2317963879</sourcerecordid><originalsourceid>FETCH-LOGICAL-c356t-a6b7055c729790ef21785e1d265c9156d8aef9fa1587092c77ef3f7443d7c7063</originalsourceid><addsrcrecordid>eNqNkU2LFDEQhoMo7rj6BzxIjoL0mI9JpwNeZNBVWPCi55BJKpqhO2mTtIvgjzdNj-tNzCV1eN6i6imEnlOyp4T2r8_7rxBhzwhVe3o4MCEfoB0dpOoI4cNDtCNcDh2lVF2hJ6WcSXtCsMfoilMplBr4Dv26gZgmwAW-LxAtYJum2eRQUsRLdJCxwRHusE95wsnjmkMXFztCqsEBzjBnKBCrqaElTqaAw2sRUme_wRSsGfGc0wy5Bihrh7_p8hQ98mYs8OzyX6Mv7999Pn7obj_dfDy-ve0sF33tTH-SbXArmZKKgGdUDgKoY72wioreDQa88oaKQRLFrJTguZeHA3fSStLza_Ry69smaWuWqqdQLIyjiZCWolnzoXrezDWUbajNqZQMXs85TCb_1JTo1bo-69W6Xq3rzXoLvbj0X04TuPvIH80NeLUBd3BKvtiwqr7H1rNQIjiTraKs0cP_08ewuT-mJdYWfbNFoen8ESDrS9yFDLZql8K_FvkNocq1kA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2317963879</pqid></control><display><type>article</type><title>Genome sequence comparison under a new form of tri-nucleotide representation based on bio-chemical properties of nucleotides</title><source>MEDLINE</source><source>Web of Science - Science Citation Index Expanded - 2020&lt;img src="https://exlibris-pub.s3.amazonaws.com/fromwos-v2.jpg" /&gt;</source><source>Access via ScienceDirect (Elsevier)</source><creator>Das, Subhram ; Das, Arijit ; Mondal, Bingshati ; Dey, Nilanjan ; Bhattacharya, D.K. ; Tibarewala, D.N.</creator><creatorcontrib>Das, Subhram ; Das, Arijit ; Mondal, Bingshati ; Dey, Nilanjan ; Bhattacharya, D.K. 
; Tibarewala, D.N.</creatorcontrib><description>•A new tri-nucleotide representation is proposed for genome sequence comparison.•Representation is non-degenerate and it is based on bio-chemical properties of the nucleotides.•Simple Euclidian distance measure is applied for sequence comparison.•Method is not dependent on the alignment of the sequences.•Results of proposed method are verified for all possible genome sequences. Genetic sequence analysis, classification of genome sequence and evolutionary relationship between species using their biological sequences, are the emerging research domain in Bioinformatics. Several methods have already been applied to DNA sequence comparison under tri-nucleotide representation. In this paper, a new form of tri-nucleotide representation is proposed for sequence comparison. The comparison does not depend on the alignment of the sequences. In this representation, the bio-chemical properties of the nucleotides are considered. The novelty of this method is that the sequences of unequal lengths are represented by vectors of the same length and each of the tri-nucleotide formed out of the given sequence has its unique representation. To validate the proposed method, it is verified on several data sets related to mammalians, viruses and bacteria. The results of this method are further compared with those obtained by methods such as probabilistic method, natural vector method, Fourier power spectrum method, multiple encoding vector method, and feature frequency profiles method. Moreover, this method produces accurate phylogeny in all the cases. 
It is also proved that the time complexity of the present method is less.</description><identifier>ISSN: 0378-1119</identifier><identifier>EISSN: 1879-0038</identifier><identifier>DOI: 10.1016/j.gene.2019.144257</identifier><identifier>PMID: 31759983</identifier><language>eng</language><publisher>AMSTERDAM: Elsevier B.V</publisher><subject>Algorithms ; Alignment-free method ; Animals ; Bacteria - genetics ; Base Sequence ; Chromosome Mapping - methods ; Cluster Analysis ; Computational Biology - methods ; Evolutionary relationship ; Genetics &amp; Heredity ; Genomics - methods ; Humans ; Life Sciences &amp; Biomedicine ; Mammals - genetics ; Nucleotides - chemistry ; Nucleotides - genetics ; Phylogeny ; Science &amp; Technology ; Sequence Alignment ; Sequence Analysis, DNA - methods ; Sequence comparison ; Tri-nucleotide ; Trinucleotide Repeats - genetics ; Viruses - genetics</subject><ispartof>Gene, 2020-03, Vol.730, p.144257-144257, Article 144257</ispartof><rights>2019 Elsevier B.V.</rights><rights>Copyright © 2019 Elsevier B.V. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>true</woscitedreferencessubscribed><woscitedreferencescount>7</woscitedreferencescount><woscitedreferencesoriginalsourcerecordid>wos000510532700012</woscitedreferencesoriginalsourcerecordid><citedby>FETCH-LOGICAL-c356t-a6b7055c729790ef21785e1d265c9156d8aef9fa1587092c77ef3f7443d7c7063</citedby><cites>FETCH-LOGICAL-c356t-a6b7055c729790ef21785e1d265c9156d8aef9fa1587092c77ef3f7443d7c7063</cites><orcidid>0000-0003-2899-4433</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.gene.2019.144257$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>315,781,785,3551,27929,27930,28253,46000</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31759983$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Das, Subhram</creatorcontrib><creatorcontrib>Das, Arijit</creatorcontrib><creatorcontrib>Mondal, Bingshati</creatorcontrib><creatorcontrib>Dey, Nilanjan</creatorcontrib><creatorcontrib>Bhattacharya, D.K.</creatorcontrib><creatorcontrib>Tibarewala, D.N.</creatorcontrib><title>Genome sequence comparison under a new form of tri-nucleotide representation based on bio-chemical properties of nucleotides</title><title>Gene</title><addtitle>GENE</addtitle><addtitle>Gene</addtitle><description>•A new tri-nucleotide representation is proposed for genome sequence comparison.•Representation is non-degenerate and it is based on bio-chemical properties of the nucleotides.•Simple Euclidian distance measure is applied for sequence comparison.•Method is not dependent on the alignment of the sequences.•Results of proposed method are verified for all possible genome sequences. 
Genetic sequence analysis, classification of genome sequence and evolutionary relationship between species using their biological sequences, are the emerging research domain in Bioinformatics. Several methods have already been applied to DNA sequence comparison under tri-nucleotide representation. In this paper, a new form of tri-nucleotide representation is proposed for sequence comparison. The comparison does not depend on the alignment of the sequences. In this representation, the bio-chemical properties of the nucleotides are considered. The novelty of this method is that the sequences of unequal lengths are represented by vectors of the same length and each of the tri-nucleotide formed out of the given sequence has its unique representation. To validate the proposed method, it is verified on several data sets related to mammalians, viruses and bacteria. The results of this method are further compared with those obtained by methods such as probabilistic method, natural vector method, Fourier power spectrum method, multiple encoding vector method, and feature frequency profiles method. Moreover, this method produces accurate phylogeny in all the cases. 
It is also proved that the time complexity of the present method is less.</description><subject>Algorithms</subject><subject>Alignment-free method</subject><subject>Animals</subject><subject>Bacteria - genetics</subject><subject>Base Sequence</subject><subject>Chromosome Mapping - methods</subject><subject>Cluster Analysis</subject><subject>Computational Biology - methods</subject><subject>Evolutionary relationship</subject><subject>Genetics &amp; Heredity</subject><subject>Genomics - methods</subject><subject>Humans</subject><subject>Life Sciences &amp; Biomedicine</subject><subject>Mammals - genetics</subject><subject>Nucleotides - chemistry</subject><subject>Nucleotides - genetics</subject><subject>Phylogeny</subject><subject>Science &amp; Technology</subject><subject>Sequence Alignment</subject><subject>Sequence Analysis, DNA - methods</subject><subject>Sequence comparison</subject><subject>Tri-nucleotide</subject><subject>Trinucleotide Repeats - genetics</subject><subject>Viruses - genetics</subject><issn>0378-1119</issn><issn>1879-0038</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>AOWDO</sourceid><sourceid>EIF</sourceid><recordid>eNqNkU2LFDEQhoMo7rj6BzxIjoL0mI9JpwNeZNBVWPCi55BJKpqhO2mTtIvgjzdNj-tNzCV1eN6i6imEnlOyp4T2r8_7rxBhzwhVe3o4MCEfoB0dpOoI4cNDtCNcDh2lVF2hJ6WcSXtCsMfoilMplBr4Dv26gZgmwAW-LxAtYJum2eRQUsRLdJCxwRHusE95wsnjmkMXFztCqsEBzjBnKBCrqaElTqaAw2sRUme_wRSsGfGc0wy5Bihrh7_p8hQ98mYs8OzyX6Mv7999Pn7obj_dfDy-ve0sF33tTH-SbXArmZKKgGdUDgKoY72wioreDQa88oaKQRLFrJTguZeHA3fSStLza_Ry69smaWuWqqdQLIyjiZCWolnzoXrezDWUbajNqZQMXs85TCb_1JTo1bo-69W6Xq3rzXoLvbj0X04TuPvIH80NeLUBd3BKvtiwqr7H1rNQIjiTraKs0cP_08ewuT-mJdYWfbNFoen8ESDrS9yFDLZql8K_FvkNocq1kA</recordid><startdate>20200310</startdate><enddate>20200310</enddate><creator>Das, Subhram</creator><creator>Das, Arijit</creator><creator>Mondal, Bingshati</creator><creator>Dey, Nilanjan</creator><creator>Bhattacharya, 
D.K.</creator><creator>Tibarewala, D.N.</creator><general>Elsevier B.V</general><general>Elsevier</general><scope>AOWDO</scope><scope>BLEPL</scope><scope>DTL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-2899-4433</orcidid></search><sort><creationdate>20200310</creationdate><title>Genome sequence comparison under a new form of tri-nucleotide representation based on bio-chemical properties of nucleotides</title><author>Das, Subhram ; Das, Arijit ; Mondal, Bingshati ; Dey, Nilanjan ; Bhattacharya, D.K. ; Tibarewala, D.N.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c356t-a6b7055c729790ef21785e1d265c9156d8aef9fa1587092c77ef3f7443d7c7063</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Alignment-free method</topic><topic>Animals</topic><topic>Bacteria - genetics</topic><topic>Base Sequence</topic><topic>Chromosome Mapping - methods</topic><topic>Cluster Analysis</topic><topic>Computational Biology - methods</topic><topic>Evolutionary relationship</topic><topic>Genetics &amp; Heredity</topic><topic>Genomics - methods</topic><topic>Humans</topic><topic>Life Sciences &amp; Biomedicine</topic><topic>Mammals - genetics</topic><topic>Nucleotides - chemistry</topic><topic>Nucleotides - genetics</topic><topic>Phylogeny</topic><topic>Science &amp; Technology</topic><topic>Sequence Alignment</topic><topic>Sequence Analysis, DNA - methods</topic><topic>Sequence comparison</topic><topic>Tri-nucleotide</topic><topic>Trinucleotide Repeats - genetics</topic><topic>Viruses - genetics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Das, Subhram</creatorcontrib><creatorcontrib>Das, Arijit</creatorcontrib><creatorcontrib>Mondal, 
Bingshati</creatorcontrib><creatorcontrib>Dey, Nilanjan</creatorcontrib><creatorcontrib>Bhattacharya, D.K.</creatorcontrib><creatorcontrib>Tibarewala, D.N.</creatorcontrib><collection>Web of Science - Science Citation Index Expanded - 2020</collection><collection>Web of Science Core Collection</collection><collection>Science Citation Index Expanded</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Gene</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Das, Subhram</au><au>Das, Arijit</au><au>Mondal, Bingshati</au><au>Dey, Nilanjan</au><au>Bhattacharya, D.K.</au><au>Tibarewala, D.N.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Genome sequence comparison under a new form of tri-nucleotide representation based on bio-chemical properties of nucleotides</atitle><jtitle>Gene</jtitle><stitle>GENE</stitle><addtitle>Gene</addtitle><date>2020-03-10</date><risdate>2020</risdate><volume>730</volume><spage>144257</spage><epage>144257</epage><pages>144257-144257</pages><artnum>144257</artnum><issn>0378-1119</issn><eissn>1879-0038</eissn><abstract>•A new tri-nucleotide representation is proposed for genome sequence comparison.•Representation is non-degenerate and it is based on bio-chemical properties of the nucleotides.•Simple Euclidian distance measure is applied for sequence comparison.•Method is not dependent on the alignment of the sequences.•Results of proposed method are verified for all possible genome sequences. Genetic sequence analysis, classification of genome sequence and evolutionary relationship between species using their biological sequences, are the emerging research domain in Bioinformatics. 
Several methods have already been applied to DNA sequence comparison under tri-nucleotide representation. In this paper, a new form of tri-nucleotide representation is proposed for sequence comparison. The comparison does not depend on the alignment of the sequences. In this representation, the bio-chemical properties of the nucleotides are considered. The novelty of this method is that the sequences of unequal lengths are represented by vectors of the same length and each of the tri-nucleotide formed out of the given sequence has its unique representation. To validate the proposed method, it is verified on several data sets related to mammalians, viruses and bacteria. The results of this method are further compared with those obtained by methods such as probabilistic method, natural vector method, Fourier power spectrum method, multiple encoding vector method, and feature frequency profiles method. Moreover, this method produces accurate phylogeny in all the cases. It is also proved that the time complexity of the present method is less.</abstract><cop>AMSTERDAM</cop><pub>Elsevier B.V</pub><pmid>31759983</pmid><doi>10.1016/j.gene.2019.144257</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0003-2899-4433</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0378-1119
ispartof Gene, 2020-03, Vol.730, p.144257-144257, Article 144257
issn 0378-1119
1879-0038
language eng
recordid cdi_proquest_miscellaneous_2317963879
source MEDLINE; Web of Science - Science Citation Index Expanded - 2020; Access via ScienceDirect (Elsevier)
subjects Algorithms
Alignment-free method
Animals
Bacteria - genetics
Base Sequence
Chromosome Mapping - methods
Cluster Analysis
Computational Biology - methods
Evolutionary relationship
Genetics & Heredity
Genomics - methods
Humans
Life Sciences & Biomedicine
Mammals - genetics
Nucleotides - chemistry
Nucleotides - genetics
Phylogeny
Science & Technology
Sequence Alignment
Sequence Analysis, DNA - methods
Sequence comparison
Tri-nucleotide
Trinucleotide Repeats - genetics
Viruses - genetics
title Genome sequence comparison under a new form of tri-nucleotide representation based on bio-chemical properties of nucleotides
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T20%3A39%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Genome%20sequence%20comparison%20under%20a%20new%20form%20of%20tri-nucleotide%20representation%20based%20on%20bio-chemical%20properties%20of%20nucleotides&rft.jtitle=Gene&rft.au=Das,%20Subhram&rft.date=2020-03-10&rft.volume=730&rft.spage=144257&rft.epage=144257&rft.pages=144257-144257&rft.artnum=144257&rft.issn=0378-1119&rft.eissn=1879-0038&rft_id=info:doi/10.1016/j.gene.2019.144257&rft_dat=%3Cproquest_webof%3E2317963879%3C/proquest_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2317963879&rft_id=info:pmid/31759983&rft_els_id=S0378111919309163&rfr_iscdi=true