Genome sequence comparison under a new form of tri-nucleotide representation based on bio-chemical properties of nucleotides
Published in: | Gene 2020-03, Vol. 730, Article 144257 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Summary: | • A new tri-nucleotide representation is proposed for genome sequence comparison. • The representation is non-degenerate and is based on bio-chemical properties of the nucleotides. • A simple Euclidean distance measure is applied for sequence comparison. • The method does not depend on the alignment of the sequences. • Results of the proposed method are verified for all possible genome sequences.
Genetic sequence analysis, classification of genome sequences, and inference of evolutionary relationships between species from their biological sequences are emerging research domains in bioinformatics. Several methods have already been applied to DNA sequence comparison under tri-nucleotide representation. In this paper, a new form of tri-nucleotide representation is proposed for sequence comparison. The comparison does not depend on the alignment of the sequences, and the representation takes the bio-chemical properties of the nucleotides into account. The novelty of the method is that sequences of unequal lengths are represented by vectors of the same length, and each tri-nucleotide formed from a given sequence has a unique representation. To validate the proposed method, it is tested on several data sets of mammals, viruses and bacteria. The results are compared with those obtained by the probabilistic method, the natural vector method, the Fourier power spectrum method, the multiple encoding vector method, and the feature frequency profiles method. The method produces accurate phylogeny in all cases. The present method is also shown to have lower time complexity. |
---|---|
ISSN: | 0378-1119 1879-0038 |
DOI: | 10.1016/j.gene.2019.144257 |
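
The summary describes an alignment-free pipeline: map sequences of unequal length to vectors of equal length built from tri-nucleotides, then compare the vectors with a Euclidean distance. The record does not give the paper's bio-chemical encoding, so the sketch below substitutes plain normalized tri-nucleotide frequencies as a stand-in. It illustrates the general idea only and is not the authors' representation. The function names and the toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt

# All 64 tri-nucleotides in a fixed order, so every vector shares one layout.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRINUCLEOTIDES)}


def trinucleotide_vector(seq):
    """Return normalized overlapping tri-nucleotide frequencies (length 64).

    Assumption: a plain frequency vector stands in for the paper's
    bio-chemical encoding, which this record does not describe.
    """
    seq = seq.upper()
    counts = [0] * len(TRINUCLEOTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip windows containing ambiguous bases such as N
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [c / total for c in counts]


def euclidean_distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy sequences of unequal length (hypothetical, not from the paper's data sets).
seq_a = "ATGCGATACGCTTGA"
seq_b = "ATGCGATACGCTTGAGGCT"
seq_c = "TTTTTTTTTTTTTTTTTTTTTTTTT"

vec_a = trinucleotide_vector(seq_a)
vec_b = trinucleotide_vector(seq_b)
vec_c = trinucleotide_vector(seq_c)

dist_ab = euclidean_distance(vec_a, vec_b)
dist_ac = euclidean_distance(vec_a, vec_c)
```

Because every sequence maps to a 64-component vector regardless of its length, no alignment step is needed before computing distances. A matrix of such pairwise distances is the usual input to a tree-building step. The record does not say which tree-building procedure the authors use.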