A time series representation of protein sequences for similarity comparison
• We have mapped each amino acid into a vector based on its physicochemical indices and the Hungarian algorithm.
• We have proposed an 11-D time series representation of protein sequences and compared their similarities using the DTW algorithm.
• The comparison results show the effectiveness of our approach.
Published in: | Journal of Theoretical Biology, 2022-04, Vol. 538, Article 111039 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Keywords: | |
Summary: | Each of the 20 amino acids was mapped to a vector based on its physicochemical indices and the Hungarian algorithm, so that a protein sequence can be represented as a time series in eleven-dimensional space. The dynamic time warping (DTW) algorithm was then applied to calculate the distance between two such time series, and this distance was used to compare the similarity of protein sequences. The validity and accuracy of the method were illustrated by a similarity comparison of the ND5 proteins of nine species. Furthermore, homology analysis of eleven ACE2 proteins, including those of human, Malayan pangolin and six species of bats, confirmed that the human sequence lies at a shorter evolutionary distance from the pangolin than from the bats. Finally, a phylogenetic tree was constructed from the spike protein sequences of 36 coronaviruses, which were divided into five groups: Class I, Class II, Class III, SARS-CoVs and COVID-19. |
---|---|
ISSN: | 0022-5193, 1095-8541 |
DOI: | 10.1016/j.jtbi.2022.111039 |
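
The summary describes comparing two vector-valued time series with DTW. As a rough illustration of that step only, the following is a minimal sketch of classic DTW over sequences of equal-dimension vectors. This record does not give the paper's actual 11-D amino acid vectors, its local distance, or any normalization, so everything specific here is an assumption: `TOY_VECTORS` is a hypothetical 2-D placeholder table, `encode` and `dtw_distance` are illustrative names, and the local cost is assumed to be Euclidean with no warping window.

```python
import math

# Hypothetical 2-D placeholder vectors for three residues.
# These are NOT the 11-D vectors derived in the paper.
TOY_VECTORS = {"A": (0.0, 1.0), "C": (1.0, 0.0), "D": (1.0, 1.0)}


def encode(seq, table=TOY_VECTORS):
    """Turn a residue string into a list of vectors (one per position)."""
    return [table[res] for res in seq]


def dtw_distance(a, b):
    """Classic DTW cost between two vector sequences.

    Assumes a Euclidean local cost and the standard three-way
    recurrence (insertion, deletion, match) with no window constraint.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

With this sketch, `dtw_distance(encode("ACD"), encode("AACD"))` is zero because the warping path absorbs the repeated residue, which is the property that lets DTW compare sequences of different lengths without a prior alignment.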