A time series representation of protein sequences for similarity comparison

• We mapped each amino acid to a vector based on its physicochemical indices and the Hungarian algorithm.
• We proposed an 11-D time series representation of protein sequences and compared their similarities using the DTW (dynamic time warping) algorithm.
• The comparison results show the effectiveness of our approach.
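
The highlights describe comparing protein sequences as multivariate time series with DTW. The following is a minimal sketch of that comparison under stated assumptions. The lookup table `TOY_TABLE` is hypothetical (three residues, 2-D vectors) and stands in for the paper's 11-D amino-acid vectors, which this record does not reproduce. The local cost is assumed to be the Euclidean distance between residue vectors, and the classic unconstrained DTW recurrence is used. `to_time_series` and `dtw_distance` are illustrative names, not the authors' code.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Hypothetical lookup table: NOT the paper's values. The paper derives an
# 11-D vector for each of the 20 amino acids; the code below works for any dimension.
TOY_TABLE: Dict[str, List[float]] = {
    "A": [0.0, 1.0],
    "G": [1.0, 0.0],
    "L": [1.0, 1.0],
}


def to_time_series(seq: str, table: Dict[str, List[float]]) -> List[List[float]]:
    """Map a protein sequence to a multivariate time series, one vector per residue."""
    return [table[residue] for residue in seq]


def euclidean(a: Vector, b: Vector) -> float:
    """Local cost between two residue vectors (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(s: Sequence[Vector], t: Sequence[Vector]) -> float:
    """Classic O(n*m) dynamic time warping distance between two non-empty series."""
    if not s or not t:
        raise ValueError("both series must be non-empty")
    n, m = len(s), len(t)
    inf = float("inf")
    # acc[i][j] = cheapest cumulative cost of aligning s[:i] with t[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(s[i - 1], t[j - 1])
            # Extend the best of: step in s only, step in t only, or step in both
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


if __name__ == "__main__":
    x = to_time_series("AGL", TOY_TABLE)
    y = to_time_series("AGGL", TOY_TABLE)
    print(dtw_distance(x, y))  # 0.0: the repeated G is absorbed by warping
    print(dtw_distance(to_time_series("A", TOY_TABLE), to_time_series("G", TOY_TABLE)))  # sqrt(2)
```

Warping lets sequences of different lengths be compared directly: the extra `G` in the second sequence is matched against the same `G` in the first at no cost. DTW distance is not a true metric, because it can violate the triangle inequality. The plain recurrence is also quadratic in sequence length, and windowed variants such as a Sakoe-Chiba band are a common way to bound that cost.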

Bibliographic details
Published in: Journal of theoretical biology 2022-04, Vol.538, p.111039-111039, Article 111039
Main authors: Li, Cancan; Dai, Qi; He, Ping-an
Format: Article
Language: eng
Keywords:
Online access: Full text
container_end_page 111039
container_issue
container_start_page 111039
container_title Journal of theoretical biology
container_volume 538
creator Li, Cancan
Dai, Qi
He, Ping-an
description •We mapped each amino acid to a vector based on its physicochemical indices and the Hungarian algorithm.
•We proposed an 11-D time series representation of protein sequences and compared their similarities using the DTW algorithm.
•The comparison results show the effectiveness of our approach.
Based on the physicochemical indices of the 20 amino acids and the Hungarian algorithm, each amino acid was mapped to a vector, so that a protein sequence can be represented as a time series in eleven-dimensional space. The dynamic time warping (DTW) algorithm was then applied to calculate the distance between two such time series and thereby compare the similarity of the corresponding protein sequences. The validity and accuracy of the method were illustrated by a similarity comparison of the ND5 proteins of nine species. Furthermore, a homology analysis of eleven ACE2 proteins, including those of human, Malayan pangolin and six bat species, confirmed that human ACE2 has a shorter evolutionary distance to pangolin ACE2 than to the ACE2 of those bats. Finally, a phylogenetic tree was constructed for the spike protein sequences of 36 coronaviruses, divided into five groups: Class I, Class II, Class III, SARS-CoVs and COVID-19.
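
The abstract reports a phylogenetic tree built from the spike-protein comparisons but does not state, in this record, which tree-building method was used. One common option is sketched below, swapped in here and not confirmed as the authors' choice. UPGMA (average-linkage) clustering turns a symmetric pairwise distance matrix, such as one filled with DTW distances, into a rooted tree. The function name `upgma`, the taxon names and the distances in the usage example are hypothetical. The output is a Newick topology without branch lengths.

```python
from typing import Dict, List, Tuple


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Cluster taxa with UPGMA and return the tree topology in Newick format.

    `dist` must be symmetric, with dist[i][j] the distance between names[i]
    and names[j]. Branch lengths are omitted to keep the sketch short.
    """
    if not names:
        raise ValueError("need at least one taxon")
    # Active clusters: id -> (Newick subtree, number of leaves)
    clusters: Dict[int, Tuple[str, int]] = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        a, b = min(d, key=lambda pair: d[pair])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        # Average linkage: distance to the merged cluster is the size-weighted mean
        for k in clusters:
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (d_ak * size_a + d_bk * size_b) / (size_a + size_b)
        clusters[next_id] = (f"({tree_a},{tree_b})", size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]  # hypothetical taxa
    dist = [
        [0.0, 2.0, 6.0, 10.0],
        [2.0, 0.0, 6.0, 10.0],
        [6.0, 6.0, 0.0, 10.0],
        [10.0, 10.0, 10.0, 0.0],
    ]
    print(upgma(taxa, dist))  # (D,(C,(A,B)));
```

UPGMA assumes roughly clock-like (ultrametric) distances. When that assumption is doubtful, neighbour-joining is a common alternative that works from the same distance matrix.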
doi_str_mv 10.1016/j.jtbi.2022.111039
format Article
addtitle J Theor Biol
publisher England: Elsevier Ltd
date 2022-04-07
pmid 35085534
orcidid https://orcid.org/0000-0003-3749-1483
rights Copyright © 2022 Elsevier Ltd. All rights reserved.
linktohtml https://www.sciencedirect.com/science/article/pii/S0022519322000376
backlink https://www.ncbi.nlm.nih.gov/pubmed/35085534
fulltext fulltext
identifier ISSN: 0022-5193
ispartof Journal of theoretical biology, 2022-04, Vol.538, p.111039-111039, Article 111039
issn 0022-5193
1095-8541
language eng
recordid cdi_proquest_miscellaneous_2623883991
source MEDLINE; Elsevier ScienceDirect Journals
subjects Amino Acid Sequence
Animals
Chiroptera
Coronaviruses
COVID-19
DTW algorithm
Humans
Hungarian algorithm
Phylogeny
SARS-CoV-2 - genetics
Substitution matrix
Time Factors
Time series
title A time series representation of protein sequences for similarity comparison
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T00%3A12%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20time%20series%20representation%20of%20protein%20sequences%20for%20similarity%20comparison&rft.jtitle=Journal%20of%20theoretical%20biology&rft.au=Li,%20Cancan&rft.date=2022-04-07&rft.volume=538&rft.spage=111039&rft.epage=111039&rft.pages=111039-111039&rft.artnum=111039&rft.issn=0022-5193&rft.eissn=1095-8541&rft_id=info:doi/10.1016/j.jtbi.2022.111039&rft_dat=%3Cproquest_cross%3E2623883991%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2623883991&rft_id=info:pmid/35085534&rft_els_id=S0022519322000376&rfr_iscdi=true