A time series representation of protein sequences for similarity comparison
•We map each amino acid to a vector based on its physicochemical indexes and the Hungarian algorithm.
•We propose an 11-D time series representation of protein sequences and compare their similarities using the dynamic time warping (DTW) algorithm.
•The comparison results show the effectiveness of our approach....
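The DTW comparison named in the highlights can be illustrated with a minimal sketch. It assumes the classic unconstrained DTW recurrence with Euclidean distance as the local cost; this record does not state which DTW variant or local cost the authors used. The `TOY_VECTORS` table is a hypothetical 2-D encoding invented for illustration and is not the 11-D mapping from the article, and the names `dtw_distance` and `encode` are likewise assumptions.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two sequences of equal-dimension vectors.

    Assumption: Euclidean distance as the local cost, no warping window.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical 2-D vectors for three residues, for illustration only;
# the article maps each of the 20 amino acids to an 11-D vector instead.
TOY_VECTORS = {"A": (0.0, 1.0), "G": (0.0, 0.0), "K": (1.0, 0.0)}


def encode(seq):
    """Turn a residue string into a list of vectors (a multi-dimensional time series)."""
    return [TOY_VECTORS[residue] for residue in seq]
```

Because DTW may align one element with several consecutive elements of the other series, sequences of different lengths can be compared directly: under this toy encoding, `dtw_distance(encode("AGK"), encode("AGGK"))` is zero, since the repeated `G` is absorbed by the warping.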
Published in: | Journal of theoretical biology 2022-04, Vol.538, p.111039-111039, Article 111039 |
---|---|
Main authors: | Li, Cancan ; Dai, Qi ; He, Ping-an |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 111039 |
---|---|
container_issue | |
container_start_page | 111039 |
container_title | Journal of theoretical biology |
container_volume | 538 |
creator | Li, Cancan ; Dai, Qi ; He, Ping-an |
description | •We map each amino acid to a vector based on its physicochemical indexes and the Hungarian algorithm. •We propose an 11-D time series representation of protein sequences and compare their similarities using the dynamic time warping (DTW) algorithm. •The comparison results show the effectiveness of our approach.
Based on the physicochemical indexes of the 20 amino acids and the Hungarian algorithm, each amino acid was mapped to a vector, so that a protein sequence can be represented as a time series in eleven-dimensional space. The DTW algorithm was then applied to calculate the distance between two time series and thereby compare the similarity of protein sequences. The validity and accuracy of this method were illustrated by a similarity comparison of the ND5 proteins of nine species. Furthermore, homology analysis of eleven ACE2 proteins, which included human, Malayan pangolin and six species of bats, confirmed that the human had a shorter evolutionary distance from the pangolin than from those bats. A phylogenetic tree was constructed from the spike protein sequences of 36 coronaviruses, which were divided into five groups: Class I, Class II, Class III, SARS-CoVs and COVID-19. |
doi_str_mv | 10.1016/j.jtbi.2022.111039 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2623883991</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0022519322000376</els_id><sourcerecordid>2623883991</sourcerecordid><originalsourceid>FETCH-LOGICAL-c356t-c3e1c8b0a9c5adda7795c461b6dc2f7665e152414e961d45e30792aae9c4f5513</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMo7rr6BzxIj1665qNJG_CyiF-44EXPIU2nkNI2NckK--9N2dWjl0xgnnmZeRC6JnhNMBF33bqLtV1TTOmaEIKZPEFLgiXPK16QU7TEqZNzItkCXYTQYYxlwcQ5WjCOK85ZsURvmyzaAbIA3kLIPEweAoxRR-vGzLXZ5F0EOybgawejSUzrfBbsYHvtbdxnxg1T-gU3XqKzVvcBro51hT6fHj8eXvLt-_Prw2abG8ZFTC8QU9VYS8N10-iylNwUgtSiMbQtheBAOC1IAVKQpuDAcCmp1iBN0XJO2ArdHnLTbmmpENVgg4G-1yO4XVBUUFZVTMoZpQfUeBeCh1ZN3g7a7xXBapaoOjVLVLNEdZCYhm6O-bt6gOZv5NdaAu4PAKQrvy14FYyd5TTWg4mqcfa__B-JZoNU</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2623883991</pqid></control><display><type>article</type><title>A time series representation of protein sequences for similarity comparison</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Li, Cancan ; Dai, Qi ; He, Ping-an</creator><creatorcontrib>Li, Cancan ; Dai, Qi ; He, Ping-an</creatorcontrib><description>•We have mapped each amino acid into a vector based on their physicochemical indexes and the Hungarian algorithm.•We have proposed the 11-D time series representation of protein sequences and compare their similarities by DTW algorithm.•The comparison results show the effectiveness of our approach.
Based on the physicochemical indexes of 20 amino acids and the Hungarian algorithm, each amino acid was mapped into a vector. And, the protein sequence can be represented as time series in eleven-dimensional space. In addition, the DTW algorithm was applied to calculate the distance between two time series to compare the similarities of protein sequences. The validity and accuracy of this method was illustrated by similarity comparison of ND5 proteins of nine species. Furthermore, homology analysis of eleven ACE2 proteins, which included human, Malayan pangolin and six species of bats, confirmed that the human had shorter evolutionary distance from the pangolin than those bats. The phylogenetic tree of spike protein sequences of 36 coronaviruses, which were divided into five groups, Class I, Class II, Class III, SARS-CoVs and COVID-19, was constructed.</description><identifier>ISSN: 0022-5193</identifier><identifier>EISSN: 1095-8541</identifier><identifier>DOI: 10.1016/j.jtbi.2022.111039</identifier><identifier>PMID: 35085534</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>Amino Acid Sequence ; Animals ; Chiroptera ; Coronaviruses ; COVID-19 ; DTW algorithm ; Humans ; Hungarian algorithm ; Phylogeny ; SARS-CoV-2 - genetics ; Substitution matrix ; Time Factors ; Time series</subject><ispartof>Journal of theoretical biology, 2022-04, Vol.538, p.111039-111039, Article 111039</ispartof><rights>2022 Elsevier Ltd</rights><rights>Copyright © 2022 Elsevier Ltd. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c356t-c3e1c8b0a9c5adda7795c461b6dc2f7665e152414e961d45e30792aae9c4f5513</citedby><cites>FETCH-LOGICAL-c356t-c3e1c8b0a9c5adda7795c461b6dc2f7665e152414e961d45e30792aae9c4f5513</cites><orcidid>0000-0003-3749-1483</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0022519322000376$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27903,27904,65309</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35085534$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Cancan</creatorcontrib><creatorcontrib>Dai, Qi</creatorcontrib><creatorcontrib>He, Ping-an</creatorcontrib><title>A time series representation of protein sequences for similarity comparison</title><title>Journal of theoretical biology</title><addtitle>J Theor Biol</addtitle><description>•We have mapped each amino acid into a vector based on their physicochemical indexes and the Hungarian algorithm.•We have proposed the 11-D time series representation of protein sequences and compare their similarities by DTW algorithm.•The comparison results show the effectiveness of our approach.
Based on the physicochemical indexes of 20 amino acids and the Hungarian algorithm, each amino acid was mapped into a vector. And, the protein sequence can be represented as time series in eleven-dimensional space. In addition, the DTW algorithm was applied to calculate the distance between two time series to compare the similarities of protein sequences. The validity and accuracy of this method was illustrated by similarity comparison of ND5 proteins of nine species. Furthermore, homology analysis of eleven ACE2 proteins, which included human, Malayan pangolin and six species of bats, confirmed that the human had shorter evolutionary distance from the pangolin than those bats. The phylogenetic tree of spike protein sequences of 36 coronaviruses, which were divided into five groups, Class I, Class II, Class III, SARS-CoVs and COVID-19, was constructed.</description><subject>Amino Acid Sequence</subject><subject>Animals</subject><subject>Chiroptera</subject><subject>Coronaviruses</subject><subject>COVID-19</subject><subject>DTW algorithm</subject><subject>Humans</subject><subject>Hungarian algorithm</subject><subject>Phylogeny</subject><subject>SARS-CoV-2 - genetics</subject><subject>Substitution matrix</subject><subject>Time Factors</subject><subject>Time series</subject><issn>0022-5193</issn><issn>1095-8541</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kE1LxDAQhoMo7rr6BzxIj1665qNJG_CyiF-44EXPIU2nkNI2NckK--9N2dWjl0xgnnmZeRC6JnhNMBF33bqLtV1TTOmaEIKZPEFLgiXPK16QU7TEqZNzItkCXYTQYYxlwcQ5WjCOK85ZsURvmyzaAbIA3kLIPEweAoxRR-vGzLXZ5F0EOybgawejSUzrfBbsYHvtbdxnxg1T-gU3XqKzVvcBro51hT6fHj8eXvLt-_Prw2abG8ZFTC8QU9VYS8N10-iylNwUgtSiMbQtheBAOC1IAVKQpuDAcCmp1iBN0XJO2ArdHnLTbmmpENVgg4G-1yO4XVBUUFZVTMoZpQfUeBeCh1ZN3g7a7xXBapaoOjVLVLNEdZCYhm6O-bt6gOZv5NdaAu4PAKQrvy14FYyd5TTWg4mqcfa__B-JZoNU</recordid><startdate>20220407</startdate><enddate>20220407</enddate><creator>Li, 
Cancan</creator><creator>Dai, Qi</creator><creator>He, Ping-an</creator><general>Elsevier Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-3749-1483</orcidid></search><sort><creationdate>20220407</creationdate><title>A time series representation of protein sequences for similarity comparison</title><author>Li, Cancan ; Dai, Qi ; He, Ping-an</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c356t-c3e1c8b0a9c5adda7795c461b6dc2f7665e152414e961d45e30792aae9c4f5513</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Amino Acid Sequence</topic><topic>Animals</topic><topic>Chiroptera</topic><topic>Coronaviruses</topic><topic>COVID-19</topic><topic>DTW algorithm</topic><topic>Humans</topic><topic>Hungarian algorithm</topic><topic>Phylogeny</topic><topic>SARS-CoV-2 - genetics</topic><topic>Substitution matrix</topic><topic>Time Factors</topic><topic>Time series</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Cancan</creatorcontrib><creatorcontrib>Dai, Qi</creatorcontrib><creatorcontrib>He, Ping-an</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of theoretical biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Cancan</au><au>Dai, Qi</au><au>He, Ping-an</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A time series representation of protein sequences for similarity 
comparison</atitle><jtitle>Journal of theoretical biology</jtitle><addtitle>J Theor Biol</addtitle><date>2022-04-07</date><risdate>2022</risdate><volume>538</volume><spage>111039</spage><epage>111039</epage><pages>111039-111039</pages><artnum>111039</artnum><issn>0022-5193</issn><eissn>1095-8541</eissn><abstract>•We have mapped each amino acid into a vector based on their physicochemical indexes and the Hungarian algorithm.•We have proposed the 11-D time series representation of protein sequences and compare their similarities by DTW algorithm.•The comparison results show the effectiveness of our approach.
Based on the physicochemical indexes of 20 amino acids and the Hungarian algorithm, each amino acid was mapped into a vector. And, the protein sequence can be represented as time series in eleven-dimensional space. In addition, the DTW algorithm was applied to calculate the distance between two time series to compare the similarities of protein sequences. The validity and accuracy of this method was illustrated by similarity comparison of ND5 proteins of nine species. Furthermore, homology analysis of eleven ACE2 proteins, which included human, Malayan pangolin and six species of bats, confirmed that the human had shorter evolutionary distance from the pangolin than those bats. The phylogenetic tree of spike protein sequences of 36 coronaviruses, which were divided into five groups, Class I, Class II, Class III, SARS-CoVs and COVID-19, was constructed.</abstract><cop>England</cop><pub>Elsevier Ltd</pub><pmid>35085534</pmid><doi>10.1016/j.jtbi.2022.111039</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0003-3749-1483</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0022-5193 |
ispartof | Journal of theoretical biology, 2022-04, Vol.538, p.111039-111039, Article 111039 |
issn | 0022-5193 ; 1095-8541 |
language | eng |
recordid | cdi_proquest_miscellaneous_2623883991 |
source | MEDLINE; Elsevier ScienceDirect Journals |
subjects | Amino Acid Sequence ; Animals ; Chiroptera ; Coronaviruses ; COVID-19 ; DTW algorithm ; Humans ; Hungarian algorithm ; Phylogeny ; SARS-CoV-2 - genetics ; Substitution matrix ; Time Factors ; Time series |
title | A time series representation of protein sequences for similarity comparison |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T00%3A12%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20time%20series%20representation%20of%20protein%20sequences%20for%20similarity%20comparison&rft.jtitle=Journal%20of%20theoretical%20biology&rft.au=Li,%20Cancan&rft.date=2022-04-07&rft.volume=538&rft.spage=111039&rft.epage=111039&rft.pages=111039-111039&rft.artnum=111039&rft.issn=0022-5193&rft.eissn=1095-8541&rft_id=info:doi/10.1016/j.jtbi.2022.111039&rft_dat=%3Cproquest_cross%3E2623883991%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2623883991&rft_id=info:pmid/35085534&rft_els_id=S0022519322000376&rfr_iscdi=true |