A New Scheme for Protein Sequence Motif Extraction

Protein sequence motifs are short conserved subsequences common to related protein sequences. Extracting sequence motifs from proteins can help classify protein families, predict protein functions, and provide valuable information about the evolution of species. However, the automatic prot...

Bibliographic details
Main authors: Jingyi Yang, Deogun, J.S., Zhaohui Sun
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 280a
container_issue
container_start_page 280a
container_title
container_volume
creator Jingyi Yang
Deogun, J.S.
Zhaohui Sun
description Protein sequence motifs are short conserved subsequences common to related protein sequences. Extracting sequence motifs from proteins can help classify protein families, predict protein functions, and provide valuable information about the evolution of species. However, automatic extraction of protein sequence motifs is not straightforward because sequence motifs are often inexact and contain gaps. In this paper, we review currently available algorithms for protein sequence motif extraction and propose a novel scheme to extract protein sequence motifs that allow mismatches and gaps from unaligned protein sequences. This scheme is based on a probabilistic model, the Mismatch-allowed Probabilistic Suffix Tree (M-PST). In this scheme, an M-PST is first constructed from the unaligned protein sequences. The subsequences with the highest likelihood scores, which are over-represented patterns, are then discovered with the M-PST. These subsequences are probable sequence motifs and are output along with their position probability matrices.
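The position probability matrices mentioned in the description can be illustrated with a minimal Python sketch. It is not the M-PST scheme itself, whose construction and scoring details are not given in this record. It only assumes that a set of equal-length, ungapped motif occurrences has already been found, and it tabulates per-position residue frequencies. The function name `position_probability_matrix`, the `pseudocount` parameter, and the example occurrences are hypothetical and chosen for illustration only.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_probability_matrix(occurrences, pseudocount=0.0):
    """Return one {residue: probability} dict per motif position.

    Hypothetical helper: assumes all occurrences are ungapped, have the
    same length, and contain only the 20 standard residues.
    """
    if not occurrences:
        raise ValueError("need at least one occurrence")
    width = len(occurrences[0])
    if any(len(o) != width for o in occurrences):
        raise ValueError("all occurrences must have the same length")

    matrix = []
    # zip(*occurrences) walks the occurrences column by column.
    for column in zip(*occurrences):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return matrix


# Made-up occurrences of a 4-residue motif with one mismatch each in two rows.
occurrences = ["ACDE", "ACDE", "AGDE", "ACDQ"]
ppm = position_probability_matrix(occurrences)
print(ppm[1]["C"])  # 0.75: three of the four occurrences have C at position 2
```

A nonzero `pseudocount` keeps unseen residues from receiving probability zero, which matters when such a matrix is later used to score new sequences.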
doi_str_mv 10.1109/HICSS.2005.33
format Conference Proceeding
isbn 9780769522685
0769522688
publisher IEEE
fulltext fulltext_linktorsrc
identifier ISSN: 1530-1605
ispartof Proceedings of the 38th Annual Hawaii International Conference on System Sciences, 2005, p.280a-280a
issn 1530-1605
2572-6862
language eng
recordid cdi_ieee_primary_1385814
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Computational biology
Computer science
Data mining
DNA
Evolution (biology)
Government
Protein engineering
Protein sequence
Sequences
Sun
title A New Scheme for Protein Sequence Motif Extraction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-12T06%3A08%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20New%20Scheme%20for%20Protein%20Sequence%20Motif%20Extraction&rft.btitle=Proceedings%20of%20the%2038th%20Annual%20Hawaii%20International%20Conference%20on%20System%20Sciences&rft.au=Jingyi%20Yang&rft.date=2005&rft.spage=280a&rft.epage=280a&rft.pages=280a-280a&rft.issn=1530-1605&rft.eissn=2572-6862&rft.isbn=9780769522685&rft.isbn_list=0769522688&rft_id=info:doi/10.1109/HICSS.2005.33&rft_dat=%3Cieee_6IE%3E1385814%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=1385814&rfr_iscdi=true