Enriching the sequence substitution matrix by structural information

A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the pr...

Bibliographic details
Published in:	Proteins, structure, function, and bioinformatics, 2004-01, Vol.54 (1), p.41-48
Main authors:	Teodorescu, Octavian; Galor, Tamara; Pillardy, Jaroslaw; Elber, Ron
Format:	Article
Language:	English
Subjects:	Algorithms; energy function; fitness function; sequence alignment; Sequence Alignment - methods; Sequence Analysis, Protein - methods; Sequence Homology, Amino Acid; sequence-to-structure matching; threading; Z-score
Online access:	Full text

container_end_page	48
container_issue	1
container_start_page	41
container_title	Proteins, structure, function, and bioinformatics
container_volume	54
creator	Teodorescu, Octavian; Galor, Tamara; Pillardy, Jaroslaw; Elber, Ron
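description	A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence‐to‐sequence alignments use symmetric substitution matrices, whereas the threading protocols use asymmetric matrices, testing the fitness of the probe sequence into the structure of the template protein. We propose a linear combination of threading and sequence‐alignment scoring functions to produce a single (mixed) scoring table. By fitting a single parameter (the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2), we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with <25% sequence identity and with very similar structures). For a difficult test of 176 homologous pairs, with no signal of sequence similarity, the mixed model makes it possible to detect between 40 and 100% more protein pairs than are detected by pure threading. Surprisingly, the linear combination of the two models performs better than either threading or sequence alignment alone when the percentage of sequence identity is low. We finally suggest that further enrichment of substitution matrices, combining more structural descriptors such as exposed surface area or secondary structure, is expected to enhance the signal as well. Proteins 2003. © 2003 Wiley‐Liss, Inc.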
doi_str_mv	10.1002/prot.10474
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_80080892</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>80080892</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3634-11087200b6e93a0966b84eca44be08e9a6bb3a9c7bff35872a23dbdb9d40a9ab3</originalsourceid><addsrcrecordid>eNp9kEtPwzAQhC0EgvK48ANQThyQAuvY9eOIeEtVC1URR8tOHTCkSbEdQf89Li1w47Qj7Tej3UHoEMMpBijO5r6NSVFON1APg-Q5YEI3UQ-E4Dnpi_4O2g3hFQCYJGwb7WDKoQ9F0UOXV4135YtrnrP4YrNg3zvblEl0JkQXu-jaJpvp6N1nZhZZiL4rY-d1nbmman3aJGAfbVW6DvZgPffQ4_XV5OI2H4xu7i7OB3lJGKE5xiB4AWCYlUSDZMwIaktNqbEgrNTMGKJlyU1Vpat5oQsyNVMjpxS01IbsoeNVbvo43RmimrlQ2rrWjW27oASAACGLBJ6swNK3IXhbqbl3M-0XCoNadqaWnanvzhJ8tE7tzMxO_9B1SQnAK-DD1XbxT5S6H48mP6H5yuNCtJ-_Hu3fFOOE99XT8EY9DMf3bDAgipMv3-mHqA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>80080892</pqid></control><display><type>article</type><title>Enriching the sequence substitution matrix by structural information</title><source>MEDLINE</source><source>Wiley Online Library Journals Frontfile Complete</source><creator>Teodorescu, Octavian ; Galor, Tamara ; Pillardy, Jaroslaw ; Elber, Ron</creator><creatorcontrib>Teodorescu, Octavian ; Galor, Tamara ; Pillardy, Jaroslaw ; Elber, Ron</creatorcontrib><description>A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence‐to‐sequence alignments use symmetric substitution matrices, whereas the threading protocols use asymmetric matrices, testing the fitness of the probe sequence into the structure of the template protein. 
We propose a linear combination of threading and sequence‐alignment scoring function, to produce a single (mixed) scoring table. By fitting a single parameter (which is the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2) we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with <25% sequence identity and with very similar structures). For a difficult test of 176 homologous pairs, with no signal of sequence similarity, the mixed model makes it possible to detect between 40 and 100% more protein pairs than the number of pairs that are detected by pure threading. Surprisingly, the linear combination of the two models is performing better than threading and than sequence alignment when the percentage of sequence identity is low. We finally suggest that further enrichment of substitution matrices, combing more structural descriptors such as exposed surface area, or secondary structure is expected to enhance the signal as well. Proteins 2003. 
© 2003 Wiley‐Liss, Inc.</description><identifier>ISSN: 0887-3585</identifier><identifier>EISSN: 1097-0134</identifier><identifier>DOI: 10.1002/prot.10474</identifier><identifier>PMID: 14705022</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc., A Wiley Company</publisher><subject>Algorithms ; energy function ; fitness function ; sequence alignment ; Sequence Alignment - methods ; Sequence Analysis, Protein - methods ; Sequence Homology, Amino Acid ; sequence-to-structure matching ; threading ; Z-score</subject><ispartof>Proteins, structure, function, and bioinformatics, 2004-01, Vol.54 (1), p.41-48</ispartof><rights>Copyright © 2003 Wiley‐Liss, Inc.</rights><rights>Copyright 2003 Wiley-Liss, Inc.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3634-11087200b6e93a0966b84eca44be08e9a6bb3a9c7bff35872a23dbdb9d40a9ab3</citedby><cites>FETCH-LOGICAL-c3634-11087200b6e93a0966b84eca44be08e9a6bb3a9c7bff35872a23dbdb9d40a9ab3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fprot.10474$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fprot.10474$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/14705022$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Teodorescu, Octavian</creatorcontrib><creatorcontrib>Galor, Tamara</creatorcontrib><creatorcontrib>Pillardy, Jaroslaw</creatorcontrib><creatorcontrib>Elber, Ron</creatorcontrib><title>Enriching the sequence substitution matrix by structural information</title><title>Proteins, structure, function, and 
bioinformatics</title><addtitle>Proteins</addtitle><description>A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence‐to‐sequence alignments use symmetric substitution matrices, whereas the threading protocols use asymmetric matrices, testing the fitness of the probe sequence into the structure of the template protein. We propose a linear combination of threading and sequence‐alignment scoring function, to produce a single (mixed) scoring table. By fitting a single parameter (which is the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2) we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with <25% sequence identity and with very similar structures). For a difficult test of 176 homologous pairs, with no signal of sequence similarity, the mixed model makes it possible to detect between 40 and 100% more protein pairs than the number of pairs that are detected by pure threading. Surprisingly, the linear combination of the two models is performing better than threading and than sequence alignment when the percentage of sequence identity is low. We finally suggest that further enrichment of substitution matrices, combing more structural descriptors such as exposed surface area, or secondary structure is expected to enhance the signal as well. Proteins 2003. 
© 2003 Wiley‐Liss, Inc.</description><subject>Algorithms</subject><subject>energy function</subject><subject>fitness function</subject><subject>sequence alignment</subject><subject>Sequence Alignment - methods</subject><subject>Sequence Analysis, Protein - methods</subject><subject>Sequence Homology, Amino Acid</subject><subject>sequence-to-structure matching</subject><subject>threading</subject><subject>Z-score</subject><issn>0887-3585</issn><issn>1097-0134</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2004</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kEtPwzAQhC0EgvK48ANQThyQAuvY9eOIeEtVC1URR8tOHTCkSbEdQf89Li1w47Qj7Tej3UHoEMMpBijO5r6NSVFON1APg-Q5YEI3UQ-E4Dnpi_4O2g3hFQCYJGwb7WDKoQ9F0UOXV4135YtrnrP4YrNg3zvblEl0JkQXu-jaJpvp6N1nZhZZiL4rY-d1nbmman3aJGAfbVW6DvZgPffQ4_XV5OI2H4xu7i7OB3lJGKE5xiB4AWCYlUSDZMwIaktNqbEgrNTMGKJlyU1Vpat5oQsyNVMjpxS01IbsoeNVbvo43RmimrlQ2rrWjW27oASAACGLBJ6swNK3IXhbqbl3M-0XCoNadqaWnanvzhJ8tE7tzMxO_9B1SQnAK-DD1XbxT5S6H48mP6H5yuNCtJ-_Hu3fFOOE99XT8EY9DMf3bDAgipMv3-mHqA</recordid><startdate>20040101</startdate><enddate>20040101</enddate><creator>Teodorescu, Octavian</creator><creator>Galor, Tamara</creator><creator>Pillardy, Jaroslaw</creator><creator>Elber, Ron</creator><general>Wiley Subscription Services, Inc., A Wiley Company</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20040101</creationdate><title>Enriching the sequence substitution matrix by structural information</title><author>Teodorescu, Octavian ; Galor, Tamara ; Pillardy, Jaroslaw ; Elber, 
Ron</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3634-11087200b6e93a0966b84eca44be08e9a6bb3a9c7bff35872a23dbdb9d40a9ab3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Algorithms</topic><topic>energy function</topic><topic>fitness function</topic><topic>sequence alignment</topic><topic>Sequence Alignment - methods</topic><topic>Sequence Analysis, Protein - methods</topic><topic>Sequence Homology, Amino Acid</topic><topic>sequence-to-structure matching</topic><topic>threading</topic><topic>Z-score</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Teodorescu, Octavian</creatorcontrib><creatorcontrib>Galor, Tamara</creatorcontrib><creatorcontrib>Pillardy, Jaroslaw</creatorcontrib><creatorcontrib>Elber, Ron</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Proteins, structure, function, and bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Teodorescu, Octavian</au><au>Galor, Tamara</au><au>Pillardy, Jaroslaw</au><au>Elber, Ron</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enriching the sequence substitution matrix by structural information</atitle><jtitle>Proteins, structure, function, and bioinformatics</jtitle><addtitle>Proteins</addtitle><date>2004-01-01</date><risdate>2004</risdate><volume>54</volume><issue>1</issue><spage>41</spage><epage>48</epage><pages>41-48</pages><issn>0887-3585</issn><eissn>1097-0134</eissn><abstract>A fundamental step in homology modeling is the comparison of two protein 
sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence‐to‐sequence alignments use symmetric substitution matrices, whereas the threading protocols use asymmetric matrices, testing the fitness of the probe sequence into the structure of the template protein. We propose a linear combination of threading and sequence‐alignment scoring function, to produce a single (mixed) scoring table. By fitting a single parameter (which is the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2) we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with <25% sequence identity and with very similar structures). For a difficult test of 176 homologous pairs, with no signal of sequence similarity, the mixed model makes it possible to detect between 40 and 100% more protein pairs than the number of pairs that are detected by pure threading. Surprisingly, the linear combination of the two models is performing better than threading and than sequence alignment when the percentage of sequence identity is low. We finally suggest that further enrichment of substitution matrices, combing more structural descriptors such as exposed surface area, or secondary structure is expected to enhance the signal as well. Proteins 2003. © 2003 Wiley‐Liss, Inc.</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc., A Wiley Company</pub><pmid>14705022</pmid><doi>10.1002/prot.10474</doi><tpages>8</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0887-3585
ispartof	Proteins, structure, function, and bioinformatics, 2004-01, Vol.54 (1), p.41-48
issn	0887-3585; 1097-0134
language	eng
recordid	cdi_proquest_miscellaneous_80080892
source	MEDLINE; Wiley Online Library Journals Frontfile Complete
subjects	Algorithms; energy function; fitness function; sequence alignment; Sequence Alignment - methods; Sequence Analysis, Protein - methods; Sequence Homology, Amino Acid; sequence-to-structure matching; threading; Z-score
title	Enriching the sequence substitution matrix by structural information
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T12%3A38%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enriching%20the%20sequence%20substitution%20matrix%20by%20structural%20information&rft.jtitle=Proteins,%20structure,%20function,%20and%20bioinformatics&rft.au=Teodorescu,%20Octavian&rft.date=2004-01-01&rft.volume=54&rft.issue=1&rft.spage=41&rft.epage=48&rft.pages=41-48&rft.issn=0887-3585&rft.eissn=1097-0134&rft_id=info:doi/10.1002/prot.10474&rft_dat=%3Cproquest_cross%3E80080892%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=80080892&rft_id=info:pmid/14705022&rfr_iscdi=true
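
The abstract in this record describes a single (mixed) scoring table formed as a linear combination of a sequence substitution matrix and a threading energy table, controlled by one fitted parameter. The Python sketch below illustrates that idea only. The record does not give the functional form, sign convention, normalization, or fitted value used in the paper, so everything here is an assumption: the names `mix_scores` and `build_mixed_table`, the weight convention (1 means pure substitution score, 0 means pure threading), the negation of energies so that higher is better, the two-letter alphabet, the "buried"/"exposed" site types, and all numeric values, which are made-up placeholders rather than BLOSUM 50 or THOM2 entries.

```python
# Illustrative sketch only: toy numbers, hypothetical names, assumed conventions.

def mix_scores(seq_score, thread_energy, weight):
    """Blend one substitution score and one threading energy.

    weight = 1.0 gives the pure substitution score and weight = 0.0 gives
    pure threading (assumed convention). The energy is negated so that a
    higher value is better for both terms (also an assumption).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * seq_score + (1.0 - weight) * (-thread_energy)


def build_mixed_table(substitution, threading, template, weight):
    """Build a position-specific mixed table for one template.

    substitution: dict mapping (probe_residue, template_residue) -> score
    threading:    dict mapping (probe_residue, site_type) -> energy
    template:     list of (template_residue, site_type), one per position
    Returns a dict mapping (probe_residue, position) -> mixed score.
    """
    residues = sorted({a for a, _ in substitution})
    table = {}
    for pos, (t_res, site) in enumerate(template):
        for a in residues:
            table[(a, pos)] = mix_scores(
                substitution[(a, t_res)], threading[(a, site)], weight
            )
    return table


# Placeholder symmetric substitution scores for a two-letter alphabet.
SUBSTITUTION = {("A", "A"): 5, ("A", "L"): -2, ("L", "A"): -2, ("L", "L"): 5}

# Placeholder threading energies by structural site type (lower is better).
THREADING = {
    ("A", "buried"): -0.25, ("A", "exposed"): -0.25,
    ("L", "buried"): -1.0, ("L", "exposed"): 0.5,
}

# A two-position template: residue identity plus its structural environment.
TEMPLATE = [("L", "buried"), ("A", "exposed")]

MIXED = build_mixed_table(SUBSTITUTION, THREADING, TEMPLATE, weight=0.5)
for key in sorted(MIXED):
    print(key, MIXED[key])
```

The resulting table is asymmetric because it depends on the structural site of the template position as well as the aligned residue pair, which matches the abstract's distinction between symmetric substitution matrices and asymmetric threading matrices. In practice the weight would be fitted against a benchmark of known homologous pairs rather than fixed by hand.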