Enriching the sequence substitution matrix by structural information

A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the pr...

Full description

Bibliographic details
Published in: Proteins, structure, function, and bioinformatics, 2004-01, Vol.54 (1), p.41-48
Main authors: Teodorescu, Octavian; Galor, Tamara; Pillardy, Jaroslaw; Elber, Ron
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 48
container_issue 1
container_start_page 41
container_title Proteins, structure, function, and bioinformatics
container_volume 54
creator Teodorescu, Octavian
Galor, Tamara
Pillardy, Jaroslaw
Elber, Ron
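description A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence‐to‐sequence alignments use symmetric substitution matrices, whereas the threading protocols use asymmetric matrices, testing the fitness of the probe sequence into the structure of the template protein. We propose a linear combination of threading and sequence‐alignment scoring functions, to produce a single (mixed) scoring table. By fitting a single parameter (which is the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2) we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with <25% sequence identity and with very similar structures). For a difficult test of 176 homologous pairs, with no signal of sequence similarity, the mixed model makes it possible to detect between 40 and 100% more protein pairs than the number of pairs that are detected by pure threading. Surprisingly, the linear combination of the two models performs better than threading and than sequence alignment when the percentage of sequence identity is low. We finally suggest that further enrichment of substitution matrices, combining more structural descriptors such as exposed surface area or secondary structure, is expected to enhance the signal as well. Proteins 2003. © 2003 Wiley‐Liss, Inc.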
doi_str_mv 10.1002/prot.10474
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_80080892</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>80080892</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3634-11087200b6e93a0966b84eca44be08e9a6bb3a9c7bff35872a23dbdb9d40a9ab3</originalsourceid><addsrcrecordid>eNp9kEtPwzAQhC0EgvK48ANQThyQAuvY9eOIeEtVC1URR8tOHTCkSbEdQf89Li1w47Qj7Tej3UHoEMMpBijO5r6NSVFON1APg-Q5YEI3UQ-E4Dnpi_4O2g3hFQCYJGwb7WDKoQ9F0UOXV4135YtrnrP4YrNg3zvblEl0JkQXu-jaJpvp6N1nZhZZiL4rY-d1nbmman3aJGAfbVW6DvZgPffQ4_XV5OI2H4xu7i7OB3lJGKE5xiB4AWCYlUSDZMwIaktNqbEgrNTMGKJlyU1Vpat5oQsyNVMjpxS01IbsoeNVbvo43RmimrlQ2rrWjW27oASAACGLBJ6swNK3IXhbqbl3M-0XCoNadqaWnanvzhJ8tE7tzMxO_9B1SQnAK-DD1XbxT5S6H48mP6H5yuNCtJ-_Hu3fFOOE99XT8EY9DMf3bDAgipMv3-mHqA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>80080892</pqid></control><display><type>article</type><title>Enriching the sequence substitution matrix by structural information</title><source>MEDLINE</source><source>Wiley Online Library Journals Frontfile Complete</source><creator>Teodorescu, Octavian ; Galor, Tamara ; Pillardy, Jaroslaw ; Elber, Ron</creator><creatorcontrib>Teodorescu, Octavian ; Galor, Tamara ; Pillardy, Jaroslaw ; Elber, Ron</creatorcontrib><description>A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence‐to‐sequence alignments use symmetric substitution matrices, whereas the threading protocols use asymmetric matrices, testing the fitness of the probe sequence into the structure of the template protein. 
We propose a linear combination of threading and sequence‐alignment scoring function, to produce a single (mixed) scoring table. By fitting a single parameter (which is the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2) we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with &lt;25% sequence identity and with very similar structures). For a difficult test of 176 homologous pairs, with no signal of sequence similarity, the mixed model makes it possible to detect between 40 and 100% more protein pairs than the number of pairs that are detected by pure threading. Surprisingly, the linear combination of the two models is performing better than threading and than sequence alignment when the percentage of sequence identity is low. We finally suggest that further enrichment of substitution matrices, combing more structural descriptors such as exposed surface area, or secondary structure is expected to enhance the signal as well. Proteins 2003. 
© 2003 Wiley‐Liss, Inc.</description><identifier>ISSN: 0887-3585</identifier><identifier>EISSN: 1097-0134</identifier><identifier>DOI: 10.1002/prot.10474</identifier><identifier>PMID: 14705022</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc., A Wiley Company</publisher><subject>Algorithms ; energy function ; fitness function ; sequence alignment ; Sequence Alignment - methods ; Sequence Analysis, Protein - methods ; Sequence Homology, Amino Acid ; sequence-to-structure matching ; threading ; Z-score</subject><ispartof>Proteins, structure, function, and bioinformatics, 2004-01, Vol.54 (1), p.41-48</ispartof><rights>Copyright © 2003 Wiley‐Liss, Inc.</rights><rights>Copyright 2003 Wiley-Liss, Inc.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3634-11087200b6e93a0966b84eca44be08e9a6bb3a9c7bff35872a23dbdb9d40a9ab3</citedby><cites>FETCH-LOGICAL-c3634-11087200b6e93a0966b84eca44be08e9a6bb3a9c7bff35872a23dbdb9d40a9ab3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fprot.10474$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fprot.10474$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/14705022$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Teodorescu, Octavian</creatorcontrib><creatorcontrib>Galor, Tamara</creatorcontrib><creatorcontrib>Pillardy, Jaroslaw</creatorcontrib><creatorcontrib>Elber, Ron</creatorcontrib><title>Enriching the sequence substitution matrix by structural information</title><title>Proteins, structure, function, and 
bioinformatics</title><addtitle>Proteins</addtitle><description>A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence‐to‐sequence alignments use symmetric substitution matrices, whereas the threading protocols use asymmetric matrices, testing the fitness of the probe sequence into the structure of the template protein. We propose a linear combination of threading and sequence‐alignment scoring function, to produce a single (mixed) scoring table. By fitting a single parameter (which is the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2) we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with &lt;25% sequence identity and with very similar structures). For a difficult test of 176 homologous pairs, with no signal of sequence similarity, the mixed model makes it possible to detect between 40 and 100% more protein pairs than the number of pairs that are detected by pure threading. Surprisingly, the linear combination of the two models is performing better than threading and than sequence alignment when the percentage of sequence identity is low. We finally suggest that further enrichment of substitution matrices, combing more structural descriptors such as exposed surface area, or secondary structure is expected to enhance the signal as well. Proteins 2003. 
© 2003 Wiley‐Liss, Inc.</description><subject>Algorithms</subject><subject>energy function</subject><subject>fitness function</subject><subject>sequence alignment</subject><subject>Sequence Alignment - methods</subject><subject>Sequence Analysis, Protein - methods</subject><subject>Sequence Homology, Amino Acid</subject><subject>sequence-to-structure matching</subject><subject>threading</subject><subject>Z-score</subject><issn>0887-3585</issn><issn>1097-0134</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2004</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kEtPwzAQhC0EgvK48ANQThyQAuvY9eOIeEtVC1URR8tOHTCkSbEdQf89Li1w47Qj7Tej3UHoEMMpBijO5r6NSVFON1APg-Q5YEI3UQ-E4Dnpi_4O2g3hFQCYJGwb7WDKoQ9F0UOXV4135YtrnrP4YrNg3zvblEl0JkQXu-jaJpvp6N1nZhZZiL4rY-d1nbmman3aJGAfbVW6DvZgPffQ4_XV5OI2H4xu7i7OB3lJGKE5xiB4AWCYlUSDZMwIaktNqbEgrNTMGKJlyU1Vpat5oQsyNVMjpxS01IbsoeNVbvo43RmimrlQ2rrWjW27oASAACGLBJ6swNK3IXhbqbl3M-0XCoNadqaWnanvzhJ8tE7tzMxO_9B1SQnAK-DD1XbxT5S6H48mP6H5yuNCtJ-_Hu3fFOOE99XT8EY9DMf3bDAgipMv3-mHqA</recordid><startdate>20040101</startdate><enddate>20040101</enddate><creator>Teodorescu, Octavian</creator><creator>Galor, Tamara</creator><creator>Pillardy, Jaroslaw</creator><creator>Elber, Ron</creator><general>Wiley Subscription Services, Inc., A Wiley Company</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20040101</creationdate><title>Enriching the sequence substitution matrix by structural information</title><author>Teodorescu, Octavian ; Galor, Tamara ; Pillardy, Jaroslaw ; Elber, 
Ron</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3634-11087200b6e93a0966b84eca44be08e9a6bb3a9c7bff35872a23dbdb9d40a9ab3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Algorithms</topic><topic>energy function</topic><topic>fitness function</topic><topic>sequence alignment</topic><topic>Sequence Alignment - methods</topic><topic>Sequence Analysis, Protein - methods</topic><topic>Sequence Homology, Amino Acid</topic><topic>sequence-to-structure matching</topic><topic>threading</topic><topic>Z-score</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Teodorescu, Octavian</creatorcontrib><creatorcontrib>Galor, Tamara</creatorcontrib><creatorcontrib>Pillardy, Jaroslaw</creatorcontrib><creatorcontrib>Elber, Ron</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Proteins, structure, function, and bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Teodorescu, Octavian</au><au>Galor, Tamara</au><au>Pillardy, Jaroslaw</au><au>Elber, Ron</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enriching the sequence substitution matrix by structural information</atitle><jtitle>Proteins, structure, function, and bioinformatics</jtitle><addtitle>Proteins</addtitle><date>2004-01-01</date><risdate>2004</risdate><volume>54</volume><issue>1</issue><spage>41</spage><epage>48</epage><pages>41-48</pages><issn>0887-3585</issn><eissn>1097-0134</eissn><abstract>A fundamental step in homology modeling is the comparison of two protein 
sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence‐to‐sequence alignments use symmetric substitution matrices, whereas the threading protocols use asymmetric matrices, testing the fitness of the probe sequence into the structure of the template protein. We propose a linear combination of threading and sequence‐alignment scoring function, to produce a single (mixed) scoring table. By fitting a single parameter (which is the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2) we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with &lt;25% sequence identity and with very similar structures). For a difficult test of 176 homologous pairs, with no signal of sequence similarity, the mixed model makes it possible to detect between 40 and 100% more protein pairs than the number of pairs that are detected by pure threading. Surprisingly, the linear combination of the two models is performing better than threading and than sequence alignment when the percentage of sequence identity is low. We finally suggest that further enrichment of substitution matrices, combing more structural descriptors such as exposed surface area, or secondary structure is expected to enhance the signal as well. Proteins 2003. © 2003 Wiley‐Liss, Inc.</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc., A Wiley Company</pub><pmid>14705022</pmid><doi>10.1002/prot.10474</doi><tpages>8</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0887-3585
ispartof Proteins, structure, function, and bioinformatics, 2004-01, Vol.54 (1), p.41-48
issn 0887-3585
1097-0134
language eng
recordid cdi_proquest_miscellaneous_80080892
source MEDLINE; Wiley Online Library Journals Frontfile Complete
subjects Algorithms
energy function
fitness function
sequence alignment
Sequence Alignment - methods
Sequence Analysis, Protein - methods
Sequence Homology, Amino Acid
sequence-to-structure matching
threading
Z-score
title Enriching the sequence substitution matrix by structural information
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T12%3A38%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enriching%20the%20sequence%20substitution%20matrix%20by%20structural%20information&rft.jtitle=Proteins,%20structure,%20function,%20and%20bioinformatics&rft.au=Teodorescu,%20Octavian&rft.date=2004-01-01&rft.volume=54&rft.issue=1&rft.spage=41&rft.epage=48&rft.pages=41-48&rft.issn=0887-3585&rft.eissn=1097-0134&rft_id=info:doi/10.1002/prot.10474&rft_dat=%3Cproquest_cross%3E80080892%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=80080892&rft_id=info:pmid/14705022&rfr_iscdi=true