Enriching the sequence substitution matrix by structural information
A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the pr...
Published in: | Proteins, structure, function, and bioinformatics, 2004-01, Vol.54 (1), p.41-48 |
---|---|
Main authors: | Teodorescu, Octavian; Galor, Tamara; Pillardy, Jaroslaw; Elber, Ron |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 48 |
---|---|
container_issue | 1 |
container_start_page | 41 |
container_title | Proteins, structure, function, and bioinformatics |
container_volume | 54 |
creator | Teodorescu, Octavian; Galor, Tamara; Pillardy, Jaroslaw; Elber, Ron |
description | A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence‐to‐sequence alignments use symmetric substitution matrices, whereas the threading protocols use asymmetric matrices, testing the fitness of the probe sequence into the structure of the template protein. We propose a linear combination of threading and sequence‐alignment scoring functions to produce a single (mixed) scoring table. By fitting a single parameter (which is the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2) we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with <25% sequence identity and with very similar structures). For a difficult test of 176 homologous pairs, with no signal of sequence similarity, the mixed model makes it possible to detect between 40 and 100% more protein pairs than the number of pairs that are detected by pure threading. Surprisingly, the linear combination of the two models is performing better than threading and than sequence alignment when the percentage of sequence identity is low. We finally suggest that further enrichment of substitution matrices, combining more structural descriptors such as exposed surface area or secondary structure, is expected to enhance the signal as well. Proteins 2003. © 2003 Wiley‐Liss, Inc. |
doi_str_mv | 10.1002/prot.10474 |
format | Article |
pmid | 14705022 |
publisher | Hoboken: Wiley Subscription Services, Inc., A Wiley Company |
rights | Copyright © 2003 Wiley‐Liss, Inc. |
tpages | 8 |
fulltext | fulltext |
identifier | ISSN: 0887-3585 |
ispartof | Proteins, structure, function, and bioinformatics, 2004-01, Vol.54 (1), p.41-48 |
issn | 0887-3585; 1097-0134 |
language | eng |
recordid | cdi_proquest_miscellaneous_80080892 |
source | MEDLINE; Wiley Online Library Journals Frontfile Complete |
subjects | Algorithms; energy function; fitness function; sequence alignment; Sequence Alignment - methods; Sequence Analysis, Protein - methods; Sequence Homology, Amino Acid; sequence-to-structure matching; threading; Z-score |
title | Enriching the sequence substitution matrix by structural information |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T12%3A38%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enriching%20the%20sequence%20substitution%20matrix%20by%20structural%20information&rft.jtitle=Proteins,%20structure,%20function,%20and%20bioinformatics&rft.au=Teodorescu,%20Octavian&rft.date=2004-01-01&rft.volume=54&rft.issue=1&rft.spage=41&rft.epage=48&rft.pages=41-48&rft.issn=0887-3585&rft.eissn=1097-0134&rft_id=info:doi/10.1002/prot.10474&rft_dat=%3Cproquest_cross%3E80080892%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=80080892&rft_id=info:pmid/14705022&rfr_iscdi=true |
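
The abstract describes the mixed scoring table as a linear combination of a sequence-alignment score and a threading energy, controlled by one fitted parameter. The Python sketch below illustrates that idea only; it is not the authors' implementation. The function name `mix_scores`, the weight `lam`, the sign convention (energies are negated so that a higher mixed score is better), the assumption that both tables share the same keys, and all numeric values are assumptions made for illustration. The toy numbers are invented and are not BLOSUM 50 or THOM2 entries.

```python
def mix_scores(seq_score, thread_energy, lam):
    """Combine two score tables entry by entry.

    Assumed form (not taken from the paper):
        mixed = lam * seq_score - (1 - lam) * thread_energy
    lam = 1 gives pure sequence scoring, lam = 0 pure (negated) threading energy.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if seq_score.keys() != thread_energy.keys():
        raise ValueError("both tables must cover the same keys")
    return {
        key: lam * seq_score[key] - (1.0 - lam) * thread_energy[key]
        for key in seq_score
    }


# Toy tables with invented values, keyed by hypothetical (probe, template) labels.
toy_seq = {("A", "A"): 5.0, ("A", "W"): -3.0}
toy_energy = {("A", "A"): -1.0, ("A", "W"): 2.0}

mixed = mix_scores(toy_seq, toy_energy, lam=0.5)
for key, value in sorted(mixed.items()):
    print(key, value)
```

In this form the single weight `lam` plays the role of the fitted parameter: scanning it over [0, 1] and scoring a benchmark set at each value would be one way to choose it, though the abstract does not say how the fit was carried out.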