Evaluation and improvement of multiple sequence methods for protein secondary structure prediction

A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for combination of these methods is shown to be 78%. A simple consensus prediction on the...

Bibliographic details
Published in:	Proteins, structure, function, and bioinformatics, 1999-03, Vol.34 (4), p.508-519
Main authors:	Cuff, James A.; Barton, Geoffrey J.
Format:	Article
Language:	eng
Keywords:	Algorithms; benchmarks; combination of methods; Computer Simulation; Databases, Factual; Models, Statistical; protein; Protein Structure, Secondary; Reproducibility of Results; secondary structure prediction; Sequence Alignment
Online access:	Full text

container_end_page	519
container_issue	4
container_start_page	508
container_title	Proteins, structure, function, and bioinformatics
container_volume	34
creator	Cuff, James A.; Barton, Geoffrey J.
description	A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments gives an average Q3 prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396‐protein set. The secondary structure definition method DSSP defines 8 states, but these are reduced by most authors to 3 for prediction. Application of the different published 8‐ to 3‐state reduction methods shows variation of over 3% on apparent prediction accuracy. This suggests that care should be taken to compare methods by the same reduction method. Two new sequence datasets (CB513 and CB251) are derived which are suitable for cross‐validation of secondary structure prediction methods without artifacts due to internal homology. A fully automatic World Wide Web service that predicts protein secondary structure by a combination of methods is available via http://barton.ebi.ac.uk/. Proteins 1999;34:508–519. © 1999 Wiley‐Liss, Inc.
doi_str_mv	10.1002/(SICI)1097-0134(19990301)34:4<508::AID-PROT10>3.0.CO;2-4
format	Article
oa	free_for_read
pmid	10081963
publisher	New York: John Wiley & Sons, Inc
rights	Copyright © 1999 Wiley‐Liss, Inc.
fulltext	fulltext
identifier	ISSN: 0887-3585
ispartof	Proteins, structure, function, and bioinformatics, 1999-03, Vol.34 (4), p.508-519
issn	0887-3585 1097-0134
language	eng
recordid	cdi_proquest_miscellaneous_69626526
source	MEDLINE; Wiley Online Library Journals Frontfile Complete
subjects	Algorithms; benchmarks; combination of methods; Computer Simulation; Databases, Factual; Models, Statistical; protein; Protein Structure, Secondary; Reproducibility of Results; secondary structure prediction; Sequence Alignment
title	Evaluation and improvement of multiple sequence methods for protein secondary structure prediction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T02%3A54%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluation%20and%20improvement%20of%20multiple%20sequence%20methods%20for%20protein%20secondary%20structure%20prediction&rft.jtitle=Proteins,%20structure,%20function,%20and%20bioinformatics&rft.au=Cuff,%20James%20A.&rft.date=1999-03-01&rft.volume=34&rft.issue=4&rft.spage=508&rft.epage=519&rft.pages=508-519&rft.issn=0887-3585&rft.eissn=1097-0134&rft_id=info:doi/10.1002/(SICI)1097-0134(19990301)34:4%3C508::AID-PROT10%3E3.0.CO;2-4&rft_dat=%3Cproquest_cross%3E69626526%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=69626526&rft_id=info:pmid/10081963&rfr_iscdi=true
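
The description field above refers to three ideas that a small example may make concrete: reducing the 8 DSSP states to 3, scoring a prediction by Q3 (percentage of residues assigned the correct state), and combining several methods by consensus. The following Python sketch is only an illustration under stated assumptions. The two reduction mappings, the toy strings, the function names, and the tie-breaking rule are all hypothetical and are not taken from this record or from the article.

```python
from collections import Counter

# Two illustrative 8-to-3 reductions (assumptions for this sketch only).
# Any DSSP state not listed in a mapping is treated as coil (C).
REDUCTION_BROAD = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
REDUCTION_STRICT = {"H": "H", "E": "E"}


def reduce_states(dssp, mapping):
    """Map a DSSP 8-state string to the 3 states H, E and C."""
    return "".join(mapping.get(state, "C") for state in dssp)


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def consensus(predictions, tie_breaker=0):
    """Per-residue majority vote over several 3-state predictions.

    Ties fall back to the prediction at index `tie_breaker`
    (a hypothetical rule chosen for this sketch).
    """
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        best = counts[0][1]
        winners = [state for state, n in counts if n == best]
        result.append(winners[0] if len(winners) == 1 else column[tie_breaker])
    return "".join(result)


# Toy data: one DSSP string and one 3-state prediction of the same length.
dssp = "HHHHGGGTT-EEEEBS"
predicted = "HHHHHHHCCCEEEECC"

q3_broad = q3(predicted, reduce_states(dssp, REDUCTION_BROAD))
q3_strict = q3(predicted, reduce_states(dssp, REDUCTION_STRICT))
voted = consensus(["HHHC", "HHCC", "HECC"])
```

On this toy input the same prediction scores differently depending on the reduction applied to the reference assignment, which is the effect the description warns about when comparing methods. A real evaluation would average Q3 over a full dataset and would need the exact reduction and consensus rules used by the methods being compared.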