Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures
The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with th...
Published in: | Journal of molecular biology 2005-05, Vol.348 (5), p.1235-1260 |
---|---|
Main authors: | Todd, Annabel E.; Marsden, Russell L.; Thornton, Janet M.; Orengo, Christine A. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1260 |
---|---|
container_issue | 5 |
container_start_page | 1235 |
container_title | Journal of molecular biology |
container_volume | 348 |
creator | Todd, Annabel E.; Marsden, Russell L.; Thornton, Janet M.; Orengo, Christine A. |
description | The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field. |
doi_str_mv | 10.1016/j.jmb.2005.03.037 |
format | Article |
date | 2005-05-20 |
pmid | 15854658 |
publisher | England: Elsevier Ltd |
rights | 2005 Elsevier Ltd |
tpages | 26 |
fulltext | fulltext |
identifier | ISSN: 0022-2836 |
ispartof | Journal of molecular biology, 2005-05, Vol.348 (5), p.1235-1260 |
issn | 0022-2836; 1089-8638 |
language | eng |
recordid | cdi_proquest_miscellaneous_67785807 |
source | MEDLINE; Access via ScienceDirect (Elsevier) |
subjects | Animals; Computational Biology - trends; Databases, Protein; fold; Genome; Genomics - methods; Humans; novelty; Protein Conformation; protein structure; Sequence Analysis, Protein; structural genomics; Structural Homology, Protein; superfamily |
title | Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T09%3A38%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Progress%20of%20Structural%20Genomics%20Initiatives:%20An%20Analysis%20of%20Solved%20Target%20Structures&rft.jtitle=Journal%20of%20molecular%20biology&rft.au=Todd,%20Annabel%20E.&rft.date=2005-05-20&rft.volume=348&rft.issue=5&rft.spage=1235&rft.epage=1260&rft.pages=1235-1260&rft.issn=0022-2836&rft.eissn=1089-8638&rft_id=info:doi/10.1016/j.jmb.2005.03.037&rft_dat=%3Cproquest_cross%3E19898857%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=17345254&rft_id=info:pmid/15854658&rft_els_id=S0022283605003190&rfr_iscdi=true |