Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures

The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with th...

Bibliographic details
Published in: Journal of molecular biology 2005-05, Vol.348 (5), p.1235-1260
Main authors: Todd, Annabel E., Marsden, Russell L., Thornton, Janet M., Orengo, Christine A.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1260
container_issue 5
container_start_page 1235
container_title Journal of molecular biology
container_volume 348
creator Todd, Annabel E.
Marsden, Russell L.
Thornton, Janet M.
Orengo, Christine A.
description The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. 
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.
doi_str_mv 10.1016/j.jmb.2005.03.037
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_67785807</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0022283605003190</els_id><sourcerecordid>19898857</sourcerecordid><originalsourceid>FETCH-LOGICAL-c446t-2c11b127f08b59b7fa18c91187b2e186b370f1e2bb72ee51b2b669f6e733f5693</originalsourceid><addsrcrecordid>eNqNkUtLAzEUhYMoWh8_wI3Myt3U3GTyGF1J0VoQFKzrMEnvSMo8ajJT6L93SovuVDhwN985i_sRcgl0DBTkzXK8rO2YUSrGlA9RB2QEVOepllwfkhGljKVMc3lCTmNc0gHkmT4mJyC0yKTQIzJ_De1HwBiTtkzeutC7rg9FlUyxaWvvYjJrfOeLzq8x3ib3zZCi2kS_49tqjYtkXoQP7L7bGM_JUVlUES_294y8Pz7MJ0_p88t0Nrl_Tl2WyS5lDsACUyXVVuRWlQVolwNoZRmClpYrWgIyaxVDFGCZlTIvJSrOSyFzfkaud7ur0H72GDtT--iwqooG2z4aqZQWmqo_Qch1rrX4B6h4JpjIBhB2oAttjAFLswq-LsLGADVbOWZpBjlmK8dQPmQ7frUf722Ni5_G3sYA3O0AHJ629hhMdB4bhwsf0HVm0fpf5r8AXbWe8g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>17345254</pqid></control><display><type>article</type><title>Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures</title><source>MEDLINE</source><source>Access via ScienceDirect (Elsevier)</source><creator>Todd, Annabel E. ; Marsden, Russell L. ; Thornton, Janet M. ; Orengo, Christine A.</creator><creatorcontrib>Todd, Annabel E. ; Marsden, Russell L. ; Thornton, Janet M. ; Orengo, Christine A.</creatorcontrib><description>The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. 
Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. 
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.</description><identifier>ISSN: 0022-2836</identifier><identifier>EISSN: 1089-8638</identifier><identifier>DOI: 10.1016/j.jmb.2005.03.037</identifier><identifier>PMID: 15854658</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>Animals ; Computational Biology - trends ; Databases, Protein ; fold ; Genome ; Genomics - methods ; Humans ; novelty ; Protein Conformation ; protein structure ; Sequence Analysis, Protein ; structural genomics ; Structural Homology, Protein ; superfamily</subject><ispartof>Journal of molecular biology, 2005-05, Vol.348 (5), p.1235-1260</ispartof><rights>2005 Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c446t-2c11b127f08b59b7fa18c91187b2e186b370f1e2bb72ee51b2b669f6e733f5693</citedby><cites>FETCH-LOGICAL-c446t-2c11b127f08b59b7fa18c91187b2e186b370f1e2bb72ee51b2b669f6e733f5693</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.jmb.2005.03.037$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/15854658$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Todd, Annabel E.</creatorcontrib><creatorcontrib>Marsden, Russell L.</creatorcontrib><creatorcontrib>Thornton, Janet M.</creatorcontrib><creatorcontrib>Orengo, Christine A.</creatorcontrib><title>Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures</title><title>Journal of molecular biology</title><addtitle>J Mol Biol</addtitle><description>The explosion in gene 
sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. 
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.</description><subject>Animals</subject><subject>Computational Biology - trends</subject><subject>Databases, Protein</subject><subject>fold</subject><subject>Genome</subject><subject>Genomics - methods</subject><subject>Humans</subject><subject>novelty</subject><subject>Protein Conformation</subject><subject>protein structure</subject><subject>Sequence Analysis, Protein</subject><subject>structural genomics</subject><subject>Structural Homology, Protein</subject><subject>superfamily</subject><issn>0022-2836</issn><issn>1089-8638</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkUtLAzEUhYMoWh8_wI3Myt3U3GTyGF1J0VoQFKzrMEnvSMo8ajJT6L93SovuVDhwN985i_sRcgl0DBTkzXK8rO2YUSrGlA9RB2QEVOepllwfkhGljKVMc3lCTmNc0gHkmT4mJyC0yKTQIzJ_De1HwBiTtkzeutC7rg9FlUyxaWvvYjJrfOeLzq8x3ib3zZCi2kS_49tqjYtkXoQP7L7bGM_JUVlUES_294y8Pz7MJ0_p88t0Nrl_Tl2WyS5lDsACUyXVVuRWlQVolwNoZRmClpYrWgIyaxVDFGCZlTIvJSrOSyFzfkaud7ur0H72GDtT--iwqooG2z4aqZQWmqo_Qch1rrX4B6h4JpjIBhB2oAttjAFLswq-LsLGADVbOWZpBjlmK8dQPmQ7frUf722Ni5_G3sYA3O0AHJ629hhMdB4bhwsf0HVm0fpf5r8AXbWe8g</recordid><startdate>20050520</startdate><enddate>20050520</enddate><creator>Todd, Annabel E.</creator><creator>Marsden, Russell L.</creator><creator>Thornton, Janet M.</creator><creator>Orengo, Christine A.</creator><general>Elsevier Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>7QL</scope><scope>C1K</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>20050520</creationdate><title>Progress of Structural Genomics Initiatives: An 
Analysis of Solved Target Structures</title><author>Todd, Annabel E. ; Marsden, Russell L. ; Thornton, Janet M. ; Orengo, Christine A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c446t-2c11b127f08b59b7fa18c91187b2e186b370f1e2bb72ee51b2b669f6e733f5693</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Animals</topic><topic>Computational Biology - trends</topic><topic>Databases, Protein</topic><topic>fold</topic><topic>Genome</topic><topic>Genomics - methods</topic><topic>Humans</topic><topic>novelty</topic><topic>Protein Conformation</topic><topic>protein structure</topic><topic>Sequence Analysis, Protein</topic><topic>structural genomics</topic><topic>Structural Homology, Protein</topic><topic>superfamily</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Todd, Annabel E.</creatorcontrib><creatorcontrib>Marsden, Russell L.</creatorcontrib><creatorcontrib>Thornton, Janet M.</creatorcontrib><creatorcontrib>Orengo, Christine A.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of molecular biology</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Todd, Annabel E.</au><au>Marsden, Russell L.</au><au>Thornton, Janet M.</au><au>Orengo, Christine A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures</atitle><jtitle>Journal of molecular biology</jtitle><addtitle>J Mol Biol</addtitle><date>2005-05-20</date><risdate>2005</risdate><volume>348</volume><issue>5</issue><spage>1235</spage><epage>1260</epage><pages>1235-1260</pages><issn>0022-2836</issn><eissn>1089-8638</eissn><abstract>The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. 
For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.</abstract><cop>England</cop><pub>Elsevier Ltd</pub><pmid>15854658</pmid><doi>10.1016/j.jmb.2005.03.037</doi><tpages>26</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0022-2836
ispartof Journal of molecular biology, 2005-05, Vol.348 (5), p.1235-1260
issn 0022-2836
1089-8638
language eng
recordid cdi_proquest_miscellaneous_67785807
source MEDLINE; Access via ScienceDirect (Elsevier)
subjects Animals
Computational Biology - trends
Databases, Protein
fold
Genome
Genomics - methods
Humans
novelty
Protein Conformation
protein structure
Sequence Analysis, Protein
structural genomics
Structural Homology, Protein
superfamily
title Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T09%3A38%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Progress%20of%20Structural%20Genomics%20Initiatives:%20An%20Analysis%20of%20Solved%20Target%20Structures&rft.jtitle=Journal%20of%20molecular%20biology&rft.au=Todd,%20Annabel%20E.&rft.date=2005-05-20&rft.volume=348&rft.issue=5&rft.spage=1235&rft.epage=1260&rft.pages=1235-1260&rft.issn=0022-2836&rft.eissn=1089-8638&rft_id=info:doi/10.1016/j.jmb.2005.03.037&rft_dat=%3Cproquest_cross%3E19898857%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=17345254&rft_id=info:pmid/15854658&rft_els_id=S0022283605003190&rfr_iscdi=true