Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures

The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with th...

Bibliographic details
Published in:	Journal of molecular biology 2005-05, Vol.348 (5), p.1235-1260
Main authors:	Todd, Annabel E.; Marsden, Russell L.; Thornton, Janet M.; Orengo, Christine A.
Format:	Article
Language:	eng
Subjects:	Animals ; Computational Biology - trends ; Databases, Protein ; fold ; Genome ; Genomics - methods ; Humans ; novelty ; Protein Conformation ; protein structure ; Sequence Analysis, Protein ; structural genomics ; Structural Homology, Protein ; superfamily
Online access:	Full text

container_end_page	1260
container_issue	5
container_start_page	1235
container_title	Journal of molecular biology
container_volume	348
creator	Todd, Annabel E. ; Marsden, Russell L. ; Thornton, Janet M. ; Orengo, Christine A.
description	The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. 
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.
doi_str_mv	10.1016/j.jmb.2005.03.037
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_67785807</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0022283605003190</els_id><sourcerecordid>19898857</sourcerecordid><originalsourceid>FETCH-LOGICAL-c446t-2c11b127f08b59b7fa18c91187b2e186b370f1e2bb72ee51b2b669f6e733f5693</originalsourceid><addsrcrecordid>eNqNkUtLAzEUhYMoWh8_wI3Myt3U3GTyGF1J0VoQFKzrMEnvSMo8ajJT6L93SovuVDhwN985i_sRcgl0DBTkzXK8rO2YUSrGlA9RB2QEVOepllwfkhGljKVMc3lCTmNc0gHkmT4mJyC0yKTQIzJ_De1HwBiTtkzeutC7rg9FlUyxaWvvYjJrfOeLzq8x3ib3zZCi2kS_49tqjYtkXoQP7L7bGM_JUVlUES_294y8Pz7MJ0_p88t0Nrl_Tl2WyS5lDsACUyXVVuRWlQVolwNoZRmClpYrWgIyaxVDFGCZlTIvJSrOSyFzfkaud7ur0H72GDtT--iwqooG2z4aqZQWmqo_Qch1rrX4B6h4JpjIBhB2oAttjAFLswq-LsLGADVbOWZpBjlmK8dQPmQ7frUf722Ni5_G3sYA3O0AHJ629hhMdB4bhwsf0HVm0fpf5r8AXbWe8g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>17345254</pqid></control><display><type>article</type><title>Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures</title><source>MEDLINE</source><source>Access via ScienceDirect (Elsevier)</source><creator>Todd, Annabel E. ; Marsden, Russell L. ; Thornton, Janet M. ; Orengo, Christine A.</creator><creatorcontrib>Todd, Annabel E. ; Marsden, Russell L. ; Thornton, Janet M. ; Orengo, Christine A.</creatorcontrib><description>The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. 
Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. 
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.</description><identifier>ISSN: 0022-2836</identifier><identifier>EISSN: 1089-8638</identifier><identifier>DOI: 10.1016/j.jmb.2005.03.037</identifier><identifier>PMID: 15854658</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>Animals ; Computational Biology - trends ; Databases, Protein ; fold ; Genome ; Genomics - methods ; Humans ; novelty ; Protein Conformation ; protein structure ; Sequence Analysis, Protein ; structural genomics ; Structural Homology, Protein ; superfamily</subject><ispartof>Journal of molecular biology, 2005-05, Vol.348 (5), p.1235-1260</ispartof><rights>2005 Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c446t-2c11b127f08b59b7fa18c91187b2e186b370f1e2bb72ee51b2b669f6e733f5693</citedby><cites>FETCH-LOGICAL-c446t-2c11b127f08b59b7fa18c91187b2e186b370f1e2bb72ee51b2b669f6e733f5693</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.jmb.2005.03.037$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/15854658$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Todd, Annabel E.</creatorcontrib><creatorcontrib>Marsden, Russell L.</creatorcontrib><creatorcontrib>Thornton, Janet M.</creatorcontrib><creatorcontrib>Orengo, Christine A.</creatorcontrib><title>Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures</title><title>Journal of molecular biology</title><addtitle>J Mol Biol</addtitle><description>The explosion in gene 
sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. 
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.</description><subject>Animals</subject><subject>Computational Biology - trends</subject><subject>Databases, Protein</subject><subject>fold</subject><subject>Genome</subject><subject>Genomics - methods</subject><subject>Humans</subject><subject>novelty</subject><subject>Protein Conformation</subject><subject>protein structure</subject><subject>Sequence Analysis, Protein</subject><subject>structural genomics</subject><subject>Structural Homology, Protein</subject><subject>superfamily</subject><issn>0022-2836</issn><issn>1089-8638</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkUtLAzEUhYMoWh8_wI3Myt3U3GTyGF1J0VoQFKzrMEnvSMo8ajJT6L93SovuVDhwN985i_sRcgl0DBTkzXK8rO2YUSrGlA9RB2QEVOepllwfkhGljKVMc3lCTmNc0gHkmT4mJyC0yKTQIzJ_De1HwBiTtkzeutC7rg9FlUyxaWvvYjJrfOeLzq8x3ib3zZCi2kS_49tqjYtkXoQP7L7bGM_JUVlUES_294y8Pz7MJ0_p88t0Nrl_Tl2WyS5lDsACUyXVVuRWlQVolwNoZRmClpYrWgIyaxVDFGCZlTIvJSrOSyFzfkaud7ur0H72GDtT--iwqooG2z4aqZQWmqo_Qch1rrX4B6h4JpjIBhB2oAttjAFLswq-LsLGADVbOWZpBjlmK8dQPmQ7frUf722Ni5_G3sYA3O0AHJ629hhMdB4bhwsf0HVm0fpf5r8AXbWe8g</recordid><startdate>20050520</startdate><enddate>20050520</enddate><creator>Todd, Annabel E.</creator><creator>Marsden, Russell L.</creator><creator>Thornton, Janet M.</creator><creator>Orengo, Christine A.</creator><general>Elsevier Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>7QL</scope><scope>C1K</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>20050520</creationdate><title>Progress of Structural Genomics Initiatives: An 
Analysis of Solved Target Structures</title><author>Todd, Annabel E. ; Marsden, Russell L. ; Thornton, Janet M. ; Orengo, Christine A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c446t-2c11b127f08b59b7fa18c91187b2e186b370f1e2bb72ee51b2b669f6e733f5693</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Animals</topic><topic>Computational Biology - trends</topic><topic>Databases, Protein</topic><topic>fold</topic><topic>Genome</topic><topic>Genomics - methods</topic><topic>Humans</topic><topic>novelty</topic><topic>Protein Conformation</topic><topic>protein structure</topic><topic>Sequence Analysis, Protein</topic><topic>structural genomics</topic><topic>Structural Homology, Protein</topic><topic>superfamily</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Todd, Annabel E.</creatorcontrib><creatorcontrib>Marsden, Russell L.</creatorcontrib><creatorcontrib>Thornton, Janet M.</creatorcontrib><creatorcontrib>Orengo, Christine A.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of molecular biology</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Todd, Annabel E.</au><au>Marsden, Russell L.</au><au>Thornton, Janet M.</au><au>Orengo, Christine A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures</atitle><jtitle>Journal of molecular biology</jtitle><addtitle>J Mol Biol</addtitle><date>2005-05-20</date><risdate>2005</risdate><volume>348</volume><issue>5</issue><spage>1235</spage><epage>1260</epage><pages>1235-1260</pages><issn>0022-2836</issn><eissn>1089-8638</eissn><abstract>The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. 
For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.</abstract><cop>England</cop><pub>Elsevier Ltd</pub><pmid>15854658</pmid><doi>10.1016/j.jmb.2005.03.037</doi><tpages>26</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0022-2836
ispartof	Journal of molecular biology, 2005-05, Vol.348 (5), p.1235-1260
issn	0022-2836 1089-8638
language	eng
recordid	cdi_proquest_miscellaneous_67785807
source	MEDLINE; Access via ScienceDirect (Elsevier)
subjects	Animals ; Computational Biology - trends ; Databases, Protein ; fold ; Genome ; Genomics - methods ; Humans ; novelty ; Protein Conformation ; protein structure ; Sequence Analysis, Protein ; structural genomics ; Structural Homology, Protein ; superfamily
title	Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T09%3A38%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Progress%20of%20Structural%20Genomics%20Initiatives:%20An%20Analysis%20of%20Solved%20Target%20Structures&rft.jtitle=Journal%20of%20molecular%20biology&rft.au=Todd,%20Annabel%20E.&rft.date=2005-05-20&rft.volume=348&rft.issue=5&rft.spage=1235&rft.epage=1260&rft.pages=1235-1260&rft.issn=0022-2836&rft.eissn=1089-8638&rft_id=info:doi/10.1016/j.jmb.2005.03.037&rft_dat=%3Cproquest_cross%3E19898857%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=17345254&rft_id=info:pmid/15854658&rft_els_id=S0022283605003190&rfr_iscdi=true