Exploring the sequence‐structure protein landscape in the glycosyltransferase family
To understand the molecular basis of glycosyltransferases' (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBa...
Published in: | Protein science 2003-10, Vol.12 (10), p.2291-2302 |
---|---|
Main authors: | Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 2302 |
---|---|
container_issue | 10 |
container_start_page | 2291 |
container_title | Protein science |
container_volume | 12 |
creator | Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin |
description | To understand the molecular basis of glycosyltransferases' (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D‐PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence‐similarity search methods (i.e., BLAST and PSI‐BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI‐BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family. |
doi_str_mv | 10.1110/ps.03131303 |
format | Article |
pmid | 14500887 |
publisher | Bristol: Cold Spring Harbor Laboratory Press |
rights | Copyright © 2003 The Protein Society |
fulltext | fulltext |
identifier | ISSN: 0961-8368 |
ispartof | Protein science, 2003-10, Vol.12 (10), p.2291-2302 |
issn | 0961-8368; 1469-896X |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2366918 |
source | Wiley-Blackwell Journals; MEDLINE; Wiley Free Archive; PubMed Central; Free Full-Text Journals in Chemistry; EZB Electronic Journals Library |
subjects | Algorithms; Amino Acid Sequence; Computational Biology - methods; Confidence Intervals; Databases, Nucleic Acid; Databases, Protein; fold recognition; Glycosyltransferase; Glycosyltransferases - chemistry; Glycosyltransferases - genetics; Protein Folding; protein structure prediction; Protein Structure, Tertiary; Sequence Alignment; sequence‐similarity searching; structural genomics; Structural Homology, Protein |
title | Exploring the sequence‐structure protein landscape in the glycosyltransferase family |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T18%3A06%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploring%20the%20sequence%E2%80%90structure%20protein%20landscape%20in%20the%20glycosyltransferase%20family&rft.jtitle=Protein%20science&rft.au=Zhang,%20Ziding&rft.date=2003-10&rft.volume=12&rft.issue=10&rft.spage=2291&rft.epage=2302&rft.pages=2291-2302&rft.issn=0961-8368&rft.eissn=1469-896X&rft_id=info:doi/10.1110/ps.03131303&rft_dat=%3Cproquest_pubme%3E75700698%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=75700698&rft_id=info:pmid/14500887&rfr_iscdi=true |
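
The description field in this record outlines a first step in which sequences are grouped into clusters by sequence similarity and one representative per cluster is passed on to fold recognition. The sketch below illustrates that idea only. It is not the authors' procedure: the greedy strategy, the function names `similarity` and `greedy_cluster`, the threshold value, and the use of Python's `difflib.SequenceMatcher` as a stand-in for a real alignment-based similarity score are all assumptions made for illustration, and the toy sequences are invented.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for an alignment-based score (assumption)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Group sequences greedily; each key is the representative of its cluster.

    Sequences are visited longest first (ties broken by name). A sequence joins
    the first cluster whose representative is at least `threshold` similar,
    otherwise it starts a new cluster and becomes its representative.
    """
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        for rep in clusters:
            if similarity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative was similar enough.
            clusters[name] = [name]
    return clusters


# Hypothetical toy sequences, not taken from the article.
toy = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "GGGGSSSSPP"}
representatives = list(greedy_cluster(toy, threshold=0.8))
```

With a high threshold, members of a cluster are expected to share a fold, so only the representatives (here the keys of the returned dictionary) would need to be submitted to the fold recognition methods named in the description.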