Exploring the sequence‐structure protein landscape in the glycosyltransferase family

To understand the molecular basis of glycosyltransferases' (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBa...

Bibliographic details
Published in: Protein science 2003-10, Vol.12 (10), p.2291-2302
Main authors: Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 2302
container_issue 10
container_start_page 2291
container_title Protein science
container_volume 12
creator Zhang, Ziding
Kochhar, Sunil
Grigorov, Martin
description To understand the molecular basis of glycosyltransferases' (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D‐PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence‐similarity search methods (i.e., BLAST and PSI‐BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI‐BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family.
doi_str_mv 10.1110/ps.03131303
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2366918</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>75700698</sourcerecordid><originalsourceid>FETCH-LOGICAL-c4201-3bc27066b40a25b4702281da1dc91b1f16f3b9f7f1a54fa6adcdf4de6bf38163</originalsourceid><addsrcrecordid>eNp9kc9K5EAQxhtRdPxz8r7k5EWiVd2ZTnJZEHFVEBQR8dZ0OtVjlp4kdie6ue0j7DP6JGaYUdeL1KEo6sdXX1Uxto9whIhw3IYjEDgGiDU2wUTmcZbLh3U2gVxinAmZbbHtEH4DQIJcbLItTKYAWZZO2P3Zn9Y1vqpnUfdIUaCnnmpDr3__hc73pus9Ra1vOqrqyOm6DEa3FI3Fgp65wTRhcJ3XdbDkdaDI6nnlhl22YbULtLfKO-zu19nd6UV8dX1-eXpyFZuEA8aiMDwFKYsENJ8WSQqcZ1hqLE2OBVqUVhS5TS3qaWK11KUpbVKSLKzIUIod9nMp2_bFnEpD9WjFqdZXc-0H1ehKfe3U1aOaNc-KCylzzEaBg5WAb8bNQ6fmVTDkxlWp6YNKpymAzBfg4RI0vgnBk_0YgqAWb1BtUO9vGOkf__v6ZFd3HwG-BF4qR8N3Wurm9ho55zmKN4zBlqk</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>75700698</pqid></control><display><type>article</type><title>Exploring the sequence‐structure protein landscape in the glycosyltransferase family</title><source>Wiley-Blackwell Journals</source><source>MEDLINE</source><source>Wiley Free Archive</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><source>EZB Electronic Journals Library</source><creator>Zhang, Ziding ; Kochhar, Sunil ; Grigorov, Martin</creator><creatorcontrib>Zhang, Ziding ; Kochhar, Sunil ; Grigorov, Martin</creatorcontrib><description>To understand the molecular basis of glycosyltransferases' (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. 
Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D‐PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence‐similarity search methods (i.e., BLAST and PSI‐BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI‐BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. 
Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family.</description><identifier>ISSN: 0961-8368</identifier><identifier>EISSN: 1469-896X</identifier><identifier>DOI: 10.1110/ps.03131303</identifier><identifier>PMID: 14500887</identifier><language>eng</language><publisher>Bristol: Cold Spring Harbor Laboratory Press</publisher><subject>Algorithms ; Amino Acid Sequence ; Computational Biology - methods ; Confidence Intervals ; Databases, Nucleic Acid ; Databases, Protein ; fold recognition ; Glycosyltransferase ; Glycosyltransferases - chemistry ; Glycosyltransferases - genetics ; Protein Folding ; protein structure prediction ; Protein Structure, Tertiary ; Sequence Alignment ; sequence‐similarity searching ; structural genomics ; Structural Homology, Protein</subject><ispartof>Protein science, 2003-10, Vol.12 (10), p.2291-2302</ispartof><rights>Copyright © 2003 The Protein Society</rights><rights>Copyright © Copyright 2003 The Protein Society</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c4201-3bc27066b40a25b4702281da1dc91b1f16f3b9f7f1a54fa6adcdf4de6bf38163</citedby><cites>FETCH-LOGICAL-c4201-3bc27066b40a25b4702281da1dc91b1f16f3b9f7f1a54fa6adcdf4de6bf38163</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2366918/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2366918/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,723,776,780,881,1411,1427,27903,27904,45553,45554,46387,46811,53769,53771</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/14500887$$D View this record in 
MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhang, Ziding</creatorcontrib><creatorcontrib>Kochhar, Sunil</creatorcontrib><creatorcontrib>Grigorov, Martin</creatorcontrib><title>Exploring the sequence‐structure protein landscape in the glycosyltransferase family</title><title>Protein science</title><addtitle>Protein Sci</addtitle><description>To understand the molecular basis of glycosyltransferases' (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D‐PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence‐similarity search methods (i.e., BLAST and PSI‐BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI‐BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. 
Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family.</description><subject>Algorithms</subject><subject>Amino Acid Sequence</subject><subject>Computational Biology - methods</subject><subject>Confidence Intervals</subject><subject>Databases, Nucleic Acid</subject><subject>Databases, Protein</subject><subject>fold recognition</subject><subject>Glycosyltransferase</subject><subject>Glycosyltransferases - chemistry</subject><subject>Glycosyltransferases - genetics</subject><subject>Protein Folding</subject><subject>protein structure prediction</subject><subject>Protein Structure, Tertiary</subject><subject>Sequence Alignment</subject><subject>sequence‐similarity searching</subject><subject>structural genomics</subject><subject>Structural Homology, Protein</subject><issn>0961-8368</issn><issn>1469-896X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2003</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kc9K5EAQxhtRdPxz8r7k5EWiVd2ZTnJZEHFVEBQR8dZ0OtVjlp4kdie6ue0j7DP6JGaYUdeL1KEo6sdXX1Uxto9whIhw3IYjEDgGiDU2wUTmcZbLh3U2gVxinAmZbbHtEH4DQIJcbLItTKYAWZZO2P3Zn9Y1vqpnUfdIUaCnnmpDr3__hc73pus9Ra1vOqrqyOm6DEa3FI3Fgp65wTRhcJ3XdbDkdaDI6nnlhl22YbULtLfKO-zu19nd6UV8dX1-eXpyFZuEA8aiMDwFKYsENJ8WSQqcZ1hqLE2OBVqUVhS5TS3qaWK11KUpbVKSLKzIUIod9nMp2_bFnEpD9WjFqdZXc-0H1ehKfe3U1aOaNc-KCylzzEaBg5WAb8bNQ6fmVTDkxlWp6YNKpymAzBfg4RI0vgnBk_0YgqAWb1BtUO9vGOkf__v6ZFd3HwG-BF4qR8N3Wurm9ho55zmKN4zBlqk</recordid><startdate>200310</startdate><enddate>200310</enddate><creator>Zhang, Ziding</creator><creator>Kochhar, Sunil</creator><creator>Grigorov, Martin</creator><general>Cold Spring Harbor Laboratory 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>200310</creationdate><title>Exploring the sequence‐structure protein landscape in the glycosyltransferase family</title><author>Zhang, Ziding ; Kochhar, Sunil ; Grigorov, Martin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c4201-3bc27066b40a25b4702281da1dc91b1f16f3b9f7f1a54fa6adcdf4de6bf38163</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Algorithms</topic><topic>Amino Acid Sequence</topic><topic>Computational Biology - methods</topic><topic>Confidence Intervals</topic><topic>Databases, Nucleic Acid</topic><topic>Databases, Protein</topic><topic>fold recognition</topic><topic>Glycosyltransferase</topic><topic>Glycosyltransferases - chemistry</topic><topic>Glycosyltransferases - genetics</topic><topic>Protein Folding</topic><topic>protein structure prediction</topic><topic>Protein Structure, Tertiary</topic><topic>Sequence Alignment</topic><topic>sequence‐similarity searching</topic><topic>structural genomics</topic><topic>Structural Homology, Protein</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Ziding</creatorcontrib><creatorcontrib>Kochhar, Sunil</creatorcontrib><creatorcontrib>Grigorov, Martin</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Protein science</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Ziding</au><au>Kochhar, Sunil</au><au>Grigorov, Martin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Exploring the sequence‐structure protein landscape in the glycosyltransferase family</atitle><jtitle>Protein science</jtitle><addtitle>Protein Sci</addtitle><date>2003-10</date><risdate>2003</risdate><volume>12</volume><issue>10</issue><spage>2291</spage><epage>2302</epage><pages>2291-2302</pages><issn>0961-8368</issn><eissn>1469-896X</eissn><abstract>To understand the molecular basis of glycosyltransferases' (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D‐PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence‐similarity search methods (i.e., BLAST and PSI‐BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI‐BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. 
Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family.</abstract><cop>Bristol</cop><pub>Cold Spring Harbor Laboratory Press</pub><pmid>14500887</pmid><doi>10.1110/ps.03131303</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0961-8368
ispartof Protein science, 2003-10, Vol.12 (10), p.2291-2302
issn 0961-8368
1469-896X
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2366918
source Wiley-Blackwell Journals; MEDLINE; Wiley Free Archive; PubMed Central; Free Full-Text Journals in Chemistry; EZB Electronic Journals Library
subjects Algorithms
Amino Acid Sequence
Computational Biology - methods
Confidence Intervals
Databases, Nucleic Acid
Databases, Protein
fold recognition
Glycosyltransferase
Glycosyltransferases - chemistry
Glycosyltransferases - genetics
Protein Folding
protein structure prediction
Protein Structure, Tertiary
Sequence Alignment
sequence‐similarity searching
structural genomics
Structural Homology, Protein
title Exploring the sequence‐structure protein landscape in the glycosyltransferase family
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T18%3A06%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploring%20the%20sequence%E2%80%90structure%20protein%20landscape%20in%20the%20glycosyltransferase%20family&rft.jtitle=Protein%20science&rft.au=Zhang,%20Ziding&rft.date=2003-10&rft.volume=12&rft.issue=10&rft.spage=2291&rft.epage=2302&rft.pages=2291-2302&rft.issn=0961-8368&rft.eissn=1469-896X&rft_id=info:doi/10.1110/ps.03131303&rft_dat=%3Cproquest_pubme%3E75700698%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=75700698&rft_id=info:pmid/14500887&rfr_iscdi=true
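
The first two steps described in the abstract (grouping sequences into clusters by sequence similarity alone, then picking one representative per cluster) can be illustrated with a minimal sketch. This is not the authors' procedure: the record does not say which clustering tool or cutoff was used, so the greedy scheme, the `threshold` value, the function names, and the toy sequences below are all assumptions, and `difflib.SequenceMatcher` is only a crude stand-in for a real alignment-based similarity score such as one derived from BLAST.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for a sequence-similarity score in [0, 1].

    Assumption: a real pipeline would use an alignment-based score instead.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_sequences(seqs: dict[str, str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy clustering by similarity to each cluster's first member.

    A sequence joins the first cluster whose first member it matches at or
    above `threshold` (a hypothetical cutoff); otherwise it starts a new cluster.
    """
    clusters: list[list[str]] = []
    for name, seq in seqs.items():
        for members in clusters:
            if similarity(seq, seqs[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([name])
    return clusters


def representatives(clusters: list[list[str]]) -> list[str]:
    """Pick the first member of each cluster as its representative."""
    return [members[0] for members in clusters]


# Toy, made-up sequences used purely for illustration.
toy = {
    "gtf1": "MKTLLVAGGAGFIGSHL",
    "gtf2": "MKTLLVAGGAGFIGSHV",
    "gtf3": "MSDQERWPYNCCHHTTA",
}
clusters = cluster_sequences(toy)
reduced_set = representatives(clusters)
```

In a workflow like the one the abstract outlines, only the reduced set would then be submitted to the fold recognition methods, on the premise that a sufficiently high intracluster similarity implies a shared fold within each cluster.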