Determining functional specificity from protein sequences

Bibliographic details
Published in: Bioinformatics, June 2005, Vol. 21 (11), pp. 2629-2635
Authors: Donald, Jason E.; Shakhnovich, Eugene I.
Format: Article
Language: English
Description
Abstract:
Motivation: Given a large family of homologous protein sequences, many methods can divide the family into smaller groups that correspond to the different functions carried out by proteins within the family. One important problem, however, has been the absence of a general method for selecting an appropriate level of granularity, or size of the groups.
Results: We propose a consistent way of choosing the granularity that is independent of the sequence similarity and sequence clustering method used. We study three large, well-investigated protein families: basic leucine zippers, nuclear receptors and proteins with three consecutive C2H2 zinc fingers. Our method is tested against known functional information, the experimentally determined binding specificities, using a simple scoring method. The significance of the groups is also measured by randomizing the data. Finally, we compare our algorithm against a popular method of grouping proteins, the TRIBE-MCL method. In the end, we determine that dividing the families at the proposed level of granularity creates very significant and useful groups of proteins that correspond to the different DNA-binding motifs. We expect that such groupings will be useful in studying not only DNA binding but also other protein interactions.
Contact: shakhnovich@chemistry.harvard.edu
Supplementary information: The supplementary material contains experimental binding specificities, a list of proteins in the proposed clusters, a table listing the percentage of proteins with binding data and from humans, visualizations of nuclear receptor and zinc finger proteins from humans, gene trees for two families, BLAST results and TRIBE-MCL results.
ISSN: 1367-4803, 1460-2059, 1367-4811
DOI: 10.1093/bioinformatics/bti396