Genomic and evolutionary insights into genes encoding proteins with single amino acid repeats

Mutations causing expansion of amino acid repeats are responsible for 19 hereditary disorders. Repeats in several other proteins also show length variations. These observations prompted us to identify single amino acid repeat-containing proteins (SARPs) in humans and to understand their functional a...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Molecular biology and evolution 2006-07, Vol.23 (7), p.1357-1369
Hauptverfasser: Siwach, Pratibha, Pophaly, Saurabh Dilip, Ganesh, Subramaniam
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Mutations causing expansion of amino acid repeats are responsible for 19 hereditary disorders. Repeats in several other proteins also show length variations. These observations prompted us to identify single amino acid repeat-containing proteins (SARPs) in humans and to understand their functional and evolutionary significance. We identified 8812 SARPs containing 17 146 repeat domains, each harboring 4 or more residues. In all, 5% of SARPs (471) showed repeat length variations, and nearly 84% of them (394) have repeats of 10 residues or less. We find that SARPs are involved in functions that require formation of multiprotein complexes. Nearly 78% (6859) of the SARPs did not find a paralogue in the human proteome, and such proteins are considered as orphan SARPs. Orphan SARPs show longer repeat stretches, longer peptide length, and lower expression levels as compared with SARPs belonging to protein family. Because the intensity of gene expression is known to relate inversely with the rate of protein sequence evolution, our results suggest that the orphan SARPs evolve faster than the familial forms and therefore are under a weaker selection pressure. We also find that while GC-rich codons are favored for coding the repeat tracts of SARPs, specific codons and not nucleotide motifs per se are selected, suggesting functional constraints placed on the usage of codons. One of the constraints could be the mRNA stability as clustering of rare codons is known to destabilize the transcripts and rare codons are not favored for coding repeat tracts. Genes encoding polymorphic SARPs show preferential localization toward the telomeric segments. Further, the sex-specific recombination rates of the chromosomal locus strongly correlate with the parental gender that influence the repeat instability in disorder caused by dynamic mutation. Therefore, instability associated with repeats might be driven by processes that are specific to sperm or oocyte development, and the recombination frequency might play a positive role in this process.
ISSN:0737-4038
1537-1719
DOI:10.1093/molbev/msk022