Poster: Identification and classification of internal repeats in proteins
Main authors: | , , , |
---|---|
Format: | Conference proceedings |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Internal repeats are widely found in proteins and are considered important in protein evolution and function. Three major types of internal repeats are recognized: domain, solenoid, and fibrous repeats. These repeats may be involved in protein-protein interaction as well as in binding to various ligands such as DNA and RNA. For example, tetratricopeptide repeats (TPR) are involved in cell-cycle regulation, transcriptional regulation, protein transport, and assistance in protein folding, and the TATA-binding protein (TBP) is a transcription factor that binds specifically to a DNA sequence. To identify and classify various types of protein repeats of different lengths from a query protein sequence or structure, we have designed a comprehensive system that analyzes the autocorrelation of sequence content and the topology of secondary structures within a protein. A complete database containing verified fundamental repeat sequence peptides and structural units for homologous matching analysis has also been constructed. The data flow of the proposed identification system comprises two major parts: the Repeat Database and the Internal Repeat Analyzer. The Repeat Database is constructed by evaluating proteins from SCOP and Pfam through an autocorrelation mechanism. The Internal Repeat Analyzer is designed as a three-level hierarchical analysis for detecting domain, solenoid, and fibrous repeats, respectively. In addition, an iteratively refined multiple structure alignment tool has been developed for comparing and verifying the extracted internal repeat substructures. In this study, the collected database contains 162 domain families with repeat characteristics, 28 fundamental repeat structure units, and 129 repeat subsequences retrieved from 1,961 superfamilies, and we demonstrate that the proposed system can efficiently identify the repeat topologies of proteins. |
---|---|
DOI: | 10.1109/ICCABS.2011.5729920 |
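
The summary mentions analyzing the autocorrelation of sequence content but gives no details of the scoring. As a rough illustration only, and not the authors' method, the following Python sketch computes a simple identity-based autocorrelation: for each lag, the fraction of positions whose residue matches the residue that many positions downstream. The function names, the exact-match scoring, and the tie-breaking rule are all assumptions made for this example.

```python
def identity_autocorrelation(seq, max_lag=None):
    """Return {lag: fraction of positions i where seq[i] == seq[i + lag]}.

    Hypothetical helper: exact residue identity is an assumed scoring
    scheme, not the one used by the system described in the summary.
    """
    n = len(seq)
    if max_lag is None:
        max_lag = n // 2
    # Never let the lag reach the sequence length (avoids an empty window).
    max_lag = min(max_lag, n - 1)
    scores = {}
    for lag in range(1, max_lag + 1):
        window = n - lag
        matches = sum(1 for i in range(window) if seq[i] == seq[i + lag])
        scores[lag] = matches / window
    return scores


def best_repeat_period(seq, min_lag=2, max_lag=None):
    """Return the lag with the highest score, or None if no lag qualifies.

    Ties are broken in favor of the shortest lag, so a perfect repeat of
    period p is reported as p rather than as a multiple of p.
    """
    scores = identity_autocorrelation(seq, max_lag)
    candidates = {lag: s for lag, s in scores.items() if lag >= min_lag}
    if not candidates:
        return None
    return max(candidates, key=lambda lag: (candidates[lag], -lag))


if __name__ == "__main__":
    # Toy sequence with an exact three-residue period.
    toy = "GPPGPPGPPGPPGPPGPP"
    print(best_repeat_period(toy))
```

On the toy sequence the score is perfect at lags 3, 6, and 9, and the tie-break reports the shortest of these. Real repeats are rarely exact, so a practical detector would presumably score matches with a substitution matrix and tolerate insertions and deletions, which this sketch does not attempt.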