Improvement of the GenTHREADER method for genomic fold recognition

Bibliographic details
Published in:	Bioinformatics 2003-05, Vol.19 (7), p.874-881
Main authors:	McGuffin, Liam J., Jones, David T.
Format:	Article
Language:	English
Subjects:	Amino Acid Sequence; Animals; Biological and medical sciences; Databases, Protein; False Positive Reactions; Fundamental and applied biological sciences. Psychology; General aspects; Genome; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Molecular Sequence Data; Protein Folding; Protein Structure, Secondary; Proteins - chemistry; Proteins - classification; Quality Control; Reproducibility of Results; Sensitivity and Specificity; Sequence Alignment - methods; Sequence Alignment - standards; Sequence Analysis, Protein; Sequence Homology; Software

Description
Abstract:	Motivation: In order to enhance genome annotation, the fully automatic fold recognition method GenTHREADER has been improved and benchmarked. The previous version of GenTHREADER consisted of a simple neural network trained to combine sequence alignment score, length information and energy potentials derived from threading into a single score representing the relationship between two proteins, as designated by CATH. The improved version incorporates PSI-BLAST searches that have been jumpstarted with structural alignment profiles from FSSP, and it now also uses PSIPRED-predicted secondary structure and bi-directional scoring to calculate the final alignment score. Pairwise potentials and solvation potentials are calculated from the given sequence alignment and are then used, along with the alignment score, alignment length and sequence length, as inputs to a multi-layer, feed-forward neural network. The neural network has also been expanded to accept the secondary structure element alignment (SSEA) score as an extra input, and it is now trained to learn the FSSP Z-score as a measure of similarity between two proteins. Results: The improvements made to GenTHREADER increase the number of remote homologues that can be detected with a low error rate, implying a more reliable score, whilst also increasing the quality of the models produced. We find that up to five times as many true positives can be detected per query with a low error rate. Using the improved method, the total MaxSub score is doubled at low false positive rates. Availability: http://www.psipred.net Contact: l.mcguffin@cs.ucl.ac.uk
ISSN:	1367-4803 1460-2059 1367-4811
DOI:	10.1093/bioinformatics/btg097
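The abstract describes a multi-layer, feed-forward neural network that takes per-pair scores (alignment score, alignment length, sequence length, pairwise and solvation potentials, SSEA score) and is trained to reproduce the FSSP Z-score. The following is a minimal, illustrative sketch of that kind of regressor, not the published GenTHREADER network. The feature names and their order, the hidden-layer size, the sigmoid activation, plain stochastic gradient descent on squared error, and the toy input values are all assumptions made for this example. Real inputs would also need to be normalised first.

```python
import math
import random
from typing import List

# Hypothetical feature order (an assumption, not the published input layout).
FEATURES = ["alignment_score", "alignment_length", "sequence_length",
            "pair_potential", "solvation_potential", "ssea_score"]


class TinyRegressor:
    """One hidden layer of sigmoid units feeding a single linear output."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    @staticmethod
    def _sigmoid(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))

    def _hidden(self, x: List[float]) -> List[float]:
        # Hidden activations h_j = sigmoid(w1_j . x + b1_j).
        return [self._sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x: List[float]) -> float:
        # Linear output: a Z-score-like similarity estimate.
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x: List[float], target: float, lr: float = 0.05) -> float:
        """One SGD step on L = 0.5 * (y - target)**2; returns the pre-update loss."""
        h = self._hidden(x)
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        err = y - target  # dL/dy
        for j, hj in enumerate(h):
            # Gradient at hidden unit j, computed with the pre-update output weight.
            grad_h = err * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err
        return 0.5 * err * err


# Usage on one made-up, already-normalised protein pair with a toy target Z-score.
net = TinyRegressor(n_in=len(FEATURES), n_hidden=4)
x = [0.8, 0.6, 0.7, -0.3, -0.2, 0.5]
before = net.train_step(x, target=3.0)
for _ in range(200):
    net.train_step(x, target=3.0)
after = 0.5 * (net.predict(x) - 3.0) ** 2
print(f"loss before: {before:.4f}, after: {after:.6f}")
```

Regressing directly onto a continuous similarity measure, rather than a yes/no class label, yields a graded score that can be ranked and thresholded at a chosen error rate. A real training run would iterate over many protein pairs with known Z-scores instead of a single toy example.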