IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices

Bibliographic Details
Published in:	Bioinformatics 1999-12, Vol.15 (12), p.1000-1011
Main authors:	Schäffer, Alejandro A.; Wolf, Yuri I.; Ponting, Chris P.; Koonin, Eugene V.; Aravind, L.; Altschul, Stephen F.
Format:	Article
Language:	English
Subjects:	Algorithms; Bacterial Proteins - genetics; Biological and medical sciences; Databases, Factual; False Negative Reactions; False Positive Reactions; Fundamental and applied biological sciences. Psychology; General aspects; Information Storage and Retrieval - methods; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Odds Ratio; Sequence Alignment; Sequence Analysis, Protein - methods; Sequence Homology; Software

Description
Abstract:	Motivation: Many studies have shown that database searches using position-specific score matrices (PSSMs) or profiles as queries are more effective at identifying distant protein relationships than are searches that use simple sequences as queries. One popular program for constructing a PSSM and comparing it with a database of sequences is Position-Specific Iterated BLAST (PSI-BLAST). Results: This paper describes a new software package, IMPALA, designed for the complementary procedure of comparing a single query sequence with a database of PSI-BLAST-generated PSSMs. We illustrate the use of IMPALA to search a database of PSSMs for protein folds, and one for protein domains involved in signal transduction. IMPALA's sensitivity to distant biological relationships is very similar to that of PSI-BLAST. However, IMPALA employs a more refined analysis of statistical significance and, unlike PSI-BLAST, guarantees the output of the optimal local alignment by using the rigorous Smith–Waterman algorithm. Also, it is considerably faster when run with a large database of PSSMs than is BLAST or PSI-BLAST when run against the complete non-redundant protein database. Availability: The IMPALA source code, the wolf1187 database, and the aravind105 database are freely available from the NCBI ftp site ncbi.nlm.nih.gov. The databases may be found in the subdirectory ftp://ncbi.nlm.nih.gov/pub/impala. The source code is in ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools. Some IMPALA executables for different implementations of UNIX are in ftp://ncbi.nlm.nih.gov/blast/executables. IMPALA has been added as a search option on the Blocks Database Server (http://blocks.fhcrc.org/blocks/impala.html) using a library of PSSMs derived from the BLOCKS database. Contact: schaffer@helix.nih.gov
ISSN:	1367-4803 1460-2059 1367-4811
DOI:	10.1093/bioinformatics/15.12.1000
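
The abstract notes that IMPALA aligns a query sequence to each PSSM with the rigorous Smith–Waterman algorithm. The sketch below illustrates that general idea only and is not IMPALA's code. The function name `pssm_local_score`, the toy matrix, the linear gap penalty, and the default score of -1 for residues missing from a column are all assumptions made for illustration. A production tool would typically use affine gap costs and a full 20-residue matrix.

```python
from typing import Dict, List


def pssm_local_score(query: str, pssm: List[Dict[str, int]], gap: int = 4) -> int:
    """Best local alignment score of `query` against `pssm`.

    Smith-Waterman recurrence with a linear gap penalty (an assumed
    simplification). pssm[i] maps a residue to its score at position i.
    """
    m = len(query)
    prev = [0] * (m + 1)  # scores for the previous PSSM position
    best = 0
    for column in pssm:
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            # Align query residue j with this PSSM position.
            # Residues absent from the column get a hypothetical score of -1.
            match = prev[j - 1] + column.get(query[j - 1], -1)
            # The floor at 0 lets a local alignment restart anywhere.
            cur[j] = max(0, match, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical three-position PSSM over a two-letter alphabet.
toy_pssm = [
    {"A": 5, "C": -2},
    {"A": -1, "C": 6},
    {"A": 4, "C": -3},
]

print(pssm_local_score("CACAC", toy_pssm))
```

Searching a PSSM library in this style would mean calling the function once per matrix and ranking the results. Raw scores from different matrices are not directly comparable, which is why the abstract stresses the analysis of statistical significance.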