Filtering erroneous protein annotation

Bibliographic details
Published in:	Bioinformatics 2004-08, Vol.20 (suppl-1), p.i342-i347
Main authors:	Wieser, D., Kretschmann, E., Apweiler, R.
Format:	Article
Language:	English
Subjects:	Algorithms; Amino Acid Sequence; Databases, Protein; Documentation - methods; Information Storage and Retrieval - methods; Molecular Sequence Data; Proteins - chemistry; Proteins - classification; Sequence Analysis, Protein - methods; Software
Online access:	Full text

Description
Abstract:	Motivation: Automatically generated annotation on protein data of UniProt (Universal Protein Resource) is planned to be publicly available on the UniProt web pages in April 2004. It is expected that the data content of over 500 000 protein entries in the TrEMBL section will be enhanced by the output of an automated annotation pipeline. However, a part of the automatically added data will be erroneous, as are parts of the information coming from other sources. We present a post-processing system called Xanthippe that is based on a simple exclusion mechanism and a decision tree approach using the C4.5 data-mining algorithm. Results: It is shown that Xanthippe detects and flags a large part of the annotation errors and considerably increases the reliability of both automatically generated data and annotation from other sources. As a cross-validation to Swiss-Prot shows, errors in protein descriptions, comments and keywords are successfully filtered out. Xanthippe is a contradictive application that can be combined seamlessly with predictive systems. It can be used either to improve the precision of automated annotation at a constant level of recall or increase the recall at a constant level of precision. Availability: The application of the Xanthippe rules can be browsed at http://www.ebi.uniprot.org/
ISSN:	1367-4803, 1460-2059, 1367-4811
DOI:	10.1093/bioinformatics/bth938