A legume specific protein database (LegProt) improves the number of identified peptides, confidence scores and overall protein identification success rates for legume proteomics

Bibliographic details
Published in: Phytochemistry (Oxford) 2011-07, Vol. 72 (10), p. 1020-1027
Main authors: Lei, Zhentian; Dai, Xinbin; Watson, Bonnie S.; Zhao, Patrick X.; Sumner, Lloyd W.
Format: Article
Language: English
Description
Summary: A legume specific protein database derived from legume genomic sequences, tentative consensus sequences and singleton ESTs significantly increases confidence levels and success rates of legume protein identification.

► A legume specific protein database (LegProt) was constructed.
► LegProt was assembled using sequence data from seven legumes.
► The utility of LegProt was assessed and compared to the NCBI nr.
► LegProt improved protein identification success rates and confidence levels.
► LegProt is publicly available at http://bioinfo.noble.org/manuscript-support/legumedb.

A legume specific protein database (LegProt) has been created containing sequences from seven legume species: Glycine max, Lotus japonicus, Medicago sativa, Medicago truncatula, Lupinus albus, Phaseolus vulgaris, and Pisum sativum. The database consists of amino acid sequences translated from predicted gene models, together with 6-frame translations of tentative consensus (TC) sequences assembled from expressed sequence tags (ESTs) and of singleton ESTs. This database was queried using mass spectral data for protein identification, and identification success rates were compared to those obtained with the NCBI nr database. Specifically, Mascot MS/MS ion searches of tandem nano-LC Q-TOFMS/MS mass spectral data showed that, relative to the NCBI nr protein database, the LegProt database yielded a 54% increase in the average protein score (from 480 with NCBI nr to 739 with LegProt) and a 50% increase in the average number of matched peptides (from 8 with NCBI nr to 12 with LegProt). The overall identification success rate also increased from 88% (NCBI nr) to 93% (LegProt). Mascot peptide mass fingerprinting (PMF) searches of the LegProt database using MALDI-TOFMS data yielded a significant increase in the identification success rate, from 19% (NCBI nr) to 34% (LegProt), while the average scores and average number of matched peptides did not change significantly.

The results demonstrate that the LegProt database significantly increases legume protein identification success rates and confidence levels compared to the commonly used NCBI nr. These improvements are primarily due to the presence of a large number of legume specific TC sequences in the LegProt database that were not found in NCBI nr. The LegProt database is freely available for download (http://bioinfo.noble.org/manuscript-support/legumedb) and will serve as a valuable resource for legume proteomics.
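
The 6-frame translation mentioned in the summary can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the standard genetic code is assumed, and codons containing ambiguous bases are rendered as "X". It only shows the general idea of reading a nucleotide sequence (such as a TC or singleton EST) in three offsets on each strand.

```python
from itertools import product

# Standard genetic code (assumed here), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Hypothetical helper: map frame labels (+1..+3, -1..-3) to peptides."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, peptide in six_frame_translations("ATGGCCTAA").items():
        print(label, peptide)
```

Stop codons appear as "*" in the output; a real database build would typically split each frame at stop codons and discard very short fragments, but those steps are left out of this sketch.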
ISSN: 0031-9422, 1873-3700
DOI: 10.1016/j.phytochem.2011.01.026