From protein microarrays to diagnostic antigen discovery: a study of the pathogen Francisella tularensis


Bibliographic details
Published in: Bioinformatics 2007-07, Vol. 23 (13), p. i508-i518
Main authors: Sundaresh, Suman; Randall, Arlo; Unal, Berkay; Petersen, Jeannine M.; Belisle, John T.; Gill Hartley, M.; Duffield, Melanie; Titball, Richard W.; Davies, D. Huw; Felgner, Philip L.; Baldi, Pierre
Format: Article
Language: English
Description
Abstract:
Motivation: An important application of protein microarray data analysis is identifying a serodiagnostic antigen set that can reliably detect patterns and classify antigen expression profiles. This work addresses this problem using antibody responses to protein markers measured by a novel high-throughput microarray technology. The findings from this study have direct relevance to rapid, broad-based diagnostic and vaccine development.
Results: Protein microarray chips are probed with sera from individuals infected with the bacterium Francisella tularensis, a category A biodefense pathogen. A two-step approach to the diagnostic process is presented: (1) feature (antigen) selection and (2) classification, using antigen response measurements obtained from F. tularensis microarrays (244 antigens; 46 infected and 54 healthy human sera measurements). To select antigens, a ranking scheme based on the identification of significant immune responses and differential expression analysis is described. Classification methods including k-nearest neighbors, support vector machines (SVM) and k-means clustering are applied to training data using selected antigen sets of various sizes. SVM-based models yield prediction accuracy rates of approximately 90% on validation data when antigen set sizes are between 25 and 50. These results strongly indicate that the top-ranked antigens can be considered high-priority candidates for diagnostic development.
Availability: All software programs are written in R and available at http://www.igb.uci.edu/index.php?page=tools and at http://www.r-project.org
Contact: pfbaldi@uci.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
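The two-step approach described in the abstract (rank antigens, then classify sera using the top-ranked set) can be illustrated with a minimal sketch. This is not the authors' R software. It assumes a plain Welch t-statistic as a stand-in for their ranking scheme, uses k-nearest neighbors (one of the classifiers the abstract names), and runs on synthetic data. The array dimensions (244 antigens, 46 infected and 54 healthy sera) come from the abstract; the helper names, the number of reactive antigens, the size of the shift, the validation set size and k are all hypothetical.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_antigens(X, y):
    """Antigen indices sorted by |t| between infected (1) and healthy (0) sera."""
    n_antigens = len(X[0])
    scores = []
    for j in range(n_antigens):
        infected = [row[j] for row, label in zip(X, y) if label == 1]
        healthy = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(infected, healthy)))
    return sorted(range(n_antigens), key=lambda j: scores[j], reverse=True)

def knn_predict(train_X, train_y, x, features, k=5):
    """Majority vote among the k nearest training sera, on selected antigens only."""
    dists = sorted(
        (math.dist([row[j] for j in features], [x[j] for j in features]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0

def make_synthetic(n_infected, n_healthy, n_antigens, n_reactive, rng):
    """Hypothetical data: the first n_reactive antigens are raised in infected sera."""
    X, y = [], []
    for label, n in ((1, n_infected), (0, n_healthy)):
        for _ in range(n):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_antigens)]
            if label == 1:
                for j in range(n_reactive):
                    row[j] += 2.0
            X.append(row)
            y.append(label)
    return X, y

rng = random.Random(0)
# Training set sized as in the abstract; validation set size is an assumption.
train_X, train_y = make_synthetic(46, 54, 244, 10, rng)
test_X, test_y = make_synthetic(20, 20, 244, 10, rng)

# Step 1: rank antigens on training data only. Step 2: classify with the top 25.
ranking = rank_antigens(train_X, train_y)
selected = ranking[:25]
predictions = [knn_predict(train_X, train_y, x, selected) for x in test_X]
accuracy = mean(int(p == t) for p, t in zip(predictions, test_y))
print(f"top antigens: {selected[:5]}, validation accuracy: {accuracy:.2f}")
```

Ranking on the training sera alone and scoring on held-out sera keeps the feature selection from leaking into the accuracy estimate. The accuracy printed here reflects only the synthetic data and says nothing about the published results.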
ISSN: 1367-4803, 1460-2059, 1367-4811
DOI: 10.1093/bioinformatics/btm207