Defining an informativeness metric for clustering gene expression data

Bibliographic details
Published in:	Bioinformatics, 2011-04, Vol. 27 (8), pp. 1094-1100
Main authors:	Mar, Jessica C; Wells, Christine A; Quackenbush, John
Format:	Article
Language:	English
Subjects:	Bioinformatics; Biological and medical sciences; Cluster Analysis; Data Interpretation, Statistical; Fundamental and applied biological sciences. Psychology; Gene Expression; Gene Expression Profiling - methods; General aspects; Humans; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Oligonucleotide Array Sequence Analysis; Original Papers

Description
Abstract:	Unsupervised 'cluster' analysis is an invaluable tool for exploratory microarray data analysis, as it organizes the data into groups of genes or samples whose elements share common patterns. Once the data are clustered, finding the optimal number of informative subgroups within a dataset is important for understanding the underlying phenotypes, yet there is no robust, widely accepted solution to this problem. To address it, we developed an 'informativeness metric', based on a simple analysis of variance statistic, that identifies the number of clusters that best separates phenotypic groups. We tested the informativeness metric on both experimental and simulated datasets and contrast these results with those obtained using alternative methods such as the gap statistic. The method is implemented in the Bioconductor R package attract; it is also freely available from http://compbio.dfci.harvard.edu/pubs/attract_1.0.1.zip. Contact: jess@jimmy.harvard.edu; johnq@jimmy.harvard.edu. Supplementary data are available at Bioinformatics online.
ISSN:	1367-4803 1367-4811 1460-2059
DOI:	10.1093/bioinformatics/btr074
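The abstract describes the metric only at a high level: an analysis of variance statistic used to judge how well clusters separate phenotypic groups. The Python sketch below illustrates just that ingredient, under stated assumptions: genes have already been assigned to clusters by some clustering method, each cluster is summarized by its mean expression profile across samples, and a one-way ANOVA F-statistic measures how well that profile separates the sample phenotype labels. The helper names (`anova_f`, `cluster_profile_f`) and the toy data are hypothetical; this is not the attract package's API, and it does not reproduce how the published informativeness metric turns such scores into a choice of the number of clusters.

```python
from statistics import fmean


def anova_f(values, groups):
    """One-way ANOVA F-statistic for `values` partitioned by the labels in `groups`."""
    by_group = {}
    for value, label in zip(values, groups):
        by_group.setdefault(label, []).append(value)
    n, k = len(values), len(by_group)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = fmean(values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(xs) * (fmean(xs) - grand_mean) ** 2 for xs in by_group.values())
    # Within-group sum of squares: spread of samples around their own group mean.
    ss_within = 0.0
    for xs in by_group.values():
        m = fmean(xs)
        ss_within += sum((x - m) ** 2 for x in xs)
    if ss_within == 0:
        # Perfectly tight groups: infinite F if the means differ, otherwise 0.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def cluster_profile_f(expr, gene_clusters, phenotypes):
    """Hypothetical helper: F-statistic of each gene cluster's mean expression profile.

    expr          -- dict gene -> list of expression values, one per sample
    gene_clusters -- dict gene -> cluster id (assumed output of any clustering method)
    phenotypes    -- list of phenotype labels, one per sample
    """
    members = {}
    for gene, cluster in gene_clusters.items():
        members.setdefault(cluster, []).append(expr[gene])
    scores = {}
    for cluster, rows in members.items():
        # Average the member genes sample by sample to get one profile per cluster.
        profile = [fmean(column) for column in zip(*rows)]
        scores[cluster] = anova_f(profile, phenotypes)
    return scores


# Toy data (invented for illustration): g1 and g2 track the phenotype; g3 and g4 are flat.
phenotypes = ["A", "A", "A", "B", "B", "B"]
expr = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "g2": [1.1, 0.8, 1.0, 3.2, 2.8, 3.0],
    "g3": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "g4": [2.1, 1.9, 2.0, 1.9, 2.0, 2.1],
}

# Candidate clusterings with one and two gene clusters.
one_cluster = cluster_profile_f(expr, {gene: 0 for gene in expr}, phenotypes)
two_clusters = cluster_profile_f(expr, {"g1": 0, "g2": 0, "g3": 1, "g4": 1}, phenotypes)
print(one_cluster)
print(two_clusters)
```

In this toy example the two-cluster split yields one cluster whose profile separates the phenotypes much more strongly than the single merged cluster does, and one flat cluster that scores near zero. Weighing such per-cluster scores to decide on the number of clusters is the question the paper's informativeness metric is designed to answer.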