Reuse of public genome-wide gene expression data

Bibliographic details
Published in: Nature reviews. Genetics 2013-02, Vol. 14 (2), p. 89-99
Main authors: Rung, Johan; Brazma, Alvis
Format: Article
Language: English
Description
Summary (Key Points): Over the past decade, high-throughput gene expression experiments have generated data from millions of assays. Data sets linked to publications are stored in functional genomics data archives: ArrayExpress at the European Bioinformatics Institute, Gene Expression Omnibus at the US National Center for Biotechnology Information and the DNA Data Bank of Japan Omics Archive. Secondary added-value and topical databases process data from the primary archives, adding analysis and annotation to make these data accessible to every biologist by allowing queries such as 'in which tissue is a particular gene expressed?' or 'which genes are differentially expressed between a particular disease and normal samples?' Public gene expression data are commonly reused to study biological questions, both by reanalysis of primary data and by queries to secondary resources. Approximately half of the studies that use public gene expression data rely solely on existing data without adding newly generated data, and the other half use the public data in combination with new data. The reproducibility of published microarray-based studies is limited, mostly owing to insufficient experiment annotation and sometimes to unavailability of the raw or processed data. Stricter enforcement of the Minimum Information About a Microarray Experiment (MIAME) requirements and the development of easy-to-use experiment annotation tools are needed to achieve better reproducibility. Although most of the public gene expression data are still based on microarray experiments, the contribution of high-throughput-sequencing-based expression studies, known as RNA sequencing (RNA-seq), is growing rapidly. Reuse of RNA-seq data can potentially be even more valuable than reuse of microarray data, partly owing to the costs of experiments and data storage but even more importantly because of the more quantitative nature of sequencing-based expression data.
Community standards such as Minimum Information about Sequencing Experiments (MINSEQE) should be adopted to make RNA-seq data maximally reusable. The bioinformatics resources that store and manage public data are sensitive to short-term funding changes, complicating the maintenance of important databases. The development of long-term infrastructure in bioinformatics, such as the ELIXIR project in Europe, is needed to ensure the long-term availability of public data. A wealth of microarray gene expression data and a growing volume of RNA sequencing da
ISSN: 1471-0056, 1471-0064
DOI: 10.1038/nrg3394