Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd

Bibliographic details
Published in: Nature Communications 2016-09, Vol. 7 (1), p. 12846-11, Article 12846
Main authors: Wang, Zichen; Monteiro, Caroline D.; Jagodnik, Kathleen M.; Fernandez, Nicolas F.; Gundersen, Gregory W.; Rouillard, Andrew D.; Jenkins, Sherry L.; Feldmann, Axel S.; Hu, Kevin S.; McDermott, Michael G.; Duan, Qiaonan; Clark, Neil R.; Jones, Matthew R.; Kou, Yan; Goff, Troy; Woodland, Holly; Amaral, Fabio M R.; Szeto, Gregory L.; Fuchs, Oliver; Schüssler-Fiorenza Rose, Sophia M.; Sharma, Shvetank; Schwartz, Uwe; Bausela, Xabier Bengoetxea; Szymkiewicz, Maciej; Maroulis, Vasileios; Salykin, Anton; Barra, Carolina M.; Kruth, Candice D.; Bongio, Nicholas J.; Mathur, Vaibhav; Todoric, Radmila D; Rubin, Udi E.; Malatras, Apostolos; Fulp, Carl T.; Galindo, John A.; Motiejunaite, Ruta; Jüschke, Christoph; Dishuck, Philip C.; Lahl, Katharina; Jafari, Mohieddin; Aibar, Sara; Zaravinos, Apostolos; Steenhuizen, Linda H.; Allison, Lindsey R.; Gamallo, Pablo; de Andres Segura, Fernando; Dae Devlin, Tyler; Pérez-García, Vicente; Ma’ayan, Avi
Format: Article
Language: English
Online access: Full text
Description
Summary: Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization. A wealth of gene expression data is publicly available, yet is of little use without additional human curation. Ma’ayan and colleagues report a crowdsourcing project involving over 70 participants to annotate and analyse thousands of human disease-related gene expression datasets.
ISSN: 2041-1723
DOI: 10.1038/ncomms12846