Abstract 5386: The Seven Bridges Cancer Genomics Cloud: Enabling reproducible and cost-effective analysis in the cloud

Bibliographic details
Published in: Cancer Research (Chicago, Ill.), 2018-07, Vol. 78 (13_Supplement), p. 5386
Main authors: Jordanski, Milos; Bierman, Robert; Lehnert, Erik; Damljanovic, Ana; Freeman, Eric; Hsieh, Gillian; Salzman, Julia
Format: Article
Language: English
Description
Abstract: Next-generation sequencing has generated petabytes of public data with the potential to significantly advance biomedical research. The Cancer Genome Atlas (TCGA) network alone, for example, has produced more than 2.5 petabytes of data. However, the logistics of accessing such large datasets continue to challenge researchers. Downloading the complete TCGA dataset to a local data store can take several weeks or more, and integrated analysis has traditionally required resources available only to the limited number of researchers with access to large institutional compute clusters. In 2015, the National Cancer Institute (NCI) launched three Cancer Genomics Cloud Pilots, including the Seven Bridges Cancer Genomics Cloud (CGC; cancergenomicscloud.org), to democratize access to datasets such as TCGA by colocalizing data and computational resources in the cloud. In 2017, NCI expanded this effort into the development of the NCI Cancer Research Data Commons, in which the CGC and the other Cloud Pilots, now known as Cloud Resources, continue to provide cloud-based access to petabyte-scale data and analysis resources. The Seven Bridges CGC is a customizable, scalable data access and analysis platform that connects users via the web to extensive public datasets. These include multi-omic data from TCGA, the Simons Genome Diversity Project, the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative, the International Cancer Genome Consortium (ICGC), the Cancer Cell Line Encyclopedia, The Cancer Imaging Archive (TCIA), and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). The CGC enables collaborative, reproducible analysis across both public and private cohorts. It does so through customizable workspaces, a public toolkit of more than 300 common analytical tools and workflows, and additional resources, including an open-source software development kit known as Rabix.
Since the CGC launched in early 2016, more than 2,500 researchers from more than 150 institutions in 30 countries have used the platform to deploy more than 5,000 applications, performing analyses that represent more than 100 years of computation time. To illustrate the potential of the CGC to provide a customizable and scalable research environment, we present a collaborative project that enables unprecedented precision in detecting gene fusions and splice variants using a novel statistical algorithm called Machete. W
ISSN: 0008-5472, 1538-7445
DOI: 10.1158/1538-7445.AM2018-5386