Fast randomization of large genomic datasets while preserving alteration counts

Bibliographic details
Published in:	Bioinformatics 2014-09, Vol.30 (17), p.i617-i623
Main authors:	Gobbi, Andrea; Iorio, Francesco; Dawson, Kevin J; Wedge, David C; Tamborero, David; Alexandrov, Ludmil B; Lopez-Bigas, Nuria; Garnett, Mathew J; Jurman, Giuseppe; Saez-Rodriguez, Julio
Format:	Article
Language:	English
Subjects:	Algorithms; Bioinformatics; Cancer; Computation; Computer simulation; Eccb 2014 Proceedings Papers Committee; Genomics - methods; Humans; Markov Chains; Mathematical analysis; Mathematical models; Monte Carlo Method; Neoplasms - genetics; Networks; Random Allocation; Software; Switching theory
Online access:	Full text

Description
Summary:	Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive. We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirements with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks. BiRewire is available on BioConductor at http://www.bioconductor.org/packages/2.13/bioc/html/BiRewire.html. Supplementary data are available at Bioinformatics online.
ISSN:	1367-4803 1367-4811 1460-2059
DOI:	10.1093/bioinformatics/btu474
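
The switching-step mentioned in the summary can be illustrated with a short sketch. This is not the BiRewire implementation, only a minimal Python illustration under stated assumptions: the dataset is a binary matrix (rows and columns standing for the two node classes of the bipartite network, for example genes and patients), and the names `switching_step` and `randomize` are hypothetical. The number of steps is left to the caller, because the summary does not state the analytical bound.

```python
import random


def switching_step(edges, edge_set, rng):
    """Attempt one switching-step; return True if a swap was applied.

    Two edges (a, b) and (c, d) are picked at random and replaced by
    (a, d) and (c, b), unless that would create a duplicate edge.
    """
    i, j = rng.sample(range(len(edges)), 2)
    (a, b), (c, d) = edges[i], edges[j]
    # Edges sharing a row or a column cannot yield two new edges.
    if a == c or b == d:
        return False
    # Reject swaps that would duplicate an existing edge.
    if (a, d) in edge_set or (c, b) in edge_set:
        return False
    edge_set.difference_update([(a, b), (c, d)])
    edge_set.update([(a, d), (c, b)])
    edges[i], edges[j] = (a, d), (c, b)
    return True


def randomize(matrix, n_steps, seed=0):
    """Return a rewired copy of a 0/1 matrix with the same row and column sums.

    n_steps is an assumption supplied by the caller, not the bound
    derived in the paper.
    """
    rng = random.Random(seed)
    edges = [(r, c)
             for r, row in enumerate(matrix)
             for c, value in enumerate(row) if value]
    edge_set = set(edges)
    if len(edges) >= 2:  # a swap needs at least two edges
        for _ in range(n_steps):
            switching_step(edges, edge_set, rng)
    out = [[0] * len(row) for row in matrix]
    for r, c in edges:
        out[r][c] = 1
    return out


# Toy example (hypothetical data): 3 genes x 4 patients.
toy = [[1, 0, 1, 0],
       [0, 1, 0, 0],
       [1, 1, 0, 1]]
rewired = randomize(toy, n_steps=200, seed=1)
```

Each accepted swap removes one edge from each of two rows and two columns and adds one back to each, which is why gene-wise and patient-wise alteration counts are unchanged in `rewired`.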