Abstract 2466: Identifying confidently measured genes in single pediatric cancer patient samples using RNA sequencing
Published in: Cancer Research (Chicago, Ill.), 2017-07, Vol. 77 (13_Supplement), p. 2466
Format: Article
Language: English
Abstract: In the UC Santa Cruz Treehouse Childhood Cancer Initiative (treehousegenomics.soe.ucsc.edu), we are exploring the use of RNA-Seq analysis of pediatric tumor samples to identify potential novel therapeutic options for each patient. Within a single RNA-Seq data set, not all gene expression measurements are equally accurate, yet identifying activated, druggable pathways requires accurate gene-level expression measurements.
We receive samples from a variety of clinical and research settings, and the quantity and complexity of the available input material, as well as the sequencing depth, vary from sample to sample. These factors motivated us to develop a tool that identifies accurate measurements in most of the RNA-Seq samples we receive.
First, we characterized the relationship between sequencing depth and the accuracy of gene expression measurements. We analyzed subsets of reads from samples with more than 50 million Uniquely Mapped, Exonic, Non-duplicate (UMEND) reads. In a high-quality experiment with sufficient starting material, UMEND reads typically make up over 80% of all reads. By comparing gene expression across these subsets, we estimated how many UMEND reads are required to produce consistent measurements. On average, genes expressed at 1 to 5 TPM in our data require 30 million reads to be measured accurately. For this calculation, we define accuracy as the condition in which 75% of genes are measured to within 25% of their true value.
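The depth-versus-accuracy analysis can be sketched in Python. This is a minimal illustration, not the Treehouse tool: it assumes per-gene UMEND read counts and gene lengths are already available, approximates read subsampling by binomial thinning (each read kept independently with probability `fraction`), treats the full-depth TPM values as the "true" values, and reads "75% of genes" as "at least 75% of genes with non-zero true expression". All function names are hypothetical.

```python
import random


def tpm(counts, lengths_kb):
    """Convert read counts to transcripts per million (TPM)."""
    # Reads per kilobase for each gene, then rescale so the values sum to one million.
    rpk = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    return [r / total * 1e6 for r in rpk]


def subsample_counts(counts, fraction, rng):
    """Approximate random read subsampling: keep each read with probability `fraction`."""
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]


def is_accurate(measured, truth, tolerance=0.25, required_fraction=0.75):
    """Check whether enough genes fall within `tolerance` relative error of the truth."""
    # Genes with zero true expression have no defined relative error, so skip them.
    pairs = [(m, t) for m, t in zip(measured, truth) if t > 0]
    if not pairs:
        return False
    within = sum(1 for m, t in pairs if abs(m - t) <= tolerance * t)
    return within / len(pairs) >= required_fraction


def accurate_at_fraction(counts, lengths_kb, fraction, seed=0):
    """Subsample a deep sample and test the subset against its full-depth TPM values."""
    rng = random.Random(seed)
    truth = tpm(counts, lengths_kb)
    subset = tpm(subsample_counts(counts, fraction, rng), lengths_kb)
    return is_accurate(subset, truth)
```

Running the subsampling at several fractions and applying `is_accurate` separately to genes grouped by their full-depth TPM (for example, the 1 to 5 TPM group) would trace out a depth-versus-accuracy relationship of the kind described above.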
Second, we use these relationships to identify genes that have been accurately measured in our tumor RNA-Seq samples. For a sample with 15 million UMEND reads, for example, genes expressed above 5 TPM can be measured accurately and are retained. Among the first twelve samples analyzed, those with more than 10 million UMEND reads retained at least 46% of the genes expressed above zero. Because thresholding removes a marked fraction of genes from samples with fewer than 10 million UMEND reads, we exclude those samples from use as references.
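The thresholding step can be sketched the same way. This is a hypothetical illustration, not the authors' implementation: the depth-to-threshold table is a placeholder whose two entries only echo the figures quoted above (about 30 million reads for genes at 1 to 5 TPM, and a 5 TPM cutoff at 15 million UMEND reads), each sample receives the cutoff of the deepest tabulated depth it reaches, and "above the threshold" is taken as strictly greater. Only the 10-million-read reference cutoff comes directly from the abstract; all names are hypothetical.

```python
# Minimum UMEND reads for a sample to serve as a reference (value from the abstract).
MIN_REFERENCE_READS = 10_000_000

# Hypothetical (depth, minimum TPM) pairs; placeholders, not a published table.
DEFAULT_THRESHOLDS = [(30_000_000, 1.0), (15_000_000, 5.0)]


def tpm_threshold(umend_reads, table=DEFAULT_THRESHOLDS):
    """Return the TPM cutoff for the deepest tabulated depth the sample reaches, or None."""
    eligible = [(depth, cutoff) for depth, cutoff in table if umend_reads >= depth]
    if not eligible:
        return None
    return max(eligible)[1]


def accurate_genes(expression, umend_reads, table=DEFAULT_THRESHOLDS):
    """Keep genes whose TPM is strictly above the cutoff for this sample's depth."""
    cutoff = tpm_threshold(umend_reads, table)
    if cutoff is None:
        return {}  # sample shallower than every tabulated depth: no gene passes
    return {gene: value for gene, value in expression.items() if value > cutoff}


def retained_fraction(expression, kept):
    """Fraction of genes expressed above zero that survive thresholding."""
    expressed = [gene for gene, value in expression.items() if value > 0]
    if not expressed:
        return 0.0
    return sum(1 for gene in expressed if gene in kept) / len(expressed)


def usable_as_reference(umend_reads):
    """Samples with fewer than 10 million UMEND reads are excluded as references."""
    return umend_reads >= MIN_REFERENCE_READS
```

For example, with made-up values `{"RB1": 3.0, "CDK4": 40.0, "TP53": 6.0, "MYC": 0.0}` and 15 million UMEND reads, `accurate_genes` keeps only CDK4 and TP53, so `retained_fraction` is 2/3 and RB1 is flagged as below the cutoff. Because the placeholder table has no entry below 15 million reads, a real table would need shallower entries to handle samples between 10 and 15 million UMEND reads.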
Using accurately measured genes allows us to assess similarity to other samples, identify enriched pathways, and confirm the expression of drug targets and related molecules under consideration with greater confidence. For example, we reconsidered the CDK4 inhibitor Palbociclib for one patient because the expression of RB1, a downstream effector required for Palbociclib-mediated tumor cell death, was below our accuracy threshold. Accuracy thresholds can also be used in experim...
ISSN: 0008-5472, 1538-7445
DOI: 10.1158/1538-7445.AM2017-2466