A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data

Bibliographic details
Published in:	PLoS ONE, 2017-05, Vol. 12 (5), p. e0176185
Main authors:	Li, Xiaohong; Brock, Guy N; Rouchka, Eric C; Cooper, Nigel G F; Wu, Dongfeng; O'Toole, Timothy E; Gill, Ryan S; Eteleeb, Abdallah M; O'Brien, Liz; Rai, Shesh N
Format:	Article
Language:	English
Subjects:	Abundance; Alternative splicing; Area Under Curve; Aroma; Assembly; Bias; Bioinformatics; Biology and Life Sciences; Biomarkers; Brain; Breast; Breast cancer; Breast Neoplasms - metabolism; Cardiology; Color; Comparative studies; Computer programs; Computer Simulation; Construction; Data analysis; Data processing; Datasets as Topic; Diagnosis; DNA microarrays; Engineering; Epidemiology; Error analysis; Exons; Experimental design; Factor analysis; Fragmentation; Gene expression; Gene Expression Profiling - methods; Gene mapping; Genomes; High-Throughput Nucleotide Sequencing - methods; Humans; Informatics; Instrumentation; Isoforms; Luteinizing hormone; Mapping; Mathematical analysis; Mathematical models; Medicine and Health Sciences; Methods; Microarray Analysis - methods; miRNA; Models, Statistical; Nervous system; Packages; Pattern recognition; Physical Sciences; Quality control; Radiology; Regression analysis; Research and Analysis Methods; Ribonucleic acid; RNA; RNA editing; RNA sequencing; ROC Curve; Samples; Scaling; Sequence Analysis, RNA - methods; Software; Statistical analysis; Statistical methods; Tissues; Transcription
Online access:	Full text

Description
Summary:	Normalization is an essential step with considerable impact on high-throughput RNA sequencing (RNA-seq) data analysis. Although there are numerous methods for read count normalization, it remains a challenge to choose an optimal method because multiple factors contribute to read count variability, which affects the overall sensitivity and specificity. In order to properly determine the most appropriate normalization methods, it is critical to compare the performance and shortcomings of a representative set of normalization routines based on different dataset characteristics. Therefore, we set out to evaluate the performance of the commonly used methods (DESeq, TMM-edgeR, FPKM-CuffDiff, TC, Med, UQ and FQ) and two new methods we propose: Med-pgQ2 and UQ-pgQ2 (per-gene normalization after per-sample median or upper-quartile global scaling). Our per-gene normalization approach allows for comparisons between conditions based on similar count levels. Using the benchmark Microarray Quality Control Project (MAQC) and simulated datasets, we performed differential gene expression analysis to evaluate these methods. When evaluating MAQC2 with two replicates, we observed that Med-pgQ2 and UQ-pgQ2 achieved a slightly higher area under the Receiver Operating Characteristic Curve (AUC), a specificity rate > 85%, a detection power > 92% and an actual false discovery rate (FDR) under 0.06 given the nominal FDR (≤0.05). Although the top commonly used methods (DESeq and TMM-edgeR) yield a higher power (>93%) for MAQC2 data, they trade off with a reduced specificity (
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0176185
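
The two-stage idea named in the summary (per-sample median or upper-quartile global scaling, followed by per-gene normalization) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not give the exact formulas, so the function names (`global_scale`, `per_gene_median_scale`, `pgq2_sketch`), the exclusion of all-zero genes before taking the quantile, the rescaling by the mean factor, and the choice of dividing each gene by its median across samples are all assumptions made for illustration.

```python
import numpy as np


def global_scale(counts, quantile=75.0):
    """Per-sample global scaling (assumed form).

    counts: genes x samples array of read counts.
    quantile: 75.0 for upper-quartile (UQ) scaling, 50.0 for median (Med).
    """
    counts = np.asarray(counts, dtype=float)
    # Assumption: genes with zero counts in every sample are ignored
    # when the per-sample quantile is computed.
    expressed = counts[counts.sum(axis=1) > 0]
    factors = np.percentile(expressed, quantile, axis=0)
    if np.any(factors <= 0):
        raise ValueError("a sample has a non-positive scaling quantile")
    # Assumption: rescale by the mean factor to keep values on a count-like scale.
    return counts / factors * factors.mean()


def per_gene_median_scale(scaled):
    """Per-gene step (assumed form): divide each gene by its median across samples."""
    gene_median = np.median(scaled, axis=1, keepdims=True)
    out = np.zeros_like(scaled)
    # Genes whose median is zero are left at zero instead of dividing by zero.
    np.divide(scaled, gene_median, out=out, where=gene_median > 0)
    return out


def pgq2_sketch(counts, quantile=75.0):
    """Hypothetical wrapper: global scaling followed by the per-gene step."""
    return per_gene_median_scale(global_scale(counts, quantile))


# Toy data: sample 2 is sequenced twice as deeply as sample 1.
toy_counts = np.array([[10, 20], [20, 40], [30, 60], [0, 0]])
uq_scaled = global_scale(toy_counts, quantile=75.0)
uq_pgq2 = pgq2_sketch(toy_counts, quantile=75.0)
med_pgq2 = pgq2_sketch(toy_counts, quantile=50.0)
```

In the toy data the two samples differ only in sequencing depth, so the global scaling step makes their columns identical, and the per-gene step then brings every expressed gene to the same level, which is the sense in which genes become comparable "based on similar count levels".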