Evaluation of Correction Methods for NASA GeneLab Transcriptomic Datasets

Bibliographic details
Main authors: Samson, Finsam; Saravia-Butler, Amanda Marie
Format: Other
Language: English
Description
Abstract: Conducting space biology experiments aboard the International Space Station, particularly those using complex model organisms such as mice, is expensive and difficult because crew availability, hardware, and space are limited. As a result, sample numbers from these studies are low, which reduces the statistical power of any single experiment. Aggregating spaceflight datasets is one way to increase sample numbers, enabling novel insights through bioinformatic analysis of ’omics data from merged datasets. However, aggregating datasets can introduce unwanted variation, including 1) differences in sample handling, processing, and sequencing platforms between datasets (technical variation) and 2) differences in experimental design between datasets. In the present study, NASA GeneLab-hosted RNAseq datasets from mouse liver tissue were used to evaluate several statistical methods for correcting this unwanted variation under two approaches: reference-based, which takes Universal Mouse RNA Reference samples into account, and standard, which does not. The correction algorithms, each applied under one or both approaches, were ComBat and ComBat_seq from the SVA package; the median polish, empirical Bayes, and ANOVA-based algorithms from the MBatch package; and negative binomial regression normalization in the DESeq2 package. For each approach, after the correction algorithm was applied, differential gene expression (DGE) analysis of flight versus ground control samples was performed on the combined data. The robustness of each tool was evaluated in three ways: BatchQC, to test for statistical differences between datasets before and after correction; Principal Component Analysis (PCA), to compare global gene expression across samples before and after correction; and comparison of DGE results from the individual datasets with those from the combined datasets before and after correction.
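The standard and reference-based approaches differ mainly in which samples are used to estimate each batch's offset. The sketch below is a deliberately simplified, hypothetical location-only adjustment of log-scale values for a single gene; it is not ComBat, ComBat_seq, median polish, or any MBatch or DESeq2 routine. The function names `standard_adjust` and `reference_adjust` are illustrative, and the sketch assumes that every batch contains at least one reference sample and that the standard approach simply treats reference samples as ordinary samples.

```python
# Hypothetical illustration only: a location-only batch adjustment for one gene
# on log-scale expression values. NOT ComBat, ComBat_seq, MBatch, or DESeq2.
from statistics import mean


def standard_adjust(values, batches):
    """Shift each batch so that its mean matches the grand mean of all samples."""
    grand = mean(values)
    batch_means = {
        b: mean(v for v, bb in zip(values, batches) if bb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] + grand for v, b in zip(values, batches)]


def reference_adjust(values, batches, is_reference):
    """Shift each batch using only its reference samples to estimate the offset.

    Assumes every batch contains at least one reference sample; otherwise
    statistics.mean raises StatisticsError for that batch.
    """
    ref_grand = mean(v for v, r in zip(values, is_reference) if r)
    ref_means = {
        b: mean(v for v, bb, r in zip(values, batches, is_reference) if bb == b and r)
        for b in set(batches)
    }
    return [v - ref_means[b] + ref_grand for v, b in zip(values, batches)]


# Toy data (invented): one reference sample and two study samples per batch.
values = [4.0, 6.0, 8.0, 6.0, 9.0, 12.0]
batches = ["A", "A", "A", "B", "B", "B"]
is_reference = [True, False, False, True, False, False]

print(standard_adjust(values, batches))                  # [5.5, 7.5, 9.5, 4.5, 7.5, 10.5]
print(reference_adjust(values, batches, is_reference))   # [5.0, 7.0, 9.0, 5.0, 8.0, 11.0]
```

In this toy example the batch means differ by 3.0 while the reference samples differ by only 2.0, so the reference-based version removes a smaller shift and the two approaches disagree on the corrected study samples. The example only shows that the approaches can diverge; it does not reproduce the study's results.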
The results showed that the reference-based approach introduced several additional (and likely artificial) differentially expressed genes (DEGs) compared with the corresponding standard approach. Of the methods tested, standard ComBat and DESeq2 were identified as the most robust correction methods for combining spaceflight mouse liver RNAseq datasets hosted on GeneLab.
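Comparing individual and combined DGE results can be expressed with simple set operations. The sketch below is a hypothetical illustration, not the study's published procedure: it assumes each analysis has already been reduced to a set of gene identifiers passing some significance cutoff, and it counts a gene as "additional" if the combined analysis calls it but no individual analysis does. The gene names and the helpers `additional_degs` and `recovered_fraction` are placeholders.

```python
# Hypothetical sketch: compare differentially expressed gene (DEG) calls from a
# combined, batch-corrected analysis with the calls from each dataset on its own.


def additional_degs(combined, individual):
    """Genes called in the combined analysis but in none of the individual ones."""
    seen = set().union(*individual)
    return sorted(set(combined) - seen)


def recovered_fraction(combined, individual):
    """Share of genes called in any individual analysis that the combined one also calls."""
    seen = set().union(*individual)
    if not seen:
        return 0.0
    return len(seen & set(combined)) / len(seen)


# Placeholder gene sets, invented for illustration.
individual = [{"geneA", "geneC"}, {"geneB", "geneD"}]
combined = {"geneA", "geneB", "geneE"}

print(additional_degs(combined, individual))     # ['geneE']
print(recovered_fraction(combined, individual))  # 0.5
```

Under these assumptions, a correction method that produces many additional genes while recovering few of the individually detected ones would merit a closer look for artifacts; what counts as "many" or "few" is left open here.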