Roastgsa: a comparison of rotation-based scores for gene set enrichment analysis

Bibliographic details
Published in: BMC bioinformatics 2023-10, Vol.24 (1), p.1-408, Article 408
Main authors: Caballé-Mestres, Adrià, Berenguer-Llergo, Antoni, Stephan-Otto Attolini, Camille
Format: Article
Language: English
Description
Summary: Background: Gene-wise differential expression is usually the first major step in the statistical analysis of high-throughput data obtained from techniques such as microarrays or RNA-sequencing. The gene-level analysis is often complemented by interrogating the data in a broader biological context, in which the unit of analysis is a group of genes that may share a function or biological trait. Among the many published approaches to gene set analysis (GSA), the rotation test for gene set analysis, also referred to as roast, is a general sample randomization approach that preserves the intra-gene set correlation structure when defining the null distribution of the test. Results: We present roastgsa, an R package that contains several enrichment score functions that feed the roast algorithm for hypothesis testing. The implemented methods are evaluated using both simulated and benchmarking data in microarray and RNA-seq datasets. We find that computationally intensive measures based on Kolmogorov-Smirnov (KS) statistics fail to improve on the rates obtained with simpler GSA measures such as the mean and maxmean scores. We also show the importance of accounting for the linear dependence structure among the genes of the testing set, which is linked to the loss of effective signature size. A complete graphical representation of the results, including an approximation of the effective signature size, can be obtained as part of the roastgsa output. Conclusions: We encourage the use of the absmean (non-directional), mean (directional) and maxmean (directional) scores for roast GSA, as these are simple measures of enrichment that performed best in all provided analyses in comparison with the more complex KS measures.
ISSN:1471-2105
DOI:10.1186/s12859-023-05510-x
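
The three scores recommended in the summary can be illustrated with a minimal sketch. This is not the roastgsa implementation (roastgsa is an R package and its API is not shown in this record); the function names are hypothetical, the input is assumed to be a list of gene-level statistics (for example moderated t-statistics) for the genes in one set, and the signed form of maxmean used here is an assumption based on the usual positive-part/negative-part definition.

```python
from statistics import fmean


def mean_score(stats):
    """Directional score: plain average of the gene-level statistics."""
    return fmean(stats)


def absmean_score(stats):
    """Non-directional score: average of the absolute gene-level statistics."""
    return fmean([abs(s) for s in stats])


def maxmean_score(stats):
    """Directional score: compare the averaged positive and negative parts.

    Both parts are divided by the full set size, so a few strong genes in
    one direction are not diluted by cancellation with the other direction.
    Returning the winning part with its sign is an assumption of this sketch.
    """
    n = len(stats)
    pos = sum(s for s in stats if s > 0) / n   # averaged positive part
    neg = sum(-s for s in stats if s < 0) / n  # averaged negative part (as a magnitude)
    return pos if pos >= neg else -neg


# Hypothetical gene-level statistics for a four-gene set.
example = [2.0, -1.0, 0.5, -0.5]
print(mean_score(example), absmean_score(example), maxmean_score(example))
```

In a rotation test, a score of this kind would be recomputed on each rotated version of the data to build the null distribution; that step is omitted here.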