A rarefaction-without-resampling extension of PERMANOVA for testing presence–absence associations in the microbiome
Abstract Motivation PERMANOVA is currently the most commonly used method for testing community-level hypotheses about microbiome associations with covariates of interest. PERMANOVA can test for associations that result from changes in which taxa are present or absent by using the Jaccard or unweight...
Gespeichert in:
Veröffentlicht in: | Bioinformatics 2022-08, Vol.38 (15), p.3689-3697 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Abstract
Motivation
PERMANOVA is currently the most commonly used method for testing community-level hypotheses about microbiome associations with covariates of interest. PERMANOVA can test for associations that result from changes in which taxa are present or absent by using the Jaccard or unweighted UniFrac distance. However, such presence–absence analyses face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias but at the potential costs of information loss and the introduction of a stochastic component into the analysis.
Results
Here, we develop a non-stochastic approach to PERMANOVA presence–absence analyses that aggregates information over all potential rarefaction replicates without actual resampling, when the Jaccard or unweighted UniFrac distance is used. We compare this new approach to three possible ways of aggregating PERMANOVA over multiple rarefactions obtained from resampling: averaging the distance matrix, averaging the (element-wise) squared distance matrix and averaging the F-statistic. Our simulations indicate that our non-stochastic approach is robust to confounding by library size and outperforms each of the stochastic resampling approaches. We also show that, when overdispersion is low, averaging the (element-wise) squared distance outperforms averaging the unsquared distance, currently implemented in the R package vegan. We illustrate our methods using an analysis of data on inflammatory bowel disease in which samples from case participants have systematically smaller library sizes than samples from control participants.
Availability and implementation
We have implemented all the approaches described above, including the function for calculating the analytical average of the squared or unsquared distance matrix, in our R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM.
Supplementary information
Supplementary data are available at Bioinformatics online. |
---|---|
ISSN: | 1367-4803 1460-2059 1367-4811 |
DOI: | 10.1093/bioinformatics/btac399 |