Variable selection and inference strategies for multiple compositional regression

Bibliographic details
Published in: Chemometrics and Intelligent Laboratory Systems, 2024-05, Vol. 248, p. 105121, Article 105121
Main authors: Lee, Sujin; Jung, Sungkyu
Format: Article
Language: English
Description
Abstract: An important problem in compositional data analysis is variable selection in linear regression models with compositional covariates. In the context of microbiome data analysis, there is a demand for considering grouping information such as structures among taxa and multiple sampling sites, resulting in multiple compositional covariates. We develop and compare two different methods of variable selection and inference strategies, based on the debiased lasso and a resampling-based approach. Confidence intervals for individual regression coefficients, obtained from each of the two methods, are shown to be asymptotically valid even in a high-dimension, low-sample-size regime. However, microbiome data often have extremely small sample sizes, rendering asymptotic results unreliable. Through extensive numerical comparisons of the finite-sample performances of the two methods, we find that resampling-based approaches outperform the debiased compositional lasso in cases of extremely small sample sizes, showing higher positive predictive values. Conversely, for larger sample sizes, debiasing yields better results. We apply the proposed multiple compositional regression to steer microbiome data, identifying key bacterial taxa associated with important cattle quality measures.
Highlights:
• Resampling-based inference tools for multiple compositional regressions developed
• Valid CIs obtained using debiased lasso and resampling
• Performances of statistical inference compared in the case of low sample size
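As a rough illustration of the setting in the abstract, the sketch below assumes the common log-contrast formulation y = Zβ + ε, where Z holds log-transformed proportions from several compositions and the coefficients within each composition are constrained to sum to zero. It fits a lasso under those zero-sum constraints with an augmented Lagrangian and coordinate descent, then shows a simple resampling idea: selection frequencies over random half-subsamples. This is not the authors' debiased lasso or their resampling-based inference procedure, and it produces no confidence intervals. The function names (`zero_sum_lasso`, `selection_frequencies`), the penalty level `lam`, the multiplier weight `mu`, and the simulated data are all hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding operator S(x, t) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def zero_sum_lasso(Z, y, groups, lam, mu=1.0, n_outer=50, n_inner=200, tol=1e-8):
    """Hypothetical sketch: lasso with one zero-sum constraint per composition.

    Minimizes (1/2n)||y - Z b||^2 + lam * ||b||_1 subject to
    sum_{j in composition k} b_j = 0 for every k, using the scaled-form
    augmented Lagrangian (method of multipliers) with coordinate descent.
    Assumes y and the columns of Z are already centered (no intercept).
    """
    n, p = Z.shape
    labels = np.unique(groups)
    # Constraint matrix C (K x p): row k is the indicator of composition k.
    C = np.array([(groups == g).astype(float) for g in labels])
    # Appending sqrt(n*mu)*C rows turns the (mu/2)||C b + u||^2 term into extra least squares rows.
    Za = np.vstack([Z, np.sqrt(n * mu) * C])
    col_sq = (Za ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    u = np.zeros(len(labels))  # scaled Lagrange multipliers
    for _ in range(n_outer):
        ya = np.concatenate([y, -np.sqrt(n * mu) * u])
        resid = ya - Za @ beta
        for _ in range(n_inner):
            max_change = 0.0
            for j in range(p):
                old = beta[j]
                # Correlation of column j with the partial residual (excluding j).
                rho = Za[:, j] @ resid / n + col_sq[j] * old
                new = soft_threshold(rho, lam) / col_sq[j]
                if new != old:
                    resid -= Za[:, j] * (new - old)
                    beta[j] = new
                    max_change = max(max_change, abs(new - old))
            if max_change < tol:
                break
        u += C @ beta  # multiplier update by the constraint violation
        if np.abs(C @ beta).max() < 1e-6:
            break
    return beta


def selection_frequencies(Z, y, groups, lam, n_resamples=50, seed=0):
    """Fraction of random half-subsamples in which each coefficient is nonzero."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        Zs = Z[idx] - Z[idx].mean(axis=0)  # center within the subsample
        ys = y[idx] - y[idx].mean()
        beta = zero_sum_lasso(Zs, ys, groups, lam)
        counts += np.abs(beta) > 1e-6
    return counts / n_resamples


# Simulated example: two compositions with 5 parts each (hypothetical data).
rng = np.random.default_rng(1)
n = 100
groups = np.repeat([0, 1], 5)
X1 = rng.lognormal(size=(n, 5))
X1 /= X1.sum(axis=1, keepdims=True)
X2 = rng.lognormal(size=(n, 5))
X2 /= X2.sum(axis=1, keepdims=True)
Z = np.log(np.hstack([X1, X2]))
beta_true = np.array([1.0, -1.0, 0, 0, 0, 0.8, 0, -0.8, 0, 0])  # sums to 0 per composition
y = Z @ beta_true + 0.3 * rng.standard_normal(n)
freq = selection_frequencies(Z, y, groups, lam=0.1, n_resamples=20)
```

Coefficients with high selection frequency are those chosen consistently across subsamples. The augmented Lagrangian keeps the multiplier weight `mu` moderate, which keeps coordinate descent well conditioned while still driving each composition's coefficient sum to zero.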
ISSN: 0169-7439, 1873-3239
DOI: 10.1016/j.chemolab.2024.105121