$R^{2}$s for Correlated Data: Phylogenetic Models, LMMs, and GLMMs
Published in: | Systematic biology 2019-03, Vol.68 (2), p.234-251 |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Online access: | Full text |
Summary: | Abstract
Many researchers want to report an $R^{2}$ to measure the variance explained by a model. When the model includes correlation among data, such as phylogenetic models and mixed models, defining an $R^{2}$ faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the $R^{2}$ to include the variance explained by the covariances by asking questions such as “How much of the data is explained by phylogeny?” Here, I investigated three $R^{2}$s for phylogenetic and mixed models. $R^{2}_{resid}$ is an extension of the ordinary least-squares $R^{2}$ that weights residuals by variances and covariances estimated by the model; it is closely related to $R^{2}_{glmm}$ presented by Nakagawa and Schielzeth (2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol. Evol. 4:133–142). $R^{2}_{pred}$ is based on predicting each residual from the fitted model and computing the variance between observed and predicted values. $R^{2}_{lik}$ is based on the likelihood of fitted models, and therefore, reflects the amount of information that the models contain. These three $R^{2}$s are formulated as partial $R^{2}$s, making it possible to compare the contributions of predictor variables and variance components (phylogenetic signal and random effects) to the fit of models. Because partial $R^{2}$s compare a full model with a reduced model without components of the full model, they are distinct from marginal $R^{2}$s that partition additive components of the variance. I assessed the properties of the $R^{2}$s for phylogenetic models using simulations for continuous and binary response data (phylogenetic generalized least squares and phylogenetic logistic regression). 
Because the $R^{2}$s are designed broadly for any model for correlated data, I also compared $R^{2}$s for linear mixed models and generalized linear mixed models. $R^{2}_{resid}$, $R^{2}_{pred}$, and $R^{2}_{lik}$ all have similar performance in describing the variance explained by different components of models. However, $R^{2}_{pred}$ gives the most direct answer to the question of how much variance in the data is explained by a model. $R^{2}_{resid}$ is most appropriate for comparing models fit to different data sets, because it does not depend on sample sizes. And $R^{2}_{lik}$ is most appropriate to assess the importance of differe |
---|---|
ISSN: | 1063-5157 1076-836X |
DOI: | 10.1093/sysbio/syy060 |
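
The abstract describes $R^{2}_{lik}$ only as being "based on the likelihood of fitted models" and as a partial $R^{2}$ that compares a full model with a reduced model. As a rough illustration, the sketch below assumes the common likelihood-ratio form $R^{2} = 1 - \exp\left(-\tfrac{2}{n}\,(\ell_{full} - \ell_{reduced})\right)$; this formula is not quoted in the record above, and the helper names `gaussian_loglik` and `r2_lik` are hypothetical. It uses ordinary least squares on simulated, uncorrelated data (not a phylogenetic or mixed model) to show that, with an intercept-only reduced model, this form reduces to the ordinary least-squares $R^{2}$.

```python
import numpy as np


def gaussian_loglik(y, X):
    """Return (maximized Gaussian log-likelihood, SSE) for an OLS fit of y on X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    sigma2 = sse / n  # maximum-likelihood estimate of the residual variance
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return loglik, sse


def r2_lik(ll_full, ll_reduced, n):
    """Likelihood-ratio R^2 (assumed form): 1 - exp(-2/n * (ll_full - ll_reduced))."""
    return 1.0 - np.exp(-2.0 / n * (ll_full - ll_reduced))


# Simulated data for illustration only; not taken from the article.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x])  # intercept + predictor
X_reduced = np.ones((n, 1))                # intercept only

ll_full, sse_full = gaussian_loglik(y, X_full)
ll_reduced, sse_reduced = gaussian_loglik(y, X_reduced)

r2_likelihood = r2_lik(ll_full, ll_reduced, n)
r2_ols = 1.0 - sse_full / sse_reduced  # ordinary least-squares R^2

print(f"R2_lik = {r2_likelihood:.6f}, OLS R2 = {r2_ols:.6f}")
```

The two printed values agree because, for Gaussian maximum-likelihood fits, the log-likelihood difference equals $-\tfrac{n}{2}\log(\mathrm{SSE}_{full}/\mathrm{SSE}_{reduced})$. For models with estimated covariances, the log-likelihoods would instead come from the fitted phylogenetic or mixed model, and the reduced model would drop whichever component (predictor or variance component) is being assessed.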