Global multivariate model learning from hierarchically correlated data
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Inverse statistical physics aims at inferring models compatible with a set of
empirical averages estimated from a high-dimensional dataset of independently
distributed equilibrium configurations of a given system. However, in several
applications such as biology, data result from stochastic evolutionary
processes, and configurations are related through a hierarchical structure,
typically represented by a tree, and therefore not independent. In turn,
empirical averages of observables superpose intrinsic signal related to the
equilibrium distribution of the studied system and spurious historical (or
phylogenetic) signal resulting from the structure underlying the
data-generating process. The naive application of inverse statistical physics
techniques therefore leads to systematic biases and an effective reduction of
the sample size. To advance on the currently open task of extracting intrinsic
signals from correlated data, we study a system described by a multivariate
Ornstein-Uhlenbeck process defined on a finite tree. Using a Bayesian
framework, we can disentangle covariances in the data corresponding to their
multivariate Gaussian equilibrium distribution from those resulting from the
historical correlations. Our approach leads to a clear gain in accuracy in the
inferred equilibrium distribution, which corresponds to an effective two- to
fourfold increase in sample size. |
---|---|
DOI: | 10.48550/arxiv.2102.06036 |
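
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the Bayesian method of the paper, whose details are not given in this record. It assumes a balanced binary tree with equal branch lengths, a zero-mean Ornstein-Uhlenbeck process with unit isotropic relaxation rate, and a known tree. Under those assumptions it samples correlated leaf configurations and compares the naive empirical covariance with a generalized-least-squares style estimate that reweights the samples by the inverse of the tree-induced leaf correlation matrix. All function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_leaves(C, depth, branch_time, rng):
    """Sample the leaves of a balanced binary tree under an OU process.

    Assumed dynamics (illustrative only): dx = -x dt + sqrt(2) L dW with
    L L^T = C, so the equilibrium distribution is N(0, C) and the exact
    transition over time t is
        x(t) = exp(-t) x(0) + sqrt(1 - exp(-2 t)) * xi,  xi ~ N(0, C).
    """
    chol = np.linalg.cholesky(C)
    decay = np.exp(-branch_time)
    noise_scale = np.sqrt(1.0 - decay**2)
    # The root is drawn from the equilibrium distribution.
    nodes = (chol @ rng.standard_normal(C.shape[0]))[None, :]
    for _ in range(depth):
        parents = np.repeat(nodes, 2, axis=0)  # node k has children 2k, 2k+1
        noise = rng.standard_normal(parents.shape) @ chol.T
        nodes = decay * parents + noise_scale * noise
    return nodes  # shape (2**depth, dim)


def leaf_correlation(depth, branch_time):
    """Leaf-by-leaf correlation G induced by the tree.

    Two leaves whose most recent common ancestor lies s levels above them
    have cross-covariance exp(-2 * s * branch_time) * C.
    """
    n = 2**depth
    idx = np.arange(n)
    xor = idx[:, None] ^ idx[None, :]
    levels = np.zeros((n, n), dtype=int)
    while xor.any():  # bit length of i XOR j = levels up to the ancestor
        levels += xor > 0
        xor >>= 1
    return np.exp(-2.0 * branch_time * levels)


def naive_covariance(X):
    """Empirical covariance that treats the leaves as independent."""
    return X.T @ X / X.shape[0]


def tree_corrected_covariance(X, G):
    """Covariance estimate with samples reweighted by the inverse of G."""
    return X.T @ np.linalg.solve(G, X) / X.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = np.array([[1.0, 0.6, 0.0], [0.6, 1.0, 0.3], [0.0, 0.3, 1.0]])
    depth, branch_time = 6, 0.1
    G = leaf_correlation(depth, branch_time)
    X = simulate_leaves(C, depth, branch_time, rng)
    print("naive error:    ", np.linalg.norm(naive_covariance(X) - C))
    print("corrected error:", np.linalg.norm(tree_corrected_covariance(X, G) - C))
```

For a zero-mean Gaussian model in which the leaf correlation G is known exactly, the reweighted estimate is the maximum-likelihood estimate of C. The naive estimate is still unbiased in this toy setting but fluctuates more, which is one way to picture the reduction in effective sample size mentioned in the abstract.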