Causal discovery with Bayesian networks
Main Author: | |
---|---|
Format: | Dissertation |
Language: | eng |
Summary: | One of the most widely used tools for causal discovery is based on causal models represented in the framework of Bayesian networks (BNs). In the most challenging cases of causal discovery, the underlying BN structure is not known and must be computed in a way that takes into account the uncertainty that exists when trying to predict the underlying structure. The structure uncertainty can then be transformed into an uncertainty regarding a causal relationship between variables, reflecting how likely a causal relationship is given data assumed to come from the underlying causal model. Different methods account for such uncertainty. We will focus on Bayesian model averaging over structures, implemented through Markov chain Monte Carlo (MCMC) and a state-of-the-art dynamic programming algorithm. The general way of expressing parameters for a causal model is through conditional probability tables (CPTs). It has been demonstrated that more expressive models that account for additional structure in each CPT may lead to improved prediction over traditional causal models. We will represent the regularities within CPTs through more refined independence relations, defined according to the concept of context-specific independence (CSI), in the form of CSI-trees, which are learned with a greedy algorithm. To identify plausible models, we use a score-equivalent Bayesian score. An optimal combination of these models will be found with the help of Bayesian model averaging in order to find the posterior distribution over the causal target of interest. These methodologies were tested on synthetic data generated from known benchmark Bayesian networks. A comparison between CPTs and CSI-trees with the help of AUC shows that no significant improvement was made on the tested networks. However, for some data sizes some improvement could be seen.
One reason might be that no exact CSI-tree representation of the conditional distribution exists for these networks, since the true distributions are defined through CPD tables. Another reason might be that it was necessary to regulate the model fit with a model structure prior to avoid overfitting in the learning process. The prior used in this work might have been suboptimal. A comparison between MCMC and the state-of-the-art dynamic programming algorithm shows that the results under AUC are similar; however, the convergence of the MCMC over structures is slow for some of the networks tested. |
---|
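
The model-averaging idea in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `edge_posterior`, the representation of a structure as a set of directed edges, and the toy log scores are all assumptions made for illustration. The sketch assumes each candidate structure comes with the log of its unnormalised posterior score, and it returns the posterior probability of one directed edge as the normalised weight of the structures that contain it.

```python
import math

def edge_posterior(structures, log_scores, edge):
    """Posterior probability of `edge`, averaged over candidate structures.

    structures: list of sets of directed edges (parent, child)  [hypothetical format]
    log_scores: log of the unnormalised posterior of each structure
    """
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    # Add up the normalised weights of the structures that contain the edge.
    return sum(w for g, w in zip(structures, weights) if edge in g) / total

# Toy example: three candidate structures over two variables A and B.
structures = [{("A", "B")}, {("B", "A")}, set()]
# Illustrative values only, not results from the thesis.
log_scores = [math.log(0.6), math.log(0.3), math.log(0.1)]
p_ab = edge_posterior(structures, log_scores, ("A", "B"))
```

With many variables the set of structures cannot be enumerated like this, which is why the summary turns to MCMC sampling over structures or to dynamic programming to obtain the same kind of averaged quantity.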