$\texttt{causalAssembly}$: Generating Realistic Production Data for Benchmarking Causal Discovery
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Algorithms for causal discovery have recently undergone rapid advances and
increasingly draw on flexible nonparametric methods to process complex data.
With these advances comes a need for adequate empirical validation of the
causal relationships learned by different algorithms. However, for most real
data sources true causal relations remain unknown. This issue is further
compounded by privacy concerns surrounding the release of suitable high-quality
data. To help address these challenges, we gather a complex dataset comprising
measurements from an assembly line in a manufacturing context. This line
consists of numerous physical processes for which we are able to provide ground
truth causal relationships on the basis of a detailed study of the underlying
physics. We use the assembly line data and associated ground truth information
to build a system for generation of semisynthetic manufacturing data that
supports benchmarking of causal discovery methods. To accomplish this, we
employ distributional random forests in order to flexibly estimate and
represent conditional distributions that may be combined into joint
distributions that strictly adhere to a causal model over the observed
variables. The estimated conditionals and tools for data generation are made
available in our Python library $\texttt{causalAssembly}$. Using the library,
we showcase how to benchmark several well-known causal discovery algorithms. |
---|---|
DOI: | 10.48550/arxiv.2306.10816 |
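
The summary describes combining estimated conditional distributions into a joint distribution that adheres to a causal model. A minimal sketch of that idea, ancestral sampling along a topological order of a DAG, might look as follows. The variable names, the toy graph, and the Gaussian conditionals are hypothetical stand-ins chosen for illustration; they are not the distributional random forests used by the authors, and nothing here is the API of the causalAssembly library.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical toy DAG: each node maps to the list of its parents.
parents = {
    "pressure": [],
    "temperature": ["pressure"],
    "torque": ["pressure", "temperature"],
}

# Hypothetical conditional samplers, one per node. Each draws a value for its
# node given the already-sampled parent values. In the setting described in
# the summary these would be conditionals estimated from data; simple
# Gaussians are assumed here for illustration only.
samplers = {
    "pressure": lambda pa, rng: rng.gauss(0.0, 1.0),
    "temperature": lambda pa, rng: 0.8 * pa["pressure"] + rng.gauss(0.0, 0.5),
    "torque": lambda pa, rng: (
        pa["pressure"] - 0.3 * pa["temperature"] + rng.gauss(0.0, 0.2)
    ),
}


def ancestral_sample(parents, samplers, n, seed=0):
    """Draw n joint samples by sampling each node after all of its parents."""
    rng = random.Random(seed)
    # TopologicalSorter expects node -> predecessors, which matches `parents`.
    order = list(TopologicalSorter(parents).static_order())
    rows = []
    for _ in range(n):
        row = {}
        for node in order:
            parent_values = {p: row[p] for p in parents[node]}
            row[node] = samplers[node](parent_values, rng)
        rows.append(row)
    return order, rows


order, rows = ancestral_sample(parents, samplers, n=5, seed=1)
```

Because every variable is drawn only from its conditional given its parents, the resulting joint distribution factorizes according to the graph by construction, which is what makes the graph usable as ground truth when scoring a causal discovery method on the samples.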