$\texttt{causalAssembly}$: Generating Realistic Production Data for Benchmarking Causal Discovery

Algorithms for causal discovery have recently undergone rapid advances and increasingly draw on flexible nonparametric methods to process complex data. With these advances comes a need for adequate empirical validation of the causal relationships learned by different algorithms. However, for most re...

Detailed description

Bibliographic details
Main authors: Göbler, Konstantin; Windisch, Tobias; Drton, Mathias; Pychynski, Tim; Sonntag, Steffen; Roth, Martin
Format: Article
Language: eng
creator Göbler, Konstantin
Windisch, Tobias
Drton, Mathias
Pychynski, Tim
Sonntag, Steffen
Roth, Martin
description Algorithms for causal discovery have recently undergone rapid advances and increasingly draw on flexible nonparametric methods to process complex data. With these advances comes a need for adequate empirical validation of the causal relationships learned by different algorithms. However, for most real data sources true causal relations remain unknown. This issue is further compounded by privacy concerns surrounding the release of suitable high-quality data. To help address these challenges, we gather a complex dataset comprising measurements from an assembly line in a manufacturing context. This line consists of numerous physical processes for which we are able to provide ground truth causal relationships on the basis of a detailed study of the underlying physics. We use the assembly line data and associated ground truth information to build a system for generation of semisynthetic manufacturing data that supports benchmarking of causal discovery methods. To accomplish this, we employ distributional random forests in order to flexibly estimate and represent conditional distributions that may be combined into joint distributions that strictly adhere to a causal model over the observed variables. The estimated conditionals and tools for data generation are made available in our Python library $\texttt{causalAssembly}$. Using the library, we showcase how to benchmark several well-known causal discovery algorithms.
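The abstract describes combining estimated conditional distributions into a joint distribution that adheres to a causal model. A minimal sketch of that idea, assuming a toy four-node graph and simple linear-Gaussian conditionals in place of the distributional random forests used by the authors, is ancestral sampling in topological order. All node names, weights, and function names below are hypothetical illustrations; none of them are taken from the paper or from the causalAssembly API.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical toy DAG (node -> list of parents); not taken from the paper.
PARENTS = {
    "press_force": [],
    "temperature": [],
    "fit_quality": ["press_force", "temperature"],
    "final_check": ["fit_quality"],
}


def make_linear_gaussian(weights, noise_sd):
    """Return a sampler for X | parents.

    This is a stand-in for a learned conditional distribution; the paper
    uses distributional random forests instead.
    """
    def sample(parent_values, rng):
        mean = sum(w * parent_values[p] for p, w in weights.items())
        return rng.gauss(mean, noise_sd)
    return sample


# One conditional per node; weights are arbitrary illustrative values.
CONDITIONALS = {
    "press_force": make_linear_gaussian({}, 1.0),
    "temperature": make_linear_gaussian({}, 1.0),
    "fit_quality": make_linear_gaussian(
        {"press_force": 0.8, "temperature": -0.5}, 0.3
    ),
    "final_check": make_linear_gaussian({"fit_quality": 1.2}, 0.2),
}


def sample_joint(parents, conditionals, n, seed=0):
    """Draw n joint samples by visiting nodes parents-first."""
    rng = random.Random(seed)
    # TopologicalSorter expects node -> predecessors, which matches PARENTS.
    order = list(TopologicalSorter(parents).static_order())
    rows = []
    for _ in range(n):
        row = {}
        for node in order:
            parent_values = {p: row[p] for p in parents[node]}
            row[node] = conditionals[node](parent_values, rng)
        rows.append(row)
    return order, rows


order, rows = sample_joint(PARENTS, CONDITIONALS, n=5)
```

Because every variable is drawn only from its own conditional given its parents, the sampled joint distribution factorizes according to the graph by construction, which is what gives such semisynthetic data a known ground truth for benchmarking.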
doi_str_mv 10.48550/arxiv.2306.10816
format Article
date 2023-06-19
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa free_for_read
linktorsrc https://arxiv.org/abs/2306.10816
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2306.10816
language eng
recordid cdi_arxiv_primary_2306_10816
source arXiv.org
subjects Computer Science - Learning
Statistics - Machine Learning
Statistics - Methodology
title $\texttt{causalAssembly}$: Generating Realistic Production Data for Benchmarking Causal Discovery
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T08%3A34%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=texttt%7BcausalAssembly%7D$:%20Generating%20Realistic%20Production%20Data%20for%20Benchmarking%20Causal%20Discovery&rft.au=G%C3%B6bler,%20Konstantin&rft.date=2023-06-19&rft_id=info:doi/10.48550/arxiv.2306.10816&rft_dat=%3Carxiv_GOX%3E2306_10816%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true