Causal Decision Trees

Uncovering causal relationships in data is a major objective of data analytics. Currently, there is a need for scalable and automated methods for causal relationship exploration in data. Classification methods are fast and they could be practical substitutes for finding causal signals in data. Howev...

Detailed description

Bibliographic details
Published in: IEEE transactions on knowledge and data engineering 2017-02, Vol.29 (2), p.257-271
Main authors: Li, Jiuyong; Ma, Saisai; Le, Thuc; Liu, Lin; Liu, Jixue
Format: Article
Language: eng
Online access: Order full text
container_end_page 271
container_issue 2
container_start_page 257
container_title IEEE transactions on knowledge and data engineering
container_volume 29
creator Li, Jiuyong
Ma, Saisai
Le, Thuc
Liu, Lin
Liu, Jixue
description Uncovering causal relationships in data is a major objective of data analytics. Currently, there is a need for scalable and automated methods for causal relationship exploration in data. Classification methods are fast and they could be practical substitutes for finding causal signals in data. However, classification methods are not designed for causal discovery and a classification method may find false causal signals and miss the true ones. In this paper, we develop a causal decision tree (CDT) where nodes have causal interpretations. Our method follows a well-established causal inference framework and makes use of a classic statistical test to establish the causal relationship between a predictor variable and the outcome variable. At the same time, by taking the advantages of normal decision trees, a CDT provides a compact graphical representation of the causal relationships, and the construction of a CDT is fast as a result of the divide and conquer strategy employed, making CDTs practical for representing and finding causal signals in large data sets. Experiment results demonstrate that CDTs can identify meaningful causal relationships and the CDT algorithm is scalable.
doi_str_mv 10.1109/TKDE.2016.2619350
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_7600471</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7600471</ieee_id><sourcerecordid>1858794767</sourcerecordid><originalsourceid>FETCH-LOGICAL-c336t-7b871ae6511a54a9e2151a62b13cc71aa2b97bd84f73a332917015cdf68f3af43</originalsourceid><addsrcrecordid>eNo9jz1PwzAQhi0EEqUwMiCWSswJPn-dPaK0fIhKLGG2HNeWUpWm2M3AvydRqk73Sve8d3oIeQBaAlDzXH8uVyWjoEqmwHBJL8gMpNQFAwOXQ6YCCsEFXpObnLeUUo0aZuS-cn12u8Uy-Da33X5RpxDyLbmKbpfD3WnOyffrqq7ei_XX20f1si485-pYYKMRXFASwEnhTGAgwSnWAPd-2DjWGGw2WkTkjnNmAClIv4lKR-6i4HPyNN09pO63D_lot12f9sNLC1pqNAIVDhRMlE9dzilEe0jtj0t_Fqgd7e1ob0d7e7IfOo9Tpw0hnHlUlAoE_g8oaFLN</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1858794767</pqid></control><display><type>article</type><title>Causal Decision Trees</title><source>IEEE Electronic Library (IEL)</source><creator>Li, Jiuyong ; Ma, Saisai ; Le, Thuc ; Liu, Lin ; Liu, Jixue</creator><creatorcontrib>Li, Jiuyong ; Ma, Saisai ; Le, Thuc ; Liu, Lin ; Liu, Jixue</creatorcontrib><description>Uncovering causal relationships in data is a major objective of data analytics. Currently, there is a need for scalable and automated methods for causal relationship exploration in data. Classification methods are fast and they could be practical substitutes for finding causal signals in data. However, classification methods are not designed for causal discovery and a classification method may find false causal signals and miss the true ones. In this paper, we develop a causal decision tree (CDT) where nodes have causal interpretations. Our method follows a well-established causal inference framework and makes use of a classic statistical test to establish the causal relationship between a predictor variable and the outcome variable. 
At the same time, by taking the advantages of normal decision trees, a CDT provides a compact graphical representation of the causal relationships, and the construction of a CDT is fast as a result of the divide and conquer strategy employed, making CDTs practical for representing and finding causal signals in large data sets. Experiment results demonstrate that CDTs can identify meaningful causal relationships and the CDT algorithm is scalable.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2016.2619350</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Algorithms ; Bayes methods ; causal relationship ; Classification ; Context ; Data analysis ; Decision tree ; Decision trees ; Graphical representations ; Knowledge engineering ; Mathematical model ; partial association ; potential outcome model ; Remuneration ; Statistical inference ; Statistical tests</subject><ispartof>IEEE transactions on knowledge and data engineering, 2017-02, Vol.29 (2), p.257-271</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c336t-7b871ae6511a54a9e2151a62b13cc71aa2b97bd84f73a332917015cdf68f3af43</citedby><cites>FETCH-LOGICAL-c336t-7b871ae6511a54a9e2151a62b13cc71aa2b97bd84f73a332917015cdf68f3af43</cites><orcidid>0000-0002-9023-1878</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7600471$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7600471$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Li, Jiuyong</creatorcontrib><creatorcontrib>Ma, Saisai</creatorcontrib><creatorcontrib>Le, Thuc</creatorcontrib><creatorcontrib>Liu, Lin</creatorcontrib><creatorcontrib>Liu, Jixue</creatorcontrib><title>Causal Decision Trees</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>Uncovering causal relationships in data is a major objective of data analytics. Currently, there is a need for scalable and automated methods for causal relationship exploration in data. Classification methods are fast and they could be practical substitutes for finding causal signals in data. However, classification methods are not designed for causal discovery and a classification method may find false causal signals and miss the true ones. In this paper, we develop a causal decision tree (CDT) where nodes have causal interpretations. Our method follows a well-established causal inference framework and makes use of a classic statistical test to establish the causal relationship between a predictor variable and the outcome variable. 
At the same time, by taking the advantages of normal decision trees, a CDT provides a compact graphical representation of the causal relationships, and the construction of a CDT is fast as a result of the divide and conquer strategy employed, making CDTs practical for representing and finding causal signals in large data sets. Experiment results demonstrate that CDTs can identify meaningful causal relationships and the CDT algorithm is scalable.</description><subject>Algorithms</subject><subject>Bayes methods</subject><subject>causal relationship</subject><subject>Classification</subject><subject>Context</subject><subject>Data analysis</subject><subject>Decision tree</subject><subject>Decision trees</subject><subject>Graphical representations</subject><subject>Knowledge engineering</subject><subject>Mathematical model</subject><subject>partial association</subject><subject>potential outcome model</subject><subject>Remuneration</subject><subject>Statistical inference</subject><subject>Statistical tests</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9jz1PwzAQhi0EEqUwMiCWSswJPn-dPaK0fIhKLGG2HNeWUpWm2M3AvydRqk73Sve8d3oIeQBaAlDzXH8uVyWjoEqmwHBJL8gMpNQFAwOXQ6YCCsEFXpObnLeUUo0aZuS-cn12u8Uy-Da33X5RpxDyLbmKbpfD3WnOyffrqq7ei_XX20f1si485-pYYKMRXFASwEnhTGAgwSnWAPd-2DjWGGw2WkTkjnNmAClIv4lKR-6i4HPyNN09pO63D_lot12f9sNLC1pqNAIVDhRMlE9dzilEe0jtj0t_Fqgd7e1ob0d7e7IfOo9Tpw0hnHlUlAoE_g8oaFLN</recordid><startdate>20170201</startdate><enddate>20170201</enddate><creator>Li, Jiuyong</creator><creator>Ma, Saisai</creator><creator>Le, Thuc</creator><creator>Liu, Lin</creator><creator>Liu, Jixue</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-9023-1878</orcidid></search><sort><creationdate>20170201</creationdate><title>Causal Decision Trees</title><author>Li, Jiuyong ; Ma, Saisai ; Le, Thuc ; Liu, Lin ; Liu, Jixue</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c336t-7b871ae6511a54a9e2151a62b13cc71aa2b97bd84f73a332917015cdf68f3af43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Algorithms</topic><topic>Bayes methods</topic><topic>causal relationship</topic><topic>Classification</topic><topic>Context</topic><topic>Data analysis</topic><topic>Decision tree</topic><topic>Decision trees</topic><topic>Graphical representations</topic><topic>Knowledge engineering</topic><topic>Mathematical model</topic><topic>partial association</topic><topic>potential outcome model</topic><topic>Remuneration</topic><topic>Statistical inference</topic><topic>Statistical tests</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Jiuyong</creatorcontrib><creatorcontrib>Ma, Saisai</creatorcontrib><creatorcontrib>Le, Thuc</creatorcontrib><creatorcontrib>Liu, Lin</creatorcontrib><creatorcontrib>Liu, Jixue</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science 
Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Li, Jiuyong</au><au>Ma, Saisai</au><au>Le, Thuc</au><au>Liu, Lin</au><au>Liu, Jixue</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Causal Decision Trees</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2017-02-01</date><risdate>2017</risdate><volume>29</volume><issue>2</issue><spage>257</spage><epage>271</epage><pages>257-271</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>Uncovering causal relationships in data is a major objective of data analytics. Currently, there is a need for scalable and automated methods for causal relationship exploration in data. Classification methods are fast and they could be practical substitutes for finding causal signals in data. However, classification methods are not designed for causal discovery and a classification method may find false causal signals and miss the true ones. In this paper, we develop a causal decision tree (CDT) where nodes have causal interpretations. Our method follows a well-established causal inference framework and makes use of a classic statistical test to establish the causal relationship between a predictor variable and the outcome variable. 
At the same time, by taking the advantages of normal decision trees, a CDT provides a compact graphical representation of the causal relationships, and the construction of a CDT is fast as a result of the divide and conquer strategy employed, making CDTs practical for representing and finding causal signals in large data sets. Experiment results demonstrate that CDTs can identify meaningful causal relationships and the CDT algorithm is scalable.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TKDE.2016.2619350</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0002-9023-1878</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1041-4347
ispartof IEEE transactions on knowledge and data engineering, 2017-02, Vol.29 (2), p.257-271
issn 1041-4347
1558-2191
language eng
recordid cdi_ieee_primary_7600471
source IEEE Electronic Library (IEL)
subjects Algorithms
Bayes methods
causal relationship
Classification
Context
Data analysis
Decision tree
Decision trees
Graphical representations
Knowledge engineering
Mathematical model
partial association
potential outcome model
Remuneration
Statistical inference
Statistical tests
title Causal Decision Trees
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-18T01%3A39%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Causal%20Decision%20Trees&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Li,%20Jiuyong&rft.date=2017-02-01&rft.volume=29&rft.issue=2&rft.spage=257&rft.epage=271&rft.pages=257-271&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2016.2619350&rft_dat=%3Cproquest_RIE%3E1858794767%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1858794767&rft_id=info:pmid/&rft_ieee_id=7600471&rfr_iscdi=true