What Can the Millions of Random Treatments in Nonexperimental Data Reveal About Causes?

Bibliographic details
Published in: SN Computer Science 2022-11, Vol. 3 (6), p. 421, Article 421
Main authors: F. Ribeiro, Andre; Neffke, Frank; Hausmann, Ricardo
Format: Article
Language: English
container_end_page
container_issue 6
container_start_page 421
container_title SN computer science
container_volume 3
creator F. Ribeiro, Andre
Neffke, Frank
Hausmann, Ricardo
description We propose a new method to estimate causal effects from nonexperimental data. Each pair of sample units is first associated with a stochastic ‘treatment’ (the differences in factors between the two units) and an effect (the resulting difference in outcomes). It is then proposed that all pairs can be combined to provide more accurate estimates of causal effects in nonexperimental data, given a statistical model that relates the combinatorial properties of treatments to the accuracy and unbiasedness of their effects. The article introduces one such model and a Bayesian approach to combine the O(n²) pairwise observations typically available in nonexperimental data. This also leads to an interpretation of nonexperimental datasets as incomplete, or noisy, versions of ideal factorial experimental designs. This approach to causal effect estimation has several advantages: (1) it expands the number of observations, converting thousands of individuals into millions of observational treatments; (2) starting with the treatments closest to the experimental ideal, it identifies noncausal variables that can be ignored in subsequent iterations, making estimation easier at each step while departing minimally from experiment-like conditions; (3) it recovers individual causal effects in heterogeneous populations. We evaluate the method in simulations and on the National Supported Work (NSW) program, an intensively studied program whose effects are known from randomized field experiments. We demonstrate that the proposed approach recovers causal effects in common NSW samples, as well as in arbitrary subpopulations and in an order-of-magnitude larger supersample containing the entire national program data, outperforming statistical, econometric, and machine learning estimators in all cases. As a tool, the approach also allows researchers to represent and visualize possible causes, and heterogeneous subpopulations, in their samples.
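The pairwise construction described above can be illustrated with a minimal sketch. This is not the authors' Bayesian model; it only shows how every unordered pair of units yields one ‘treatment’ (factor differences) and one effect (outcome difference), giving n(n-1)/2 observations (e.g. 2,000 units give 1,999,000 pairs). The function names are hypothetical, and the filter "pairs that change only factor k" is an illustrative stand-in, assumed here, for "treatments closest to the experimental ideal".

```python
from itertools import combinations


def pairwise_treatments(X, y):
    """Yield (i, j, treatment, effect) for every unordered pair i < j.

    treatment: tuple of factor differences X[j] - X[i]
    effect:    outcome difference y[j] - y[i]
    Produces n * (n - 1) / 2 observations from n units.
    """
    for i, j in combinations(range(len(X)), 2):
        treatment = tuple(b - a for a, b in zip(X[i], X[j]))
        effect = y[j] - y[i]
        yield i, j, treatment, effect


def single_factor_effects(X, y, k):
    """Hypothetical helper: average effect per unit change in factor k,
    using only experiment-like pairs whose treatment changes factor k
    and leaves every other factor unchanged. Returns None if no such pair.
    """
    ratios = []
    for _, _, t, e in pairwise_treatments(X, y):
        others_unchanged = all(d == 0 for m, d in enumerate(t) if m != k)
        if t[k] != 0 and others_unchanged:
            ratios.append(e / t[k])
    return sum(ratios) / len(ratios) if ratios else None


if __name__ == "__main__":
    # Toy data (assumed): outcome = 2 * factor0 + 5 * factor1
    X = [(0, 1), (1, 1), (0, 0), (1, 0)]
    y = [5, 7, 0, 2]
    print(len(list(pairwise_treatments(X, y))))  # 6 pairs from 4 units
    print(single_factor_effects(X, y, 0))        # 2.0
    print(single_factor_effects(X, y, 1))        # 5.0
```

On this toy data the single-factor pairs recover the assumed coefficients exactly; real nonexperimental data rarely contain many such clean pairs, which is why the article instead models how the combinatorial properties of each treatment relate to the accuracy and unbiasedness of its effect.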
doi_str_mv 10.1007/s42979-022-01319-2
format Article
publisher Springer Nature Singapore
rights The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
linktopdf https://link.springer.com/content/pdf/10.1007/s42979-022-01319-2
fulltext fulltext
identifier ISSN: 2661-8907
ispartof SN computer science, 2022-11, Vol.3 (6), p.421, Article 421
issn 2661-8907
2662-995X
language eng
recordid cdi_proquest_journals_2933783643
source Springer Online Journals Complete; ProQuest Central UK/Ireland; ProQuest Central
subjects Bayesian analysis
Combinatorial analysis
Computer Imaging
Computer Science
Computer Systems Organization and Communication Networks
Data Structures and Information Theory
Datasets
Econometrics
Hypothesis testing
Information Systems and Communication Service
Iterative methods
Machine learning
Original Research
Pattern Recognition and Graphics
Samples
Signal processing
Software Engineering/Programming and Operating Systems
Standard deviation
Statistical models
Variables
Vision
title What Can the Millions of Random Treatments in Nonexperimental Data Reveal About Causes?
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T16%3A45%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=What%20Can%20the%20Millions%20of%20Random%20Treatments%20in%20Nonexperimental%20Data%20Reveal%20About%20Causes?&rft.jtitle=SN%20computer%20science&rft.au=F.%20Ribeiro,%20Andre&rft.date=2022-11-01&rft.volume=3&rft.issue=6&rft.spage=421&rft.pages=421-&rft.artnum=421&rft.issn=2661-8907&rft.eissn=2661-8907&rft_id=info:doi/10.1007/s42979-022-01319-2&rft_dat=%3Cproquest_cross%3E2933783643%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2933783643&rft_id=info:pmid/&rfr_iscdi=true