What Can the Millions of Random Treatments in Nonexperimental Data Reveal About Causes?
We propose a new method to estimate causal effects from nonexperimental data. Each pair of sample units is first associated with a stochastic ‘treatment’—differences in factors between units—and an effect—a resultant outcome difference. It is then proposed that all pairs can be combined to provide m...
Published in: | SN computer science 2022-11, Vol.3 (6), p.421, Article 421 |
---|---|
Main authors: | F. Ribeiro, Andre ; Neffke, Frank ; Hausmann, Ricardo |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 6 |
container_start_page | 421 |
container_title | SN computer science |
container_volume | 3 |
creator | F. Ribeiro, Andre ; Neffke, Frank ; Hausmann, Ricardo |
description | We propose a new method to estimate causal effects from nonexperimental data. Each pair of sample units is first associated with a stochastic ‘treatment’ (differences in factors between units) and an effect (a resultant outcome difference). It is then proposed that all pairs can be combined to provide more accurate estimates of causal effects in nonexperimental data, provided a statistical model relating combinatorial properties of treatments to the accuracy and unbiasedness of their effects. The article introduces one such model and a Bayesian approach to combine the $O(n^2)$ pairwise observations typically available in nonexperimental data. This also leads to an interpretation of nonexperimental datasets as incomplete, or noisy, versions of ideal factorial experimental designs. This approach to causal effect estimation has several advantages: (1) it expands the number of observations, converting thousands of individuals into millions of observational treatments; (2) starting with treatments closest to the experimental ideal, it identifies noncausal variables that can be ignored in the future, making estimation easier in each subsequent iteration while departing minimally from experiment-like conditions; (3) it recovers individual causal effects in heterogeneous populations. We evaluate the method in simulations and the National Supported Work (NSW) program, an intensively studied program whose effects are known from randomized field experiments. We demonstrate that the proposed approach recovers causal effects in common NSW samples, as well as in arbitrary subpopulations and an order-of-magnitude larger supersample with the entire national program data, outperforming Statistical, Econometrics and Machine Learning estimators in all cases. As a tool, the approach also allows researchers to represent and visualize possible causes, and heterogeneous subpopulations, in their samples. |
doi_str_mv | 10.1007/s42979-022-01319-2 |
format | Article |
publisher | Singapore: Springer Nature Singapore |
rights | The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
toplevel | peer_reviewed; online_resources |
linktopdf | https://link.springer.com/content/pdf/10.1007/s42979-022-01319-2 |
linktohtml | https://www.proquest.com/docview/2933783643?pq-origsite=primo |
fulltext | fulltext |
identifier | ISSN: 2661-8907 |
ispartof | SN computer science, 2022-11, Vol.3 (6), p.421, Article 421 |
issn | 2661-8907; 2662-995X; 2661-8907 |
language | eng |
recordid | cdi_proquest_journals_2933783643 |
source | Springer Online Journals Complete; ProQuest Central UK/Ireland; ProQuest Central |
subjects | Bayesian analysis; Combinatorial analysis; Computer Imaging; Computer Science; Computer Systems Organization and Communication Networks; Data Structures and Information Theory; Datasets; Econometrics; Hypothesis testing; Information Systems and Communication Service; Iterative methods; Machine learning; Original Research; Pattern Recognition and Graphics; Samples; Signal processing; Software Engineering/Programming and Operating Systems; Standard deviation; Statistical models; Variables; Vision |
title | What Can the Millions of Random Treatments in Nonexperimental Data Reveal About Causes? |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T16%3A45%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=What%20Can%20the%20Millions%20of%20Random%20Treatments%20in%20Nonexperimental%20Data%20Reveal%20About%20Causes?&rft.jtitle=SN%20computer%20science&rft.au=F.%20Ribeiro,%20Andre&rft.date=2022-11-01&rft.volume=3&rft.issue=6&rft.spage=421&rft.pages=421-&rft.artnum=421&rft.issn=2661-8907&rft.eissn=2661-8907&rft_id=info:doi/10.1007/s42979-022-01319-2&rft_dat=%3Cproquest_cross%3E2933783643%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2933783643&rft_id=info:pmid/&rfr_iscdi=true |
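
The pairwise construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it only shows how each pair of units yields a 'treatment' (the factors that differ) and an effect (the outcome difference), and how quickly the number of pairs grows. The function names, the factor names (`training`, `degree`), and the toy values are hypothetical, and the Bayesian step that combines the pairs is not shown.

```python
from itertools import combinations


def pairwise_treatments(units, outcome_key):
    """Return one (treatment, effect) tuple per unordered pair of units.

    treatment: dict mapping each differing factor to its difference (b - a)
    effect:    outcome difference (b - a)
    """
    pairs = []
    for a, b in combinations(units, 2):
        treatment = {
            k: b[k] - a[k]
            for k in a
            if k != outcome_key and b[k] != a[k]
        }
        effect = b[outcome_key] - a[outcome_key]
        pairs.append((treatment, effect))
    return pairs


def n_pairs(n):
    """Number of unordered pairs among n units: n(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2


# Hypothetical toy sample: two binary factors and a numeric outcome.
units = [
    {"training": 0, "degree": 0, "earnings": 10.0},
    {"training": 1, "degree": 0, "earnings": 12.0},
    {"training": 1, "degree": 1, "earnings": 15.0},
    {"training": 0, "degree": 1, "earnings": 13.0},
]

pairs = pairwise_treatments(units, "earnings")

# Pairs that differ in exactly one factor resemble a controlled comparison,
# which is one way to read "treatments closest to the experimental ideal".
single_factor = [(t, e) for t, e in pairs if len(t) == 1]

print(len(pairs), len(single_factor), n_pairs(2000))
```

With 2,000 units, `n_pairs` gives 1,999,000 pairs, which matches the abstract's point that thousands of individuals become millions of observational treatments.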