What Can the Millions of Random Treatments in Nonexperimental Data Reveal About Causes?

Bibliographic details
Published in: SN Computer Science, 2022-11, Vol. 3 (6), p. 421, Article 421
Main authors: F. Ribeiro, Andre; Neffke, Frank; Hausmann, Ricardo
Format: Article
Language: English
Description
Abstract: We propose a new method to estimate causal effects from nonexperimental data. Each pair of sample units is first associated with a stochastic 'treatment' (the differences in factors between the two units) and an effect (the resulting difference in outcomes). We then propose that all pairs can be combined to provide more accurate estimates of causal effects in nonexperimental data, provided there is a statistical model relating combinatorial properties of treatments to the accuracy and unbiasedness of their effects. The article introduces one such model and a Bayesian approach to combine the O(n^2) pairwise observations typically available in nonexperimental data. This also leads to an interpretation of nonexperimental datasets as incomplete, or noisy, versions of ideal factorial experimental designs. This approach to causal effect estimation has several advantages: (1) it expands the number of observations, converting thousands of individuals into millions of observational treatments; (2) starting with the treatments closest to the experimental ideal, it identifies noncausal variables that can be ignored from then on, making estimation easier in each subsequent iteration while departing minimally from experiment-like conditions; (3) it recovers individual causal effects in heterogeneous populations. We evaluate the method in simulations and on the National Supported Work (NSW) program, an intensively studied program whose effects are known from randomized field experiments. We demonstrate that the proposed approach recovers causal effects in common NSW samples, as well as in arbitrary subpopulations and in an order-of-magnitude larger supersample containing the entire national program data, outperforming statistical, econometric, and machine learning estimators in all cases. As a tool, the approach also allows researchers to represent and visualize possible causes, and heterogeneous subpopulations, in their samples.
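The pairwise construction described in the abstract can be illustrated with a short Python/NumPy sketch. It assumes each unit is a row of a numeric factor matrix `X` with an outcome vector `y`; the helpers `pairwise_treatments`, `single_factor_pairs`, and `naive_factor_effects` are hypothetical names. Treating "closest to the experimental ideal" as "the pair differs in exactly one factor" is an assumption made here for illustration, and the final step is a plain average, not the Bayesian model introduced in the article.

```python
import numpy as np


def pairwise_treatments(X, y):
    """Hypothetical helper: one (treatment, effect) observation per unordered pair.

    treatment: difference in factors,  X[i] - X[j]
    effect:    difference in outcomes, y[i] - y[j]
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(y), k=1)  # every pair with i < j: n(n-1)/2 pairs
    return i, j, X[i] - X[j], y[i] - y[j]


def single_factor_pairs(treatments, tol=0.0):
    """Hypothetical filter: indices of pairs whose treatment changes exactly one factor.

    Assumption: such pairs are read as the most experiment-like comparisons.
    """
    changed = np.abs(treatments) > tol
    return np.flatnonzero(changed.sum(axis=1) == 1)


def naive_factor_effects(treatments, effects, tol=0.0):
    """Average effect per unit change of each factor, using single-factor pairs only.

    A plain average for illustration, NOT the article's Bayesian combination.
    Factors never changed alone by any pair get NaN.
    """
    idx = single_factor_pairs(treatments, tol)
    k = treatments.shape[1]
    estimates = np.full(k, np.nan)
    for f in range(k):
        # single-factor pairs in which factor f is the one that changed
        rows = idx[np.abs(treatments[idx, f]) > tol]
        if rows.size:
            estimates[f] = np.mean(effects[rows] / treatments[rows, f])
    return estimates


# Toy data (made up for illustration): y = 2 + 3*x1 + 2*x2
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y = np.array([5.0, 7.0, 4.0, 2.0])
i, j, T, E = pairwise_treatments(X, y)
print(len(E))                      # 6 pairs from 4 units
print(single_factor_pairs(T))      # [0 2 3 5]
print(naive_factor_effects(T, E))  # [3. 2.]
```

The pair count grows quadratically: with n = 2,000 units the same construction yields n(n-1)/2 = 1,999,000 pairwise observations, which is the O(n^2) expansion the abstract refers to.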
ISSN: 2661-8907, 2662-995X
DOI: 10.1007/s42979-022-01319-2