Sufficient dimension reduction for average causal effect estimation

Bibliographic details
Published in:	Data Mining and Knowledge Discovery, May 2022, Vol. 36, No. 3, pp. 1174-1196
Authors:	Cheng, Debo; Li, Jiuyong; Liu, Lin; Le, Thuc Duy; Liu, Jixue; Yu, Kui
Format:	Article
Language:	English
Subject terms:	Algorithms; Artificial Intelligence; Chemistry and Earth Sciences; Computer Science; Data Mining and Knowledge Discovery; Information Storage and Retrieval; Physics; Reduction; Representations; Statistics for Engineering

Description
Abstract:	A large number of covariates can have a negative impact on the quality of causal effect estimation, since confounding adjustment becomes unreliable when the number of covariates is large relative to the number of samples. The propensity score is a common way to deal with a large covariate set, but the accuracy of propensity score estimation (normally done by logistic regression) is also challenged by the large number of covariates. In this paper, we prove that a large covariate set can be reduced to a lower dimensional representation which captures the complete information for adjustment in causal effect estimation. The theoretical result enables effective data-driven algorithms for causal effect estimation. Supported by the result, we develop an algorithm that employs a supervised kernel dimension reduction method to learn a lower dimensional representation from the original covariate space, and then utilises nearest neighbour matching in the reduced covariate space to impute the counterfactual outcomes, thereby avoiding the problems caused by a large covariate set. The proposed algorithm is evaluated on two semi-synthetic and three real-world datasets, and the results show the effectiveness of the proposed algorithm.
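The abstract outlines a two-stage estimator: learn a low-dimensional representation of the covariates that is supervised by the outcome, then impute each unit's counterfactual outcome from its nearest neighbour in the opposite treatment group within that reduced space, and average the imputed individual effects. The Python sketch below illustrates only that pipeline shape under simplifying assumptions. It is not the authors' implementation: in place of the paper's supervised kernel dimension reduction it substitutes a much simpler linear reduction (per-arm least-squares outcome predictions, sometimes called prognostic scores). The helper names (`simulate`, `solve`, `ols`, `reduce_covariates`, `ace_by_matching`) and the synthetic data-generating process are hypothetical, chosen for illustration.

```python
import math
import random


def simulate(n=600, p=10, effect=2.0, seed=0):
    """Hypothetical synthetic data: only x[0] and x[1] affect Y, and x[0] also drives T."""
    rng = random.Random(seed)
    X, T, Y = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        t = 1 if rng.random() < 1.0 / (1.0 + math.exp(-x[0])) else 0  # confounding via x[0]
        y = effect * t + x[0] + 0.5 * x[1] + rng.gauss(0.0, 0.1)
        X.append(x)
        T.append(t)
        Y.append(y)
    return X, T, Y


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(xs, ys):
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def reduce_covariates(x, beta0, beta1):
    """Stand-in reduction (NOT kernel dimension reduction): p covariates -> 2 predicted outcomes."""
    row = [1.0] + list(x)
    return (sum(b * v for b, v in zip(beta0, row)),
            sum(b * v for b, v in zip(beta1, row)))


def ace_by_matching(X, T, Y):
    """Average causal effect from 1-nearest-neighbour matching in the reduced space."""
    n = len(T)
    groups = {t: [i for i in range(n) if T[i] == t] for t in (0, 1)}
    # Supervised step (assumed stand-in): fit an outcome model separately in each arm.
    beta = {t: ols([X[i] for i in groups[t]], [Y[i] for i in groups[t]]) for t in (0, 1)}
    Z = [reduce_covariates(x, beta[0], beta[1]) for x in X]
    total = 0.0
    for i in range(n):
        # Impute the unobserved potential outcome from the closest unit in the other arm.
        j = min(groups[1 - T[i]], key=lambda k: math.dist(Z[i], Z[k]))
        y1, y0 = (Y[i], Y[j]) if T[i] == 1 else (Y[j], Y[i])
        total += y1 - y0
    return total / n


X, T, Y = simulate()
treated = [y for y, t in zip(Y, T) if t == 1]
control = [y for y, t in zip(Y, T) if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)
print(f"naive difference in means: {naive:.2f}")
print(f"matched ACE estimate:      {ace_by_matching(X, T, Y):.2f}")
```

In the simulated data only two of the ten covariates affect the outcome and one of them also drives treatment, so the naive difference in means is biased upward, while matching in the two-dimensional reduced space should land near the true effect of 2.0. Matching on two coordinates rather than ten is the point of the reduction: in the full covariate space, nearest-neighbour distances would be dominated by the eight irrelevant covariates.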
ISSN:	1384-5810, 1573-756X
DOI:	10.1007/s10618-022-00832-5