Collaborative causal inference on distributed data

Bibliographic details
Published in:	Expert Systems with Applications, Vol. 244 (June 2024), Article 123024
Authors:	Kawamata, Yuji; Motai, Ryoki; Okada, Yukihiko; Imakura, Akira; Sakurai, Tetsuya
Format:	Article
Language:	English
Keywords:	Collaborative data analysis; Distributed data; Privacy-preserving method; Propensity score; Quasi-experiment; Statistical causal inference

Description
Abstract:	In recent years, the development of technologies for causal inference with privacy preservation of distributed data has gained considerable attention. Many existing methods for distributed data focus on resolving the lack of subjects (samples) and can only reduce random errors in estimating treatment effects. In this study, we propose a data collaboration quasi-experiment (DC-QE) that resolves the lack of both subjects and covariates, reducing random errors and biases in the estimation. Our method involves constructing dimensionality-reduced intermediate representations from the private data of local parties, sharing intermediate representations instead of private data for privacy preservation, estimating propensity scores from the shared intermediate representations, and finally, estimating the treatment effects from propensity scores. Through numerical experiments on both artificial and real-world data, we confirm that our method leads to better estimation results than individual analyses. While dimensionality reduction loses some information in the private data and causes performance degradation, we observe that sharing intermediate representations with many parties to resolve the lack of subjects and covariates sufficiently improves performance to overcome the degradation caused by dimensionality reduction. Although external validity is not necessarily guaranteed, our results suggest that DC-QE is a promising method. With the widespread use of our method, intermediate representations can be published as open data to help researchers find causalities and accumulate a knowledge base.
Highlights:
• A privacy-preserving statistical causal inference method on distributed data.
• Our method can reduce both random errors and biases in treatment-effect estimation.
• Privacy of data is preserved by sharing only the intermediate representations.
• Numerical experiments showed good estimation results in artificial and real data.
• Intermediate representations can be accumulated as a knowledge base.
ISSN:	0957-4174, 1873-6793
DOI:	10.1016/j.eswa.2023.123024
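The abstract describes DC-QE as a pipeline: each party reduces its private data to an intermediate representation, only those representations are shared, propensity scores are estimated from the shared representations, and treatment effects are estimated from the propensity scores. The Python/NumPy snippet below is a minimal sketch of that flow under assumptions of our own. It is not the authors' implementation. Specifically, it assumes horizontally partitioned data (every party holds the same covariates for different subjects), so it does not illustrate the paper's handling of missing covariates. It also assumes that each party uses a local PCA as its dimensionality reduction, that the representations are aligned through a shared random anchor dataset and an SVD (in the style of earlier data collaboration analysis), that treatment and outcome are shared alongside the representations, that propensity scores come from a logistic regression, and that the effect is estimated with Hajek-normalised inverse probability weighting. All function names, constants and the simulated data are hypothetical.

```python
import numpy as np

# Hypothetical simulation settings (not taken from the paper).
N_PARTIES, N_PER_PARTY, D, LATENT = 3, 500, 10, 2
M, R, N_ANCHOR, TRUE_ATE = 2, 2, 200, 2.0


def simulate_party(rng, W):
    """Hypothetical private data of one party: covariates X, treatment t, outcome y."""
    z = rng.normal(size=(N_PER_PARTY, LATENT))           # latent confounders
    X = z @ W + 0.1 * rng.normal(size=(N_PER_PARTY, D))  # observed covariates
    p = 1.0 / (1.0 + np.exp(-z[:, 0]))                    # true propensity
    t = (rng.random(N_PER_PARTY) < p).astype(float)
    y = TRUE_ATE * t + z[:, 0] + 0.5 * rng.normal(size=N_PER_PARTY)
    return X, t, y


def local_basis(X, m):
    """Assumed party-specific linear reduction x -> x @ F (here: local PCA)."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:m].T  # shape (D, m); F itself never leaves the party


def fit_logistic(X, t, n_iter=30, ridge=1e-6):
    """Logistic regression with intercept fitted by Newton-Raphson; returns P(t=1|X)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (t - p) - ridge * beta
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xb @ beta))


def dc_qe_sketch(seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(LATENT, D))
    anchor = rng.normal(size=(N_ANCHOR, D))  # shared, non-private anchor data
    inter, anchor_inter, ts, ys = [], [], [], []
    for _ in range(N_PARTIES):
        X, t, y = simulate_party(rng, W)
        F = local_basis(X, M)
        inter.append(X @ F)              # intermediate representation (shared)
        anchor_inter.append(anchor @ F)  # same local map applied to the anchor
        ts.append(t)
        ys.append(y)
    # Analyst side: map every party into one common R-dimensional space.
    U, _, _ = np.linalg.svd(np.hstack(anchor_inter), full_matrices=False)
    Z = U[:, :R]
    G = [np.linalg.pinv(Ai) @ Z for Ai in anchor_inter]
    X_hat = np.vstack([Xi @ Gi for Xi, Gi in zip(inter, G)])
    X_hat = (X_hat - X_hat.mean(axis=0)) / X_hat.std(axis=0)
    t, y = np.concatenate(ts), np.concatenate(ys)
    e = fit_logistic(X_hat, t)  # propensity scores from shared representations
    # Hajek-normalised inverse probability weighting.
    mu1 = np.sum(t * y / e) / np.sum(t / e)
    mu0 = np.sum((1 - t) * y / (1 - e)) / np.sum((1 - t) / (1 - e))
    naive = y[t == 1].mean() - y[t == 0].mean()
    return mu1 - mu0, naive, e


if __name__ == "__main__":
    ate, naive, _ = dc_qe_sketch()
    print(f"IPW estimate: {ate:.2f}, naive difference: {naive:.2f}")
```

In this simulation `z[:, 0]` raises both the treatment probability and the outcome, so the naive difference in means is biased upward, while the IPW estimate computed from the aligned representations should land close to `TRUE_ATE`. The anchor step matters because each party's PCA basis is only defined up to rotation and sign. Fitting `G` against the common target `Z` is what makes the pooled representations comparable before the single propensity model is fitted.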