Mitigating selection bias in counterfactual prediction through self-supervised domain embedding learning with virtual samples


Bibliographic Details
Published in:	Applied intelligence (Dordrecht, Netherlands), 2024-04, Vol.54 (8), p.6529-6542
Main Authors:	Zhu, Qianyang; Sun, Heyuan; Yang, Bo
Format:	Article
Language:	English
Subjects:	Accuracy; Artificial Intelligence; Bias; Computer Science; Embedding; Feature extraction; Machine learning; Machines; Manufacturing; Mechanical Engineering; Neural networks; Processes; Self-supervised learning; Statistical analysis
Online Access:	Full text

Description
Abstract:	Treatment effect estimation (TEE) is widely adopted in domains such as machine learning, advertising and marketing, and medicine. In TEE, counterfactual prediction normally suffers from selection bias, which results in different covariate distributions between the treated and control groups. One important challenge in TEE is to mitigate the impact of selection bias, which has attracted a lot of research in recent years. To address this challenge, existing neural network-based methods generally aim to minimize the distribution differences using integral probability metrics. However, minimizing the distribution differences may inadvertently remove outcome-related information during the balancing procedure, which has a negative impact on the accuracy of TEE. In this paper, we propose a novel self-supervised learning approach to conduct TEE. Rather than minimizing the distribution differences, we first introduce the concept of virtual samples, which have the same covariates as observed samples but different treatments. In this way, we aim to simulate the scenario where each sample receives both treatment and control. Next, we propose a self-supervised domain embedding learning (SDEL) approach to conduct TEE. In SDEL, we propose to learn both treated and control embeddings for observed and virtual samples, thereby learning the effects of different treatments. To the best of our knowledge, we are the first to introduce the concept of virtual samples and the first to conduct embedding learning in TEE. Building upon SDEL, we propose a feature extraction counterfactual regression network (FE-CFR), in which we propose a feature extraction module (FEM) to estimate the importance of different covariates. Compared with existing TEE methods, our proposed self-supervised learning approach could improve the accuracy of TEE. Extensive experiments have been conducted on benchmark datasets for TEE, and the results demonstrate that our proposed approach outperforms the compared baseline approaches.
ISSN:	0924-669X, 1573-7497
DOI:	10.1007/s10489-024-05518-7
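
Illustrative note: the abstract describes virtual samples as copies of observed samples that keep the same covariates but carry a different treatment. The following is a minimal sketch of that construction only, assuming a binary treatment coded 0/1 and NumPy arrays. The function names `make_virtual_samples` and `augment_with_virtual` are hypothetical and not taken from the paper, and the sketch does not show how SDEL or FE-CFR use these samples, since the abstract does not specify that.

```python
import numpy as np


def make_virtual_samples(x, t):
    """Build virtual samples: same covariates, opposite treatment.

    Assumption (not from the paper): treatment is binary, coded 0/1.
    x: (n, d) covariates; t: (n,) treatment indicators.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=int)
    if x.shape[0] != t.shape[0]:
        raise ValueError("x and t must have the same number of rows")
    if not np.all((t == 0) | (t == 1)):
        raise ValueError("this sketch assumes binary treatments coded 0/1")
    x_virtual = x.copy()   # covariates are kept unchanged
    t_virtual = 1 - t      # treatment is flipped
    return x_virtual, t_virtual


def augment_with_virtual(x, t, y):
    """Stack observed and virtual samples into one data set.

    Outcomes of virtual samples are counterfactual and therefore
    unobserved; they are stored as NaN here.
    """
    x_v, t_v = make_virtual_samples(x, t)
    n = len(t_v)
    x_all = np.concatenate([np.asarray(x, dtype=float), x_v], axis=0)
    t_all = np.concatenate([np.asarray(t, dtype=int), t_v], axis=0)
    y_all = np.concatenate([np.asarray(y, dtype=float), np.full(n, np.nan)])
    is_virtual = np.concatenate([np.zeros(n, dtype=bool), np.ones(n, dtype=bool)])
    return x_all, t_all, y_all, is_virtual


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    t = [1, 0, 1]
    y = [0.5, 1.5, 2.5]
    x_all, t_all, y_all, is_virtual = augment_with_virtual(x, t, y)
    print(t_all)
    print(is_virtual)
```

After augmentation every covariate vector appears once under each treatment, which mirrors the scenario the abstract describes, where each sample receives both treatment and control. The boolean mask `is_virtual` lets downstream code tell the rows with unobserved outcomes apart from the observed ones.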