Mitigating selection bias in counterfactual prediction through self-supervised domain embedding learning with virtual samples

Treatment effect estimation (TEE) is widely adopted in various domains such as machine learning, advertising and marketing, and medicine. In TEE, selection bias normally exists in counterfactual prediction, which results in different distributions of covariates between the treated and control groups. One important challenge in TEE is to mitigate the impact of selection bias, which has attracted a lot of research in recent years. To address this challenge, existing neural network-based methods generally aim to minimize the distribution differences using integral probability metrics. However, minimizing the distribution differences may inadvertently remove outcome-related information during the balancing procedure, which has a negative impact on the accuracy of TEE. In this paper, we propose a novel self-supervised learning approach to conduct TEE. Rather than minimizing the distribution differences, we first introduce the concept of virtual samples, which have identical covariates to the observed samples but different treatments. In this way, we aim to simulate the scenario where each sample receives both treatment and control. Next, we propose a self-supervised domain embedding learning (SDEL) approach to conduct TEE. In SDEL, we propose to learn both treated and control embeddings for observed and virtual samples, thereby learning the effects of different treatments. To the best of our knowledge, we are the first to introduce the concept of virtual samples and the first to conduct embedding learning in TEE. Building upon SDEL, we propose a feature extraction counterfactual regression network (FE-CFR), in which a feature extraction module (FEM) estimates the importance of different covariates. Compared with existing TEE methods, our proposed self-supervised learning approach could improve the accuracy of TEE. Extensive experiments have been conducted on benchmark datasets for TEE, and the results demonstrate that our proposed approach outperforms the compared baseline approaches.
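To make the ideas in the abstract concrete, the following is a minimal PyTorch sketch of how virtual samples and a two-headed counterfactual regression network with a feature-extraction module might be wired together. It is not the authors' implementation: the class names (FeatureExtractionModule, FECFRSketch), layer sizes, and the way a placeholder embedding term is combined with the factual outcome loss are illustrative assumptions; consult the paper for the actual FE-CFR architecture and SDEL objective.

```python
# Illustrative sketch only; names, sizes, and the loss composition are assumptions.
import torch
import torch.nn as nn


class FeatureExtractionModule(nn.Module):
    """Assigns a learned importance weight to each covariate (assumed design)."""

    def __init__(self, n_covariates: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_covariates))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softmax-normalised importance weights rescale each covariate.
        weights = torch.softmax(self.logits, dim=0)
        return x * weights


class FECFRSketch(nn.Module):
    """Two-headed counterfactual regression net with per-treatment embeddings."""

    def __init__(self, n_covariates: int, emb_dim: int = 32):
        super().__init__()
        self.fem = FeatureExtractionModule(n_covariates)
        # Separate embedding branches for the treated and control "domains".
        self.embed_t = nn.Sequential(nn.Linear(n_covariates, emb_dim), nn.ReLU())
        self.embed_c = nn.Sequential(nn.Linear(n_covariates, emb_dim), nn.ReLU())
        self.head_t = nn.Linear(emb_dim, 1)  # outcome under treatment
        self.head_c = nn.Linear(emb_dim, 1)  # outcome under control

    def forward(self, x: torch.Tensor):
        x = self.fem(x)
        zt, zc = self.embed_t(x), self.embed_c(x)
        return self.head_t(zt).squeeze(-1), self.head_c(zc).squeeze(-1), zt, zc


def training_step(model, x, t, y_factual):
    """One illustrative step: every observed sample also acts as a 'virtual sample'
    with the opposite treatment, so both heads are evaluated on every unit."""
    y_hat_t, y_hat_c, zt, zc = model(x)
    # Factual loss: compare the prediction of the head matching the observed treatment.
    y_hat_factual = torch.where(t.bool(), y_hat_t, y_hat_c)
    factual_loss = nn.functional.mse_loss(y_hat_factual, y_factual)
    # Placeholder term (assumption): keep the treated and control embeddings of the
    # same unit close; this merely stands in for the paper's SDEL objective.
    embedding_loss = nn.functional.mse_loss(zt, zc)
    return factual_loss + 0.1 * embedding_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    n, d = 128, 10
    x = torch.randn(n, d)
    t = torch.randint(0, 2, (n,)).float()         # observed treatment assignment
    y = x[:, 0] + 2.0 * t + 0.1 * torch.randn(n)  # synthetic outcomes
    model = FECFRSketch(d)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = training_step(model, x, t, y)
        loss.backward()
        opt.step()
    # Estimated individual treatment effects: difference between the two heads.
    with torch.no_grad():
        y_t, y_c, _, _ = model(x)
        print("mean estimated effect:", (y_t - y_c).mean().item())
```

The key point the sketch tries to capture is that, instead of forcing the treated and control covariate distributions to match, every unit is pushed through both treatment branches, mimicking a virtual counterpart that received the other treatment.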

Bibliographic Details

Published in: Applied Intelligence (Dordrecht, Netherlands), 2024-04, Vol. 54 (8), p. 6529-6542
Main authors: Zhu, Qianyang; Sun, Heyuan; Yang, Bo
Format: Article
Language: English
Publisher: Springer US (New York)
Subjects: Accuracy; Artificial Intelligence; Bias; Computer Science; Embedding; Feature extraction; Machine learning; Machines; Manufacturing; Mechanical Engineering; Neural networks; Processes; Self-supervised learning; Statistical analysis
DOI: 10.1007/s10489-024-05518-7
ISSN: 0924-669X
EISSN: 1573-7497
Online access: Full text