Mitigating selection bias in counterfactual prediction through self-supervised domain embedding learning with virtual samples

Treatment effect estimation (TEE) is widely adopted in various domains such as machine learning, advertising and marketing, and medicine. In TEE, selection bias normally exists in counterfactual prediction, which results in different distributions of covariates between the treated and control groups. One important challenge in TEE is to mitigate the impact of selection bias, which has attracted a lot of research in recent years. To address this challenge, existing neural network-based methods generally aim to minimize the distribution differences using integral probability metrics. However, minimizing the distribution differences may inadvertently remove outcome-related information during the balancing procedure, which has a negative impact on the accuracy of TEE. In this paper, we propose a novel self-supervised learning approach to conduct TEE. Rather than minimizing the distribution differences, we first introduce the concept of virtual samples, which have identical covariates to the observed samples but different treatments. In this way, we aim to simulate the scenario where each sample receives both treatment and control. Next, we propose a self-supervised domain embedding learning (SDEL) approach to conduct TEE. In SDEL, we propose to learn both treated and control embeddings for observed and virtual samples, thereby learning the effects of different treatments. To the best of our knowledge, we are the first to introduce the concept of virtual samples and the first to conduct embedding learning in TEE. Building upon SDEL, we propose a feature extraction counterfactual regression network (FE-CFR), in which a feature extraction module (FEM) estimates the importance of different covariates. Compared with existing TEE methods, our proposed self-supervised learning approach could improve the accuracy of TEE. Extensive experiments have been conducted on benchmark datasets for TEE, and the results demonstrate that our proposed approach outperforms the compared baseline approaches.
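To make the ideas in the abstract concrete, the following is a minimal PyTorch sketch of how virtual samples and a two-headed counterfactual regression network with a feature-extraction module might be wired together. It is not the authors' implementation: the class names (FeatureExtractionModule, FECFRSketch), layer sizes, and the way a placeholder embedding term is combined with the factual outcome loss are illustrative assumptions; consult the paper for the actual FE-CFR architecture and SDEL objective.

```python
# Illustrative sketch only; names, sizes, and the loss composition are assumptions.
import torch
import torch.nn as nn


class FeatureExtractionModule(nn.Module):
    """Assigns a learned importance weight to each covariate (assumed design)."""

    def __init__(self, n_covariates: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_covariates))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softmax-normalised importance weights rescale each covariate.
        weights = torch.softmax(self.logits, dim=0)
        return x * weights


class FECFRSketch(nn.Module):
    """Two-headed counterfactual regression net with per-treatment embeddings."""

    def __init__(self, n_covariates: int, emb_dim: int = 32):
        super().__init__()
        self.fem = FeatureExtractionModule(n_covariates)
        # Separate embedding branches for the treated and control "domains".
        self.embed_t = nn.Sequential(nn.Linear(n_covariates, emb_dim), nn.ReLU())
        self.embed_c = nn.Sequential(nn.Linear(n_covariates, emb_dim), nn.ReLU())
        self.head_t = nn.Linear(emb_dim, 1)  # outcome under treatment
        self.head_c = nn.Linear(emb_dim, 1)  # outcome under control

    def forward(self, x: torch.Tensor):
        x = self.fem(x)
        zt, zc = self.embed_t(x), self.embed_c(x)
        return self.head_t(zt).squeeze(-1), self.head_c(zc).squeeze(-1), zt, zc


def training_step(model, x, t, y_factual):
    """One illustrative step: every observed sample also acts as a 'virtual sample'
    with the opposite treatment, so both heads are evaluated on every unit."""
    y_hat_t, y_hat_c, zt, zc = model(x)
    # Factual loss: compare the prediction of the head matching the observed treatment.
    y_hat_factual = torch.where(t.bool(), y_hat_t, y_hat_c)
    factual_loss = nn.functional.mse_loss(y_hat_factual, y_factual)
    # Placeholder term (assumption): keep the treated and control embeddings of the
    # same unit close; this merely stands in for the paper's SDEL objective.
    embedding_loss = nn.functional.mse_loss(zt, zc)
    return factual_loss + 0.1 * embedding_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    n, d = 128, 10
    x = torch.randn(n, d)
    t = torch.randint(0, 2, (n,)).float()         # observed treatment assignment
    y = x[:, 0] + 2.0 * t + 0.1 * torch.randn(n)  # synthetic outcomes
    model = FECFRSketch(d)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = training_step(model, x, t, y)
        loss.backward()
        opt.step()
    # Estimated individual treatment effects: difference between the two heads.
    with torch.no_grad():
        y_t, y_c, _, _ = model(x)
        print("mean estimated effect:", (y_t - y_c).mean().item())
```

The key point the sketch tries to capture is that, instead of forcing the treated and control covariate distributions to match, every unit is pushed through both treatment branches, mimicking a virtual counterpart that received the other treatment.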

Bibliographic Details

Published in: Applied Intelligence (Dordrecht, Netherlands), 2024-04, Vol. 54 (8), p. 6529-6542
Main authors: Zhu, Qianyang; Sun, Heyuan; Yang, Bo
Format: Article
Language: English
Publisher: Springer US (New York)
Subjects: Accuracy; Artificial Intelligence; Bias; Computer Science; Embedding; Feature extraction; Machine learning; Machines; Manufacturing; Mechanical Engineering; Neural networks; Processes; Self-supervised learning; Statistical analysis
DOI: 10.1007/s10489-024-05518-7
ISSN: 0924-669X
EISSN: 1573-7497
Online access: Full text