Mitigating selection bias in counterfactual prediction through self-supervised domain embedding learning with virtual samples
Treatment effect estimation (TEE) is widely adopted in various domains such as machine learning, advertising and marketing, and medicine. During TEE, there normally exists selection bias in counterfactual prediction, which results in different distributions of covariates between the treated and co...
Published in: | Applied intelligence (Dordrecht, Netherlands), 2024-04, Vol.54 (8), p.6529-6542 |
---|---|
Main authors: | Zhu, Qianyang; Sun, Heyuan; Yang, Bo |
Format: | Article |
Language: | English |
Online access: | Full text |
description | Treatment effect estimation (TEE) is widely adopted in various domains such as machine learning, advertising and marketing, and medicine. During TEE, there normally exists selection bias in counterfactual prediction, which results in different distributions of covariates between the treated and control groups. One important challenge in TEE is to mitigate the impact of selection bias, which has attracted a lot of research in recent years. To address this challenge, existing neural network-based methods generally aim to minimize the distribution differences using integral probability metrics. However, minimizing the distribution differences may inadvertently remove outcome-related information during the balancing procedure, which has a negative impact on the accuracy of TEE. In this paper, we propose a novel self-supervised learning approach to conduct TEE. Rather than minimizing the distribution differences, we first introduce the concept of virtual samples, which have identical covariates to observed samples but receive different treatments. In this way, we aim to simulate the scenario where each sample receives both the treatment and the control. Next, we propose a self-supervised domain embedding learning (SDEL) approach to conduct TEE. In SDEL, we learn both treated and control embeddings for observed and virtual samples, thereby learning the effects of different treatments. To the best of our knowledge, we are the first to introduce the concept of virtual samples and the first to conduct embedding learning in TEE. Building upon SDEL, we propose a feature extraction counterfactual regression network (FE-CFR), which includes a feature extraction module (FEM) to estimate the importance of different covariates. Compared with existing TEE methods, our proposed self-supervised learning approach could improve the accuracy of TEE. Extensive experiments have been conducted on benchmark datasets for TEE, and the results demonstrate that our proposed approach outperforms the compared baseline approaches. |
doi | 10.1007/s10489-024-05518-7 |
publisher | New York: Springer US |
orcid | https://orcid.org/0000-0003-0805-7928 |
issn | 0924-669X |
eissn | 1573-7497 |
source | SpringerLink Journals - AutoHoldings |
subjects | Accuracy; Artificial Intelligence; Bias; Computer Science; Embedding; Feature extraction; Machine learning; Machines; Manufacturing; Mechanical Engineering; Neural networks; Processes; Self-supervised learning; Statistical analysis |
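The description above outlines two ideas that lend themselves to a small illustration: constructing virtual samples by copying each observed sample's covariates and flipping its treatment indicator, and predicting outcomes with separate treated and control heads over a shared representation. The Python sketch below is a minimal, hypothetical illustration of just those two ideas; the names `make_virtual_samples` and `TwoHeadRegressor` are invented here, and the sketch does not reproduce the paper's SDEL embedding learning, FEM module, or FE-CFR architecture.

```python
# Hypothetical sketch (not the paper's FE-CFR code): build "virtual samples"
# by copying each observed sample's covariates and flipping its treatment,
# then score both factual and virtual rows with a generic two-head regressor.
import numpy as np
import torch
import torch.nn as nn


def make_virtual_samples(x, t):
    """Return covariates and flipped treatments for the virtual counterparts.

    x: (n, d) array of covariates, t: (n,) array of 0/1 treatments.
    Each virtual sample shares its covariates with an observed sample but
    receives the opposite treatment, so every unit appears under both arms.
    """
    return x.copy(), 1 - t


class TwoHeadRegressor(nn.Module):
    """Shared representation with separate treated/control outcome heads
    (a common CFR-style layout; the paper's SDEL and FEM parts are omitted)."""

    def __init__(self, d_in, d_rep=64):
        super().__init__()
        self.rep = nn.Sequential(nn.Linear(d_in, d_rep), nn.ReLU(),
                                 nn.Linear(d_rep, d_rep), nn.ReLU())
        self.head_control = nn.Linear(d_rep, 1)
        self.head_treated = nn.Linear(d_rep, 1)

    def forward(self, x, t):
        h = self.rep(x)
        y0 = self.head_control(h).squeeze(-1)
        y1 = self.head_treated(h).squeeze(-1)
        # Select the head matching each sample's (possibly virtual) treatment.
        return torch.where(t.bool(), y1, y0)


# Toy usage: factual predictions for observed samples, counterfactual
# predictions for their virtual counterparts.
x = np.random.randn(8, 10).astype(np.float32)
t = np.random.randint(0, 2, size=8)
x_virtual, t_virtual = make_virtual_samples(x, t)

model = TwoHeadRegressor(d_in=10)
y_factual = model(torch.from_numpy(x), torch.from_numpy(t))
y_counterfactual = model(torch.from_numpy(x_virtual), torch.from_numpy(t_virtual))
# Per-sample effect estimate: treated outcome minus control outcome.
ite_hat = torch.where(torch.from_numpy(t).bool(),
                      y_factual - y_counterfactual,
                      y_counterfactual - y_factual)
```

Pairing the factual prediction for each observed sample with the prediction for its virtual counterpart yields a per-sample effect estimate, which is the scenario the virtual samples are meant to simulate; the paper's actual training objective and embedding scheme are described in the full text.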