Counterfactual Data Augmentation with Contrastive Learning

Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of ind...


Bibliographic details
Main authors:	Aloui, Ahmed; Dong, Juncheng; Le, Cat P; Tarokh, Vahid
Format:	Article
Language:	eng
Subjects:	Computer Science - Learning; Statistics - Machine Learning; Statistics - Methodology
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Aloui, Ahmed ; Dong, Juncheng ; Le, Cat P ; Tarokh, Vahid
description	Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we utilize contrastive learning to learn a representation space and a similarity measure such that in the learned representation space close individuals identified by the learned similarity measure have similar potential outcomes. This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group. By augmenting the original dataset with these reliable imputations, we can effectively reduce the discrepancy between different treatment groups, while inducing minimal imputation error. The augmented dataset is subsequently employed to train CATE estimation models. Theoretical analysis and experimental studies on synthetic and semi-synthetic benchmarks demonstrate that our method achieves significant improvements in both performance and robustness to overfitting across state-of-the-art models.
doi_str_mv	10.48550/arxiv.2311.03630
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2311_03630</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2311_03630</sourcerecordid><originalsourceid>FETCH-LOGICAL-a670-375909893e44e87f9cc99a2d7c2f870204ad57e32036f56233d9efaa98a55fbb3</originalsourceid><addsrcrecordid>eNotj7FuwjAUAL10QMAHMNU_kNTxi2ObDaWlVIrEwh49kmdqCZzKOBT-vi10uu10x9iiEHlplBIvGK_-kksoilxABWLClvUwhkTRYZdGPPJXTMhX4-FEIWHyQ-DfPn3yeggp4jn5C_GGMAYfDjP25PB4pvk_p2y3ftvVm6zZvn_UqybDSosMtLLCGgtUlmS0s11nLcped9IZLaQosVeaQP4GOVVJgN6SQ7QGlXL7PUzZ80N7j2-_oj9hvLV_E-19An4AVJlBrQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Counterfactual Data Augmentation with Contrastive Learning</title><source>arXiv.org</source><creator>Aloui, Ahmed ; Dong, Juncheng ; Le, Cat P ; Tarokh, Vahid</creator><creatorcontrib>Aloui, Ahmed ; Dong, Juncheng ; Le, Cat P ; Tarokh, Vahid</creatorcontrib><description>Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we utilize contrastive learning to learn a representation space and a similarity measure such that in the learned representation space close individuals identified by the learned similarity measure have similar potential outcomes. This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group. By augmenting the original dataset with these reliable imputations, we can effectively reduce the discrepancy between different treatment groups, while inducing minimal imputation error. 
The augmented dataset is subsequently employed to train CATE estimation models. Theoretical analysis and experimental studies on synthetic and semi-synthetic benchmarks demonstrate that our method achieves significant improvements in both performance and robustness to overfitting across state-of-the-art models.</description><identifier>DOI: 10.48550/arxiv.2311.03630</identifier><language>eng</language><subject>Computer Science - Learning ; Statistics - Machine Learning ; Statistics - Methodology</subject><creationdate>2023-11</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2311.03630$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2311.03630$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Aloui, Ahmed</creatorcontrib><creatorcontrib>Dong, Juncheng</creatorcontrib><creatorcontrib>Le, Cat P</creatorcontrib><creatorcontrib>Tarokh, Vahid</creatorcontrib><title>Counterfactual Data Augmentation with Contrastive Learning</title><description>Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we utilize contrastive learning to learn a representation space and a similarity measure such that in the learned representation space close individuals identified by the learned similarity measure have similar potential outcomes. 
This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group. By augmenting the original dataset with these reliable imputations, we can effectively reduce the discrepancy between different treatment groups, while inducing minimal imputation error. The augmented dataset is subsequently employed to train CATE estimation models. Theoretical analysis and experimental studies on synthetic and semi-synthetic benchmarks demonstrate that our method achieves significant improvements in both performance and robustness to overfitting across state-of-the-art models.</description><subject>Computer Science - Learning</subject><subject>Statistics - Machine Learning</subject><subject>Statistics - Methodology</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj7FuwjAUAL10QMAHMNU_kNTxi2ObDaWlVIrEwh49kmdqCZzKOBT-vi10uu10x9iiEHlplBIvGK_-kksoilxABWLClvUwhkTRYZdGPPJXTMhX4-FEIWHyQ-DfPn3yeggp4jn5C_GGMAYfDjP25PB4pvk_p2y3ftvVm6zZvn_UqybDSosMtLLCGgtUlmS0s11nLcped9IZLaQosVeaQP4GOVVJgN6SQ7QGlXL7PUzZ80N7j2-_oj9hvLV_E-19An4AVJlBrQ</recordid><startdate>20231106</startdate><enddate>20231106</enddate><creator>Aloui, Ahmed</creator><creator>Dong, Juncheng</creator><creator>Le, Cat P</creator><creator>Tarokh, Vahid</creator><scope>AKY</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20231106</creationdate><title>Counterfactual Data Augmentation with Contrastive Learning</title><author>Aloui, Ahmed ; Dong, Juncheng ; Le, Cat P ; Tarokh, Vahid</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a670-375909893e44e87f9cc99a2d7c2f870204ad57e32036f56233d9efaa98a55fbb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Learning</topic><topic>Statistics - 
Machine Learning</topic><topic>Statistics - Methodology</topic><toplevel>online_resources</toplevel><creatorcontrib>Aloui, Ahmed</creatorcontrib><creatorcontrib>Dong, Juncheng</creatorcontrib><creatorcontrib>Le, Cat P</creatorcontrib><creatorcontrib>Tarokh, Vahid</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Aloui, Ahmed</au><au>Dong, Juncheng</au><au>Le, Cat P</au><au>Tarokh, Vahid</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Counterfactual Data Augmentation with Contrastive Learning</atitle><date>2023-11-06</date><risdate>2023</risdate><abstract>Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we utilize contrastive learning to learn a representation space and a similarity measure such that in the learned representation space close individuals identified by the learned similarity measure have similar potential outcomes. This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group. By augmenting the original dataset with these reliable imputations, we can effectively reduce the discrepancy between different treatment groups, while inducing minimal imputation error. The augmented dataset is subsequently employed to train CATE estimation models. 
Theoretical analysis and experimental studies on synthetic and semi-synthetic benchmarks demonstrate that our method achieves significant improvements in both performance and robustness to overfitting across state-of-the-art models.</abstract><doi>10.48550/arxiv.2311.03630</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2311.03630
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2311_03630
source	arXiv.org
subjects	Computer Science - Learning ; Statistics - Machine Learning ; Statistics - Methodology
title	Counterfactual Data Augmentation with Contrastive Learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T20%3A20%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Counterfactual%20Data%20Augmentation%20with%20Contrastive%20Learning&rft.au=Aloui,%20Ahmed&rft.date=2023-11-06&rft_id=info:doi/10.48550/arxiv.2311.03630&rft_dat=%3Carxiv_GOX%3E2311_03630%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
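
Illustrative sketch (not part of the catalog record): the description field says that counterfactual outcomes are imputed only for individuals who have close neighbors in the alternative treatment group, and that these imputations are then added to the training data. The Python below is a minimal sketch of that selection-and-imputation step under several assumptions that are not taken from the paper: the representations `z` are assumed to come from an already trained encoder, plain Euclidean distance stands in for the learned similarity measure, and the function name `augment_with_counterfactuals`, the `radius` threshold, the `min_neighbors` parameter, and the use of a neighbor mean are all hypothetical choices.

```python
import math


def augment_with_counterfactuals(z, t, y, radius=0.5, min_neighbors=1):
    """Impute counterfactual outcomes for units with close opposite-group neighbors.

    z: list of representation vectors (assumed output of a trained encoder)
    t: list of binary treatment indicators (0 or 1)
    y: list of observed outcomes
    Returns a list of (unit_index, flipped_treatment, imputed_outcome) tuples.
    """
    augmented = []
    for i in range(len(t)):
        # Outcomes of units in the alternative treatment group that lie within
        # `radius` of unit i (Euclidean distance is an assumed stand-in for
        # the learned similarity measure).
        neighbor_outcomes = [
            y[j]
            for j in range(len(t))
            if t[j] != t[i] and math.dist(z[i], z[j]) <= radius
        ]
        # Only units with enough close neighbors are selected for imputation.
        if len(neighbor_outcomes) >= min_neighbors:
            y_hat = sum(neighbor_outcomes) / len(neighbor_outcomes)
            augmented.append((i, 1 - t[i], y_hat))
    return augmented


# Toy data: units 0 and 1 are close and in different groups; unit 2 is isolated.
z = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
t = [0, 1, 0]
y = [1.0, 3.0, 2.0]
augmented = augment_with_counterfactuals(z, t, y, radius=0.5)
```

In this toy run, units 0 and 1 each receive an imputed counterfactual from the other, while unit 2 is skipped because it has no neighbor in the alternative group within the radius. The augmented tuples would then be appended to the original data before fitting a CATE estimator of one's choice.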