Generalizing Trimming Bounds for Endogenously Missing Outcome Data Using Random Forests

In many experimental or quasi-experimental studies, outcomes of interest are only observed for subjects who select (or are selected) to engage in the activity generating the outcome. Outcome data is thus endogenously missing for units who do not engage, in which case random or conditionally random t...

Bibliographic details
Main authors: Samii, Cyrus; Wang, Ye; Zhou, Junlong Aaron
Format: Article
Language: eng
creator Samii, Cyrus
Wang, Ye
Zhou, Junlong Aaron
description In many experimental or quasi-experimental studies, outcomes of interest are only observed for subjects who select (or are selected) to engage in the activity generating the outcome. Outcome data is thus endogenously missing for units who do not engage, in which case random or conditionally random treatment assignment prior to such choices is insufficient to point identify treatment effects. Non-parametric partial identification bounds are a way to address endogenous missingness without having to make disputable parametric assumptions. Basic bounding approaches often yield bounds that are very wide and therefore minimally informative. We present methods for narrowing non-parametric bounds on treatment effects by adjusting for potentially large numbers of covariates, working with generalized random forests. Our approach allows for agnosticism about the data-generating process and honest inference. We use a simulation study and two replication exercises to demonstrate the benefits of our approach.
doi_str_mv 10.48550/arxiv.2309.08985
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2309.08985
language eng
recordid cdi_arxiv_primary_2309_08985
source arXiv.org
subjects Statistics - Applications
Statistics - Methodology
title Generalizing Trimming Bounds for Endogenously Missing Outcome Data Using Random Forests
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T16%3A19%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generalizing%20Trimming%20Bounds%20for%20Endogenously%20Missing%20Outcome%20Data%20Using%20Random%20Forests&rft.au=Samii,%20Cyrus&rft.date=2023-09-16&rft_id=info:doi/10.48550/arxiv.2309.08985&rft_dat=%3Carxiv_GOX%3E2309_08985%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true