Machine Learning for Experimental Design: Methods for Improved Blocking

Restricting randomization in the design of experiments (e.g., using blocking/stratification, pair-wise matching, or rerandomization) can improve the treatment-control balance on important covariates and therefore improve the estimation of the treatment effect, particularly for small- and medium-sized experiments. Existing guidance on how to identify these variables and implement the restrictions is incomplete and conflicting. We identify that differences are mainly due to the fact that what is important in the pre-treatment data may not translate to the post-treatment data. We highlight settings where there is sufficient data to provide clear guidance and outline improved methods to mostly automate the process using modern machine learning (ML) techniques. We show in simulations using real-world data that these methods reduce both the mean squared error of the estimate (14%-34%) and the size of the standard error (6%-16%).

Bibliographic details
Main authors: Quistorff, Brian; Johnson, Gentry
Format: Article
Language: English
Published: 2020-10-29
Source: arXiv.org
DOI: 10.48550/arxiv.2010.15966
Online access: https://arxiv.org/abs/2010.15966
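
As a rough illustration of the approach summarized in the abstract (not the authors' exact procedure), the sketch below pairs units on an ML-predicted outcome and then randomizes treatment within each pair. The data, the column names (x0-x4, y_hist), and the choice of GradientBoostingRegressor as the prognostic model are all assumptions made purely for this example.

```python
# Hypothetical sketch of ML-assisted paired randomization; all names and data
# are made up for illustration and do not reproduce the paper's procedure.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
features = [f"x{i}" for i in range(5)]

# Fake pre-treatment data: 100 units, 5 covariates, and a historical outcome.
df = pd.DataFrame(rng.normal(size=(100, 5)), columns=features)
df["y_hist"] = 2.0 * df["x0"] + df["x1"] + rng.normal(scale=0.5, size=100)

# 1) Learn which covariates matter by predicting the (historical) outcome.
model = GradientBoostingRegressor(random_state=0).fit(df[features], df["y_hist"])

# 2) Score each unit with its predicted outcome (a prognostic score).
df["score"] = model.predict(df[features])

# 3) Pair-wise matching: sort by the score and pair adjacent units, then
#    randomize treatment within each pair (one treated, one control).
df = df.sort_values("score").reset_index(drop=True)
df["pair"] = df.index // 2
df["treated"] = 0
for _, pair in df.groupby("pair"):
    df.loc[rng.choice(pair.index), "treated"] = 1

# Balance check: treated and control groups should have similar mean scores.
print(df.groupby("treated")["score"].mean())
```

In the paper the covariate and model selection step is itself largely automated; fixing a gradient-boosting prognostic model here is only for concreteness.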