Robust quasi‐randomization‐based estimation with ensemble learning for missing data

Bibliographic details
Published in:	Scandinavian journal of statistics 2023-09, Vol.50 (3), p.1263-1278
Main authors:	Lee, Danhyang; Zhang, Li‐Chun; Chen, Sixia
Format:	Article
Language:	eng
Subjects:	cell mean model; Data analysis; Ensemble learning; Estimators; item nonresponse; Machine learning; missing at random; Missing data; Randomization; Rao–Blackwell method; Robustness; Statistical analysis; Statistical methods; variance estimation
Online access:	Full text

container_end_page	1278
container_issue	3
container_start_page	1263
container_title	Scandinavian journal of statistics
container_volume	50
creator	Lee, Danhyang; Zhang, Li‐Chun; Chen, Sixia
description	Missing data analysis requires assumptions about an outcome model or a response probability model to adjust for potential bias due to nonresponse. Doubly robust (DR) estimators are consistent if at least one of the two models is correctly specified. Multiply robust (MR) estimators extend DR estimators by allowing multiple candidate models for the outcome, the response probability, or both, and are consistent if at least one of these models is correctly specified. We propose a robust quasi‐randomization‐based model approach that offers more protection against model misspecification than existing DR and MR estimators and allows multiple semiparametric, nonparametric, or machine learning models to be used for the outcome variable. Given cell‐homogeneous response, the proposed estimator achieves unbiasedness through a subsampling Rao–Blackwell method, regardless of the working models used for the outcome. An unbiased variance estimation formula is proposed that does not rely on replication methods such as the jackknife or bootstrap. A simulation study shows that the proposed method outperforms existing multiply robust estimators.
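The claim that a DR estimator stays consistent when either working model is correct can be illustrated with the textbook augmented inverse-probability-weighted (AIPW) estimator of a mean. This is a minimal sketch and not the estimator proposed in the article. It assumes a hypothetical population (y = 2 + 3x + noise, with response probability 0.2 + 0.7x, so data are missing at random given x), and the names `simulate`, `fit_linear` and `dr_mean` are illustrative only. Each run pairs one correct and one deliberately wrong working model.

```python
import random

TRUE_MEAN = 3.5  # E[y] under the hypothetical model below: 2 + 3 * E[x], x ~ Uniform(0, 1)


def simulate(n, seed=1):
    """Hypothetical data: y = 2 + 3x + N(0, 1); response depends on x only (MAR)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)
        responded = rng.random() < 0.2 + 0.7 * x   # true response probability in [0.2, 0.9]
        data.append((x, y, responded))
    return data


def fit_linear(data):
    """OLS of y on x using respondents only; returns a prediction function."""
    pts = [(x, y) for x, y, r in data if r]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)
    return lambda x: my + slope * (x - mx)


def dr_mean(data, m, pi):
    """AIPW estimator: average of m(x_i) + delta_i * (y_i - m(x_i)) / pi(x_i) over all units."""
    total = 0.0
    for x, y, r in data:
        total += m(x)
        if r:                                   # y is used for respondents only
            total += (y - m(x)) / pi(x)
    return total / len(data)


data = simulate(20_000)
resp_y = [y for _, y, r in data if r]
naive = sum(resp_y) / len(resp_y)              # complete-case mean, biased since response depends on x
rate = len(resp_y) / len(data)

correct_m = fit_linear(data)
wrong_m = lambda x: naive                      # misspecified outcome model (ignores x)
correct_pi = lambda x: 0.2 + 0.7 * x
wrong_pi = lambda x: rate                      # misspecified response model (ignores x)

est_a = dr_mean(data, correct_m, wrong_pi)     # outcome model right, response model wrong
est_b = dr_mean(data, wrong_m, correct_pi)     # outcome model wrong, response model right
print(f"naive={naive:.3f}  DR(a)={est_a:.3f}  DR(b)={est_b:.3f}  true={TRUE_MEAN}")
```

In each pairing the correct model compensates for the wrong one: with a correct outcome model the weighted residual term averages out (here exactly to zero, because OLS residuals sum to zero), and with correct response probabilities the weighted residuals remove the bias of the misspecified outcome model. An MR estimator would instead combine several candidate models of each kind.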
doi_str_mv	10.1111/sjos.12626
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2851736172</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2851736172</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2966-7f2757e4131f8ab28c776a63ac4ee77433ccfeea37a2eb3aa7437b0a3f6537ae3</originalsourceid><addsrcrecordid>eNp9kEtOwzAQhi0EEqWw4QSR2CGl-JHayRJVUECVKlEQS2uSTsBVErd2oqqsOAJn5CS4DWu8Gc-vb14_IZeMjlh4N35l_YhxyeURGbBEqjhLZHZMBlRQEcs0S0_JmfcrSplMWDogb88273wbbTrw5ufr20GztLX5hNbYJuQ5eFxG6FtTH6Roa9qPCBuPdV5hVCG4xjTvUWldVBvv9_8ltHBOTkqoPF78xSF5vb97mTzEs_n0cXI7iwueSRmrkquxwoQJVqaQ87RQSoIUUCSISiVCFEWJCEIBx1wABEnlFEQpx0FDMSRXfd-1s5su7KlXtnNNGKl5OmZKSKZ4oK57qnDWe4elXrtwkNtpRvXeOL03Th-MCzDr4a2pcPcPqRdP80Vf8wvPqXRT</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2851736172</pqid></control><display><type>article</type><title>Robust quasi‐randomization‐based estimation with ensemble learning for missing data</title><source>Wiley Online Library Journals Frontfile Complete</source><source>Business Source Complete</source><creator>Lee, Danhyang ; Zhang, Li‐Chun ; Chen, Sixia</creator><creatorcontrib>Lee, Danhyang ; Zhang, Li‐Chun ; Chen, Sixia</creatorcontrib><description>Missing data analysis requires assumptions about an outcome model or a response probability model to adjust for potential bias due to nonresponse. Doubly robust (DR) estimators are consistent if at least one of the models is correctly specified. Multiply robust (MR) estimators extend DR estimators by allowing for multiple models for both the outcome and/or response probability models and are consistent if at least one of the multiple models is correctly specified. 
We propose a robust quasi‐randomization‐based model approach to bring more protection against model misspecification than the existing DR and MR estimators, where any multiple semiparametric, nonparametric or machine learning models can be used for the outcome variable. The proposed estimator achieves unbiasedness by using a subsampling Rao–Blackwell method, given cell‐homogenous response, regardless of any working models for the outcome. An unbiased variance estimation formula is proposed, which does not use any replicate jackknife or bootstrap methods. A simulation study shows that our proposed method outperforms the existing multiply robust estimators.</description><identifier>ISSN: 0303-6898</identifier><identifier>EISSN: 1467-9469</identifier><identifier>DOI: 10.1111/sjos.12626</identifier><language>eng</language><publisher>Oxford: Blackwell Publishing Ltd</publisher><subject>cell mean model ; Data analysis ; Ensemble learning ; Estimators ; item nonresponse ; Machine learning ; missing at random ; Missing data ; Randomization ; Rao–Blackwell method ; Robustness ; Statistical analysis ; Statistical methods ; variance estimation</subject><ispartof>Scandinavian journal of statistics, 2023-09, Vol.50 (3), p.1263-1278</ispartof><rights>2022 Board of the Foundation of the Scandinavian Journal of Statistics.</rights><rights>2023 Board of the Foundation of the Scandinavian Journal of Statistics</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c2966-7f2757e4131f8ab28c776a63ac4ee77433ccfeea37a2eb3aa7437b0a3f6537ae3</cites><orcidid>0000-0003-2550-8460 ; 
0000-0001-5082-281X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2Fsjos.12626$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2Fsjos.12626$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,777,781,1412,27905,27906,45555,45556</link.rule.ids></links><search><creatorcontrib>Lee, Danhyang</creatorcontrib><creatorcontrib>Zhang, Li‐Chun</creatorcontrib><creatorcontrib>Chen, Sixia</creatorcontrib><title>Robust quasi‐randomization‐based estimation with ensemble learning for missing data</title><title>Scandinavian journal of statistics</title><description>Missing data analysis requires assumptions about an outcome model or a response probability model to adjust for potential bias due to nonresponse. Doubly robust (DR) estimators are consistent if at least one of the models is correctly specified. Multiply robust (MR) estimators extend DR estimators by allowing for multiple models for both the outcome and/or response probability models and are consistent if at least one of the multiple models is correctly specified. We propose a robust quasi‐randomization‐based model approach to bring more protection against model misspecification than the existing DR and MR estimators, where any multiple semiparametric, nonparametric or machine learning models can be used for the outcome variable. The proposed estimator achieves unbiasedness by using a subsampling Rao–Blackwell method, given cell‐homogenous response, regardless of any working models for the outcome. An unbiased variance estimation formula is proposed, which does not use any replicate jackknife or bootstrap methods. 
A simulation study shows that our proposed method outperforms the existing multiply robust estimators.</description><subject>cell mean model</subject><subject>Data analysis</subject><subject>Ensemble learning</subject><subject>Estimators</subject><subject>item nonresponse</subject><subject>Machine learning</subject><subject>missing at random</subject><subject>Missing data</subject><subject>Randomization</subject><subject>Rao–Blackwell method</subject><subject>Robustness</subject><subject>Statistical analysis</subject><subject>Statistical methods</subject><subject>variance estimation</subject><issn>0303-6898</issn><issn>1467-9469</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kEtOwzAQhi0EEqWw4QSR2CGl-JHayRJVUECVKlEQS2uSTsBVErd2oqqsOAJn5CS4DWu8Gc-vb14_IZeMjlh4N35l_YhxyeURGbBEqjhLZHZMBlRQEcs0S0_JmfcrSplMWDogb88273wbbTrw5ufr20GztLX5hNbYJuQ5eFxG6FtTH6Roa9qPCBuPdV5hVCG4xjTvUWldVBvv9_8ltHBOTkqoPF78xSF5vb97mTzEs_n0cXI7iwueSRmrkquxwoQJVqaQ87RQSoIUUCSISiVCFEWJCEIBx1wABEnlFEQpx0FDMSRXfd-1s5su7KlXtnNNGKl5OmZKSKZ4oK57qnDWe4elXrtwkNtpRvXeOL03Th-MCzDr4a2pcPcPqRdP80Vf8wvPqXRT</recordid><startdate>202309</startdate><enddate>202309</enddate><creator>Lee, Danhyang</creator><creator>Zhang, Li‐Chun</creator><creator>Chen, Sixia</creator><general>Blackwell Publishing Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>H8D</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-2550-8460</orcidid><orcidid>https://orcid.org/0000-0001-5082-281X</orcidid></search><sort><creationdate>202309</creationdate><title>Robust quasi‐randomization‐based estimation with ensemble learning for missing data</title><author>Lee, Danhyang ; Zhang, Li‐Chun ; Chen, 
Sixia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2966-7f2757e4131f8ab28c776a63ac4ee77433ccfeea37a2eb3aa7437b0a3f6537ae3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>cell mean model</topic><topic>Data analysis</topic><topic>Ensemble learning</topic><topic>Estimators</topic><topic>item nonresponse</topic><topic>Machine learning</topic><topic>missing at random</topic><topic>Missing data</topic><topic>Randomization</topic><topic>Rao–Blackwell method</topic><topic>Robustness</topic><topic>Statistical analysis</topic><topic>Statistical methods</topic><topic>variance estimation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Danhyang</creatorcontrib><creatorcontrib>Zhang, Li‐Chun</creatorcontrib><creatorcontrib>Chen, Sixia</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Aerospace Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Scandinavian journal of statistics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lee, Danhyang</au><au>Zhang, Li‐Chun</au><au>Chen, Sixia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Robust quasi‐randomization‐based estimation with ensemble learning for missing data</atitle><jtitle>Scandinavian journal of 
statistics</jtitle><date>2023-09</date><risdate>2023</risdate><volume>50</volume><issue>3</issue><spage>1263</spage><epage>1278</epage><pages>1263-1278</pages><issn>0303-6898</issn><eissn>1467-9469</eissn><abstract>Missing data analysis requires assumptions about an outcome model or a response probability model to adjust for potential bias due to nonresponse. Doubly robust (DR) estimators are consistent if at least one of the models is correctly specified. Multiply robust (MR) estimators extend DR estimators by allowing for multiple models for both the outcome and/or response probability models and are consistent if at least one of the multiple models is correctly specified. We propose a robust quasi‐randomization‐based model approach to bring more protection against model misspecification than the existing DR and MR estimators, where any multiple semiparametric, nonparametric or machine learning models can be used for the outcome variable. The proposed estimator achieves unbiasedness by using a subsampling Rao–Blackwell method, given cell‐homogenous response, regardless of any working models for the outcome. An unbiased variance estimation formula is proposed, which does not use any replicate jackknife or bootstrap methods. A simulation study shows that our proposed method outperforms the existing multiply robust estimators.</abstract><cop>Oxford</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1111/sjos.12626</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0003-2550-8460</orcidid><orcidid>https://orcid.org/0000-0001-5082-281X</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0303-6898
ispartof	Scandinavian journal of statistics, 2023-09, Vol.50 (3), p.1263-1278
issn	0303-6898 1467-9469
language	eng
recordid	cdi_proquest_journals_2851736172
source	Wiley Online Library Journals Frontfile Complete; Business Source Complete
subjects	cell mean model; Data analysis; Ensemble learning; Estimators; item nonresponse; Machine learning; missing at random; Missing data; Randomization; Rao–Blackwell method; Robustness; Statistical analysis; Statistical methods; variance estimation
title	Robust quasi‐randomization‐based estimation with ensemble learning for missing data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T15%3A07%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20quasi%E2%80%90randomization%E2%80%90based%20estimation%20with%20ensemble%20learning%20for%20missing%20data&rft.jtitle=Scandinavian%20journal%20of%20statistics&rft.au=Lee,%20Danhyang&rft.date=2023-09&rft.volume=50&rft.issue=3&rft.spage=1263&rft.epage=1278&rft.pages=1263-1278&rft.issn=0303-6898&rft.eissn=1467-9469&rft_id=info:doi/10.1111/sjos.12626&rft_dat=%3Cproquest_cross%3E2851736172%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2851736172&rft_id=info:pmid/&rfr_iscdi=true
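The description attributes the proposed estimator's unbiasedness to a subsampling Rao–Blackwell method under cell‐homogeneous response. The following sketch illustrates that general idea for a single cell under stated assumptions; it is not the authors' estimator and does not reproduce their variance formula. It assumes that every unit in the cell responds with the same probability, so that, given the number of respondents r, the respondents form a simple random sample of the cell's n units. Each respondent j is held out in turn, an arbitrary working model f is fitted to the other r − 1 respondents (the training set D), and the estimate for that split is (1/n)[sum of y over D + sum of f(x) over the n − r + 1 units outside D + (n − r + 1)(y_j − f(x_j))]. Averaging over all r splits is the Rao–Blackwell step. The function names and the toy cell are hypothetical.

```python
from itertools import combinations


def srb_cell_mean(x, y_obs, respondents, fit):
    """Leave-one-out subsampling Rao-Blackwell estimate of the mean of y over one cell.

    x           -- covariate for each of the n units in the cell
    y_obs       -- dict {unit index: y} for respondents only
    respondents -- indices of responding units (needs r >= 2)
    fit         -- any working model: takes [(x, y), ...] and returns a predictor
    """
    n, r = len(x), len(respondents)
    split_estimates = []
    for j in respondents:                              # hold out one respondent at a time
        train = [i for i in respondents if i != j]
        f = fit([(x[i], y_obs[i]) for i in train])
        in_train = set(train)
        observed = sum(y_obs[i] for i in train)        # training units contribute their own y
        predicted = sum(f(x[i]) for i in range(n) if i not in in_train)
        # Given the training set, j is a uniform draw from the n - r + 1 non-training
        # units, so this expanded residual is unbiased for their total prediction error.
        correction = (n - r + 1) * (y_obs[j] - f(x[j]))
        split_estimates.append((observed + predicted + correction) / n)
    return sum(split_estimates) / r                    # Rao-Blackwell step: average over splits


def fit_mean(train):
    m = sum(y for _, y in train) / len(train)
    return lambda _x: m


def fit_nearest(train):
    return lambda x0: min(train, key=lambda p: abs(p[0] - x0))[1]


def fit_zero(train):
    return lambda _x: 0.0                              # a deliberately useless working model


# Hypothetical cell with n = 6 units; under cell-homogeneous response every
# 3-subset of units is an equally likely respondent set.
x = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0]
y = [1.0, 2.5, 2.0, 4.2, 5.1, 7.7]
true_mean = sum(y) / len(y)

averages = {}
for fit in (fit_mean, fit_nearest, fit_zero):
    ests = [srb_cell_mean(x, {i: y[i] for i in R}, list(R), fit)
            for R in combinations(range(len(x)), 3)]
    averages[fit.__name__] = sum(ests) / len(ests)
print(true_mean, averages)
```

Enumerating all C(6, 3) = 20 equally likely respondent sets checks exact unbiasedness: for every working model, including one that always predicts zero, the average estimate equals the true cell mean up to floating-point rounding. The choice of working model affects only the variance, not the expectation.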