Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap

Bibliographic details
Published in:	Journal of Survey Statistics and Methodology, 2016-06, Vol. 4 (2), pp. 139-170
Main authors:	Zhou, Hanzhi; Elliott, Michael R; Raghunathan, Trivellore E
Format:	Article
Language:	English
Online access:	Full text

container_end_page	170
container_issue	2
container_start_page	139
container_title	Journal of survey statistics and methodology
container_volume	4
creator	Zhou, Hanzhi; Elliott, Michael R; Raghunathan, Trivellore E
description	Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in "Delta-V," a key crash severity measure.
doi_str_mv	10.1093/jssam/smv031
format	Article
fulltext	fulltext
identifier	ISSN: 2325-0984
ispartof	Journal of survey statistics and methodology, 2016-06, Vol.4 (2), p.139-170
issn	2325-0984 2325-0992
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5719896
source	Oxford University Press Journals All Titles (1996-Current)
title	Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T20%3A42%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multiple%20Imputation%20in%20Two-Stage%20Cluster%20Samples%20Using%20The%20Weighted%20Finite%20Population%20Bayesian%20Bootstrap&rft.jtitle=Journal%20of%20survey%20statistics%20and%20methodology&rft.au=Zhou,%20Hanzhi&rft.date=2016-06-01&rft.volume=4&rft.issue=2&rft.spage=139&rft.epage=170&rft.pages=139-170&rft.issn=2325-0984&rft.eissn=2325-0992&rft_id=info:doi/10.1093/jssam/smv031&rft_dat=%3Cproquest_pubme%3E1975596092%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1975596092&rft_id=info:pmid/29226161&rfr_iscdi=true
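The abstract describes extending the weighted finite population Bayesian bootstrap (WFPBB) so that a complex two-stage sample can be expanded into synthetic populations, which are then imputed as if they were simple random samples. The Python sketch below illustrates this idea under stated assumptions and is not the authors' implementation. `weighted_polya_urn` implements a weighted Pólya urn. After the case weights w_i are rescaled to sum to the population size N, each additional draw selects sampled unit i with probability proportional to w_i - 1 + l_i (N - n)/n, where l_i counts how often unit i has already been drawn. `two_stage_synthetic_population` is a hypothetical extension for the case where both first- and second-stage weights are known. It first expands the sampled clusters to M synthetic clusters, then expands each cluster copy using its conditional element weights, estimating the cluster size N_c as the rounded sum of those weights. The function names, the cluster-size estimate, and the use of an independent urn for each cluster copy are all assumptions. The paper's actual procedure, including its variant that uses only the final weight, may differ.

```python
import numpy as np


def weighted_polya_urn(weights, N, rng):
    """Expand a weighted sample of n units into a synthetic population of N units.

    Returns an integer array of length N whose entries index the sampled units;
    the first n entries are the sample itself, the rest are urn draws.
    """
    w = np.asarray(weights, dtype=float)
    n = w.size
    if N < n:
        raise ValueError("population size N must be at least the sample size n")
    if N == n:
        return np.arange(n)
    w = w * (N / w.sum())            # rescale so the weights sum to N
    step = (N - n) / n               # reinforcement added per previous draw
    counts = np.zeros(n)             # l_i: how often unit i has been drawn so far
    draws = np.empty(N - n, dtype=int)
    for k in range(N - n):
        # Propensity w_i - 1 + l_i * (N - n) / n. Negative w_i - 1 values
        # (weights below 1 after rescaling) are clipped to 0 as a guard.
        p = np.clip(w - 1.0, 0.0, None) + counts * step
        i = rng.choice(n, p=p / p.sum())
        draws[k] = i
        counts[i] += 1
    return np.concatenate([np.arange(n), draws])


def two_stage_synthetic_population(cluster_weights, element_weights, M, rng):
    """Hypothetical two-stage sketch (not the paper's exact algorithm).

    cluster_weights: first-stage weights 1/pi_c for the sampled clusters.
    element_weights: element_weights[c] holds the conditional second-stage
        weights 1/pi_{j|c} of the elements sampled in cluster c.
    M: number of clusters in the population.
    Returns one (c, idx) pair per synthetic cluster, where c is the sampled
    cluster being copied and idx indexes that cluster's sampled elements.
    """
    synthetic_clusters = weighted_polya_urn(cluster_weights, M, rng)
    population = []
    for c in synthetic_clusters:
        w_c = np.asarray(element_weights[c], dtype=float)
        # Assumption: estimate the cluster size N_c by the sum of its weights.
        N_c = max(w_c.size, int(round(w_c.sum())))
        # Each copy of a cluster gets its own independent urn.
        population.append((int(c), weighted_polya_urn(w_c, N_c, rng)))
    return population
```

For example, with weights `[1.0, 3.0]` and `N = 4`, the first unit has w_i - 1 = 0 and is never drawn, so `weighted_polya_urn` returns `[0, 1, 1, 1]` regardless of the seed. Each synthetic population could then be imputed with a standard method that assumes simple random sampling, as the abstract describes.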