Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap

Bibliographic details
Published in: Journal of survey statistics and methodology 2016-06, Vol.4 (2), p.139-170
Main authors: Zhou, Hanzhi, Elliott, Michael R, Raghunathan, Trivellore E
Format: Article
Language: eng
Online access: Full text
container_end_page 170
container_issue 2
container_start_page 139
container_title Journal of survey statistics and methodology
container_volume 4
creator Zhou, Hanzhi
Elliott, Michael R
Raghunathan, Trivellore E
description Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in "Delta-V," a key crash severity measure.
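The abstract outlines the method only in words. As a rough illustration of the building block it extends, the following sketch draws one synthetic population from a single-stage weighted sample with a weighted Pólya urn: each of the N - n extra draws picks sampled case i with probability proportional to w_i - 1 + l_i (N - n)/n, where l_i counts earlier picks of case i and the weights are rescaled to sum to N. That urn form, the hypothetical function name `wfpbb_population`, and the requirement that rescaled weights be at least 1 are assumptions of this sketch; it does not reproduce the article's two-stage versions.

```python
import random
from typing import List, Sequence


def wfpbb_population(weights: Sequence[float], N: int, rng: random.Random) -> List[int]:
    """Sketch of one weighted Polya-urn draw (single stage, hypothetical helper).

    Returns, for each sampled case i, how many times it appears in a
    synthetic population of size N built from the n sampled cases.
    """
    n = len(weights)
    if N <= n:
        raise ValueError("population size N must exceed sample size n")
    total = float(sum(weights))
    # Assumption: rescale the case weights so they sum to the population size N.
    w = [wi * N / total for wi in weights]
    if min(w) < 1:
        raise ValueError("rescaled weights must be at least 1")
    picks = [0] * n             # l_i: times case i has been drawn so far
    step = (N - n) / n          # urn mass added to a case each time it is drawn
    for _ in range(N - n):
        # Unnormalized urn mass w_i - 1 + l_i * (N - n) / n; it sums to
        # (N - n) + (draws so far) * step, and random.choices normalizes it.
        mass = [w[i] - 1 + picks[i] * step for i in range(n)]
        i = rng.choices(range(n), weights=mass, k=1)[0]
        picks[i] += 1
    # Every sampled case is in the population once, plus its urn draws.
    return [1 + p for p in picks]


if __name__ == "__main__":
    rng = random.Random(2016)
    print(wfpbb_population([1, 2, 3, 4], N=20, rng=rng))
```

Each returned count is the number of copies of a sampled case in one synthetic population, and on average it equals that case's rescaled weight. Repeating the call yields several populations that, per the abstract, can then be imputed as if they were simple random samples. The loop costs O(n(N - n)) and is written for readability rather than for large N.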
doi_str_mv 10.1093/jssam/smv031
format Article
pmid 29226161
addtitle J Surv Stat Methodol
date 2016-06-01
cop United States
tpages 32
fulltext fulltext
identifier ISSN: 2325-0984
ispartof Journal of survey statistics and methodology, 2016-06, Vol.4 (2), p.139-170
issn 2325-0984
2325-0992
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5719896
source Oxford University Press Journals All Titles (1996-Current)
title Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T20%3A42%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multiple%20Imputation%20in%20Two-Stage%20Cluster%20Samples%20Using%20The%20Weighted%20Finite%20Population%20Bayesian%20Bootstrap&rft.jtitle=Journal%20of%20survey%20statistics%20and%20methodology&rft.au=Zhou,%20Hanzhi&rft.date=2016-06-01&rft.volume=4&rft.issue=2&rft.spage=139&rft.epage=170&rft.pages=139-170&rft.issn=2325-0984&rft.eissn=2325-0992&rft_id=info:doi/10.1093/jssam/smv031&rft_dat=%3Cproquest_pubme%3E1975596092%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1975596092&rft_id=info:pmid/29226161&rfr_iscdi=true