Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap
Published in: | Journal of survey statistics and methodology 2016-06, Vol.4 (2), p.139-170 |
---|---|
Main authors: | Zhou, Hanzhi ; Elliott, Michael R ; Raghunathan, Trivellore E |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 170 |
---|---|
container_issue | 2 |
container_start_page | 139 |
container_title | Journal of survey statistics and methodology |
container_volume | 4 |
creator | Zhou, Hanzhi ; Elliott, Michael R ; Raghunathan, Trivellore E |
description | Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in "Delta-V," a key crash severity measure. |
doi_str_mv | 10.1093/jssam/smv031 |
format | Article |
pmid | 29226161 |
fulltext | fulltext |
identifier | ISSN: 2325-0984 |
ispartof | Journal of survey statistics and methodology, 2016-06, Vol.4 (2), p.139-170 |
issn | 2325-0984 2325-0992 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5719896 |
source | Oxford University Press Journals All Titles (1996-Current) |
title | Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T20%3A42%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multiple%20Imputation%20in%20Two-Stage%20Cluster%20Samples%20Using%20The%20Weighted%20Finite%20Population%20Bayesian%20Bootstrap&rft.jtitle=Journal%20of%20survey%20statistics%20and%20methodology&rft.au=Zhou,%20Hanzhi&rft.date=2016-06-01&rft.volume=4&rft.issue=2&rft.spage=139&rft.epage=170&rft.pages=139-170&rft.issn=2325-0984&rft.eissn=2325-0992&rft_id=info:doi/10.1093/jssam/smv031&rft_dat=%3Cproquest_pubme%3E1975596092%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1975596092&rft_id=info:pmid/29226161&rfr_iscdi=true |
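The abstract describes generating synthetic populations from weighted sample data with a weighted finite population Bayesian bootstrap (WFPBB), so that each synthetic population can be imputed as if it were a simple random sample. As a rough, non-authoritative illustration, the sketch below implements a single-stage weighted Pólya-urn draw of the kind the WFPBB builds on. The draw probabilities in the docstring are an assumption taken from the general WFPBB literature, not from this record; the article's two-stage forms (stage-specific selection probabilities, or final weights only) and any preliminary bootstrap of the sample are not reproduced. The helper name `weighted_polya_population` is hypothetical.

```python
import random
from typing import List, Sequence


def weighted_polya_population(weights: Sequence[float], N: int,
                              rng: random.Random) -> List[int]:
    """Sketch of one weighted Polya-urn synthetic population (hypothetical helper).

    Assumptions: `weights` are case weights rescaled to sum to the population
    size N, every weight is at least 1, and draw k (k = 1, ..., N - n) picks
    sampled unit i with probability
        (w_i - 1 + l_i * (N - n) / n) / (N - n + (k - 1) * (N - n) / n),
    where l_i is the number of times unit i has already been drawn.
    Returns sample indices (0..n-1); an index repeated r times stands for r
    population members that share that unit's values.
    """
    n = len(weights)
    if N <= n:
        raise ValueError("N must exceed the sample size n")
    if abs(sum(weights) - N) > 1e-6 * N:
        raise ValueError("weights must be rescaled to sum to N")
    if any(w < 1 for w in weights):
        raise ValueError("each weight must be at least 1")

    step = (N - n) / n             # urn increment added to a unit each time it is drawn
    drawn = [0] * n                # l_i: number of draws of unit i so far
    population = list(range(n))    # observed units always belong to the population
    for _ in range(N - n):
        # Unnormalized masses; they sum to the denominator in the docstring,
        # and random.choices normalizes them internally.
        masses = [w - 1 + l * step for w, l in zip(weights, drawn)]
        i = rng.choices(range(n), weights=masses, k=1)[0]
        drawn[i] += 1
        population.append(i)
    return population


if __name__ == "__main__":
    rng = random.Random(2016)
    weights = [1.0, 3.0, 6.0]      # hypothetical weights summing to N = 10
    pop = weighted_polya_population(weights, N=10, rng=rng)
    print(len(pop), sorted(pop))
```

Because each urn draw selects unit i with marginal probability (w_i - 1)/(N - n), unit i appears w_i times in expectation, which is how the case weights are absorbed into the synthetic population. Repeating the call with independent draws would give several synthetic populations; per the abstract, each can then be treated as a simple random sample at the imputation stage.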