Imputation of missing longitudinal data: a comparison of methods
Missing information is inevitable in longitudinal studies, and can result in biased estimates and a loss of power. One approach to this problem is to impute the missing data to yield a more complete data set. Our goal was to compare the performance of 14 methods of imputing missing data on depressio...
Published in: | Journal of clinical epidemiology 2003-10, Vol.56 (10), p.968-976 |
---|---|
Main authors: | Engels, Jean Mundahl; Diehr, Paula |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 976 |
---|---|
container_issue | 10 |
container_start_page | 968 |
container_title | Journal of clinical epidemiology |
container_volume | 56 |
creator | Engels, Jean Mundahl; Diehr, Paula |
description | Missing information is inevitable in longitudinal studies, and can result in biased estimates and a loss of power. One approach to this problem is to impute the missing data to yield a more complete data set. Our goal was to compare the performance of 14 methods of imputing missing data on depression, weight, cognitive functioning, and self-rated health in a longitudinal cohort of older adults.
We identified situations where a person had a known value following one or more missing values, and treated the known value as a “missing value.” This “missing value” was imputed using each method and compared to the observed value. Methods were compared on the root mean square error, mean absolute deviation, bias, and relative variance of the estimates.
Most imputation methods were biased toward estimating the “missing value” as too healthy, and most estimates had a variance that was too low. Imputed values based on a person's values before and after the “missing value” were superior to other methods, followed by imputations based on a person's values before the “missing value.” Imputations that used no information specific to the person, such as using the sample mean, had the worst performance.
We conclude that, in longitudinal studies where the overall trend is for worse health over time and where missing data can be assumed to be primarily related to worse health, missing data in a longitudinal sequence should be imputed from the available longitudinal data for that person. |
doi_str_mv | 10.1016/S0895-4356(03)00170-7 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_71314241</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0895435603001707</els_id><sourcerecordid>71314241</sourcerecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1033163868</pqid></control><display><type>article</type><title>Imputation of missing longitudinal data: a comparison of methods</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Engels, Jean Mundahl ; Diehr, Paula</creator><identifier>ISSN: 0895-4356</identifier><identifier>EISSN: 1878-5921</identifier><identifier>DOI: 10.1016/S0895-4356(03)00170-7</identifier><identifier>PMID: 14568628</identifier><language>eng</language><publisher>New York, NY: Elsevier Inc</publisher><ispartof>Journal of clinical epidemiology, 2003-10, Vol.56 (10), p.968-976</ispartof><rights>2003 Elsevier Inc.</rights><rights>2004 INIST-CNRS</rights><lds50>peer_reviewed</lds50></display><links><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0895435603001707$$EHTML$$P50$$Gelsevier$$H</linktohtml><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=15231308$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/14568628$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><addata><au>Engels, Jean Mundahl</au><au>Diehr, Paula</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Imputation of missing longitudinal data: a comparison of methods</atitle><jtitle>Journal of clinical epidemiology</jtitle><addtitle>J Clin Epidemiol</addtitle><date>2003-10-01</date><risdate>2003</risdate><volume>56</volume><issue>10</issue><spage>968</spage><epage>976</epage><pages>968-976</pages><issn>0895-4356</issn><eissn>1878-5921</eissn><cop>New York, NY</cop><pub>Elsevier Inc</pub><pmid>14568628</pmid><doi>10.1016/S0895-4356(03)00170-7</doi><tpages>9</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0895-4356 |
ispartof | Journal of clinical epidemiology, 2003-10, Vol.56 (10), p.968-976 |
issn | 0895-4356; 1878-5921 |
language | eng |
recordid | cdi_proquest_miscellaneous_71314241 |
source | MEDLINE; Elsevier ScienceDirect Journals |
subjects | Aged; Analysis of Variance; Bias; Biological and medical sciences; Cardiovascular disease; Cognitive ability; Cohort; Computerized, statistical medical data processing and models in biomedicine; Coronary Disease - epidemiology; Data Interpretation, Statistical; Depression; Depression - epidemiology; Epidemiology; Female; Generalized linear models; Health Status; Humans; Imputation; Longitudinal; Longitudinal Studies; Male; Medical sciences; Medical statistics; Missing data; Older people; Research Design; Risk Factors; Statistical methods; Stroke - epidemiology; Studies; United States - epidemiology; Variables |
title | Imputation of missing longitudinal data: a comparison of methods |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T02%3A54%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Imputation%20of%20missing%20longitudinal%20data:%20a%20comparison%20of%20methods&rft.jtitle=Journal%20of%20clinical%20epidemiology&rft.au=Engels,%20Jean%20Mundahl&rft.date=2003-10-01&rft.volume=56&rft.issue=10&rft.spage=968&rft.epage=976&rft.pages=968-976&rft.issn=0895-4356&rft.eissn=1878-5921&rft_id=info:doi/10.1016/S0895-4356(03)00170-7&rft_dat=%3Cproquest_cross%3E71314241%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1033163868&rft_id=info:pmid/14568628&rft_els_id=S0895435603001707&rfr_iscdi=true |
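
The evaluation design described in the abstract (hide a known value, impute it, and compare the imputation with the observed value) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy scores in `CASES`, the constant `SAMPLE_MEAN`, and the three simplified imputers, which stand in for three families of methods the abstract mentions and are not the 14 methods of the study. The metric definitions are also assumed readings of the abstract: "mean absolute deviation" is taken as the mean absolute difference between imputed and observed values, and "relative variance" as the variance of the imputed values divided by the variance of the observed values.

```python
import math
from statistics import mean, pvariance

# Hypothetical toy data, one tuple per person:
# (value before the gap, held-out observed value, value after the gap)
CASES = [(1, 2, 3), (5, 4, 4), (3, 3, 2), (2, 1, 1)]
SAMPLE_MEAN = 2.5  # hypothetical cohort-wide mean, uses no person-specific data

# Simplified stand-ins for three kinds of imputation named in the abstract.
METHODS = {
    "before_and_after": lambda before, after: (before + after) / 2,
    "last_before": lambda before, after: float(before),
    "sample_mean": lambda before, after: SAMPLE_MEAN,
}


def evaluate(impute, cases):
    """Impute each held-out value and score the method against the truth."""
    imputed = [impute(before, after) for before, _, after in cases]
    observed = [obs for _, obs, _ in cases]
    errors = [i - o for i, o in zip(imputed, observed)]
    return {
        "rmse": math.sqrt(mean([e * e for e in errors])),
        "mad": mean([abs(e) for e in errors]),
        "bias": mean(errors),
        # Assumed definition: spread of imputed values relative to the truth.
        "relative_variance": pvariance(imputed) / pvariance(observed),
    }


results = {name: evaluate(fn, CASES) for name, fn in METHODS.items()}
for name, scores in results.items():
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

A relative variance below 1 means the imputed values vary less than the values they replace. In this sketch the sample-mean imputer has a relative variance of exactly 0 because it returns the same number for everyone, which is the extreme case of the "variance that was too low" pattern the abstract reports. The toy numbers themselves carry no evidential weight.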