Imputation of missing longitudinal data: a comparison of methods

Missing information is inevitable in longitudinal studies, and can result in biased estimates and a loss of power. One approach to this problem is to impute the missing data to yield a more complete data set. Our goal was to compare the performance of 14 methods of imputing missing data on depressio...

Bibliographic details
Published in: Journal of clinical epidemiology 2003-10, Vol.56 (10), p.968-976
Main authors: Engels, Jean Mundahl; Diehr, Paula
Format: Article
Language: eng
Online access: Full text
container_end_page 976
container_issue 10
container_start_page 968
container_title Journal of clinical epidemiology
container_volume 56
creator Engels, Jean Mundahl
Diehr, Paula
description Missing information is inevitable in longitudinal studies, and can result in biased estimates and a loss of power. One approach to this problem is to impute the missing data to yield a more complete data set. Our goal was to compare the performance of 14 methods of imputing missing data on depression, weight, cognitive functioning, and self-rated health in a longitudinal cohort of older adults. We identified situations where a person had a known value following one or more missing values, and treated the known value as a “missing value.” This “missing value” was imputed using each method and compared to the observed value. Methods were compared on the root mean square error, mean absolute deviation, bias, and relative variance of the estimates. Most imputation methods were biased toward estimating the “missing value” as too healthy, and most estimates had a variance that was too low. Imputed values based on a person's values before and after the “missing value” were superior to other methods, followed by imputations based on a person's values before the “missing value.” Imputations that used no information specific to the person, such as using the sample mean, had the worst performance. We conclude that, in longitudinal studies where the overall trend is for worse health over time and where missing data can be assumed to be primarily related to worse health, missing data in a longitudinal sequence should be imputed from the available longitudinal data for that person.
doi_str_mv 10.1016/S0895-4356(03)00170-7
format Article
pmid 14568628
publisher New York, NY: Elsevier Inc
rights 2003 Elsevier Inc.
tpages 9
toplevel peer_reviewed
fulltext fulltext
identifier ISSN: 0895-4356
ispartof Journal of clinical epidemiology, 2003-10, Vol.56 (10), p.968-976
issn 0895-4356
1878-5921
language eng
recordid cdi_proquest_miscellaneous_71314241
source MEDLINE; Elsevier ScienceDirect Journals
subjects Aged
Analysis of Variance
Bias
Biological and medical sciences
Cardiovascular disease
Cognitive ability
Cohort
Computerized, statistical medical data processing and models in biomedicine
Coronary Disease - epidemiology
Data Interpretation, Statistical
Depression
Depression - epidemiology
Epidemiology
Female
Generalized linear models
Health Status
Humans
Imputation
Longitudinal
Longitudinal Studies
Male
Medical sciences
Medical statistics
Missing data
Older people
Research Design
Risk Factors
Statistical methods
Stroke - epidemiology
Studies
United States - epidemiology
Variables
title Imputation of missing longitudinal data: a comparison of methods
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T02%3A54%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Imputation%20of%20missing%20longitudinal%20data:%20a%20comparison%20of%20methods&rft.jtitle=Journal%20of%20clinical%20epidemiology&rft.au=Engels,%20Jean%20Mundahl&rft.date=2003-10-01&rft.volume=56&rft.issue=10&rft.spage=968&rft.epage=976&rft.pages=968-976&rft.issn=0895-4356&rft.eissn=1878-5921&rft_id=info:doi/10.1016/S0895-4356(03)00170-7&rft_dat=%3Cproquest_cross%3E71314241%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1033163868&rft_id=info:pmid/14568628&rft_els_id=S0895435603001707&rfr_iscdi=true
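
The evaluation design described in the abstract (hold out a known value that follows a gap, impute it, compare the imputation with the observed value) can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy cohort is made up, the three imputation functions are simple stand-ins for the families the abstract mentions (before and after, before only, sample mean) and are not the 14 methods the authors compared, and "relative variance" is assumed here to mean the variance of the imputed values divided by the variance of the observed values.

```python
import math
from statistics import mean, pvariance

# Toy cohort (made-up numbers): one list per person, None = missing visit.
COHORT = [
    [10, None, 14, 16],
    [20, None, None, 26, 28],
    [5, None, 3, 1],
]
# (person, visit) pairs: known values that follow a gap and are held out
# as artificial "missing values".
HELD_OUT = [(0, 2), (1, 3), (2, 2)]


def nearest(series, i, step):
    """Nearest observed value from visit i in direction step (-1 or +1)."""
    j = i + step
    while 0 <= j < len(series):
        if series[j] is not None:
            return series[j]
        j += step
    return None


def impute_before(series, i):
    """Hypothetical 'before' method: carry the last observed value forward."""
    return nearest(series, i, -1)


def impute_before_after(series, i):
    """Hypothetical 'before and after' method: average the two neighbours."""
    return (nearest(series, i, -1) + nearest(series, i, +1)) / 2


def impute_sample_mean(cohort, held_out):
    """Uses no person-specific information: mean of all other observed values."""
    held = set(held_out)
    values = [v for p, s in enumerate(cohort) for t, v in enumerate(s)
              if v is not None and (p, t) not in held]
    return mean(values)


def evaluate(imputed, observed):
    """Error summaries named in the abstract (definitions assumed here)."""
    errors = [i - o for i, o in zip(imputed, observed)]
    return {
        "rmse": math.sqrt(mean([e * e for e in errors])),
        "mad": mean([abs(e) for e in errors]),
        "bias": mean(errors),  # imputed minus observed
        "rel_var": pvariance(imputed) / pvariance(observed),
    }


observed = [COHORT[p][t] for p, t in HELD_OUT]
sample_mean = impute_sample_mean(COHORT, HELD_OUT)
results = {
    "before_after": evaluate(
        [impute_before_after(COHORT[p], t) for p, t in HELD_OUT], observed),
    "before": evaluate(
        [impute_before(COHORT[p], t) for p, t in HELD_OUT], observed),
    "sample_mean": evaluate([sample_mean] * len(HELD_OUT), observed),
}

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this sketch the sample-mean method assigns the same value to everyone, so its relative variance is zero, which illustrates in the extreme how an imputation can have a variance that is too low. The numbers themselves say nothing about the study's actual results.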