Missing value imputation methods for electronic health records

Bibliographic details
Published in: IEEE Access 2023-01, Vol. 11, p. 1-1
Main authors: Psychogyios, Konstantinos; Ilias, Loukas; Ntanos, Christos; Askounis, Dimitris
Format: Article
Language: English
container_end_page 1
container_issue
container_start_page 1
container_title IEEE access
container_volume 11
creator Psychogyios, Konstantinos
Ilias, Loukas
Ntanos, Christos
Askounis, Dimitris
description Electronic health records (EHR) are patient-level information, e.g., laboratory tests and questionnaires, stored in electronic format. Compared to physical records, the EHR alternative allows patients to access their data easily and helps staff with management procedural tasks such as information sharing across different organizations. Moreover, this type of data is commonly used by researchers for predictive and classification purposes, employing statistical and machine learning methods. However, missingness is a phenomenon that is observed very frequently for such measurements. Even though this missingness is often significant, it is usually treated poorly with either case deletion or simple methods, resulting in suboptimal and/or inaccurate predictive results. This happens because the simple methods, e.g., k-nearest neighbors (kNN) and mean/mode imputation, fail in most cases to incorporate the complex relationships that define these medical datasets. To address these limitations, in this paper we test and improve state-of-the-art missing data imputation models and practices. We propose a new missing value imputation method based on denoising autoencoders (DAE) with kNN for the pre-imputation task. We optimize the training methodology by re-applying kNN to the missing data every N epochs using a different value for the variable k each time to yield more accurate results. We also revise a state-of-the-art missing data imputation approach based on a generative adversarial network (GAN). Using this as a baseline, we introduce improvements regarding both the architecture and the training procedure. These models are compared with the ones usually employed within clinical research studies for both the task of imputation and post-imputation prediction. Results show that our proposed deep learning approaches outperform the standard baselines, yielding better imputation and predictive results.
doi_str_mv 10.1109/ACCESS.2023.3251919
format Article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2784554726</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10057378</ieee_id><doaj_id>oai_doaj_org_article_6a3685fc71a340bf9db964ef99427319</doaj_id><sourcerecordid>2784554726</sourcerecordid><originalsourceid>FETCH-LOGICAL-c409t-a657361f9962f8238ddfb35a0bd228cb10b73df70fdc8a8a2e58c28fa760283b3</originalsourceid><addsrcrecordid>eNpNUE1Lw0AQDaJgqf0Fegh4Tt2P7NdFKKFqQfFQPS-b_Wi3pNm6mwj-e1NTpHOZ4THvvZmXZbcQzCEE4mFRVcv1eo4AwnOMCBRQXGQTBKkoMMH08my-zmYp7cBQfIAIm2SPbz4l327yb9X0Nvf7Q9-pzoc239tuG0zKXYi5bazuYmi9zrdWNd02j1aHaNJNduVUk-zs1KfZ59Pyo3opXt-fV9XitdAlEF2hKGGYQicERY4jzI1xNSYK1AYhrmsIaoaNY8AZzRVXyBKuEXeKUYA4rvE0W426JqidPES_V_FHBuXlHxDiRqrYed1YSRWmnDjNoMIlqJ0wtaClHbxLxDAUg9b9qHWI4au3qZO70Md2OF8ixktCSobosIXHLR1DStG6f1cI5DF3OeYuj7nLU-4D625keWvtGQMM_zOOfwER3334</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2784554726</pqid></control><display><type>article</type><title>Missing value imputation methods for electronic health records</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Psychogyios, Konstantinos ; Ilias, Loukas ; Ntanos, Christos ; Askounis, Dimitris</creator><creatorcontrib>Psychogyios, Konstantinos ; Ilias, Loukas ; Ntanos, Christos ; Askounis, Dimitris</creatorcontrib><description>Electronic health records (EHR) are patient-level information, e.g., laboratory tests and questionnaires, stored in electronic format. Compared to physical records, the EHR alternative allows patients to access their data easily and helps staff with management procedural tasks such as information sharing across different organizations. 
Moreover, this type of data is commonly used by researchers for predictive and classification purposes, employing statistical and machine learning methods. However, missingness is a phenomenon that is observed very frequently for such measurements. Even though this missingness is often significant, it is usually treated poorly with either case deletion or simple methods, resulting in suboptimal and/or inaccurate predictive results. This happens because the simple methods, e.g., k-nearest neighbors (kNN) and mean/mode imputation, fail in most cases to incorporate the complex relationships that define these medical datasets. To address these limitations, in this paper we test and improve state-of-the-art missing data imputation models and practices. We propose a new missing value imputation method based on denoising autoencoders (DAE) with kNN for the pre-imputation task. We optimize the training methodology by re-applying kNN to the missing data every N epochs using a different value for the variable k each time to yield more accurate results. We also revise a state-of-the-art missing data imputation approach based on a generative adversarial network (GAN). Using this as a baseline, we introduce improvements regarding both the architecture and the training procedure. These models are compared with the ones usually employed within clinical research studies for both the task of imputation and post-imputation prediction. 
Results show that our proposed deep learning approaches outperform the standard baselines, yielding better imputation and predictive results.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2023.3251919</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Autoencoders ; Deep learning ; EHR ; Electronic health records ; Electronic medical records ; Generative adversarial networks ; Heart ; Laboratory tests ; Machine learning ; Missing data ; Missing value imputation ; Noise reduction ; Task analysis ; Training</subject><ispartof>IEEE access, 2023-01, Vol.11, p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c409t-a657361f9962f8238ddfb35a0bd228cb10b73df70fdc8a8a2e58c28fa760283b3</citedby><cites>FETCH-LOGICAL-c409t-a657361f9962f8238ddfb35a0bd228cb10b73df70fdc8a8a2e58c28fa760283b3</cites><orcidid>0000-0002-5162-6500 ; 0000-0002-4483-4264 ; 0000-0002-9971-9271</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10057378$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,27633,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Psychogyios, Konstantinos</creatorcontrib><creatorcontrib>Ilias, Loukas</creatorcontrib><creatorcontrib>Ntanos, Christos</creatorcontrib><creatorcontrib>Askounis, Dimitris</creatorcontrib><title>Missing value imputation methods for electronic health records</title><title>IEEE access</title><addtitle>Access</addtitle><description>Electronic health records (EHR) are patient-level information, e.g., 
laboratory tests and questionnaires, stored in electronic format. Compared to physical records, the EHR alternative allows patients to access their data easily and helps staff with management procedural tasks such as information sharing across different organizations. Moreover, this type of data is commonly used by researchers for predictive and classification purposes, employing statistical and machine learning methods. However, missingness is a phenomenon that is observed very frequently for such measurements. Even though this missingness is often significant, it is usually treated poorly with either case deletion or simple methods, resulting in suboptimal and/or inaccurate predictive results. This happens because the simple methods, e.g., k-nearest neighbors (kNN) and mean/mode imputation, fail in most cases to incorporate the complex relationships that define these medical datasets. To address these limitations, in this paper we test and improve state-of-the-art missing data imputation models and practices. We propose a new missing value imputation method based on denoising autoencoders (DAE) with kNN for the pre-imputation task. We optimize the training methodology by re-applying kNN to the missing data every N epochs using a different value for the variable k each time to yield more accurate results. We also revise a state-of-the-art missing data imputation approach based on a generative adversarial network (GAN). Using this as a baseline, we introduce improvements regarding both the architecture and the training procedure. These models are compared with the ones usually employed within clinical research studies for both the task of imputation and post-imputation prediction. 
Results show that our proposed deep learning approaches outperform the standard baselines, yielding better imputation and predictive results.</description><subject>Autoencoders</subject><subject>Deep learning</subject><subject>EHR</subject><subject>Electronic health records</subject><subject>Electronic medical records</subject><subject>Generative adversarial networks</subject><subject>Heart</subject><subject>Laboratory tests</subject><subject>Machine learning</subject><subject>Missing data</subject><subject>Missing value imputation</subject><subject>Noise reduction</subject><subject>Task analysis</subject><subject>Training</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUE1Lw0AQDaJgqf0Fegh4Tt2P7NdFKKFqQfFQPS-b_Wi3pNm6mwj-e1NTpHOZ4THvvZmXZbcQzCEE4mFRVcv1eo4AwnOMCBRQXGQTBKkoMMH08my-zmYp7cBQfIAIm2SPbz4l327yb9X0Nvf7Q9-pzoc239tuG0zKXYi5bazuYmi9zrdWNd02j1aHaNJNduVUk-zs1KfZ59Pyo3opXt-fV9XitdAlEF2hKGGYQicERY4jzI1xNSYK1AYhrmsIaoaNY8AZzRVXyBKuEXeKUYA4rvE0W426JqidPES_V_FHBuXlHxDiRqrYed1YSRWmnDjNoMIlqJ0wtaClHbxLxDAUg9b9qHWI4au3qZO70Md2OF8ixktCSobosIXHLR1DStG6f1cI5DF3OeYuj7nLU-4D625keWvtGQMM_zOOfwER3334</recordid><startdate>20230101</startdate><enddate>20230101</enddate><creator>Psychogyios, Konstantinos</creator><creator>Ilias, Loukas</creator><creator>Ntanos, Christos</creator><creator>Askounis, Dimitris</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-5162-6500</orcidid><orcidid>https://orcid.org/0000-0002-4483-4264</orcidid><orcidid>https://orcid.org/0000-0002-9971-9271</orcidid></search><sort><creationdate>20230101</creationdate><title>Missing value imputation methods for electronic health records</title><author>Psychogyios, Konstantinos ; Ilias, Loukas ; Ntanos, Christos ; Askounis, Dimitris</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c409t-a657361f9962f8238ddfb35a0bd228cb10b73df70fdc8a8a2e58c28fa760283b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Autoencoders</topic><topic>Deep learning</topic><topic>EHR</topic><topic>Electronic health records</topic><topic>Electronic medical records</topic><topic>Generative adversarial networks</topic><topic>Heart</topic><topic>Laboratory tests</topic><topic>Machine learning</topic><topic>Missing data</topic><topic>Missing value imputation</topic><topic>Noise reduction</topic><topic>Task analysis</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Psychogyios, Konstantinos</creatorcontrib><creatorcontrib>Ilias, Loukas</creatorcontrib><creatorcontrib>Ntanos, Christos</creatorcontrib><creatorcontrib>Askounis, Dimitris</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library 
(IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Psychogyios, Konstantinos</au><au>Ilias, Loukas</au><au>Ntanos, Christos</au><au>Askounis, Dimitris</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Missing value imputation methods for electronic health records</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2023-01-01</date><risdate>2023</risdate><volume>11</volume><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Electronic health records (EHR) are patient-level information, e.g., laboratory tests and questionnaires, stored in electronic format. Compared to physical records, the EHR alternative allows patients to access their data easily and helps staff with management procedural tasks such as information sharing across different organizations. Moreover, this type of data is commonly used by researchers for predictive and classification purposes, employing statistical and machine learning methods. However, missingness is a phenomenon that is observed very frequently for such measurements. 
Even though this missingness is often significant, it is usually treated poorly with either case deletion or simple methods, resulting in suboptimal and/or inaccurate predictive results. This happens because the simple methods, e.g., k-nearest neighbors (kNN) and mean/mode imputation, fail in most cases to incorporate the complex relationships that define these medical datasets. To address these limitations, in this paper we test and improve state-of-the-art missing data imputation models and practices. We propose a new missing value imputation method based on denoising autoencoders (DAE) with kNN for the pre-imputation task. We optimize the training methodology by re-applying kNN to the missing data every N epochs using a different value for the variable k each time to yield more accurate results. We also revise a state-of-the-art missing data imputation approach based on a generative adversarial network (GAN). Using this as a baseline, we introduce improvements regarding both the architecture and the training procedure. These models are compared with the ones usually employed within clinical research studies for both the task of imputation and post-imputation prediction. Results show that our proposed deep learning approaches outperform the standard baselines, yielding better imputation and predictive results.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2023.3251919</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-5162-6500</orcidid><orcidid>https://orcid.org/0000-0002-4483-4264</orcidid><orcidid>https://orcid.org/0000-0002-9971-9271</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2023-01, Vol.11, p.1-1
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2784554726
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects Autoencoders
Deep learning
EHR
Electronic health records
Electronic medical records
Generative adversarial networks
Heart
Laboratory tests
Machine learning
Missing data
Missing value imputation
Noise reduction
Task analysis
Training
title Missing value imputation methods for electronic health records
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T20%3A43%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Missing%20value%20imputation%20methods%20for%20electronic%20health%20records&rft.jtitle=IEEE%20access&rft.au=Psychogyios,%20Konstantinos&rft.date=2023-01-01&rft.volume=11&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3251919&rft_dat=%3Cproquest_ieee_%3E2784554726%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2784554726&rft_id=info:pmid/&rft_ieee_id=10057378&rft_doaj_id=oai_doaj_org_article_6a3685fc71a340bf9db964ef99427319&rfr_iscdi=true
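The description field above outlines the proposed pipeline: kNN pre-imputation feeds a denoising autoencoder (DAE), and kNN is re-applied every N epochs with a different k. The following is a minimal NumPy sketch of that idea. It is not the authors' code, and it rests on assumptions the record does not specify: one tanh hidden layer, inverted-dropout masking noise, a loss computed only on observed entries, a nan-Euclidean-style kNN distance, and an illustrative k schedule of (3, 5, 7). The function names `knn_impute` and `train_dae_with_knn_refresh` and all of their parameters are hypothetical.

```python
import numpy as np


def knn_impute(X, k):
    """Fill NaNs with the mean of the k nearest rows that observe that feature.

    Row distance uses only features observed in both rows, scaled up by the
    fraction of shared features (a nan-Euclidean-style distance; an assumption).
    """
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X)
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    n, d = X.shape
    for i in range(n):
        missing = np.where(~obs[i])[0]
        if missing.size == 0:
            continue
        shared = obs & obs[i]  # features observed in both row i and each row j
        diff = np.where(shared, X - np.nan_to_num(X[i]), 0.0)
        counts = shared.sum(axis=1)
        dist = np.sqrt((diff ** 2).sum(axis=1) * d / np.maximum(counts, 1))
        dist[counts == 0] = np.inf  # no overlapping features: never a neighbour
        dist[i] = np.inf            # exclude the row itself
        for j in missing:
            donors = np.where(obs[:, j] & np.isfinite(dist))[0]
            if donors.size == 0:
                filled[i, j] = col_means[j]  # fall back to mean imputation
                continue
            nearest = donors[np.argsort(dist[donors])[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled


def train_dae_with_knn_refresh(X, k_schedule=(3, 5, 7), refresh_every=50,
                               epochs=300, hidden=8, noise=0.2, lr=0.2, seed=0):
    """Hypothetical DAE imputer whose kNN pre-imputation is redone every N epochs."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0) + 1e-8
    Z = (X - mu) / sd             # standardised; NaNs stay NaN
    target = np.nan_to_num(Z)     # only observed entries enter the loss
    n, d = Z.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for epoch in range(epochs):
        if epoch % refresh_every == 0:
            # Re-apply kNN pre-imputation with the next k in the schedule.
            k = k_schedule[(epoch // refresh_every) % len(k_schedule)]
            Z_in = knn_impute(Z, k)
        # Masking noise (inverted dropout) makes this a denoising autoencoder.
        keep = rng.random(Z_in.shape) > noise
        Z_noisy = Z_in * keep / (1.0 - noise)
        H = np.tanh(Z_noisy @ W1 + b1)
        out = H @ W2 + b2
        # Mean squared error over observed entries; gradients computed by hand.
        g_out = 2.0 * np.where(obs, out - target, 0.0) / obs.sum()
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)  # uses W2 before its update
        W2 -= lr * (H.T @ g_out); b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (Z_noisy.T @ g_H); b1 -= lr * g_H.sum(axis=0)
    # Final clean pass: keep observed values, take the DAE output elsewhere.
    out = np.tanh(Z_in @ W1 + b1) @ W2 + b2
    return np.where(obs, X, out * sd + mu)
```

As an alternative design choice, scikit-learn's `KNNImputer(n_neighbors=k)` could replace the hand-written `knn_impute` for the pre-imputation step. The GAN-based variant mentioned in the description is not sketched here.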