Missing value imputation methods for electronic health records
Electronic health records (EHR) are patient-level information, e.g., laboratory tests and questionnaires, stored in electronic format. Compared to physical records, the EHR alternative allows patients to access their data easily and helps staff with management procedural tasks such as information sh...
Published in: | IEEE access 2023-01, Vol.11, p.1-1 |
---|---|
Main authors: | Psychogyios, Konstantinos; Ilias, Loukas; Ntanos, Christos; Askounis, Dimitris |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE access |
container_volume | 11 |
creator | Psychogyios, Konstantinos; Ilias, Loukas; Ntanos, Christos; Askounis, Dimitris |
description | Electronic health records (EHR) are patient-level information, e.g., laboratory tests and questionnaires, stored in electronic format. Compared to physical records, the EHR alternative allows patients to access their data easily and helps staff with management procedural tasks such as information sharing across different organizations. Moreover, this type of data is commonly used by researchers for predictive and classification purposes, employing statistical and machine learning methods. However, missingness is a phenomenon that is observed very frequently for such measurements. Even though this missingness is often significant, it is usually treated poorly with either case deletion or simple methods, resulting in suboptimal and/or inaccurate predictive results. This happens because the simple methods, e.g., k-nearest neighbors (kNN) and mean/mode imputation, fail in most cases to incorporate the complex relationships that define these medical datasets. To address these limitations, in this paper we test and improve state-of-the-art missing data imputation models and practices. We propose a new missing value imputation method based on denoising autoencoders (DAE) with kNN for the pre-imputation task. We optimize the training methodology by re-applying kNN to the missing data every N epochs using a different value for the variable k each time to yield more accurate results. We also revise a state-of-the-art missing data imputation approach based on a generative adversarial network (GAN). Using this as a baseline, we introduce improvements regarding both the architecture and the training procedure. These models are compared with the ones usually employed within clinical research studies for both the task of imputation and post-imputation prediction. Results show that our proposed deep learning approaches outperform the standard baselines, yielding better imputation and predictive results. |
doi_str_mv | 10.1109/ACCESS.2023.3251919 |
format | Article |
publisher | Piscataway: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
coden | IAECCG |
orcid | https://orcid.org/0000-0002-5162-6500; https://orcid.org/0000-0002-4483-4264; https://orcid.org/0000-0002-9971-9271 |
linktohtml | https://ieeexplore.ieee.org/document/10057378 |
oa | free_for_read |
toplevel | peer_reviewed; online_resources |
collection | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE Open Access Journals; IEEE All-Society Periodicals Package (ASPP) 1998-Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Engineered Materials Abstracts; METADEX; Technology Research Database; Materials Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional; DOAJ Directory of Open Access Journals |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2023-01, Vol.11, p.1-1 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_2784554726 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals |
subjects | Autoencoders; Deep learning; EHR; Electronic health records; Electronic medical records; Generative adversarial networks; Heart; Laboratory tests; Machine learning; Missing data; Missing value imputation; Noise reduction; Task analysis; Training |
title | Missing value imputation methods for electronic health records |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T20%3A43%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Missing%20value%20imputation%20methods%20for%20electronic%20health%20records&rft.jtitle=IEEE%20access&rft.au=Psychogyios,%20Konstantinos&rft.date=2023-01-01&rft.volume=11&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3251919&rft_dat=%3Cproquest_ieee_%3E2784554726%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2784554726&rft_id=info:pmid/&rft_ieee_id=10057378&rft_doaj_id=oai_doaj_org_article_6a3685fc71a340bf9db964ef99427319&rfr_iscdi=true |
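
The abstract describes kNN pre-imputation that is re-applied every N epochs with a different k while a denoising autoencoder trains. The following is a minimal sketch of that idea only, not the authors' implementation. The function names (`knn_impute`, `train_with_reimputation`), the distance measure, the column-mean fallback, and the `train_epoch` callback standing in for one autoencoder training epoch are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k):
    """Return a copy of rows with None entries filled by a kNN average.

    Assumption of this sketch: distance is the root mean squared
    difference over the columns that both rows have observed.
    """
    n_cols = len(rows[0])
    # Column means over observed values, used when no neighbour is usable.
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows that observed column j.
            candidates = sorted(
                (distance(row, other), other[j])
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            filled[j] = sum(nearest) / len(nearest) if nearest else col_means[j]
        imputed.append(filled)
    return imputed


def train_with_reimputation(rows, k_schedule, epochs_per_round, train_epoch):
    """Re-run kNN on the raw data before each round of training epochs.

    train_epoch is a hypothetical callback representing one epoch of
    autoencoder training on the currently imputed data.
    """
    history = []
    for k in k_schedule:
        current = knn_impute(rows, k)  # fresh pre-imputation with this k
        for _ in range(epochs_per_round):
            train_epoch(current)
        history.append((k, current))
    return history


# Toy data: None marks a missing measurement.
rows = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [5.0, 6.0, 9.0],
    [None, 6.1, 9.2],
]
print(knn_impute(rows, k=1))
```

In this sketch each round starts again from the original data with its missing entries, so a poor choice of k in one round does not carry over into the next. Whether the paper re-imputes from the raw data or from the autoencoder's output is not stated in the abstract, so that choice is also an assumption.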