Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning

The classification of incomplete and imbalanced data remains challenging because both issues can degrade classifier training, as we also observed in our study on the physical fitness assessments of patients. In fields such as healthcare, there are higher req...

Bibliographic Details
Published in:	IEEE journal of biomedical and health informatics, 2024-05, Vol.28 (5), p.3102-3113
Main Authors:	Li, Jiaxi; Wang, Zhelong; Wu, Lina; Qiu, Sen; Zhao, Hongyu; Lin, Fang; Zhang, Ke
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; class imbalance; Classifiers; Costs; Data incompleteness; Data models; Databases, Factual; Datasets; Ensemble learning; Humans; Learning; Machine Learning; malignant tumor patients; Mathematical models; Multivariate Analysis; multivariate imputation by chained equations; Physical fitness; Physical Fitness - physiology; physical fitness assessment; Support vector machines; Task analysis; Training

container_end_page	3113
container_issue	5
container_start_page	3102
container_title	IEEE journal of biomedical and health informatics
container_volume	28
creator	Li, Jiaxi; Wang, Zhelong; Wu, Lina; Qiu, Sen; Zhao, Hongyu; Lin, Fang; Zhang, Ke
description	The classification of incomplete and imbalanced data remains challenging because both issues can degrade classifier training, as we also observed in our study on the physical fitness assessments of patients. In fields such as healthcare, moreover, the accuracy requirements for imputed values are higher. To train a high-performance classifier, we address both problems simultaneously with a novel algorithmic approach that combines multivariate imputation by chained equations with ensemble learning (MICEEN). We use multivariate imputation by chained equations to generate more accurate imputed values for the training set, which is then passed to ensemble learning to build a predictor. In addition, missing values are introduced into the minority classes and used to generate new samples belonging to those classes, which balances the class distribution. We perform extensive experiments on real-world datasets to assess our method and compare it with other state-of-the-art approaches. Experimental results on benchmark datasets and on self-collected datasets of physical fitness assessments of tumor patients, with varying missing rates, demonstrate the advantages of the proposed method.
doi_str_mv	10.1109/JBHI.2024.3376428
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pubmed_primary_38483807</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10472573</ieee_id><sourcerecordid>2958296967</sourcerecordid><originalsourceid>FETCH-LOGICAL-c302t-b172ca202d60dd031d01bc25c7181d58560ecafc8f6760ce23f46a855af1b88c3</originalsourceid><addsrcrecordid>eNpdkU9r20AQxZfS0AQnH6BQykIvvdjZP9JqdWxct1FwyCU5L6PdUa0grZxdKeBv33Vsl5K5zPD4zWOGR8hnzhacs_L67ua2WggmsoWUhcqE_kAuBFd6LgTTH08zL7NzchXjM0ulk1SqT-Rc6kxLzYoLMt3juBkcbYZAK2-HftvhiBS8o1VfQwfeoqM_YQR6AzGNg6f3Uze2rxBaSGTVb6cRxjbp9Y4uN9D6RK1epjctvjmtfMS-7pCuEYJv_Z9LctZAF_Hq2Gfk6dfqcXk7Xz_8rpY_1nMrmRjnNS-EhfSjU8w5JrljvLYitwXX3OU6VwwtNFY3qlDMopBNpkDnOTS81trKGfl-8N2G4WXCOJq-jRa79BYOUzSizLUoVamKhH57hz4PU_DpOiNZLriWXMlE8QNlwxBjwMZsQ9tD2BnOzD4Ws4_F7GMxx1jSztej81T36P5tnEJIwJcD0CLif4ZZIfJCyr_XT5Cd</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3052183163</pqid></control><display><type>article</type><title>Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning</title><source>IEEE Electronic Library (IEL)</source><creator>Li, Jiaxi ; Wang, Zhelong ; Wu, Lina ; Qiu, Sen ; Zhao, Hongyu ; Lin, Fang ; Zhang, Ke</creator><creatorcontrib>Li, Jiaxi ; Wang, Zhelong ; Wu, Lina ; Qiu, Sen ; Zhao, Hongyu ; Lin, Fang ; Zhang, Ke</creatorcontrib><description>The classification analysis of incomplete and imbalanced data is still a challenging task since these issues could negatively impact the training of classifiers, which were also found in our study on the physical fitness assessments of patients. And in fields such as healthcare, there are higher requirements for the accuracy of the generated imputation values. 
To train a high-performance classifier and pursue high accuracy, we attempted to resolve any potential negative impact by using a novel algorithmic approach based on the combination of multivariate imputation by chained equations and the ensemble learning method (MICEEN), which can solve the two problems simultaneously. We used multivariate imputation by chained equations to generate more accurate imputation values for the training set passed to ensemble learning to build a predictor. On the other hand, missing values were introduced into minority classes and used them to generate new samples belonging to the minority classes in order to balance the distribution of classes. On real-world datasets, we perform extensive experiments to assess our method and compare it to other state-of-the-art approaches. The advantages of the proposed method are demonstrated by experimental results for the benchmark datasets and self-collected datasets of physical fitness assessment of tumor patients with varying missing rates.</description><identifier>ISSN: 2168-2194</identifier><identifier>ISSN: 2168-2208</identifier><identifier>EISSN: 2168-2208</identifier><identifier>DOI: 10.1109/JBHI.2024.3376428</identifier><identifier>PMID: 38483807</identifier><identifier>CODEN: IJBHA9</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Accuracy ; Algorithms ; class imbalance ; Classifiers ; Costs ; Data incompleteness ; Data models ; Databases, Factual ; Datasets ; Ensemble learning ; Humans ; Learning ; Machine Learning ; malignant tumor patients ; Mathematical models ; Multivariate Analysis ; multivariate imputation by chained equations ; Physical fitness ; Physical Fitness - physiology ; physical fitness assessment ; Support vector machines ; Task analysis ; Training</subject><ispartof>IEEE journal of biomedical and health informatics, 2024-05, Vol.28 (5), p.3102-3113</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c302t-b172ca202d60dd031d01bc25c7181d58560ecafc8f6760ce23f46a855af1b88c3</cites><orcidid>0000-0001-6846-546X ; 0000-0001-9672-2902 ; 0000-0001-9108-5676 ; 0000-0003-1510-5289 ; 0009-0009-7116-2086 ; 0000-0003-4959-3372 ; 0000-0002-5855-540X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10472573$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27923,27924,54757</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10472573$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38483807$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Jiaxi</creatorcontrib><creatorcontrib>Wang, Zhelong</creatorcontrib><creatorcontrib>Wu, Lina</creatorcontrib><creatorcontrib>Qiu, Sen</creatorcontrib><creatorcontrib>Zhao, Hongyu</creatorcontrib><creatorcontrib>Lin, Fang</creatorcontrib><creatorcontrib>Zhang, Ke</creatorcontrib><title>Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning</title><title>IEEE journal of biomedical and health informatics</title><addtitle>JBHI</addtitle><addtitle>IEEE J Biomed Health Inform</addtitle><description>The classification analysis of incomplete and imbalanced data is still a challenging task since these issues could negatively impact the training of classifiers, which were also found in our study on the physical fitness assessments of patients. And in fields such as healthcare, there are higher requirements for the accuracy of the generated imputation values. 
To train a high-performance classifier and pursue high accuracy, we attempted to resolve any potential negative impact by using a novel algorithmic approach based on the combination of multivariate imputation by chained equations and the ensemble learning method (MICEEN), which can solve the two problems simultaneously. We used multivariate imputation by chained equations to generate more accurate imputation values for the training set passed to ensemble learning to build a predictor. On the other hand, missing values were introduced into minority classes and used them to generate new samples belonging to the minority classes in order to balance the distribution of classes. On real-world datasets, we perform extensive experiments to assess our method and compare it to other state-of-the-art approaches. The advantages of the proposed method are demonstrated by experimental results for the benchmark datasets and self-collected datasets of physical fitness assessment of tumor patients with varying missing rates.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>class imbalance</subject><subject>Classifiers</subject><subject>Costs</subject><subject>Data incompleteness</subject><subject>Data models</subject><subject>Databases, Factual</subject><subject>Datasets</subject><subject>Ensemble learning</subject><subject>Humans</subject><subject>Learning</subject><subject>Machine Learning</subject><subject>malignant tumor patients</subject><subject>Mathematical models</subject><subject>Multivariate Analysis</subject><subject>multivariate imputation by chained equations</subject><subject>Physical fitness</subject><subject>Physical Fitness - physiology</subject><subject>physical fitness assessment</subject><subject>Support vector machines</subject><subject>Task 
analysis</subject><subject>Training</subject><issn>2168-2194</issn><issn>2168-2208</issn><issn>2168-2208</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpdkU9r20AQxZfS0AQnH6BQykIvvdjZP9JqdWxct1FwyCU5L6PdUa0grZxdKeBv33Vsl5K5zPD4zWOGR8hnzhacs_L67ua2WggmsoWUhcqE_kAuBFd6LgTTH08zL7NzchXjM0ulk1SqT-Rc6kxLzYoLMt3juBkcbYZAK2-HftvhiBS8o1VfQwfeoqM_YQR6AzGNg6f3Uze2rxBaSGTVb6cRxjbp9Y4uN9D6RK1epjctvjmtfMS-7pCuEYJv_Z9LctZAF_Hq2Gfk6dfqcXk7Xz_8rpY_1nMrmRjnNS-EhfSjU8w5JrljvLYitwXX3OU6VwwtNFY3qlDMopBNpkDnOTS81trKGfl-8N2G4WXCOJq-jRa79BYOUzSizLUoVamKhH57hz4PU_DpOiNZLriWXMlE8QNlwxBjwMZsQ9tD2BnOzD4Ws4_F7GMxx1jSztej81T36P5tnEJIwJcD0CLif4ZZIfJCyr_XT5Cd</recordid><startdate>20240501</startdate><enddate>20240501</enddate><creator>Li, Jiaxi</creator><creator>Wang, Zhelong</creator><creator>Wu, Lina</creator><creator>Qiu, Sen</creator><creator>Zhao, Hongyu</creator><creator>Lin, Fang</creator><creator>Zhang, Ke</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>NAPCQ</scope><scope>P64</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-6846-546X</orcidid><orcidid>https://orcid.org/0000-0001-9672-2902</orcidid><orcidid>https://orcid.org/0000-0001-9108-5676</orcidid><orcidid>https://orcid.org/0000-0003-1510-5289</orcidid><orcidid>https://orcid.org/0009-0009-7116-2086</orcidid><orcidid>https://orcid.org/0000-0003-4959-3372</orcidid><orcidid>https://orcid.org/0000-0002-5855-540X</orcidid></search><sort><creationdate>20240501</creationdate><title>Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning</title><author>Li, Jiaxi ; Wang, Zhelong ; Wu, Lina ; Qiu, Sen ; Zhao, Hongyu ; Lin, Fang ; Zhang, Ke</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c302t-b172ca202d60dd031d01bc25c7181d58560ecafc8f6760ce23f46a855af1b88c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>class imbalance</topic><topic>Classifiers</topic><topic>Costs</topic><topic>Data incompleteness</topic><topic>Data models</topic><topic>Databases, Factual</topic><topic>Datasets</topic><topic>Ensemble learning</topic><topic>Humans</topic><topic>Learning</topic><topic>Machine Learning</topic><topic>malignant tumor 
patients</topic><topic>Mathematical models</topic><topic>Multivariate Analysis</topic><topic>multivariate imputation by chained equations</topic><topic>Physical fitness</topic><topic>Physical Fitness - physiology</topic><topic>physical fitness assessment</topic><topic>Support vector machines</topic><topic>Task analysis</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Jiaxi</creatorcontrib><creatorcontrib>Wang, Zhelong</creatorcontrib><creatorcontrib>Wu, Lina</creatorcontrib><creatorcontrib>Qiu, Sen</creatorcontrib><creatorcontrib>Zhao, Hongyu</creatorcontrib><creatorcontrib>Lin, Fang</creatorcontrib><creatorcontrib>Zhang, Ke</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace 
Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Nursing & Allied Health Premium</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE journal of biomedical and health informatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Li, Jiaxi</au><au>Wang, Zhelong</au><au>Wu, Lina</au><au>Qiu, Sen</au><au>Zhao, Hongyu</au><au>Lin, Fang</au><au>Zhang, Ke</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning</atitle><jtitle>IEEE journal of biomedical and health informatics</jtitle><stitle>JBHI</stitle><addtitle>IEEE J Biomed Health Inform</addtitle><date>2024-05-01</date><risdate>2024</risdate><volume>28</volume><issue>5</issue><spage>3102</spage><epage>3113</epage><pages>3102-3113</pages><issn>2168-2194</issn><issn>2168-2208</issn><eissn>2168-2208</eissn><coden>IJBHA9</coden><abstract>The classification analysis of incomplete and imbalanced data is still a challenging task since these issues could negatively impact the training of classifiers, which were also found in our study on the physical fitness assessments of patients. And in fields such as healthcare, there are higher requirements for the accuracy of the generated imputation values. 
To train a high-performance classifier and pursue high accuracy, we attempted to resolve any potential negative impact by using a novel algorithmic approach based on the combination of multivariate imputation by chained equations and the ensemble learning method (MICEEN), which can solve the two problems simultaneously. We used multivariate imputation by chained equations to generate more accurate imputation values for the training set passed to ensemble learning to build a predictor. On the other hand, missing values were introduced into minority classes and used them to generate new samples belonging to the minority classes in order to balance the distribution of classes. On real-world datasets, we perform extensive experiments to assess our method and compare it to other state-of-the-art approaches. The advantages of the proposed method are demonstrated by experimental results for the benchmark datasets and self-collected datasets of physical fitness assessment of tumor patients with varying missing rates.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>38483807</pmid><doi>10.1109/JBHI.2024.3376428</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-6846-546X</orcidid><orcidid>https://orcid.org/0000-0001-9672-2902</orcidid><orcidid>https://orcid.org/0000-0001-9108-5676</orcidid><orcidid>https://orcid.org/0000-0003-1510-5289</orcidid><orcidid>https://orcid.org/0009-0009-7116-2086</orcidid><orcidid>https://orcid.org/0000-0003-4959-3372</orcidid><orcidid>https://orcid.org/0000-0002-5855-540X</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 2168-2194
ispartof	IEEE journal of biomedical and health informatics, 2024-05, Vol.28 (5), p.3102-3113
issn	2168-2194 2168-2208 2168-2208
language	eng
recordid	cdi_pubmed_primary_38483807
source	IEEE Electronic Library (IEL)
subjects	Accuracy; Algorithms; class imbalance; Classifiers; Costs; Data incompleteness; Data models; Databases, Factual; Datasets; Ensemble learning; Humans; Learning; Machine Learning; malignant tumor patients; Mathematical models; Multivariate Analysis; multivariate imputation by chained equations; Physical fitness; Physical Fitness - physiology; physical fitness assessment; Support vector machines; Task analysis; Training
title	Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T04%3A29%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Method%20for%20Incomplete%20and%20Imbalanced%20Data%20Based%20on%20Multivariate%20Imputation%20by%20Chained%20Equations%20and%20Ensemble%20Learning&rft.jtitle=IEEE%20journal%20of%20biomedical%20and%20health%20informatics&rft.au=Li,%20Jiaxi&rft.date=2024-05-01&rft.volume=28&rft.issue=5&rft.spage=3102&rft.epage=3113&rft.pages=3102-3113&rft.issn=2168-2194&rft.eissn=2168-2208&rft.coden=IJBHA9&rft_id=info:doi/10.1109/JBHI.2024.3376428&rft_dat=%3Cproquest_RIE%3E2958296967%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3052183163&rft_id=info:pmid/38483807&rft_ieee_id=10472573&rfr_iscdi=true