Multiple retrieval case-based reasoning for incomplete datasets

• Method preserves the most accurate and robust CBR ranking for every missingness type.
• Performance improves for larger datasets and a rising number of affected variables.
• The impact of variable type on CBR retrieval is reinforced by the missingness type.
• Ignoring incomplete data results...

Full description

Bibliographic details
Published in: Journal of biomedical informatics, 2019-04, Vol.92, p.103127-103127, Article 103127
Main authors: Löw, Nikolas; Hesser, Jürgen; Blessing, Manuel
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 103127
container_issue
container_start_page 103127
container_title Journal of biomedical informatics
container_volume 92
creator Löw, Nikolas
Hesser, Jürgen
Blessing, Manuel
description • Method preserves the most accurate and robust CBR ranking for every missingness type. • Performance improves for larger datasets and a rising number of affected variables. • The impact of variable type on CBR retrieval is reinforced by the missingness type. • Ignoring incomplete data results in a less trustworthy CBR ranking than data imputation.
The performance of case-based reasoning (CBR) depends on an accurate ranking of similar cases in the retrieval phase, which affects all subsequent phases and profits from the potential of large databases. Unfortunately, growing databases come with a rising amount of missing data, which reduces the stability of the ranking because incomplete cases cannot be ranked as reliably as complete ones. In the context of CBR, hardly any work has been done so far to rigorously analyze the impact of missing data and solutions to this issue. In particular, a generalized solution that can process data under different missingness conditions for different variable types is lacking. In this paper we present a multiple retrieval case-based reasoning (MRCBR) framework for incomplete databases that provides a statistically accurate ranking of similar cases. It unifies the advantages of multiple imputation and CBR while preserving both the data distribution and the database structure. Built as a generalized CBR system, MRCBR was optimized and tested for medical decision support but can be extended to any CBR requirement. It is suitable for numerical and categorical variables and all sorts of missingness conditions. The approach was compared with eight competing methods for handling incomplete databases in the context of CBR. The comparison with the true ranking was based on two different error measures. In the evaluation we tested four representative scenarios that considered different conditions for missing data analysis. The outcome for every method in each scenario resulted in 200 different setups. MRCBR outperforms all compared CBR methods in the presence of missing data and shows reliable and stable results in every scenario. Especially with larger databases and a rising number of incomplete variables, it extends its lead over all other methods. Our study demonstrates that missing data must not be ignored when a correct CBR outcome is required.
doi_str_mv 10.1016/j.jbi.2019.103127
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2209607971</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1532046419300450</els_id><sourcerecordid>2209607971</sourcerecordid><originalsourceid>FETCH-LOGICAL-c396t-fa5d61dddc6f14d9bf6878776d4cf7b96630e20a1f3fe5c7894925dfb9638fbe3</originalsourceid><addsrcrecordid>eNp9kE1PxCAQhonRuLr6A7yYHr10hdJCiQdjNn4la7zomVAYDE23XYFu4r-XTdc9emGY4Zk34UHoiuAFwYTdtou2cYsCE5F6Sgp-hM5IRYsclzU-PtxZOUPnIbQYE1JV7BTNKOaclHV5hu7fxi66TQeZh-gdbFWXaRUgb9Jh0lCFoXf9V2YHn7leD-vERsiMigmI4QKdWNUFuNzXOfp8evxYvuSr9-fX5cMq11SwmFtVGUaMMZpZUhrRWFbzmnNmSm15IxijGAqsiKUWKs1rUYqiMja90No2QOfoZsrd-OF7hBDl2gUNXad6GMYgiwILhrngJKFkQrUfQvBg5ca7tfI_kmC58yZbmbzJnTc5eUs71_v4sVmDOWz8iUrA3QRA-uTWgZdBO-g1GOdBR2kG90_8L9VLfeU</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2209607971</pqid></control><display><type>article</type><title>Multiple retrieval case-based reasoning for incomplete datasets</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>ScienceDirect Journals (5 years ago - present)</source><creator>Löw, Nikolas ; Hesser, Jürgen ; Blessing, Manuel</creator><creatorcontrib>Löw, Nikolas ; Hesser, Jürgen ; Blessing, Manuel</creatorcontrib><description>[Display omitted] •Method preserves the most accurate and robust CBR ranking for every missingness type.•Performance enhances for larger datasets and rising number of affected variables.•Variable types impact onto CBR retrieval is reinforced by the missingness type.•Ignoring incomplete data results is less trustworthy CBR ranking than data imputation. The performance of case-based reasoning (CBR) depends on an accurate ranking of similar cases in the retrieval phase that affects all subsequent phases and profits from the potential of large databases. 
Unfortunately, growing databases come along with a rising amount of missing data that reduces the stability of the ranking since incomplete cases cannot be ranked as reliable as complete ones. In context of CBR hardly any work was done so far to rigorously analyze the impact of missing data and solutions to tackle this issue. In particular, a generalized solution which is able to process data under different missingness conditions for different variable types is missing. In this paper we present a multiple retrieval case-based reasoning (MRCBR) framework for incomplete databases that provides a statistically accurate ranking for similar cases. It unifies the advantages of multiple imputation and CBR while it preserves both the data distribution and database structure. Built as generalized CBR system, MRCBR was optimized and tested for medical decision support but can be extended to any CBR requirement as well. It is suitable for numerical and categorical variables and all sorts of missingness conditions. The approach was compared to eight competing methods applicable to handle incomplete databases in context of CBR. The comparison to the true ranking was based on two various error measures. In the evaluation we tested four representative scenarios that considered different conditions for missing data analysis. The outcome for every method in each scenario resulted in 200 miscellaneous setups. MRCBR outperforms all compared CBR methods in presence of missing data and shows reliable and stable results in every scenario. Especially with larger databases and rising number of incomplete variables it enlarges its lead to all other methods. 
Our study demonstrates that missing data must not be ignored when a correct CBR outcome is required.</description><identifier>ISSN: 1532-0464</identifier><identifier>EISSN: 1532-0480</identifier><identifier>DOI: 10.1016/j.jbi.2019.103127</identifier><identifier>PMID: 30771484</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Case-based reasoning ; Incomplete data ; Medical decision support ; Missingness types ; Multiple imputation</subject><ispartof>Journal of biomedical informatics, 2019-04, Vol.92, p.103127-103127, Article 103127</ispartof><rights>2019 Elsevier Inc.</rights><rights>Copyright © 2019 Elsevier Inc. All rights reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c396t-fa5d61dddc6f14d9bf6878776d4cf7b96630e20a1f3fe5c7894925dfb9638fbe3</citedby><cites>FETCH-LOGICAL-c396t-fa5d61dddc6f14d9bf6878776d4cf7b96630e20a1f3fe5c7894925dfb9638fbe3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.jbi.2019.103127$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3548,27922,27923,45993</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30771484$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Löw, Nikolas</creatorcontrib><creatorcontrib>Hesser, Jürgen</creatorcontrib><creatorcontrib>Blessing, Manuel</creatorcontrib><title>Multiple retrieval case-based reasoning for incomplete datasets</title><title>Journal of biomedical informatics</title><addtitle>J Biomed Inform</addtitle><description>[Display omitted] •Method preserves the most accurate and robust CBR ranking for every missingness type.•Performance enhances for larger datasets and rising number of affected variables.•Variable types impact 
onto CBR retrieval is reinforced by the missingness type.•Ignoring incomplete data results is less trustworthy CBR ranking than data imputation. The performance of case-based reasoning (CBR) depends on an accurate ranking of similar cases in the retrieval phase that affects all subsequent phases and profits from the potential of large databases. Unfortunately, growing databases come along with a rising amount of missing data that reduces the stability of the ranking since incomplete cases cannot be ranked as reliable as complete ones. In context of CBR hardly any work was done so far to rigorously analyze the impact of missing data and solutions to tackle this issue. In particular, a generalized solution which is able to process data under different missingness conditions for different variable types is missing. In this paper we present a multiple retrieval case-based reasoning (MRCBR) framework for incomplete databases that provides a statistically accurate ranking for similar cases. It unifies the advantages of multiple imputation and CBR while it preserves both the data distribution and database structure. Built as generalized CBR system, MRCBR was optimized and tested for medical decision support but can be extended to any CBR requirement as well. It is suitable for numerical and categorical variables and all sorts of missingness conditions. The approach was compared to eight competing methods applicable to handle incomplete databases in context of CBR. The comparison to the true ranking was based on two various error measures. In the evaluation we tested four representative scenarios that considered different conditions for missing data analysis. The outcome for every method in each scenario resulted in 200 miscellaneous setups. MRCBR outperforms all compared CBR methods in presence of missing data and shows reliable and stable results in every scenario. 
Especially with larger databases and rising number of incomplete variables it enlarges its lead to all other methods. Our study demonstrates that missing data must not be ignored when a correct CBR outcome is required.</description><subject>Case-based reasoning</subject><subject>Incomplete data</subject><subject>Medical decision support</subject><subject>Missingness types</subject><subject>Multiple imputation</subject><issn>1532-0464</issn><issn>1532-0480</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNp9kE1PxCAQhonRuLr6A7yYHr10hdJCiQdjNn4la7zomVAYDE23XYFu4r-XTdc9emGY4Zk34UHoiuAFwYTdtou2cYsCE5F6Sgp-hM5IRYsclzU-PtxZOUPnIbQYE1JV7BTNKOaclHV5hu7fxi66TQeZh-gdbFWXaRUgb9Jh0lCFoXf9V2YHn7leD-vERsiMigmI4QKdWNUFuNzXOfp8evxYvuSr9-fX5cMq11SwmFtVGUaMMZpZUhrRWFbzmnNmSm15IxijGAqsiKUWKs1rUYqiMja90No2QOfoZsrd-OF7hBDl2gUNXad6GMYgiwILhrngJKFkQrUfQvBg5ca7tfI_kmC58yZbmbzJnTc5eUs71_v4sVmDOWz8iUrA3QRA-uTWgZdBO-g1GOdBR2kG90_8L9VLfeU</recordid><startdate>201904</startdate><enddate>201904</enddate><creator>Löw, Nikolas</creator><creator>Hesser, Jürgen</creator><creator>Blessing, Manuel</creator><general>Elsevier Inc</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>201904</creationdate><title>Multiple retrieval case-based reasoning for incomplete datasets</title><author>Löw, Nikolas ; Hesser, Jürgen ; Blessing, Manuel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c396t-fa5d61dddc6f14d9bf6878776d4cf7b96630e20a1f3fe5c7894925dfb9638fbe3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Case-based reasoning</topic><topic>Incomplete data</topic><topic>Medical decision support</topic><topic>Missingness types</topic><topic>Multiple 
imputation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Löw, Nikolas</creatorcontrib><creatorcontrib>Hesser, Jürgen</creatorcontrib><creatorcontrib>Blessing, Manuel</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of biomedical informatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Löw, Nikolas</au><au>Hesser, Jürgen</au><au>Blessing, Manuel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multiple retrieval case-based reasoning for incomplete datasets</atitle><jtitle>Journal of biomedical informatics</jtitle><addtitle>J Biomed Inform</addtitle><date>2019-04</date><risdate>2019</risdate><volume>92</volume><spage>103127</spage><epage>103127</epage><pages>103127-103127</pages><artnum>103127</artnum><issn>1532-0464</issn><eissn>1532-0480</eissn><abstract>[Display omitted] •Method preserves the most accurate and robust CBR ranking for every missingness type.•Performance enhances for larger datasets and rising number of affected variables.•Variable types impact onto CBR retrieval is reinforced by the missingness type.•Ignoring incomplete data results is less trustworthy CBR ranking than data imputation. The performance of case-based reasoning (CBR) depends on an accurate ranking of similar cases in the retrieval phase that affects all subsequent phases and profits from the potential of large databases. Unfortunately, growing databases come along with a rising amount of missing data that reduces the stability of the ranking since incomplete cases cannot be ranked as reliable as complete ones. In context of CBR hardly any work was done so far to rigorously analyze the impact of missing data and solutions to tackle this issue. 
In particular, a generalized solution which is able to process data under different missingness conditions for different variable types is missing. In this paper we present a multiple retrieval case-based reasoning (MRCBR) framework for incomplete databases that provides a statistically accurate ranking for similar cases. It unifies the advantages of multiple imputation and CBR while it preserves both the data distribution and database structure. Built as generalized CBR system, MRCBR was optimized and tested for medical decision support but can be extended to any CBR requirement as well. It is suitable for numerical and categorical variables and all sorts of missingness conditions. The approach was compared to eight competing methods applicable to handle incomplete databases in context of CBR. The comparison to the true ranking was based on two various error measures. In the evaluation we tested four representative scenarios that considered different conditions for missing data analysis. The outcome for every method in each scenario resulted in 200 miscellaneous setups. MRCBR outperforms all compared CBR methods in presence of missing data and shows reliable and stable results in every scenario. Especially with larger databases and rising number of incomplete variables it enlarges its lead to all other methods. Our study demonstrates that missing data must not be ignored when a correct CBR outcome is required.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>30771484</pmid><doi>10.1016/j.jbi.2019.103127</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1532-0464
ispartof Journal of biomedical informatics, 2019-04, Vol.92, p.103127-103127, Article 103127
issn 1532-0464
1532-0480
language eng
recordid cdi_proquest_miscellaneous_2209607971
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; ScienceDirect Journals (5 years ago - present)
subjects Case-based reasoning
Incomplete data
Medical decision support
Missingness types
Multiple imputation
title Multiple retrieval case-based reasoning for incomplete datasets
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T19%3A16%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multiple%20retrieval%20case-based%20reasoning%20for%20incomplete%20datasets&rft.jtitle=Journal%20of%20biomedical%20informatics&rft.au=L%C3%B6w,%20Nikolas&rft.date=2019-04&rft.volume=92&rft.spage=103127&rft.epage=103127&rft.pages=103127-103127&rft.artnum=103127&rft.issn=1532-0464&rft.eissn=1532-0480&rft_id=info:doi/10.1016/j.jbi.2019.103127&rft_dat=%3Cproquest_cross%3E2209607971%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2209607971&rft_id=info:pmid/30771484&rft_els_id=S1532046419300450&rfr_iscdi=true