Multiple retrieval case-based reasoning for incomplete datasets
Published in: | Journal of biomedical informatics 2019-04, Vol.92, p.103127-103127, Article 103127 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: |
• The method preserves the most accurate and robust CBR ranking for every missingness type.
• Performance improves with larger datasets and a rising number of affected variables.
• The impact of variable type on CBR retrieval is reinforced by the missingness type.
• Ignoring incomplete data results in a less trustworthy CBR ranking than data imputation.
The performance of case-based reasoning (CBR) depends on an accurate ranking of similar cases in the retrieval phase, which affects all subsequent phases and profits from the potential of large databases. Unfortunately, growing databases come with a rising amount of missing data, which reduces the stability of the ranking because incomplete cases cannot be ranked as reliably as complete ones. In the context of CBR, hardly any work has been done so far to rigorously analyze the impact of missing data or solutions to this issue. In particular, a generalized solution that can process data under different missingness conditions and for different variable types is missing.
In this paper we present a multiple retrieval case-based reasoning (MRCBR) framework for incomplete databases that provides a statistically accurate ranking of similar cases. It unifies the advantages of multiple imputation and CBR while preserving both the data distribution and the database structure. Built as a generalized CBR system, MRCBR was optimized and tested for medical decision support but can be extended to any CBR application as well. It is suitable for numerical and categorical variables and for all sorts of missingness conditions.
The approach was compared with eight competing methods for handling incomplete databases in the context of CBR. The comparison with the true ranking was based on two different error measures. In the evaluation we tested four representative scenarios that considered different conditions for missing data analysis. For every method, each scenario comprised 200 different setups. MRCBR outperforms all compared CBR methods in the presence of missing data and shows reliable and stable results in every scenario. Especially with larger databases and a rising number of incomplete variables, it extends its lead over all other methods. Our study demonstrates that missing data must not be ignored when a correct CBR outcome is required. |
---|---|
ISSN: | 1532-0464 1532-0480 |
DOI: | 10.1016/j.jbi.2019.103127 |
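
The abstract describes MRCBR only at a high level, as a combination of multiple imputation and CBR retrieval. The following Python sketch illustrates that general idea and is not the authors' algorithm: the function names, the random "hot-deck" imputation, the unscaled mixed-type distance, and the mean-rank aggregation are all assumptions chosen for brevity. It assumes a complete query case and at least one observed value per variable.

```python
import random
import statistics


def impute_once(cases, rng):
    """Fill each missing value (None) with a random observed value of the same variable.

    This simple hot-deck draw is an illustrative stand-in for a proper
    multiple-imputation model.
    """
    columns = list(zip(*cases))
    observed = [[v for v in col if v is not None] for col in columns]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(case)]
        for case in cases
    ]


def distance(a, b):
    """Mixed-type distance: absolute difference for numbers, 0/1 mismatch for categories.

    Numeric variables are not rescaled here; a real system would normalize them.
    """
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += abs(x - y)
        else:
            total += 0.0 if x == y else 1.0
    return total


def rank_cases(query, cases):
    """Return, for each case index, its rank position (0 = most similar to the query)."""
    order = sorted(range(len(cases)), key=lambda i: distance(query, cases[i]))
    ranks = [0] * len(cases)
    for position, idx in enumerate(order):
        ranks[idx] = position
    return ranks


def multiple_retrieval(query, cases, m=20, seed=0):
    """Retrieve on m imputed copies of the case base and order cases by mean rank."""
    rng = random.Random(seed)
    all_ranks = [rank_cases(query, impute_once(cases, rng)) for _ in range(m)]
    mean_ranks = [
        statistics.mean(ranks[i] for ranks in all_ranks) for i in range(len(cases))
    ]
    return sorted(range(len(cases)), key=lambda i: mean_ranks[i])
```

For example, `multiple_retrieval([1.0, "a"], [[1.0, "a"], [None, "a"], [10.0, "b"], [9.0, None]])` returns case indices ordered from most to least similar. Averaging ranks over several imputed copies, rather than retrieving once on a single filled-in copy, lets the uncertainty of the imputed values show up in the final ordering.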