Consistency of incomplete data
Published in: | Information sciences 2015-11, Vol.322, p.197-222 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Consistency is well-known for completely specified data sets. A specified data set is defined as consistent when any pair of cases with the same attribute values belongs to the same concept. In this paper we generalize the definition of consistency for incomplete data sets using rough set theory. We discuss two types of missing attribute values: lost values and “do not care” conditions. For incomplete data sets there exist three definitions of approximations: singleton, subset and concept. Any approximation is lower or upper, so we may define six types of consistencies. We show that two pairs of such consistencies are equivalent, hence there are only four distinct consistencies of incomplete data. Additionally, we discuss probabilistic approximations and study properties of corresponding consistencies. We illustrate the idea of consistency for incomplete data sets using experiments on many incomplete data sets derived from eight benchmark data sets. |
---|---|
ISSN: | 0020-0255 |
DOI: | 10.1016/j.ins.2015.06.011 |
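
The abstract's starting point, consistency of a completely specified data set, can be illustrated with a minimal sketch. It covers only the complete-data definition quoted in the abstract (cases with the same attribute values must belong to the same concept), not the paper's generalization to incomplete data. The function name `is_consistent`, the dictionary representation of cases, and the toy attributes `temperature`, `headache`, and `flu` are assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict


def is_consistent(cases, decision):
    """Check consistency of a completely specified data set.

    `cases` is a list of dicts mapping attribute names to values;
    `decision` names the key that holds each case's concept.
    (Hypothetical representation, chosen for this sketch.)
    """
    concepts_by_values = defaultdict(set)
    for case in cases:
        # Group cases by their attribute values, ignoring the decision.
        key = tuple(sorted((a, v) for a, v in case.items() if a != decision))
        concepts_by_values[key].add(case[decision])
    # Consistent when every group of identical cases maps to one concept.
    return all(len(concepts) == 1 for concepts in concepts_by_values.values())


# Hypothetical toy data, not taken from the paper's benchmark data sets.
consistent_cases = [
    {"temperature": "high", "headache": "yes", "flu": "yes"},
    {"temperature": "high", "headache": "yes", "flu": "yes"},
    {"temperature": "normal", "headache": "no", "flu": "no"},
]
# Adding a case that repeats existing attribute values with a different
# concept makes the data set inconsistent.
inconsistent_cases = consistent_cases + [
    {"temperature": "normal", "headache": "no", "flu": "yes"},
]

print(is_consistent(consistent_cases, "flu"))
print(is_consistent(inconsistent_cases, "flu"))
```

Grouping cases by their attribute values avoids comparing every pair directly while still testing the pairwise condition stated in the abstract.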