Consistency of incomplete data


Bibliographic details
Published in:	Information sciences 2015-11, Vol.322, p.197-222
Main authors:	Clark, Patrick G., Grzymala-Busse, Jerzy W., Rzasa, Wojciech
Format:	Article
Language:	English
Subjects:	Approximation; Benchmarking; Consistency; Equivalence; Incomplete data; Missing attribute value; Probabilistic approximation; Probabilistic methods; Probability theory; Rough set theory; Set theory
Online access:	Full text

Description
Summary:	Consistency is well known for completely specified data sets. A completely specified data set is defined as consistent when any pair of cases with the same attribute values belongs to the same concept. In this paper we generalize the definition of consistency for incomplete data sets using rough set theory. We discuss two types of missing attribute values: lost values and “do not care” conditions. For incomplete data sets there exist three definitions of approximations: singleton, subset and concept. Each approximation is either lower or upper, so we may define six types of consistency. We show that two pairs of such consistencies are equivalent, hence there are only four distinct consistencies of incomplete data. Additionally, we discuss probabilistic approximations and study properties of the corresponding consistencies. We illustrate the idea of consistency for incomplete data sets using experiments on many incomplete data sets derived from eight benchmark data sets.
ISSN:	0020-0255
DOI:	10.1016/j.ins.2015.06.011
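
The summary's definition of consistency for completely specified data sets (any two cases with the same attribute values belong to the same concept) can be illustrated with a minimal sketch. This is not code from the article: the function name `is_consistent`, the dictionary-based case layout, and the toy attribute names are assumptions made for illustration, and the sketch covers only the complete-data case, not the rough-set generalizations for lost values or “do not care” conditions.

```python
def is_consistent(cases, attributes, decision):
    """Return True if all cases that agree on every attribute share one concept.

    cases      -- list of dicts mapping column names to values (hypothetical layout)
    attributes -- names of the condition attributes
    decision   -- name of the column that holds the concept label
    """
    seen = {}  # attribute-value tuple -> concept first observed for it
    for case in cases:
        key = tuple(case[a] for a in attributes)
        concept = case[decision]
        # setdefault stores the concept on first sight and returns the stored one after
        if seen.setdefault(key, concept) != concept:
            return False  # same attribute values, different concepts
    return True


# Toy data, invented for this example only.
cases = [
    {"temperature": "high", "headache": "yes", "flu": "yes"},
    {"temperature": "high", "headache": "yes", "flu": "yes"},
    {"temperature": "normal", "headache": "no", "flu": "no"},
]
conflicting = cases + [{"temperature": "normal", "headache": "no", "flu": "yes"}]

print(is_consistent(cases, ["temperature", "headache"], "flu"))
print(is_consistent(conflicting, ["temperature", "headache"], "flu"))
```

The second data set adds a case that matches an existing one on both attributes but carries a different concept, which is exactly the situation the definition rules out.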