On rule acquisition methods for data classification in heterogeneous incomplete decision systems

Bibliographic details
Published in:	Knowledge-Based Systems, April 2020, Vol. 193, Article 105472
Main authors:	Meng, Zuqiang; Shi, Zhongzhi
Format:	Article
Language:	English
Subjects:	Classification; Classifiers; Computer Science; Computer Science, Artificial Intelligence; Data classification; Granulation; Heterogeneous incomplete decision systems; Heuristic methods; Machine learning; Methods; Optimization; Reduction; Rough set; Rule acquisition; Science & Technology; Technology

Description
Abstract:	In the age of big data, much of the data collected is of low quality, being both heterogeneous and incomplete; in this paper such data are referred to as heterogeneous incomplete decision systems (HIDSs). Data classification is an important machine learning task that can uncover valuable knowledge hidden in HIDSs. However, systematic studies of data classification in HIDSs are rarely reported. In particular, there is a lack of adaptive classification methods that handle heterogeneous incomplete data directly, without prior discretization of numerical attributes or imputation of missing values. In this paper, a unified representation model, called the parameterized tolerance granulation model (PTGM), is proposed for heterogeneous incomplete data. The principle of an adaptive granulation method for constructing appropriate PTGMs, based on difference-based collaborative optimization, is also described. Based on PTGMs, decision logic language is used to describe classifiers consisting of decision rules that satisfy given conditions. Two classification methods are then proposed: a discernibility function-based method that obtains all optimized rule sets (classifiers), and a heuristic function-based method that generates one particular optimized rule set. The heuristic function-based method is an adaptive classification method that handles heterogeneous incomplete data directly. Detailed theoretical analyses are given to show the correctness and effectiveness of the proposed methods. The experimental results show that the proposed methods are effective and have clear advantages in handling heterogeneous incomplete data directly.
ISSN:	0950-7051; 1872-7409
DOI:	10.1016/j.knosys.2020.105472
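The abstract does not define PTGM or either rule-acquisition method, so the following is only a minimal sketch of the general idea of classifying heterogeneous incomplete data without discretization or imputation. It assumes a common tolerance-relation setup from the rough-set literature: a missing value (None) is compatible with any value, a numeric attribute matches within a per-attribute threshold (the hypothetical `eps` dict stands in for the model's parameters), and a nominal attribute requires equality. The greedy attribute-dropping loop in `greedy_rule` is a simple stand-in for a heuristic rule generator. It is not the authors' heuristic function, and the discernibility function-based method is not shown. All function names and the toy table are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Union

# A missing value is represented by None (assumption for this sketch).
Value = Union[str, float, None]
Row = Dict[str, Value]


def tolerant(x: Value, y: Value, eps: Optional[float]) -> bool:
    """Assumed per-attribute tolerance test.

    Missing values are compatible with anything; numeric values (eps given)
    match within eps; nominal values (eps is None) must be equal.
    """
    if x is None or y is None:
        return True
    if eps is not None:
        return abs(float(x) - float(y)) <= eps
    return x == y


def tolerance_class(rows: Sequence[Row], i: int, attrs: Sequence[str],
                    eps: Dict[str, float]) -> Set[int]:
    """Indices of all objects tolerant with object i on every attribute in attrs."""
    return {j for j, r in enumerate(rows)
            if all(tolerant(rows[i][a], r[a], eps.get(a)) for a in attrs)}


def is_consistent(rows: Sequence[Row], i: int, attrs: Sequence[str],
                  eps: Dict[str, float], decision: str) -> bool:
    """True if every object in i's tolerance class has i's decision value."""
    d = rows[i][decision]
    return all(rows[j][decision] == d
               for j in tolerance_class(rows, i, attrs, eps))


def greedy_rule(rows: Sequence[Row], i: int, attrs: Sequence[str],
                eps: Dict[str, float], decision: str) -> Optional[List[str]]:
    """Greedily drop condition attributes while object i's rule stays consistent.

    Returns the condition attributes kept, or None if object i has no known
    condition values or no consistent rule even with all of them. Attributes
    missing for object i are skipped: they never restrict its tolerance class.
    """
    kept = [a for a in attrs if rows[i][a] is not None]
    if not kept or not is_consistent(rows, i, kept, eps, decision):
        return None
    for a in list(kept):
        trial = [b for b in kept if b != a]
        if trial and is_consistent(rows, i, trial, eps, decision):
            kept = trial
    return kept


# Toy heterogeneous incomplete table: one numeric and one nominal attribute.
rows: List[Row] = [
    {"temp": 36.8, "cough": "no",  "flu": "no"},
    {"temp": 38.9, "cough": "yes", "flu": "yes"},
    {"temp": 39.2, "cough": None,  "flu": "yes"},
    {"temp": None, "cough": "yes", "flu": "yes"},
    {"temp": 37.0, "cough": "yes", "flu": "no"},
]
eps = {"temp": 0.5}  # hypothetical tolerance threshold for the numeric attribute
attrs = ["temp", "cough"]

if __name__ == "__main__":
    for i in range(len(rows)):
        print(i, greedy_rule(rows, i, attrs, eps, "flu"))
```

On this table, object 1 keeps only `temp`, giving the rule "if temp is within 0.5 of 38.9 then flu = yes". Object 4 gets no rule (`None`): object 3's missing temperature makes it tolerant with object 4 on both attributes, yet their decisions differ. This illustrates why missing values can make rules inconsistent and why the choice of tolerance parameters matters.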