A two-stage discretization algorithm based on information entropy
Discretization is an important and difficult preprocessing task for data mining and knowledge discovery. Although there are numerous discretization approaches, many suffer from certain drawbacks. Local approaches are efficient, but their generalization ability is weak. Global approaches consider all...
Gespeichert in:
Veröffentlicht in: | Applied intelligence (Dordrecht, Netherlands) Netherlands), 2017-12, Vol.47 (4), p.1169-1185 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Discretization is an important and difficult preprocessing task for data mining and knowledge discovery. Although there are numerous discretization approaches, many suffer from certain drawbacks. Local approaches are efficient, but their generalization ability is weak. Global approaches consider all attributes simultaneously, but they have high time and space complexities. In this paper, we propose a two-stage discretization (TSD) algorithm based on information entropy. In the local discretization stage, we independently select
k
strong cuts for each attribute to minimize conditional entropy. The goal is to rapidly reduce the cardinality of the attributes, with minor information loss. In the global discretization stage, cuts for all attributes are considered simultaneously to form a scaled decision system. The minimal cut set that preserves the positive region is finally selected. We tested the new algorithm and seven popular algorithms on 28 datasets. Compared with other approaches, our algorithm has the best generalization ability, with a good information preserving ability, the highest classification accuracy, and reasonable time consumption. |
---|---|
ISSN: | 0924-669X 1573-7497 |
DOI: | 10.1007/s10489-017-0941-0 |