Unsupervised discretization method for continuous attribute data based on information entropy

Bibliographic details
Main authors: CHEN WANGHU, GUO HONGLE, LI XINTIAN, MA SHENGJUN, QIAO BAOMIN
Format: Patent
Language: chi; eng
Description
Summary: The invention relates to the technical field of discretizing continuous attributes of big data, and in particular to an unsupervised discretization method for continuous attribute data based on information entropy. The method comprises the following steps: first, traversing all value records of any continuous attribute, counting the discrete granularity |nj| of the attribute and the probability qji of each distinct value, and recording the maximum njmax and the minimum njmin; second, deriving a formula for the value chaos degree of any continuous attribute nj from the formula for information entropy, and calculating the value chaos degree of the attribute with that formula; third, rounding down the value chaos degree to obtain the number of break points; fourth, using the equal-width interval method to calculate the width of each divided interval and determine the position of each break point; and fifth, discretizi
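
The first four steps of the summary can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `entropy_break_points` is hypothetical, the "value chaos degree" is assumed to be the Shannon entropy in base 2 of the observed value frequencies, and k break points are assumed to split the range [njmin, njmax] into k + 1 equal-width intervals. The sketch covers only the steps up to locating the break points.

```python
import math
from collections import Counter


def entropy_break_points(values):
    """Return (entropy, break_points) for one continuous attribute.

    Hypothetical helper; the base-2 logarithm and the k + 1 interval
    count are assumptions, not details confirmed by the summary.
    """
    if not values:
        raise ValueError("values must not be empty")

    # Step 1: count each distinct value and record the minimum and maximum.
    counts = Counter(values)
    total = len(values)
    lo, hi = min(counts), max(counts)

    # Step 2: entropy of the value distribution (assumed "value chaos degree").
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    # Step 3: round the entropy down to get the number of break points.
    k = math.floor(entropy)

    # Step 4: equal-width intervals; k break points give k + 1 intervals (assumption).
    width = (hi - lo) / (k + 1)
    break_points = [lo + i * width for i in range(1, k + 1)]
    return entropy, break_points
```

For the values `[0, 0, 10, 10, 5, 5, 5, 5]`, the distinct values have probabilities 0.25, 0.25 and 0.5, so the entropy is 1.5 bits. Rounding down gives one break point, which under the assumptions above falls at 5.0.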