Unsupervised discretization method for continuous attribute data based on information entropy
Main authors: | , , , , |
---|---|
Format: | Patent |
Language: | Chinese; English |
Subjects: | |
Summary: | The invention relates to the technical field of discretization of continuous attributes of big data, in particular to an unsupervised discretization method for continuous attribute data based on information entropy. The method comprises the following steps: the first step of traversing all value records of any continuous attribute, counting the discrete granularity |nj| of the attribute and the probability qji of each distinct value, and recording the maximum njmax and the minimum njmin; the second step of deriving, from the formula for information entropy, a formula for the value chaos degree of any continuous attribute nj, and calculating the value chaos degree of the attribute with that formula; the third step of rounding down the value chaos degree to obtain the number of break points; the fourth step of using the equal-width interval method to calculate the width of each interval and determine the position of each break point; and the fifth step of discretizi |
---|
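The steps in the summary can be sketched roughly as follows. This is a minimal illustration, not the patented procedure. It assumes that the "value chaos degree" is the Shannon entropy of the value distribution with a base-2 logarithm, and that k break points split the range [njmin, njmax] into k + 1 equal-width intervals. Because the fifth step is cut off in the record, assigning each value the index of its interval is also an assumption. The function name `entropy_discretize` is hypothetical.

```python
import bisect
import math
from collections import Counter


def entropy_discretize(values):
    """Hypothetical sketch of entropy-driven equal-width discretization."""
    # Step 1: count distinct values, their probabilities, and the min/max.
    counts = Counter(values)
    total = len(values)
    probs = [c / total for c in counts.values()]
    v_min, v_max = min(values), max(values)

    # Step 2: "value chaos degree", assumed here to be Shannon entropy (base 2).
    chaos = -sum(p * math.log2(p) for p in probs)

    # Step 3: round down to get the number of break points.
    n_breaks = math.floor(chaos)

    # Step 4: equal-width intervals; k break points give k + 1 intervals
    # (this interval count is an assumption, not stated in the record).
    n_intervals = n_breaks + 1
    width = (v_max - v_min) / n_intervals
    breaks = [v_min + width * i for i in range(1, n_intervals)]

    # Step 5 (assumed): map each value to the index of its interval.
    labels = [bisect.bisect_right(breaks, v) for v in values]
    return breaks, labels


breaks, labels = entropy_discretize([1, 2, 3, 4, 5, 6, 7, 8])
print(breaks)  # [2.75, 4.5, 6.25]
print(labels)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With eight equally likely values the entropy is exactly 3 bits, so the sketch places three break points and produces four intervals of width 1.75.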