Data resampling method based on clustering oversampling and instance hardness threshold
Main authors: | , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online access: | Order full text |
Summary: | The invention provides a data resampling method based on clustering oversampling and an instance hardness threshold. The method comprises the following steps: first, the data set is clustered using the K-means method, and the resulting clusters are filtered and assigned sampling weights; then, the SMOTE algorithm is used to oversample the data set, generating new samples until the number of minority-class samples equals the number of majority-class samples, so that the data set becomes class-balanced; finally, the data are cleaned with an instance hardness threshold algorithm to obtain a final balanced data set with fewer noisy points. The method converts a class-imbalanced data set into a balanced one and improves the prediction performance of a classifier on minority-class samples.
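The invention provides a data resampling method based on clustering oversampling and an instance hardness threshold. First, the data set is clustered using the K-means method, and the clusters are filtered and assigned sampling weights; next, the SMOTE algorithm is used to oversample the data set, generating new data so that the number of minority-class samples equals the number of majority-class samples, ... |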
---|
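
The three steps named in the summary (cluster, oversample, clean) can be illustrated with a minimal, standard-library-only Python sketch. It is not the patented procedure: the record does not say how clusters are filtered or weighted, so the rules used here are assumptions (keep clusters with at least two minority points and a minority share of at least 50%; weight clusters by mean pairwise distance). The hardness measure (the fraction of the k nearest neighbours carrying a different label), the 0.7 threshold, the two-class setting, and all function names are likewise hypothetical choices made for illustration.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def mean_pairwise_distance(pts):
    """Average distance over all pairs; used here as a sparsity measure."""
    pairs = [(a, b) for i, a in enumerate(pts) for b in pts[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def cluster_smote(X, y, minority=1, k=3, k_neighbors=5, seed=0):
    """Cluster, filter and weight clusters, then interpolate SMOTE-style.

    Assumes two classes and at least two minority samples.
    """
    rng = random.Random(seed)
    labels = kmeans(X, k, rng)
    clusters = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mino = [X[i] for i in idx if y[i] == minority]
        # Assumed filter: >= 2 minority points and a minority share >= 50%.
        if len(mino) >= 2 and 2 * len(mino) >= len(idx):
            clusters.append(mino)
    if not clusters:  # fallback: treat all minority points as one cluster
        clusters = [[p for p, lab in zip(X, y) if lab == minority]]
    # Assumed weighting: sparser clusters receive more synthetic samples.
    weights = [mean_pairwise_distance(pts) + 1e-9 for pts in clusters]
    n_min = sum(1 for lab in y if lab == minority)
    n_needed = max(0, (len(y) - n_min) - n_min)
    synthetic = []
    for _ in range(n_needed):
        pts = rng.choices(clusters, weights=weights)[0]
        i = rng.randrange(len(pts))
        nbrs = sorted((j for j in range(len(pts)) if j != i),
                      key=lambda j: math.dist(pts[i], pts[j]))[:k_neighbors]
        a, b = pts[i], pts[rng.choice(nbrs)]
        t = rng.random()  # position on the segment between a and b
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return X + synthetic, y + [minority] * n_needed


def instance_hardness_filter(X, y, k=5, threshold=0.7):
    """Drop samples whose k-NN label disagreement exceeds the threshold."""
    keep = []
    for i, p in enumerate(X):
        nbrs = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist(p, X[j]))[:k]
        hardness = sum(y[j] != y[i] for j in nbrs) / len(nbrs)
        if hardness <= threshold:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (made up for illustration): 40 majority and 8 minority points.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
X += [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(8)]
y = [0] * 40 + [1] * 8
X_bal, y_bal = cluster_smote(X, y)
X_clean, y_clean = instance_hardness_filter(X_bal, y_bal)
print(y_bal.count(0), y_bal.count(1), len(y_clean))
```

Interpolating only between minority points of the same retained cluster keeps synthetic samples out of regions dominated by the majority class, and the final filter removes any remaining points that sit mostly among neighbours of the other class. In practice, library implementations such as those in imbalanced-learn would likely be preferable to this hand-written sketch.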