Sample Reduction Algorithm Based on Classification Contribution

Bibliographic details
Published in: IAENG International Journal of Computer Science, 2023-08, Vol. 50 (3), p. 851
Main authors: Chai, Zheng; Li, Yanying; Zhang, Jiaoni; Wang, Xialin; Li, Wen; Jiang, Yucong
Format: Article
Language: English
Description
Abstract: The running time of the KNN algorithm grows exponentially when it processes datasets containing a large number of samples, and its classification performance is low. To address this problem, this paper proposes a sample reduction method based on classification contribution ranking (SRCCR). First, SRCCR performs a denoising step that smooths the decision boundary by removing noise samples from the initial training dataset; next, the denoised samples are sorted in ascending order according to the classification contribution strategy; finally, representative boundary samples and center samples are selected based on the local set to form the final subset. SRCCR reduces storage requirements and execution time, and significantly improves the classification performance of the KNN algorithm. To verify the effectiveness of the proposed method, we conduct comparative experiments on 31 real datasets from the UCI and KEEL repositories. Compared with several classical instance selection algorithms, the proposed SRCCR algorithm has advantages in terms of accuracy and reduction rate. The results on the two-dimensional dataset "Banana" show that the SRCCR algorithm not only selects more representative boundary and center samples, but also preserves the distribution of the original dataset.
ISSN: 1819-656X, 1819-9224
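
The abstract describes a three-stage pipeline (denoise, rank in ascending order, select boundary and center samples) but does not define the classification contribution measure or the local set. The Python sketch below only illustrates the overall shape of such a pipeline under stated assumptions; it is not the authors' SRCCR. The denoising step is a stand-in using nearest-neighbour editing (drop a sample when its k nearest neighbours mostly carry another label), the score is a hypothetical proxy (distance to the nearest sample of a different class), and the final selection simply takes fixed fractions from both ends of the ranking instead of using a local set. All function names and parameters are made up for illustration.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def denoise(X, y, k=3):
    """Stand-in denoising (assumption, not the paper's procedure):
    drop samples whose label differs from the majority label of
    their k nearest neighbours. Labels must be non-negative ints."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)          # a sample is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]    # indices of the k nearest neighbours
    keep = np.array(
        [np.bincount(y[nn[i]]).argmax() == y[i] for i in range(len(y))]
    )
    return X[keep], y[keep]


def contribution_score(X, y):
    """Hypothetical proxy score: distance to the nearest sample of a
    different class. Small values lie near the decision boundary,
    large values lie deep inside a class region."""
    D = pairwise_distances(X)
    enemy = y[:, None] != y[None, :]
    return np.where(enemy, D, np.inf).min(axis=1)


def reduce_samples(X, y, k=3, boundary_frac=0.2, center_frac=0.1):
    """Denoise, rank ascending by the proxy score, then keep the
    lowest-ranked (boundary) and highest-ranked (center) samples.
    The fractions are illustrative defaults, not values from the paper."""
    Xd, yd = denoise(X, y, k)
    order = np.argsort(contribution_score(Xd, yd))   # ascending order
    n = len(order)
    n_boundary = max(1, int(boundary_frac * n))
    n_center = max(1, int(center_frac * n))
    selected = np.unique(
        np.concatenate([order[:n_boundary], order[-n_center:]])
    )
    return Xd[selected], yd[selected]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    X_red, y_red = reduce_samples(X, y)
    print(len(X), "->", len(X_red), "samples")
```

The reduced set returned by `reduce_samples` could then be used as the training set of any KNN classifier; whether accuracy improves depends on the data and on the scoring rule, which here is only a placeholder for the paper's contribution strategy.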