Instance Selection via Voronoi Neighbors for Binary Classification Tasks
Published in: | IEEE transactions on knowledge and data engineering 2024-08, Vol.36 (8), p.3921-3933 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Large datasets available in many applications have enabled the training of binary classifiers to match or even outperform humans. However, the large volume of data introduces a computational burden during the training and calibration of model parameters. Since the optimal decision surface (ODS) of a classification task is often determined by a few nearby instances, a novel PDOC-V method is proposed to identify them. A Bayesian probability model is adopted to describe the ODS. An instance is close to the ODS if its probabilities of belonging to the positive and negative classes are similar. The probabilities of an instance are estimated by partitioning the input space, via the Voronoi diagram, into cells that each contain a single instance, and then inspecting the instance's Voronoi neighbors. A randomized ray shooting algorithm is adopted to accelerate the method. In many natural datasets, the spatial distribution of instances is uneven. For such datasets, the method is more robust than existing distance-based instance selection methods. Comprehensive experiments suggest that common classifiers trained on instances selected by PDOC-V can accurately recover the ODS. Moreover, for many natural datasets, common classifiers trained on 10% to 20% of the instances can achieve more than 98% of the full-set performance. |
---|---|
ISSN: | 1041-4347 1558-2191 |
DOI: | 10.1109/TKDE.2023.3328952 |
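
The abstract describes selecting instances whose Voronoi neighbors are split between the two classes, with neighbors found by randomized ray shooting. The following is a minimal sketch of that general idea, not the published PDOC-V algorithm: the function names, the neighbor-vote probability estimate, and the `n_rays` and `tol` parameters are all assumptions made for illustration. It relies on a standard geometric fact: a ray leaving point p in unit direction d crosses the bisector with point q at distance t = |q - p|^2 / (2 d·(q - p)), and the q with the smallest positive t owns the adjacent Voronoi cell.

```python
import math
import random


def voronoi_neighbors_by_rays(points, i, n_rays=64, rng=None):
    """Approximate the Voronoi neighbors of points[i] by shooting random rays.

    Each ray finds the first bisector it crosses; the point behind that
    bisector is a true Voronoi neighbor. More rays find more neighbors.
    """
    rng = rng or random.Random(0)
    p = points[i]
    found = set()
    for _ in range(n_rays):
        # Random direction drawn from an isotropic Gaussian, then normalized.
        d = [rng.gauss(0.0, 1.0) for _ in p]
        norm = math.sqrt(sum(x * x for x in d))
        if norm == 0.0:
            continue
        d = [x / norm for x in d]
        best_t, best_j = math.inf, None
        for j, q in enumerate(points):
            if j == i:
                continue
            diff = [qk - pk for qk, pk in zip(q, p)]
            proj = sum(dk * vk for dk, vk in zip(d, diff))
            if proj <= 0.0:
                continue  # the bisector with q is never reached by this ray
            t = sum(v * v for v in diff) / (2.0 * proj)
            if t < best_t:
                best_t, best_j = t, j
        if best_j is not None:
            found.add(best_j)
    return found


def select_boundary_instances(points, labels, n_rays=64, tol=0.25, seed=0):
    """Keep indices whose neighbor-vote positive rate is within tol of 0.5.

    The neighbor vote is an illustrative stand-in (an assumption) for the
    Bayesian probability estimate mentioned in the abstract.
    """
    rng = random.Random(seed)
    selected = []
    for i in range(len(points)):
        nbrs = voronoi_neighbors_by_rays(points, i, n_rays, rng)
        if not nbrs:
            continue
        p_pos = sum(labels[j] for j in nbrs) / len(nbrs)
        if abs(p_pos - 0.5) <= tol:
            selected.append(i)
    return selected
```

For six collinear points labeled 0, 0, 0, 1, 1, 1, this sketch keeps only the two instances on either side of the class change. The inner loop is a brute-force scan over all points, so each ray costs time linear in the dataset size.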