Training sample selection method based on clustering and active learning

Bibliographic Details
Main authors: CHEN ZHANGQUAN, YANG PENG, MAO LU, GUO YUANYUAN
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses a training sample selection method and device based on clustering and active learning. The method comprises the following steps: first, consistency regularization is used to divide the samples in a data pool into high-confidence samples above a threshold and low-confidence samples below the threshold; second, the data are clustered by a density peak clustering method, which divides the samples in the data pool into an inner area and an outer area; then, samples that are above the threshold and belong to the inner area are assigned pseudo labels, and samples that are below the threshold and belong to the outer area are added to the active learning task; finally, training samples with both uncertainty and diversity are screened out by a multi-index fusion method and passed to experts for labeling. According to the invention, on the premise of reaching the preset performance of the model, the train
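
The partition step described in the summary can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names, the cutoff distance `d_c`, the density threshold `rho_threshold`, and the rule "inner area = local density at or above a threshold" are all assumptions, since the summary does not say how the inner and outer areas are derived from the density peak clustering. Only the local-density part of density peak clustering is shown, and the confidence scores are taken as given rather than computed by consistency regularization.

```python
import numpy as np


def local_density(X, d_c):
    """Cutoff-kernel local density: number of other points closer than d_c.

    This is the rho quantity used in density peak clustering; d_c is a
    hypothetical cutoff chosen by the caller.
    """
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Each point is at distance 0 from itself, so subtract 1.
    return (dist < d_c).sum(axis=1) - 1


def partition_pool(conf, rho, conf_threshold, rho_threshold):
    """Split pool indices into pseudo-label, active-learning and remaining sets.

    Assumption: a sample is in the "inner area" when rho >= rho_threshold.
    """
    high = conf >= conf_threshold
    inner = rho >= rho_threshold
    pseudo = np.flatnonzero(high & inner)     # confident and inner: pseudo-label
    active = np.flatnonzero(~high & ~inner)   # unconfident and outer: active learning
    rest = np.flatnonzero(high ^ inner)       # mixed cases, not covered by the summary
    return pseudo, active, rest
```

Samples that fall into neither group (high confidence but outer area, or low confidence but inner area) are returned separately here because the summary does not state how they are handled.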