Performance enhancement‐based active learning sample selection method

Bibliographic details
Published in: Journal of chemometrics 2022-03, Vol.36 (3), p.n/a
Main authors: He, Zhonghai, Song, Shijie, Shen, Kun, Zhang, Xiaofang
Format: Article
Language: English
Online access: Full text
Description
Summary: Representative samples are important for multivariate calibration. Efficiently selecting the representative samples to be labelled can save money and time. Existing methods, such as Kennard‐Stone and net analyte signal selection, are usually based on the distance between candidate samples and the labelled calibration set in feature space. However, these distances are influenced by the feature space, which is spanned by an information vector extracted from labelled samples. To overcome the negative effects of distance‐based selection, a model performance enhancement‐based sample selection method is proposed to select calibration samples efficiently. Based on loss function optimization, the samples that can improve model performance the most, as estimated by bootstrap, are sequentially selected and added to the calibration set. Because each selected sample is highly representative, a model built from a few samples shows no significant loss of prediction ability compared with a model built from a large set of calibration samples. The performance enhancement‐based active learning (PEAL) sample selection method is both effective and efficient. Model performance is used as the query criterion. The true value of the model is estimated by the bootstrap method. High efficiency of sample selection can be realized.
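The summary describes a sequential loop: estimate by bootstrap how much each candidate would improve the model, label the best one, and repeat. The record does not give the loss function or the estimation details, so the following is only a minimal sketch under stated assumptions and should not be read as the authors' PEAL algorithm. It assumes a ridge regression model, uses the mean bootstrap prediction variance over the candidate pool as a stand-in for model loss, and pseudo-labels each candidate with the bootstrap mean prediction. All function names (`fit_ridge`, `pool_loss`, `select_next`, `peal_sketch`) are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1e-2):
    """Ridge regression with an intercept column (assumed model, not from the paper)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)

def predict(coef, X):
    return X @ coef[:-1] + coef[-1]

def pool_loss(X_lab, y_lab, X_pool, boot_idx, extra=None):
    """Loss proxy: mean variance of bootstrap predictions over the pool.

    If `extra` = (x, y) is given, that point is appended to every resample.
    Returns (loss proxy, bootstrap mean prediction for each pool sample).
    """
    preds = []
    for idx in boot_idx:
        Xb, yb = X_lab[idx], y_lab[idx]
        if extra is not None:
            Xb = np.vstack([Xb, extra[0]])
            yb = np.append(yb, extra[1])
        preds.append(predict(fit_ridge(Xb, yb), X_pool))
    preds = np.array(preds)
    return preds.var(axis=0).mean(), preds.mean(axis=0)

def select_next(X_lab, y_lab, X_pool, n_boot=20, rng=None):
    """Return the pool position whose addition reduces the loss proxy the most."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X_lab)
    # Reuse the same resamples for every candidate so the gains are comparable.
    boot_idx = [rng.integers(0, n, n) for _ in range(n_boot)]
    base, pool_mean = pool_loss(X_lab, y_lab, X_pool, boot_idx)
    gains = []
    for c in range(len(X_pool)):
        # Assumption: the unknown label is replaced by the bootstrap mean.
        extra = (X_pool[c:c + 1], pool_mean[c])
        loss_c, _ = pool_loss(X_lab, y_lab, X_pool, boot_idx, extra)
        gains.append(base - loss_c)
    return int(np.argmax(gains))

def peal_sketch(X, y_oracle, init_idx, n_select, seed=0):
    """Greedy loop; y_oracle is read only for samples already selected."""
    rng = np.random.default_rng(seed)
    labelled = list(init_idx)
    pool = [i for i in range(len(X)) if i not in labelled]
    for _ in range(n_select):
        best = select_next(X[labelled], y_oracle[labelled], X[pool], rng=rng)
        labelled.append(pool.pop(best))  # "label" the chosen sample
    return labelled

# Synthetic demo data (made up for illustration only).
demo_rng = np.random.default_rng(1)
X = demo_rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * demo_rng.normal(size=40)
chosen = peal_sketch(X, y, init_idx=[0, 1, 2, 3, 4], n_select=5)
print(chosen)
```

Each step refits the model once per candidate and per bootstrap resample, so this brute-force form only suits small pools. A practical implementation would presumably use the calibration model and loss function defined in the article itself.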
ISSN: 0886-9383, 1099-128X
DOI: 10.1002/cem.3386