An adaptive strategy to improve the partial least squares model via minimum covariance determinant
Published in: | Chemometrics and intelligent laboratory systems, 2024-06, Vol. 249, p. 105120, Article 105120 |
---|---|
Main authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Partial least squares (PLS) regression is a linear regression technique that performs well with high-dimensional regressors. Like many other supervised learning techniques, PLS degrades when the prediction and training data are drawn from different distributions. To address this problem, an adaptive strategy based on the minimum covariance determinant (MCD) estimator is proposed. It aims to find an appropriate training set from which an accurate PLS model can be built to fit the prediction data. In this study, the MCD estimator finds the h-subset of the merged prediction and training data that has the smallest covariance determinant. Prediction and training samples whose Mahalanobis distance to the h-subset is less than or equal to a cutoff (the square root of a quantile of the chi-squared distribution) are assumed to share the same distribution, and a PLS model is then built on the training samples selected in this way. The proposed method is applied to three real-world datasets and compared with classic PLS. The most significant improvement is obtained for the m5 prediction data in the corn dataset, where the root mean square error of prediction (RMSEP) is reduced from 0.149 to 0.023. On the other datasets, our method also performs better than PLS. The experimental results show the effectiveness of our method.
• The performance of PLS models degrades when the training data used to construct the model and the prediction data come from different distributions.
• MCD is employed to select training data from the same distribution as the prediction data.
• PLS models are constructed from training data that share the distribution of the prediction data, which improves their predictive performance.
• The efficacy of the proposed approach is validated experimentally on three real-world NIR spectral datasets. |
---|---|
ISSN: | 0169-7439 1873-3239 |
DOI: | 10.1016/j.chemolab.2024.105120 |