Estimation of predictive performance for test data in applicability domains using y‐randomization
Published in: | Journal of Chemometrics 2019-09, Vol.33 (9), p.n/a |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | A new measure of predictive performance for objective variables in regression analysis is proposed, enabling the y‐errors of new samples or test samples to be estimated in the applicability domains (ADs) of regression models. The proposed measure, based on y‐randomization, considers chance correlations and is calculated using only training data. This chance correlation‐excluded mean absolute error (MAECCE) can estimate the y‐errors of new samples considering the influence of chance correlations in the given dataset on the regression models. Experiments using numerical simulation, quantitative structure‐activity relationship, and quantitative structure‐property relationship datasets confirm that MAECCE can estimate the distribution of y‐errors of new samples in ADs for various training datasets, descriptor sets, and regression analysis methods, enabling chance correlations to be eliminated from data analysis. Python and MATLAB codes for the proposed algorithm are available at https://github.com/hkaneko1985/maecce. |
---|---|
ISSN: | 0886-9383 1099-128X |
DOI: | 10.1002/cem.3171 |
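
The summary describes a measure that is based on y‐randomization and computed from training data alone. The record does not give the MAECCE formula, so the following is only a minimal sketch of generic y‐randomization, not the authors' algorithm (their code is at the GitHub address in the summary). It assumes an ordinary least squares model and synthetic data; the function names `fit_predict_ols`, `mae`, and `y_randomization_mae`, and the quantity `chance_gap`, are hypothetical and chosen for illustration.

```python
import numpy as np


def fit_predict_ols(X, y):
    """Fit ordinary least squares with an intercept; return training predictions."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


def mae(y_true, y_pred):
    """Mean absolute error between two 1-D arrays."""
    return float(np.mean(np.abs(y_true - y_pred)))


def y_randomization_mae(X, y, n_shuffles=30, seed=0):
    """Average training MAE of models fitted to shuffled copies of y.

    Shuffling y destroys any real X-y relationship, so whatever fit
    remains is attributable to chance correlation.
    """
    rng = np.random.default_rng(seed)
    maes = []
    for _ in range(n_shuffles):
        y_shuffled = rng.permutation(y)
        maes.append(mae(y_shuffled, fit_predict_ols(X, y_shuffled)))
    return float(np.mean(maes))


# Synthetic training data (an assumption for illustration only):
# 40 samples, 20 descriptors, only the first descriptor is informative.
rng = np.random.default_rng(42)
n_samples, n_descriptors = 40, 20
X = rng.normal(size=(n_samples, n_descriptors))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n_samples)

mae_train = mae(y, fit_predict_ols(X, y))        # fit on the real y
mae_mean = mae(y, np.full_like(y, y.mean()))     # baseline: predict the mean of y
mae_yrand = y_randomization_mae(X, y)            # fit on shuffled y

# Hypothetical indicator of chance correlation: how much better than the
# mean baseline a model can fit pure noise using these descriptors.
chance_gap = mae_mean - mae_yrand

print(f"MAE (train):      {mae_train:.3f}")
print(f"MAE (mean model): {mae_mean:.3f}")
print(f"MAE (y-rand):     {mae_yrand:.3f}")
print(f"chance gap:       {chance_gap:.3f}")
```

With many descriptors relative to the number of samples, a model can fit shuffled y noticeably better than the mean baseline, which is the kind of chance correlation the summary says the proposed measure takes into account. How MAECCE combines such quantities is defined in the article itself and is not reproduced here.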