Estimation of predictive performance for test data in applicability domains using y‐randomization

Bibliographic details
Published in: Journal of Chemometrics, 2019-09, Vol. 33 (9), p. n/a
Main author: Kaneko, Hiromasa
Format: Article
Language: English
Description
Abstract: A new measure of predictive performance for objective variables in regression analysis is proposed, enabling the y‐errors of new samples or test samples to be estimated in the applicability domains (ADs) of regression models. The proposed measure, based on y‐randomization, considers chance correlations and is calculated using only training data. This chance correlation‐excluded mean absolute error (MAECCE) can estimate the y‐errors of new samples while accounting for the influence of chance correlations in the given dataset on the regression models. Experiments using numerical simulation, quantitative structure‐activity relationship, and quantitative structure‐property relationship datasets confirm that MAECCE can estimate the distribution of y‐errors of new samples in ADs for various training datasets, descriptor sets, and regression analysis methods, enabling chance correlations to be eliminated from data analysis. Python and MATLAB codes for the proposed algorithm are available at https://github.com/hkaneko1985/maecce.
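The abstract does not give the MAECCE formula, so the following is only a minimal Python sketch of the general y‐randomization idea it builds on, not the author's algorithm. It assumes an ordinary least squares model and uses only training data. The function names and the final "corrected" value (training MAE plus the error reduction obtained on shuffled y) are hypothetical illustrations; the exact definition is in the paper and the repository linked above.

```python
import numpy as np


def fit_predict_ols(X_train, y_train, X_eval):
    """Fit ordinary least squares with an intercept and predict for X_eval."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_eval = np.column_stack([np.ones(len(X_eval)), X_eval])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_eval @ coef


def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def y_randomization_summary(X, y, n_shuffles=30, seed=0):
    """Compare the training fit on real y with fits on shuffled y.

    Hypothetical helper for illustration; this is NOT the MAECCE formula.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)

    # Error of the model fitted and evaluated on the real training data.
    mae_train = mean_absolute_error(y, fit_predict_ols(X, y, X))
    # Baseline: always predict the mean of y (unchanged by shuffling y).
    mae_mean = mean_absolute_error(y, np.full(len(y), y.mean()))

    # y-randomization: shuffling y destroys any real X-y relationship,
    # so whatever the model still "explains" is chance correlation.
    shuffled_maes = []
    for _ in range(n_shuffles):
        y_perm = rng.permutation(y)
        shuffled_maes.append(
            mean_absolute_error(y_perm, fit_predict_ols(X, y_perm, X))
        )
    mae_yrand = float(np.mean(shuffled_maes))

    # Assumption: treat the error reduction achieved on shuffled y as the
    # share of the training fit that is due to chance, and add it back.
    chance_gap = mae_mean - mae_yrand
    return {
        "mae_train": mae_train,
        "mae_mean": mae_mean,
        "mae_yrand": mae_yrand,
        "chance_gap": chance_gap,
        "mae_hypothetical_corrected": mae_train + chance_gap,
    }


# Synthetic demo: 30 samples, 15 descriptors, only the first one matters.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(30, 15))
y_demo = X_demo[:, 0] + 0.5 * demo_rng.normal(size=30)
print(y_randomization_summary(X_demo, y_demo))
```

With many descriptors relative to the number of samples, the shuffled-y error typically falls well below the mean-prediction baseline, which is the chance-correlation effect the abstract refers to.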
ISSN: 0886-9383, 1099-128X
DOI: 10.1002/cem.3171