Block-Regularized m × 2 Cross-Validated Estimator of the Generalization Error
Published in: | Neural Computation 2017-02, Vol.29 (2), p.519-554 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | A cross-validation method based on m replications of two-fold cross validation is called an m × 2 cross validation. An m × 2 cross validation is used to estimate the generalization error and to compare the performance of algorithms in machine learning. However, the variance of the estimator of the generalization error in m × 2 cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in m × 2 cross validation. This fluctuation results in a large variance in the m × 2 cross-validated estimator. The influence of the random partitions on the variance becomes serious as m increases. Thus, in this study, partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized m × 2 cross validation (m × 2 BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the m × 2 BCV estimator of the generalization error is smaller than the variance of the m × 2 cross-validated estimator and reaches its minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical method for constructing m × 2 BCV from a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of the estimator of the generalization error. |
---|---|
ISSN: | 0899-7667 1530-888X |
DOI: | 10.1162/NECO_a_00923 |
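
The contrast the abstract draws between random partitions and a block-regularized partition set can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that the two-level orthogonal array is read off the non-constant columns of a Sylvester-type Hadamard matrix, that the data are first shuffled into equal-sized blocks, and that n is divisible by the block count. All function names are hypothetical.

```python
import numpy as np


def random_partitions(n, m, rng):
    """m independent random two-fold splits; row i marks the first half of split i."""
    masks = np.zeros((m, n), dtype=bool)
    for i in range(m):
        masks[i, rng.permutation(n)[: n // 2]] = True
    return masks


def sylvester_hadamard(order):
    """Hadamard matrix (+1/-1 entries) for an order that is a power of two."""
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.kron(np.array([[1, 1], [1, -1]]), h)
    return h


def block_regularized_partitions(n, m, rng):
    """m two-fold splits whose first halves pairwise share exactly n / 4 samples.

    Assumption (not from the source): the orthogonal array is taken from the
    non-constant columns of a Sylvester Hadamard matrix.
    """
    n_blocks = 2
    while n_blocks < m + 1:  # need m non-constant columns
        n_blocks *= 2
    if n % n_blocks:
        raise ValueError(f"n must be divisible by {n_blocks}")
    h = sylvester_hadamard(n_blocks)
    # Shuffle the sample indices once, then cut them into equal-sized blocks.
    blocks = rng.permutation(n).reshape(n_blocks, n // n_blocks)
    masks = np.zeros((m, n), dtype=bool)
    for i in range(m):
        # Column 0 is constant, so split i uses column i + 1.
        masks[i, blocks[h[:, i + 1] == 1].ravel()] = True
    return masks


def pairwise_overlaps(masks):
    """Number of shared samples between the first halves of every pair of splits."""
    counts = masks.astype(int) @ masks.astype(int).T
    return counts[np.triu_indices(len(masks), k=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 64, 3
    print("random overlaps:", pairwise_overlaps(random_partitions(n, m, rng)))
    print("block-regularized overlaps:",
          pairwise_overlaps(block_regularized_partitions(n, m, rng)))
```

In this sketch the random splits give overlap counts that vary around n / 4, while the block-regularized splits give exactly n / 4 for every pair. The second half of each split is the complement of the first, so the same count holds for the complementary halves.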