PRESS-related statistics: regression tools for cross-validation and case diagnostics

Bibliographic details
Published in: Medicine and science in sports and exercise, 1995-04, Vol. 27 (4), p. 612-620
Main authors: HOLIDAY, D. B; BALLARD, J. E; MCKEOWN, B. C
Format: Article
Language: English
Description
Summary: In the health science literature, a common approach to validating a regression equation is data-splitting, in which one portion of the data is used to fit the model (fitting sample) and the remainder (validation sample) is used to estimate future performance. The R² and SEE obtained by predicting the validation sample with the fitting-sample equation are proper estimates of future performance, tending to correct for the natural optimistic bias of the R² and SEE obtained from the fitting sample alone. Data-splitting has several disadvantages, however. These include: 1) difficulty, arbitrariness, and inconvenience of matching samples; 2) the need to report two sets of statistics to determine homogeneity; and 3) the lack of equation stability due to diluted sample size. The PRESS statistic and associated residuals do not require the data to be split, yield alternative unbiased estimates of R² and SEE, and provide useful case diagnostics. This procedure is easy to use and widely available in modern statistical packages, but it is rarely utilized. The two methods are contrasted here using a simulation from original data for predicting body density from anthropometric measurements of a group of 117 women. The PRESS approach is particularly appropriate for smaller datasets; methods of reporting these statistics are recommended.
ISSN: 0195-9131, 1530-0315
DOI: 10.1249/00005768-199504000-00022
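
The PRESS procedure named in the summary can be illustrated with a minimal Python sketch. It is not taken from the article. It assumes ordinary least squares with an intercept and uses the standard identity that the leave-one-out (PRESS) residual equals the ordinary residual divided by 1 minus the case's leverage. The formulas R²_PRESS = 1 - PRESS/SS_total and SEE_PRESS = sqrt(PRESS/n) are one common convention and are an assumption here, as are the function names `press_statistics` and `loo_press_brute_force`. The data are synthetic; only the sample size of 117 mirrors the summary, and the coefficients are invented for illustration.

```python
import numpy as np


def press_statistics(X, y):
    """PRESS-based summaries for an OLS fit with intercept (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta                        # ordinary residuals
    Q, _ = np.linalg.qr(A)                      # reduced QR; assumes full column rank
    leverage = np.sum(Q ** 2, axis=1)           # diagonal of the hat matrix
    press_resid = resid / (1.0 - leverage)      # leave-one-out prediction errors
    press = float(np.sum(press_resid ** 2))
    ss_total = float(np.sum((y - y.mean()) ** 2))
    return {
        "press": press,
        "r2_fit": 1.0 - float(np.sum(resid ** 2)) / ss_total,
        "r2_press": 1.0 - press / ss_total,     # assumed convention
        "see_press": float(np.sqrt(press / n)), # assumed convention
        "press_residuals": press_resid,         # large values flag influential cases
    }


def loo_press_brute_force(X, y):
    """Recompute PRESS by refitting n times, each time leaving one case out."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        total += (y[i] - A[i] @ beta) ** 2
    return float(total)


# Synthetic stand-in data (NOT the article's data): 117 cases, 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(117, 3))
y = 1.05 - 0.010 * X[:, 0] - 0.005 * X[:, 1] + rng.normal(scale=0.01, size=117)

result = press_statistics(X, y)
brute = loo_press_brute_force(X, y)
print("PRESS (shortcut):   ", result["press"])
print("PRESS (brute force):", brute)
print("R2 fit vs R2 PRESS: ", result["r2_fit"], result["r2_press"])
print("SEE PRESS:          ", result["see_press"])
```

The shortcut and the brute-force loop should agree to rounding error, which is why no data need to be held out: every case serves once as its own validation sample. The PRESS-based R² can never exceed the fitted R², since each PRESS residual is at least as large in magnitude as the ordinary residual.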