Validation in prediction research: the waste by data splitting

Bibliographic details
Published in:	Journal of clinical epidemiology 2018-11, Vol.103, p.131-133
Main author:	Steyerberg, Ewout W.
Format:	Article
Language:	eng
Subjects:	Big Data; Datasets; Heterogeneity; Leukemia; Mathematical models; Medical research; Prediction models; Splitting; Validation studies; Validity
Online access:	Full text

Description
Abstract:	Accurate prediction of medical outcomes is important for diagnosis and prognosis. The standard requirement in major medical journals is nowadays that validity outside the development sample needs to be shown. Is such data splitting an example of a waste of resources? In large samples, interest should shift to assessment of heterogeneity in model performance across settings. In small samples, cross-validation and bootstrapping are more efficient approaches. In conclusion, random data splitting should be abolished for validation of prediction models.
• In the absence of sufficient sample size, independent validation is misleading and should be dropped as a model evaluation step.
• We should accept that small size studies on prediction are exploratory in nature, at best show potential of new biological insights, and cannot be expected to provide clinically applicable tests, prediction models or classifiers.
• Validation studies should have at least 100 events to be meaningful. In Big Data, heterogeneity in model performance should be quantified rather than average performance.
ISSN:	0895-4356, 1878-5921
DOI:	10.1016/j.jclinepi.2018.07.010
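
The abstract's rule of thumb (at least 100 events in a validation study) can be turned into simple arithmetic for a random split. The sketch below is illustrative only and is not taken from the article: the function names, the expected-value calculation, and the example numbers are assumptions introduced here. It estimates how many events would be expected in a randomly held-out validation part, and how large the full sample would have to be for that part to reach the threshold.

```python
import math

# Threshold quoted in the abstract: a validation study needs at least 100 events.
MIN_EVENTS = 100


def expected_validation_events(n_total, event_rate, validation_fraction):
    """Expected number of events in a randomly held-out validation part.

    Hypothetical helper: assumes events are spread evenly by a random split.
    """
    if not 0 < event_rate <= 1:
        raise ValueError("event_rate must be in (0, 1]")
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be in (0, 1)")
    return n_total * event_rate * validation_fraction


def min_total_sample_for_split(event_rate, validation_fraction, min_events=MIN_EVENTS):
    """Smallest total sample whose validation part is expected to hold min_events events."""
    if not 0 < event_rate <= 1:
        raise ValueError("event_rate must be in (0, 1]")
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be in (0, 1)")
    return math.ceil(min_events / (event_rate * validation_fraction))


if __name__ == "__main__":
    # Assumed example: 1000 patients, 10% event rate, 30% held out for validation.
    events = expected_validation_events(1000, 0.10, 0.30)
    print(f"expected validation events: {events:.1f}")
    print(f"meets the 100-event threshold: {events >= MIN_EVENTS}")
    print(f"total sample needed: {min_total_sample_for_split(0.10, 0.30)}")
```

Under these assumed numbers the held-out part is expected to contain far fewer than 100 events, which illustrates the abstract's point that splitting a small sample leaves too little data for a meaningful validation.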