Is the area under an ROC curve a valid measure of the performance of a screening or diagnostic test?

Bibliographic details
Published in:	Journal of medical screening 2014-03, Vol.21 (1), p.51-56
Main authors:	Wald, NJ; Bestwick, JP
Format:	Article
Language:	English
Subjects:	Area Under Curve; Diagnostic Tests, Routine - methods; Diagnostic Tests, Routine - standards; False Positive Reactions; Humans; Mass Screening - methods; Models, Theoretical; Normal Distribution; Reproducibility of Results; ROC Curve
Online access:	Full text

Description
Summary:	Objectives: The area under a receiver operating characteristic (ROC) curve (the AUC) is used as a measure of the performance of a screening or diagnostic test. We here assess the validity of the AUC.
Methods: Assuming the test results follow Gaussian distributions in affected and unaffected individuals, standard mathematical formulae were used to describe the relationship between the detection rate (DR) (or sensitivity) and the false-positive rate (FPR) of a test with the AUC. These formulae were used to calculate the screening performance (DR for a given FPR, or FPR for a given DR) for different AUC values according to different standard deviations of the test result in affected and unaffected individuals.
Results: The DR for a given FPR is strongly dependent on relative differences in the standard deviation of the test variable in affected and unaffected individuals. Consequently, two tests with the same AUC can have a different DR for the same FPR. For example, an AUC of 0.75 has a DR of 24% for a 5% FPR if the standard deviations are the same in affected and unaffected individuals, but 39% for the same 5% FPR if the standard deviation in affected individuals is 1.5 times that in unaffected individuals.
Conclusion: The AUC is an unreliable measure of screening performance because in practice the standard deviation of a screening or diagnostic test in affected and unaffected individuals can differ. The problem is avoided by not using AUC at all, and instead specifying DRs for given FPRs or FPRs for given DRs.
ISSN:	0969-1413, 1475-5793
DOI:	10.1177/0969141313517497
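
The worked example in the summary (an AUC of 0.75 giving a DR of 24% or 39% at a 5% FPR) can be reproduced with a short sketch. The record does not state the formulae the authors used, so this assumes the usual binormal relation AUC = Φ(δ / √(σ_u² + σ_a²)), with unaffected results scaled to mean 0 and SD 1, affected results to mean δ and SD `sd_ratio`, and higher values counted as screen-positive. The function name `detection_rate` and its parameters are illustrative and do not come from the article.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # unaffected individuals: mean 0, SD 1 (assumed scaling)


def detection_rate(auc, fpr, sd_ratio=1.0):
    """Detection rate at a given false-positive rate under a binormal model.

    Assumptions (not taken from the article):
      - unaffected results ~ N(0, 1), affected results ~ N(delta, sd_ratio)
      - higher test values are screen-positive
      - AUC = Phi(delta / sqrt(1 + sd_ratio**2))
    """
    # Invert the AUC relation to get the separation between the two means.
    delta = STD_NORMAL.inv_cdf(auc) * sqrt(1.0 + sd_ratio ** 2)
    # Cut-off that leaves a fraction `fpr` of unaffected results above it.
    cutoff = STD_NORMAL.inv_cdf(1.0 - fpr)
    # Fraction of affected results above the same cut-off.
    return 1.0 - NormalDist(delta, sd_ratio).cdf(cutoff)


if __name__ == "__main__":
    for ratio in (1.0, 1.5):
        dr = detection_rate(auc=0.75, fpr=0.05, sd_ratio=ratio)
        print(f"SD ratio {ratio}: DR = {dr:.0%} at a 5% FPR")
```

Under these assumptions the two calls give about 24% and 39%, matching the figures quoted in the summary: the AUC is held fixed while only the ratio of standard deviations changes.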