The meaning of kappa: Probabilistic concepts of reliability and validity revisited


Bibliographic Details
Published in: Journal of clinical epidemiology 1996-07, Vol. 49 (7), p. 775-782
Author: Guggenmoos-Holzmann, Irene
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Abstract: A framework—the “agreement concept”—is developed to study the use of Cohen's kappa as well as alternative measures of chance-corrected agreement in a unified manner. Focusing on intrarater consistency, it is demonstrated that for 2 × 2 tables an adequate choice between different measures of chance-corrected agreement can be made only if the characteristics of the observational setting are taken into account. In particular, a naive use of Cohen's kappa may lead to strikingly overoptimistic estimates of chance-corrected agreement. Such bias can be overcome by more elaborate study designs that allow for an unrestricted estimation of the probabilities at issue. When Cohen's kappa is appropriately applied as a measure of chance-corrected agreement, its values prove to be a linear—and not a parabolic—function of true prevalence. It is further shown how the validity of ratings is influenced by lack of consistency. Depending on the design of a validity study, this may lead, on purely formal grounds, to prevalence-dependent estimates of sensitivity and specificity. Proposed formulas for “chance-corrected” validity indexes fail to adjust for this phenomenon.
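For context, the standard computation of Cohen's kappa on a 2 × 2 agreement table can be sketched as below. This is the conventional formula (observed minus chance agreement, normalized), not the article's refined designs; the table counts are hypothetical and chosen purely for illustration.

```python
def cohens_kappa(table):
    """Cohen's kappa from a 2x2 table of joint rating counts.

    table[i][j] = number of items rated category i on the first
    occasion and category j on the second occasion.
    """
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of items on the diagonal.
    p_o = sum(table[i][i] for i in range(2)) / n
    # Chance-expected agreement: sum of products of the marginal
    # proportions, as under independence of the two ratings.
    row_m = [sum(row) / n for row in table]
    col_m = [sum(table[i][j] for i in range(2)) / n for j in range(2)]
    p_e = sum(row_m[i] * col_m[i] for i in range(2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical test-retest table: 40 +/+, 10 +/-, 10 -/+, 40 -/-.
table = [[40, 10], [10, 40]]
print(round(cohens_kappa(table), 3))  # -> 0.6
```

Note how p_e depends on the marginal proportions, i.e., on prevalence; this is the mechanism behind the prevalence dependence that the article analyzes.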
ISSN:0895-4356
1878-5921
DOI:10.1016/0895-4356(96)00011-X