Beyond kappa: A review of interrater agreement measures

Bibliographic details
Published in:	Canadian Journal of Statistics 1999-03, Vol.27 (1), p.3-23
Main authors:	Banerjee, Mousumi; Capozzoli, Michelle; McSweeney, Laura; Sinha, Debajyoti
Format:	Article
Language:	English
Keywords:	Biometrics; Confidence interval; Correlation coefficients; Correlations; Estimators; intraclass correlation; Kappa coefficient; log-linear models; Logical givens; nominal data; ordinal data; Standard error; Statistical variance; Statistics; Tanneries

Description
Summary:	In 1960, Cohen introduced the kappa coefficient to measure chance-corrected nominal scale agreement between two raters. Since then, numerous extensions and generalizations of this interrater agreement measure have been proposed in the literature. This paper reviews and critiques various approaches to the study of interrater agreement, for which the relevant data comprise either nominal or ordinal categorical ratings from multiple raters. It presents a comprehensive compilation of the main statistical approaches to this problem, descriptions and characterizations of the underlying models, and discussions of related statistical methodologies for estimation and confidence-interval construction. The emphasis is on various practical scenarios and designs that underlie the development of these measures, and the interrelationships between them. /// C'est en 1960 que Cohen a proposé l'emploi du coefficient kappa comme outil de mesure de l'accord entre deux évaluateurs exprimant leur jugement au moyen d'une échelle nominale. De nombreuses généralisations de cette mesure d'accord ont été proposées depuis lors. Les auteurs jettent ici un regard critique sur nombre de ces travaux traitant du cas où l'échelle de réponse est soit nominale, soit ordinale. Les principales approches statistiques sont passées en revue, les modèles sous-jacents sont décrits et caractérisés, et les problèmes liés à l'estimation ponctuelle ou par intervalle sont abordés. L'accent est mis sur différents scénarios concrets et sur des schémas expérimentaux qui sous-tendent l'emploi de ces mesures et les relations existant entre elles.
ISSN:	0319-5724 1708-945X
DOI:	10.2307/3315487
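
Illustrative note (not part of the catalog record): the summary above refers to Cohen's kappa as a chance-corrected measure of agreement between two raters on a nominal scale. The sketch below shows the standard two-rater form, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from the raters' marginal frequencies. The function name `cohen_kappa` and the sample ratings are hypothetical, chosen only for illustration; none of the extensions reviewed in the article are shown.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters assigning nominal categories to the same items.

    Hypothetical helper for illustration only.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("Both raters must rate the same non-empty set of items.")

    n = len(ratings_a)

    # Observed agreement: fraction of items on which the raters coincide.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: sum over categories of the product of marginal proportions.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single, identical category; kappa is undefined.
        raise ValueError("Kappa is undefined when chance agreement equals 1.")

    return (observed - expected) / (1.0 - expected)


# Made-up ratings for ten items (not data from the article).
rater_a = ["yes"] * 6 + ["no"] * 4
rater_b = ["yes"] * 5 + ["no"] * 4 + ["yes"]

kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 4))
```

In this made-up example the raters agree on 8 of 10 items (p_o = 0.8), while their marginals imply p_e = 0.52, so kappa is about 0.58: noticeably lower than the raw agreement once chance is taken into account.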