The accuracy of interrater reliability estimates found using a subset of the total data sample: A bootstrap analysis


Bibliographic Details
Published in: Proceedings of the Human Factors and Ergonomics Society Annual Meeting 2020-12, Vol. 64 (1), pp. 1377-1382
Main authors: Armstrong, Miriam E., Tornblad, McKenna K., Jones, Keith S.
Format: Article
Language: English
Online access: Full text
Description
Abstract: Interrater reliability (IRR) assesses the stability of a coding protocol over time and across coders. For practical reasons, it is often difficult to assess IRR for an entire dataset, so researchers sometimes calculate the IRR for a subset of the total data sample. The purpose of this study is to investigate the accuracy of such subset IRRs. Using bootstrapping, we determined the effects of sample size (10%, 25%, & 40% of the total dataset) and IRR measure type (percent agreement, Krippendorff’s alpha, & the G Index) on the bias and percent error of subset IRRs. Results support calculating IRR from subsets of the total data sample, though we discuss how the accuracy of subset IRR values may depend on aspects of the dataset such as total sample size and coding methodology.
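The bootstrap procedure the abstract describes can be sketched in a few lines: repeatedly draw a random subset of the coded items, compute an IRR measure on each subset, and compare the subset estimates to the full-sample value. The sketch below is a hypothetical illustration, not the authors' code; it uses percent agreement (the simplest of the three measures studied) on simulated two-coder data, and all names and data are invented for the example.

```python
import random

def percent_agreement(codes_a, codes_b):
    """Fraction of items on which two coders assigned the same code."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def bootstrap_subset_irr(codes_a, codes_b, fraction, n_boot=1000, seed=0):
    """Draw n_boot random subsets of the given fraction of items and
    return the IRR (percent agreement) computed on each subset."""
    rng = random.Random(seed)
    n = len(codes_a)
    k = max(1, round(fraction * n))
    estimates = []
    for _ in range(n_boot):
        idx = rng.sample(range(n), k)  # subset drawn without replacement
        estimates.append(percent_agreement([codes_a[i] for i in idx],
                                           [codes_b[i] for i in idx]))
    return estimates

# Hypothetical coded data: coder B agrees with coder A on roughly 90% of items.
rng = random.Random(42)
coder_a = [rng.choice("ABC") for _ in range(200)]
coder_b = [c if rng.random() < 0.9 else rng.choice("ABC") for c in coder_a]

full_irr = percent_agreement(coder_a, coder_b)
for frac in (0.10, 0.25, 0.40):  # the subset sizes examined in the study
    est = bootstrap_subset_irr(coder_a, coder_b, frac)
    mean_est = sum(est) / len(est)
    bias = mean_est - full_irr  # drift of subset IRR from the full-sample IRR
    print(f"{frac:.0%} subset: mean IRR {mean_est:.3f}, bias {bias:+.4f}")
```

Under this setup the mean subset estimate tracks the full-sample value closely, with the spread of the bootstrap estimates shrinking as the subset fraction grows; measures such as Krippendorff's alpha or the G Index could be substituted for `percent_agreement` without changing the resampling loop.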
ISSN: 1071-1813 (print), 2169-5067 (online)
DOI: 10.1177/1071181320641329