The accuracy of interrater reliability estimates found using a subset of the total data sample: A bootstrap analysis
Saved in:

| Published in: | Proceedings of the Human Factors and Ergonomics Society Annual Meeting 2020-12, Vol.64 (1), p.1377-1382 |
|---|---|
| Main authors: | , , |
| Format: | Article |
| Language: | English |
| Online access: | Full text |
| Abstract: | Interrater reliability (IRR) assesses the stability of a coding protocol over time and across coders. For practical reasons, it is often difficult to assess IRR for an entire dataset, so researchers sometimes calculate the IRR for a subset of the total data sample. The purpose of this study is to investigate the accuracy of such subset IRRs. Using bootstrapping, we determined the effects of sample size (10%, 25%, & 40% of the total dataset) and IRR measure type (percent agreement, Krippendorff’s alpha, & the G Index) on the bias and percent error of subset IRRs. Results support the practice of calculating IRR from subsets of the total data sample, though we discuss how the accuracy of subset IRR values may depend on aspects of the dataset such as total sample size and coding methodology. |
|---|---|
| ISSN: | 1071-1813; 2169-5067 |
| DOI: | 10.1177/1071181320641329 |
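The bootstrap procedure the abstract describes can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it assumes two coders, uses percent agreement as the IRR measure (the study also examined Krippendorff's alpha and the G Index), and mirrors the study's 10%, 25%, and 40% subset fractions. The coder data below are hypothetical.

```python
import random

def percent_agreement(codes_a, codes_b):
    """Fraction of items on which the two coders assigned the same code."""
    agree = sum(a == b for a, b in zip(codes_a, codes_b))
    return agree / len(codes_a)

def subset_irr_bootstrap(codes_a, codes_b, fraction, n_boot=1000, seed=0):
    """Bootstrap the IRR of random subsets of the data.

    Returns the mean bias and mean absolute percent error of the
    subset IRR estimates relative to the full-sample IRR.
    """
    rng = random.Random(seed)
    n = len(codes_a)
    k = max(1, round(fraction * n))
    full_irr = percent_agreement(codes_a, codes_b)
    biases, pct_errors = [], []
    for _ in range(n_boot):
        idx = rng.sample(range(n), k)  # draw a subset without replacement
        sub_irr = percent_agreement([codes_a[i] for i in idx],
                                    [codes_b[i] for i in idx])
        biases.append(sub_irr - full_irr)
        pct_errors.append(abs(sub_irr - full_irr) / full_irr * 100)
    return sum(biases) / n_boot, sum(pct_errors) / n_boot

# Hypothetical dataset: two coders, 200 items, roughly 90% agreement.
rng = random.Random(42)
coder_a = [rng.randint(0, 3) for _ in range(200)]
coder_b = [c if rng.random() < 0.9 else rng.randint(0, 3) for c in coder_a]

for frac in (0.10, 0.25, 0.40):
    bias, pct_err = subset_irr_bootstrap(coder_a, coder_b, frac)
    print(f"{frac:.0%} subset: mean bias {bias:+.4f}, mean % error {pct_err:.2f}")
```

Because subsets are drawn at random, the subset IRR is roughly unbiased; the interesting quantity is the percent error, which shrinks as the subset fraction grows.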