Interrater Reliability for Multilevel Data: A Generalizability Theory Approach



Bibliographic Details
Published in: Psychological Methods 2022-08, Vol. 27 (4), p. 650-666
Main authors: ten Hove, Debby, Jorgensen, Terrence D., van der Ark, L. Andries
Format: Article
Language: English
Online access: Full text
Description
Abstract: Current interrater reliability (IRR) coefficients ignore the nested structure of multilevel observational data, resulting in biased estimates of both subject- and cluster-level IRR. We used generalizability theory to provide a conceptualization and estimation method for IRR of continuous multilevel observational data. We explain how generalizability theory decomposes the variance of multilevel observational data into subject-, cluster-, and rater-related components, which can be estimated using Markov chain Monte Carlo (MCMC) estimation. We explain how IRR coefficients for each level can be derived from these variance components, and how they can be estimated as intraclass correlation coefficients (ICCs). We assessed the quality of MCMC point and interval estimates with a simulation study, and showed that small numbers of raters were the main source of bias and inefficiency in the ICCs. In a follow-up simulation, we showed that a planned missing data design can diminish most estimation difficulties in these conditions, yielding a useful approach to estimating multilevel interrater reliability for most social and behavioral research. We illustrated the method using data on student-teacher relationships. All software code and data used for this article are available on the Open Science Framework: https://osf.io/bwk5t/.

Translational Abstract: Observational studies in social and behavioral science often have a multilevel structure, with subjects nested within clusters. To inspect the quality of rating procedures, and improve them where necessary, interrater reliability (IRR) should then be defined separately for the subject level and the cluster level of the data. In this article, we propose a method to assess IRR for multilevel continuous ratings provided by two or more raters. We explain how generalizability theory can be used to decompose the variance of multilevel observational data into subject-, cluster-, and rater-related components. We explain how IRR coefficients for each level can be derived from these variance components, and how they can be estimated as intraclass correlation coefficients (ICCs). We assessed the quality of the proposed estimation procedure with a simulation study, and showed that small numbers of raters were the main source of bias and inefficiency in the ICCs. In a follow-up simulation, we showed that a planned missing data design can diminish most estimation difficulties of the ICCs, yielding a useful approach to estimating multilevel interrater reliability for most social and behavioral research.
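To make the variance decomposition concrete, the following sketch simulates fully crossed multilevel ratings (raters crossed with subjects nested in clusters) and computes level-specific ICCs from the variance components. The variance values, the simulation layout, and the specific ICC formulas (treating the cluster-by-rater and subject-by-rater interaction variances as level-specific error) are illustrative assumptions for one common generalizability-theory form, not the exact coefficients or MCMC estimation procedure of ten Hove et al.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed variance components (illustrative values, not from the article)
var_cluster, var_subject, var_rater = 0.4, 0.6, 0.2
var_cr, var_sr = 0.1, 0.3  # cluster-by-rater and subject-by-rater (incl. residual)

n_clusters, n_subjects, n_raters = 30, 10, 4

# Simulate ratings Y[c, s, r] = cluster + subject + rater + interactions
u_c = rng.normal(0.0, np.sqrt(var_cluster), (n_clusters, 1, 1))
u_s = rng.normal(0.0, np.sqrt(var_subject), (n_clusters, n_subjects, 1))
u_r = rng.normal(0.0, np.sqrt(var_rater), (1, 1, n_raters))
u_cr = rng.normal(0.0, np.sqrt(var_cr), (n_clusters, 1, n_raters))
u_sr = rng.normal(0.0, np.sqrt(var_sr), (n_clusters, n_subjects, n_raters))
Y = u_c + u_s + u_r + u_cr + u_sr  # shape: (clusters, subjects, raters)

# Single-rater ICCs from the (known) variance components: true variance at a
# level divided by true-plus-error variance at that level.
icc_subject = var_subject / (var_subject + var_sr)   # 0.6 / 0.9 ≈ 0.667
icc_cluster = var_cluster / (var_cluster + var_cr)   # 0.4 / 0.5 = 0.8
print(round(icc_subject, 3), round(icc_cluster, 3))
```

In practice, the variance components would be unknown and estimated from `Y` (the article proposes MCMC estimation); the ratio formulas above then turn those estimates into subject- and cluster-level ICCs.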
ISSN:1082-989X
1939-1463
DOI:10.1037/met0000391