Inter-Rater Reliability of Grading Undergraduate Portfolios in Veterinary Medical Education

Bibliographic details
Published in: Journal of veterinary medical education 2019, Vol. 46 (4), p. 1-422
Main authors: Favier, Robert P; Vernooij, Johannes C M; Jonker, F Herman; Bok, Harold G
Format: Article
Language: English
Description
Summary: The reliability of high-stakes assessment of portfolios containing an aggregation of quantitative and qualitative data based on programmatic assessment is under debate, especially when multiple assessors are involved. In this study carried out at the Faculty of Veterinary Medicine, Utrecht University, the Netherlands, two independent assessors graded the portfolios of students in their second year of the 3-year clinical phase. The similarity of grades (i.e., equal grades) and the level of the grades were studied to estimate inter-rater reliability, taking into account the potential effects of the assessor's background (i.e., originating from a clinical or non-clinical department) and student's cohort group, gender, and chosen master track (Companion Animal Health, Equine Health, or Farm Animal/Public Health). Whereas the similarity between the two grades increased from 58% in the first year the grading system was introduced to around 80% afterwards, the grade level was lower over the next 3 years. The assessor's background had a minor effect on the proportion of similar grades, as well as on grading level. The assessor intraclass correlation was low (i.e., all assessors scored with a similar grading pattern [same range of grades]). The grades awarded to female students were higher but more often dissimilar. We conclude that the grading system was well implemented and has a high inter-rater reliability.
ISSN: 0748-321X, 1943-7218
DOI: 10.3138/jvme.0917-128r1