Building reliable and generalizable clerkship competency assessments: Impact of 'hawk-dove' correction

Bibliographic details
Published in: Medical Teacher 2021-12, Vol. 43 (12), p. 1374-1380
Authors: Santen, Sally A., Ryan, Michael, Helou, Marieka A., Richards, Alicia, Perera, Robert A., Haley, Kellen, Bradner, Melissa, Rigby, Fidelma B., Park, Yoon Soo
Format: Article
Language: English
Online access: Full text
Description
Abstract: Systematic differences among raters' approaches to student assessment may result in leniency or stringency of assessment scores. This study examines the generalizability of medical student workplace-based competency assessments, including the impact of adjusting scores for rater leniency and stringency. Data were collected from summative clerkship assessments completed for 204 students during the 2017-2018 academic year at a single institution. Generalizability theory was used to explore the variance attributable to different facets (rater, learner, item, and competency domain) through three unbalanced random-effects models per clerkship, including models applying assessor stringency-leniency adjustments. In the original assessments, only 4-8% of the variance was attributable to the student, with the remainder being rater variance and error. Aggregating items into a composite score increased the variability attributable to the student (5-13% of variance). Applying a stringency-leniency ('hawk-dove') correction substantially increased both the variance attributed to the student (14.8-17.8%) and reliability. Controlling for assessor leniency/stringency reduced measurement error, decreasing the number of assessments required for generalizability from 16-50 to 11-14. Consistent with prior research, most of the variance in competency assessment scores was attributable to raters, with only a small proportion attributable to the student. Stringency-leniency corrections using rater-adjusted scores improved the psychometric characteristics of the assessment scores.
ISSN:0142-159X
1466-187X
DOI:10.1080/0142159X.2021.1948519