Measuring and correcting staff variability in large-scale OSCEs



Bibliographic Details
Published in:	BMC medical education 2024-07, Vol.24 (1), p.817-11, Article 817
Main Authors:	Haviari, Skerdi; de Tymowski, Christian; Burnichon, Nelly; Lemogne, Cédric; Flamant, Martin; Ruszniewski, Philippe; Bensaadi, Saja; Mercier, Gregory; Hamaoui, Hasséne; Mirault, Tristan; Faye, Albert; Bouzid, Donia
Format:	Article
Language:	eng
Subjects:	Clinical Competence - standards; Down Syndrome; Education, Medical, Undergraduate - standards; Educational Measurement - methods; Educational Measurement - standards; Evaluation; Familiarity; Fractures; Grading; Humans; Inter-rater variability; Likert Scales; Medical education; Medical schools; Medical students; Methods; Objective tests; Observer Variation; OSCE; Paris; Psychometrics; Reproducibility of Results; Score variability; Skills; Statistical analysis; Students; Students, Medical; Summative Evaluation; Teachers; Validity; Wages & salaries
Online Access:	Full text

Description
Summary:	Objective Structured Clinical Examinations (OSCEs) are an increasingly popular evaluation modality for medical students. While the face-to-face interaction allows for more in-depth assessment, it may cause standardization problems. Methods to quantify, limit or adjust for examiner effects are needed. Data originated from 3 OSCEs undergone by 900-student classes of 5th- and 6th-year medical students at Université Paris Cité in the 2022-2023 academic year. Sessions had five stations each, and one of the three sessions was scored by consensus by two raters (rather than one). We report OSCEs' longitudinal consistency for one of the classes and staff-related and student variability by session. We also propose a statistical method to adjust for inter-rater variability by deriving a statistical random student effect that accounts for staff-related and station random effects. From the four sessions, a total of 16,910 station scores were collected from 2615 student sessions, with two of the sessions undergone by the same students, and 36, 36, 35 and 20 distinct staff teams in each station for each session. Scores had staff-related heterogeneity (p
ISSN:	1472-6920
DOI:	10.1186/s12909-024-05803-6
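
Illustrative note: the summary mentions deriving a random student effect that accounts for staff-related and station random effects. The sketch below shows one generic way such an adjustment can be computed. It is not the authors' implementation. All data are simulated, and the sizes, standard deviations, the random assignment of staff teams and the function name `predict_effects` are assumptions made for illustration. The variance ratios are treated as known here (they come from the simulation settings), whereas a real analysis would estimate them from the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and standard deviations (not taken from the article).
n_students, n_stations, teams_per_station = 300, 5, 10
sd_student, sd_staff, sd_station, sd_noise = 1.0, 1.0, 0.5, 0.5

# Every simulated student visits every station once.
student = np.repeat(np.arange(n_students), n_stations)
station = np.tile(np.arange(n_stations), n_students)
# Assumption: staff teams are nested in stations and assigned at random.
staff = station * teams_per_station + rng.integers(
    0, teams_per_station, size=student.size
)
n_staff = n_stations * teams_per_station

true_student = rng.normal(0.0, sd_student, n_students)
true_staff = rng.normal(0.0, sd_staff, n_staff)
true_station = rng.normal(0.0, sd_station, n_stations)
score = (
    10.0
    + true_student[student]
    + true_staff[staff]
    + true_station[station]
    + rng.normal(0.0, sd_noise, student.size)
)


def predict_effects(student, staff, station, score,
                    lam_student, lam_staff, lam_station):
    """Predict random effects by solving penalized normal equations.

    Each lam_* is an assumed-known ratio: residual variance divided by the
    variance of that random effect. The intercept is left unpenalized.
    """
    n_stu, n_stf, n_sta = student.max() + 1, staff.max() + 1, station.max() + 1
    rows = np.arange(score.size)
    # Design matrix: intercept, then one indicator column per level.
    X = np.zeros((score.size, 1 + n_stu + n_stf + n_sta))
    X[:, 0] = 1.0
    X[rows, 1 + student] = 1.0
    X[rows, 1 + n_stu + staff] = 1.0
    X[rows, 1 + n_stu + n_stf + station] = 1.0
    penalty = np.concatenate((
        [0.0],
        np.full(n_stu, lam_student),
        np.full(n_stf, lam_staff),
        np.full(n_sta, lam_station),
    ))
    coef = np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ score)
    return coef[1:1 + n_stu], coef[1 + n_stu:1 + n_stu + n_stf]


est_student, est_staff = predict_effects(
    student, staff, station, score,
    lam_student=sd_noise**2 / sd_student**2,
    lam_staff=sd_noise**2 / sd_staff**2,
    lam_station=sd_noise**2 / sd_station**2,
)

# Compare unadjusted per-student means with the adjusted student effect.
raw_mean = np.bincount(student, weights=score) / np.bincount(student)
corr_raw = np.corrcoef(raw_mean, true_student)[0, 1]
corr_adj = np.corrcoef(est_student, true_student)[0, 1]
print("correlation with simulated ability, raw mean:", round(corr_raw, 3))
print("correlation with simulated ability, adjusted:", round(corr_adj, 3))
```

In this simulation the adjusted effect can be compared against the known simulated ability, which is not possible with real examination data. The comparison only indicates how the adjustment behaves under the stated assumptions.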