Item response theory model highlighting rating scale of a rubric and rater-rubric interaction in objective structured clinical examination

Bibliographic Details
Published in: PLoS ONE 2024-09, Vol. 19 (9), p. e0309887
Authors: Uto, Masaki; Tsuruta, Jun; Araki, Kouji; Ueno, Maomi
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Summary: Objective structured clinical examinations (OSCEs) are a widely used performance assessment for medical and dental students. A common limitation of OSCEs is that the evaluation results depend on the characteristics of the raters and the scoring rubric. To overcome this limitation, item response theory (IRT) models such as the many-facet Rasch model have been proposed to estimate examinee abilities while taking into account the characteristics of raters and of the evaluation items in a rubric. However, conventional IRT models make two impractical assumptions: constant rater severity across all evaluation items in a rubric, and an equal-interval rating scale shared by all evaluation items. Both assumptions can degrade model fit and the accuracy of ability measurement. To resolve this problem, we propose a new IRT model that introduces two parameters: (1) a rater-item interaction parameter representing each rater's severity on each evaluation item, and (2) an item-specific step-difficulty parameter representing differences in rating scales among evaluation items. We demonstrate the effectiveness of the proposed model by applying it to actual data collected from a medical interview test conducted at Tokyo Medical and Dental University as part of a post-clinical clerkship OSCE. The experimental results showed that the proposed model fit our OSCE data well and measured ability accurately. Furthermore, it provided rich information on rater and item characteristics that conventional models cannot, helping us to better understand rater and item properties.
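The two extensions described in the summary can be sketched in many-facet-Rasch-style notation. The symbols below are illustrative choices consistent with the abstract, not equations taken from the paper itself. A conventional many-facet Rasch model for examinee j, item i, and rater r assigns a rating of category k with probability

```latex
% Conventional many-facet Rasch model (rating-scale form):
% one severity per rater, one shared set of step parameters.
P(X_{ijr}=k) \;\propto\; \exp\!\sum_{m=1}^{k}
  \bigl(\theta_j - \beta_i - \rho_r - d_m\bigr)

% Proposed extension (as described in the summary):
% (1) rater severity varies by item:        \rho_r     \to \rho_{ri}
% (2) step difficulties vary by item:       d_m        \to d_{im}
P(X_{ijr}=k) \;\propto\; \exp\!\sum_{m=1}^{k}
  \bigl(\theta_j - \beta_i - \rho_{ri} - d_{im}\bigr)
```

Here θ_j is examinee ability, β_i is item difficulty, ρ is rater severity, and d are step-difficulty parameters. The first substitution relaxes the assumption of constant rater severity across items; the second relaxes the assumption of an equal-interval rating scale shared by all items.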
ISSN: 1932-6203
DOI:10.1371/journal.pone.0309887