Reviewing and analyzing peer review Inter-Rater Reliability in a MOOC platform

Bibliographic details
Published in:	Computers and education 2020-09, Vol.154, p.103894, Article 103894
Main authors:	Garcia-Loro, Felix; Martin, Sergio; Ruipérez-Valiente, José A.; Sancristobal, Elio; Castro, Manuel
Format:	Article
Language:	English
Keywords:	Inter-rater reliability (IRR); Krippendorff's alpha; MOOCs; Peer assessment; Reliability
Online access:	Full text

Description
Abstract:	Peer assessment activities might be one of the few personalized assessment alternatives to auto-graded activities at scale in Massive Open Online Course (MOOC) environments. However, teachers' motivation to implement peer assessment activities in their courses might go beyond the most straightforward goal (i.e., assessment), as these activities also have side benefits, such as evidencing and enhancing students' critical thinking, comprehension or writing capabilities. One of the main drawbacks of implementing peer review activities, especially when the scoring is meant to be used as part of the summative assessment, is that it adds a high degree of uncertainty to the grades. Motivated by this issue, this paper analyzes the reliability of all the peer assessment activities performed on UNED-COMA, the MOOC platform of the Spanish University for Distance Education (UNED). The study analyzes 63 peer assessment activities from the different courses on the platform, comprising a total of 27,745 validated tasks and 93,334 peer reviews. Based on Krippendorff's alpha statistic, which measures the agreement reached between reviewers, the results clearly point to the low reliability, and therefore the low validity, of this dataset of peer reviews. We did not find that factors such as the topic of the course, the number of raters or the number of criteria to be evaluated had a significant effect on reliability. We compare our results with other studies, discuss the potential implications of this low reliability for summative assessment, and provide some recommendations to maximize the benefit of implementing peer activities in online courses.
• Based on Krippendorff's alpha, we have analyzed the reliability of peer assessment in MOOCs.
• The dataset includes 63 activities, 27,745 validated tasks and 93,334 reviews.
• The results obtained clearly point out the low reliability, thus low validity.
• There is no relationship between the topic of the course and the reliability.
• Courses with more than six tasks present an improvement in the reliability.
ISSN:	0360-1315 1873-782X
DOI:	10.1016/j.compedu.2020.103894
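
The abstract relies on Krippendorff's alpha as its measure of agreement between reviewers. As an illustration only, the following minimal Python sketch computes alpha as 1 minus the ratio of observed to expected disagreement. The function name, the input layout (one list of scores per reviewed task) and the squared-difference (interval) metric are assumptions made for this example; the record does not state which difference function or software the authors used.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha for interval-scaled scores (illustrative sketch).

    `units` is a list of lists: one inner list per reviewed task, holding
    the numeric scores given by its reviewers. Missing reviews are simply
    left out, so inner lists may differ in length.
    """
    # Only tasks with at least two scores can show (dis)agreement.
    pairable = [list(u) for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one task with two or more scores")

    # Observed disagreement: squared differences between scores of the
    # same task, each task weighted by 1 / (number of scores - 1).
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: squared differences between all pairs of
    # scores, regardless of which task they belong to (O(n^2) loop,
    # acceptable for a small example).
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("all scores are identical; alpha is undefined")

    return 1.0 - d_o / d_e


if __name__ == "__main__":
    # Hypothetical scores: three tasks, each graded by two or three peers.
    example = [[7, 8, 7], [4, 9], [6, 6, 5]]
    print(krippendorff_alpha_interval(example))
```

A value of 1 indicates perfect agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement. For ordinal or nominal rubrics, the squared difference would need to be replaced by the corresponding difference function.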