Evaluating regression algorithms at the instance level using item response theory


Bibliographic Details
Published in: Knowledge-Based Systems, 2022-03, Vol. 240, p. 108076, Article 108076
Main Authors: Moraes, João V.C., Reinaldo, Jéssica T.S., Ferreira-Junior, Manuel, Filho, Telmo Silva, Prudêncio, Ricardo B.C.
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Abstract: Algorithm evaluation is a central task across Machine Learning (ML) problems. In this work, we adopt a different perspective on ML evaluation, in which algorithms are evaluated at the instance level. In this perspective, Item Response Theory (IRT) has recently been applied to algorithm evaluation in ML in order to identify which instances in a dataset are more difficult and more discriminating, while also evaluating algorithms based on their predictions for instances with different difficulty values. In IRT, a strong algorithm returns accurate predictions for the most difficult instances while maintaining consistent behaviour on the easiest ones. The most common IRT models adopted in the literature only deal with dichotomous responses (i.e., a response has to be either correct or incorrect). This is suitable for evaluating classification algorithms, but not adequate in application contexts where responses are recorded on a continuous scale without an upper bound, such as regression. In this paper we propose the Γ-IRT model, designed specifically for positive unbounded responses, which we model using a Gamma distribution parameterised according to respondent ability and item difficulty and discrimination parameters. The proposed parameterisation results in item characteristic curves with more flexible shapes than the traditional logistic curves adopted in IRT. We apply the proposed model to evaluate student responses (number of errors) on open-ended questions extracted from Statistics exams. Then, we use Γ-IRT to assess regression model abilities, where responses are the absolute errors on test instances. This novel application represents an alternative for evaluating regression performance and for identifying regions of a regression dataset that present different levels of difficulty and discrimination.
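The abstract describes the model only at a high level; the paper's actual item characteristic curve is not given in this record. As a minimal illustrative sketch, one might assume the expected response (e.g., a regression model's absolute error on an instance) decays as respondent ability exceeds item difficulty, with discrimination controlling how sharply it decays, and responses drawn from a Gamma distribution with that mean. All function names and the exponential-decay form below are hypothetical, chosen only to illustrate the general idea of an IRT model over positive unbounded responses:

```python
import random


def expected_error(ability, difficulty, discrimination):
    """Hypothetical expected response for a Gamma-based IRT model.

    Assumption (not from the paper): the mean response decays
    exponentially as ability exceeds difficulty; discrimination
    controls the steepness of the curve.
    """
    return pow(2.718281828459045, -discrimination * (ability - difficulty))


def sample_response(ability, difficulty, discrimination, shape=2.0):
    """Draw a positive, unbounded response from a Gamma distribution.

    The Gamma mean equals shape * scale, so the scale is set to
    match the expected error above.
    """
    mean = expected_error(ability, difficulty, discrimination)
    scale = mean / shape
    return random.gammavariate(shape, scale)


# Usage: a higher-ability respondent has a lower expected error on
# the same item, and every sampled response is strictly positive.
low = expected_error(ability=0.0, difficulty=1.0, discrimination=1.5)
high = expected_error(ability=2.0, difficulty=1.0, discrimination=1.5)
draws = [sample_response(1.0, 0.5, 1.5) for _ in range(5)]
```

Under this stand-in parameterisation, the curve is logistic-free and unbounded above, which loosely mirrors the flexibility the abstract attributes to Γ-IRT relative to traditional logistic item characteristic curves.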
ISSN: 0950-7051
1872-7409
DOI: 10.1016/j.knosys.2021.108076