Evaluating regression algorithms at the instance level using item response theory



Bibliographic details
Published in: Knowledge-based systems, 2022-03, Vol. 240, p. 108076, Article 108076
Authors: Moraes, João V.C., Reinaldo, Jéssica T.S., Ferreira-Junior, Manuel, Filho, Telmo Silva, Prudêncio, Ricardo B.C.
Format: Article
Language: English
Online access: Full text
Description: Algorithm evaluation is a very important task for different Machine Learning (ML) problems. In this work, we assume a different perspective in ML evaluation, in which algorithms are evaluated at the instance level. In this perspective, Item Response Theory (IRT) has recently been applied to algorithm evaluation in ML in order to identify which instances are more difficult and discriminating in a dataset, while also evaluating algorithms based on their predictions for instances with different difficulty values. In IRT, a strong algorithm returns accurate predictions for the most difficult instances, while maintaining a consistent behaviour in the easiest instances. The most common IRT models adopted in the literature only deal with dichotomous responses (i.e., a response has to be either correct or incorrect). This is suitable for evaluating classification algorithms, but not adequate in application contexts where responses are recorded in a continuous scale without an upper bound, such as regression. In this paper we propose the Γ-IRT model, particularly designed for dealing with positive unbounded responses, which we model using a Gamma distribution, parameterised according to respondent ability and item difficulty and discrimination parameters. The proposed parameterisation results in item characteristic curves with more flexible shapes compared to the traditional logistic curves adopted in IRT. We apply the proposed model to evaluate student responses (number of errors) in open-ended questions extracted from Statistics exams. Then, we use Γ-IRT to assess regression model abilities, where responses are the absolute errors in test instances. This novel application represents an alternative for evaluating regression performance and for identifying regions in a regression dataset that present different levels of difficulty and discrimination.
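The abstract above describes the core idea of Γ-IRT: a positive, unbounded response (e.g. a regression model's absolute error on a test instance) is modelled with a Gamma distribution whose mean depends on respondent ability and item difficulty/discrimination. The sketch below is only an illustration of that idea, not the paper's actual model: the exponential mean function, the fixed shape parameter, and all function names here are assumptions.

```python
import numpy as np

def gamma_irt_expected_error(theta, delta, a):
    """Expected positive, unbounded response (e.g. absolute error).

    Illustrative parameterization only (an assumption, NOT the paper's
    formula): the expected error decays as ability `theta` exceeds item
    difficulty `delta`, with `a` controlling how sharply the item
    characteristic curve discriminates between abilities.
    """
    return np.exp(a * (delta - theta))

def sample_response(theta, delta, a, shape=2.0, rng=None):
    """Draw one Gamma-distributed response with the mean above.

    A Gamma(shape, scale) variable has mean shape * scale, so we set
    scale = mean / shape so the draw matches the characteristic curve.
    """
    rng = rng or np.random.default_rng(0)
    mean = gamma_irt_expected_error(theta, delta, a)
    return rng.gamma(shape, scale=mean / shape)

# Under this sketch, a stronger respondent (higher theta) has a smaller
# expected error on the same item, and a harder item (higher delta)
# elicits larger errors from the same respondent.
weak = gamma_irt_expected_error(-1.0, 0.5, 1.0)
strong = gamma_irt_expected_error(2.0, 0.5, 1.0)
```

In this toy version the "item characteristic curve" is the expected error as a function of ability; the Gamma likelihood keeps responses strictly positive and right-skewed, which is the property the abstract highlights for regression errors.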
DOI: 10.1016/j.knosys.2021.108076
Publisher: Elsevier B.V., Amsterdam
ISSN: 0950-7051
EISSN: 1872-7409
Source: Elsevier ScienceDirect Journals
Subjects: Algorithms; Datasets; Discrimination; Errors; Item response theory; Machine learning; Parameterization; Performance evaluation; Probability distribution functions; Regression models; Regression tasks; Statistical analysis; Student ability; Upper bounds