Evaluating regression algorithms at the instance level using item response theory
Saved in:
Published in: | Knowledge-based systems 2022-03, Vol.240, p.108076, Article 108076 |
---|---|
Main authors: | Moraes, João V.C.; Reinaldo, Jéssica T.S.; Ferreira-Junior, Manuel; Filho, Telmo Silva; Prudêncio, Ricardo B.C. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
container_issue | |
container_start_page | 108076 |
container_title | Knowledge-based systems |
container_volume | 240 |
creator | Moraes, João V.C.; Reinaldo, Jéssica T.S.; Ferreira-Junior, Manuel; Filho, Telmo Silva; Prudêncio, Ricardo B.C. |
description | Algorithm evaluation is a very important task for different Machine Learning (ML) problems. In this work, we assume a different perspective in ML evaluation, in which algorithms are evaluated at the instance level. In this perspective, Item Response Theory (IRT) has recently been applied to algorithm evaluation in ML in order to identify which instances are more difficult and discriminating in a dataset, while also evaluating algorithms based on their predictions for instances with different difficulty values. In IRT, a strong algorithm returns accurate predictions for the most difficult instances, while maintaining a consistent behaviour in the easiest instances. The most common IRT models adopted in the literature only deal with dichotomous responses (i.e., a response has to be either correct or incorrect). This is suitable for evaluating classification algorithms, but not adequate in application contexts where responses are recorded in a continuous scale without an upper bound, such as regression. In this paper we propose the Γ-IRT model, particularly designed for dealing with positive unbounded responses, which we model using a Gamma distribution, parameterised according to respondent ability and item difficulty and discrimination parameters. The proposed parameterisation results in item characteristic curves with more flexible shapes compared to the traditional logistic curves adopted in IRT. We apply the proposed model to evaluate student responses (number of errors) in open-ended questions extracted from Statistics exams. Then, we use Γ-IRT to assess regression model abilities, where responses are the absolute errors in test instances. This novel application represents an alternative for evaluating regression performance and for identifying regions in a regression dataset that present different levels of difficulty and discrimination. |
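The abstract describes responses modelled with a Gamma distribution whose parameters depend on respondent ability and on item difficulty and discrimination. The sketch below illustrates that general idea in Python; the exponential link, the fixed shape parameter, and the names `expected_response` and `sample_response` are assumptions chosen for illustration, not the paper's actual Γ-IRT parameterisation.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_response(theta, difficulty, discrimination):
    # Illustrative link (hypothetical, not the paper's exact form):
    # the expected error grows when item difficulty exceeds respondent
    # ability; discrimination controls how steeply it grows.
    return np.exp(discrimination * (difficulty - theta))

def sample_response(theta, difficulty, discrimination, shape=2.0):
    # Positive, unbounded response drawn from a Gamma distribution
    # with the mean above (Gamma mean = shape * scale).
    mu = expected_response(theta, difficulty, discrimination)
    return rng.gamma(shape, scale=mu / shape)

# On the same item, a strong respondent (high theta) has a smaller
# expected error than a weak one.
strong = expected_response(theta=2.0, difficulty=1.0, discrimination=1.0)
weak = expected_response(theta=-1.0, difficulty=1.0, discrimination=1.0)
```

Under this assumed link, the ability-difficulty gap enters through an exponential, so the resulting item characteristic curves can take asymmetric shapes rather than the fixed logistic shape of dichotomous IRT models.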
doi_str_mv | 10.1016/j.knosys.2021.108076 |
format | Article |
publisher | Amsterdam: Elsevier B.V. |
eissn | 1872-7409 |
orcidid | https://orcid.org/0000-0003-0826-6885 |
fulltext | fulltext |
identifier | ISSN: 0950-7051 |
ispartof | Knowledge-based systems, 2022-03, Vol.240, p.108076, Article 108076 |
issn | 0950-7051 1872-7409 |
language | eng |
recordid | cdi_proquest_journals_2639693732 |
source | Elsevier ScienceDirect Journals |
subjects | Algorithms; Datasets; Discrimination; Errors; Item response theory; Machine learning; Parameterization; Performance evaluation; Probability distribution functions; Regression models; Regression tasks; Statistical analysis; Student ability; Upper bounds |
title | Evaluating regression algorithms at the instance level using item response theory |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T08%3A14%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluating%20regression%20algorithms%20at%20the%20instance%20level%20using%20item%20response%20theory&rft.jtitle=Knowledge-based%20systems&rft.au=Moraes,%20Jo%C3%A3o%20V.C.&rft.date=2022-03-15&rft.volume=240&rft.spage=108076&rft.pages=108076-&rft.artnum=108076&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2021.108076&rft_dat=%3Cproquest_cross%3E2639693732%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2639693732&rft_id=info:pmid/&rft_els_id=S0950705121011515&rfr_iscdi=true |