Evaluating regression algorithms at the instance level using item response theory



Bibliographic details
Published in: Knowledge-based systems, 2022-03, Vol. 240, p. 108076, Article 108076
Authors: Moraes, João V.C., Reinaldo, Jéssica T.S., Ferreira-Junior, Manuel, Filho, Telmo Silva, Prudêncio, Ricardo B.C.
Format: Article
Language: English
Online access: Full text
Description: Algorithm evaluation is a very important task for different Machine Learning (ML) problems. In this work, we assume a different perspective in ML evaluation, in which algorithms are evaluated at the instance level. In this perspective, Item Response Theory (IRT) has recently been applied to algorithm evaluation in ML in order to identify which instances are more difficult and discriminating in a dataset, while also evaluating algorithms based on their predictions for instances with different difficulty values. In IRT, a strong algorithm returns accurate predictions for the most difficult instances, while maintaining a consistent behaviour in the easiest instances. The most common IRT models adopted in the literature only deal with dichotomous responses (i.e., a response has to be either correct or incorrect). This is suitable for evaluating classification algorithms, but not adequate in application contexts where responses are recorded in a continuous scale without an upper bound, such as regression. In this paper we propose the Γ-IRT model, particularly designed for dealing with positive unbounded responses, which we model using a Gamma distribution, parameterised according to respondent ability and item difficulty and discrimination parameters. The proposed parameterisation results in item characteristic curves with more flexible shapes compared to the traditional logistic curves adopted in IRT. We apply the proposed model to evaluate student responses (number of errors) in open-ended questions extracted from Statistics exams. Then, we use Γ-IRT to assess regression model abilities, where responses are the absolute errors in test instances. This novel application represents an alternative for evaluating regression performance and for identifying regions in a regression dataset that present different levels of difficulty and discrimination.
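The abstract above describes the core idea of Γ-IRT: a positive, unbounded response (e.g. a regression model's absolute error on a test instance) is modelled with a Gamma distribution whose mean depends on respondent ability and item difficulty/discrimination. The sketch below is only an illustration of that idea, not the paper's actual model: the exponential mean function, the fixed shape parameter, and all function names here are assumptions.

```python
import numpy as np

def gamma_irt_expected_error(theta, delta, a):
    """Expected positive, unbounded response (e.g. absolute error).

    Illustrative parameterization only (an assumption, NOT the paper's
    formula): the expected error decays as ability `theta` exceeds item
    difficulty `delta`, with `a` controlling how sharply the item
    characteristic curve discriminates between abilities.
    """
    return np.exp(a * (delta - theta))

def sample_response(theta, delta, a, shape=2.0, rng=None):
    """Draw one Gamma-distributed response with the mean above.

    A Gamma(shape, scale) variable has mean shape * scale, so we set
    scale = mean / shape so the draw matches the characteristic curve.
    """
    rng = rng or np.random.default_rng(0)
    mean = gamma_irt_expected_error(theta, delta, a)
    return rng.gamma(shape, scale=mean / shape)

# Under this sketch, a stronger respondent (higher theta) has a smaller
# expected error on the same item, and a harder item (higher delta)
# elicits larger errors from the same respondent.
weak = gamma_irt_expected_error(-1.0, 0.5, 1.0)
strong = gamma_irt_expected_error(2.0, 0.5, 1.0)
```

In this toy version the "item characteristic curve" is the expected error as a function of ability; the Gamma likelihood keeps responses strictly positive and right-skewed, which is the property the abstract highlights for regression errors.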
DOI: 10.1016/j.knosys.2021.108076
Publisher: Elsevier B.V., Amsterdam
ISSN: 0950-7051
EISSN: 1872-7409
Source: Elsevier ScienceDirect Journals
Subjects: Algorithms; Datasets; Discrimination; Errors; Item response theory; Machine learning; Parameterization; Performance evaluation; Probability distribution functions; Regression models; Regression tasks; Statistical analysis; Student ability; Upper bounds