BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration

In this paper, we present a complete framework for quickly calibrating and administering a robust large-scale computerized adaptive test (CAT) with a small number of responses. Calibration - learning item parameters in a test - is done using AutoIRT, a new method that uses automated machine learning...

Detailed description

Bibliographic details
Published in: arXiv.org, 2024-10
Main authors: Sharpnack, James; Hao, Kevin; Mulcaire, Phoebe; Bicknell, Klinton; LaFlair, Geoff; Yancey, Kevin; von Davier, Alina A
Format: Article
Language: English
Subjects: Adaptive control; Calibration; Fisher information; Item response theory; Machine learning; Noise control
Online access: Full text
container_title arXiv.org
creator Sharpnack, James
Hao, Kevin
Mulcaire, Phoebe
Bicknell, Klinton
LaFlair, Geoff
Yancey, Kevin
von Davier, Alina A
description In this paper, we present a complete framework for quickly calibrating and administering a robust large-scale computerized adaptive test (CAT) with a small number of responses. Calibration - learning item parameters in a test - is done using AutoIRT, a new method that uses automated machine learning (AutoML) in combination with item response theory (IRT), originally proposed in [Sharpnack et al., 2024]. AutoIRT trains a non-parametric AutoML grading model using item features, followed by an item-specific parametric model, which results in an explanatory IRT model. In our work, we use tabular AutoML tools (AutoGluon.tabular, [Erickson et al., 2020]) along with BERT embeddings and linguistically motivated NLP features. In this framework, we use Bayesian updating to obtain test taker ability posterior distributions for administration and scoring. For administration of our adaptive test, we propose the BanditCAT framework, a methodology motivated by casting the problem in the contextual bandit framework and utilizing item response theory (IRT). The key insight lies in defining the bandit reward as the Fisher information for the selected item, given the latent test taker ability from IRT assumptions. We use Thompson sampling to balance between exploring items with different psychometric characteristics and selecting highly discriminative items that give more precise information about ability. To control item exposure, we inject noise through an additional randomization step before computing the Fisher information. This framework was used to initially launch two new item types on the DET practice test using limited training data. We outline some reliability and exposure metrics for the 5 practice test experiments that utilized this framework.
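The administration loop summarized in the description can be illustrated with a minimal sketch, assuming a standard 2PL IRT model: the test taker's ability is tracked as a discretized Bayesian posterior, Thompson sampling draws an ability value from that posterior, and candidate items are scored by their Fisher information with injected noise standing in for the exposure-control randomization step. All function names, item parameters, grid settings, and the specific noise scheme below are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only (assumed 2PL IRT; not the paper's actual code).
import numpy as np

THETA = np.linspace(-4.0, 4.0, 161)       # discretized ability grid
rng = np.random.default_rng(0)

def p_correct(theta, a, b):
    # 2PL probability of a correct response (a: discrimination, b: difficulty)
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    # Fisher information of a 2PL item: a^2 * p * (1 - p)
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def update_posterior(posterior, response, a, b):
    # Bayesian update of the ability posterior after observing one response
    p = p_correct(THETA, a, b)
    posterior = posterior * (p if response == 1 else 1.0 - p)
    return posterior / posterior.sum()

def select_item(posterior, items, administered, noise_sd=0.1):
    # Thompson sampling: draw an ability from the posterior, then pick the
    # unused item with the highest noise-perturbed Fisher information
    # (the added noise is a rough stand-in for exposure control).
    theta_draw = rng.choice(THETA, p=posterior)
    scores = {
        i: fisher_information(theta_draw + rng.normal(0.0, noise_sd), a, b)
        for i, (a, b) in enumerate(items) if i not in administered
    }
    return max(scores, key=scores.get)

# Toy run: simulated item bank and a test taker with true ability 0.5
items = [(rng.uniform(0.5, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(20)]
posterior = np.exp(-0.5 * THETA ** 2)     # standard normal prior on the grid
posterior /= posterior.sum()
administered = set()
for _ in range(5):
    i = select_item(posterior, items, administered)
    administered.add(i)
    a, b = items[i]
    response = int(rng.random() < p_correct(0.5, a, b))
    posterior = update_posterior(posterior, response, a, b)
print("posterior mean ability:", float(THETA @ posterior))

In this toy run the posterior mean after a handful of adaptively selected items serves as the ability estimate; the paper's framework additionally uses the calibrated AutoIRT item parameters and reports reliability and exposure metrics, which this sketch does not reproduce.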
format Article
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-10
issn 2331-8422
language eng
source Free E-Journals
subjects Adaptive control
Calibration
Fisher information
Item response theory
Machine learning
Noise control
title BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration