Scoring Graphical Responses in TIMSS 2019 Using Artificial Neural Networks
Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We compare the classification accuracy of convolutional and feed-forward approaches. Our results show that convolutional neural networks (CNNs) outperform feed-forward neural networks in both loss and accuracy. The CNN models classified up to 97.53% of the image responses into the appropriate scoring category, which is comparable to, if not more accurate than, typical human raters. These findings were further strengthened by the observation that the most accurate CNN models correctly classified some image responses that had been incorrectly scored by the human raters. As an additional innovation, we outline a method to select human-rated responses for the training sample based on an application of the expected response function derived from item response theory. This paper argues that CNN-based automated scoring of image responses is a highly accurate procedure that could potentially replace the workload and cost of second human raters for international large-scale assessments (ILSAs), while improving the validity and comparability of scoring complex constructed-response items.
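The comparison in the abstract, convolutional versus feed-forward classifiers trained on human-scored image responses, can be illustrated with a brief sketch. This is not the authors' implementation: the choice of Keras, the input resolution, the number of scoring categories, the layer sizes, and names such as build_cnn, build_feedforward, train_images, and human_scores are illustrative assumptions.

```python
# Minimal Keras sketch (not the study's code) of the two model families compared
# in the paper: a convolutional classifier and a feed-forward baseline that both
# map a scanned graphical response to one of several scoring categories.
from tensorflow.keras import layers, models

NUM_CATEGORIES = 3          # assumed number of scoring categories for the item
IMG_SHAPE = (128, 128, 1)   # assumed grayscale input resolution

def build_cnn():
    """Convolutional classifier: conv/pool blocks followed by a dense head."""
    return models.Sequential([
        layers.Input(shape=IMG_SHAPE),
        layers.Rescaling(1.0 / 255),            # scale pixel values to [0, 1]
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_CATEGORIES, activation="softmax"),
    ])

def build_feedforward():
    """Feed-forward baseline: the same pixels, flattened, through dense layers only."""
    return models.Sequential([
        layers.Input(shape=IMG_SHAPE),
        layers.Rescaling(1.0 / 255),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_CATEGORIES, activation="softmax"),
    ])

for model in (build_cnn(), build_feedforward()):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, human_scores, validation_split=0.2, epochs=10)
```

Both models would be trained on the same human-scored responses; the usual explanation for the CNN advantage reported in the abstract is that convolutional blocks exploit local spatial structure in the drawing, whereas the feed-forward baseline sees only a flattened pixel vector.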
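The abstract also mentions selecting human-rated responses for the training sample via the expected response function from item response theory. The details of that selection procedure are given in the article itself; as background only, one common formalization of an item's expected response function is the model-implied expected item score at a given ability level, written here with generic notation (X_j, P_{jx}(θ), m_j) that is not taken from the paper:

```latex
% Expected response function of item j with score categories x = 0, 1, ..., m_j.
% P_{jx}(\theta) is the category probability under the chosen polytomous IRT model;
% the sum gives the expected observed score on the item at ability level \theta.
\[
  \operatorname{ERF}_j(\theta) \;=\; \mathrm{E}\!\left[ X_j \mid \theta \right]
  \;=\; \sum_{x=0}^{m_j} x \, P_{jx}(\theta).
\]
```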
Saved in:
Published in: | Educational and Psychological Measurement, 2023-06, Vol. 83 (3), p. 556-585 |
---|---|
Main authors: | von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale |
Format: | Article |
Language: | English |
Subjects: | Accuracy; Achievement Tests; Artificial Intelligence; Automation; Classification; Comparative Analysis; Computer Software; Elementary Secondary Education; Evaluators; Feedback (Response); Foreign Countries; Graphs; International Assessment; Item Analysis; Item Response Theory; Mathematics Achievement; Mathematics Tests; Networks; Neural networks; Responses; Science Achievement; Science Tests; Scoring; Test Items; Test Validity |
Online access: | Full text |
Publisher: | SAGE Publications, Los Angeles, CA |
DOI: | 10.1177/00131644221098021 |
ISSN: | 0013-1644 |
EISSN: | 1552-3888 |
PMID: | 37187689 |
ERIC: | EJ1375633 |