Monitoring Rater Performance Over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use
In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advance...
Published in: | Journal of educational measurement 2009-12, Vol.46 (4), p.371-389 |
---|---|
Main authors: | Myford, Carol M.; Wolfe, Edward W. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
|
container_end_page | 389 |
---|---|
container_issue | 4 |
container_start_page | 371 |
container_title | Journal of educational measurement |
container_volume | 46 |
creator | Myford, Carol M.; Wolfe, Edward W. |
description | In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition examination, employing a multifaceted Rasch approach to determine whether raters exhibited evidence of two types of differential rater functioning over time (i.e., changes in levels of accuracy or scale category use). Some raters showed statistically significant changes in their levels of accuracy as the scoring progressed, while other raters displayed evidence of differential scale category use over time. |
doi_str_mv | 10.1111/j.1745-3984.2009.00088.x |
format | Article |
fullrecord | <record><control><sourceid>jstor_proqu</sourceid><recordid>TN_cdi_proquest_miscellaneous_853208662</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ericid>EJ864667</ericid><jstor_id>25651523</jstor_id><sourcerecordid>25651523</sourcerecordid><originalsourceid>FETCH-LOGICAL-c5188-810180bada96f0bb8fd89de033b46757eadef2cdbec2f8cc75c654aa941f425a3</originalsourceid><addsrcrecordid>eNqNkVFv0zAUhSMEEmXwD0CyeOEp4dqJHQfx0rXdYFoZYpv6aDnOzeQsTYadslb8eRyCKsETfrHkc75zrXuiiFBIaDjvm4TmGY_TQmYJAygSAJAy2T-JZkfhaTQDYCwGwfnz6IX3DQDlOaez6Oe67-zQO9vdkW96QEe-oqt7t9WdQXL1Izzc2C1-IHNy5vQWH3t3T4JOljigGUZsaesaHXaD1S2ZG7Nz2hyI7qq_lWujWySLMOOudwdy6_Fl9KzWrcdXf-6T6PZsdbP4FF9enX9ezC9jw6mUsaRAJZS60oWooSxlXcmiQkjTMhM5z1FXWDNTlWhYLY3JuRE807rIaJ0xrtOT6N2U--D67zv0g9pab7BtdYf9zivJUwZSCBacb_9xNv3OdeFzihZ5BlxmeTDJyWRc773DWj04u9XuoCiosRPVqHH1aly9GjtRvztR-4C-nlB01hyx1YUUmRBj8sdJfrQtHv47Vl2slmspA_5mwhsfGj3ijAtOOUuDHk-69QPuj7p29yoMz7nafDlXcE1hs5FrdZr-AqyHtVs</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>197405847</pqid></control><display><type>article</type><title>Monitoring Rater Performance Over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use</title><source>Applied Social Sciences Index & Abstracts (ASSIA)</source><source>Jstor Complete Legacy</source><source>Education Source</source><source>Wiley Online Library Journals Frontfile Complete</source><creator>Myford, Carol M. ; Wolfe, Edward W.</creator><creatorcontrib>Myford, Carol M. ; Wolfe, Edward W.</creatorcontrib><description>In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. 
To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition examination, employing a multifaceted Rasch approach to determine whether raters exhibited evidence of two types of differential rater functioning over time (i.e., changes in levels of accuracy or scale category use). Some raters showed statistically significant changes in their levels of accuracy as the scoring progressed, while other raters displayed evidence of differential scale category use over time.</description><identifier>ISSN: 0022-0655</identifier><identifier>EISSN: 1745-3984</identifier><identifier>DOI: 10.1111/j.1745-3984.2009.00088.x</identifier><identifier>CODEN: JEDMAA</identifier><language>eng</language><publisher>Malden, USA: Blackwell Publishing Inc</publisher><subject>Accuracy ; Advanced Placement ; Advanced Placement Examinations (CEEB) ; Advanced Placement Programs ; Central tendencies ; College entrance examinations ; Composition ; Correlations ; Educational Assessment ; Educational evaluation ; Educational research ; Educational Testing ; English Literature ; Essays ; Evaluation Methods ; Evaluation Problems ; Evaluation Research ; Measurement ; Measures (Individuals) ; Modeling ; Observational frames of reference ; Rasch model ; Rater Reliability ; Scoring ; Secondary Education ; Statistical significance ; Student Evaluation ; Student evaluation of teacher performance ; Studies ; Testing Problems ; Tests ; Writing (Composition) ; Written composition</subject><ispartof>Journal of educational measurement, 2009-12, Vol.46 (4), p.371-389</ispartof><rights>2009 The National Council on Measurement in Education</rights><rights>2009 by the National Council on Measurement in Education</rights><rights>Copyright National Council on Measurement in Education Winter 
2009</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c5188-810180bada96f0bb8fd89de033b46757eadef2cdbec2f8cc75c654aa941f425a3</citedby><cites>FETCH-LOGICAL-c5188-810180bada96f0bb8fd89de033b46757eadef2cdbec2f8cc75c654aa941f425a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/25651523$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/25651523$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>314,776,780,799,1411,27901,27902,30976,30977,45550,45551,57992,58225</link.rule.ids><backlink>$$Uhttp://eric.ed.gov/ERICWebPortal/detail?accno=EJ864667$$DView record in ERIC$$Hfree_for_read</backlink></links><search><creatorcontrib>Myford, Carol M.</creatorcontrib><creatorcontrib>Wolfe, Edward W.</creatorcontrib><title>Monitoring Rater Performance Over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use</title><title>Journal of educational measurement</title><description>In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition examination, employing a multifaceted Rasch approach to determine whether raters exhibited evidence of two types of differential rater functioning over time (i.e., changes in levels of accuracy or scale category use). 
Some raters showed statistically significant changes in their levels of accuracy as the scoring progressed, while other raters displayed evidence of differential scale category use over time.</description><subject>Accuracy</subject><subject>Advanced Placement</subject><subject>Advanced Placement Examinations (CEEB)</subject><subject>Advanced Placement Programs</subject><subject>Central tendencies</subject><subject>College entrance examinations</subject><subject>Composition</subject><subject>Correlations</subject><subject>Educational Assessment</subject><subject>Educational evaluation</subject><subject>Educational research</subject><subject>Educational Testing</subject><subject>English Literature</subject><subject>Essays</subject><subject>Evaluation Methods</subject><subject>Evaluation Problems</subject><subject>Evaluation Research</subject><subject>Measurement</subject><subject>Measures (Individuals)</subject><subject>Modeling</subject><subject>Observational frames of reference</subject><subject>Rasch model</subject><subject>Rater Reliability</subject><subject>Scoring</subject><subject>Secondary Education</subject><subject>Statistical significance</subject><subject>Student Evaluation</subject><subject>Student evaluation of teacher performance</subject><subject>Studies</subject><subject>Testing Problems</subject><subject>Tests</subject><subject>Writing (Composition)</subject><subject>Written 
composition</subject><issn>0022-0655</issn><issn>1745-3984</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><sourceid>7QJ</sourceid><recordid>eNqNkVFv0zAUhSMEEmXwD0CyeOEp4dqJHQfx0rXdYFoZYpv6aDnOzeQsTYadslb8eRyCKsETfrHkc75zrXuiiFBIaDjvm4TmGY_TQmYJAygSAJAy2T-JZkfhaTQDYCwGwfnz6IX3DQDlOaez6Oe67-zQO9vdkW96QEe-oqt7t9WdQXL1Izzc2C1-IHNy5vQWH3t3T4JOljigGUZsaesaHXaD1S2ZG7Nz2hyI7qq_lWujWySLMOOudwdy6_Fl9KzWrcdXf-6T6PZsdbP4FF9enX9ezC9jw6mUsaRAJZS60oWooSxlXcmiQkjTMhM5z1FXWDNTlWhYLY3JuRE807rIaJ0xrtOT6N2U--D67zv0g9pab7BtdYf9zivJUwZSCBacb_9xNv3OdeFzihZ5BlxmeTDJyWRc773DWj04u9XuoCiosRPVqHH1aly9GjtRvztR-4C-nlB01hyx1YUUmRBj8sdJfrQtHv47Vl2slmspA_5mwhsfGj3ijAtOOUuDHk-69QPuj7p29yoMz7nafDlXcE1hs5FrdZr-AqyHtVs</recordid><startdate>20091201</startdate><enddate>20091201</enddate><creator>Myford, Carol M.</creator><creator>Wolfe, Edward W.</creator><general>Blackwell Publishing Inc</general><general>Wiley Subscription Services</general><general>Wiley-Blackwell</general><general>Wiley Subscription Services, Inc</general><scope>BSCLL</scope><scope>7SW</scope><scope>BJH</scope><scope>BNH</scope><scope>BNI</scope><scope>BNJ</scope><scope>BNO</scope><scope>ERI</scope><scope>PET</scope><scope>REK</scope><scope>WWN</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QJ</scope></search><sort><creationdate>20091201</creationdate><title>Monitoring Rater Performance Over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use</title><author>Myford, Carol M. 
; Wolfe, Edward W.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c5188-810180bada96f0bb8fd89de033b46757eadef2cdbec2f8cc75c654aa941f425a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Accuracy</topic><topic>Advanced Placement</topic><topic>Advanced Placement Examinations (CEEB)</topic><topic>Advanced Placement Programs</topic><topic>Central tendencies</topic><topic>College entrance examinations</topic><topic>Composition</topic><topic>Correlations</topic><topic>Educational Assessment</topic><topic>Educational evaluation</topic><topic>Educational research</topic><topic>Educational Testing</topic><topic>English Literature</topic><topic>Essays</topic><topic>Evaluation Methods</topic><topic>Evaluation Problems</topic><topic>Evaluation Research</topic><topic>Measurement</topic><topic>Measures (Individuals)</topic><topic>Modeling</topic><topic>Observational frames of reference</topic><topic>Rasch model</topic><topic>Rater Reliability</topic><topic>Scoring</topic><topic>Secondary Education</topic><topic>Statistical significance</topic><topic>Student Evaluation</topic><topic>Student evaluation of teacher performance</topic><topic>Studies</topic><topic>Testing Problems</topic><topic>Tests</topic><topic>Writing (Composition)</topic><topic>Written composition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Myford, Carol M.</creatorcontrib><creatorcontrib>Wolfe, Edward W.</creatorcontrib><collection>Istex</collection><collection>ERIC</collection><collection>ERIC (Ovid)</collection><collection>ERIC</collection><collection>ERIC</collection><collection>ERIC (Legacy Platform)</collection><collection>ERIC( SilverPlatter )</collection><collection>ERIC</collection><collection>ERIC PlusText (Legacy Platform)</collection><collection>Education Resources Information Center 
(ERIC)</collection><collection>ERIC</collection><collection>CrossRef</collection><collection>Applied Social Sciences Index & Abstracts (ASSIA)</collection><jtitle>Journal of educational measurement</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Myford, Carol M.</au><au>Wolfe, Edward W.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><ericid>EJ864667</ericid><atitle>Monitoring Rater Performance Over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use</atitle><jtitle>Journal of educational measurement</jtitle><date>2009-12-01</date><risdate>2009</risdate><volume>46</volume><issue>4</issue><spage>371</spage><epage>389</epage><pages>371-389</pages><issn>0022-0655</issn><eissn>1745-3984</eissn><coden>JEDMAA</coden><abstract>In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition examination, employing a multifaceted Rasch approach to determine whether raters exhibited evidence of two types of differential rater functioning over time (i.e., changes in levels of accuracy or scale category use). Some raters showed statistically significant changes in their levels of accuracy as the scoring progressed, while other raters displayed evidence of differential scale category use over time.</abstract><cop>Malden, USA</cop><pub>Blackwell Publishing Inc</pub><doi>10.1111/j.1745-3984.2009.00088.x</doi><tpages>19</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0022-0655 |
ispartof | Journal of educational measurement, 2009-12, Vol.46 (4), p.371-389 |
issn | 0022-0655; 1745-3984 |
language | eng |
recordid | cdi_proquest_miscellaneous_853208662 |
source | Applied Social Sciences Index & Abstracts (ASSIA); Jstor Complete Legacy; Education Source; Wiley Online Library Journals Frontfile Complete |
subjects | Accuracy; Advanced Placement; Advanced Placement Examinations (CEEB); Advanced Placement Programs; Central tendencies; College entrance examinations; Composition; Correlations; Educational Assessment; Educational evaluation; Educational research; Educational Testing; English Literature; Essays; Evaluation Methods; Evaluation Problems; Evaluation Research; Measurement; Measures (Individuals); Modeling; Observational frames of reference; Rasch model; Rater Reliability; Scoring; Secondary Education; Statistical significance; Student Evaluation; Student evaluation of teacher performance; Studies; Testing Problems; Tests; Writing (Composition); Written composition |
title | Monitoring Rater Performance Over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T06%3A44%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Monitoring%20Rater%20Performance%20Over%20Time:%20A%20Framework%20for%20Detecting%20Differential%20Accuracy%20and%20Differential%20Scale%20Category%20Use&rft.jtitle=Journal%20of%20educational%20measurement&rft.au=Myford,%20Carol%20M.&rft.date=2009-12-01&rft.volume=46&rft.issue=4&rft.spage=371&rft.epage=389&rft.pages=371-389&rft.issn=0022-0655&rft.eissn=1745-3984&rft.coden=JEDMAA&rft_id=info:doi/10.1111/j.1745-3984.2009.00088.x&rft_dat=%3Cjstor_proqu%3E25651523%3C/jstor_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=197405847&rft_id=info:pmid/&rft_ericid=EJ864667&rft_jstor_id=25651523&rfr_iscdi=true |