The Effect of Year-to-Year Rater Variation on IRT Linking


Bibliographic Details
Main authors: Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg
Format: Report
Language: eng
Online access: Order full text
creator Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg
description Year-to-year rater variation may result in constructed-response (CR) parameter changes, making CR items inappropriate for use in anchor sets for linking or equating. This study demonstrates how rater severity affected writing and reading scores. Rater adjustments were made to statewide results using an item response theory (IRT) methodology based on the work of Tate (1999, 2000). A common-item equating design was used to place the second-year scores on the first-year scale after a re-score of the first-year test to adjust for rater effects. Two samples of data from contiguous years, designated Year 1 (n ~ 1,200) and Year 2 (n ~ 2,000), from the writing and reading portions of a statewide assessment were examined. The writing test consisted of 32, 36, and 40 selected-response items for grades 4, 6, and 8, respectively, plus a single writing prompt scored by two raters on a six-point scale (0-5), with the two ratings summed to form a composite. The reading test consisted of 75, 93, and 91 selected-response items and 12, 14, and 16 constructed-response items for grades 4, 6, and 8, respectively. All CR items in reading were scored on a three-point scale (0-2). The resulting item parameters were compared between Year 1 and Year 2, with and without rater adjustment. For writing, there were significant shifts in the parameters after rater adjustment. The p-values and test characteristic curves (TCCs) shifted across years when adjusted for rater effects. These parameter and TCC shifts manifested as changes in proficiency classifications before versus after adjustment. The results of the study suggest that raters were not consistently more severe or more lenient across grades or content areas, but the resulting rater error (severity or leniency) affected the scores and would produce misleading results if not taken into account. (Contains 6 tables and 6 figures.)
format Report
language eng
recordid cdi_eric_primary_ED503778
source ERIC - Full Text Only (Discovery)
subjects Evaluation Methods
Grade 4
Grade 6
Grade 8
Interrater Reliability
Item Response Theory
Measurement Techniques
Measures (Individuals)
Reading Tests
Scores
Scoring
Test Items
Testing Programs
Writing Tests
title The Effect of Year-to-Year Rater Variation on IRT Linking
url http://eric.ed.gov/ERICWebPortal/detail?accno=ED503778
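
A rough sense of the linking methodology summarized in the abstract above can be given in code. The sketch below is a simplified, hypothetical illustration, not the authors' actual procedure (which follows Tate's 1999, 2000 approach): it applies mean-sigma common-item linking to invented 2PL anchor-item parameters to place a Year 2 scale onto a Year 1 base scale, then compares the two test characteristic curves (TCCs), the comparison the abstract reports. All parameter values and helper names (icc_2pl, tcc) are assumptions made for illustration.

import numpy as np

def icc_2pl(theta, a, b):
    # 2PL item characteristic curve: P(correct response | theta)
    return 1.0 / (1.0 + np.exp(-1.7 * a * (theta - b)))

def tcc(theta, a_params, b_params):
    # Test characteristic curve: expected raw score over the anchor set
    return sum(icc_2pl(theta, a, b) for a, b in zip(a_params, b_params))

# Invented anchor-item estimates (a = discrimination, b = difficulty):
# Year 1 on the base scale; Year 2 on its own, not-yet-linked scale.
a_y1 = np.array([1.0, 1.2, 1.4, 0.9])
b_y1 = np.array([-0.2, 0.4, 1.1, -0.8])
a_y2 = np.array([0.9, 1.1, 1.3, 0.8])
b_y2 = np.array([-0.4, 0.2, 0.9, -1.0])

# Mean-sigma linking constants: theta2_on_1 = A * theta2 + B, chosen so the
# Year 2 anchor difficulties match the Year 1 scale in mean and spread.
A = b_y1.std() / b_y2.std()
B = b_y1.mean() - A * b_y2.mean()
b_linked = A * b_y2 + B
a_linked = a_y2 / A

# Compare TCCs on a theta grid. A residual gap after linking is the kind of
# displacement that, in the study's design, a re-score of the Year 1 test
# helps attribute to rater severity rather than to real proficiency change.
theta = np.linspace(-3, 3, 61)
gap = np.abs(tcc(theta, a_y1, b_y1) - tcc(theta, a_linked, b_linked))
print(f"max |TCC difference| across theta: {gap.max():.3f}")

Mean-sigma is only one of several linking methods; a characteristic-curve method such as Stocking-Lord, closer in spirit to a TCC-based comparison, would replace the two-line computation of A and B with a minimization over the TCC difference.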