The Effect of Year-to-Year Rater Variation on IRT Linking
Year-to-year rater variation may result in constructed-response (CR) parameter changes, making CR items inappropriate for use in anchor sets for linking or equating. This study demonstrates how rater severity affected writing and reading scores. Rater adjustments were made to statewide results using an item response theory (IRT) methodology based on the work of Tate (1999, 2000). A common-item equating design was used to place the second-year scores on the first-year scale after a re-score of the first-year test to adjust for rater effects.
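The abstract gives no computational detail, so the following is only a minimal sketch of the common-item linking step it describes, under stated assumptions: 2PL selected-response anchor items, mean/sigma transformation constants (one standard choice; the study may have used a different method, such as Stocking-Lord), and invented item parameters. None of the numbers below come from the paper.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-1.7 * a * (theta - b)))

def mean_sigma_constants(b_base, b_new):
    """Mean/sigma linking constants A, B placing the new scale on the base scale."""
    A = np.std(b_base, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_base) - A * np.mean(b_new)
    return A, B

def tcc(theta, a, b):
    """Test characteristic curve: expected raw score at each theta."""
    return p_2pl(theta[:, None], a[None, :], b[None, :]).sum(axis=1)

# Invented anchor-item parameters (NOT from the paper).
rng = np.random.default_rng(0)
a1 = rng.uniform(0.6, 1.6, size=20)   # Year 1 discriminations (base scale)
b1 = rng.normal(0.0, 1.0, size=20)    # Year 1 difficulties
a2, b2 = a1 / 0.9, 0.9 * b1 + 0.3     # Year 2 calibration of the same items,
                                      # simulated on a shifted/rescaled metric

A, B = mean_sigma_constants(b1, b2)   # recover the scale transformation
a2_on1, b2_on1 = a2 / A, A * b2 + B   # place Year 2 parameters on Year 1 scale

theta = np.linspace(-4, 4, 161)
shift_before = np.abs(tcc(theta, a1, b1) - tcc(theta, a2, b2)).max()
shift_after = np.abs(tcc(theta, a1, b1) - tcc(theta, a2_on1, b2_on1)).max()
print(f"max |TCC difference| before linking: {shift_before:.3f}")
print(f"max |TCC difference| after linking:  {shift_after:.3f}")
```

The study's central caution is that CR anchor items whose parameters drift with rater severity would distort exactly this transformation, which is why a rater adjustment precedes the linking.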
Saved in:
Main Authors: | Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg |
---|---|
Format: | Report |
Language: | eng |
Subjects: | Evaluation Methods; Grade 4; Grade 6; Grade 8; Interrater Reliability; Item Response Theory; Measurement Techniques; Measures (Individuals); Reading Tests; Scores; Scoring; Test Items; Testing Programs; Writing Tests |
Online Access: | Order full text |
creator | Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg |
description | Year-to-year rater variation may result in constructed-response (CR) parameter changes, making CR items inappropriate for use in anchor sets for linking or equating. This study demonstrates how rater severity affected writing and reading scores. Rater adjustments were made to statewide results using an item response theory (IRT) methodology based on the work of Tate (1999, 2000). A common-item equating design was used to place the second-year scores on the first-year scale after a re-score of the first-year test to adjust for rater effects. Two samples of data from contiguous years, designated Year 1 (n ~ 1,200) and Year 2 (n ~ 2,000), from the writing and reading portions of a statewide assessment were examined. The writing test consisted of 32, 36, and 40 selected-response items for grades 4, 6, and 8, respectively, plus a single writing prompt scored on a six-point scale (0-5) by two raters whose scores were summed for a composite. The reading test consisted of 75, 93, and 91 selected-response items and 12, 14, and 16 constructed-response items for grades 4, 6, and 8, respectively. All the CR items in reading were scored on a three-point scale (0-2). The resulting item parameters were compared between Year 1 and Year 2, with and without rater adjustment. For writing, there were significant shifts in the parameters after rater adjustment. The p-values and test characteristic curves (TCCs) shifted across years when adjusted for rater effects. These parameter and TCC shifts manifested as changes in proficiency classifications before and after adjustment. The results of the study suggest that raters were not consistently more severe or more lenient across grades or content areas, but the resulting rater error (severity or leniency) affected scores and would produce misleading results if not taken into account. (Contains 6 tables and 6 figures.) |
format | Report |
fulltext | fulltext_linktorsrc |
language | eng |
recordid | cdi_eric_primary_ED503778 |
source | ERIC - Full Text Only (Discovery) |
subjects | Evaluation Methods; Grade 4; Grade 6; Grade 8; Interrater Reliability; Item Response Theory; Measurement Techniques; Measures (Individuals); Reading Tests; Scores; Scoring; Test Items; Testing Programs; Writing Tests |
title | The Effect of Year-to-Year Rater Variation on IRT Linking |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T09%3A46%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-eric_GA5&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.btitle=The%20Effect%20of%20Year-to-Year%20Rater%20Variation%20on%20IRT%20Linking&rft.au=Yen,%20Shu%20Jing&rft.date=2005&rft_id=info:doi/&rft_dat=%3Ceric_GA5%3EED503778%3C/eric_GA5%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ericid=ED503778&rfr_iscdi=true |
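The record's description also mentions the rater-severity adjustment made before linking, following Tate (1999, 2000). The sketch below is a heavily simplified stand-in for that step, not Tate's procedure: it estimates each rater's severity as the mean deviation of their ratings from the two-rater consensus on double-scored responses and adds it back before forming the composite. Rater names, severity values, and data are all invented.

```python
import numpy as np

# Invented double-scored CR ratings (NOT the study's data). In the study,
# each writing prompt was scored 0-5 by two raters and the scores summed.
rng = np.random.default_rng(1)
quality = rng.uniform(0, 5, size=200)                # latent response quality
true_severity = {"rater_A": 0.4, "rater_B": -0.2}    # assumed severity offsets
ratings = {
    name: np.clip(np.rint(quality - s + rng.normal(0, 0.5, 200)), 0, 5)
    for name, s in true_severity.items()
}

# Estimate each rater's severity as the mean deviation of their ratings from
# the two-rater consensus; estimates are relative (they sum to zero).
consensus = np.mean(list(ratings.values()), axis=0)
est = {name: float(np.mean(consensus - r)) for name, r in ratings.items()}

# Add the estimated severity back before forming the composite; the adjusted
# composites would then feed the IRT recalibration and linking.
adjusted = {name: np.clip(r + est[name], 0, 5) for name, r in ratings.items()}
composite = sum(adjusted.values())
print({name: round(s, 2) for name, s in est.items()})
```

Because the estimates are anchored to the consensus rather than to an external criterion, a real application (as in the study) would instead re-score a common set of responses across years so that severity differences between rater pools can be separated from real score change.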