Sensitivity of Equating Results to Different Sampling Strategies

Bibliographic Details
Published in: Applied Measurement in Education 1990-01, Vol. 3 (1), p. 53-71
Main authors: Schmitt, Alicia P., Cook, Linda L., Dorans, Neil J., Eignor, Daniel R.
Format: Article
Language: English
Online access: Full text
Description
Summary: This article discusses the results of equating two parallel forms of the College Board Biology Achievement Test under three different sampling strategies. New-form data were collected at a fall administration of the test, and old-form data were collected at a spring administration. The spring group was much more able, as measured by test score, than the fall group. The three sampling strategies studied were representative sampling, matched sampling, and reference (or target) sampling. For each sampling strategy, five equating procedures were studied: Tucker and Levine unequally reliable linear equatings, frequency estimation equipercentile and chained equipercentile curvilinear equatings, and three-parameter logistic (3PL) item response theory (IRT) true-score equating. The criterion for comparison in all cases was a Tucker linear equating from a fall new-form/fall old-form representative sampling data collection design. Matching on a set of common items produced greater agreement among the equating procedures than was obtained under representative sampling. In addition, for all equating procedures, equating with samples matched on common-item scores agreed more closely with the criterion equating than did equating with representative samples. Matching to an external target population produced agreement among methods, but the results did not agree as closely with the criterion equating as those from matching to the new form on the basis of common-item scores. The equating models least affected by differences between new-form and old-form sample abilities were the Tucker and frequency estimation equipercentile models; the procedure most affected by ability differences was the 3PL IRT procedure.
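For readers unfamiliar with the criterion method named in the summary, the following is a minimal sketch of Tucker linear equating for a common-item design, using the commonly published synthetic-population formulas. It is not taken from the article: the function name `tucker_linear`, its arguments, and the default weight `w1=0.5` are illustrative assumptions, and the article's actual data handling may have differed.

```python
from math import sqrt
from statistics import fmean, pvariance


def _cov(a, b):
    """Population covariance of two equal-length score lists."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def tucker_linear(x1, v1, y2, v2, w1=0.5):
    """Return a function mapping new-form scores (X) to the old-form scale (Y).

    x1, v1: total and common-item scores for the group that took the new form
    y2, v2: total and common-item scores for the group that took the old form
    w1:     synthetic-population weight for the new-form group (assumed 0.5)
    """
    w2 = 1.0 - w1
    # Regression slopes of total score on common-item score within each group.
    g1 = _cov(x1, v1) / pvariance(v1)
    g2 = _cov(y2, v2) / pvariance(v2)
    # Group differences on the common items drive the adjustment.
    d_mean = fmean(v1) - fmean(v2)
    d_var = pvariance(v1) - pvariance(v2)
    # Synthetic-population means and variances for each form.
    mu_x = fmean(x1) - w2 * g1 * d_mean
    mu_y = fmean(y2) + w1 * g2 * d_mean
    var_x = pvariance(x1) - w2 * g1**2 * d_var + w1 * w2 * g1**2 * d_mean**2
    var_y = pvariance(y2) + w1 * g2**2 * d_var + w1 * w2 * g2**2 * d_mean**2
    slope = sqrt(var_y / var_x)
    return lambda x: mu_y + slope * (x - mu_x)
```

When the two groups have identical common-item distributions, the adjustment terms vanish and the function reduces to ordinary mean-and-standard-deviation linear equating. The larger the ability gap on the common items, the more the result depends on the regression assumptions, which is the kind of sensitivity the sampling strategies in the summary were designed to probe.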
ISSN: 0895-7347, 1532-4818
DOI: 10.1207/s15324818ame0301_5