Test publisher Pearson, which has a five-year, $468 million contract to create Texas’s state exams through 2015, has responded to research by University of Texas at Austin professor Walter Stroup that points to a serious design flaw embedded in the tests, the Texas Tribune reports.
The company’s senior vice president for measurement services, Denny Way, said the company welcomes an open dialogue about the role of standardized tests in student and school evaluation, but that such discussion should not be based on Stroup’s “wild conclusions,” which Way maintains would not hold up under expert scrutiny.
Analyses conducted by Stroup and two other UT Austin researchers call into question Pearson’s application of “item response theory,” a popular method among test publishers for devising standardized exams. Using I.R.T., test developers select questions based on a model that correlates students’ ability with the probability that they will answer a question correctly.
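The article does not say which specific I.R.T. model Pearson uses, but a common instance is the two-parameter logistic (2PL) model, which expresses the probability of a correct answer as a function of a student’s ability and an item’s difficulty and discrimination. A minimal sketch, with all parameter values purely illustrative:

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic (2PL) IRT model.

    theta -- student ability
    a     -- item discrimination (how sharply the item separates abilities)
    b     -- item difficulty (ability level at which P(correct) = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Illustrative item (not from the article): moderate discrimination,
# average difficulty, given to students of varying ability.
for theta in (-1.0, 0.0, 1.0):
    print(f"ability {theta:+.1f}: P(correct) = {p_correct(theta, 1.2, 0.0):.2f}")
```

Under this model, a student whose ability matches the item’s difficulty has a 50 percent chance of answering correctly; test developers use such curves to pick items that spread students out along the ability scale, which is the property Stroup’s critique targets.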
According to Stroup’s research, this method -- when applied to the creation of large-scale standardized exams -- produces a test that is more sensitive to how students rank relative to one another than to what they have actually learned from year to year.
This explains, Stroup says, why students’ scores on the previous year’s Texas Assessment of Knowledge and Skills were better predictors of their scores the following year than a district-administered benchmark test they had taken only a few months earlier.
Way rejected this claim, stating that any correlation between students’ scores probably reflects the fact that students retain what they have learned and build on that knowledge.
The Texas Education Agency also issued a statement defending the State of Texas Assessments of Academic Readiness, which ninth-grade students began taking this past school year, and the TAKS, which the state has used since 2003. According to the agency, the exams “are designed and administered in a transparent and highly scrutinized process” and “routinely undergo” reviews for technical quality.
Howard Everson, former vice president for research at the College Board, told the Texas Tribune that because statewide accountability exams are developed to compare school districts, they are not a good indicator of how well classroom instructional practices are working within a district. Thus, he said, these tests do not represent the best measures for evaluating the quality of educational programs.
Stroup has said that if his research is correct, and the tests do not sufficiently measure the quality of instruction, Pearson should justify their use in the state’s accountability system -- and in a public hearing, not behind the TEA’s closed doors.