07/30/2012 05:03 pm ET | Updated Jul 30, 2012

Texas State Standardized Tests May Have Serious Design Flaw, UT Austin Professor Walter Stroup Says

A University of Texas at Austin professor and two other researchers believe they have found a serious design flaw in Texas’s state standardized tests, the Texas Tribune reports.

In 2006, middle school students enrolled in a Dallas-area math pilot program organized by UT Austin professor Walter Stroup showed marked improvement in their understanding of mathematical concepts. By the end of the year, however, their scores had increased only marginally on state standardized Texas Assessment of Knowledge and Skills tests.

The discrepancy re-emerged when the researchers compared students’ scores on midyear benchmark tests with their scores on end-of-year exams. Students’ standardized test scores from the previous year were better predictors of their scores the following year than the benchmark tests they had taken only a few months earlier.

Stroup and his co-workers believe there is a glitch in the state exams that stems from test developer Pearson’s use of “item response theory” in devising questions. Using I.R.T., test publishers -- not limited to Pearson -- select questions based on a model that correlates students’ ability with the probability that they will answer a question correctly.
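As a rough illustration of what such a model looks like, the sketch below uses the two-parameter logistic function, a common textbook form of item response theory. The article does not say which model Pearson uses, so the function name `p_correct`, its parameters, and the sample values are assumptions for illustration only.

```python
import math


def p_correct(ability, difficulty, discrimination=1.0):
    """Two-parameter logistic item response function (illustrative only).

    Returns the modeled probability that a student with the given ability
    answers an item of the given difficulty correctly. A higher
    discrimination makes the curve steeper around the item's difficulty.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical item with difficulty 0.0 and discrimination 1.5,
# evaluated for a weak, an average, and a strong student.
for ability in (-2.0, 0.0, 2.0):
    print(ability, round(p_correct(ability, 0.0, 1.5), 3))
```

In this form, an item with a steep curve separates weaker students from stronger ones sharply, which is one way a test could end up better at ranking students than at registering what they have recently learned.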

According to Stroup, this produces a test that is more sensitive to ranking students than to measuring what they have learned.

Pearson has a five-year, $468 million contract to create Texas’s TAKS tests through 2015. The midyear benchmark exams students took, by contrast, were developed by the district rather than by Pearson, which the researchers believe explains the discrepancy in scores.

“I’ve thought about being wrong,” Stroup told the Tribune. “I’d love if everyone could say, ‘You are wrong, everything’s fine,’ ” he said. “But these are hundreds and hundreds of numbers that we’ve run now.”

According to the paper, Gloria Zyskowski, the deputy associate commissioner responsible for assessments at the Texas Education Agency, said in a statement the agency needed more time to evaluate the findings, adding that Stroup’s comments in June demonstrated “fundamental misunderstandings” about test development, and that there was no evidence of a flaw in the exam.

This past school year, the state-mandated State of Texas Assessments of Academic Readiness tests made their debut. These STAAR tests are meant as a more rigorous replacement for the TAKS exams, and are also developed by Pearson.

Pearson also has a five-year, $32 million contract to produce New York's standardized tests for students in grades four through eight. But because Pearson uses the same questions on different state exams and because the test publisher’s contract with Texas is much larger than its contract with New York, the latter’s assessments are designed to satisfy requirements established by the Texas Education Agency.

In May, foreign-language versions of New York’s state math exams administered to third through eighth graders had translation errors that invalidated 20 questions, bringing this year’s total number of invalidated state test questions to 29. That total includes the questions tied to the confusing and controversial talking pineapple and hare passage.