
Featuring fresh takes and real-time analysis from HuffPost's signature lineup of contributors
Todd Farley

Numbers Game

Posted: 05/17/2012 10:48 am

The scores for the writing portion of this year's FCAT (Florida Comprehensive Assessment Test) plummeted so precipitously that it isn't the abilities of the Sunshine State's student writers being called into question so much as the very validity of the scoring statistics. While I don't want to say "I told you so" about the dubiousness of those statistics, I did tell you so: my 2009 book detailed all the ways the numbers produced by the for-profit testing industry cannot be trusted.

Especially the stats produced at Pearson scoring centers around the country, where I worked for some 15 years.

On the first project I worked on scoring student essays, I had to pass a qualifying exam to stay on the job. When I failed that qualifying exam (twice), I was unceremoniously fired. So were half of the original hundred scorers, who had also failed the test. Of course, when Pearson realized the next morning that it no longer had enough scorers to finish the project on time, it simply lowered the "passing" grade on the qualifying test and put us flunkies right back on the job.

Yes, those of us considered unable to score student essays 12 hours before were welcomed back into the scoring center with open arms, deemed "qualified" after all.

Such duplicity was not an aberration in my experience. For a decade and a half I saw every sort of corporate chicanery and statistical tomfoolery. The test-scoring industry seemed focused on meeting deadlines, completing projects, and putting scores on tests, and only after that was any thought given to whether those scores meant anything.

I regularly saw unqualified people (myself included, apparently) keep their jobs scoring student responses even when they were altogether no good at it, either because the qualifying grades had been dropped so low that anyone could meet them or because the correct answers to the qualifying exams had simply been handed out before the tests were taken. I regularly saw statistics doctored to make group reliability numbers (agreement between the scorers) look better than they really were, because high reliability stats were needed to convince customers how standardized and "valid" the work really was. I regularly saw distribution numbers fixed to make score results look however a client might have wanted.

Once I attended a rangefinding meeting in Princeton with various test-scoring managers and English professors from around the country, the group having convened to figure out how to score writing samples for a national test. After that bunch of experts had finally hammered out a consensus on the writing rubric and the writing samples we'd been reviewing, we were told we were scoring "wrong." We test-scoring pros and writing teachers were told that our scores didn't match the predictions of the omniscient psychometricians (statisticians/testing gurus), and that we had to match those predictions even though the psychometricians had never actually seen the student essays.

When I read in the New York Times the next year that student writing scores had landed exactly in the middle of the psychometricians' predictions, I can't say I was surprised: We'd made sure they did.

And that's the thing: In my experience, the for-profit test-scoring industry could produce results on demand. There was no statistic that couldn't be doctored, no number that couldn't be fudged, no figure that couldn't be bent to our collective will. Once, when a state Department of Education didn't like the distribution of essay scores we'd produced over the first two weeks of a project, we simply followed its instruction to give out more high scores. "More 3's!" became our battle cry on that project, even though arbitrarily giving out more 3's was fundamentally unfair to the students whose essays had already been scored under the earlier standard.

I guess I'm saying no one really needs to worry too much about this year's falling FCAT scores, because they're only numbers. If it's different numbers the state is after next year, it should just ask. I'm sure Pearson can make more.
