Todd Farley


In Defense of the Standardized Testing Industry

Posted: 12/31/10 12:12 PM ET

I am in favor of the Obama administration's continued emphasis on standardized testing, but that's not only because I've long been getting rich in standardized testing. Of course not. My fervent support of the President's proposal to test pretty much every kid at the drop of pretty much every hat stems from my belief that the best way to see what's happening in this country's schools is to get the opinions of those for-profit testing companies located hundreds, if not thousands, of miles away from the classrooms. I mean, obviously...

Given the downright scientific nature of the work we in the testing industry do, I can't see how anyone would dare to doubt the results. Multiple-choice standardized tests are scored electronically, for Pete's sake, the penciled-in score sheets simply scanned into massive computers and compared to a pre-loaded answer key. That foolproof system hardly ever screws up, with scoring snafus appearing to happen no more than every couple of years (see SAT Test 2006, Praxis Teacher Certification Test 2004, Minnesota Math Exam 2000...). Admittedly, those were huge mistakes leading to national headlines, failed students, crushed dreams, outraged parents, lost jobs, class-action lawsuits, out-of-court settlements, and the subsequent name changes of guilty companies, but, again, those boo-boos do seem to happen only occasionally.

Plus, in the infrequent instance there is a problem with that infallible system, there's usually a pretty good explanation for it. Pearson Education, for example, the company responsible for the incorrect reporting of SAT scores in 2006, justified their mistake by explaining they had a hard time scanning the score sheets due to the sheets' "abnormally high moisture content" (USA Today). News reports about that colossal blunder invariably included explanations of "wetness," "dampness," and "moisture," and on that point I can only defer to the company's expertise: If Pearson says the problem was clamminess, I assume it was clamminess. While I do feel bad for the students whose college dreams were obliterated when they got the wrong SAT scores back, I don't blame Pearson. They can't prepare for everything, and if their business was undermined by an Act of God like the unexpected meteorological phenomenon of springtime precipitation, what's a poor corporation to do?

It would be disingenuous to suggest open-ended questions that are scored by fallible human beings can be assessed in a process as precise as those multiple-choice items can be, but I'd say the testing industry comes pretty darned close. It might seem almost impossible to get a huge group of temporary employees (hired off the streets) to mete out points (while under pressing deadlines) to tens of thousands (or more) of varied student responses in any sort of standardized way, but that's only because no one imagines the exactitude with which the testing industry works. We are, however, like a well-oiled machine, able to get those independent-minded part-timers to score in absolute lockstep simply by taking away their ability to think for themselves.

I recall once, for example, a Reading test asking fourth-graders about a passage they'd read about the human tongue and taste buds. One question asked the kids four distinct things (their favorite food, its flavor, where on the tongue that flavor was found, and how the taste buds work), with the original scoring rubric (established by classroom teachers) instructing the scorers to dole out one point for each of the four elements listed above. The teachers writing the rubric imagined straightforward answers like "my favorite food is popcorn, which is salty" (two points!) and "I like apples, a sweet taste found on the front of the tongue" (three points!), a scoring system that worked fine at least until theory turned into practice. Once it did -- once those intransigent schoolchildren started swamping us with all their unusual and unexpected answers -- then the scoring philosophy of those schoolteachers had to be laid to rest and the genius of the testing industry could be brought to bear.

The kids, you see, weren't just saying they liked to eat "apples" or that apples were "sweet." The kids were saying their favorite foods were "grass" and "water" and "Styrofoam," too, and even when they were identifying normal foods like "pizza" as a favorite they were then saying it was "salty," "sour," "bitter," and "sweet" (a.k.a. the entire spectrum of four flavors the human tongue can recognize). Furthermore, the students would often list a favorite food with what seemed an incorrect flavor ("my favorite food is ice cream, which is salty"), and then they would say they tasted that flavor on the tip of their tongue, which is not where one would taste "salty" (the side of the tongue) but is where one would taste ice cream, assuming it was sweet. The first couple hours of this scoring project, in other words, were pretty much total bedlam, with massive disagreements within the group of employees I was training about whether "toothpaste" or "ice cubes" could be counted as favorite foods ("no" to the former and "yes" to the latter), or whether "bitter" could be counted as the flavor of pizza (originally "no," at least until we considered toppings such as anchovies and artichokes, so then "yes").

Amid all that arguing ("I refuse to accept ice cubes as a favorite food!"), amid all that bickering ("no, I would not call pizza sweet even if there is pineapple on it!"), I realized I would have to do the same thing I always did. The only way I could ensure those 60,000 fourth-grade student responses were scored by my fifty temps in a standardized way was to establish scoring rules so firm, so rigid, so absolutely unyielding, that we would eliminate from the process any element of humanity.

It wasn't so hard. I did so first by making an exhaustive list of anything that could be counted as an acceptable favorite food (pizza, popcorn, Kool-Aid, water, salt, grass, Gummi worms, etc.) and anything that could not (dirt, plastic straws, real worms, beer, wine, etc.). Then I established that any flavor a student identified would be accepted in conjunction with any favorite food. Ergo, a student identifying "pizza" as a favorite food would be credited for saying it was "salty" (of course), but also for saying it was "sweet" (the pineapple?), "sour" (anchovies, onions, etc.), or "bitter" (anchovies, onions, etc.). Enough kids said that ice cream tasted salty (pistachios?) or sour (lemon sherbet?), and enough kids said that potatoes were salty (uh-huh) or sweet (sweet potatoes?) or sour and bitter (sour cream?), that ultimately I decided we just had to accept 'em all, adult logic be damned. I was also pretty lenient on how the group should award credit regarding the location of the four basic flavors on the tongue, ultimately deciding to accept answers both when the kids identified the correct placement of the flavors (sweet in the front, salty on the side, etc.) and when they did not (sweet on the side, if talking about popcorn, which really would have been tasted on the side, it being salty and all...)

By the end of our training and our re-training, no one could say that we weren't being consistent. In fact, I'd argue that you would be hard-pressed to find any education professional (whether a classroom teacher, Michelle Rhee, John Legend or even Michael Bloomberg) who could have dealt with the scoring of that ceaseless deluge of wacky student responses any more cohesively, any more harmoniously, any more reliably than did my group of temps. As promised, I had them scoring every bit as mechanically, every single bit as mindlessly, as did those brilliant machines scoring the multiple-choice tests.

Yup, my entire group of temps knew to give three points to a student who answered that "pizza" was his favorite food (check), it tasted "salty" (check), and that taste was found on the side of the tongue (check); just as my entire group knew to give three points to any kid who said "grass" was his favorite food (um, okay), it tasted bitter (if you say so), and that flavor was found "all over" (sure thing...). There's a reason it's called the standardized testing industry, you know, so trust us, America. Just send us your money and trust us.