12/23/2010 11:38 am ET Updated May 25, 2011

Teacher Merit Pay: Why Test Scores Shouldn't Dictate Salaries

During the nearly 15 years I toiled in the standardized testing industry, most of my time was spent scoring student responses to the open-ended questions on state assessments, or training others to do the same. Because of that experience, I'm amazed today to hear the suggestion that the jobs and salaries of American teachers should, in part, be based on how well their students do on standardized tests. A couple of examples from my time in testing are enough to show how unwise a decision that might be.

To begin, in my assessment career, we always went out of our way to differentiate between "scoring" and "grading," which in my mind always differentiated between testing and teaching. "Scoring isn't the same as grading," I heard dozens of times in my career.

The speech was always the same. "When grading a student's work," some testing company flack (perhaps me) would say, "a teacher might consider more than just how the student answered a question. The teacher might not give out a grade before also taking into account how much effort the student was putting in, how much improvement he or she was making, how much time was spent studying a subject, or how well other students answered the question. In scoring student responses on these tests, however, all we care about is how the words written on the page match up with our scoring rubric."

"Remember that scoring," the speech always concluded, "is not the same as grading!"

That speech was usually given during a project only after one (or more) of the scorers began to protest that the job they were being asked to do was too one-dimensional, complaining that test "scoring" didn't adequately deal with complex student answers. The disgruntled scorers argued that it was too superficial. The "scoring v. grading" speech was unveiled mostly in order to assuage those scorers' fears by promising them the "scoring" of student answers on large-scale assessments was only supposed to be a quick measure of student work, not anything as deep or meaningful as the real decisions professional educators made about students every day.

So you can imagine my surprise. We in large-scale assessment once produced results only with the caveat that they weren't as robust a measure of student learning as the grades teachers meted out, and now those simplistic scores the testing industry spits out by the millions are going to be used to assess those very teachers. I have to admit, I didn't see that coming.

A second reality about testing and teaching is a little embarrassing to admit: during my time in testing, the absolute worst people hired to score student responses were classroom teachers, active or former. In truth, that's a statement more about the kind of job test scoring is than it is a commentary on the abilities of the many capable teachers I worked with. But neither I nor many of my colleagues in the testing industry relished the idea of classroom teachers in our pool of test scorers.

The tendency those current or former educators have of giving thoughtful readings to student responses was, frankly, the bane of my existence as a trainer. If I was standing in front of a group of 20, 50 or 100 temporary employees newly hired to score tests, and if we had to get through 100,000 student responses in two weeks, the last thing I needed was for each of those scorers to be giving a meticulous and earnest review to every student response.

Meticulous and earnest reviews of every student response meant the scorers might never agree with each other (one scorer might find some esoteric nugget of wisdom in a single word of the first sentence, while perhaps another would find some major fault in the second), and the scorers agreeing with each other was the primary goal of "standardized" test scoring. When I was a trainer, I didn't need my scorers spending five minutes to look for the hidden truth in every response; I needed them to look for key words and slap down a quick score. People who cared too much (teachers and ex-teachers) were certainly not the people who helped most in that regard, and hence I usually found myself hoping for a team of scorers that wasn't too invested in the state of American education.

To reiterate, teachers and ex-teachers made bad standardized test scorers because they actually gave a damn about the students, while my scoring projects were usually better served by people who cared a little less. Ironically, that means if test scores do end up being used to evaluate the jobs being done by American teachers, those people who "cared a little less" will end up assessing the jobs being done by those classroom teachers who really are invested in American education.