My 8th graders were discussing a documentary film on the representation of Latinos in the media when our conversation veered to the topic of standardized testing. I don't remember how it came up exactly, but if you're a teacher or student in a Chicago school, testing is always lurking close by.
"The MAP test doesn't even test us on what we're actually learning in class," one student protested. After a couple others voiced critiques of their own, I told them that their scores on the MAP--a computer-based test known more formally as Measures of Academic Progress--would nonetheless be used to grade their teachers as part of Chicago Public Schools' new "value-added" system of evaluation.
"But wait, what if you don't teach reading or math?" a student asked, aware that the MAP only assesses those two subjects.
"Then they base your evaluation on the average reading scores of the entire school," I said. "That's how I'll be evaluated." Around the room, faces twisted in confusion.
"But you don't even teach reading or math!" someone said. And then several, almost in unison: "That's stupid!" Leave it to young people to see through the insanity of current education policy in a matter of minutes.
My 8th graders' spot-on analysis that day reminded me of Herb Kohl's prescient 2003 book, Stupidity and Tears: Teaching and Learning in Troubled Times. The stupidity in Kohl's title referred to education policies that were then just kicking into high gear--policies that valued compliance over creativity, sapped the joy from classrooms, and had "the consequence of perpetuating ignorance, keeping poor, defiant, marginalized youth 'in their place.'" The tears were the byproduct of such policies: those of teachers following ridiculous mandates against their better judgment, or of students subjected to the constraints of a scripted, seemingly irrelevant curriculum.
A decade later, things have only gotten stupider, and the widespread embrace of value-added models (VAM) for purposes of teacher evaluation is one of the most obvious pieces of evidence. The complex statistical calculations used in value-added formulas are supposed to isolate a teacher's impact on her students' growth--as measured, of course, by gains on standardized test scores. But there's no convincing research to show that value-added models have done anything to help teachers improve or kids learn, and growing evidence shows them to be wildly inaccurate and erratic.
An April 2014 statement on VAMs by the American Statistical Association noted that they "typically measure correlation, not causation," and that effects "attributed to a teacher may actually be caused by other factors that are not captured in the model." The ASA added, "VAM scores themselves have large standard errors, even when calculated using several years of data. These large standard errors make rankings [of teachers] unstable, even under the best scenarios for modeling."
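The instability the ASA describes can be illustrated with a toy simulation (this is not the actual VAM formula, which is far more complex; the teacher count and error size here are hypothetical). Give a group of imaginary teachers identical "true" effects, add measurement noise on the scale the ASA warns about, and the resulting rankings reshuffle with every re-draw:

```python
import random

random.seed(1)

n_teachers = 20
true_effect = [0.0] * n_teachers  # every teacher is identical by construction
std_error = 0.5                   # hypothetical standard error on the VAM score

def observed_ranking():
    # Each "measured" score is the true effect plus random noise;
    # teachers are then ranked by that noisy score.
    scores = [(t + random.gauss(0, std_error), i)
              for i, t in enumerate(true_effect)]
    scores.sort(reverse=True)
    return [i for _, i in scores]

year1 = observed_ranking()
year2 = observed_ranking()

# With identical teachers, the ranking is pure noise: the "top"
# teacher one year can land near the bottom the next.
print("Year 1 top 5:", year1[:5])
print("Year 2 top 5:", year2[:5])
```

Because every simulated teacher is equally effective by construction, any ordering the noisy scores produce is meaningless, which is exactly the ASA's point about large standard errors.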
Anecdotal evidence tells much the same story. Maribeth Whitehouse, a Bronx special education teacher who scored in the 99th percentile--better than nearly all other teachers in New York City--on the 2012 value-added Teacher Data Reports, told the New York Times' Michael Winerip that the data were "nonsense," and wrote a letter with other high-scoring teachers in protest of their use.
Chicago Public Schools started using VAM as part of its teacher evaluation system in 2012-13--my first year back in the classroom after twelve years as a teacher educator. I was fortunate to join a team of talented and experienced 7th and 8th grade teachers, each of whom, in my view, did remarkable work with our kids. I tried hard that year to pull my share of the load--getting to know my students, developing challenging curriculum, and doing my best to plan lessons that were engaging and meaningful. Of course, I sometimes fell short of those aims, but looking back at the end of the year, I didn't question my focus or my effort.
But when VAM ratings were released the next fall, they painted a different picture. My value-added metric was a -0.79. At the time, I didn't know exactly what that meant, but I knew the negative sign in front of the number wasn't good.
My colleagues' ratings weren't much better. In fact, our entire team--with a combined 85 years of experience in Chicago classrooms and former students who had gone on to become lawyers, medical doctors, social workers, and community activists--had negative value-added scores. What that meant was that, according to the VAM calculations, each of us had made a negative difference in our students' growth compared to what an "average" teacher might have achieved. Our students learned less because they had us as teachers.
Even if you know it's all a bunch of number-crunching craziness, even as you realize that the margin of error is almost the same size as your value-added rating, it's still demoralizing. And the assumptions that accompany VAM are maddening: That good teaching can be neatly and precisely quantified. That the depth and breadth of a teacher's work can be captured by student test scores. That cultural connections between a teacher and student are irrelevant. That a mathematical formula, no matter how complex, can grasp the impact of poverty or inadequate housing or exposure to gun violence on the educational life of a child.
The day after we received our value-added ratings, I arrived at school nearly an hour early. One of my colleagues, Cudberto Esparza, was already in his classroom, as he is every morning without fail, ready to provide extra tutoring for students who need it. Sometimes one or two kids show up; other days it's six or seven. Each of them will gladly tell you how much value Mr. Esparza adds to their lives. And they won't need any convoluted calculations to do it.
Adapted with permission of Teachers College Press from the forthcoming book, Worth Striking For: Why Education Policy is Every Teacher's Concern (Lessons from Chicago), by Isabel Nuñez, Gregory Michie, and Pamela Konkol. Copyright © 2015 by Teachers College, Columbia University. All rights reserved.