
Tough talk on teacher accountability is all the rage this summer. Trouble is, we don't know how to handle the perverse incentives that arise the moment we place undue weight on easily manipulated exams. But that hasn't stopped a slew of education leaders from weighing in on the need to hold teachers' feet to the fire.
In the past few weeks, D.C. Schools Chancellor Michelle Rhee made headlines for firing 241 teachers, Secretary of Education Arne Duncan gave a major speech on education reform, and Race to the Top finalists were announced for round two, many of which agreed to overhaul their teacher evaluation and tenure systems.
Even President Barack Obama took up the theme of education, weighing in on his administration's reform agenda for three-quarters of an hour at the National Urban League Centennial Conference - although the president, who relied on teacher-union support in his election, trod carefully.
"I am 110 percent behind our teachers," Obama said. "But all I'm asking in return - as a President, as a parent, and as a citizen - is some measure of accountability. So even as we applaud teachers for their hard work, we've got to make sure we're seeing results in the classroom."
The president dismissed educators' fears that their evaluations would be based on standardized test scores alone.
"Everybody thinks that's unfair. It is unfair," Obama said. "But that's not what Race to the Top is about. What Race to the Top says is, there's nothing wrong with testing - we just need better tests ..."
His remarks reflect a newfound perception that recent progress in New York schools has been mostly a mirage, and that the public trusted in tests that were flawed.
The president is right. Yes, we "just" need better tests. But creating better tests is very hard and very expensive. And in a system as vast and complex as ours, it'll be tempting to continue using tests that can be graded quickly and that don't look very different from the ones we now use. But without a radically different approach to standardized testing in this country, we are unlikely to get different results.
Some people seem to believe, however, that we've got everything figured out already - that we can precisely measure each teacher's performance, and that our standardized tests are not just good but infallible.
In this brave new age of accountability, student scores on standardized tests are being used by some districts to decide, in whole or in part, the following: which teachers are first laid off; which teachers are fired; which teachers are rated effective or ineffective; which teachers receive bonuses, and how big those bonuses are; which principals receive bonuses, and how big those bonuses are; which students are required to repeat a year; and which students graduate from high school.
These scores also have been at the center of debates on mayoral control of schools, especially in New York City and Washington, D.C. These cities' mayors, Michael Bloomberg and Adrian Fenty, respectively, have asked voters to elect and reelect them based on how they run the schools in their cities and how their students perform.
The educational decisions now made in part on standardized test scores are neither few nor inconsequential. This is hardly about who gets a sticker for a job well done, or who gets a slap on the wrist for a student's substandard performance.
It is worth remembering, then, Campbell's Law: "the more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
In other words, when important decisions are based on a handful of numbers - like standardized test scores - the numbers soon become unreliable. The incentives to distort the numbers prove irresistible to just about everyone, from mayors seeking reelection and principals hoping for bonuses to teachers wanting to keep their jobs and students longing to graduate.
That policies don't always play out as planned is a truism we must accept. And though we'll often fail to foresee unintended consequences, we shouldn't stop trying to predict - and correct for - them.
A concrete example of unintended and unforeseen consequences will help illustrate this point.
Bus drivers in Santiago, Chile, are paid in one of two ways: either a fixed salary, or a variable sum determined by the number of passengers picked up. The original idea behind per-passenger pay was to keep buses from clumping together: drivers paid per passenger would have an incentive to space themselves out, allowing new customers to accumulate at bus stops.
Sounds great in theory.
Here's what happened in practice: bus drivers started racing to pass buses ahead of them in an effort to swoop up waiting passengers. Drivers also started leaving bus-stops before boarding passengers had found a seat or a hand-hold. So, in short, the average wait-time dropped for those served by drivers paid per passenger, but the rate of accidents skyrocketed and passenger comfort plummeted.
The lesson here is that fixating on a single metric - in this case, the number of passengers picked up per driver - distorted drivers' incentives. Safety became an afterthought. A system that sought to increase customer satisfaction by reducing wait-time ended up having the opposite effect. The tradeoff for shorter wait-times turned out to be more accidents and fewer satisfied customers.
In their myopic focus on wait-times, then, policymakers in Santiago failed to foresee that their proposed solution would generate negative externalities. And not just everyday negative externalities, like pollution or second-hand smoke, but ones with an immediate and often significant impact: injury or death in a traffic accident. Upon reflection, it should be obvious that short wait-times aren't the only thing that matters to bus-riders. They also want to arrive at their destinations in one piece, without having to visit the hospital or morgue. But policymakers appear not to have considered this.
Now, let's look at recent student test scores in New York City. The public has heard for years from Mayor Bloomberg and Schools Chancellor Joel Klein that the city's schools are improving. Bloomberg and Klein have regularly cited better student test scores as evidence of improvement - that is, higher percentages of students demonstrating "proficiency" on state exams.
But last week it was revealed that these test scores actually show something quite different: not better performances by students, but lower standards and easier-to-pass tests. The same press that dutifully reported student improvement changed its tune.
The New York Daily News titled its piece, "Big, Fat F in Schools," while The Wall Street Journal's headline read "'Hard Truth' on Education."
But what was most surprising about the coverage was that the news surprised anyone. "You mean students haven't really gotten a lot smarter in the last two years?" some wondered.
No, they haven't.
But they haven't gotten a lot dumber either. Their performance is, in fact, largely unchanged.
What changed is simply the state's definition of "proficient."
The gains were merely an illusion, sleight of hand on the part of policymakers and politicians. Mayor Bloomberg said his interpretation was that "the test is harder and more comprehensive," but this wasn't the truth. The test isn't harder or more comprehensive; it's just that the minimum passing score was increased.
The real story isn't that years of gains were erased, as The Wall Street Journal said. It's that there was no academic progress in the first place - just a lower bar for determining who was declared proficient.
The skeptics among us - those who have questioned such results for months, if not years - felt vindicated at last. But it's a shame that vindication was so long in coming, and it's a scandal that more people are not incensed now. I don't quite understand where the rage and outrage are.
What can we learn from the New York City example? I can think of at least four lessons.
1. We shouldn't get excited or depressed about short-term changes in test scores. Often they don't mean much. Long-term trends are more reliable - and therefore more meaningful. Scores on the National Assessment of Educational Progress (NAEP) going back one, two and three decades are trustworthy. An individual state's scores from last year probably aren't.
2. Politicians are prone to slicing and dicing scores to their advantage. This shouldn't surprise us, but neither should it silence us. Year-to-year changes in scores are unimpressive? Look at the decade-long trend. Long-term trends show no growth? Look at the change over the past two years. This is the game in which Michelle Rhee engaged last month when the percentage of elementary students in Washington, D.C. deemed proficient in reading and math unexpectedly dropped this year. Rhee touted instead the gains since 2007-08.
3. When numbers look too good to be true, they're too good to be true. This is no less true of schooling than of baseball and cycling. Seventy-three home runs in a single season? Hmm. An epic comeback in Stage 17 of the 2006 Tour de France? Hmm. Those results strained credulity because they weren't clean - and people suspected so from the start but had to wait years for confirmation.
We've seen similar things in schools. In New York City, 97 percent of elementary and middle schools earned As or Bs on the district report card last year, compared to 79 percent in 2008 and just 61 percent in 2007. Are most schools getting dramatically better in just one or two years? Probably not. As President Obama said last week, "change is hard. ...We won't see results overnight." We should always be wary of overnight results.
Randi Weingarten, president of the American Federation of Teachers, said in response to President Obama's speech, "there are no silver-bullet solutions for our schools." There's only hard work, day after day and year after year, with the possibility of gradual - real and substantive - improvement. Instant, immense improvement is as rare and elusive as Halley's Comet. It is therefore also suspect.
The most likely explanations for a school whose students dramatically improve from one year to the next are that the test has changed or that the school is serving a different population of students. And the most likely explanation for a school whose students do significantly worse one year to the next is a change in how performance is being measured, not a change in the students' actual performance. This is the story of Public School 85 in the Bronx, where math proficiency among third-graders plunged from 81 percent two years ago to 18 percent last year. The good news is that last year's students probably weren't any worse than their apparently highflying predecessors; what changed was the definition of "proficient," not the students' performance.
4. We remain very far from an accountability system impervious to perverse incentives. Therefore, we must be very careful in how we use student test scores in any decisions, especially those about personnel. A Mathematica study released last month by the U.S. Department of Education says that "in a typical performance measurement system, more than 1 in 4 teachers who are truly average in performance will be erroneously identified" as below average, with a similar percentage of below-average teachers not showing up as underperformers. This should scare not just classroom teachers but anyone who believes our current data systems are infallible. They are not. Importantly, the study also notes that more than 90 percent of the variation in student learning is due to factors beyond a teacher's control. We ignore this fact at our own peril. It does not mean that teachers don't matter, or that teachers cannot or should not be held accountable. But it does mean that we must proceed cautiously and ask tough questions of those who believe we've finally found the holy grail to measure teacher performance.
A version of this story appeared here on The Washington Post's "Answer Sheet" on August 7, 2010.
It is well known that what gets measured is what gets done.
If the US does not objectively measure our students, how will we know that they will be prepared for the future they will have to survive in?
The reality is now only seen AFTER the students are sent out into the world and the reality is very dismal. US teenagers and young adults are for the most part very poorly prepared to function in society and are incapable of learning the things necessary to be productive.
In other words, their education foundation is so weak that they have nothing to build on.
All those who rail against testing ignore the reality that the students who come out of our schools will need to survive in a very cruel and competitive world. With the Internet making it possible for many jobs to be done anywhere on earth, US students are competing not only with each other but with every other human on earth.
So if not "standardized tests," what OBJECTIVE and relevant way is there to measure knowledge?
How can you teach or test a student under these conditions?
The lack of desire to learn is a SOCIAL problem directly caused by parents and the society that the students grow up in.
Far too many social groupings in the US not only do not aggressively advocate for high academics, but some actively resist all attempts at raising academics because they view it as an assault on their culture.
Don't blame the schools, blame the parents. Schools have very limited ability to undo the damage done by parents.
The same will happen with educators: it will be dumbed down because the unions won't allow teachers to be held accountable. The unions have Obama in their pocket. End of story: status quo, and the USA continues to decline in education.
Both the Mormon Church and the US military know this well. Both organizations are constantly training people to speak a language different from their native one. Both use similar techniques - full immersion where the students are not allowed to use their native language at all during the day.
Students that enter US schools should be taught English before they take any other classes. Until they are fluent in English it is a waste of time to try to teach the other subjects.
Note that Chinese students are REQUIRED to be fluent in BOTH Chinese and English (and the English is taught by immersion with no Chinese used in class).
There is absolutely no reason that US students shouldn't be required to be fluent in English before they are taught anything else.
Teaching and instruction are very hard to measure. No adequate or complete metric has been devised which can tell us what a good teacher is or what results a good teacher should expect to get.
To use a religious comparison, by the metric of "measurable results", the prophets Isaiah, Jeremiah, and John the Baptist were all miserable failures. Very few converts, and the people in authority did not listen to them, and their teaching was ignored. They would have been "fired" for doing a miserable job. Evidently though, people after them thought differently and used different metrics to determine their worth.
I definitely agree that there needs to be accountability. But would Obama want the same metric applied to his job that he would apply to teachers for theirs?
Until the stakes are higher for parents to get involved in their child's education, and until government and business work to empower parents to address the educational needs of their child, we can blame teachers all we want, but the problem with education in America isn't going to improve.
We do a miserable job at teaching students to be parents. That should be the primary social studies curriculum.
If we really want to improve our education levels, we need to focus on pre-school children. The first years are critical. I believe that we need free universal child care with a strong educational component. This should not be mandatory, but freely available to every family. There should not be a formal curriculum, but children should be exposed to a wide variety of enriching experiences. There should be a parental support system including lending libraries.
Your 2nd sentence creates a straw man that you beat up for the rest of your post. While your point about standardized tests has merit, what makes you think that teacher accountability is only defined by standardized tests? What about allowing school choice programs where parents can send their kids to schools that have the best academic environment and outcomes (as judged by the parent)? This judgment could focus on things like the college graduation rate of students who attend that K-12 school or the employment rate or something.