05/07/2014 09:43 am ET Updated Jul 07, 2014

A Dialogue With the Gates Foundation About School Reform

Once again, I would like to thank the American Enterprise Institute's Rick Hess and the Gates Foundation's Steve Cantrell for a dialogue on school reform. Here are my first thoughts on my exchange with Dr. Cantrell, as presented in Hess's "Aftermath: My Note to the Gates Foundation."

Firstly, I am glad that we mostly stayed focused on the stakes attached to standardized tests. There are numerous subjects we can argue over, but in my opinion high-stakes testing is the issue most worth fighting over. It is a policy that is doing great harm to students and teachers, especially in poor schools. It is also the policy that most complicates and intensifies the battles over the other issues that Cantrell and I sidestepped. I believe that a consensus is rapidly forming among most educators and parents that this testing mania must stop. Then we can move on to more humane and effective methods of accountability and school improvement.

In education, we are two peoples divided by a common language. Whenever I speak with reformers, I am struck by how we and they use a handful of key words in slightly different ways, and by how extreme and emotional the resulting misunderstandings can be. The potential for miscommunication is greatest when speaking with people who have little or no teaching experience in the inner city.

My post explained why the predictable consequence of the value-added evaluations that Gates supports, even when balanced by "multiple measures," would be an exodus of teaching talent out of the most challenging schools (where it is harder to raise test scores). I argued that the Measures of Effective Teaching (MET) project showed that effective teaching can be measured well enough to use value-added and other metrics for diagnostic purposes and for policy discussions. But it provides no evidence that value-added can be made valid for evaluations of individuals.

The difference lies in the burden of proof needed to inform policy discussions versus the burden needed to justify punitive actions. When a basic research study is x% inaccurate for individuals, that may still be a huge success. But who would commit to a teaching career where such a chance, PER YEAR, could damage or destroy it?

Cantrell seemed to hear much, but not all, of my position. He wrote, "John is primarily concerned about error. He believes the new evaluation systems are in the hands of administrators (and statisticians) who through intent or incompetence inaccurately judge teachers in ways that negatively impact their careers."

My position is that the real problem is the predictable response of fearful systems, not incompetent or scheming individuals. "I have no doubt that the efforts by the Gates Foundation can encourage better policies," I argued. "Especially in states and districts that have the confidence born of a history of success, the MET findings can be used properly. For instance, I'm cautiously optimistic about the Tulsa/Gates/Kaiser Foundation collaboration."

The policy issue "is how will they be used, constructively and destructively." How, I asked, "can teachers not oppose reforms that can be beneficial before concrete checks and balances for the inevitable misuses are nailed down?"

Perhaps the best single outcome of the dialogue, from my perspective, was Cantrell's response, "John mentioned the need to put safeguards in place before teaching effectiveness measures are used for consequences. I couldn't agree more."

But he offered no indication that the Gates Foundation agrees, or that it will take action to help us secure such protections now that the laws have already been passed.

Cantrell also observed that the "desire to use evaluation measures to rank teachers is the real problem here, and is why so many teachers are fearful that they will be inaccurately labeled as ineffective." His response was that the Gates approach was primarily concerned with the bottom 5% of teachers, not the bottom 25% or so. I hope legislators, superintendents and personnel office managers across the nation all understand the distinction. If so, will they revise laws, procedures, and Race to the Top agreements accordingly?

I support the use of better teacher observations, especially through peer review, to remove bad teachers. My complaint is that the Gates Foundation has fed the misperception that the problem in our schools is bad teachers, and the predisposition that a certain percentage must be fired. It helped send the message that a preordained share of teachers must be culled, as has been done in some Silicon Valley businesses.

But it makes no sense to encourage more high-stakes testing in order to have a redundant measure for identifying the bottom 5% when the predictable result is that the effectiveness of many (or most) of the other 95% will be compromised. And I must say that the foundation should acknowledge its role in sending the message that using metrics to cull a significant number of teachers can turn around low-performing schools. (Perhaps a first step would be to publicly lobby the Duncan administration to rewrite its School Improvement Grant regulations so that they no longer perpetuate the slander that teachers would be overcoming poverty if they were more caring and worked harder.)

Cantrell, at least personally, also seemed to understand why I believe that data-driven accountability is much more likely to damage fearful districts with a history of failure. I wrote, "Powerless districts, suffering from a culture of compliance, are the systems that need help, but they will react predictably to testing, circle the wagons, and impose primitive worksheet-driven instruction in order to cover their rear ends," and "that helps explain why reform has benefited some students while damaging others."

I was dismayed, however, by Cantrell's response to this concern. He seems to gamble that we can fight that systemic culture of fear by imposing more fear on systems. He proposes a simple, "straightforward" solution of holding principals and other administrators accountable for not abusing teachers by misapplying the evaluation systems. In other words, his approach would impose stress, generated by disincentives, on principals in order to deter them from stressing out teachers through the improper use of disincentives. Part of that solution, of course, would also use the same problematic testing regime to force administrators to stop misusing tests.

Cantrell concluded, "When school systems begin to use measures of effective teaching to assess the effectiveness of their own efforts, teachers will understand that the burden for improving teaching does not sit upon their shoulders alone."

And that brings us back to how much small differences in wording matter. Personally, I would have felt much more confident if Cantrell had replied, "If school systems begin to use measures ... to assess the effectiveness of their own efforts." In that case, the Gates Foundation might agree that its rational model of carrots and sticks will often (usually?) fail. It might then help nail down concrete safeguards before imposing risky policies on teachers, and the systems would understand that the burden for improving teaching does not rest on teachers' shoulders alone.

Recognizing the gap between what systems say and what they actually do could be a step toward an even more important realization about the dangers of experimenting on children. Six years ago, Bill Gates proclaimed that his foundation was engaged in a grand experiment. The next year, the MET study began. The problem is that, with the encouragement of the Gates Foundation and the federal government, laws were changed across the nation even before the first preliminary findings were issued.

Even when the final report was issued, it included no evidence that the value-added evaluations and other test-driven policies promoted by the foundation and the federal government would not cause more harm than good. Had the MET findings come first, I would hope that Race to the Top, School Improvement Grants, and other innovations would have been designed in a very different manner. Now Mr. Gates estimates it will take another decade to determine whether his experiments worked.

The first question is why the MET methodology did not adequately address the role of poverty and why it ignored the all-important issue of the sorting of students. Even so, if they had known that the MET would produce such underwhelming results, would people of good will have advocated for so much high-stakes testing?

Secondly, when laws were changed before their inherent dangers were considered, the Gates Foundation's controlled experiment, designed as basic research, was transformed into an unsupervised experiment on children and teachers in actual schools, conducted by people with a full range of motives and competencies. My reading of a vast body of social science, as well as my personal experience in the inner city, leads me to conclude that high-stakes testing will continue to produce more harm than good. Dr. Cantrell and others may disagree with my appraisal of the relative costs and benefits. But I hope they will take more interest in estimating the trade-offs inherent in their policy preferences.

If there is one thing I hope comes out of this discussion, however, it is an appreciation of the question that I believe should always inform education policy. What parent, I keep asking, would agree to an experiment that was likely to benefit one of his children but injure another?