02/23/2012 02:37 pm ET Updated Apr 24, 2012

Heeding the Evidence on Teacher Quality

If Secretary of Education Arne Duncan had had any idea what the Gates Foundation's $45 million Measures of Effective Teaching (MET) project would find (and overlook), would he have gambled so heavily on test-driven policies in the Race to the Top and his other "teacher quality" reforms?

Over two years ago, Gates' researchers promised to tackle the all-important issue of whether value-added models could fairly evaluate the effectiveness of teachers with more challenging classes. Their latest report, "Gathering Feedback for Teaching," promises that their next experiment will provide a glimpse at the most important policy questions. In the meantime, not-ready-for-prime-time evaluations are doing their damage in Tennessee, Florida, Maryland, Houston, Washington, D.C., and elsewhere.

The next step for the MET is studying one small part of a problem with data-driven evaluations. Volunteers will teach classes where students were randomly selected from within their school. They will be evaluated by impartial observers and the best possible metrics, in environments where all stakeholders are on their best behavior. No effort has been announced to address the findings of "Getting Teacher Evaluation Right," by Linda Darling-Hammond, Audrey Amrein-Beardsley, Edward Haertel, and Jesse Rothstein, as to whether statistical models can be made fair for teachers with large concentrations of English-language learners and/or special education students.

The MET continues to duck the question of whether effective teachers in dysfunctional schools can be expected to meet the same growth targets as their equally effective colleagues in functional schools. The MET's analysis was based on a sample of students that was only 56 percent low income, with only 8 percent being on special education IEPs and only 13 percent being English-language learners. High school students were excluded. Presumably, the MET will continue to call for better assessments to be developed someday-over-the-rainbow, and ask districts to not impose even more counter-productive test prep for primitive bubble-in tests.

On the other hand, what would Secretary Duncan have done three years ago had he known that the MET findings would "suggest that the classroom practices of the majority of teachers, as many as 85 percent, are remarkably similar"? What if he had known that the research would determine that the prime weaknesses of the majority of teachers are difficulties in recognizing student perspectives, asking better questions, and other issues related to communication? Those, of course, are problems to be addressed by high-quality professional development, not coercion. Surely, Duncan would not believe that building better student-teacher relationships, teaching analysis and problem-solving, and making instruction more engaging are more likely to occur in the test-driven environment he helped create.

The Gates Foundation's "Gathering Feedback for Teaching" stresses the need for multiple perspectives from diverse adults and students. I would hope that Duncan could see the clear contrast between its findings and the approach of the ideologues who have ignored teachers and social scientists, and who are already implementing their parodies of "multiple measures." When value-added models, performance rubrics, and the training of evaluators are all dominated by true believers, as in D.C. and other places, the result is multiple hoops to be jumped through.

For instance, the engineers of D.C.'s IMPACT system demand allegiance to the belief that, systemically, classroom instruction can overcome intense concentrations of poverty and trauma. But only 71 of the 663 teachers rated "highly effective" last year teach at the 41 schools in the low-income Wards 7 and 8. More than 130 "highly effective" teachers work at 10 schools in the affluent Ward 3. Faced with these hard facts, the D.C. accountability hawks circled the wagons.

This year, up to 60 percent of teachers in some Baltimore schools have been rated as "Unsatisfactory." Their administration is also sticking to its guns, and refuses to consider whether poverty and a long history of administrative policies are reasons why many good teachers do not consistently produce satisfactory results in the toughest schools.

I would hope, however, that Duncan would look to the problems with his RttT in Tennessee and adjust before they produce "a perfect storm" of nonstop testing and unfairness. For instance, this fall in affluent Shelby County the percentage of teachers who have been rated unsatisfactory is about 60 percent lower than that of Memphis. Moreover, Shelby County's value-added model rated more than four times as many of its teachers in the bottom category as did their rigorous new observations by humans. In Memphis this fall, the statistical model has rated about 46 percent of teachers as "below expectations."

I would hope that Secretary Duncan would have thought differently had he read Linda Darling-Hammond et al. They describe an 8th grade teacher who had low test score growth, so the principal had him swap classes with a 6th grade teacher who had high growth. Now, the formerly effective teacher's scores are flat and the previously underperforming teacher gets the biggest increase in the school. The same thing happened with teachers who had noticeable drops in their value-added ratings when assigned large numbers of English-language learners who were being transitioned into classrooms. One such teacher, who was dismissed, was a former "Teacher of the Year."

Overall, "Getting Teacher Evaluation Right" found:

1. Teachers teaching in grades in which English-language learners (ELLs) are transitioned into mainstreamed classrooms are the least likely to show "added value."

2. Teachers teaching larger numbers of special education students in mainstreamed classrooms are also found to have lower value-added scores, on average.

3. Teachers teaching gifted students show little value-added because their students are already near the top of the test score range.

4. Ratings change considerably when teachers change grade levels, often from "ineffective" to "effective" and vice versa.

Duncan can continue with his soundbites and claim that the messes he helped create are just growing pains. But these unforced turnovers will have lives of their own. Who knows what good would have come from the teacher quality research of the Gates Foundation and others if Duncan (and Bill Gates) had looked at the evidence before leaping to risky conclusions?