01/08/2013 03:00 pm ET Updated Jan 08, 2013

Gates Foundation MET Report: Teacher Observation Less Reliable Than Test Scores

NEW YORK -- A few years ago, Bill Gates decided to learn more about whether a teacher's effect on student learning could be measured. Three years, 3,000 teachers and about $50 million later, the Bill & Melinda Gates Foundation thinks it has the answers.

On Tuesday afternoon in Phoenix, the Gates Foundation released the third and final component of the Measures of Effective Teaching (MET) project, a gargantuan effort spearheaded by Harvard economist Thomas Kane.

"Effective teaching can be measured," the authors wrote in the latest installment. They're sure of it because they used a randomized experiment to figure it out. Reliable teacher evaluations, the paper claims, include "balanced" proportions of teacher observation, students' standardized test scores and student surveys. And for the first time, the randomized trial shows that teachers who perform well with one group of students, on average, perform at the same levels with different groups of kids.

The findings matter because they may shape the debate over changing how teachers are evaluated, a change favored by the so-called education reform movement that has emerged as a hot-button political issue. Such changes are controversial because measuring a teacher's contributions to student learning challenges the predominant labor management model in education: salaries and benefits that increase with experience, and layoffs based on reverse order of seniority. Measuring teachers promises administrators and policymakers that they can make hiring and firing decisions with an eye toward quality of instruction.

The federal government's Race to the Top competition had states vie for cash by doing such things as formalizing their teacher evaluations to include student test scores. Many states have signed on, and several districts have already implemented such systems.

One major point of pushback to using test scores in teacher evaluations has been the concern that such tools, known as value-added measures, reflect student demographics more than a teacher's ability, and penalize teachers who take on more difficult students. "We didn't know if in fact what we were seeing -- the differences we were seeing between teachers were about the teacher or about the students who were coming into their class," Steve Cantrell, Gates' chief education researcher, told The Huffington Post. "By randomly assigning students to teachers, we were able to show that teacher effectiveness is really about the teachers."

After randomly assigning classes to teachers in consecutive years, Cantrell said, MET found that "the performance of the teachers in the second year was almost identical to the year prior."

While the study shows some reliability in measuring teachers who either overperform or underachieve dramatically, the authors note that "the vast majority of teachers are in the middle of the scale, with small differences in scores producing large changes in percentile rankings."

Moreover, the report found that overall, classroom observations -- the way most teachers around the country have been evaluated for decades -- are highly unreliable on their own. "It is clear from these findings and the MET project's earlier study of classroom observation instruments that classroom observations are not discerning large absolute differences in practice," the authors wrote. They found that counting observations for half of the total score is "counterproductive."

"The way that most teachers have been evaluated forever is completely unreliable," said Tim Daly, who leads TNTP, a consulting group that places new teachers and helps districts implement evaluations. "Before, what we were weighing is, 'Should we move in the direction of using student learning or is it too precarious?' They show we have no choice but to change -- the way they're doing it is totally inadequate."

The report also notes that teacher observation becomes more reliable when more than one judge watches a class. The lesson for districts is that staffers other than principals need to be trained in teacher observations.

Randi Weingarten, the president of the American Federation of Teachers union, has been critical of the MET project's teacher quality research in the past. But in advance of Tuesday's release, she issued a statement lauding the effort. "The MET findings reinforce the importance of evaluating teachers based on a balance of multiple measures of teaching effectiveness, in contrast to the limitations of focusing on student test scores, value-added scores or any other single measure," Weingarten said.

And as for Gates, while the foundation is closing a major chapter on teacher quality research, Cantrell says the next step is culling and tagging a massive video library of teaching practices to be studied by researchers and used by education schools.

Gates also intends to focus on helping teachers improve. "Now we know that we can identify and take up multiple perspectives on what great teaching looks like," said Vicki Phillips, who heads the Gates Foundation's college readiness efforts. "Now, how do we help make sure there's more and more of that?"
