By Erin Grogan and Cassandra Coddington
This post was originally published on the TNTP Blog.
Measuring teacher effectiveness is complicated. That’s the main headline from two new studies out this week that are generating buzz on all sides of the debate. That’s not surprising news—all of us who are grappling with this work already know it’s messy—but it’s worth looking at these studies in the bigger picture of the shift toward more robust evaluation systems, especially because some of the findings are easily misunderstood.
The two studies—one on classroom observations and the other on value-added data—both raise questions about the reliability of given measures of teacher effectiveness. Raising questions is great. We won’t improve the work in the field without researchers putting the evidence to the test.
But some of the news accounts of the papers have been a bit perfunctory. In a few cases, they’ve been downright misleading. That’s par for the course with complex papers and reporters on short deadlines. Here’s what you need to know.
Over at the Brookings Institution, the authors look at classroom observations, and while their findings aren’t shocking, they are troubling. According to their analysis of thousands of observation scores, observations are subject to bias. Observers tend to give the best ratings to teachers whose incoming students are high performing, while those teachers working with academically struggling students receive lower scores.
In conversations on evaluation, classroom observations are commonly viewed as one of the most reliable measures of the bunch. For that reason, they’re often given more weight than any other measure in multiple-measure systems—none of the districts studied by the authors, for example, weighted observations at less than 40 percent of a teacher’s overall rating. But this study is a reminder of what we’ve seen before: Observations need to be conducted by trained, normed observers who can give teachers actionable feedback to apply in their classrooms. If not, they’re at risk of disadvantaging teachers who work with our highest-need learners.
So where does this leave us? One big recommendation from the Brookings report is that districts should adjust teachers’ observation scores based on the demographics of the students they instruct—but that feels like a dangerous throwback to the days of lower expectations for teachers who work with struggling students. More on point is their assertion that much of the room for improvement in evaluation systems lies in classroom observations, so it’s critical that we pay attention to the places that are getting observation right. The authors point to the value of multiple observation opportunities each year, recommending two to three per teacher, with at least one by a trained observer from outside the teacher’s school. This is doable. Washington, DC has been doing it for all teachers since 2009. In-depth training for all observers is also critical, and so are clear rubrics that focus on student actions.
This is challenging work, but the big message here is clear: If evaluation systems are to meet their primary goal of ensuring greater equity in students’ access to effective teachers, they have to be fair. We have to figure out how to ensure that the most ubiquitous, heavily weighted measure of teaching is not in fact moving us in the opposite direction.
The other study out this week, from Morgan Polikoff of USC and Andrew Porter of Penn, goes after the other side of the evaluation coin—value-added estimates. According to the authors’ survey of a subset of teachers from the Gates Foundation’s Measures of Effective Teaching (MET) study, value-added scores are not correlated with “instructional alignment” (that is, the relationship between what teachers do in the classroom and the content of state standards and assessments).
The authors conclude that the lack of correlation suggests that value-added models aren’t accurately capturing effective teaching, as defined by the standards to which we’re holding teachers. But their conclusions are easily misunderstood, and they should be considered with some caution.
Here’s why: For one thing, the authors surveyed just 327 teachers out of nearly 3,000 who participated in the MET study. That’s a small slice of the Gates study. More importantly, the relationships between classroom observations and value-added for this subset of teachers don’t reflect the relationships found in the MET study on the whole. Within the subset of teachers considered by Polikoff and Porter, classroom observation ratings did not correlate to value-added scores. But the full-scale MET study showed that on the whole, there is in fact a correlation between those two measures. In short, it’s not surprising that in a subset of the MET group that already showed no relationship between classroom observations and value-added scores, the authors would find no relationship between their own measure of instructional alignment and value-added. That’s a very important facet of the study that has gone largely unmentioned in media write-ups so far.
Nonetheless, the question Polikoff and Porter ask is worthwhile: How well do we understand which teaching techniques lead to high value-added scores? And more importantly, can we help teachers use value-added scores as a roadmap to improve their instruction and get better results for students?
There is still a ton to figure out here, and the connection between value-added and other measures of teacher performance requires more study—with larger sample sizes. But that doesn’t mean value-added isn’t useful; it remains one powerful signal among many.
Both reports serve as a reminder that no measure of teacher effectiveness can be taken for granted. Ultimately, that’s the point of multiple-measure evaluation tools: No single measure can or should tell us everything about a teacher’s performance. The teacher who doesn’t organize her classroom very well could get a lower observation rating, but still be excelling when it comes to moving her students’ learning forward. A teacher whose classroom appears warm and engaged might be building great relationships with her students but not challenging them sufficiently.
Getting evaluation right is incredibly complicated. But we can’t throw up our hands and give up because it’s hard. Instead, it’s important to remember that prioritizing teacher effectiveness requires a massive shift in perspective and practice—one that demands not just smarter policies, but strong training, effective oversight, and resolve.
Erin Grogan is Partner, Research & Evaluation and Cassandra Coddington is Analyst, Partnerships & Research at TNTP.