Six months ago, the Los Angeles Times published a series of articles on teacher effectiveness that relied upon economist Richard Buddin's study of the impact of elementary school teachers on their students' test scores. Using seven years of data from the Los Angeles Unified School District, Buddin's analysis looked at how much students' test scores in math and English language arts improved while they were enrolled in particular teachers' classrooms. That change is sometimes referred to as the "value added" by the teacher. The Times judged Buddin's study sufficiently valid and reliable to publish a website rating about 6,000 individual teachers by name on a five-point scale, from "least effective" to "most effective."
At the time, many social scientists raised concerns about the methodology adopted by the Times. They worried, among other things, that the newspaper did not account for all of the factors that lead to higher or lower test scores and that this failure meant that the Times' teacher rankings could not be trusted. But we didn't know for sure, as no one had conducted an independent analysis of the same data used by the Times. Until now. The National Education Policy Center at the University of Colorado has just released a reanalysis of the data. And its conclusions stand in stark contrast to those drawn in the Times reporting.
The goal of the Colorado researchers was to see if they could replicate Buddin's analysis and to test some of the assumptions behind it. They concluded that both the Buddin study and the uses to which the Times had put it were seriously flawed. Yet, on Feb. 7, 2011, the Times published an article written by Jason Felch, one of the authors of the original story, with the headline, "Conclusions on Teachers Confirmed." The headline is consistent with Felch's story. Both mischaracterize the facts.
The assertion that the Colorado report confirms the Times' analysis rests on a finding common to both studies: that effectiveness varies across teachers. That is, students of some teachers showed more improvement on standardized tests than students of other teachers. The Times' claim is akin to finding agreement between two medical diagnostic procedures because both suggest that some people are healthier than others, even though the diagnostics disagree on who's healthy and who isn't. The Colorado researchers concluded that the Los Angeles Times' method does not reliably identify the teachers who add value to student test scores.
Here are a few of the facts reported in the study but studiously ignored by the Times:
First, the University of Colorado researchers conclude that Buddin's analysis must have left out some important variables that explain the relationship between teachers and their students' test scores. They reach this conclusion because, with the variables and methods Buddin used, student test scores appear to be influenced by teachers the students have yet to meet. A statistical model that produces such impossible results is generally suspect. It gives a strong indication that other important variables correlated with student achievement have been omitted and that these variables also influence how students are assigned to teachers in schools.
Second, the Colorado researchers conclude that when one does include those variables (like differences between schools), the rankings the Times used change dramatically, such that about half of the teachers would be assigned a different "effectiveness" rating.
Third, the Colorado researchers find that ratings are subject to significant random error, such that about half of the teachers in the full database cannot be distinguished from "average."
Finally, they contradict Buddin's conclusion that neither teacher experience nor credentialing matters to student achievement, finding instead that inexperienced teachers have a significantly less positive impact on student test scores, particularly in reading.
Evaluating teacher effectiveness is a complicated matter, in which changes in test scores in math and English language arts may play some appropriate role. There are significant areas of disagreement among experts, policymakers and practitioners, and many opinions about how to proceed. Accordingly, we're all entitled to our own opinion. But not our own facts.