08/14/2012 10:46 am ET Updated Oct 14, 2012

Beware the Gremlins in "Teacher Value Added" Models!

One of the seductive new tools available to policy makers these days is the statistical device known affectionately as the "teacher value added model." What makes this device so seductive is its promise to identify exactly which teachers are performing well in the classroom and which ones are performing poorly.

Here's how the model is supposed to work: Demographic information about schoolchildren -- such as each student's age, gender, race, and socioeconomic background -- is fed into a software program. Similar demographic information is compiled about each classroom teacher. At the beginning of a school term each student takes a standardized test on a particular subject -- say American history -- and the results are saved. At the end of the school term the same set of students takes another standardized American history test, and each student's new score is compared with that student's initial score. The difference between these two scores would represent -- at least in theory -- each student's intellectual growth in the field of American history.

Now here's the seductive part: By linking each student's score back to the name of his or her classroom teacher, and by "controlling" for all the demographic variables that were collected about each student and each teacher, the software can be asked to paint a statistical comparison. If, after a semester of learning American history, the students in Teacher A's classroom performed better than the students in Teacher B's classroom, despite the fact that these students began the class with identical scores on the first test and were otherwise demographically similar, may we not conclude that it was Teacher A's performance that made the difference? In fact, couldn't we also gather statistical information about each teacher's alma mater so that we can reach positive and negative inferences about teacher preparation programs?
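To make the comparison concrete, here is a minimal sketch of the simplest version of the idea: the average gain from first test to second test, grouped by teacher. The scores and the function name `mean_gain_by_teacher` are invented for illustration, and the sketch skips the demographic "controls" entirely. Real value added models are considerably more elaborate, so treat this as a picture of the basic logic rather than any jurisdiction's actual software.

```python
from statistics import mean

# Hypothetical records: (teacher, first_test_score, second_test_score).
# Both classrooms start with identical first-test scores, as in the
# Teacher A / Teacher B comparison described in the article.
records = [
    ("Teacher A", 60, 72),
    ("Teacher A", 70, 80),
    ("Teacher A", 80, 89),
    ("Teacher B", 60, 66),
    ("Teacher B", 70, 77),
    ("Teacher B", 80, 85),
]

def mean_gain_by_teacher(rows):
    """Average (second score - first score) for each teacher's students."""
    gains = {}
    for teacher, first, second in rows:
        gains.setdefault(teacher, []).append(second - first)
    return {teacher: mean(values) for teacher, values in gains.items()}

gains = mean_gain_by_teacher(records)
for teacher, gain in gains.items():
    print(f"{teacher}: average gain of {gain:.1f} points")
```

With these made-up numbers Teacher A's students gain more on average than Teacher B's. Whether that gap says anything about the teachers is exactly the question at issue.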

Our new "teacher value added model" sounds like a godsend for education policy makers until we ponder some real-world situations. Following are three examples that illustrate just a few of the many dangers:

• Assume the children in Community B -- the region where Teacher B's school is located -- were suffering from an unusually high incidence of influenza (or any other childhood illness, the traumatic death of a local student, a locally devastating tornado, a hurricane, a flood, a snowstorm, and so on) that happened to coincide with the second test date. Teacher B's students may have performed less well than Teacher A's students not because of anything Teacher B did (much less Teacher B's alma mater) but because of one or more powerful external variables that weren't included in the statistical model.

• Assume the statistical model in a particular jurisdiction has been designed to determine student improvement using a "percentage straight line" assumption, whereby students and teachers are compared with each other based solely on the percentage change in student scores. A student whose grade improves from 50 to 55 is therefore treated as showing greater improvement than a student whose grade improves from 90 to 95, since the first student's score improved by 10% while the second student's score improved by less than 6%. Yet anyone who has taught history -- or coached basketball for that matter -- knows that the second student's improvement may have been much more remarkable. The basketball analogy may be easier to illustrate. Imagine you have been asked to improve the free throw percentage of a basketball beginner whose free throw success rate is 50%. You may find this task much easier than trying to improve the percentage of a veteran whose success rate is already 90%. In fact, as free throw percentages approach 90%, even the finest players' skill levels approach an achievement plateau.

• Or assume that an administrator in Teacher B's school has entered the students' demographic information incorrectly. No one at the state level will catch those mistakes, because bureaucrats at the state level have no independent knowledge of each student's age, gender, ethnicity, or socio-economic background. Since the database is treated as containing highly confidential information -- as well it should -- no one ever verifies the accuracy of the underlying data.
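The arithmetic behind the second example can be checked in a few lines. This is only a sketch: the function names `absolute_gain` and `percent_gain` are made up here, and the scores are the ones used in that example (50 to 55, and 90 to 95).

```python
def absolute_gain(first, second):
    """Raw point improvement between the two tests."""
    return second - first

def percent_gain(first, second):
    """Improvement expressed as a percentage of the first score."""
    return 100.0 * (second - first) / first

# The two students from the "percentage straight line" example.
students = {"starts at 50": (50, 55), "starts at 90": (90, 95)}

for label, (first, second) in students.items():
    print(
        f"Student who {label}: "
        f"+{absolute_gain(first, second)} points, "
        f"{percent_gain(first, second):.1f}% improvement"
    )
```

Both students gain the same five points, yet the percentage measure credits the first with a 10% improvement and the second with only about 5.6%. The ranking comes from the choice of yardstick, not from anything observed in the classroom.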

Omitted variables, straight line assumptions, and unchecked errors are only three of the major problems that remain unresolved in teacher value added models. Yet a number of seriously flawed prototypes are already being used to evaluate schools, individual teachers, and even teacher preparation programs -- sometimes with career-ending consequences.

We should beware the gremlins.