Gates' Evidence versus Gates' Spin

12/16/2010 09:06 pm ET | Updated May 25, 2011

Advocates of test-driven accountability are cheering the new study by the Gates Foundation, "Learning About Teaching." The Gates PR machine seems oblivious, however, to the difference between the report's evidence and the spin being attached to it. The use of test-score growth models, in the hands of management, will result in "a flood of litigation like none that has ever been witnessed."

So, "reformers" should re-read the Gates report through the eyes of defense attorneys who have just been given even more ammunition for destroying efforts to terminate teachers based on the results of value-added growth models (VAMs). The use of test-score growth models in the hands of peer review committees will continue to withstand judicial scrutiny. Before districts try to use the results of VAMs, as interpreted by administrators alone, they should ask whether firing teachers based on the output of a statistical black box will result in scenes like the following:

Defense Attorney: "How can your statistical model determine whether the failure of my client to reach his test score growth target was due to his ineffectiveness, as opposed to the extraordinary challenges of teaching in a neighborhood high school with intense concentrations of generational poverty? How did you determine that my client's alleged ineffectiveness was not just a statistical artifact attributable to teaching in an ineffective school?"

Gates Expert Witness: "To do so with anything approaching certainty would mean watching the teacher work with many thousands of children in many thousands of classrooms."

Defense Attorney: "In other words, you can't."

Gates Expert: "It may well be easier to use certain teaching practices or to garner enthusiastic responses from students if one's students show up eager to learn."

Defense Attorney: "Your study says that test prep and rote instruction are ineffective in raising test scores. Did my client voluntarily choose those ineffective instructional strategies or were they imposed upon him by the school?"

Gates Expert: "We will be adding measures [and] retraining principals. These are all fairly low-cost ways to get started."

Defense Attorney: "My client teaches English. Did the report not document that 'state English test scores actually reflect the reading comprehension skills that the student brought to class, and they are insensitive to teacher effects'?"

Gates Expert: "Youth (in grades one through three) are improving their reading comprehension MORE during the months they are in school. However, beginning in the fourth grade, that is no longer true! The above pattern implies that schooling in itself may have little impact on standardized assessments after 3rd grade."

Defense Attorney: "My client teaches in a high-poverty high school. What does social science research say about the difficulty of raising scores with students who may have learned to decode, but not to 'read to learn'?"

Gates Expert: "A common interpretation is that families have more profound effects on children's reading and verbal performance than teachers."

Defense Attorney: "Did your study also conclude from student surveys that boredom was not conducive to test score growth? Is not rote instruction imposed by NCLB-type accountability a major cause of poor performance?"

Gates Expert: "Boredom has a -0.215 correlation, which is second only to student misbehavior in predicting test scores."

Defense Attorney: "You are also recommending the doubling or tripling of these primitive standardized tests, are you not?"

Gates Expert: "The evidence... is like a colossal divining rod, pointing to the ground, saying, 'Dig here'... We have begun to dig."

Defense Attorney: "The highest correlation between your student survey results and increased test scores was for the ability to control class behavior. Did my client establish the district's Code of Conduct, set the system's policies for assessing disciplinary consequences, or make the budgetary decision to not invest in alternative schools and socio-emotional interventions? Are not peer effects within the classroom, and throughout the building, also reasons for the lack of control in the classroom?"

Gates Expert: "Difference in covariates across districts may reduce the reliability of the value added estimates." (For that reason, this study did not address the effects of poverty, and the sample of special education students was nearly 30 percent below the actual special education population of the districts studied.)

Defense Attorney: "The purpose of your research is to help a principal determine whether a current teacher has lower test growth than the average novice teacher, who could replace him. But should not the question be whether my client raises test scores as much as the average applicant who would be teaching the same type of students in the same building?"

Gates Expert: "Evidence is not perfect. Some decisions based on such data will turn out to be mistaken."

Defense Attorney: "Did your report not say that ANY information (emphasis in the original) should be used to determine tenure? What about false negatives that are a common product of VAMs?"

Gates Expert: "That issue was not addressed in the study."

Defense Attorney: "Is it not generally accepted that VAMs are less reliable when students are not randomly assigned to teachers?"

Gates Expert: "Our study aspires to resolve that question with a report next summer, but Kane and Staiger (2008) could not reject that there was no bias and that the value-added measures approximated 'causal' teacher effects."

Defense Attorney: "Isn't it generally accepted that VAMs are less reliable with classes of low-performing students?"

Gates Expert: "Evidence of bias at the end of this year may require scaling down the value-added measures themselves."

Defense Attorney: "Your study reported, 'Recall that the .36 difference in student achievement [is for teachers] who TRULY ARE in the top versus bottom quartile of teacher effectiveness. The .21 standard deviation refers to the difference for those who are INFERRED TO BE on the top and bottom based on recent performance (which is an imperfect indicator).' The emphasis was yours. Tell me again how your statistical black box distinguishes between teachers who are truly ineffective and those who are merely inferred to be so?"

Gates Expert: "It may be difficult for a non-specialist to judge how large that is."