01/09/2014 09:00 am ET Updated Dec 06, 2017

Success in Evidence-Based Reform: The Importance of Failure


As always, Winston Churchill said it best: "Success consists of going from failure to failure without loss of enthusiasm." There is a similar Japanese saying: "Success is being knocked down seven times and getting up eight."

These quotes came to my mind while I was reading a recently released report from the Aspen Institute, "Leveraging Learning: The Evolving Role of Federal Policy in Education Research." The report is a useful scan of the education research horizon, intended as background for the upcoming reauthorization of the Education Sciences Reform Act (ESRA), the legislation that authorizes the Institute of Education Sciences (IES). However, the report also contains brief chapters by various policy observers (including myself), focusing on how research might better inform and improve practice and outcomes in education. A common point of departure in some of these was that while randomized experiments (RCTs) emphasized for the past decade by IES and, more recently, Investing in Innovation (i3), are all well and good, the IES experience is that most randomized experiments evaluating educational programs find few achievement effects. Several cited testimony by Jon Baron that "of the 90 interventions evaluated in randomized trials by IES, 90% were found to have weak or no positive effects." As a response, the chapter authors proposed various ways in which IES could add to its portfolio more research that is not RCTs.

Within the next year or two, the problem Baron was reporting will take on a great deal of importance. The results of the first cohort of Investing in Innovation grants will start being released. At the same time, additional IES reports will appear, and the Education Endowment Foundation (EEF) in the U.K., much like i3, will also begin to report outcomes. All four of the first cohort of scale-up programs funded by i3 (our Success for All program, Reading Recovery, Teach for America, and KIPP) have had positive first-year findings in i3 or similar evaluations recently, but this is not surprising, as they had to pass a high evidence bar to get scale-up funding in the first place. The much larger number of validation and development projects were not required to have such strong research bases, and many of these are sure to show no effects on achievement. Kevan Collins, Director of the EEF, has always openly said that he'd be delighted if 10% of the studies EEF has funded find positive impacts. Perhaps in the country of Churchill, Collins is better placed to warn his countrymen that success in evidence-based reform is going to require some blood, sweat, toil, and tears.

In the U.S., I'm not sure if policymakers or educators are ready for what is about to happen. If most i3 validation and development projects fail to produce significant positive effects in rigorous, well-conducted evaluations, will opinion leaders celebrate the programs that do show good outcomes and value the knowledge gained from the whole process, including knowledge about what almost worked and what to avoid doing next time? Will they support additional funding for projects that take these learnings into account? Or will they declare the i3 program a failure and move on to the next set of untried policies and practices?

I very much hope that i3 or successor programs will stay the course, insisting on randomized experiments and building on what has been learned. Even if only 10% of validation and development projects report clear, positive achievement outcomes and capacity to go to scale, there will be many reasons to celebrate and stay on track:

1. There are currently 112 i3 validation and development projects (plus 5 scale-ups). If just 10% of these were found to be effective and scalable, that would be 11 new programs. Adding this to the scale-up programs and other programs already positively reviewed in the What Works Clearinghouse, this would be a substantial base of proven programs. In medicine, the great majority of treatments initially evaluated are found not to be effective, yet the medical system of innovation works because the few proven approaches make such a big difference. Failure is fine if it leads to success.

2. Among the programs that do not produce statistically significant positive outcomes on achievement measures, there are sure to be many that show promise but do not quite reach significance. For example, any program whose evaluation shows a student-level positive effect size of, say, +0.15 or more should be worthy of additional investment to refine and improve its procedures and its evaluation to reach a higher standard, rather than being considered a bust.

3. The i3 process is producing a great deal of information about what works and what does not, what gets implemented and what does not, and the match between schools' needs and programs' approaches. These learnings should contribute to improvements in new programs, to revisions of existing programs, and to the policies applied by i3, IES, and other funders.

4. As the findings of the i3 and IES evaluations become known, program developers, grant reviewers, and government leaders should get smarter about what kinds of approaches are likely to work and to go to scale. Because of this, one might imagine that even if only 10% of validation and development programs succeed in RCTs today, higher and higher proportions will succeed in such studies in the future.

Evidence-based reform, in which promising scalable approaches are ultimately evaluated in RCTs or similarly rigorous evaluations, is the best way to create substantial and lasting improvements in student achievement. Failures of individual evaluations or projects are an expected, even valued part of the process of research-based reform. We need to be prepared for them, and to celebrate the successes and the learnings along the way.

As Churchill also said, "Success is not final, failure is not fatal; It is the courage to continue that counts."