09/19/2013 09:17 am ET Updated Nov 19, 2013

How To Do Lots of High-Quality Educational Evaluations for Peanuts


One of the greatest impediments to evidence-based reform in education is the difficulty and expense of doing large-scale randomized experiments. Such experiments are essential for several reasons. Scale matters because when treatments are applied at the level of classrooms and schools, you need a lot of classrooms and schools to keep a few unusual sites from influencing the results too much. Also, research finds that small-scale studies produce inflated effects, particularly because researchers can create special conditions on a small scale that they could not sustain on a large scale. Large experiments simulate the situations likely to exist when programs are used in the real world, not under optimal, hothouse conditions.

Randomized experiments are important because when schools or classrooms are assigned at random to use a program or continue to serve as a control group (doing what they were doing before), we can be confident that there are no special factors that favor the experimental group other than the program itself. Non-randomized, matched studies that are well designed can also be valid, but they have more potential for bias.

Most quantitative researchers would agree that large-scale randomized studies are preferable, but in the real world such studies, done well, can cost a lot: more than $10 million per study in some cases. That may be chump change in medicine, but in education, we can't afford many such studies.

How could we do high-quality studies far less expensively? The answer is to attach studies to funding being offered by the U.S. Department of Education. That is, when the Department is about to hand out a lot of money, it should commission large-scale randomized studies to evaluate specific ways of spending these resources.

To understand what I'm proposing, consider what the Department might have done when No Child Left Behind (NCLB) required that low-performing schools offer after-school tutoring to low-achieving students, in its Supplemental Educational Services (SES) initiative. The Department might have invited proposals from established providers of tutoring services, which would have had to participate in research as a condition of special funding. It might have then chosen a set of after-school tutoring providers (I'm making these up):

Program A provides structured one-to-three tutoring.
Program B rotates children through computer, small-group, and individualized activities.
Program C provides computer-assisted instruction.
Program D offers small-group tutoring in which children who make progress get treats or free time for sports.

Now imagine that for each program, 60 qualifying schools were recruited for the studies. For the Program A study, half would get Program A and half would get the same funding to do whatever they wanted (except Programs A to D), consistent with the national rules. The assignment to Program A or its control group would be at random. Programs B, C, and D would be evaluated in the same way.
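For readers who want to see how simple the assignment step is, here is a minimal sketch in Python. It assumes 60 hypothetical school IDs and an arbitrary random seed, and the function name `assign_schools` is made up for illustration. A real study would likely also stratify or block by district or prior achievement, which this sketch does not do.

```python
import random


def assign_schools(school_ids, seed=None):
    """Randomly split schools into equal-sized program and control groups."""
    rng = random.Random(seed)      # seeded so the assignment can be reproduced and audited
    shuffled = list(school_ids)    # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "program": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical IDs for the 60 schools recruited for the Program A study.
schools = [f"school_{i:02d}" for i in range(1, 61)]
groups = assign_schools(schools, seed=2013)
print(len(groups["program"]), len(groups["control"]))
```

The same function could be run once per program, each with its own pool of 60 volunteer schools.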

Here's why such a study would have cost peanuts. The costs of offering the program to the schools that got Programs A, B, C, or D would have been covered by Title I, as was true of all NCLB after-school tutoring programs. Further, state achievement tests, routinely collected in every state in grades 3-8, could have been obtained at pre- and posttest at little cost for data collection. The only costs would be for data management, analysis, and reporting, plus some amount of questionnaires and/or observations to see what was actually happening in the participating classes. Peanuts.
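To give a sense of how little analysis machinery the basic comparison needs, the sketch below computes the difference in average gain (posttest minus pretest) between program and control schools. This is a deliberate simplification and not a method the proposal prescribes: an actual evaluation would more likely use a statistical model that adjusts for pretest scores and accounts for students being clustered within schools. The function name and all of the numbers are made up for illustration.

```python
from statistics import mean


def gain_difference(program, control):
    """Mean (posttest - pretest) gain in program schools minus the same in control schools."""
    def mean_gain(rows):
        return mean(post - pre for pre, post in rows)
    return mean_gain(program) - mean_gain(control)


# Made-up school-level mean scores as (pretest, posttest) pairs.
program = [(200.0, 212.0), (195.0, 205.0), (210.0, 219.0)]
control = [(201.0, 208.0), (196.0, 203.0), (209.0, 215.0)]

effect = gain_difference(program, control)
print(round(effect, 2))
```

A positive value would mean that program schools gained more than control schools, in whatever units the state test reports.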

Any time money is going out from the Department, such designs might be used. For example, in recent years a lot of money has been spent on School Improvement Grants (SIG), now called School Turnaround Grants. Imagine that various whole-school reform models were invited to work with many of the very low-achieving schools that received SIG funding. Schools would have been assigned at random to use Programs A, B, C, or D, or to control groups able to use the same amount of money however they wished. Again, various models could be evaluated. The costs of implementing the programs would have been covered by SIG (which was going to spend this money anyway), and the cost of data collection would have been minimal because test scores and graduation rates already being collected could have been used. Again, the only costs of the evaluation would have been for data management, analysis, and reporting. More peanuts.

Note that in such evaluations, no school gets nothing. All of them get the money. Only schools that want to sign up for the studies would be randomly assigned. Modest incentives might be necessary to get schools to participate in the research, such as a few competitive preference points in grant competitions (such as SIG) or somewhat higher funding levels in formula grants (such as after-school tutoring). Schools that do not want to participate in the research could do what they would have done if the study had never existed.

Against the minimal cost, however, weigh the potential gain. Each U.S. Department of Education program that lends itself to this type of evaluation would produce information about how the funds could best be used. Over time, not only would we learn about specific effective programs, we'd also learn about types of programs most likely to work. Also, additional promising programs could enter into the evaluation over time, ultimately expanding the range of options for schools. Funding from the Institute of Education Sciences (IES) or Investing in Innovation (i3) might be used specifically to build up the set of promising programs for use in such federal programs and evaluations.

Ideally, the Department might continuously commission evaluations of this kind alongside any funding it provides for schools to adopt programs capable of being evaluated on existing measures. Perhaps the Department might designate an evaluation expert to sit in on early meetings to identify such opportunities, or perhaps it might fund an external "Center for Cost-Effective Evaluations in Education."

There are many circumstances in which expensive evaluations of promising programs still need to be done, but constantly doing inexpensive studies where they are feasible might free up resources to do necessarily expensive research and development. It might also accelerate the trend toward evidence-based reform by adding a lot of evidence quickly to support (or not) programs of immediate importance to educators, to government, and to taxpayers.

Because of the central role government plays in education, and because government routinely collects a lot of data on student achievement, we could be learning a lot more from government initiatives and innovative programs. For just a little more investment, we could learn a lot about how to make the billions we spend on providing educational services a lot more effective. Very important peanuts, if you ask me.
