I am a big fan of the concept of the What Works Clearinghouse (WWC), though I have concerns about various WWC policies and practices. For example, I have previously written about my concerns with WWC’s acceptance of measures made by researchers and developers, and with WWC’s policy against weighting effect sizes by sample sizes when computing mean effect sizes for various programs. However, there is another WWC policy that is a problem in itself, and that is made more serious by recent Department of Education guidance on the ESSA evidence standards.
The WWC Standards and Procedures 3.0 manual sets rather tough standards for programs to be rated as having positive effects in studies meeting standards “without reservations” (essentially, randomized experiments) and “with reservations” (essentially, quasi-experiments, or matched studies). However, the WWC defines a special category of studies for which all caution is thrown to the winds. Such studies are called “substantively important,” and are treated as though they met WWC standards. Quoting from Standards and Procedures 3.0: “For the WWC, effect sizes of +0.25 standard deviations or larger are considered to be substantively important…even if they might not reach statistical significance…” The “effect size greater than +0.25” loophole (the >0.25 loophole, for short) is problematic in itself, but could lead to catastrophe for the ESSA evidence standards that now identify programs that meet “strong,” “moderate,” and “promising” levels of evidence.
The problem with the >0.25 loophole is that studies that meet the loophole criterion without meeting the usual methodological criteria are usually very, very, very bad studies, usually with a strong positive bias. These studies are often very small (far too small for statistical significance). They usually use measures made by the developers or researchers, or ones that are excessively aligned with the content of the experimental group but not the control group.
One example of the >0.25 loophole is a Brady (1990) study accepted as “substantively important” by the WWC. In it, 12 students in rural Alaska were randomly assigned to Reciprocal Teaching or to a control group. The literacy treatment was built around specific science content, but the control group never saw this content. Yet one of the outcome measures, focused on this content, was made by Mr. Brady, and two others were scored by him. Mr. Brady also happened to be the teacher of the experimental group. The effect size in this awful study was an extraordinary +0.65, though outcomes in other studies assessed on measures more fair to the control group were much smaller.
Because the WWC does not weight studies by sample size, this tiny, terrible study had the same impact in the WWC summary as studies with hundreds or thousands of students.
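To make the arithmetic concrete, here is a minimal sketch comparing an unweighted mean effect size with a sample-size-weighted one. Only the first study (12 students, effect size +0.65) comes from the example above; the two larger studies and their effect sizes are hypothetical numbers invented for illustration, and the function names are my own. Weighting by raw sample size is also a simplification, since meta-analyses more commonly use inverse-variance weights.

```python
# Each study is (number of students, effect size).
# Only the first entry comes from the text; the other two are
# hypothetical values chosen purely for illustration.
studies = [
    (12, 0.65),    # the small study described above
    (400, 0.10),   # hypothetical larger study
    (1000, 0.05),  # hypothetical larger study
]


def unweighted_mean(studies):
    """Average the effect sizes, counting every study equally."""
    return sum(es for _, es in studies) / len(studies)


def sample_size_weighted_mean(studies):
    """Average the effect sizes, weighting each by its sample size."""
    total_n = sum(n for n, _ in studies)
    return sum(n * es for n, es in studies) / total_n


print("Unweighted mean effect size:", round(unweighted_mean(studies), 3))
print("Sample-size-weighted mean:  ", round(sample_size_weighted_mean(studies), 3))
```

With these made-up numbers, the unweighted mean comes out at about +0.27, while the weighted mean is about +0.07. The 12-student study pulls the unweighted average above the +0.25 line even though it contributes less than 1% of the students.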
For the ESSA evidence standards, the >0.25 loophole can lead to serious errors. A single study meeting standards makes a program qualify for one of the top three ESSA categories (strong, moderate, or promising). There can be financial consequences for schools using programs in the top three categories (for example, use of such programs is required for schools seeking school improvement grants). Yet a single study meeting the standards, including the awful 12-student study of Reciprocal Teaching, qualifies the program for an ESSA category, no matter what is found in all other studies (unless there are qualifying studies with negative impacts). Also, the loophole works in the negative direction too, so a small, terrible study could find an effect size less than -0.25, and no amount or quality of positive findings could make that program meet WWC standards.
The >0.25 loophole is bad enough for research that already exists, but for the future, the problem is even more serious. Program developers or commercial publishers could do many small studies of their programs or could commission studies using developer-made measures. Once a single study exceeds an effect size of +0.25, the program may be considered validated forever.
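A rough sketch of why running many small studies is such an effective strategy: even if a program has no true effect at all, sampling error alone will often push a tiny study past +0.25. The code below assumes two equal groups, treats the observed effect size as approximately normal with standard error sqrt(2 / n per group), and assumes the studies are independent. These are simplifying assumptions of mine, not WWC procedure, and the approximation is crude for very small samples. The function names and the group sizes in the loop are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_exceeds(threshold, n_per_group, true_effect=0.0):
    """Approximate chance that one study's observed effect size exceeds threshold.

    Uses a normal approximation with standard error sqrt(2 / n_per_group),
    which assumes two equal groups and ignores small-sample corrections.
    """
    se = sqrt(2.0 / n_per_group)
    z = (threshold - true_effect) / se
    return 1.0 - NormalDist().cdf(z)


def prob_any_exceeds(threshold, n_per_group, n_studies, true_effect=0.0):
    """Chance that at least one of n_studies independent studies exceeds threshold."""
    p = prob_exceeds(threshold, n_per_group, true_effect)
    return 1.0 - (1.0 - p) ** n_studies


if __name__ == "__main__":
    # Hypothetical group sizes; 6 per group matches a 12-student study.
    for n_per_group in (6, 50, 500):
        single = prob_exceeds(0.25, n_per_group)
        any_of_five = prob_any_exceeds(0.25, n_per_group, n_studies=5)
        print(f"n per group = {n_per_group:>3}: "
              f"one study {single:.3f}, at least one of five {any_of_five:.3f}")
```

Under these assumptions, a study with six students per group has roughly a one-in-three chance of exceeding +0.25 when the true effect is zero, and five such studies will produce at least one "substantively important" result most of the time. With hundreds of students per group, the chance is negligible.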
To add to the problem, in recent guidance from the U.S. Department of Education, the definition of the ESSA “promising” category specifically mentions the idea that programs can meet the promising definition if they can report statistically significant or substantively important outcomes. The guidance refers to the WWC standards for the “strong” and “moderate” categories, and the WWC standards themselves allow for the >0.25 loophole (even though this is not mentioned or implied by the law itself, which consistently requires statistically significant outcomes, not “substantively important” ones). In other words, programs that meet WWC standards for “positive” or “potentially positive” based on substantively important evidence alone explicitly do not meet ESSA standards, which require statistical significance. Yet the recent guidance does not recognize this problem.
The >0.25 loophole began, I’d assume, when the WWC was young and few programs met its standards. It was jokingly called the “Nothing Works Clearinghouse.” The loophole was probably added to increase the numbers of included programs. This loophole produced misleading conclusions, but since the WWC did not matter very much to educators, there were few complaints. Today, however, the WWC has greater importance because of the ESSA evidence standards.
Bad loopholes make bad laws. It is time to close this loophole, and eliminate the category of “substantively important.”
This blog is sponsored by the Laura and John Arnold Foundation