05/13/2013 05:35 pm ET Updated Jul 13, 2013

Fool's Gold

A rumor is circulating in the international investment community that some of the gold bars held in storage locations around the world are not actually 100% gold. What appear to be gold bars on the outside, skeptics say, are actually chock full of much cheaper tungsten. What a fraud that would be, if true!

There is a lot of talk about gold in the educational community as well, mostly about the need for "gold standard" evaluations to support judgments about the effectiveness of a program or practice. Policymakers in particular want educators to base their improvements on such research.

In response, the educational community is searching every nook and cranny for gold standard evaluations to justify its judgments about specific innovations. If you do not have such evaluations, it's unlikely that any school superintendent will give you the time of day.

All of this seemingly laudable attention to quality research and evaluation has me wondering, however, whether a good bit of tungsten is hidden in what is currently considered gold. As a long-time consumer of what passes for gold standard research and evaluation, I suspect that much of what everyone pronounces gold may in fact be tungsten.

As I see it, the failings in research fall into three categories: design, execution, and replication. In the first category are evaluations that are flawed before the first bit of data is collected. Here we find such tungsten as inadequate sampling, non-random assignment, and failure to control for important contextual variables.

In the second category are execution problems, such as sample attrition, data collection snafus, and faulty data analysis procedures. The third category, and the one I see violated most often, is the failure to replicate the research in different settings and contexts, with different researchers conducting it. It is not uncommon for policymakers and educational leaders to make fairly important decisions based on a single research study.

How many research or evaluation reports have you read in which the researchers delineate, early in the report, the shortcomings of their research design or execution? Perhaps, for example, random assignment was not possible or the sample size is a bit small. No problem there, I would say. Difficulties arise, however, when the researchers report their findings and conclusions without mentioning the caveats they stated earlier. This artful if disingenuous glissade occurs so subtly that one can easily miss it.

So a lot of what passes for high quality research and evaluation turns out, on closer inspection, to be much closer to tungsten than gold. The unsuspecting consumer, often a school superintendent not highly skilled in conducting education research, is fooled into believing that she has all the research support she needs to move ahead with adopting a new program or practice.

Education, although a most flagrant violator, is not the only one. Most recently, for example, a famous, now infamous, 2010 study by two Harvard economists on sovereign debt levels and economic growth was found to contain a bit of "tungsten" as well, in this case data analysis errors. To the credit of the research process, the errors came to light when other researchers attempted to replicate the original work. Nevertheless, researchers in disciplines beyond education also often reach beyond their grasp.

David Colander, in Creating Humble Economists: A Code of Ethics for Economists, observes that economists tend to convey more scientific certainty in their policy positions than the theory and evidence objectively allow. Too many economists, he argues, are willing to make seemingly definitive scientific statements about policy based on models that they know, or should know, are highly imperfect. My guess is that education research has this tendency as well.

Statistics guru Nate Silver, in The Signal and the Noise, views the problem from a different angle:

One of the pervasive risks that we face in the information age is that even if the amount of knowledge in the world is increasing, the gap between what we know and what we think we know may be widening. This syndrome is often associated with very precise-seeming predictions that are not at all accurate. Moody's carried out their calculations to the second decimal place -- they were utterly divorced from reality.

The reality for educators is the perennial struggle to identify programs and practices that actually work as their research reports indicate. That's the ultimate gold standard.