04/09/2014 03:38 pm ET | Updated Jun 09, 2014

Lurking Truth in Recent Studies

Last month a study of siblings found that breastfeeding conferred no health advantages, while a second study declared older paternal age to be associated with psychiatric problems in children. A third study found no link between saturated fats and heart disease. It was a month of unexpected, and sometimes unsettling, science.

In each case the claim was based on an observational study, or, as my father calls them, "quack reports." But the problem is not the report; it's the interpretation. Even experts struggle to decode observational findings, which is why recommendations about diet and lifestyle, for which randomized trials are rare, constantly change. In one famous example health authorities for years recommended hormone replacement for millions of women based on observed associations between taking the pills and fewer heart problems. It turns out hormone replacement therapy increases heart attacks.

What went wrong? The experts initially failed to appreciate that women taking hormone pills in the studies were more affluent and more health-conscious and had better access to doctors, all attributes that make heart attacks less likely. In other words, the lower heart-attack rate wasn't because of the pills but because of the women. When large trials randomly assigned women to take either hormone or placebo pills, the differences in groups disappeared, and the true pill effect emerged: more heart attacks. 
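For readers who like to see the mechanism, here is a minimal simulation sketch of how that reversal can happen. Every number in it is invented for illustration (the share of health-conscious women, the baseline risks, the size of the pill's harm) and none comes from the actual hormone studies. The function name `simulate` and its parameters are likewise hypothetical.

```python
import random

def simulate(n, randomized, rng):
    """Return (heart-attack rate with pills) - (rate without pills).

    All probabilities are made-up illustrative values, not study data.
    """
    counts = {True: [0, 0], False: [0, 0]}  # pill? -> [heart attacks, women]
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        if randomized:
            # A coin flip decides who gets the pill, regardless of lifestyle.
            takes_pill = rng.random() < 0.5
        else:
            # Observational world: health-conscious women take pills more often.
            takes_pill = rng.random() < (0.8 if health_conscious else 0.2)
        # Lifestyle lowers baseline risk; the pill itself adds risk.
        risk = 0.04 if health_conscious else 0.10
        if takes_pill:
            risk += 0.02
        attack = rng.random() < risk
        counts[takes_pill][0] += attack
        counts[takes_pill][1] += 1
    rate_with = counts[True][0] / counts[True][1]
    rate_without = counts[False][0] / counts[False][1]
    return rate_with - rate_without

rng = random.Random(1)  # fixed seed so the run is repeatable
observational_diff = simulate(200_000, randomized=False, rng=rng)
randomized_diff = simulate(200_000, randomized=True, rng=rng)

print(f"Observational difference: {observational_diff:+.3f}")
print(f"Randomized difference:    {randomized_diff:+.3f}")
```

Under these assumptions the pill harms every woman who takes it, yet the observational comparison makes it look protective, because pill-takers are disproportionately the low-risk women. Random assignment breaks that link and the harm shows up.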

The research term for group differences like this is "lurking variables," and they live up to their name. Lurking variables create and distort relationships, and they can be difficult to see. They are responsible for the overwhelming majority of errors in the interpretation of observational data. Consider a famous classroom example: Studies show that larger shoe size is strongly associated with better reading comprehension. Why? Hint: It's not about the feet. Older kids have bigger feet than younger kids, and they tend to read better. Once the lurking variable (age) is revealed, it is obvious that there is no significant relationship with shoe size. Circus clowns are not, after all, innately talented readers.
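The shoe-size example is easy to reproduce with made-up data. The sketch below assumes, purely for illustration, that both shoe size and reading score depend on age plus random noise and on nothing else. The coefficients, the age range, and the helper names `pearson` and `residuals` are all hypothetical. "Adjusting for age" here means fitting a straight line on age and correlating what is left over, which is one simple way to do it, not the only one.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What remains of ys after removing a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Invented data: age drives both variables; shoe size never touches reading.
ages = [rng.uniform(5, 12) for _ in range(5000)]
shoe = [0.8 * a + rng.gauss(0, 1) for a in ages]
reading = [10 * a + rng.gauss(0, 10) for a in ages]

raw_r = pearson(shoe, reading)
adjusted_r = pearson(residuals(shoe, ages), residuals(reading, ages))

print(f"Shoe size vs. reading, ignoring age:      r = {raw_r:.2f}")
print(f"Shoe size vs. reading, adjusting for age: r = {adjusted_r:.2f}")
```

The first correlation comes out strong and the second close to zero, even though the data were built so that feet have no influence on reading at all.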

But in a world of fast and frightening headlines, the problem of lurking variables is epidemic, even years after the hormone-replacement debacle. Last month's headline "Children of older fathers at risk of low IQ, autism and suicide" was based on a study from Sweden that found associations between paternal age over 45 and psychiatric problems like autism and attention-deficit hyperactivity disorder. What news reports didn't mention is that paternal age under 20 was also associated with psychiatric problems, including suicide attempts and substance abuse. These findings cast serious doubt on older age (and the ill-defined "genetic mutations" cited in most articles) as an explanation. Instead, they implicate lurking variables that come with paternal age extremes, like education, social patterns, and income.

So when should we trust findings in an observational study? Rarely. Rigorous analyses of scientific literature show, unfortunately, that most associations and effects reported in observational studies are probably false. But there is a notable exception: When a study finds no association, it's usually right.  

Take the breastfeeding study, which compared the health of children who breastfed with that of their siblings who didn't, and found no differences. The within-home design is novel, neutralizing lurking variables like genetics and household that tainted previous studies. But some others remain. Maternal illness, allergies, and shifting finances could all lead to feeding differences between siblings, and each could create the appearance of a breastfeeding effect. And yet they didn't.

There are two potential explanations for why not. One is that there actually is a health impact of breastfeeding, but lurking variables canceled it out. This is mathematically unlikely. A complex combination of lurking variables will rarely amount to an effect that perfectly mirrors and opposes a true effect. The second possibility is that the remaining lurking variables were too weak to matter and therefore could not skew the numbers in either direction -- in which case the finding represents a clean, and likely accurate, view of breastfeeding's impact.

This calculus, in which reports finding no effect are much more likely to be correct, is true for virtually all observational studies, something Mark Bittman may or may not have known when, in referring to last month's massive diet study, he announced that "Butter Is Back." The study's finding of no link between heart disease and saturated fats led Bittman to a conclusion that may be more scientifically accurate than the decades of diet recommendations -- based largely on tainted associations -- that preceded it.

Few research results are definitive, and attempts to reproduce the recent findings will, over time, be the best test. But one lesson from the past month will hopefully endure: Public health mistakes are worth remembering.