Most of us in my cohort of "critical, mad-dog" graduate students fooled ourselves when we first read Bateman's 1948 classic study of sexual selection in fruit flies. Reading the original paper, we missed the evidence that his method could not answer his questions. With hindsight, those of us who considered ourselves "critical" might more realistically characterize our then-selves as self-deceived. Why had we not thought -- in the mid-1970s -- to repeat this iconic experiment?
Would a strict repetition calm our more mature fears that Bateman's hyper-cited paper -- the paper that gave us the evolutionary justification for the double standard -- was inadequate to the task it set itself?
To repeat Bateman's study with fidelity, my collaborators and I did -- as Bateman had done -- an experiment with multiple populations of potentially breeding adult flies. We used 10 kinds of adults with different visible oddities, "nametags" coded by distinct genetic mutations. As I explained in a previous blog, each adult had a unique dominant gene that coded for a distinct phenotype, its nametag, which no other adult in its population had. When an offspring inherited a parental nametag, Bateman had only to look at the offspring to know who its mother or father was. In other words, the method of Bateman's study was meant to allow investigators to score who scored.
Each adult in each population had its own nametag. So, for instance, one adult would have a distinct mutation that caused its bristles -- hair-like sense organs -- to grow as stumps instead of the long bristles of wild-type flies. Another adult would have curly wings instead of straight wild-type wings. Another adult subject would have no eyes (!) or very, very small, misshapen eyes. Or a subject would have a very tiny head (caused by a gene associated with the lethal condition in humans known as microcephaly). In a population of 10 adults, there were 10 distinct phenotypes coded by 10 distinct genes, each at a different locus, so that no adult had more than one odd feature.
However, offspring could inherit mutated genes from one or both parents. Just looking at the phenotypes of the parents made one wonder whether an offspring that inherited two nametag mutations might be as good as dead before it hatched. Brian Snyder and I had hypothesized that double-mutant offspring in Bateman's original study died like flies.
To find out, my collaborators and I put adult virgin flies with "nametags" together in multiple sets of small populations. We then left the adults alone to breed or not as they pleased. Then we looked at all 8,093 offspring that our subjects produced, noting for each one whether it carried any of its parents' nametags. We scored offspring into four categories, just as Bateman had done: those with two nametags, those with only a mother's nametag, those with only a father's nametag and those with no nametags. Then we calculated the frequencies of these types and tested their fit to the expectations from Mendel's laws, as we had earlier done with some of Bateman's own data.
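The Mendelian expectation behind this scoring is easy to see with a small enumeration. The sketch below assumes each parent is heterozygous for its single dominant nametag gene (a wild-type allele on the matching chromosome), which is what makes all four offspring categories possible; the variable names are mine:

```python
from itertools import product

# Cross: mother Aa (dominant nametag A over wild-type a)
#      x father Bb (dominant nametag B over wild-type b).
# Each parent is assumed heterozygous for its one marker.
mother_gametes = ["A", "a"]
father_gametes = ["B", "b"]

counts = {"both": 0, "mother_only": 0, "father_only": 0, "neither": 0}
for m, f in product(mother_gametes, father_gametes):
    has_mom_tag = (m == "A")  # offspring shows mom's nametag
    has_pop_tag = (f == "B")  # offspring shows pop's nametag
    if has_mom_tag and has_pop_tag:
        counts["both"] += 1
    elif has_mom_tag:
        counts["mother_only"] += 1
    elif has_pop_tag:
        counts["father_only"] += 1
    else:
        counts["neither"] += 1

# Each gamete combination is equally likely, so each of the four
# offspring categories is expected at 1/4.
freqs = {k: v / 4 for k, v in counts.items()}
print(freqs)
```

Under Mendel's laws, then, each of the four categories -- including the double-nametag offspring -- should appear in about a quarter of the brood.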
Right away, we knew something was wrong. Only 15.6 percent had two nametags, a considerable difference from the 25 percent that Mendel's rules told us to expect. The missing children were those that should have had two nametags, one from each parent. Our conclusion was that the double-nametag offspring had died like flies.
Almost 800 of the expected double-nametag offspring were missing, so we couldn't score whether their parents had scored. Missing offspring meant that we necessarily miscounted within-sex variation in the number of mates, one of the key variables of sexual selection. Alas! Our repetition of Bateman's experiment told us that his method was incapable of providing an unbiased answer to his question about within-sex variation in mate number.
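The size of the shortfall follows from back-of-the-envelope arithmetic on the figures reported above (the rounding here is mine):

```python
# Check the shortfall of double-nametag offspring, using the totals
# reported in the text: 8,093 offspring scored, 15.6 percent with two
# nametags observed, 25 percent expected under Mendel's laws.
total_offspring = 8093

expected_doubles = round(0.25 * total_offspring)   # ~2,023 expected
observed_doubles = round(0.156 * total_offspring)  # ~1,263 observed

missing = expected_doubles - observed_doubles      # ~760 missing
print(expected_doubles, observed_doubles, missing)
```

Roughly 760 expected double-nametag offspring never turned up -- the "almost 800" that could not be scored for either parent.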
As if this weren't bad enough, Bateman's method produced yet another systematic error: it also mis-measured the number of offspring -- reproductive success, or RS -- of both sexes. In any fair sample of offspring of a species in which every individual has both a mother and a father, we should be able to credit an equal or near-equal number of offspring to fathers and to mothers. If statistically significantly more offspring turn up for one sex of parent than for the other, there's a problem.
The number of offspring we could count for mothers and for fathers was identical among offspring bearing two parental nametags, but not among the single-nametag kids, who got a nametag from one parent and a wild-type (anonymous) allele from the other. In the replication, 5,750 offspring inherited at least one nametag. (The others were wild-type flies that inherited neither of their parents' nametags.) Of the offspring with one nametag, 2,108 (46.8 percent) had only mom's nametag, but 2,400 (53.2 percent) had only pop's, a highly statistically significant difference in the number of offspring we could count for fathers compared with mothers. Something was wrong.
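The significance of that father-mother gap can be sanity-checked with a two-sided binomial test. The sketch below uses a normal approximation in plain Python with the counts reported above; it is my illustration, not the test the original analysis necessarily used:

```python
import math

# Single-nametag offspring reported in the text.
mom_only = 2108                    # only the mother's nametag
pop_only = 2400                    # only the father's nametag
n = mom_only + pop_only            # 4,508 single-nametag offspring

# If counting were unbiased, each parent's nametag should turn up in
# about half of these offspring.
expected = n / 2                   # 2,254 apiece
sd = math.sqrt(n * 0.5 * 0.5)      # binomial standard deviation, ~33.6

z = (pop_only - expected) / sd     # normal approximation to the binomial
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"gap = {pop_only - mom_only}, z = {z:.2f}, p = {p_two_sided:.1e}")
```

The z-score comes out above 4, with a two-sided p far below 0.001 -- consistent with the "highly statistically significant" difference described above.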
Were our results due to a few weird populations? No. Most of the populations in our replication had more offspring credited to fathers than to mothers, proving that the data were biased. Looking at these data without suspecting bias could lead one to conclude that number of mates had a bigger effect on reproductive success for fathers than for mothers. In our experiment fathers seemed to have about 300 more offspring than mothers did, a biological impossibility. Bateman may have erroneously concluded that number of mates had less effect on female reproductive success (RS) than on male RS simply because his method was biased! Could Bateman's influential conclusions about the lack of effect of mate number on the RS of mothers have been due to an undercount of the offspring mothers must have had? Our repetition proved that there is an alternative explanation to the one Bateman favored: an unreliable method produced biased results.
So why did it take so long to repeat Bateman's study? There are many reasons for the failure to repeat studies, particularly foundational ones -- lack of funding, lack of publication venues -- and anyway, who wants to rediscover the wheel? No one gets into the National Academy of Sciences for a confirmatory replication. And if one repeats an earlier influential study but gets a different result, thereby challenging the earlier study, well... "Off with their heads!" Someone's bound to be mad, especially if they banked a long scientific career on the presumed veracity of the earlier report. The sorry reality is that there are few rewards for repeating studies. Even so, replication is one of the strong suits of the scientific method, among the best ways to keep us from fooling ourselves.