02/13/2007 01:59 pm ET Updated May 25, 2011

When Is a "Lead" Really a Lead?

In a post
last week that presented an automated survey of North Carolina voters, we
described a three-point lead for John Edwards over Hillary Clinton (34% to 31%)
among Democrats as "statistically insignificant" and said that a six-point advantage
meant that Rudy Giuliani "runs ahead" of Newt Gingrich among Republicans (31%
to 25%). But reader "Thomas" asked
a good question:

When I look at the results for the
Republican candidates, there's a 6-point gap between Giuliani and Gingrich. But
the size of the sample is only 735. Do you think this gap between the two
candidates is really statistically more significant than the gap between the
two Democrats candidates? I'm especially concerned with the size of the
samples, and the way the interviews were conducted (automatically).

Thomas' question gets at an important issue for pre-election
polls: How do we know when a lead is really
a lead?

Let's get to the heart of the matter: The PPP survey
of North Carolina Republicans reported a "margin of error of +/- 3.6%." Presumably, Thomas doubled that margin (getting
+/- 7.2%) and compared it to the 6-point margin separating Giuliani and
Gingrich. That's the right instinct, because the reported "margin of error"
applies to each percentage separately. Looking at it that way, if you apply the
margin of error to each candidate's percentage, you get a set of ranges that
overlaps: somewhere between 27.4% and 34.6% for Giuliani and 21.4% and 28.6%
for Gingrich. So how can that be a statistically meaningful lead?
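To make that arithmetic concrete, here is a minimal Python sketch of the per-candidate ranges. It assumes a simple random sample and the usual 95% confidence level (z = 1.96), and it takes the 31% and 25% shares and the sample of 735 discussed in this post; the function name is ours, purely for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval for one proportion p
    from a simple random sample of size n (z = 1.96 for 95%)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 735
# Pollsters usually report the worst-case margin, computed at p = 0.5.
reported = margin_of_error(0.5, n)
print(f"reported margin: +/- {reported:.1%}")

# Apply the reported margin to each candidate's share separately.
ranges = {}
for name, share in [("Giuliani", 0.31), ("Gingrich", 0.25)]:
    ranges[name] = (share - reported, share + reported)
    low, high = ranges[name]
    print(f"{name}: {low:.1%} to {high:.1%}")

# The two ranges overlap when the leader's low end sits below
# the trailer's high end.
overlap = ranges["Giuliani"][0] < ranges["Gingrich"][1]
print("ranges overlap:", overlap)
```

Overlapping ranges are what make the lead look doubtful at first glance, but they are not the right test for a difference between two shares from the same sample.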

The issue gets a bit technical, but the bottom line is that the
statistical formula for a confidence interval (the formal term for "margin of
error") for the difference of two
percentages from the same sample produces something slightly smaller than just
doubling the reported margin of error. I'll let my colleague, Prof. Charles Franklin, explain:

While [doubling the margin of
error] is the correct conclusion when there are only two possible survey
responses, it is not correct when there are more than two possible responses,
which is in fact virtually always the case. The difference between the "twice
the margin of error" rule and the correct calculation for the confidence
interval of a difference of multinomial proportions will depend on how large are
the proportion of survey responses other than that of the top two candidates
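For readers who want the algebra, the standard textbook expression for the margin of error of a difference between two shares from the same multinomial sample is sketched below. We are assuming, not confirming, that this matches the formula in Franklin's paper. Here p_1 and p_2 are the two candidates' shares, n is the sample size, and z is the critical value (1.96 for 95% confidence). The second line shows that when only two responses are possible (p_1 + p_2 = 1), the expression collapses to exactly twice the single-percentage margin.

```latex
\mathrm{MOE}(p_1 - p_2) = z \sqrt{\frac{p_1 + p_2 - (p_1 - p_2)^2}{n}}

p_1 + p_2 = 1 \;\Rightarrow\;
\mathrm{MOE}(p_1 - p_2) = z \sqrt{\frac{4\,p_1 (1 - p_1)}{n}}
                        = 2 z \sqrt{\frac{p_1 (1 - p_1)}{n}}
```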

Franklin's paper** has the complete formula and more details
for those interested (see also Kish, Survey Sampling, 1965, pp. 498-501), but
the bottom line is that the margin of error for a difference of two percentages
gets slightly smaller as the
percentage falling into other categories (undecided or third candidates) gets
larger. Franklin illustrates that point with the
following graphic. The horizontal blue lines represent the reported margins of
error (times two) for various sample sizes. The diagonal purple lines show how
the margin of error for the difference of two percentages declines as the total
of the percentages on which they are based ("p1 + p2") declines.
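The pattern described above can be reproduced with a short Python sketch. It uses the standard variance of a difference of two multinomial proportions, which we assume (without having confirmed it) is the calculation behind Franklin's chart. The function names and the particular values of "p1 + p2" are ours, chosen only for illustration, and the lead is held fixed at 6 points.

```python
import math

def moe_single(p, n, z=1.96):
    """Margin of error for a single proportion (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_difference(p1, p2, n, z=1.96):
    """Margin of error for p1 - p2 when both shares come from the
    same multinomial sample of size n."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)

n = 735
doubled = 2 * moe_single(0.5, n)  # the "twice the margin of error" rule

# Hold the lead at 6 points and shrink the combined share p1 + p2.
for total in (1.00, 0.80, 0.56, 0.40):
    p1 = (total + 0.06) / 2
    p2 = (total - 0.06) / 2
    print(f"p1 + p2 = {total:.2f}: +/- {moe_difference(p1, p2, n):.2%} "
          f"(doubling gives +/- {doubled:.2%})")
```

With 31% and 25% on 735 interviews (p1 + p2 = 0.56), this sketch gives roughly +/- 5.4 points, in line with the +/- 5.43 quoted in this post; the small gap presumably reflects rounding or slightly different inputs.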


In this case, the margin of error for the 31% to 25%
Giuliani lead is +/- 5.43 percentage points, so the 6-point lead would be just barely significant. So what do we make of that? Thomas' question
implies that we should be skeptical about "barely significant" differences
given that, in this case, the survey was automated. Let's consider that.

First, we need to keep in mind that this sort of
significance test only takes into account the purely random variation that
comes from drawing a sample rather than interviewing the entire population. Other
potential errors could come from low rates of coverage or response (provided
that the missing respondents have different opinions than those interviewed) or
from the wording of the questions or their order. Unfortunately, the "margin of
error" as we know it is not a measure of total
error. So while other sources of error may not alter that "statistical
significance," the result might still be wrong. Poll consumers should keep that
in mind.

Also, the error margins calculated above assume a "simple
random sample," but most political polls involve some weighting and other minor
deviations from pure random sampling, which increase the error margin slightly.
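As a rough illustration of how weighting widens the margin, the sketch below uses Kish's approximate design effect, which depends only on how much the weights vary. The weights here are entirely hypothetical (we do not know the weights PPP applied), and the function name is ours.

```python
import math

def kish_design_effect(weights):
    """Kish's approximation: deff = n * sum(w^2) / (sum(w))^2.
    Equal weights give 1.0; more variable weights give larger values."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

# Hypothetical weights for 735 respondents: a third weighted down,
# a third left alone, a third weighted up.
weights = [0.7] * 245 + [1.0] * 245 + [1.3] * 245

n = len(weights)
deff = kish_design_effect(weights)
srs_moe = 1.96 * math.sqrt(0.25 / n)      # simple-random-sample margin
adjusted_moe = srs_moe * math.sqrt(deff)  # margin after weighting

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n / deff:.0f}")
print(f"margin: +/- {srs_moe:.2%} unweighted, +/- {adjusted_moe:.2%} weighted")
```

Under these made-up weights the margin grows only modestly, which fits the "slightly" above, but heavier weighting would inflate it more.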

Finally, keep in mind that the reported margin assumes a 95%
level of confidence. That is, we are 95% certain a 31% to 25% lead in a simple
random sample of 735 respondents did not occur by chance alone. But there is
nothing magic about 95%; it is just the commonly accepted standard used by most
public opinion pollsters. If we wanted to be 99% certain, that 6-point lead
would just miss "statistical significance."
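To see how the verdict shifts with the confidence level, here is a small Python sketch that recomputes the margin on the lead at 90%, 95% and 99%. It assumes a simple random sample and the standard formula for a difference of two multinomial proportions; the function name and the choice of the 90% level are ours.

```python
import math
from statistics import NormalDist

def moe_difference(p1, p2, n, confidence=0.95):
    """Margin of error for p1 - p2 from one multinomial sample,
    at the given two-sided confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)

p1, p2, n = 0.31, 0.25, 735
lead = p1 - p2

for confidence in (0.90, 0.95, 0.99):
    moe = moe_difference(p1, p2, n, confidence)
    verdict = "significant" if lead > moe else "not significant"
    print(f"{confidence:.0%} confidence: lead {lead:.1%} "
          f"vs margin +/- {moe:.2%} -> {verdict}")
```

The same 6-point lead clears the bar at 95% and falls short at 99%, which is why the choice of threshold matters for borderline results.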

All of which brings us to a lesson: As Professor Franklin
likes to put it, we gain little by getting obsessed with "statistical
significance," except when we are a few days before an election (and even then,
it helps to look at many surveys, as we do here on Pollster, rather than a few). For
a survey like this one, the concept of statistical significance provides an
objective check, but it is more of a guide than a source of absolute rules.

**Charles wanted to make a few small revisions to his paper,
which we should have posted soon.
