In a post last week that presented an automated survey of North Carolina voters, we described a three-point lead for John Edwards over Hillary Clinton (34% to 31%) among Democrats as "statistically insignificant" and said that a six-point advantage meant that Rudy Giuliani "runs ahead" of Newt Gingrich among Republicans (31% to 25%). But reader "Thomas" asked a good question:

When I look at the results for the Republican candidates, there's a 6-point gap between Giuliani and Gingrich. But the size of the sample is only 735. Do you think this gap between the two candidates is really statistically more significant than the gap between the two Democrats candidates? I'm especially concerned with the size of the samples, and the way the interviews were conducted (automatically).

Thomas' question gets at an important issue for pre-election polls: How do we know when a lead is really a lead?

Let's get to the heart of the matter: The PPP survey of North Carolina Republicans reported a "margin of error of +/- 3.6%." Presumably, Thomas doubled that margin (getting +/- 7.2%) and compared it to the six-point margin separating Giuliani and Gingrich. That's the right instinct, because the reported "margin of error" applies to each percentage separately. Looking at it that way, if you apply the margin of error to each candidate's percentage, you get two ranges that overlap: somewhere between 27.4% and 34.6% for Giuliani and between 21.4% and 28.6% for Gingrich. So how can that be a statistically significant lead?
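
For readers who want to check that arithmetic, here is a minimal Python sketch. It assumes the usual convention for a reported margin of error: a simple random sample, a 95% confidence level (z = 1.96) and the worst-case share of 50%. The function name is ours, not PPP's.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval for one share p from a
    simple random sample of size n (z = 1.96 assumes 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 735                              # Republican sample size from the survey
reported = margin_of_error(0.5, n)   # assumed worst case, p = 0.5
doubled = 2 * reported               # the "twice the margin" rule of thumb

giuliani = 0.31
low, high = giuliani - reported, giuliani + reported

print(f"reported margin: +/- {100 * reported:.1f} points")
print(f"doubled margin:  +/- {100 * doubled:.1f} points")
print(f"Giuliani range:  {100 * low:.1f}% to {100 * high:.1f}%")
```

With these inputs the reported margin rounds to 3.6 points and the doubled margin to 7.2 points, matching the figures in the paragraph above.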

The issue gets a bit technical, but the bottom line is that the statistical formula for a confidence interval (the formal term for "margin of error") for the *difference* of two percentages from the same sample produces something slightly smaller than just doubling the reported margin of error. I'll let my colleague, Prof. Charles Franklin, explain:

While [doubling the margin of error] is the correct conclusion when there are only two possible survey responses, it is not correct when there are more than two possible responses, which is in fact virtually always the case. The difference between the "twice the margin of error" rule and the correct calculation for the confidence interval of a difference of multinomial proportions will depend on how large are the proportion of survey responses other than that of the top two candidates combined.

Franklin's paper** has the complete formula and more details for those interested (see also Kish, *Survey Sampling*, 1965, pp. 498-501), but the bottom line is that the margin of error for a difference of two percentages *gets slightly smaller* as the percentage falling into other categories (undecided or third candidates) gets larger. Franklin illustrates that point with the following graphic. The horizontal blue lines represent the reported margins of error (times two) for various sample sizes. The diagonal purple lines show how the margin of error for the difference of two percentages declines as the total of the percentages on which they are based ("p1 + p2") declines.
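
For the curious, the relationship can be written out. This is the standard textbook result for two shares from the same multinomial sample, which we assume is the form Franklin's paper uses; here $n$ is the sample size, $p_1$ and $p_2$ are the two candidates' shares, and $z$ is 1.96 for 95% confidence.

```latex
\operatorname{Var}(\hat{p}_1 - \hat{p}_2)
  = \frac{p_1(1 - p_1) + p_2(1 - p_2) + 2 p_1 p_2}{n}
  = \frac{(p_1 + p_2) - (p_1 - p_2)^2}{n},
\qquad
\mathrm{MOE}_{\text{diff}}
  = z \sqrt{\frac{(p_1 + p_2) - (p_1 - p_2)^2}{n}}
```

When $p_1 + p_2 = 1$ (only two possible answers), this reduces to $2z\sqrt{p_1(1 - p_1)/n}$, exactly twice the single-percentage margin. As $p_1 + p_2$ falls below 1, the term under the square root shrinks, and so does the margin.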

In this case, the margin of error for the 31% to 25% Giuliani lead is +/- 5.43 percentage points, so the six-point gap would be just barely significant. So what do we make of that? Thomas' question implies that we should be skeptical about "barely significant" differences given that, in this case, the survey was automated. Let's consider that.
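
Here is a rough Python check of that comparison. It is a sketch under stated assumptions, not Franklin's own calculation: it uses the textbook variance for a difference of two shares from the same sample, `(p1 + p2 - (p1 - p2)^2) / n`, with z = 1.96 and a simple random sample of 735. The function name is ours.

```python
import math

def moe_difference(p1, p2, n, z=1.96):
    """Margin of error for p1 - p2 when both shares come from the
    same sample (textbook multinomial formula, simple random sample)."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)

n = 735
p1, p2 = 0.31, 0.25                        # Giuliani and Gingrich shares
lead = p1 - p2

doubled = 2 * 1.96 * math.sqrt(0.25 / n)   # twice the reported margin
diff_moe = moe_difference(p1, p2, n)

print(f"lead:                 {100 * lead:.1f} points")
print(f"doubled margin:       +/- {100 * doubled:.2f}")
print(f"margin of difference: +/- {100 * diff_moe:.2f}")
print("significant at 95%" if lead > diff_moe else "not significant at 95%")
```

With these inputs the margin for the difference comes out a little under 5.4 points, close to the +/- 5.43 quoted above (the small gap may reflect rounding or inputs not shown in this post) and well below the doubled margin of about 7.2.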

First, we need to keep in mind that this sort of significance test only takes into account the purely random variation that comes from drawing a sample rather than interviewing the entire population. Other potential errors could come from low rates of coverage or response (provided that the missing respondents have different opinions than those interviewed) or from the wording of the questions or their order. Unfortunately, the "margin of error" as we know it is not a measure of *total* error. So while other sources of error may not alter that "statistical significance," the result might still be wrong. Poll consumers should keep that in mind.

Also, the error margins calculated above assume a "simple random sample," but most political polls involve some weighting and other minor deviations from pure random sampling, which increase the error margin slightly.
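
To give a feel for the size of that effect, here is a small Python sketch using Kish's approximation for the effective sample size under weighting, `(sum of weights)^2 / (sum of squared weights)`. The weights below are entirely hypothetical; we do not know how PPP weighted this survey, and this is one common approximation rather than the pollster's method.

```python
import math

def kish_effective_n(weights):
    """Kish's approximate effective sample size for a weighted sample."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical weights for 735 respondents: a third weighted down,
# a third left alone, a third weighted up.
weights = [0.7] * 245 + [1.0] * 245 + [1.3] * 245

n = len(weights)
n_eff = kish_effective_n(weights)

srs_moe = 1.96 * math.sqrt(0.25 / n)           # simple random sample
weighted_moe = 1.96 * math.sqrt(0.25 / n_eff)  # after the weighting penalty

print(f"effective sample size: {n_eff:.0f} of {n}")
print(f"margin of error: {100 * srs_moe:.2f} -> {100 * weighted_moe:.2f} points")
```

Even with fairly mild weights like these, the effective sample size drops below 735 and the margin of error widens a bit, which is the "slightly" in the paragraph above.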

Finally, keep in mind that the reported margin assumes a 95% level of confidence. That is, we are 95% certain a 31% to 25% lead on a simple random sample of 735 respondents did not occur by chance alone. But there is nothing magic about 95%; it is just the commonly accepted standard used by most public opinion pollsters. If we wanted to be 99% certain, that six-point lead would just miss "statistical significance."
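
The effect of the confidence level is easy to sketch in Python. As before, this assumes a simple random sample and the textbook formula for a difference of two shares from the same sample; `NormalDist` from the standard library supplies the z-value for each level. The function name is ours.

```python
import math
from statistics import NormalDist

def moe_difference(p1, p2, n, confidence=0.95):
    """Margin of error for p1 - p2 from the same sample at a chosen
    two-sided confidence level (simple random sample assumed)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)

lead = 0.06   # the six-point Giuliani lead
for level in (0.90, 0.95, 0.99):
    moe = moe_difference(0.31, 0.25, 735, level)
    verdict = "significant" if lead > moe else "not significant"
    print(f"{level:.0%} confidence: +/- {100 * moe:.2f} points, {verdict}")
```

Under these assumptions the six-point lead clears the bar at 90% and 95% but falls short at 99%, where the margin widens to roughly seven points.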

All of which brings us to a lesson: As Professor Franklin likes to put it, we gain little by getting obsessed with "statistical significance," except when we are a few days before an election (and even then, it helps to look at many surveys, as we do here on Pollster, rather than just a few). For a survey like this one, the concept of statistical significance provides an objective check, but it is more of a guide than a source of absolute rules.

**Charles wanted to make a few small revisions to his paper, which we should have posted soon.