Amir Aczel


Can Facebook Call the Election?

Posted: 08/31/2012 11:58 am

Exactly 20 years -- and five presidential election seasons -- ago, at the end of August 1992, I was flying from London back to Boston, and at 41,000 feet had one of the most interesting ideas in my professional life. I was then a professor of mathematics and statistics at what is today called Bentley University (then Bentley College) and about to start teaching an advanced statistics course. What would happen, I asked myself, if instead of giving my class the usual boring examples, I would have them do a class project: use statistics to predict the results of the upcoming presidential election?

Professional polling organizations like Gallup have thousands of employees and bottomless pockets, but the hubris of youth told me that I knew statistics better than they did, and that with 25 students, 12 phone lines, and a budget (generously provided by the college) to make a mere 1,000 phone calls to voters around the country, I could call the election with high accuracy. We spent the semester learning about sampling methods, stratification, bias in surveys, and sampling distributions, all in preparation for our big event. Then, the night before the election, we stayed up late in our "operations center," each group of students manning a phone, armed with a state-of-the-art random sampling scheme we had developed throughout the course that would give every voter in the United States an equal chance of being selected for our representative sample. Over takeout pizza and countless cups of coffee, we ran a poll of 1,000 voters that perfectly spanned the entire United States, with all its regions, area codes, and phone exchanges proportionally represented in our sample. And we did it! We were able to call the results of the election to within half a percentage point for one of the three candidates who ran that year, Ross Perot, and within 1% and 1.5%, respectively, for the other two: Bill Clinton and George H.W. Bush. A good experimental design allowed us to do so well -- obtaining a result of higher overall accuracy than that of Gallup -- and our predictions were reported in some newspapers and on radio programs.

The point is that, by 1992, telephones had become a viable way of conducting political polls. But it hasn't always been that way. Before 1936, an important magazine that no longer exists, the Literary Digest, had been able to call presidential elections so well that the New York Times would regularly report the Digest poll results on its front page during every election campaign. In 1936, Digest editors decided to outdo themselves, hoping to gain even more prestige for their magazine. They would collect a sample of unprecedented size: 10 million voters! (Of these, 2.4 million responded -- still a sample that is immensely large.) The hugeness of the sample, the researchers believed, would guarantee them a supreme accuracy. Unfortunately, they did not fully understand the concepts of randomness and bias. The Literary Digest was a conservative publication, and its readers tended to vote Republican. The Digest used its readership as one source of sampled voters, thus introducing a bias. But two more sources of bias existed -- and they are more interesting for us here -- one was automobile registration plates, and the other was telephone numbers!

Now, today we poll people using the phone all the time. But this was 76 years ago: People who had cars and/or phones tended to be wealthier, and wealthy people, as we know, tend to vote Republican. So the frame, the statistical base for the sampling, was highly flawed -- it had a built-in bias to the right -- and so even a fantastically large sample size of 2.4 million could not make that natural bias go away. (There is also the problem of non-response, the fact that of 10 million people, fewer than a quarter responded; but that is another issue. Incidentally, my students told the people they polled that their grade in the class depended on their answering the poll, so our non-response was close to zero!) Because their vaunted sample indicated that the Republican candidate, Kansas Governor Alfred Landon, would win the election, the magazine went boldly out with this prediction -- prominently reported on the front page of the New York Times and other newspapers. History decided otherwise, and the Democratic candidate, Franklin Delano Roosevelt, won the election in a landslide. The Literary Digest soon closed its offices in disgrace, having lost both face and readership.
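To make the point about biased frames concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the support levels (60% in the whole electorate, 40% within the biased frame), the sample sizes, and the function name `poll` are hypothetical and are not the Literary Digest's actual figures. It only shows that a very large sample drawn from a skewed frame stays far from the truth, while a small random sample from the whole electorate lands close to it.

```python
import random


def poll(support, n, rng):
    """Return the share of n sampled voters who back candidate A,
    when each sampled voter backs A with probability `support`."""
    return sum(rng.random() < support for _ in range(n)) / n


rng = random.Random(1936)  # fixed seed so the run is repeatable

TRUE_SUPPORT = 0.60   # hypothetical support for A in the whole electorate
FRAME_SUPPORT = 0.40  # hypothetical support for A among car and phone owners

# Huge sample, but drawn only from the biased frame.
big_biased = poll(FRAME_SUPPORT, 240_000, rng)

# Small sample, but drawn at random from the whole electorate.
small_random = poll(TRUE_SUPPORT, 1_000, rng)

print(f"large biased sample estimate: {big_biased:.3f}")
print(f"small random sample estimate: {small_random:.3f}")
print(f"true support:                 {TRUE_SUPPORT:.3f}")
```

The large sample estimates the frame's opinion very precisely; it simply answers the wrong question.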

So why am I telling you this story? In 1936, using telephone numbers to generate a frame from which to collect a random sample was a prescription for disaster. By 1992, phones had become the most efficient way to generate good samples because they were easy to use (who wants to travel from town to town, knocking on doors?) and in the meantime phones had become ubiquitous and no longer exhibited a preference to be owned by richer people: so the built-in bias was almost completely gone. (I have to say "almost" because there are people with no phones, but they are very few and their phonelessness may not be as highly income-dependent as it would have been in 1936). Another interesting fact to note here is that sample sizes need not be very large. A well-designed survey, in which good probability sampling is carried out, may well contain as few as 1,000 voters and still give excellent information. (The statistical rule is that in a random sample of size n, the sampling error at 95% probability is roughly plus or minus one divided by the square root of n. Thus, for a sample of size 1,000, the sampling error at 95% probability is plus or minus about three percentage points.*)
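The rule of thumb in the parenthesis above can be checked with a few lines of Python. This is only a sketch: the function names `rough_margin` and `exact_margin` are made up for illustration, and the "exact" version is the usual normal approximation for a single proportion, evaluated at the worst case p = 0.5.

```python
import math


def rough_margin(n):
    """Rule-of-thumb 95% sampling error: one over the square root of n."""
    return 1.0 / math.sqrt(n)


def exact_margin(n, p=0.5, z=1.96):
    """Normal-approximation 95% sampling error for a proportion p."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Compare the two for our class poll and for the Digest's 2.4 million responses.
for n in (1_000, 2_400_000):
    print(f"n = {n:>9,}: rough ±{rough_margin(n):.2%}, "
          f"normal approximation at p=0.5 ±{exact_margin(n):.2%}")
```

Note that this error describes random sampling only. It says nothing about a biased frame, which is why 2.4 million responses did not save the Digest.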

We have now moved to the next level of technology -- from phones to the Internet. And the question is: Have we progressed to the point at which using the Internet as a source of statistical information is valid or not? And this is what brings me to Facebook. Both President Obama and Mitt Romney have their own Facebook pages, on which Facebook users can click the "like" button. I tracked the "likes" for Obama and Romney over three days, and here is what I found:

Barack Obama:
28,004,524 likes on August 28, 2012
A day later, after the GOP convention nominated Mitt Romney:
28,014,250 likes
And a day later:
28,023,918 likes

Mitt Romney:
5,332,105 likes on August 28, 2012
A day later, after the GOP convention nominated him:
5,440,065 likes
And a day later:
5,482,806 likes

First, from these data it appears that Obama is more liked than Romney by a ratio of five to one (although, as president for almost four years now, Obama has had more of a chance to collect "like" clicks). Then, we see some interesting trends here: Obama gained roughly 10,000 "likes" a day over two days. But Romney gained more than 100,000 "likes" the day he was formally nominated by the GOP at the convention in Tampa, and more than 40,000 "likes" the following day. It appears to me that following the two "like" counts, for Obama and for Romney, might be an interesting way of tracking something that may act as a proxy for the popular vote on November 6 as we move through time -- something like a continuous kind of opinion poll. I italicize "might," and I am being very tentative here, because of the cautionary tale of the Literary Digest. I think that this might be an interesting pair of statistics to follow as they change through time precisely because I want to know whether or not there is a bias in using Facebook as an indication of who might win the election.
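For readers who want to redo the arithmetic as new counts come in, here is a small Python sketch. The like counts are the ones listed above; the names `likes` and `daily_gains` are just illustrative.

```python
# "Like" counts on August 28, 2012 and the two following days, as listed above.
likes = {
    "Obama": [28_004_524, 28_014_250, 28_023_918],
    "Romney": [5_332_105, 5_440_065, 5_482_806],
}


def daily_gains(counts):
    """Day-over-day change between consecutive counts."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


gains = {name: daily_gains(counts) for name, counts in likes.items()}

# Ratio of Obama's to Romney's likes on the first day tracked.
ratio = likes["Obama"][0] / likes["Romney"][0]

for name, day_gains in gains.items():
    print(name, [f"{g:,}" for g in day_gains])
print(f"Obama-to-Romney ratio on day one: {ratio:.2f}")
```

Appending a new day's count to either list extends the gains automatically.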

Is using Facebook like using phone numbers in 1936? -- meaning, is there an inherent bias here? The income element is probably not there: Facebook users are not, by any indication, either richer or poorer than nonusers. But with Facebook there may be two potential sources of bias: one is age, and the other is sensitivity to privacy issues. It appears that Facebook users may tend toward younger segments of the population, but I don't know whether this is a fact; and many non-users seem to have an obsession with "privacy." If age and sensitivity to privacy move voters either to the left or to the right, then using Facebook as an indicator of who is likely to win the November election, as measured at a given moment in time, is flawed. Otherwise, it may be an excellent, fast and easy rough source of some approximate kind of information about where the popular vote might be heading. Another question is whether Facebook users are quick enough to "unlike" a candidate once they change their minds about him. This may be an important hidden factor here.

How you can help: You could go to Facebook from time to time and in the "Comments" below paste what you see as the number of "likes" for each candidate. Then, after the election results are out, we will see whether this polling method worked or not (although one result will not necessarily tell us whether the method is biased or not -- the Literary Digest did get it right, by chance, several times before its fatal debacle.)
_____________________________

* Technical Note: With two candidates, the distribution of the sample proportion for one candidate is binomial, and its variance term is: p(1-p)/n, where p is the actual proportion of votes for the candidate. With a large sample, we approximate the binomial distribution with a normal distribution, so we use the multiplier 1.96 for 95% confidence. Once the election result is known, the sampling error (or "margin of error") at 95% confidence can be calculated as 1.96 times the square root of the expression above. Before the results of the election are known, p is unknown, but the expression p(1-p) reaches its maximum at p=0.5 (a fun exercise in basic calculus), and this gives an upper bound. The 1.96 then almost perfectly cancels with the square root of (0.5)x(0.5), which is 0.5, leaving the rough upper-bound estimate of plus or minus one over the square root of n for the sampling error of the survey. With three candidates, the actual distribution is multinomial, rather than binomial, and the formulas are more complicated.
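Written out, the argument in the note looks roughly like this. The symbol ME for the 95% sampling error is introduced here for convenience and is not notation used elsewhere in this post.

```latex
\[
\mathrm{ME}_{95\%} \;=\; 1.96\,\sqrt{\frac{p(1-p)}{n}}
\]
% p(1-p) is maximized where its derivative vanishes:
\[
\frac{d}{dp}\,p(1-p) \;=\; 1 - 2p \;=\; 0
\quad\Longrightarrow\quad p = \tfrac{1}{2},
\qquad p(1-p) \;\le\; \tfrac{1}{4}
\]
% Substituting the worst case gives the rule of thumb:
\[
\mathrm{ME}_{95\%} \;\le\; 1.96\,\sqrt{\frac{0.25}{n}}
\;=\; \frac{0.98}{\sqrt{n}} \;\approx\; \frac{1}{\sqrt{n}}
\]
```

For n = 1,000 this gives about 0.031, or roughly three percentage points.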

 
 
 
Comments
12:01 AM on 09/04/2012
Tracking FB likes of a 4-year president with a newly appointed candidate? I can't "like" Obama again, but whoever the R nominee would be would get more likes. This is like comparing apples to submarines. I feel dumber after reading that.
HUFFPOST SUPER USER
canchita
11:39 PM on 09/02/2012
One reason for Obama's popularity with the Facebook members is that you have to know how to read to have an account on it.
HUFFPOST SUPER USER
carl cid inting
There are no tyrants where there are no slaves
10:40 PM on 09/01/2012
Nobody remembers Mitt Romney's speech at the end of the Republican convention. But everyone's still talking about Obama's empty chair and how the Republicans had Clint Eastwood explode in their faces.
HUFFPOST SUPER USER
William Brock
10:18 PM on 09/01/2012
OBAMA 2012
07:59 PM on 09/01/2012
Given that something like half of Obama's Twitter followers are fake accounts (Romney has his share too), is it all that unlikely that a good chunk of his Facebook fans were bought and paid for?
HUFFPOST SUPER USER
William Brock
10:32 PM on 09/01/2012
Here we go, conspiracy time again! They're trying to get us! OMG! Where is my frick-in money Facebook? OBAMA 2012
10:59 PM on 09/01/2012
As I read your comment I can see why most of the people of this country are so confused. You do not want to see the truth because of the blinders that you have on. Try to open your eyes and understand this one fact: the president has done a good job for the people of this country.

If it were not for the Republican agenda stifling progress, we would be doing a whole lot better, but instead people like you are constantly telling lies, and that is all you have to show for yourselves at this time.

I have come to understand that you just do not understand what exactly is really going on in your own party: they are a bunch of do-nothing politicians who only want to further their careers on the backs of the hardworking people of this country.

And just on that I really feel sorry for you, because you are a part of the problem. Have a wonderful day.
02:35 AM on 09/02/2012
Well, I know that any government monopoly is inherently less efficient and vastly more expensive than anything the free market can provide.

I also know that, left to his own devices, Obama would create a government solution to everything, creating yet another financial disaster and leading the country to ruin.

Thanks but no thanks.
HUFFPOST SUPER USER
bfcg
Praise the holy Sasquatch
07:36 PM on 09/01/2012
For most Romney voters, probably less than half actually like him.
And to one of the previous posters, a 54/46 vote ratio is a landslide.
07:34 PM on 09/01/2012
OBAMA 2012
hyperion126
"curiouser and curiouser"
03:41 PM on 09/01/2012
People have numerous accounts in the same household...for their infant children, their dogs, cats, and horses, their businesses.
10:25 AM on 09/01/2012
While the idea is neat, there are a few things that could easily skew the results. One is age. Not that older people don't use Facebook (although they do) but those below voting age could like a candidate.

Also even if Facebook was evenly spread throughout demographics, and we took out a specific percentage who were underage and assuming most underage voters would be more likely to "like" Obama (how many 16 year olds even know who Romney is?) not everyone that uses Facebook "likes" people. I use Facebook to keep in touch and quickly share photos and thoughts, but have never "liked" anything.

Finally, using the idea of how many new "likes" poses an issue. Someone who voted for Obama in 2008 and is planning on voting for him again in 2012 may already have "liked" him in 2008. We can all assume this election will be close to 50/50, maybe 54/46 tops. So since 23 million more people like Obama already, that is 23 million fewer people to add to his existing total.
01:33 AM on 09/01/2012
Should we consider that men who wear 1930's hats might be best at calling the election?
HUFFPOST PUNDIT
Todays Illusion
Ordinary and undistinguished citizen.
05:24 PM on 08/31/2012
Facebook is
Grandma & Grandpa
Pre-teen, non-voting age teens

and was over before the stock went public.

Facebook would skew conservative, the ones not working . . . stay at home moms and the retired.
HUFFPOST BLOGGER
Amir Aczel
05:15 PM on 08/31/2012
Thanks for all your comments, everyone! I realize that this is fraught with danger--statistical and otherwise--but I wanted to just point out a POSSIBILITY: What IF Facebook could call the election? None of us know the real answer. Wait till November 6, and see how many "like"s each candidate gets, and compare these two numbers with the actual popular vote (not the Electoral College results). It's just a game! OK?
HUFFPOST SUPER USER
William Brock
10:29 PM on 09/01/2012
I like it......
08:05 AM on 09/02/2012
The 100,000 likes for Romney versus the 10,000 for Obama could be because of the RNC's 4 days of network-induced bounce, which decays. It would be interesting to see if there is a similar bounce of likes after the DNC convention that could nullify the high 100,000 number of likes for Romney. What is interesting is the 5 to 1 ratio. Obama has been president for almost 4 years. Romney has been running for president for at least 5 years. But by now you can tell I am biased toward Obama.
HUFFPOST SUPER USER
Allene Stucki
01:28 PM on 08/31/2012
Awful lot of teenagers on Facebook, but not many grannies - an awful lot of grannies at the voting booth, but not many teenagers.
HUFFPOST SUPER USER
Cowboylove
07:30 PM on 08/31/2012
Clearly you are not familiar with Facebook. There are a lot more grannies than teenagers on there today. It started out with young people, but most of the people I see today are older people.
01:22 PM on 08/31/2012
See the "Hacked By Mitt Romney" Facebook page for stories of how unsuspecting Facebook users somehow became fans of Mitt Romney's page.
02:36 PM on 08/31/2012
Mitt Romney is stuffing the Facebook ballot box.
01:21 PM on 08/31/2012
I've uncovered evidence that Mitt Romney's stuffing the Facebook ballot box: http://www.markturner.net/2012/08/25/is-romney-manipulating-his-social-media-numbers/
HUFFPOST SUPER USER
William Brock
10:33 PM on 09/01/2012
Who cares? He's still not going to win in November. So there!