05/22/2008 12:02 pm ET | Updated May 25, 2011

Using Statistics to Estimate the Michigan and Florida Elections That Didn't Happen

With all the ink that's been spilled over how Michigan and Florida will be represented at the Democratic convention, with a much-anticipated meeting of the Democratic rules committee at the end of this month to consider this question, and with another round of speculation in the last day or so about whether Hillary would "carry the fight to the convention" over Michigan and Florida, I'm surprised that I haven't seen anyone, anywhere, try to answer the following basic question:

If Michigan and Florida had voted when they were supposed to according to DNC rules, and if both candidates had campaigned there, what would have been the result?

Of course, the true answer to this question is an unknowable counterfactual. But, as any economist or social scientist could tell you, they answer such questions all the time, often on the basis of considerably less data than exists in this case.

Since January, 48 states and DC have voted, and we know the results of these votes. Throw out, perhaps, New York and Illinois, the candidates' home states, as outliers. We also have detailed demographic data on who voted, in the form of exit polls (for example, here).
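To make the bookkeeping concrete, here is a minimal Python sketch of one way to hold the per-state data and set aside the two home states. The `Contest` record, its field names, the placeholder "State A", and all numbers are hypothetical, made up for illustration only; real values would come from the published results and exit polls.

```python
from dataclasses import dataclass


@dataclass
class Contest:
    """One state's (or DC's) contest. Hypothetical field layout."""
    state: str
    obama_share: float   # Obama's share of the Obama/Clinton delegates
    demographics: dict   # e.g. {"pct_black": 0.15}, from exit polls


# The candidates' home states, treated as outliers and excluded from the fit.
HOME_STATES = {"New York", "Illinois"}


def drop_outliers(contests, exclude=HOME_STATES):
    """Return the contests whose state is not in `exclude`, in order."""
    return [c for c in contests if c.state not in exclude]


# Made-up numbers, for illustration only.
rows = [
    Contest("State A", 0.45, {"pct_black": 0.20}),
    Contest("Illinois", 0.65, {"pct_black": 0.25}),
    Contest("New York", 0.40, {"pct_black": 0.20}),
]
kept = drop_outliers(rows)
print([c.state for c in kept])
```

Passing a different `exclude` set makes it easy to check how much the estimates depend on dropping the home states.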

We also have detailed demographic information on who lives in Florida and Michigan: the US Census.

We should also include a right-hand (explanatory) variable for the number of days since the Iowa caucuses, to allow for a time trend. Assume that Michigan and Florida held their elections on the earliest date the rules would have allowed. Also include an indicator variable for whether each contest was a primary or a caucus.
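The two non-demographic regressors can be computed as in the sketch below. It assumes the Iowa caucuses fell on January 3, 2008, and that February 5, 2008 was the earliest date the DNC calendar allowed a non-exempt state to vote. It also assumes, hypothetically, that both states would have held primaries rather than caucuses. The function name `contest_features` is illustrative.

```python
from datetime import date

# Assumptions: Iowa caucuses on January 3, 2008; February 5, 2008 as the
# earliest date the DNC calendar allowed for non-exempt states.
IOWA = date(2008, 1, 3)
EARLIEST_ALLOWED = date(2008, 2, 5)


def contest_features(contest_date, is_caucus):
    """Return the non-demographic regressors for one contest:
    days since Iowa (time trend) and a 0/1 caucus indicator."""
    days_since_iowa = (contest_date - IOWA).days
    return [days_since_iowa, 1 if is_caucus else 0]


# Michigan and Florida, placed (hypothetically) on the earliest allowed
# date and treated as primaries.
mi_fl = contest_features(EARLIEST_ALLOWED, is_caucus=False)
print(mi_fl)  # [33, 0]
```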

Run a multiple linear regression of the results against the demographic data, the time trend, and the primary/caucus indicator variable. Then use the estimated relationship to "predict" what the results would have been in Michigan and Florida. Anyone with the data can do this using Excel (you may have to enable the Analysis ToolPak add-in, which provides the Regression tool).
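For readers who would rather not use Excel, here is a minimal pure-Python sketch of the same idea: ordinary least squares solved through the normal equations, followed by a prediction for one state that sits outside the fitted data. The regressor set (one demographic share, days since Iowa, a caucus indicator), the variable names, and all numbers are hypothetical. The toy outcomes are generated from known coefficients so that the fit can be checked.

```python
def ols_fit(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows of regressors (no intercept column); an intercept
    is prepended automatically. Returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: collinear regressors or too few contests")
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [a - f * c for a, c in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(b, x):
    """Fitted value b0 + b1*x1 + ... + bk*xk for one row of regressors."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


# Hypothetical regressors per contest: [pct_black, days_since_iowa, is_caucus].
X = [
    [0.10, 0, 1],
    [0.30, 10, 0],
    [0.05, 30, 1],
    [0.20, 33, 0],
    [0.15, 60, 1],
    [0.25, 90, 0],
]
# Toy outcomes generated from known coefficients so the fit can be checked.
TRUE_B = [0.3, 0.8, 0.001, 0.1]
y = [predict(TRUE_B, x) for x in X]

b = ols_fit(X, y)
# "Predict" a state that did not vote under the rules (made-up census share).
michigan_x = [0.14, 33, 0]
print([round(v, 4) for v in b])
print(round(predict(b, michigan_x), 3))
```

With real data the fit needs more contests than coefficients, and strongly correlated demographic variables will make the estimates unstable. A library routine such as NumPy's `numpy.linalg.lstsq` handles ill-conditioned cases more robustly than the normal equations used here.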

Suppose, as a first pass, that we do the following: ignore the other candidates; make the left-hand variable Obama's share of the Obama/Clinton delegates; ignore the internal dynamics of delegate apportionment within each state; and pretend that the demographics of Florida and Michigan voters matched those states' demographics in the US Census. (This last simplification should be straightforward to correct by first estimating the demographics of the turnout, but that would take another round of entering census data for each state.)
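Under these simplifications, the left-hand variable and the conversion of a predicted share back into delegates might be sketched as follows. The function names and the 100-delegate example are hypothetical. Rounding is statewide, so it deliberately ignores the within-state apportionment rules.

```python
def obama_share(obama_delegates, clinton_delegates):
    """Obama's share of the two-candidate (Obama/Clinton) delegate total,
    ignoring all other candidates."""
    total = obama_delegates + clinton_delegates
    if total == 0:
        raise ValueError("no Obama or Clinton delegates in this contest")
    return obama_delegates / total


def delegates_from_share(share, total_delegates):
    """Split a state's delegates from a predicted Obama share, rounding
    statewide with Python's round() (half to even) and ignoring
    district-level apportionment."""
    obama = round(share * total_delegates)
    return obama, total_delegates - obama


print(obama_share(12, 36))             # 0.25
print(delegates_from_share(0.25, 100)) # (25, 75)
```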

This would be a great project for college statistics classes that use Excel, or for any group of motivated people working collaboratively. Entering the data into the spreadsheet would take a little effort, but a group could easily divide up the task using, for example, a shared spreadsheet in Google Docs. Folks could publish their results on the web, so that everyone could see which right-hand variables were used, and examine and compare different models (another interesting exercise).
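One simple yardstick for comparing competing models is adjusted R², which penalizes extra right-hand variables. Here is a sketch. It assumes every model is fit to the same contests with the same left-hand variable, and the observed and fitted values in the example are made up. Out-of-sample checks, such as leaving one state out at a time, would be a stronger test.

```python
def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(y, fitted, n_regressors):
    """R^2 penalized for the number of right-hand variables (excluding
    the intercept), so models with different regressor sets can be compared."""
    n = len(y)
    if n - n_regressors - 1 <= 0:
        raise ValueError("need more contests than coefficients")
    return 1 - (1 - r_squared(y, fitted)) * (n - 1) / (n - n_regressors - 1)


# Made-up observed shares and fitted values, for illustration only.
y_obs = [0.4, 0.5, 0.6, 0.7]
fitted = [0.42, 0.48, 0.61, 0.69]
print(round(r_squared(y_obs, fitted), 4))
print(round(adjusted_r_squared(y_obs, fitted, 1), 4))
print(round(adjusted_r_squared(y_obs, fitted, 2), 4))
```

The same fit quality earns a lower adjusted R² when it takes more regressors to achieve it.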

It would also be a nice project for some enterprising journalists.