05/17/2010 05:12 am ET Updated May 25, 2011

March Madness for (Statistically Inclined) Dummies

Note: If you'd like to follow this strategy but don't even want to fill out a bracket, CBS Sportsline has an autofill option called "Historical Random" that is essentially what I'm suggesting.

March Madness is probably the only time when a large number of people have a rooting interest in probability in sports. By following a statistical framework and mixing in your own judgment about upsets (or just random guessing), you'll have a much better chance of filling in a pretty good bracket.

From 1985 to 2008, there have been 1,536 teams in the tourney, or 96 teams of each seed. Below, based on all this data, I've compiled what each round looks like on average in terms of the number of teams left of each seed. The more closely your bracket matches this framework, the more the historical record suggests you'll have success. Once you know roughly how many teams of each seed to advance, pick which ones. For example, if by a certain round there are expected to be 2.5 #7 seeds left, you might want to rule out one 7-seeded team for sure, include two of them for sure, and then decide on the remaining one based on the context.
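
To make that last step concrete, here is a minimal Python sketch of splitting an average into sure picks, a toss-up, and teams to rule out. The function name `plan_seed` and its return layout are my own invention for illustration, and it assumes four teams of each seed start the tournament.

```python
import math

def plan_seed(avg, teams_per_seed=4):
    """Split a historical average into (sure, toss_up, ruled_out, fraction).

    Hypothetical helper: `avg` is the average number of teams of one seed
    still alive in a given round, e.g. 2.5 for the #7 seeds in the Round of 32.
    """
    sure = math.floor(avg)                # teams you advance no matter what
    fraction = avg - sure                 # leftover decimal part
    toss_up = 1 if fraction > 0 else 0    # at most one team is a judgment call
    ruled_out = teams_per_seed - sure - toss_up
    return sure, toss_up, ruled_out, fraction

sure, toss_up, ruled_out, fraction = plan_seed(2.5)
print(f"advance {sure}, decide on {toss_up}, rule out {ruled_out}")
```

The toss-up team is where your own judgment (or a coin flip weighted by the decimal part) comes in.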

How would you know what specific teams to pick? Use your own thoughts, the experts, or simply random guessing. The last one is especially fun if you want to annoy people who follow college basketball religiously.

Keep in mind that for a small group, picking the favorites is still the way to go, with little to no randomization. And obviously, with so much going on in the tourney, this approach comes nowhere near guaranteeing you the best bracket among your friends even 50% of the time. And if you're trying to get the best bracket in the country, screw any advice and just pick random upsets.

If you do follow this framework, please tell me how it worked out for you. There are some more advanced tips after the data:


(Round of 64 has four of each seed)

Round of 32:
#1 seeds: 4.00
#2 seeds: 3.83
#3: 3.38
#4: 3.17
#5: 2.71
#6: 2.75 (not a typo, 6 seeds have done better than 5 seeds)
#7: 2.50
#8: 1.83
#9: 2.17 (again, 9 seeds have done better than 8 seeds)
#10: 1.50
#11: 1.25
#12: 1.29
#13: .833
#14: .625
#15: .167
#16: 0

Sweet Sixteen:
#1: 3.50
#2: 2.50
#3: 2.00
#4: 1.71
#5: 1.46
#6: 1.50
#7: .750
#8: .375 (Low because always has to beat the #1 seed to get here)
#9: .125 (Ditto)
#10: .750
#11: .458
#12: .667
#13: .167
#14: .0416
#15: 0
#16: 0

Elite Eight:
#1: 2.88
#2: 1.83
#3: .958
#4: .583
#5: .208
#6: .500
#7: .250
#8: .250
#9: .0416
#10: .292
#11: .167
#12: .0416
#13: 0
#14: 0
#15: 0
#16: 0

Final Four:
#1: 1.75
#2: .875
#3: .500
#4: .375
#5: .167
#6: .125
#7: 0
#8: .125
#9: 0
#10: 0
#11: .0833
#12: 0
#13: 0
#14: 0
#15: 0
#16: 0

Championship Game: Didn't find data
Champion: Didn't find data
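
One quick way to sanity-check lists like the ones above: the averages for a round should add up to the number of teams in that round, give or take rounding. Here is a rough Python sketch using the Sweet Sixteen numbers. The function name `check_round` and the 0.05 tolerance are arbitrary choices of mine, not part of the original data.

```python
# Average number of teams of each seed in the Sweet Sixteen, 1985-2008,
# copied from the list above.
sweet_sixteen = {
    1: 3.50, 2: 2.50, 3: 2.00, 4: 1.71, 5: 1.46, 6: 1.50,
    7: 0.750, 8: 0.375, 9: 0.125, 10: 0.750, 11: 0.458,
    12: 0.667, 13: 0.167, 14: 0.0416, 15: 0.0, 16: 0.0,
}

def check_round(averages, slots, tolerance=0.05):
    """Return True if the per-seed averages sum to the round size.

    `tolerance` is an assumed allowance for the rounding in the list.
    """
    total = sum(averages.values())
    return abs(total - slots) <= tolerance

print(check_round(sweet_sixteen, 16))
```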


More complicated stuff:

  • There are a lot of ways to work in the decimal part of the average besides just rounding to the nearest whole number. One interesting way is to use a random number generator with a min of zero and a max of 99. If the decimal part of the average (read as a two-digit number) is greater than the number generated, round up. Otherwise, round down. Here's an example: you don't know whether to send one or two #4 seeds to the Sweet Sixteen. The average is 1.71. If the random number is less than 71, pick two teams. If it's 71 or greater, pick only one team. Exactly 71 of the 100 possible numbers (0 through 70) are less than 71, so you round up 71% of the time. This is a really easy way to get a realistic number of upsets in your bracket.
  • If you fill in your bracket from the best seeds and work your way down, take into account the seedings of each individual matchup versus the expected seeding. For example, let's say in the first round you had a 13 seed upset a 4 seed and now that 13 seed is facing a 5 seed in the round of 32. The 5 seed's chances of winning just got better. Even if this appears to screw up your closeness to the historical averages, it really doesn't, as 5 seeds historically beat 13 seeds more often than they beat 4 seeds. This isn't true for every seed combo, though - the difference between 7 and 10 seeds seems to be negligible once you get past the first round.
  • If you fill in your bracket from the worst seeds and work your way up, keep in mind that the data just refer to the average number of teams of a seed left in each round, not a given team's chance of getting to that round if they have already made an upset. For instance, if you have a 9 seed knock off a 1 seed in the Round of 32, their chances of also winning in the Sweet Sixteen aren't nearly as bad as the .0416 (average number of #9 seeds in the Elite Eight) figure would suggest.
  • If you want to figure in actual knowledge of college basketball in a more prominent way, you could forget this framework altogether and assign probabilities to each game. Then use the random number generator in the same way as the first tip: for example, if a team has a 60% chance of winning, have them advance if 60 is greater than the number generated and have the other team advance otherwise. This is a particularly good option if you're filling out multiple brackets, as each of your brackets would essentially be a separate random simulation of the tournament based on as much of what you know as possible. Since you probably just want at least one to work out, the random factor would insure you against a wrong pick in one bracket because you might have it right in another.
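
If you'd rather let a computer roll the dice, the first and last tips can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `stochastic_round` and `simulate_game` are made up, the team names and the 60% figure are placeholders, and the code draws a random decimal between 0 and 1 instead of a whole number from 0 to 99, which amounts to the same idea.

```python
import random

def stochastic_round(avg, rng=random):
    """Round `avg` up with probability equal to its decimal part.

    Example: 1.71 becomes 2 about 71% of the time and 1 otherwise.
    """
    whole = int(avg)
    fraction = avg - whole
    return whole + (1 if rng.random() < fraction else 0)

def simulate_game(team_a, team_b, p_a_wins, rng=random):
    """Advance team_a with probability p_a_wins, otherwise team_b."""
    return team_a if rng.random() < p_a_wins else team_b

# How many #4 seeds to send to the Sweet Sixteen (historical average 1.71)?
print(stochastic_round(1.71))

# One simulated result per bracket, for a game you rate as 60/40.
for bracket in range(3):
    print(bracket, simulate_game("Team A", "Team B", 0.60))
```

Run it again for each bracket you fill out and you'll get a different mix of upsets each time, which is the whole point.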

P.S. For more March Madness stuff, including team-specific analysis, visit the website of my organization, the Harvard Sports Analysis Collective.