Because the initial foray into statistics, Simpson's Paradox, was well received, we continue that thread, this time taking up something that was briefly mentioned in one of the comments--that "average" sometimes isn't meaningful. Think about the weather. In Northern Virginia, virtually every December day was above normal, sometimes record-settingly so. Virtually every day in January has been below normal. If you averaged them, you'd end up saying that the weather was normal for these two months when there's hardly been a single "normal" day.
Consider this array of 11 numbers.
The most common measure of "average" or what statisticians call "central tendency" is the mean. To get the mean we add up all of the numbers and divide by the number of numbers, in this case, 11.
Mean = $71,828,014,000/11 ≈ $6,529,819,000
The last number in the array is not hypothetical, but was Bill Gates' wealth as of January 30 (www.greenspun.com/WealthClock if you're interested in that sort of thing). So the average of Mr. Gates and 10 other citizens shows them all as multibillionaires even though no one else is worth more than $80,000. The mean is affected by extreme values, also called outliers. In terms of this set of numbers, Bill Gates is one extreme guy.
Another common measure of central tendency is the median. The median divides a group of numbers in half--half will be larger than the median, half smaller. In the distribution above the median is $40,000--there are five numbers larger, five smaller. In considering a median, Bill Gates is just another guy. So when you think you might have a few very high or very low numbers, the median might be more representative of the average.
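To see how differently the mean and the median react to one extreme value, here is a minimal Python sketch. The figures are hypothetical stand-ins, not the array discussed above: ten ordinary net worths plus one assumed multibillionaire, chosen only so the shape of the data resembles the example. The helper name mean_and_median is made up for illustration.

```python
from statistics import mean, median

# Hypothetical net worths in dollars (NOT the article's actual array):
# ten ordinary citizens plus one extreme outlier in the last position.
net_worths = [
    10_000, 10_000, 10_000, 20_000, 30_000,
    40_000, 50_000, 60_000, 70_000, 80_000,
    50_000_000_000,  # assumed stand-in for a multibillionaire
]


def mean_and_median(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


with_outlier = mean_and_median(net_worths)
without_outlier = mean_and_median(net_worths[:-1])  # drop the outlier

print(f"with outlier:    mean = ${with_outlier[0]:,.0f}, "
      f"median = ${with_outlier[1]:,.0f}")
print(f"without outlier: mean = ${without_outlier[0]:,.0f}, "
      f"median = ${without_outlier[1]:,.0f}")
```

In this made-up data, removing the outlier shifts the median by only a few thousand dollars, while the mean falls from billions to tens of thousands.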
For instance, a Virginia newspaper reported that teachers were refusing to give students zeros for work not turned in because a zero had a devastating impact on a student's average score. The teachers were calculating the average as a mean. Given the other scores, a zero was an outlier, an extreme value, and did indeed exert a large downward tug on the average. Had the teachers used the median as their measure of average, a zero would have had far less impact, and the resulting average would have been more typical of the student's work.
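The effect of a single zero is easy to check with a short Python sketch. The scores below are invented for illustration; the newspaper story gives no actual grades.

```python
from statistics import mean, median

# Hypothetical scores for one student (invented for illustration).
turned_in = [85, 90, 88, 92]
with_zero = turned_in + [0]  # one assignment not turned in

print("mean without zero:  ", mean(turned_in))    # 88.75
print("mean with zero:     ", mean(with_zero))    # 71
print("median without zero:", median(turned_in))  # 89.0
print("median with zero:   ", median(with_zero))  # 88
```

With these made-up scores, the zero pulls the mean down by almost 18 points but moves the median by only one.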
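The most common median in the everyday lives of educators is typically called the 50th percentile--50% of all test scores are above it, 50% below. This holds whatever the shape of the distribution; only in a symmetric distribution such as the "normal" bell curve does the 50th percentile also coincide with the mean.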
The third measure of average, the mode, isn't used very much. It is the most commonly occurring number; in the above array, that is $10,000. Using the mode, the least wealthy citizens are also the "average." Where you see the mode referred to most often is when researchers report that they have "a bimodal distribution" (or even trimodal). A bimodal distribution looks like a two-humped camel.
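A small Python sketch of the mode follows, using invented test scores. The multimode function in the standard statistics module (Python 3.8 and later) returns every value tied for most frequent, which is one simple way to spot a two-hump set of numbers.

```python
from statistics import mode, multimode

# Hypothetical data with one clear most-common value.
one_hump = [10, 10, 10, 20, 30, 40, 50]

# Hypothetical test scores clustered at a low value and a high value.
two_humps = [55, 55, 55, 60, 70, 85, 90, 90, 90]

print(mode(one_hump))        # 10
print(multimode(one_hump))   # [10]
print(multimode(two_humps))  # [55, 90]
```

Real bimodal data rarely shows exact ties like this; researchers usually judge the humps from a histogram, so treat the tie check as a toy version of the idea.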
If the distribution of numbers is "normal"--that is, the numbers fall into a bell-shaped curve--the mean, the median and the mode are identical. In fact, one quick check you can make without plotting a graph is to look at the mean, median and mode: if they differ noticeably, the distribution is not normal. If they agree, the distribution is at least roughly symmetric, though that alone doesn't guarantee a bell curve.
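The comparison of the three measures can be sketched in a few lines of Python. Both data sets are invented, and the helper name central_measures is hypothetical. Treat this as a rough screen: agreement among the three is consistent with a bell curve but does not prove one.

```python
from statistics import mean, median, mode


def central_measures(values):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(values), median(values), mode(values)


# Hypothetical symmetric data: values pile up evenly around the middle.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Hypothetical skewed data: a few large values stretch the right tail.
skewed = [1, 1, 1, 2, 2, 3, 4, 8, 14]

print(central_measures(symmetric))  # (3, 3, 3)
print(central_measures(skewed))     # (4, 2, 1)
```

In the skewed set the mean is pulled toward the large values, the mode sits at the low end, and the median falls between them.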
When people talk about average scores or average anythings, they don't always specify which statistic they have in mind. Sometimes they do this on purpose. In the days of the first Bush tax cuts, Bush and his supporters claimed that the average taxpayer would get a large break. His opponents contended it would be a much smaller amount. An analysis by the Washington Post indicated that a teacher would get a break that would allow her to buy a new TV (say, 27", not HD). A person making a million a year would get a $90,000 tax break. Those million-dollar people were outliers, extreme values, so the Bush team used the mean, which lets the extremes have an impact. The anti-break people were using the median. The median was more representative of the tax break because many more people have salaries similar to teachers' (about $49,000 a year now) than rake in a million dollars a year.
The first law people learn in introductory statistics classes--and then forget immediately--is "no measure of central tendency without a measure of dispersion." Measures of dispersion simply indicate whether or not the array of numbers is tightly bunched up close to the average (however measured) or whether they vary a lot with some numbers way above average and some way below. The most common measure of dispersion is the standard deviation and I'll deal with that when I get around to talking about standard scores (SATs, ACTs, GREs, MCATs, LSATs, IQs, NAEP, and the test scores from international comparisons are all standard scores).
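To illustrate why an average needs a measure of dispersion alongside it, here is a minimal Python sketch with two invented sets of scores that share the same mean but are spread very differently. It uses pstdev, the population standard deviation from the standard statistics module; choosing the population version over the sample version (stdev) is an assumption made here for simplicity.

```python
from statistics import mean, pstdev

# Two hypothetical sets of scores with the same mean.
tight = [48, 49, 50, 51, 52]  # bunched close to the average
wide = [10, 30, 50, 70, 90]   # scattered far above and below it

for name, scores in (("tight", tight), ("wide", wide)):
    print(f"{name}: mean = {mean(scores)}, "
          f"standard deviation = {pstdev(scores):.2f}")
```

Both sets report a mean of 50, yet the standard deviation of the second is roughly twenty times that of the first, which is exactly the information the mean alone hides.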
Well, this stuff isn't as sexy as Simpson's Paradox, but it's basic, really basic to becoming a wise consumer of numbers found in commission reports, op-eds, surveys, and research.