TECH
07/17/2015 12:59 pm ET Updated Jul 17, 2015

Which Curse Words Are Popular In Your State? Find Out From These Maps.

Put your shades on. These words might burn.

We live in a country of swearers.

We swear at work as well as at home. Our politicians have dirty mouths, too. Studies have even shown that cursing can reduce stress and be beneficial to our health.

The words we choose to use vary, however. These maps of the 48 contiguous U.S. states show where different swearwords are used: 

  • [Swearword maps by Jack Grieve] A positive z-score (orange clusters) represents U.S. counties where the word is relatively common, while a negative z-score (blue clusters) represents areas where the word is less common.

As shown in the maps above, the use of curse words across the 48 states follows regional patterns. A good old-fashioned "damn" is more common in Southern states along the coast, while people in urban areas love to drop the f-bomb. And who knew the "c-word" was so popular in New England?

The author of these "swearword maps" is Jack Grieve, a professor of forensic linguistics at Aston University in England, who published the maps on his Twitter account on Thursday. Grieve has been working with Diansheng Guo at the University of South Carolina on a project to map dialectal variations using geo-coded tweets, as part of the Digging into Data Challenge.

For the swearword maps, specifically, Grieve used a corpus of almost 1 billion geo-coded tweets, amounting to almost 9 billion words, that were collected between October 2013 and November 2014.

Here's how Grieve explained the process of analyzing and putting the maps together: 

1. Sort all tweets by county.

2. For any word (e.g. fuck) we measure its relative frequency in each county by dividing the total number of occurrences of that word in that county by the total number of words in that county. In that way we control for variation in the number of tweets per county. Not doing something like this is the main problem with most Twitter-based maps you'll find online. They end up just basically being population density maps.

3. We then take that raw map and use a hot spot analysis (a Getis-Ord Gi local spatial autocorrelation analysis) to identify underlying clusters. We do this because the maps tend to be pretty noisy, so this really helps for visualization and interpretation.

4. We then map the Getis-Ord z-scores to identify clusters. Specifically, a high z-score means that that county is in the midst of a region where that word is relatively common, while a negative z-score means that that county is in the midst of counties where that word is less common.
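
For readers who want to see the arithmetic, the four steps above can be sketched roughly as follows. This is a minimal illustration, not Grieve's actual code: the function names, the whitespace tokenization, and the binary "adjacent county" weights are all assumptions, since the article does not say which tools or spatial weights he used. The z-score follows the standard Getis-Ord Gi* formula, with each county counted as part of its own neighborhood.

```python
import math
from collections import Counter


def relative_frequencies(tweets, word):
    """Steps 1-2: group tweets by county, then divide the word's count
    by the total number of words in that county.

    tweets: iterable of (county, text) pairs (hypothetical input shape).
    Tokenization is a plain lowercase whitespace split, a simplification.
    """
    hits, totals = Counter(), Counter()
    for county, text in tweets:
        tokens = text.lower().split()
        totals[county] += len(tokens)
        hits[county] += tokens.count(word)
    return {c: hits[c] / totals[c] for c in totals if totals[c] > 0}


def getis_ord_gi_star(values, neighbors):
    """Steps 3-4: Getis-Ord Gi* z-score for each county.

    values: {county: relative frequency}
    neighbors: {county: list of adjacent counties} (assumed binary weights)
    """
    n = len(values)
    mean = sum(values.values()) / n
    variance = sum(v * v for v in values.values()) / n - mean ** 2
    s = math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding
    z = {}
    for county in values:
        # Neighborhood = adjacent counties with data, plus the county itself.
        group = {g for g in neighbors.get(county, ()) if g in values} | {county}
        w_sum = len(group)  # with 0/1 weights, sum(w) == sum(w**2)
        local_sum = sum(values[g] for g in group)
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1)) if n > 1 else 0.0
        # No variation, or a neighborhood covering every county: no signal.
        z[county] = (local_sum - mean * w_sum) / denom if denom > 0 else 0.0
    return z
```

Feeding the output of `relative_frequencies` into `getis_ord_gi_star` gives one z-score per county. Positive values correspond to the orange clusters on the maps, and negative values to the blue ones.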

"I just thought it was something people on Twitter would enjoy. And it's not really something I'd be looking at writing an academic paper about," Grieve told HuffPost in an email. "These maps really seemed to take off."

And, in case you were wondering, Americans also love "butts" more than "boobs." 

CLARIFICATION: This post has been updated to clarify that Grieve's maps include only the 48 contiguous states.
