The History of Our Emotions in Millions of Books

Emotions are considered private, ever-changing, and often difficult to recognize correctly in others. Can we track the emotional states of an entire society, even through different epochs?

One can adopt a humble strategy: record what people say or write in their daily lives, and simply count how many times they use words associated with various emotional states. Are people saying they are 'frightened' during a war, or 'joyful' in prosperous times? Naturally, we are very diverse (my Italian grandma complained a lot in the last years of her life, yet I suspect she was quite happy), but if one collects data from many people, and has a good model of what to look for, some general patterns may become recognizable. What is extraordinary today is that we all spontaneously create massive amounts of this data: tweeting our frustration about bad news, updating our Facebook statuses with excitement about a party, rating restaurants or books, and posting on blogs.

Vasileios Lampos and coworkers used this strategy to analyze almost 500 million tweets from the United Kingdom. They were able to show that the emotional content of tweets was clearly associated with current events, such as the government's public spending cuts or the English riots of August 2011. We thought that if this technique worked for present-day events, one could apply the same methodology to a longer time scale and try to retrace the expression of emotions in the books of the last century: admittedly, a humble strategy for a less humble purpose.

Together with Alex Bentley and Philip Garnett, we started to mine the Google Ngram database, which contains more than 5 million books, and we focused on books in English published between 1900 and 2000. Words representing emotions were divided into six main categories, referring to six 'principal' emotions: anger, disgust, fear, joy, sadness, and surprise. For each emotion we used lists of hundreds of 'mood-words' semantically related to the principal emotion. For example, for 'anger' there were words such as angry, despise, enviously, harassed, and irritate; for 'joy', cheerful, enjoy, enthusiastic, exciting, and so on.
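To make the counting step concrete, here is a minimal Python sketch, assuming a simple table of word counts for one year of books. The function name mood_scores, the abbreviated word lists, and the toy counts are all hypothetical illustrations, not the data or pipeline behind the results described here. For each emotion, it measures what share of all words printed that year belong to that emotion's mood-word list.

```python
from collections import Counter

# Hypothetical, abbreviated mood-word lists (the real lists contain
# hundreds of words per emotion).
MOOD_WORDS = {
    "anger": {"angry", "despise", "enviously", "harassed", "irritate"},
    "joy": {"cheerful", "enjoy", "enthusiastic", "exciting"},
}


def mood_scores(word_counts, total_words):
    """For each emotion, return the fraction of all words in one year's
    corpus that appear in that emotion's mood-word list."""
    scores = {}
    for emotion, words in MOOD_WORDS.items():
        # Sum the occurrences of every mood-word for this emotion.
        hits = sum(word_counts.get(w, 0) for w in words)
        # Normalize by the total word count for that year.
        scores[emotion] = hits / total_words
    return scores


# Toy counts for a single year, invented purely for illustration.
counts = Counter({"angry": 30, "irritate": 10, "enjoy": 50, "cheerful": 20, "the": 5000})
print(mood_scores(counts, total_words=100_000))
```

Dividing by the total number of words matters because the number of books grows over the century; raw counts would mostly track publishing volume rather than emotional vocabulary. Repeating the calculation year by year gives a time series for each emotion.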

We were initially surprised to see how well periods of positive and negative mood were associated with historical events, similar to what had been seen in the Twitter data for contemporary events. The Second World War, for example, is marked, not surprisingly, by an increase in words related to sadness and a corresponding decrease in words related to joy.

The most interesting results, however, came later. First, we observed a steady decrease in the usage of the words in our lists throughout the whole century, with a single exception: the words associated with 'fear' reversed this trend from the 1980s onward, when they started to increase in frequency.

Second, we found that, within this overall decline in the use of mood-words, their frequency in American books increased relative to British books starting from the 1960s. By 2000, American books were markedly more 'emotional' than British books, but this had not been the case until about 50 years earlier.

In our paper we presented these trends without offering interpretations, but let's take the decreasing frequency of mood-words and speculate on the reasons behind it. The widespread grand narrative here is that we have all become emotionless zombies. Well, perhaps. But other, more parsimonious explanations may exist. For example, it might simply be that the words we use to express emotions have changed over time. This is certainly true, but since we used lists of modern words, compiled for a contemporary analysis of Twitter mood, any bias should run in the opposite direction: modern words should be rarer in older books, making the early decades look less emotional, not more.

Also, many have remarked that one of the main trends in 20th-century literature is summed up by the slogan 'show, don't tell'. A sad character does not have to be described as 'sad'; instead, the situation should provoke this feeling in the reader. If this were enough to explain our result, our finding would basically confirm the success of this principle.

However, we did not analyze only the best-selling or critically acclaimed books, but a huge amount of material containing everything from Hemingway to self-published amateurs, gardening treatises, and DIY manuals. It would be surprising if the 'show, don't tell' principle had been so successful across such different genres. Moreover, this explanation would predict that all mood-words decline in frequency, while we found one category, fear, increasing. It seems implausible that writers ignored the principle in this one emotional domain alone.

So perhaps books really did become less 'emotional', but what about society? I like to think that during the 20th century other media (radio, movies, television and, more recently, the Internet) appeared, and that emotional content, or certain ways of expressing it, shifted from books to these other media. This hypothesis has the advantage of being testable (so I can be proven wrong) using the same, or a similar, methodology on different media. It also suggests that we may still be able to express our emotions, zombie walks aside.

We now have access to an amount of data that, until a few years ago, would have been unimaginable, at least for the social and human sciences. New archives, like the one we used, are digitized every day, and most of what we do leaves footprints, whether we want it or not, that can be analyzed and quantified. For an anthropologist there is a bright side to this: we have never, ever, had so much information on human behavior. Digital humanities and Culturomics are trendy labels today; however, the availability of these data, if accompanied by good theories (data are necessary, but not sufficient), might indeed revolutionize our understanding of human cultural dynamics.