06/07/2017 11:57 am ET

How Good Taxonomy Can Drive Your SEO Strategy

HuffPost publishes thousands of articles each month. It was only a matter of time before the free-form tags caught up with us.

Welcome to HuffPost’s Keeping It 100. From infusing our culture with data to figuring out how to reach Gen Z and cultivate niche distributed communities, we’ll give you an inside look at the hits and misses of HuffPost’s biggest bets.

Eleven years on the internet has an upside and a downside when it comes to SEO.

The upside: brand recognition, brand authority and a sea of great content from talented writers and editors. The downside: the internet moves quickly, and best practices change. So when we discovered a lagging search presence, we weren’t sure what was causing it. A little housekeeping was in order.

We looked at over a billion pages on the HuffPost domain as part of a full audit. We examined the site architecture, poked around the sitemaps, and looked at server issues. We made sure all of our technical t’s were crossed.

Next, we investigated our search profile. We ranked for lots of keywords. But they weren’t always priority keywords. The keywords drove lots of organic traffic. But it wasn’t always quality organic traffic.

Finally, we turned to our content. Everything looked normal at the article level. Sure, a few things needed tweaking, but overall there were no five-alarm fires.

So what was the problem? While article-level traffic was prominent, category-level traffic was suspiciously low. Even though we consistently covered important subjects, organic traffic didn’t reflect that coverage.

That’s when we realized the problem that had been ballooning for years: our tags.

What Are Tags?

Tags are values used to classify and organize content. In our content management system, editors assign tags to each post. These tags speak to the entities in the story ― things like a person, a place, a country or any broader concept. Once a post is published, the assigned tags are displayed on the front end. Here is an example of tags on our site:

Each tag points to a corresponding tag page. These tag pages can either house a few related articles or a large inventory of articles.

Here is one version of a tag page on our site:

The tag page for actor Chris Pratt houses every single piece of content that's tagged with his name.

How Do Tags Impact A Site?

Tag pages act as the connective tissue of a site. They’re great for users because they make it easy to discover more related content.

But they’re also great for search engine spiders. Spiders rely on links to find, decipher and catalog content on the web. Tag pages ― pages that naturally house numerous links ― create pathways for the spiders to locate content that might be buried deep within a website.  

Even better, tag pages use common keywords that piggyback off popular search queries. This setup is ideal for search, and extremely valuable when it comes to ranking for individual actors, musicians, politicians and companies.

We thought our tag pages looked peachy. Until we came upon pages that looked like this:

Hmmm.

And then this:

Ugh. Awkward collar tug.

Editors were adding their own tags, often pairing topics and descriptors that were one-offs and only applicable to a handful of stories. And each of those tags was creating a new tag page. This growing, uncontrolled vocabulary was playing a clear role in our search visibility.
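One way to surface one-off tags like these is to count how many posts carry each tag and flag the ones below a cutoff. The sketch below is illustrative only: the post structure, the `find_thin_tags` helper and the `min_posts` threshold are assumptions, not a description of our CMS or audit tooling.

```python
from collections import Counter

def find_thin_tags(posts, min_posts=3):
    """Return tags attached to fewer than min_posts posts, sorted by name."""
    # set() guards against a tag being listed twice on the same post.
    counts = Counter(tag for post in posts for tag in set(post["tags"]))
    return sorted(tag for tag, n in counts.items() if n < min_posts)

# Hypothetical sample data; real posts would come from the CMS.
posts = [
    {"id": 1, "tags": ["Chris Pratt", "Jurassic World"]},
    {"id": 2, "tags": ["Chris Pratt", "Chris Pratt Anna Faris"]},
    {"id": 3, "tags": ["Chris Pratt", "Jurassic World", "Chris Pratt Workout"]},
    {"id": 4, "tags": ["Jurassic World", "Chris Pratt"]},
]

thin_tags = find_thin_tags(posts, min_posts=3)
print(thin_tags)
```

With this sample data, the two tags that each appear on a single post are flagged, while "Chris Pratt" and "Jurassic World" pass the cutoff. The right threshold is a judgment call and would depend on the size of the archive.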

How Did Tags Impact Search?

It might sound counterintuitive to say we wanted to avoid these kinds of tag pages: Shouldn’t more pages in the search index mean more pathways for capturing readers?

Nope. Not since Google introduced its Panda algorithm update. That change actually penalizes low-quality, thin content pages. Our redundant tag pages fit the bill. We were at risk.

Using a site operator in search, we exposed the effect that tags were having on the search and user experience. For example, a search for “Chris Pratt Anna Faris” produced mixed results:

We’re sure you’re just as confused as we were.

Eleven years of this free-form tagging had made things difficult in search. There was a tag page for every keyword variation. An antiquated search approach that once worked well was hurting our site authority.
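To make the "tag page for every keyword variation" problem concrete, here is a minimal sketch of how spelling and ordering variants of the same tag might be grouped. The normalization rule (lowercase, strip punctuation, ignore word order) and the helper names are assumptions chosen for illustration; a rule this loose can also merge tags that are genuinely different, so its output would need human review.

```python
import re
from collections import defaultdict

def variant_key(tag):
    """Build a key that ignores case, punctuation and word order."""
    words = re.findall(r"[a-z0-9]+", tag.lower())
    return " ".join(sorted(words))

def group_variants(tags):
    """Map each key to its tags, keeping only keys with more than one variant."""
    groups = defaultdict(list)
    for tag in tags:
        groups[variant_key(tag)].append(tag)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}

# Hypothetical free-form tags as editors might have typed them.
tags = ["Chris Pratt", "chris-pratt", "Pratt, Chris", "Anna Faris", "Jurassic World"]

duplicates = group_variants(tags)
print(duplicates)
```

Here the three spellings of the actor's name collapse to one key, which means three tag pages were competing for the same query.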

By creating so many entrances to HuffPost, we were bloating the site with similar content pages and sending the Google crawler mixed signals. Link equity wasn’t being concentrated in one powerful topic page. Instead, we were splitting the value across multiple pages. That hurt our ability to rank favorably for big topics (like “Jurassic World” movie stars). It also meant competing search results, user confusion and traffic cannibalization.

This isn’t uncommon. Digital publishers born in the same era as HuffPost are also victims of the same free-form tagging issues. Take a look at some familiar faces:

Sorry to call you out, Vice and Buzzfeed.

The Solution

There seemed to be no end in sight. Our tagging library continued to grow and get out of control. We had to put a solution in place before things ballooned even more.

The ideal solution was actually pretty simple. We needed a fix that would organize our content and lift some of the manual work from our editors. That’s when Relegence came into the picture.

Relegence is a semantic technology combining the power of natural language processing with machine learning and a standardized taxonomy to understand and structure content. The technology identifies the most appropriate categories and entities to automatically append to the scanned content ― putting an end to our erroneous and wild tags!

With controlled vocabularies in place, we had to tackle the other elephant in the room: the number of orphan tags with no associations. (For example, a tag like “Chris Pratt Anna Faris” that wasn’t linked to the “Chris Pratt” tag.)

The logic was straightforward. Add a step in our CMS’s import process to assign all unique tags (children) to related tag groups (parents). The new setup would pull in all associated content around a parent topic, creating robust topic pages with semantic ranking power.
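The parent and child logic can be sketched roughly as follows. This is not the actual import step: the matching rule (a free-form tag belongs to a parent when it contains all of the parent's words) is a simple stand-in for the entity recognition a semantic engine would do, and `assign_parents`, `build_topic_pages` and the sample data are hypothetical.

```python
from collections import defaultdict

def assign_parents(tag, parent_tags):
    """Return every parent tag whose words all appear in the free-form tag."""
    tag_words = set(tag.lower().split())
    return [p for p in parent_tags if set(p.lower().split()) <= tag_words]

def build_topic_pages(posts, parent_tags):
    """Roll every post up to the parent topics that its tags map to."""
    pages = defaultdict(set)
    for post in posts:
        for tag in post["tags"]:
            for parent in assign_parents(tag, parent_tags):
                pages[parent].add(post["id"])
    return {parent: sorted(ids) for parent, ids in pages.items()}

# Hypothetical controlled vocabulary and posts.
parent_tags = ["Chris Pratt", "Anna Faris"]
posts = [
    {"id": 1, "tags": ["Chris Pratt"]},
    {"id": 2, "tags": ["Chris Pratt Anna Faris"]},
    {"id": 3, "tags": ["Anna Faris Mom"]},
]

topic_pages = build_topic_pages(posts, parent_tags)
print(topic_pages)
```

In this example the orphan tag "Chris Pratt Anna Faris" no longer needs a page of its own: its post shows up on both parent topic pages instead.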

These are the types of pages Google loves to crawl and surface to users. Good from a user perspective, good from a crawler perspective, good from a back-end perspective. A win-win-win.

We were finally making a dent in the billion-page grooming process.

The Results

Today, our tag pages are working for us rather than against us.

We see low-quality tag pages scrubbed from the search index and high-quality tag pages rise to the top. We rank for relevant topics and themes core to our business. Most importantly, we drive new traffic to tag pages we once brushed off ― including our Donald Trump tag page. 

As recently as last year, the Donald Trump tag page was underutilized and underperforming. After consolidating all Trump-related tags into one page and adding link-building efforts, we were able to see the fruits of our labor.

We went from being nonexistent in Google search results for Trump to ranking on page one. It’s a major achievement for such a competitive keyword.

Our success doesn’t end there. Our other big wins include:

  • More useful, authoritative tag/topic pages;

  • Improved page crawling and rankings;

  • Better content analysis and content retrieval.

The cleanup is still a work in progress, but we’re making strides. Our site taxonomy will continue to evolve over time, and so will its influence on our site’s search performance. These are changes everyone can get behind. Even Chris Pratt.
