Adam Fendelman

Posted: October 10, 2008 07:17 PM

A few weeks ago, HollywoodChicago.com, my Chicago-based film publication (others are on staff, too), lost all of its Google search traffic. As you can imagine, that's not exactly a welcome cherry on anyone's sundae.

It was especially unnerving considering I had only recently finished battling GoDaddy, which sent me a groundless hosting bill for $6,579.91 that was reduced to $969 and then waived entirely (in large part because I wrote about it here).

So, why would a Web publication suddenly lose all of its search traffic from Google and Google News? This publication, by the way, has been producing original reviews, interviews and news on film, theater and TV since early 2000, and it has plenty of markers of Internet legitimacy.

It is published by a Chicago Film Critics Association-accredited film critic (yours truly), is approved on the Tomatometer at Rotten Tomatoes and has been cited in dozens of publications around the world including the Los Angeles Times, Chicago Sun-Times, USA Today, FOX Chicago, CBS 2 Chicago and AOL's Cinematical.

I don't mention these facts for bragging points. I mention them because (while it might seem like they would) they don't matter when it comes to this Google issue.

Google didn't stop indexing HollywoodChicago.com because the search behemoth suddenly decided the publication was illegitimate, and it didn't destroy the site's PageRank for that reason either. PageRank, by the way, is Google's measure of how important your site is, and it helps determine how high your site appears for various search queries.

After I spent nearly two years earning a Google PageRank of 5/10, my PageRank dropped to an impossible 0/10 in the last couple of weeks. A 0/10 means Google considers my site as unimportant as a blog someone threw up today with absolutely nothing on it.

On my fix mission, my first step was finding out what went wrong. This proved to be challenging because (as you may know if you've ever tried to get in touch with a human at Google) it's rather difficult to get in touch with a human at Google.

I started by e-mailing my issue to press@google.com since I am a journalist and this was a legitimate problem. I got a quick response. Someone at Google forwarded the inquiry to someone else at Google. She wrote to me: "My colleague forwarded me your inquiry about PageRank and your site. Let me look into it and I'll get back to you."

A week went by and she went MIA. I sent her several follow-up e-mails with no dice. The problem persisted, so I started seeking other avenues for a fix.

Though many people likely don't know it exists, Google Webmaster Tools is a very helpful application. It shows you what your site looks like from the perspective of Google's brain. Google indexes your Web site using an automated Internet spider called Googlebot.

Google Webmaster Tools gives you a wealth of information about what Googlebot sees in your site along with what's wrong with it.

I didn't think to check there until I found a Google Group (a discussion forum) dedicated specifically to helping Webmasters. Very knowledgeable people (including people who work for Google) respond there very quickly. It's an excellent way to leap that hurdle of actually getting to speak with Google humans.

John Mueller (who publicly lists himself as a Webmaster trends analyst at Google Zurich) was the first one to reply. He hit the nail on the head right out of the gate. John recommended that I check for errors with my site in Google Webmaster Tools.

Once I checked, the graph below from Google Webmaster Tools told the whole story. Notice the section I've highlighted in red: this crawl graph shows exactly when Google stopped indexing my site entirely.

[Image: Google Webmaster Tools crawl stats graph, with the point where Googlebot stopped crawling the site highlighted in red]

Sure enough, there were thousands of new errors: 11,260 HTTP problems in the 4xx range (client errors such as "forbidden" and "not found"), 2,684 URLs restricted by robots.txt and 2,480 errors for URLs in my sitemap. In layman's terms, this means Google was being blocked from indexing my site.
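If you suspect something similar, your own server logs can back it up. Here's a minimal sketch (not something Google provides, just an illustration) that tallies the HTTP status codes your server has been handing Googlebot; the log path and the combined log format are assumptions you'd adjust for your own setup. A pile of 4xx responses, or no Googlebot requests at all, points to the crawler being blocked somewhere.

import re
from collections import Counter

LOG_PATH = "access.log"  # hypothetical path; point this at your real access log

# Matches a combined-format log line: "GET /path HTTP/1.1" 403 1234 "referer" "user-agent"
LINE_RE = re.compile(r'"[A-Z]+ \S+ HTTP/[\d.]+" (?P<status>\d{3}) .* "(?P<agent>[^"]*)"')

status_counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        match = LINE_RE.search(line)
        if match and "Googlebot" in match.group("agent"):
            status_counts[match.group("status")] += 1

for status, count in status_counts.most_common():
    print(status, count, "requests from Googlebot")

(Keep in mind that if the ban happens at the firewall rather than in your application, blocked requests may never reach the log at all.)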

Of course, I didn't mean to do that. So why did I?

John asked if I was banning any IP addresses from accessing my site. Sure enough, I was. Like most sites out there, mine attracts spammers. While I have various technologies in place to block them from posting content that violates our policies, sometimes they sneak through. When they do, I ban their IP addresses directly.

In banning IP addresses directly in response to spammed content, I also accidentally banned Googlebot.

Whoops! How would I have known that? I didn't know that this was one of Googlebot's many IP address ranges, and I didn't know to do a reverse IP lookup to determine that banning 66.249.73.% was indeed preventing Googlebot from indexing my whole site.

(Tip: You can do a reverse IP lookup at a site like this to make sure you're not inadvertently banning an IP address that you shouldn't be.)
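If you'd rather do the check from a script than a Web site, here's a minimal sketch of the same idea in Python. It assumes the 66.249.73.% wildcard amounts to the range 66.249.73.0/24 and uses an illustrative sample address; genuine Googlebot addresses reverse-resolve to hostnames ending in googlebot.com, and a second, forward lookup confirms the hostname isn't faked.

import ipaddress
import socket

banned_range = ipaddress.ip_network("66.249.73.0/24")  # what the 66.249.73.% ban effectively covers
suspect_ip = "66.249.73.1"  # illustrative address pulled from the ban list or server logs

if ipaddress.ip_address(suspect_ip) in banned_range:
    try:
        # Reverse DNS: real Googlebot addresses resolve to *.googlebot.com hostnames
        hostname = socket.gethostbyaddr(suspect_ip)[0]
        # Forward-confirm: the hostname should resolve back to the same address
        forward_ip = socket.gethostbyname(hostname)
    except OSError:
        hostname, forward_ip = "", ""
    if hostname.endswith(".googlebot.com") and forward_ip == suspect_ip:
        print(suspect_ip, "belongs to Googlebot (" + hostname + "); this ban blocks Google's crawler")
    else:
        print(suspect_ip, "does not look like Googlebot")

(The Python here is purely for illustration; the point is the reverse-then-forward lookup, which you can also do with command-line DNS tools.)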

(Another tip: If you encounter an issue similar to this, use this highly practical and responsive Google Group to help you diagnose your issue. This is what solved it for me.)

While Google Webmaster Tools clearly showed me the problem, I still wouldn't have known why it happened without John's tip. Though I'm still waiting for Google to index my site again and for my PageRank to return, I can rest calmly now that I've definitively found the problem, and perhaps I can help prevent you from making the same mistake.

While the answer did exist, it lacked visibility and a clear explanation of what to do about it.

Just as GoDaddy could have sent an automated e-mail notifying me that I was about to blow way past my disk quota and be charged thousands of dollars in penalties, Google could have sent me an automated e-mail letting me know that thousands of my URLs had been restricted from its index.

That alert could come with a user-defined threshold so you, as the Webmaster, decide when you receive the message: you'd essentially pick how many errors have to pile up before you care.

For me, that simple e-mail would have saved weeks of lost traffic, hassle and headaches. For you, it could prevent such an issue from ever happening in the first place. For Google, that simple, proactive e-mail could make what's arguably the best search engine in the world just a bit better.