SEO Tools & Process to Fix Google Panda Penalty

Updated: December 26, 2012

Google, Bing and Yahoo cracked down hard on duplicate content starting December 2010.  Penalties hit hardest on February 24, 2011 in the Google Panda algorithm update.  Bing and Yahoo rankings followed suit.

How To: The SEO Tools and Process to Address Duplicate Content

Compare Webpage Duplicate Content

Comparison SEO Tool

An SEO services client of ours developed multiple websites for different brands, but recycled the content.  Instead of writing 100% unique text for each website, the client reused paragraphs, and sometimes whole pages, across multiple websites.  They were getting away with it without noticeable revenue loss.  Interior entry pages already suffered duplicate content "penalties" (not actual penalties; more accurately, wasted crawl budget and possibly divided link juice), but the client decided that rewriting all the content was not a big enough priority … until now.

The February search engine algorithm updates penalized entire websites that had pages similar to those on any other site that the search engine credits as the originator.  Even if words are rearranged and the brand name is switched out, the Google algorithm is not fooled.  Google chooses one website as the originator and penalizes the others.

In late 2010, various rankings started to slip.  On February 24th, clients with duplicate or similar content across different websites saw a total drop-off for #1 ranked keyword phrases.  Google guidelines for duplicate content indicate that the algorithm perceives these similar pages as “deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic”.

By the way, these guidelines were [initially] updated March 20, 2011, less than a month after the [first] Panda algorithm update.

Systematic Process of Identifying and Addressing Duplicate Pages

If you are intimately familiar with your websites, like this search engine optimization consultant is, you already know which pages are similar and possibly causing duplicate content penalties. If you are an SEO agency taking on a new client with duplicate content issues, leaving it up to you to figure out where the duplicates are within their online properties, then you may need a few SEO tools to help identify possible duplicate pages.

UPDATES: Google Panda Filter Dates


To determine if your website fell victim to a Panda filter, check your traffic.  Panda “penalizes” your website by dropping your rankings, not just on pages with duplicate or thin content, but universally across your website – including your homepage.  If you were hit by the Panda Filter, you will see a significant traffic drop on one of the dates above.  Subsequently, if you address the issue, your traffic should be adjusted the next time the Panda filter runs.  Each time the filter runs, it updates the Google index.

Update: February and March 2012 updates were merely when the Panda Filter was run again in order to refresh the Google index, so that changes made to websites will be reflected in Google.  In other words, if you implemented fixes to get out of the Panda Filter before February 28 or March 23, you can check traffic at those dates to see if your fixes did the trick.
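If you would rather not eyeball the traffic graph, here is a rough Python sketch of the check described above.  It assumes you have exported daily visits from your analytics package into a dictionary keyed by date.  The function names, the 7-day window and the 25% drop threshold are my own placeholders, not anything Google publishes, and the sample numbers are made up for illustration.

```python
from datetime import date, timedelta
from statistics import mean


def traffic_change(daily_visits, update_date, window=7):
    """Fractional change in mean daily visits around update_date.

    Compares the `window` days before update_date with the `window`
    days starting on update_date. Returns None if data is missing.
    """
    before_days = (update_date - timedelta(days=i) for i in range(1, window + 1))
    after_days = (update_date + timedelta(days=i) for i in range(window))
    before = [daily_visits[d] for d in before_days if d in daily_visits]
    after = [daily_visits[d] for d in after_days if d in daily_visits]
    if not before or not after:
        return None
    baseline = mean(before)
    if baseline == 0:
        return None
    return (mean(after) - baseline) / baseline


def likely_hit_dates(daily_visits, update_dates, drop_threshold=-0.25, window=7):
    """Return (date, change) pairs where traffic fell by at least the threshold."""
    hits = []
    for day in update_dates:
        change = traffic_change(daily_visits, day, window)
        if change is not None and change <= drop_threshold:
            hits.append((day, change))
    return hits


# Made-up sample: 1000 visits a day, falling to 600 from February 24, 2011.
start = date(2011, 2, 10)
sample_visits = {
    start + timedelta(days=i): (1000 if start + timedelta(days=i) < date(2011, 2, 24) else 600)
    for i in range(28)
}
for day, change in likely_hit_dates(sample_visits, [date(2011, 2, 17), date(2011, 2, 24)]):
    print(day.isoformat(), f"{change:.0%}")
```

To check whether your fixes worked, run the same comparison on a later date when the filter was run again and look for a positive change instead of a drop.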

Identify Duplicate Content

Copyscape is a paid tool that makes the process simple.  It is built for finding copyright and plagiarism offenders.  Since Google generally penalizes the copier and not the original author, plagiarism is not an SEO issue.  For this reason Copyscape is not a regular part of the SEO arsenal of tools.  Don’t ask me why as an SEO I even know about it, but I’ve known about it for years (guess that’s part of the multidisciplinary Blend SEO approach).

To identify duplicate pages with the paid version, called CopySentry, feed it your non-penalized website and let it find the duplicate content out there among your penalized websites.

Using free tools takes a little more time and effort.  Download and install A1 Sitemap Generator, a great sitemap generator program with a fully functional free 30-day trial.

  1. Run a scan of your penalized website.  It will generate a list of pages of the website, ignoring those blocked by robots.txt, following any redirects or canonical tags – meaning you have a list of webpages that spiders crawl.
  2. Among the sitemap output choices, you can create a text file list of page URLs with each URL on a separate line.
  3. Paste this list into Excel and start your search for duplicate content.  Use your intuition and Google to check blocks of text.  If a different website ranks number one for any block of text, that website is credited as the originator.  Mark this URL and the originator URL.
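If your list of URLs is long, a small script can prepare the grunt work for step 3.  The Python sketch below is only an illustration: it cleans up the one-URL-per-line text export and turns a page's longest sentences into quoted, exact-match phrases that you can paste into Google by hand.  The function names and the 10-word phrase length are my own assumptions, and the script does not query Google for you.

```python
import re


def read_url_list(text):
    """Parse a one-URL-per-line export, dropping blanks and duplicates in order."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def phrase_queries(page_text, words_per_query=10, max_queries=3):
    """Build quoted exact-match queries from the longest sentences of a page."""
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    candidates = [s for s in sentences if len(s.split()) >= words_per_query]
    # Longest sentences first: boilerplate lines tend to be short.
    candidates.sort(key=lambda s: len(s.split()), reverse=True)
    queries = []
    for sentence in candidates[:max_queries]:
        words = sentence.split()[:words_per_query]
        queries.append('"' + " ".join(words) + '"')
    return queries


# Hypothetical export and page copy, for illustration only.
export = "http://a.example/\n\nhttp://b.example/\nhttp://a.example/\n"
print(read_url_list(export))

copy = "Short one. This sentence has exactly ten words in it for the test."
print(phrase_queries(copy, words_per_query=5))
```

Search for each quoted phrase, and if a different website ranks number one for it, record that website's URL next to yours in the spreadsheet.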

Test or Check Duplicate Content

Once you have a list of pages from your penalized website and their counterparts on the originator website, you will want to check their similarity.  Are they similar enough to require a rewrite?  Run the URLs through a webpage comparison or duplicate content checker tool.  After you go through several pages between two sites, you will eventually get a feel for where the cut-off is for rewrite versus no rewrite.  Any pages with similarity higher than your cut-off require a writer to take a look for the duplicate or similar language.  Similar content on the penalized website must be completely rewritten.
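To make the idea of a similarity score and a cut-off concrete, here is a minimal Python sketch.  It compares two blocks of visible text using overlapping word sequences ("shingles") and Jaccard similarity.  This is a common textbook technique that I am swapping in for illustration; it is not how any particular checker tool or Google calculates similarity, and the function names, shingle size and default cut-off are placeholders you would calibrate yourself.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=5):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def needs_rewrite(text_a, text_b, cutoff=0.10, size=5):
    """True when the score is above your own calibrated cut-off (placeholder value)."""
    return similarity(text_a, text_b, size) > cutoff


# Tiny illustration with 2-word shingles: one shared pair out of five distinct pairs.
score = similarity("the quick brown fox", "the quick red fox", size=2)
print(round(score, 2))
```

Scores from a sketch like this are not comparable to the percentages a given online tool reports, so set the cut-off by checking a handful of page pairs you already know are too similar.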

Update: Unfortunately, my favorite duplicate content diagnostic tool has been abandoned.

This first similarity tool is my favorite.  Some tools give only a single percentage, leaving you to wonder how much of that similarity is due to non-visible code.  This tool tells you, without inundating you with too much information.  There’s no captcha, so checking through your list is quick.  I embedded the form below, so you can try it here.


Compare Webpage Duplicate Content

This next tool simply gives you a single percentage.  Depending on how much of the template is shared between your penalized and originator websites, the number you get may seem pretty low.  My cut-off with this tool was about 10%.  Anything over 10% required a copywriter to rewrite the page or at least a section of the page.


This comparison SEO tool by SEO Book compares the page titles, meta information, and common phrases occurring on different pages.

Here are several more alternatives you can try.

