Posting duplicate content on your sites has never been a great idea because it’s not customer friendly. In the post-Panda world, posting identical content in multiple places can seriously damage your page rankings and make visits from Google’s web crawling spiders less and less frequent.
Though the problems surrounding duplicate content sound simple, once you dig in you’ll find that the subject is surprisingly complex. Besides the many sites that re-use content without care, there are plenty of sites that wind up generating duplicate content without even realizing it.
Let’s take a quick look at what exactly constitutes duplicate content and how webmasters can deal with it as efficiently as possible.
Defining Duplicate Content
The short definition of duplicate content is any content that appears on two or more pages. It doesn’t matter if the pages have different URLs; if the content is identical, the spiders will tag it as such.
Understanding what constitutes duplicate content is important because many sites wind up generating thousands of duplicate pages without ever realizing what they’re doing. Dynamically generated pages that get indexed by search engine spiders are the main culprit behind unintended duplicates.
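To make the definition concrete, here is a minimal Python sketch of one way to spot identical pages on your own site: hash each page body and group the URLs that share a hash. The `pages` dictionary, the example.com URLs and both function names are made up for illustration, and this only catches exact matches after whitespace and case are normalized. It is not a description of how Google detects duplicates.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(page_body: str) -> str:
    """Return a hash of the page body with whitespace and case normalized."""
    # Collapse runs of whitespace and lowercase the text so trivial
    # formatting differences do not hide an otherwise identical page.
    normalized = re.sub(r"\s+", " ", page_body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose bodies share a fingerprint.

    `pages` maps URL -> page body (hypothetical input, e.g. from a site crawl).
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[content_fingerprint(body)].append(url)
    # Only groups holding more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical sample data: two URLs serving the same page, plus one unique page.
pages = {
    "http://example.com/widgets": "<h1>Widgets</h1> <p>Our best widgets.</p>",
    "http://example.com/widgets?aff=123": "<h1>Widgets</h1>  <p>Our best widgets.</p>",
    "http://example.com/about": "<h1>About us</h1>",
}
print(find_duplicates(pages))
```

Run against the sample data, the two widget URLs land in the same group while the about page is left alone.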
It’s especially important that affiliates are clear on this concept and how to rectify it (which we’ll discuss in a moment) because affiliate tracking codes create indexed pages in exactly the manner described here. As your new prospect clicks through your site, he or she is generating new pages that are duplicates of the original; the only difference is that the URL also contains the tracking code.
This is also a problem for sites that utilize the https:// protocol. Without even thinking about it, these sites are now generating https:// pages to go with the http:// pages that were already indexed.
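One way to see how many of your URLs are really the same page is to reduce each one to a single normalized form. The sketch below strips tracking parameters and forces one scheme. The parameter names in `TRACKING_PARAMS` are assumptions, so swap in whatever your affiliate program actually uses, and the choice of https as the preferred scheme is likewise only an example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameter names; replace with your program's own.
TRACKING_PARAMS = {"aff", "affid", "ref", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str, preferred_scheme: str = "https") -> str:
    """Return the URL with tracking parameters removed and one scheme enforced."""
    parts = urlsplit(url)
    # Keep every query parameter that is not a known tracking code.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (preferred_scheme, parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


print(normalize_url("http://Example.com/widgets?aff=123&color=red"))
print(normalize_url("https://example.com/widgets?color=red"))
```

Both calls print the same address, which tells you the two URLs should be treated as a single page.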
Why Duplicates Matter
Duplicate content has always been on Google’s radar; it’s how they deal with it that’s changed. In the olden days, when Google spiders ran into a bunch of duplicate content, they put the main file on top and the duplicates in what they called the supplemental index.
The supplemental index helped keep all those copies of the same info from appearing on the main search pages. That’s why you sometimes get that message from Google asking if you want to repeat your search with results that were omitted.
Learn more about this straight from Google at the following links:
We also encourage you to use your Webmaster tools!
In the post-Panda SEO world, duplicate content can cause some serious harm to your site’s overall rankings.
Getting your head around spider bots and web crawlers is tough because so much of what Google does seems to border on the magical. But even the mighty Google only has so much bandwidth. There’s a limited number of pages that get crawled every day. If the spiders repeatedly run into duplicate content, they won’t pay as many visits to your site.
Dealing with Duplicates
If you, or your site, have been creating duplicate content for any length of time, you’ll want to deal with the problem as quickly as possible. How you handle it depends on how many duplicates you have.
- 404 Error – Take the duplicate pages off your site so that they return a 404 error, and Google will handle the rest.
- Google URL Removal Tool – Over at the Google Webmaster Tools page is a tool that allows you to submit individual URLs for removal from indexing. Sorry, there’s no batch submit option.
- Canonical Tags – All the big search engines give you the option of marking certain pages as the sole copy of content that should be indexed. Canonical tags are easy to implement and there’s a great description of them here.
- Follow/No Follow Tags – This attribute tells spiders whether or not to follow a link. It isn’t appropriate for all duplicate content, so be certain you know the differences between canonical and follow/no follow tags before you start making changes to your code.
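For the last two options, it helps to see what the markup looks like. Here is a small Python sketch that builds a canonical link element (which goes in the head of each duplicate page) and a link carrying the nofollow attribute. The helper names and the example.com URLs are hypothetical; the `rel="canonical"` and `rel="nofollow"` values are the standard ones.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link> element that names the one copy to be indexed."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


def nofollow_anchor(href: str, text: str) -> str:
    """Build a link that asks spiders not to follow it."""
    return f'<a href="{escape(href, quote=True)}" rel="nofollow">{escape(text)}</a>'


# Every duplicate of the widgets page would carry the same canonical tag.
print(canonical_link_tag("https://example.com/widgets"))
print(nofollow_anchor("https://example.com/widgets?aff=123", "Widgets"))
```

Note the difference: the canonical tag sits on the duplicate page and points at the original, while nofollow sits on an individual link.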
Much Much More
There’s a lot more to know about duplicate content than what we have room to discuss here. The bottom line is that Google isn’t playing games when it comes to content. Old SEO tricks that once drew traffic like crazy just don’t cut it in a world run by Pandas. Quality content, and plenty of it, is the rule of the day.