Duplicate Content

Content that is published on multiple pages without at least 20-30% variation may be considered duplicate content. The 20-30% figure is not an official Google stance; it is a rule of thumb based on research and experience over time. Digital marketers sometimes refer to a ‘Duplicate Content Penalty,’ but Google claims that this is not a penalty, just a protection for the first publisher. It is OK to block quote text as long as it is attributed correctly, but you should not expect to rank for the quoted text if Google has already indexed it from elsewhere on the web.

In some cases, companies have a good reason for publishing the same content twice. If you need to do this, and multiple versions of the same piece of content are crawlable, Google requests that webmasters use a ‘canonical tag’ (a link element with rel="canonical" in the page head) to indicate which version of the content they would prefer Google to rank. When the content is available on multiple domains, you can use a ‘cross-domain canonical tag’ that points from each copy to the version on the domain you would like to rank, consolidating the SEO value of the various copies in one place.

Many sites accidentally and unknowingly allow their server to create indexable duplicate versions of pages because of differences in the URL that do not change the content on the page. This is sometimes referred to as DUST (Duplicate URL, Same Text). It can be caused by tracking codes or other information passed in the URL, or by a lack of standardization in how URLs are served: HTTP versus HTTPS, trailing slashes, and whether or not the ‘www’ subdomain is included. Self-referencing canonical tags can help prevent these situations from having a strongly negative impact on rankings.
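As a rough illustration of how URL variations can be collapsed into one preferred address, the Python sketch below normalizes a URL and prints the matching canonical tag. Everything specific in it is an assumption made for the example and does not come from Google: the host www.example.com, the list of tracking parameters, the choice of HTTPS with no trailing slash, and the function names canonical_url and canonical_link_tag.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of tracking parameters; adjust it to your own analytics setup.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign",
    "utm_term", "utm_content", "gclid", "fbclid",
}


def canonical_url(url: str, preferred_host: str = "www.example.com") -> str:
    """Collapse common URL variations into one preferred URL.

    Assumed policy (an illustration, not a rule): always HTTPS, always the
    preferred host, no trailing slash except on the root path, and no
    tracking parameters or fragments.
    """
    parts = urlsplit(url)
    # Keep only the query parameters that actually affect the page content.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Strip trailing slashes, but keep "/" for the site root.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", preferred_host, path, urlencode(query), ""))


def canonical_link_tag(url: str) -> str:
    """Build the canonical tag to place in the page head."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    # Both variants print the same tag.
    print(canonical_link_tag("http://example.com/page/?utm_source=news&id=7"))
    print(canonical_link_tag("https://www.example.com/page?id=7"))
```

Both calls print the same tag, pointing at https://www.example.com/page?id=7. Because the preferred host is fixed, the same approach covers the cross-domain case by passing the domain you want to rank as preferred_host. Which parameters are safe to drop depends on the site, so treat the list above as a starting point only.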

Related Terms:

Canonical Tag
Indexing