Duplicate content is a page on your website that has the same or very similar content to another page on your site (or someone else's). It can be a duplicate of an entire page or just a section of it. Duplicate content lowers your website's ranking with search engines and makes your message appear spammy and unprofessional.
The problem of duplicate content has several dimensions and can have different causes, from technical or unintentional errors to deliberate plagiarism. Before we discuss the technical aspects, we first have to understand what content duplication is.
This article will also discuss some of the most important reasons why people have duplicate content on their websites and how to avoid it.
What is Duplicate Content?
Google describes duplicate content as “substantive blocks of content within or across domains that either completely match other content in the same language or are appreciably similar. Mostly, this is not deceptive in origin.”
In other words, it is the same (or nearly the same) content appearing on more than one web page in Google's search results.
There are two main types of duplicate content:
- Intentional (aka plagiarism)
- Unintentional (that results from technical reasons)
Sometimes, webmasters unintentionally create and publish internal duplicate content on their websites. At other times, they knowingly steal the content from other websites and publish it as their own.
Let’s discuss some causes and quick fixes for both of these types of duplicate content.
Intentional Duplicate Content or Plagiarism
Intentional duplicate content or plagiarism refers to copying and pasting text from another website and posting it as your own. Unfortunately, it is not a rare phenomenon, and many websites suffer from it.
Causes of Plagiarism
There are several reasons why web content plagiarism is increasing every day.
- Rising Content Demand
The amount of content available on the internet has significantly increased over the past several years. The majority of firms are having trouble keeping up with the competition, let alone getting ahead of it.
Unfortunately, the ever-increasing demand for material does not always result in the originality and quality that search engines (and consumers) are looking for. Consequently, content plagiarism is increasing.
- Need for Creative Ideas
It is always difficult to come up with new and creative ideas to engage an audience. Organizations need to allocate significant resources and time to create unique, original, and engaging content. When they lack those resources, they look for shortcuts and hire cheap writers who plagiarize and produce low-quality content.
- Plagiarism is Not Considered a ‘Problem’
At other times, small business owners do not have sufficient knowledge of internet marketing, best content development practices, or on-page SEO. They tend to read a blog post they consider ‘good’ and copy it or reword it, thinking that it will engage their audience and help them rank. They don’t realize that the plagiarized content is not optimized for their website, nor does it address the needs of their particular audience or market area. Just because content sounds good doesn’t make it appropriate (or ethical) to publish on your website.
Types of Plagiarism
There are different ways of plagiarizing content.
- Direct Plagiarism
Direct plagiarism is the worst type of plagiarism. And we’re not just saying that because it rhymes.
Direct plagiarism is when you copy-paste an entire article or blog post, remove the original author’s name, and submit the new page (to the search engine) as your own work.
This type of plagiarism can result in getting deranked, or even worse: getting kicked off of Google without being given any warning at all!
It also opens you up to legal liability as many businesses who invest the time in original content are also pro-actively leveraging free duplicate content tools to ensure they’re not being plagiarized. Bloggers, SEO professionals, photographers, and video content producers are always on the lookout to protect themselves from plagiarism.
- Patchwriting
Rewriting text from another website is a frequent practice among anonymous writers who work for cheap. It is the quickest (though least original) approach to writing without making use of spinning software.
If your writers are skilled at rewriting, the work will get past a plagiarism checker without you ever knowing about it, but it may still be unethical or unlawful.
In addition, patchwriting creates a number of issues for SEO. Google loves original, fresh content that provides value to the readers. When writers recycle old postings, they contribute nothing new to the conversation. Rand Fishkin does a great job at explaining the concept.
- Copy and Paste
Sherry Gray describes this type of plagiarism in a SEMrush blog article. According to her, copying and pasting fragments of text from a variety of online sources in order to piece together a coherent article is a prime example of sloppy writing.
When there are a lot of writers working on the same topic, it is possible that there will be some overlap in their work. That won’t get you in trouble with search engines. When many sentences or even entire paragraphs are taken from other websites, however, it becomes a different story.
- Self Plagiarism (Accidental or Intentional)
Some writers accidentally plagiarize their own work. They consistently use the same words, phrases, and patterns in their communication. When one writes in a specialized field, there is a high risk that they would unwittingly plagiarize their own work without even recognizing it. At other times, the writers deliberately copy and paste the sections of their own work on their website. If it happens frequently, it can also harm your ranking.
Unintentional Duplicate Content
Unintentional duplicate content can result from a variety of technical issues. Some of these are the following:
Causes of Unintentional Duplicate Content
- https:// Version vs. http:// Version
https:// and http:// are two separate protocols (served over different ports). If you publish the same content on both live versions, the pages will be considered duplicates. The problem of duplicate content, in this case, can be resolved by using the canonical link element. This small snippet is added within the <head> tag of your site’s pages and allows webmasters to declare their preferred location for the content.
Websites that are more than 5 years old, or were built before the days of SSL certificates, usually suffer from this issue, as those sites tend not to have regular maintenance. If you see a little red slash through the ‘lock’ icon next to your URL in a browser window, this probably affects you.
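The canonical link element is a one-line addition. A minimal sketch, assuming https://example.com/page is the preferred https:// version (the domain and path are placeholders):

```html
<!-- Placed inside the <head> of the http:// duplicate -->
<link rel="canonical" href="https://example.com/page" />
```

Search engines that honor the hint will consolidate ranking signals on the https:// URL.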
- URL Parameters
Sometimes a website has more than one URL variation caused by parameters such as click tracking and analytics code. The order in which these parameters appear in the URL can also cause duplicate content.
Similarly, sometimes a website has a printer-friendly version of a piece of content. If that version gets indexed along with the regular version, it also becomes a duplicate. As a result of these technical issues, several duplicate copies of a single web page can be created.
One way to avoid URL duplication is to use a URL format consistently across all internal links. Choose an internal link format and use the same for adding internal links to your website.
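One practical way to enforce a consistent format, sketched below with Python's standard library, is to normalize every URL before using it in an internal link. The function name and the list of tracking parameters are illustrative assumptions, not a definitive implementation:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Common tracking parameters to strip (an illustrative, incomplete list)
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}

def normalize_url(url: str) -> str:
    """Drop tracking parameters and sort the rest, so every variation
    of a URL collapses to a single canonical form."""
    parts = urlsplit(url)
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query)
        if k not in TRACKING_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), ""))

# Two variants of the same page collapse to one canonical URL:
a = normalize_url("https://example.com/page?utm_source=x&b=2&a=1")
b = normalize_url("https://example.com/page?a=1&b=2")
```

Running every internal link through a helper like this means parameter order and tracking codes can no longer create accidental duplicates.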
Drawbacks of Duplicate Content
There are several myths about the duplicate content penalty, and some webmasters consider it more harmful than toxic backlinks. In reality, most unintentional duplicate content doesn’t cause a penalty.
However, both intentional and unintentional duplicate content can harm search engine rankings. How does it damage your rankings? Let’s discuss it in detail.
- Lower Search Engine Ranking
The first drawback of duplicate content is lower search engine rankings. According to the Google Developers Guidelines, “Google tries hard to index and show pages with distinct information.” This means that if your web page doesn’t have unique content, it can hurt your search engine rankings.
Sometimes when there are multiple versions of the content available, it confuses the search engines. As the bots are unable to decide which version should rank higher, it lowers the performance of all the versions of the content.
- Burns Crawl Budget of Google Bots
Google routinely spends loads of resources to crawl and index a website. These resources include servers, personnel, internet and electricity bills, and many other costs. For a big website, this process continues until there isn’t any more content to be found on the site. If Google’s crawlers keep finding the same content on your pages that they have already found elsewhere, they may crawl your site less thoroughly. This might leave your site, or important pages of your site, un-crawled. Eliminating duplicate content helps ensure that all your content is crawled and indexed by Google.
- Ethically Wrong to Plagiarize
Plagiarism is ethically wrong. It’s a form of stealing when you take someone else’s work and pretend it’s yours. Here are some reasons why plagiarism is considered lying to readers.
- You’re pretending that the ideas and words are actually yours, and it’s highly unlikely that the content is a true reflection of your professional views, approach, services, or even your overall understanding of the topic at hand.
- It steals credit from others. It’s like replacing a trade service’s lawn sign (e.g., ‘roof by ABC Roofers’) with your own. If the original roofer notices (and they probably will), they’re going to come after you. If customers notice, it’s really going to hurt your reputation and destroy your credibility within the market.
Eventually, plagiarizing hurts everyone involved.
- Less Organic Traffic
When you are not getting the right kind of organic traffic, it often has something to do with duplicate content. We all know that the top results on SERPs (search engine results pages) for a keyword get the most traffic. Lower rankings due to duplicate content = less organic traffic. To avoid duplicate content, make sure you check each page on your site for these common mistakes:
- Copying and pasting from another page on your own site without changing anything.
- Copying and pasting from an external source without changing anything.
- Using the same meta title tag across multiple pages.
- Dilutes Link Equity
Duplicate content is like a virus that can spread through your entire website. It dilutes the link equity. If multiple pages on your website have the same or very similar content, other websites don’t know which page they should link to. Consequently, they link to multiple versions of the content instead of consolidating backlinks on one page. This lowers search engine rankings and affects your credibility as a brand.
One of the best ways to get rid of duplicate content is to ensure that every page on your site has unique content, even if it’s just a few words different from the previous page. Common places to look include reiterations of your mission statement, about/team bios, and spotlights on specific offers/programs. Slight rewording of these areas between pages is very important.
You can also use software that checks for duplicate content across your entire website in order to fix any issues before they become major problems.
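As a rough illustration of what such software does, here is a minimal sketch using only Python's standard library. The page URLs and text are made-up sample data, and the 0.8 similarity threshold is an arbitrary assumption:

```python
import difflib
from itertools import combinations

# Sample data: in practice this would be the stripped text of each
# page on your own site, keyed by URL.
pages = {
    "/about": "We are a family-run roofing company serving the metro area.",
    "/team":  "We are a family-run roofing company serving the metro area since 1990.",
    "/contact": "Call us today for a free quote on your next roofing project.",
}

def similarity(a: str, b: str) -> float:
    """Return a 0..1 ratio of how similar two blocks of text are."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# Flag any pair of pages that is more than ~80% identical
flagged = [
    (u1, u2) for (u1, t1), (u2, t2) in combinations(pages.items(), 2)
    if similarity(t1, t2) > 0.8
]
```

Here `flagged` would contain the `/about` and `/team` pair, telling you exactly which pages need rewording before they become a ranking problem.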
- Poor User Experience
External duplicate content is a UX (user experience) nightmare. When your website visitor sees the same (scraped) content repeated across multiple pages/domains, they’re likely to leave your site in search of one that doesn’t have any duplicates. This is not what you want when you’re trying to drive traffic and conversions during a content marketing campaign.
- Visitors expect to see unique content on each page of each site they visit. When they don’t find what they’re looking for, they get frustrated and leave your site, never to return again.
- Duplicate content reduces the value of your brand by diluting its message and confusing visitors. Your potential customers can’t be sure about who you are, what you stand for, or how you’re better/different than competitors.
Someone Stole My Content: Will that Duplicate Content Rank Higher?
Sometimes we see the duplicate version ranking higher than the original version of a web page. If duplicate content is so bad, why does it seem to rank higher sometimes?
According to John Mueller, a Search Advocate on the Search Relations team at Google, Google may give duplicated or plagiarized content preference over the original. He indicates that this happens because of the low overall quality of the website belonging to the original publisher.
If you have a ton of original, quality content that could help you rank, but your site is a technical SEO disaster, unorganized, old and unmaintained, and/or you have other factors working against your authority (like poor quality backlinks), your site’s overall ‘authority’ could negate the quality of your content.
If Google determines that another website that contains duplicate content has a higher quality (overall), it will prioritize showing that website’s version of the content higher in the search results than the original publisher.
This tells us that Google rankings are determined not only by the content of a website but also by the metrics of the website as a whole.
Improve the general quality of your site, and rank better in the search results by doing so.
How to Solve the Problem of Intentional Duplicate or Plagiarism?
Web content plagiarism can harm the original version as well as the duplicate version. Sometimes, even your entire website is plagiarized without your knowledge and you get penalized by Google. This is because Google’s algorithms are not perfect; they make an educated guess but cannot tell for sure which version of web content is original and which is a copy.
The best strategy is to stay vigilant and ensure that there’s no duplicate version of your content on the internet.
Typically search engines like Google aren’t going to knock on your door to tell you they’ve penalized you for duplicate content.
There are several ways to trace and identify plagiarized content.
1. Use a Duplicate Content Checker Tool
Several free as well as premium duplicate content checker tools can help you find out if someone is plagiarizing your web content. Use one of these tools to identify duplicate versions of your content on the web. If you find that even a small part of the content on your website has been duplicated, try to rewrite it immediately.
2. Prove that You Wrote the Copy First
Use the Wayback Machine at Archive.org. If there’s a duplicate version of your content somewhere on the internet, this tool can prove who published the copy first.
3. Report to Google
Use this Request form to report duplicate content to Google. Google takes action against such cases of content theft and penalizes those websites.
How to Solve the Problem of Unintentional Duplicate Content
Unintentional duplicate content primarily results from technical reasons. As we mentioned earlier, if you accidentally create multiple versions of a web page, they will be seen as duplicate content by Google.
The following are some ways to fix this type of duplication.
Set up a 301 redirect to solve the problem of duplicate content.
This is because Google will only index and rank one copy of each URL. If you make sure that you’re only using one URL for each piece of content, then when someone goes to that page, they’ll be taken to the correct version, no matter where they come from, and there won’t be any duplicate content issues.
A 301 redirect permanently sends visitors (and search engines) from one page to another. Once you have set it up on your server, it keeps working with no further maintenance.
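As an illustration, a common way to set up such a redirect on an Apache server is a few lines in an .htaccess file. This sketch assumes Apache with mod_rewrite enabled, and forwards every http:// request to its https:// counterpart:

```apacheconf
# Permanently (301) redirect all http:// traffic to https://
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [L,R=301]
```

Other servers (nginx, IIS) offer equivalent directives; the key point is that the redirect happens server-side with a permanent (301) status code.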
The rel=canonical attribute is another way of dealing with duplicate content. It tells the search engines that a given page should be treated as a copy of a specified URL, and that all the links, content metrics, and ranking power applied to that page should be credited to the specified URL. Add the rel=canonical attribute to the HTML head of each duplicate version of a page.
Like a 301 redirect, rel=canonical passes the link equity, or ranking power, from the duplicate to the canonical page. However, it takes less development time to implement.
Self Referential rel=canonical Tag
The people who aim to steal your web content also pose a threat to your SEO ranking. You can add a self-referential rel=canonical link to your existing pages. While not all scrapers will port over the full HTML code of their source material, some certainly will. Those who copy over the self-referential rel=canonical tag will inadvertently ensure that your site’s original version gets the credit for the original piece.
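A self-referential canonical is simply a canonical tag pointing at the page's own URL. A minimal sketch, with a placeholder address:

```html
<!-- In the <head> of https://example.com/blog/my-post itself -->
<link rel="canonical" href="https://example.com/blog/my-post" />
```

If a scraper copies this markup verbatim, the stolen copy ends up declaring your URL as the canonical source.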
It is estimated that around 30% of the entire internet is affected by deliberate plagiarism or unintentional duplicate content.
If you are affected by duplicate material, we hope that you now understand how to resolve the issue and mitigate its effects on your site’s authority, ranking, and credibility.
The best strategy to deal with content duplication is to stay vigilant. Use a plagiarism checker tool to detect plagiarized content. Resolve the technical issues on your website to ensure that you are not accidentally creating multiple versions of your web pages.
Dealing effectively with the problem of duplicate content ensures better SEO and better web performance.