A study conducted by Raven Tools on 200 million page crawls found that 29% of the pages had duplicate content issues.
You work hard to dig up subjects that your audience would be interested in reading. Then, you conduct research to ensure that you present the best ideas and data to your audience. Finally, you craft stellar content, edit and publish it.
To what end?
A scraper is waiting for your article to gain eyeballs. Once your article gains traction, he will quickly copy and paste the content on his own website without a second thought.
He will receive a chunk of your traffic by copying your work. And, he’ll also claim to be the creative person who put together the piece.
But wait, there’s more…
The copied version of your article can even beat your original one in search rankings for your target keyword (the chances of this happening are tiny, though).
Here’s an example of a repurposed piece of content and the original one, both ranking for the same keyword. The same can happen when a scraper steals your work.
So, a scraper can hamper your brand and rope in traffic from your hard work.
I know this is a major pain. Creating engaging content was cited as the biggest challenge by B2B marketers, in the 2016 B2B Content Marketing benchmarking report.
Further, 81% of marketers are planning to increase their use of original content, as per Social Media Examiner’s report.
So, in this article, I want to help you deal with copycats. Sure, you won’t be able to stop every scraper from stealing. But, you can definitely use the situation to your advantage.
Before I share strategies and tools that you can use to fight scrapers, I want to tell you some good news…
There’s a fear, floated by many marketing gurus, that Google will penalize you for duplicate content issues. That’s why many bloggers are afraid of repurposing their own content.
But, Matt Cutts officially stated, in 2013, that there’s no such thing as a duplicate content penalty.
25-30% Of The Web’s Content Is Duplicate Content & That’s Okay.
Yes, Google does reserve the right to penalize your website for duplicate content, but only if you’re excessively copying content and trying to manipulate search results.
Alright, now that you know all’s not lost when your content gets plagiarized, let’s begin with the tools and steps that you need to follow in dealing with scrapers.
Generously perform internal linking in all of your articles, monitor incoming links using Google Webmaster Tools and use this WordPress plugin
You cannot stop people from stealing your content, but how about driving some free traffic?
If a scraper copies your content – he probably won’t put in the effort to remove the hyperlinks.
So, a great way to take advantage of them is by performing generous internal linking in all of your articles. It’s great for your website’s SEO, improves your website’s crawlability and aids your visitors in navigation. Besides, you’ll also drive traffic to the linked pages from the copied article.
Your internal product and category pages might not attract links on their own. But, by linking to them, you ensure that the link juice on your website flows well.
That’s why I also refer to my guides on internet marketing regularly, in all my articles.
Glenys performed an internal linking experiment and was able to rank a buyer keyword on the first page of Google and Bing by adding 14 internal links.
When you link internally, refer to relevant articles/pages on your blog and use descriptive anchor text (don’t stuff keywords in every hyperlink).
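If you want a quick way to check how many internal links an article carries and what anchor text they use, here is a minimal Python sketch. It assumes you have the article's HTML as a string. The names `LinkCollector`, `internal_links` and `GENERIC_ANCHORS` are my own, and the list of "generic" anchors is only an illustrative guess, not an official rule.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: a small, hand-picked set of anchors that describe nothing.
GENERIC_ANCHORS = {"here", "click here", "read more", "this"}


class LinkCollector(HTMLParser):
    """Collects (absolute_url, anchor_text) pairs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def internal_links(html, page_url):
    """Return the links that point to the same host as page_url."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    site = urlparse(page_url).netloc
    return [(url, text) for url, text in collector.links
            if urlparse(url).netloc == site]


def generic_anchor_links(links):
    """Return the links whose anchor text is too generic to be descriptive."""
    return [(url, text) for url, text in links
            if text.lower() in GENERIC_ANCHORS]
```

Run `internal_links()` on a draft before publishing, then pass the result to `generic_anchor_links()` to see which anchors could be more descriptive.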
I know it feels good to get links and referral traffic. But, what if the scraped website is flagged by Google?
You don’t want to get penalized by Google for shady links. So, you need to keep monitoring your webmaster tools for new incoming links. Navigate to Traffic > Links to Your Site.
From the domains listed inside this section, you can click on any domain to find the specific pages that the links come from.
By clicking on cornerstone-works.com, Kristi Hines found that the website was blatantly copying her post titles. On visiting the website, she found that whole articles of hers were copied, word-for-word.
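If you would rather review incoming links in bulk, the same "group by domain" idea can be sketched in a few lines of Python. This assumes you already have a plain list of linking page URLs (for example, pasted from an export). The function name `links_by_domain` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def links_by_domain(linking_urls):
    """Count linking pages per domain, most frequent domain first."""
    counts = Counter(urlparse(url).netloc.lower() for url in linking_urls)
    return counts.most_common()


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        "http://scraper.example/post-1",
        "http://scraper.example/post-2",
        "http://scraper.example/post-3",
        "https://friendly-blog.example/roundup",
    ]
    for domain, count in links_by_domain(sample):
        print(domain, count)
```

A domain that suddenly links to you from dozens of pages, each matching one of your post titles, is a good candidate for a closer look.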
Google Webmaster Tools will also notify you if the search giant finds any duplicate content issues with your website.
If you’re on WordPress, you can also set up trackbacks to get notified whenever a website links to a page on your website.
After being listed in ProBlogger’s 20 Bloggers to Watch article in 2012, Kristi Hines found, through trackbacks in her WordPress dashboard, that 18 sites had stolen her content verbatim (with links intact).
Besides internal linking and monitoring incoming links, I also recommend that you install the Pubsubhubbub plugin, which sends a ping every time you publish.
What does pinging do?
Remember I told you how a copied version of your article can outrank you? This happens mostly in situations when Google discovers the copied article before your original version.
The Pubsubhubbub plugin is a way of telling a trusted source that you’re the original source of the article and published it first.
If you aren’t on WordPress, here’s the pubsub protocol you can use.
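To give you a feel for what such a ping looks like, here is a minimal Python sketch that builds a publish notification. It assumes the hub accepts a form-encoded POST with `hub.mode=publish` and `hub.url` set to your feed URL, which is the convention I'm aware of for public hubs. The hub address and the feed URL below are placeholders, so check your own hub's documentation before relying on this.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a public hub endpoint. Replace it with the hub your feed declares.
HUB_URL = "https://pubsubhubbub.appspot.com/"


def build_publish_ping(feed_url, hub_url=HUB_URL):
    """Build (but do not send) a POST request telling the hub the feed changed."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_publish_ping("https://www.example.com/feed/")
    print(request.get_method(), request.full_url)
    print(request.data.decode("ascii"))
    # To actually send it: urllib.request.urlopen(request)
```

You would send this right after publishing a new post, so that the hub hears about your version as early as possible.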
Set up link attribution for your copied content with this ‘magical’ tool
People will copy content from your website using the keyboard shortcut, right-click or the drop down menu.
But, wouldn’t it be great if, when anyone copies a certain amount of text from your website – you place a link to the original piece of content in the copied section of text?
If you could do that, the link to your website would automatically appear at the location where the text is pasted. Here’s an example of how Rolling Stone places its social media account and original article links, whenever someone copies content from its website.
The code has three parts: the Core Code, the Modification Code and the Hook Code. Here’s how to set up the tool by using them.
Step #1 – Download the Core Code from here and save it as ‘copyenrich-filter.js.’ Then, download the Modification Code from here and save it as ‘copyenrich.js.’
Step #2 – Upload the files to your server, under a new directory, like – http://www.yourdomain.com/js/.
Step #3 – To activate the script on a page, you’ll need to add the hook code to the footer – just before the closing body tag. If you want to activate the script on the whole website, you’ll need to make changes to your website template.
Now that you’ve activated Copy-Magic-Paste, let me show you a few customizations that you can perform.
Set the minimum length at which you want to modify copied content – You can change the number of characters at which you want your website citations to appear, by modifying the number under filter_minlength.
Change the text in front of your source URL – You can also change the content that appears before your URL under filter_source_url.
The default is “Found on:”
Note: Leave the \n\n in place. Otherwise, the copied content will be lost.
Track the number of times your content gets copied in Google Analytics (GA) – If you’re analyzing your website traffic through GA, then you can even track the copies of your content with this script. It contains a parameter ‘copy on page’ that you can check inside GA.
If you want to change the name of a parameter, feel free to do so by modifying the filter_analytics_name.
You’ll find the copy events in GA by navigating to Behavior > Events > Overview.
Want to know the exact text from the exact page where it was copied?
Then add ‘page’ as a secondary dimension, after navigating to Event labels > View full report.
There are other customizations and functions of this brilliant Copy-Magic-Paste tool that you can review here.
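If it helps to see the logic behind the two main settings, here is a rough Python stand-in for what the script does to a copied selection. It is not the tool's actual code (that runs as JavaScript in the browser). The function name `enrich_copied_text` is made up, and the default threshold of 40 characters is my assumption rather than the tool's real default.

```python
def enrich_copied_text(selection, page_url,
                       filter_minlength=40,
                       filter_source_url="\n\nFound on: "):
    """Append a source line to copied text that is long enough.

    filter_minlength: minimum number of characters before a citation is added.
    filter_source_url: text placed between the copied content and the URL.
    The leading blank lines keep the citation apart from the copied text.
    """
    if len(selection) < filter_minlength:
        # Short snippets are passed through untouched.
        return selection
    return f"{selection}{filter_source_url}{page_url}"


if __name__ == "__main__":
    copied = "You cannot stop people from stealing your content, but how about driving some free traffic?"
    print(enrich_copied_text(copied, "http://www.yourdomain.com/article/"))
```

A low threshold adds your link to almost every copy, while a higher one leaves short quotes alone and only tags larger chunks of text.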
Stay on top of the scrapers using these three tools
Unless you’re performing regular searches of all your article titles or random chunks of text from your website, you cannot possibly find every website that copies your content. A better alternative is to get help from these 3 tools.
Screaming Frog – This is a free tool that can crawl up to 500 pages on your website and find duplicate content issues.
After downloading and installing the program, you’ll need to enter your site’s URL. Then, press the ‘Start’ button.
Once the tool returns with the results, you can click on a field (from page titles, meta description, H1, H2, images and more) and choose the duplicate filter for finding duplicate content occurrences on your website.
Copyscape – The tool promises to be the most powerful plagiarism search engine on the web. You just need to enter the URL of your content and let the tool find out if there are any duplicates floating around on the web.
You can also buy Copyscape premium to check up to 10,000 pages on your website with a batch search. They also offer banners (like the one below) to warn plagiarists and defend your website’s content.
Google Alerts – This is an awesome free tool, by the search giant, for finding if your content is re-published anywhere online. You can set up an alert when the exact title of your post appears, by putting it in quotation marks.
You can either set up an email notification or even send these alerts to an RSS feed, like Kristi did.
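Once one of these tools flags a suspicious page, you may want a rough measure of how much of your article it reuses. Here is a minimal Python sketch that compares overlapping five-word phrases (often called shingles). It assumes you have both texts as plain strings. The function names and the phrase size of five are my own choices, and none of the tools above is claimed to work this way.

```python
import re


def shingles(text, size=5):
    """Return the set of consecutive `size`-word phrases in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_share(original, suspect, size=5):
    """Fraction of the original's phrases that also appear in the suspect text."""
    original_phrases = shingles(original, size)
    if not original_phrases:
        return 0.0
    suspect_phrases = shingles(suspect, size)
    return len(original_phrases & suspect_phrases) / len(original_phrases)


if __name__ == "__main__":
    mine = "You work hard to dig up subjects that your audience would be interested in reading"
    theirs = "Posted by admin. " + mine + " Read more posts."
    print(copied_share(mine, theirs))
```

A share close to 1 suggests a word-for-word copy, while a low share is more likely a quote or a coincidence.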
File a DMCA complaint with the hosting provider of the scraper and later with search engines
Once you’ve found the websites that are copying your content, you can either choose to ignore them or go after them.
If you’re getting good referral traffic and links that help your SEO (through the tools and strategies I outlined above), then skip this step. But, if the scraper is a high-authority website that can outrank you in search results, then going after it might be worth your time.
Start with directly getting in touch with the webmaster of the site through their contact form, social media accounts and email addresses. You can use the Who.Is tool to find information on who owns the domain and the administrator’s address.
If they are a professional website that accidentally copied your content, they will remove your content after seeing your notice.
If they don’t respond (or you can’t find any email address/contact information), then I recommend that you get in touch with the hosting provider or the domain registrar. The information is available by plugging the website into the Who.Is tool.
The above screenshot shows that the website is registered at GoDaddy and hosted on HostGator. So, you can fill in the DMCA form at HostGator here. You can also get in touch with GoDaddy at [email protected] to notify them that a website registered through their company is stealing copyrighted material.
The hosting providers and processors of digital information are required to follow a stringent procedure for removing copyrighted content under the Digital Millennium Copyright Act (DMCA).
If the host doesn’t take action (suspend/remove the website or take down the content), then:
1. Hire the takedown services by DMCA ($10/month).
If you’re on WordPress, you can even use their badge on your website to warn thieves.
2. File a DMCA complaint with Google, Yahoo and Bing directly to de-index copied content.
You’ll be required to fill in a simple form with details of the copied content links and your original copyrighted work.
Google has been separating duplicate from original content ever since 1997. They know the web contains a lot of it. But, “duplicate content” only became a buzzword in 2005. Look at how the interest in the phrase climbed.
You need to remain careful, though, and ensure that you get the maximum benefits from your quality content. Ultimately, remember that Google is on your side and scrapers can even get completely removed from the search giant’s results.
Has your website content ever been copied by any other website? How did you resolve the issue?