Kathryn recently shared some ideas about how writers can deter content scrapers—the people and bots that republish your content without your permission, often using your RSS feed.
In the comments on that post I shared a few examples of what I do in that situation to hit content thieves where it hurts—their traffic and income.
What happens if your content thief is anonymous though? Can you identify them? Who do you send a take down notice to? How do you find out who their hosting company is?
I did a bit of digging to help Kathryn identify her own scraper and which company hosts the scraper’s website so she could try to have the infringing content removed. In this post I’ll give you some tips and tools to help you do the same. But first:
Why You Should Pursue Content Scrapers
Many bloggers ignore content theft. The most common excuse I’ve heard is that it’s better to just accept the links back to their own website that come with the scraped copies. But the truth is, content scrapers can do real damage. Here are some things to consider:
- Scraped content might outrank your content. Yes, it has been known to happen.
- Low quality “unnatural” links can hurt your site’s search engine rankings (don’t assume the links they give you are a good thing).
- If you let these bad links accumulate, you might end up with a huge mess to clean up later using Google’s disavow tool. Remember, Google has a history of getting stricter about what kinds of links can hurt you, not getting more lenient. While they usually identify and discount links from big scrapers, there is no guarantee they’ll catch smaller or newer ones (or manual content thieves).
- Failing to go after content thieves could hurt your ability to license your content to other publications later, even in print. Even if you don’t do this now, do you really want to rule this option out for the future? Remember, attribution and a link back in no way change the fact that a scraper is violating your intellectual property rights, something that has real value.
- While this won’t be a concern for all bloggers, if you happen to work as a freelance writer, failing to pursue content thieves could devalue your work. Why should clients pay you for unique material if they see you let others profit from your work for free by simply taking it?
- It doesn’t have to take a lot of time to get stolen content removed if you use standard templates for your take down notices.
- Frankly, sometimes it just feels good.
Now that you know why it can be important to pursue content scrapers, let’s talk about how you can go about it.
How to Identify Content Thieves
You know a site is stealing your content. But you don’t know who to contact to request that the stolen content be removed.
You’ve tried the obvious only to find out they have no contact page and there are no names or email addresses anywhere on the site. Who do you contact? If you can’t contact someone from the site, how do you find out who their host is so you can contact them?
Here are some tips and tools that might help you identify content thieves:
WHOIS Records
Check WHOIS records for domain registrant information if the site is hosted on its own domain.
If you’re lucky, you’ll find a name, email address, mailing address, and phone number. If you’re unlucky, you’ll only find information for a domain privacy company blocking the registrant’s contact info. You might also find their site’s nameservers (more on that below).
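If you would rather script this than paste domains into a web form, here is a minimal Python sketch of a WHOIS lookup. WHOIS is a plain-text query over TCP port 43. The default server below (whois.iana.org) is an assumption on my part; it typically just refers you to the registry's own WHOIS server, so you may need to repeat the query against that one. The field labels in the patterns are also assumptions, since registries format their records differently.

```python
import re
import socket


def query_whois(domain, server="whois.iana.org", timeout=10.0):
    """Send a raw WHOIS query (plain text over TCP port 43) and return the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Assumed field labels; adjust them to whatever the record you get back uses.
REGISTRAR = re.compile(r"^[ \t]*Registrar:[ \t]*(.+)$", re.I | re.M)
NAME_SERVER = re.compile(r"^[ \t]*Name Server:[ \t]*(\S+)", re.I | re.M)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def summarize_whois(text):
    """Pull out the pieces that matter for a take down notice."""
    registrar = REGISTRAR.search(text)
    return {
        "registrar": registrar.group(1).strip() if registrar else None,
        "emails": sorted({e.lower() for e in EMAIL.findall(text)}),
        "name_servers": sorted(
            {ns.lower().rstrip(".") for ns in NAME_SERVER.findall(text)}
        ),
    }
```

You would call `summarize_whois(query_whois("the-scraper-domain"))`. If a privacy service is in place, the emails you get back will belong to that service or to the registrar, not the site owner.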
The Scraper’s Own Website
Browse around the scraper’s site a bit. See if they’ve responded to any blog comments. If so, their name might be there. See if they have a forum or other features on the site that might offer their name.
This is how I found the handle Kathryn’s scraper was using. She was the administrator listed for the site’s forum, which only had two users. Then I could easily tie her to the site through other sources like social media profiles where she promoted the site.
Social Media Accounts
On that note, search popular social networks for links to the scraper’s site. If there is only one person linking to it in their profile or updates, chances are good you’ve found the site’s owner. You might even be able to contact them this way with an informal request to remove your content.
Your Blog Comments
Check your blog comments from your blog’s admin area. Scraper sites often leave trackbacks in these comments. If they did, you might find an IP address associated with their site. Try to grab this information before deleting them. I usually move them to my trash folder so I still have them for reference, and then I delete them when the site finally complies with the take down notice.
Reverse IP Lookups
Use a reverse IP lookup tool. Enter the scraper’s IP address or domain name. You’ll get a list of other sites hosted on the same IP address.
If there are a lot of sites, they’re likely using a shared host, so you won’t learn much from this alone. But if there are just a few sites hosted on that IP address, it’s more likely these sites are owned by the same person. Try running them through a WHOIS search to see if there’s a common owner.
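The triage described above can be sketched in a few lines of Python. This assumes you have already copied the list of neighboring domains out of whichever reverse IP tool you used. The function name and the cutoff of 10 sites are my own illustrative choices, not a rule; pick whatever number feels like "just a few" to you.

```python
import socket


def resolve_ip(domain):
    """Look up the IP address a domain currently points to (needs network access)."""
    return socket.gethostbyname(domain)


def triage_neighbors(neighbor_domains, scraper_domain, small_cutoff=10):
    """Decide whether the other sites on an IP are worth running through WHOIS.

    neighbor_domains: domains a reverse IP tool listed for the scraper's IP.
    small_cutoff: assumed threshold; more neighbors than this suggests a shared host.
    """
    scraper = scraper_domain.lower().rstrip(".")
    others = sorted({d.lower().rstrip(".") for d in neighbor_domains} - {scraper})
    if len(others) > small_cutoff:
        # A crowded IP usually means shared hosting, so the neighbors tell you little.
        return {"likely_shared_host": True, "whois_candidates": []}
    # A handful of neighbors may share an owner; check each one's WHOIS record.
    return {"likely_shared_host": False, "whois_candidates": others}
```

Each domain in `whois_candidates` is one to run through a WHOIS search, looking for a registrant who forgot to hide their details.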
This can be especially helpful if a scraper has legitimate sites in addition to their scraper site. They sometimes only think to use domain privacy services for the scraper blog, meaning you can find their contact information through one of the other sites they own.
Next, visit WhoIsHostingThis.com to help you identify the scraper’s hosting company.
Sometimes you’ll be able to find the host you need to contact right away. Other times you might get a reseller or a datacenter instead of the actual hosting company. In that case, pay attention to the domain name listed within the “Name Servers” section.
For example, with a scraper I’ve recently been dealing with, CyrusOne was listed as the host. But CyrusOne runs datacenters, which is a little different from what we’re looking for: the host that leases server space to the scraper.
Looking beyond that, WebsiteWelcome.com was listed in the name servers section. So I visited WebsiteWelcome.com and was immediately provided with an abuse-related email address.
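If you have a list of name servers in front of you, the step above is just a matter of trimming each one down to the domain you would visit. Here is a rough Python sketch. The function name is hypothetical, and keeping only the last two labels is a naive assumption that breaks for suffixes like .co.uk, so treat the output as a starting point.

```python
def host_candidates(name_servers):
    """Reduce name server hostnames to the parent domains worth visiting.

    Naive assumption: the parent domain is the last two labels of the hostname.
    """
    seen = []
    for ns in name_servers:
        labels = ns.strip().lower().rstrip(".").split(".")
        if len(labels) < 2:
            continue  # not a usable hostname
        domain = ".".join(labels[-2:])
        if domain not in seen:
            seen.append(domain)
    return seen
```

With illustrative entries such as `ns1.websitewelcome.com` and `ns2.websitewelcome.com` (the exact hostnames here are made up for the example), this returns a single domain to visit and look for an abuse contact.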
I contacted them with a take down notice and received a very prompt reply from HostGator, the actual host on record. They informed me they were giving the site owner 48 hours to resolve the issue, and if the owner failed to do so, HostGator would “disable” the content in question. This is the end result you’re after.
Chances are good that these tactics will help you identify your scraper, or at least their hosting company. At that point, it’s time to take action. While it looks like a lot of work, it really isn’t. It takes no more than a few minutes to go through most of these tools, where all you have to enter is the scraper’s domain name. And you probably won’t have to use all of these.
How to Stop Content Thieves
I take a multi-tier approach to dealing with content thieves, and so far it’s never failed. While I have some go-to lawyers I would use if necessary, it has never had to go that far. So don’t assume the only way to get results is to spend a lot of money taking someone to court. It’s not.
Here’s the basic progression I follow and one I recommend:
1. The “Nice” Approach
This is when you would contact the site owner directly with a polite, yet firm, take down request, giving them an opportunity to remove the content without you taking any further action against them. (I find that my level of “politeness” is inversely proportional to the amount of my content they’ve stolen.)
I officially give them 48 hours. I’ve occasionally given them a bit longer if they contact me saying it will be taken care of soon, or if it slips my mind. It happens.
This is as far as it usually has to go, and it weeds out the truly ignorant who believe there is nothing wrong or illegal with what they’re doing.
2. Hit ‘Em Where it Hurts
If they ignore my take down request, the gloves come off. Hey, they were warned.
Unlike many people, I don’t go right to the content thief’s hosting company to have the material removed. Why? They’re usually stealing from others as well, and my DMCA notice to the host will only affect my stolen articles most of the time. Plus, it feels good to strike back once in a while.
Instead, I start by hitting them where it really hurts—their traffic and revenue sources.
Getting the Stolen Content De-indexed from Search Engines
First, I contact the major search engines with DMCA requests to have the infringing material removed from their search results. This way, if the scraper happens to be getting any search traffic, that can be shut down.
Stripping Their Ad Revenue
If the site has any advertisements, the fun part comes next. Make a note of every advertiser they work with. This will usually be an ad network, but sometimes there are individual advertisers.
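One rough way to build that list is to save the page's HTML and pull out every outside host it loads scripts, frames, or images from. The Python sketch below does only that. The class and function names are hypothetical, and the result will also include analytics, CDNs, and hotlinked images, so you still have to read through it and pick out the ad networks yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ThirdPartyHostCollector(HTMLParser):
    """Collect hosts of scripts, iframes, and images loaded from other sites."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host.lower()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "iframe", "img"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Relative URLs have no hostname and belong to the site itself.
        host = (urlparse(src).hostname or "").lower()
        is_own = host == self.site_host or host.endswith("." + self.site_host)
        if host and not is_own:
            self.hosts.add(host)


def third_party_hosts(html, site_host):
    """Return a sorted list of outside hosts referenced by the page."""
    collector = ThirdPartyHostCollector(site_host)
    collector.feed(html)
    collector.close()
    return sorted(collector.hosts)
```

Pass it the saved page source and the scraper's own domain, then look up any host you don't recognize.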
If the site owner is using an ad network to serve ads on the infringing content, report them to the network. They’re almost guaranteed to be in violation of the ad network’s terms. After all, the advertisers paying that ad network don’t want their ads running alongside illegally-published material. So the network is in a position to take action.
The real perk of this is that scrapers frequently have more than one website. They also can usually associate multiple sites with one ad network account. If they get their ad network account banned, it not only prevents them from running ads on the site stealing your content, but it has the potential to strip their ad revenue from all sites they own.
If they have private advertisers, you could also reach out to them. Chances are they’ll discontinue their ad contracts if they find out their company is associating with a site owner who openly breaks the law. Some won’t. And it won’t always be worth your time if there are many private advertisers.
3. Go Above Their Head
If, after all of this, you still can’t get your content removed (or if you don’t care about going through the second step at all, which is fine), go to the host.
If the host is in the U.S., they will very likely act quickly. With my own most recent scraper, I actually received an email while writing this guest post letting me know that the infringing content was removed. In fact, the entire site scraping my content was removed.
This site was also hotlinking my images (instead of hosting copies themselves, they were loading the files directly from my server, which steals your bandwidth). I had a bit of fun and redirected those image files to an “I steal content” image whenever they were loaded from his site.
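If you're curious how that kind of redirect works, it comes down to checking the Referer header on each image request. The Python sketch below shows only the decision logic; it is not how I set mine up, and in practice this is often done in the web server's configuration instead. The function name and the replacement image path are hypothetical.

```python
from urllib.parse import urlparse


def image_to_serve(requested_path, referer, blocked_hosts,
                   replacement_path="/images/i-steal-content.png"):
    """Pick which image file to send back for a request.

    referer: value of the request's Referer header, or None if absent.
    blocked_hosts: domains of sites known to hotlink your images.
    replacement_path: hypothetical location of the swap-in image.
    """
    host = (urlparse(referer or "").hostname or "").lower()
    for blocked in blocked_hosts:
        # Match the scraper's domain and any of its subdomains.
        if host == blocked or host.endswith("." + blocked):
            return replacement_path
    # Everyone else, including visitors who send no Referer, gets the real image.
    return requested_path
```

Blocking only named hosts keeps your images working for feed readers, search engines, and anyone whose browser strips the Referer header.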
Some hosts will have a DMCA request form for you to fill out. But usually you can send them a DMCA request via email. Here are a cease and desist letter template and DMCA notice templates you can use.
Even if the content thief isn’t located in the U.S. or another country that honors those copyrights, the site might be hosted there. And keep in mind that even if a host isn’t located somewhere a DMCA request is valid, they likely have their own terms of service with their customers. And it’s highly likely that publishing stolen content violates those terms. Sometimes you can get the host to take action on the TOS violations even if it doesn’t technically go through a DMCA notice.
On rare occasions you might come across your content being published on an “untouchable” site where both the site owner and the host are out of reach and uncooperative. At that point you can always opt to pursue it legally if the issue is causing enough harm. But for most bloggers, that probably won’t be worthwhile.
In that case, your best bet is to stick to getting them de-indexed from search engines and targeting their revenue streams. Sometimes they’ll remove your content just to get back in the good graces of Google and their advertisers, or to get you out of their hair.
How do you usually deal with scraping and other content theft?
About the Author: Jennifer Mattern is a professional blogger, freelance business writer, and owner of All Indie Writers where she writes about freelance writing, indie publishing, and blogging. Find her on Twitter (@AllIndieWriters) or on Google Plus (+JenniferMattern).
Image Credit: Canva