28 February 2008

Content Scrapers, Scum of the Interwebs

Don't you just love it when you do the work and someone else takes all the credit? Yeah, me too. Imagine my surprise when I saw that all my blog posts were posted elsewhere. The entire text was taken. There wasn't a credit or linkback attached. They were just plain...usurped! Yes, UbuntuLinuxHelp.com steals articles. Don't bother looking for my stuff now; I've already complained and had my content removed. You might want to check and make sure your blog isn't being ripped off too, though. What's even more fun is that they then license the articles under CC BY-NC-SA. There's a page listing their licensing. Yes, they republish and relicense other people's articles without permission. Isn't that just lovely?

There are quite a few blogs which are being ripped off in this way. I've attempted to contact a bunch of them. We'll see how that goes.

For the sake of clarification, I want to point out that I don't care if you take a paragraph or two and then do a "link to full article here...." thing. If you want to translate a whole post into a foreign language, that's fine (I saw a site today that did that) as long as there's a linkback with it. But please, always remember to cite your sources. If you don't, it's plagiarism.

EDIT: the person responsible for the plagiarism has been fired from the site.


matthew said...

Now I feel left out. None of my blog posts were stolen. :(


Anonymous said...

This comes up every so often on the blogs I read. But surely there's got to be a more open-source-y, technical solution to this if the people affected put their heads together. Everyone contacting these sites individually just doesn't scale.

My thoughts include:

* does it really affect you if people 'steal' (i.e. copyright infringe) your blog posts?
* does this make it harder or easier for people looking for ubuntu/linux info or help to find what they're looking for with Google?
* these sites are clearly attempting to make money. How can you hit them where it hurts? By contacting Google or other ad providers & search engines and having them barred from advert contracts or removed from search listings?

How about a site that you register with, which reads your blog content via your feed and then regularly searches for it on other blogs? These blogs could be ranked and lose points for lack of original content, no linkbacks, stripping copyright info, excessive advertising, or other signs of spamminess, which should separate the bad sites from personal planets and community aggregators.

The group running this site could then be assigned the power to contact the advertisers on behalf of all the registered users whose content is being re-used, with a standard list of demands. (I'd suggest linkbacks and not stripping copyright should be sufficient. I'm more worried about Google-pollution than plagiarism per se. But maybe individual users could decide what they expect for their feeds.)
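To make the matching-and-scoring idea a bit more concrete, here is a minimal sketch in Python. It assumes the post text has already been pulled from the feed and the suspect page has already been fetched as plain text. The function names, the 0.8 threshold, and the point values are all hypothetical choices for illustration; comparing overlapping five-word runs ("shingles") is one common way to spot copied text, not the only one.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word sequences (shingles) found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def copied_fraction(original, page, k=5):
    """Fraction of the original's shingles that also appear on the page."""
    orig = shingles(original, k)
    if not orig:
        return 0.0
    return len(orig & shingles(page, k)) / len(orig)


def penalty(original, original_url, page, threshold=0.8):
    """Hypothetical scoring: a higher number means more scraper-like."""
    points = 0
    if copied_fraction(original, page) >= threshold:
        points += 2  # assumed weight: page is mostly someone else's text
        if original_url not in page:
            points += 3  # assumed weight: wholesale copy with no linkback
    return points


post = ("Imagine my surprise when I saw that all my blog posts "
        "were posted elsewhere without credit")
url = "http://example.com/post"  # placeholder URL

scraped = "Sponsored links! " + post
credited = post + " Originally from " + url
excerpt = "Imagine my surprise when I saw ... read the full article at " + url

print(penalty(post, url, scraped), penalty(post, url, credited),
      penalty(post, url, excerpt))
```

A short quoted excerpt with a link scores zero here, a full copy with a linkback scores low, and a full copy with no linkback scores highest. Checks for stripped copyright notices or excessive advertising would need their own rules and are left out of this sketch.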

On a less idealistic note, maybe someone could write an app that spams the comment sections with "stop stealing my content, you parasite".

Anonymous said...

Look at their latest post. Seems like someone over there got fired, tossed out or something. Good for them! Were you the one that tipped them off?

Mackenzie said...

The IP subnet here is blocked from accessing that domain because they didn't like it when I left a comment asking "so whose blog did you steal this from?" on a post that turned out to be original.

Mackenzie said...

Ah correction, the owner emailed me to tell me that he found my comments on the blog in the moderation queue and got rid of the guy. He apparently unblocked my IP address before he left. Now he's looking through the server trying to see if he can find logs showing what was there and what's been removed etc.

Philip said...

Good news that the site owner eventually listened to you. Sad news that "someone" got fired over this issue. Surely (re-)educating the individual in the correct use of licensing would have been a better option?! :(


Przemek Kulczycki said...

Report them to Google!

Jeffrey said...

Congratulations on spotting and stopping the plagiarism. I disagree with Philip, though: that guy should have been fired. It doesn't sound like he thought he did anything wrong, so he had to be removed. (Just the impression I got from the post.)

James Henstridge said...

It seems a bit ironic that the post about the person being fired for plagiarising your content includes a trackback from a site that has scraped this article ...

Mackenzie said...

I filed a complaint with AdSense about the person running the site that sent the trackback.