I've been following posts about copyright in the last few weeks.
Tim Hall and Brent Ozar both take issue with plagarists stealing their content.
This is an expansion of a comment I made on a post by Jake at AppsLab
Firstly, I'm personally not worried about theft of my content. But then I'm not in the league of Tim or Brent. When I get around to it, I'll rejig the blog layout so that each page has this Create Commons licence.
But the wider issue is a technical one. The internet is a lot about digitizing content, duplicating it and distributing it. The music and movie industries have battled this for a while, and not done too well. But, through the CodingHorror blog, I see that YouTube is applying technical solutions.
Basically when material is uploaded, they run it through some algorithm to try to determine if it is copyright. So I figured, how could this apply to text-based content. The key is Google, who happen to own YouTube. Yes, there is Bing and Blekko and Wolfram Alpha. There was Cuil, which died quietly a few months back. But Google is search.
It would be quite feasible for Google to implement some plagiarism block. As I indicate in the title, if internet content isn't indexed by Google, it comes pretty darn close to not existing in any practical terms.
How I'd envisage it working is this:
I write a blog piece and run it through a Google Signature Recorder. It picks out, or has my assistance in picking out, some key phrases. Then, as they index the internet world they check out for other sites with those key phrases.
If they find one, they report it to me. I can either mark it as 'OK, licenced, whatever' or I hit the veto button. If I do that, they pull the offending pages from their search index. Worst case, if they find 80% of a site is offending content, they could decide to remove the entire site from the index.
Removing the offending site from their index doesn't hurt the search facility. The original content is still there. They can even do 'magic rewrites' to count pointers and links to offending content as pointers to the original for PageRank purposes.
What might their motivation be for this ?
With Youtube, where they are hosting the service, it may be part legal necessity. But they host Blogger and Google Docs too. More than that though, this could be a service they could sell. If my content is valuable enough for me to want to protect it, I can afford to pay Google $50 a year to monitor it. The advantage of making this a paid service is that, if there is a challenge (maybe Fred says I've registered his content), there is a financial paper trail to follow.
Ultimately, as YouTube isn't obliged to host your videos, Google isn't obliged to index your content.