Lately I’ve noticed a rise in the number of Google search results that lead to nothing but a bunch of ads plus some automatically generated content copied from other web pages, rather than to pages with the original content I’m looking for. This is the latest step in an ongoing arms race between the search engines (and their users) and so-called search engine optimization companies that try to funnel searchers through their customers’ ad-laden sites instead of letting them go directly to the site they want. The SEOs are essentially using Google’s own infrastructure against it: they create Google-hosted blogs, generated using content from (I’m guessing) the results of Google searches, all sprinkled with links to pages containing nothing but Google-supplied ads.
Google’s trying to stop folks from gaming the system like this, but I expect there’s some kind of fundamental limit to what can be done to stop it. You could probably even describe it as a theorem:
For any automatically indexed search engine of sufficient size, it is possible to construct a document that ranks highly for a given query even though the constructed document adds no useful information beyond that which would have been returned without it.
A corollary would be:
The more complete a search engine is in terms of documents indexed, the lower the relevance of its search results will be, measured as the ratio of documents with original content to documents that simply copy information from other pages.
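To make the theorem a little more concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the `score` function is a crude count of query terms, nothing like how Google actually ranks pages, and the names `search`, `build_copycat`, and the sample documents are all hypothetical. It only shows that, under a scorer like this, a document stitched together from the existing top results can outrank them while containing no words of its own.

```python
from collections import Counter


def score(doc: str, query: str) -> int:
    """Toy relevance: total occurrences of the query's terms in the document."""
    counts = Counter(doc.lower().split())
    return sum(counts[term] for term in query.lower().split())


def search(index: dict[str, str], query: str) -> list[str]:
    """Return document names ordered by descending toy relevance."""
    return sorted(index, key=lambda name: score(index[name], query), reverse=True)


def build_copycat(index: dict[str, str], query: str, top_k: int = 2) -> str:
    """Concatenate the text of the current top results; adds no new words."""
    top = search(index, query)[:top_k]
    return " ".join(index[name] for name in top)


# A tiny hypothetical index of "original" pages.
index = {
    "original-a": "incompleteness theorem proof by godel explained",
    "original-b": "a short history of the incompleteness theorem",
    "unrelated": "recipes for sourdough bread",
}
query = "incompleteness theorem"

# The constructed document is built purely from what the search already returns.
index["copycat"] = build_copycat(index, query)
ranking = search(index, query)

originals = set(index["original-a"].split()) | set(index["original-b"].split())
new_words = set(index["copycat"].split()) - originals

print(ranking)
print("words not found in the originals:", new_words)
```

A real ranking function is far harder to fool than a term counter, of course, but the copycat only has to find whatever signal the ranker rewards and borrow it from pages that already have it.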
If this does, in fact, wind up being a fundamental theorem for search engines, I have a humble suggestion for what we should name it: Göögel’s Incompleteness Theorem.