Linguist, lexicographer, radio host, public speaker

Technorati adds features but can’t get core search results right

I spend a large part of my day using data searches. Not just Google, but Google Groups, Factiva, Proquest, LexisNexis, Icerocket, Feedster, Newspaperarchive.com, the Oxford English Dictionary, OneLook, JSTOR, A9, Google Print, and a slew of other search front ends. I also grep against a couple gigs of XML, use my own site search, and constantly access two private custom-built data searches, all of them of varying complexity.

I can give you a dozen problems with any one of them—things they do poorly, stupidly, or not at all. Common mistakes include not using true Boolean, not permitting any Boolean, not storing raw text in image-based PDF files so that terms can be searched for within the files, timing out too quickly when a user is idle, having too many stop words, not listing stop words, having very poor text created by optical character recognition, not allowing true phrase searching, not allowing searching by date, using frames, not allowing bookmarking of results, etc., etc., etc.

But what I want to piss on at the moment is Technorati. It searches blogs. Or at least, that’s what it’s supposed to do. They keep adding all this tag junk and favorites crap when they still haven’t mastered the art of simply returning decent results.

When I search for a term and result summaries are provided, the search term must appear in context in each of those summaries. The point of a summary is to help people judge whether or not the linked item is worth visiting. If the search term doesn’t appear in the summary, then what the hell’s the point?
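To be concrete about what "in context" means, here is a minimal sketch of a keyword-in-context summary. It is my own illustration, not anything Technorati actually runs: the function name `snippet`, the `width` parameter, and the fallback to the opening of the post are all assumptions. It handles a single literal term, case-insensitively, and nothing fancier.

```python
import re


def snippet(text, term, width=40):
    """Return an excerpt of `text` centered on the first match of `term`.

    Hypothetical helper for illustration only: `width` is the number of
    characters of context kept on each side of the match.
    """
    # Treat the term as a literal string, ignoring case.
    match = re.search(re.escape(term), text, re.IGNORECASE)
    if match is None:
        # Assumption: fall back to the opening of the text if the term is absent.
        return text[: 2 * width].strip()

    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    excerpt = text[start:end].strip()

    # Mark the excerpt as truncated wherever text was cut off.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + excerpt + suffix


if __name__ == "__main__":
    post = (
        "Filler words. " * 10
        + "The word skeuomorph turned up again today. "
        + "More filler. " * 10
    )
    print(snippet(post, "skeuomorph"))
```

A real search engine would presumably cut on word boundaries and highlight the match, but even this much guarantees that the term shows up in the summary whenever it shows up in the post.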

It’s the same with the RSS feeds that are created from searches. RSS aggregators are all about the summary. All about it. Why would any site deliver to me a customized RSS feed based upon a search for a single word and not include that word in the summary of every article returned in that feed?

The thing that irritates me the most? Technorati used to do this right. What could possibly be a good reason for changing that?

Grant Barrett