September 25, 2008

Shouldn't this search interface look more like Google?

Remember the Spike Lee commercial for some company’s athletic shoes? Spike appears on camera trying to compete against a professional basketball player who flies around the court outmaneuvering Spike and sinking baskets with ease. At the end Spike is seen studying a sneaker and muttering to himself, “The shoes. It’s gotta be the shoes.” Remember this; we’ll come back to it.

How do libraries and other information managing entities achieve precision in their search systems? We have had two basic models. One is the alphabetical list. By following strict rules on formulating name, title, and subject access points, our card catalogs and online catalogs have made it possible to focus in on a particular search target by searching an entry term, and then browsing the ordered entries filed under that term. This makes finding the title “War and Peace” much simpler than if one were to do a keyword search on the words “war” and “peace” appearing in titles.

The other is Boolean searching, which computers have made possible and which achieves precision by searching for less frequent terms or combinations of terms found in our metadata records. An unqualified keyword search on “war” and “Tolstoy” is fairly precise; a search on “war” in a title field and “Tolstoy” in an author field is even more precise. Both browsing and Boolean searching are valuable tools for narrowing down the number of records needing to be reviewed to a reasonably relevant few based on data in the records.
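To make the difference between the two models concrete, here is a minimal sketch in Python. The five records are a made-up sample (one title is invented outright so that the keyword search has something to trip over), and the function names are my own, not those of any catalog system.

```python
from bisect import bisect_left

# Sample records for illustration only; the fourth entry is invented.
RECORDS = [
    {"title": "War and Peace", "author": "Tolstoy, Leo"},
    {"title": "Anna Karenina", "author": "Tolstoy, Leo"},
    {"title": "The Art of War", "author": "Sun Tzu"},
    {"title": "Tolstoy and the Literature of War", "author": "Example, Author"},
    {"title": "War and Remembrance", "author": "Wouk, Herman"},
]


def words(text):
    """Lowercased set of words in a heading, ignoring commas."""
    return set(text.lower().replace(",", " ").split())


def browse_titles(entry, count=2):
    """Alphabetical list: jump to the entry term, then read the ordered neighbors."""
    titles = sorted((r["title"] for r in RECORDS), key=str.lower)
    keys = [t.lower() for t in titles]
    start = bisect_left(keys, entry.lower())
    return titles[start:start + count]


def keyword_search(*terms):
    """Boolean AND over every word in the record, with no field qualification."""
    wanted = {t.lower() for t in terms}
    return [r["title"] for r in RECORDS
            if wanted <= words(r["title"]) | words(r["author"])]


def fielded_search(title_term, author_term):
    """Boolean AND with each term restricted to its own field."""
    return [r["title"] for r in RECORDS
            if title_term.lower() in words(r["title"])
            and author_term.lower() in words(r["author"])]


print(browse_titles("War and Peace"))
print(keyword_search("war", "tolstoy"))
print(fielded_search("war", "tolstoy"))
```

On this sample the browse lands directly on the title and shows what files next to it, the unqualified keyword search also picks up the book that merely mentions Tolstoy in its title, and the fielded search returns the one record we wanted. None of the three needs any ranking to do its job.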

Then there’s Google. Google’s single search box is widely admired as a model of simple, efficient design, and the results Google produces are remarkably successful in responding to users’ needs. However, this is not because of Google’s search model. It’s because of Google’s brilliant ranking algorithm. By ranking its thousands of results exceptionally well, Google is able to bring the likeliest candidates for meeting a searcher’s needs to the top of the result list. This ranking is based primarily on Google’s access to a vast amount of online behavior related precisely to the objects it indexes, web pages and their URLs. By gathering data about the behavior of searchers in choosing web pages and the frequency of links to particular pages from other pages, Google is able to assign rank values to its results that often bring the most sought-after sources to the top.

Most data sets do not and will never have the kind of behavioral data that drives the Google ranking algorithm. Without it, they cannot begin to achieve the success Google finds in bringing certain result set members to the top of result lists. Yet search interface designers continue to adopt the Google model of a single, unqualified search box, which has such widespread public support. We all want to be like Google, so we want to look like Google.

“The shoes. It’s gotta be the shoes.”

Faceted browsing is best understood as an alternative to Google’s linear ranking. The facet terms offered to the searcher highlight commonalities in the result set data (subjects, authors, dates, etc.), which the searcher can then use to reduce the results to a set that more precisely meets the searcher’s needs. This is a definite improvement over a single, poorly ranked list, but ultimately it also raises new questions. Is a Google-style popularity ranking best in all cases? Is it satisfactory to see only the top-ranked facet terms, or do searchers want to see them all? If hit counts are used for ranking, is there an implicit expectation that the facet terms are consistently formed, so that the hits for an author or subject are not split among two or more terms for the same entity, thereby lowering its rank? Are there other, more intuitive orders that would be expected for sorting facet terms, e.g., dates in chronological order, names in alphabetical order, classifications in compressed hierarchies? Why do we assume that popularity is the best predictor of what a searcher browsing facet terms is looking for? Just because it works for Google?
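The split-heading question is easy to see in a few lines of Python. This is only a sketch: the facet values, the variant spelling, and the small mapping of variants to preferred forms are all invented for illustration, standing in for whatever authority control a real system has.

```python
from collections import Counter

# Hypothetical author facet values from one result set. The variant
# heading "Tolstoy, L. N." is invented to show a split count.
AUTHOR_VALUES = [
    "Tolstoy, Leo", "Tolstoy, Leo", "Tolstoy, L. N.", "Tolstoy, L. N.",
    "Wouk, Herman", "Wouk, Herman", "Wouk, Herman",
]

# Hypothetical mapping from variant headings to the preferred form.
PREFERRED = {"Tolstoy, L. N.": "Tolstoy, Leo"}


def facet_by_hits(values, top=None):
    """Popularity order: highest hit count first, ties broken alphabetically."""
    ranked = sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]


def facet_in_order(values):
    """Natural order: alphabetical for names, chronological for years."""
    return sorted(Counter(values).items())


def normalize(values, preferred):
    """Collapse variant headings onto their preferred form before counting."""
    return [preferred.get(v, v) for v in values]


print(facet_by_hits(AUTHOR_VALUES, top=1))
print(facet_by_hits(normalize(AUTHOR_VALUES, PREFERRED), top=1))
print(facet_in_order([1869, 1877, 1869]))
```

With the headings left as they are, the author with four records in the set drops out of a top-one display because the count is split two and two. Once the variants are collapsed, the same author leads the list. The last line shows the alternative of simply listing every facet term in its natural order, with the counts attached rather than driving the sort.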

“The shoes. It’s gotta be the shoes.”

The Spike Lee who directed this commercial is a lot smarter than the character he plays. Next time you hear someone calling for a search interface to be like Google, be like Spike. Mutter to yourself, “The shoes. It’s gotta be the shoes,” and smile.

Posted by s-hear at 5:36 PM