Search Result Clustering

MSRA SRC is a web search tool built on the Search Result Clustering (SRC) technique, developed by the Web Search and Mining Group at MSR Asia. It clusters a search engine’s results into groups on the fly and gives each group a meaningful, readable name. SRC replaces the traditional linear list of search results with a non-linear representation that makes the results easier to browse.

Traditional clustering techniques don’t work well for this problem because the documents are short, the cluster names must be readable, and the algorithm must be efficient enough to run on the fly. SRC approaches the problem differently to get around these difficulties: it first identifies salient topics by finding distinct and independent keywords, and then classifies the search results into those topics. Check out the release notes.
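To make the two-step idea concrete (pick salient keywords first, then assign results to them), here is a rough sketch in Python. This is not the MSRA implementation, which I don't have access to; the scoring (document frequency times phrase length), the overlap threshold, and the names `candidate_phrases` and `cluster_snippets` are all my own assumptions, chosen only to illustrate the shape of the approach on a handful of made-up snippets.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from SRC).
STOPWORDS = {"a", "an", "the", "of", "and", "in", "for", "to", "is", "on", "with"}


def candidate_phrases(text, max_len=2):
    """Return the set of 1..max_len word phrases found in a snippet.

    Phrases that start or end with a stopword are skipped.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.add(" ".join(gram))
    return phrases


def cluster_snippets(query, snippets, num_topics=3, min_docs=2):
    """Toy version of 'find salient keywords, then classify results'."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    doc_phrases = [candidate_phrases(s) for s in snippets]

    # Step 1a: count in how many snippets each phrase appears.
    doc_freq = Counter(p for phrases in doc_phrases for p in phrases)

    # Step 1b: score candidates; longer, more frequent phrases rank higher.
    # Phrases containing a query word are dropped since they don't discriminate.
    scored = []
    for phrase, freq in doc_freq.items():
        if freq < min_docs or query_words & set(phrase.split()):
            continue
        scored.append((freq * len(phrase.split()), phrase))
    scored.sort(key=lambda item: (-item[0], item[1]))

    # Step 1c: keep only "independent" topics, i.e. skip a phrase whose
    # snippets mostly coincide with those of an already chosen topic.
    topics, members = [], {}
    for _, phrase in scored:
        docs = {i for i, ph in enumerate(doc_phrases) if phrase in ph}
        overlaps = any(
            len(docs & members[t]) / len(docs | members[t]) > 0.5 for t in topics
        )
        if overlaps:
            continue
        topics.append(phrase)
        members[phrase] = docs
        if len(topics) == num_topics:
            break

    # Step 2: classify each snippet into every topic it mentions.
    clusters = {t: [] for t in topics}
    clusters["other"] = []
    for i in range(len(snippets)):
        hits = [t for t in topics if i in members[t]]
        for t in hits:
            clusters[t].append(i)
        if not hits:
            clusters["other"].append(i)
    return clusters


snippets = [
    "Jaguar cars: luxury car models and dealers",
    "The new Jaguar luxury car reviewed",
    "Jaguar big cat facts: habitat and diet",
    "The jaguar is a big cat native to the Americas",
    "Mac OS X Jaguar release notes",
]
clusters = cluster_snippets("jaguar", snippets)
for name, indexes in clusters.items():
    print(name, indexes)
```

On these toy snippets the car results and the cat results end up in separate groups named after a readable phrase, and the lone operating-system result falls into "other". A real system would need far better phrase scoring than this to get good names out of noisy snippets.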

The result map on the right shows what I get when I search on my name.

The SRC technology helps Web users browse through long lists of search results. Some typical uses of the technology are listed below:

  1. Query disambiguation – When a query is ambiguous, SRC can group the search results according to the different senses of the query word. Examples: jaguar, saturn, apple
  2. Sub-topic discovery – Many query terms contain sub-topics; SRC can display all the important sub-topics of a query term on the Web. Examples: data mining, iraq, digital camera
  3. Fact finding about people – When a query is a person’s name, SRC can find that person’s affiliation, position, interests and related people. Examples: bill clinton, rick rashid, harry shum
  4. Relationship finding between people – When a query consists of two people’s names, SRC can find out how they are related. Examples: “Kai-Fu Lee” “Ya-Qin Zhang”, “Rick Rashid” and “Dan Ling”
  5. Q&A – When a query is a question, SRC can find the possible answers and rank the most probable one first. Examples: the biggest ocean, “The World’s Hottest Computer Lab”, chinese premier

Published by

Amit Bahree

