Semantic is beautiful: clustering and diversifying search results with graph-based Word Sense Induction
Department of Computer Science, Sapienza University of Rome
Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, it groups them on the basis of their similarity and shows them to the user as a list of possibly labeled clusters. Each cluster is supposed to represent a different meaning of the input query, thus accounting for language ambiguity. However, Web clustering methods typically rely on some notion of textual similarity between search results. As a result, text snippets with no word in common tend to be clustered separately, even if they share the same meaning.
In this talk, we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). Key to our approach is to first acquire the senses (i.e., meanings) of a query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on datasets of ambiguous queries, show that our approach outperforms both Web clustering and search engines in the clustering and diversification of search results.
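To make the second step of the approach concrete, the following is a minimal illustrative sketch in Python, not the method presented in the talk. It assumes the senses of the query have already been induced and are available as sets of related words; here they are hand-written placeholders for the query "jaguar". It also uses plain word overlap as a simple stand-in for semantic similarity. The names `tokenize` and `cluster_by_sense` and the sample data are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase a snippet and return its set of words, stripped of punctuation."""
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


def cluster_by_sense(snippets, senses):
    """Assign each snippet to the induced sense it overlaps with most.

    `senses` maps a sense label to a set of words describing that sense.
    Snippets that share no word with any sense go to the cluster keyed by None.
    """
    clusters = defaultdict(list)
    for snippet in snippets:
        words = tokenize(snippet)
        best_sense, best_overlap = None, 0
        for label, sense_words in senses.items():
            overlap = len(words & sense_words)
            if overlap > best_overlap:
                best_sense, best_overlap = label, overlap
        clusters[best_sense].append(snippet)
    return dict(clusters)


# Assumption: hand-written stand-ins for senses that WSI would induce from raw text.
senses = {
    "jaguar/animal": {"cat", "wildlife", "habitat", "predator"},
    "jaguar/car": {"car", "engine", "dealer", "luxury"},
}

snippets = [
    "The jaguar is a large cat and apex predator.",
    "Find a Jaguar dealer and test drive a luxury car.",
    "Jaguar habitat loss threatens wildlife.",
]

clusters = cluster_by_sense(snippets, senses)
for label, members in clusters.items():
    print(label, members)
```

In this toy example the first and third snippets share no word other than the query itself, yet both land in the same cluster because each overlaps with the same sense. This is the behavior that clustering on snippet-to-snippet textual similarity alone tends to miss.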
Roberto Navigli is an associate professor in the Department of Computer Science at the Sapienza University of Rome. He was awarded the Marco Cadoli 2007 Italian National Prize for the best Ph.D. thesis in Artificial Intelligence and is the holder of a 2010 ERC Starting Grant in Computer Science and Informatics.
Currently, he is a member of the editorial board of Computational Linguistics and the Journal of Natural Language Engineering, and a track chair of WWW 2012. His research interests lie in the field of Natural Language Processing, including Word Sense Disambiguation and Induction, knowledge acquisition, ontology learning and semantically-enhanced Information Retrieval.