Easy Web Search Results Clustering
When Baselines Can Reach State-of-the-Art Algorithms
Abstract
This work discusses the evaluation of baseline algorithms for Web search results clustering. An analysis is performed over frequently used baseline algorithms and standard datasets. Our work shows that competitive results can be obtained by either fine-tuning well-known algorithms or applying cascade clustering over them. In particular, the latter strategy can lead to a scalable, real-world solution that achieves results comparable to those of recent text-based state-of-the-art algorithms.
Datasets
Code
EasyWebSRC
Evaluation Tool
SRCEvaluator (configured for F_1 and F_b^3 metrics)
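For readers unfamiliar with the F_b^3 metric, the sketch below shows how a B-cubed F-measure is commonly computed for a flat (non-overlapping) clustering. It is an illustration only and not the SRCEvaluator code: the function name `bcubed_f`, the dictionary-based input format, and the `beta` parameter are assumptions made for this example, and the evaluator may differ in details such as the handling of overlapping clusters.

```python
from collections import Counter


def bcubed_f(clusters, labels, beta=1.0):
    """Return (precision, recall, F) under the standard B-cubed definition.

    clusters: dict mapping each item to its predicted cluster id.
    labels:   dict mapping each item to its gold-standard label.
    Assumes every item has exactly one cluster and one label.
    """
    items = list(clusters)
    if not items:
        return 0.0, 0.0, 0.0

    cluster_sizes = Counter(clusters[i] for i in items)
    label_sizes = Counter(labels[i] for i in items)
    # Number of items sharing both the cluster and the gold label.
    joint_sizes = Counter((clusters[i], labels[i]) for i in items)

    # Per-item precision: share of the item's cluster with the same label.
    precision = sum(
        joint_sizes[(clusters[i], labels[i])] / cluster_sizes[clusters[i]]
        for i in items
    ) / len(items)
    # Per-item recall: share of the item's gold class found in its cluster.
    recall = sum(
        joint_sizes[(clusters[i], labels[i])] / label_sizes[labels[i]]
        for i in items
    ) / len(items)

    denom = beta ** 2 * precision + recall
    f = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f


# Toy example: four search results, two predicted clusters, two gold topics.
predicted = {"r1": 0, "r2": 0, "r3": 0, "r4": 1}
gold = {"r1": "x", "r2": "x", "r3": "y", "r4": "y"}
p, r, f = bcubed_f(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

In the toy example, the third result is placed in a cluster dominated by another topic, which lowers precision, while splitting topic "y" across two clusters lowers recall.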
Paper
José G. Moreno and Gaël Dias. Easy Web Search Results Clustering: When Baselines Can Reach State-of-the-Art Algorithms. EACL 2014. [PDF]