What is Carrot2 and what it is not
Carrot2 is a library and a set of supporting applications you can use to build a search results clustering engine. Such an engine organizes your search results into topics fully automatically, without external knowledge such as taxonomies or preclassified content.
Carrot2 contains two document clustering algorithms designed specifically for search results clustering: Suffix Tree Clustering and Lingo. It also contains components for fetching search results from several search engines, such as Yahoo!, MSN Live, and Google, and it supports other document sources such as Lucene, Solr, or the Google Desktop index.
Carrot2 is not a search engine itself: it has no crawler or indexer. There are a number of open source projects you can use to crawl (Nutch) and to index and search (Lucene, Solr) your content, which can then be queried and clustered by Carrot2.
For more details, please visit the Carrot2 API Manual.
Trying Carrot2 clustering with a Solr index in the Carrot2 Document Clustering Workbench
Carrot2 Document Clustering Workbench is a standalone GUI application you can use to experiment with Carrot2 clustering on data from common search engines or your own data.
You can use Carrot2 Document Clustering Workbench to:
Installing and running the Carrot2 Document Clustering Workbench
I assume that you have followed my previous tutorial, How to install Nutch and Solr on Ubuntu 10.04, and that Solr is successfully returning responses in XML format.
To run Carrot2 Document Clustering Workbench:
Algorithm: Lingo (or another of your choice)
Query: USC (or any query of your choice)
Results: 100 (or more if you want)
Summary Field Name: content
Title Field Name: title
URL Field Name: url
Service url: http://127.0.0.1:8080/solr/select/
Then click Process to view the results in the internal browser. You can experiment with the Attribute view to tune the clustering. Enjoy :)
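If the Workbench shows no results, it can help to check what Solr returns for the same settings outside the GUI. The following is only a rough Python sketch, not part of Carrot2: it assumes the service URL and the title, url, and content field names listed above, and uses Solr's standard q, rows, fl, and wt request parameters. The function names (build_select_url, parse_docs, fetch_docs) are made up for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Assumed to match the "Service url" setting used in the Workbench.
SERVICE_URL = "http://127.0.0.1:8080/solr/select/"


def build_select_url(query, rows=100, fields=("title", "url", "content"),
                     service_url=SERVICE_URL):
    """Build a Solr select URL asking for an XML response with the given fields."""
    params = {"q": query, "rows": rows, "fl": ",".join(fields), "wt": "xml"}
    return service_url + "?" + urlencode(params)


def parse_docs(xml_text):
    """Turn each <doc> element of a Solr XML response into a plain dict."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        record = {}
        for field in doc:
            name = field.get("name")
            if field.tag == "arr":
                # Multi-valued fields arrive as <arr> with one child per value.
                record[name] = [child.text or "" for child in field]
            else:
                record[name] = field.text or ""
        docs.append(record)
    return docs


def fetch_docs(query, rows=100):
    """Query the local Solr instance (requires Solr to be running)."""
    with urlopen(build_select_url(query, rows)) as response:
        return parse_docs(response.read().decode("utf-8"))
```

With Solr running, calling fetch_docs("USC", rows=5) should return a list of dicts, one per document. If the title, url, or content keys are missing from those dicts, the corresponding field names in the Workbench will need adjusting to match your Solr schema.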