Research


Area of Interest: Web Mining and Information Retrieval

Present Work:

1) Web Document Clustering using Particle Swarm Optimization

2) Distributed Web document clustering model

Web clustering is one of the most important techniques for information retrieval and taxonomy management on the Web. In this paper, we adopt a network communication model in which each node first forms an independent local document clustering and then communicates with other nodes to enhance it. Communication overhead is minimized by exchanging cluster key phrases, which are extracted from the clusters using graph and spanning-tree representations of the documents. Both the initial clustering and the merging of local data with other nodes' data use a method called similarity histogram-based clustering. This approach achieves significant improvement in local clustering solutions without the cost of centralized clustering, while maintaining the initial local clustering structure. We report web-based experiments that show the clustering results on Yahoo News data.
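As a rough illustration of the similarity-histogram idea, the sketch below scores a cluster by the fraction of its pairwise document similarities that reach a threshold, and admits a new document only if that fraction does not drop too far. It is a minimal sketch, not the implementation used in this work: the function names, the cosine measure over term-weight dictionaries, and the `threshold` and `epsilon` values are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def histogram_ratio(docs, threshold=0.5):
    """Fraction of pairwise similarities at or above the threshold.

    The threshold value is an assumption chosen for illustration.
    """
    sims = [cosine(a, b) for a, b in combinations(docs, 2)]
    if not sims:
        return 1.0  # an empty or singleton cluster has no pairs to violate coherence
    return sum(s >= threshold for s in sims) / len(sims)


def accepts(cluster, doc, threshold=0.5, epsilon=0.05):
    """Admit doc only if the ratio drops by at most epsilon (hypothetical rule)."""
    before = histogram_ratio(cluster, threshold)
    after = histogram_ratio(cluster + [doc], threshold)
    return after >= before - epsilon


cluster = [{"web": 1, "mining": 1}, {"web": 1, "mining": 1, "data": 1}]
on_topic = {"web": 1, "mining": 1}
off_topic = {"football": 1}
print(accepts(cluster, on_topic), accepts(cluster, off_topic))
```

In a distributed setting, the same acceptance test could in principle be applied when a node considers merging documents or summaries received from a peer, which is one way the local structure could be preserved.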

 

Past Work:

1) Enterprise Expert Search:
Objective:
To build a web user interface for a given enterprise (CSIRO) for finding experts in a field, where the field is given by the user's query.
Challenges:
a) Topics and expertise judgements come from real users, so the expected results cannot be known in advance.
b) Experts must be found in the corpus based on the topic definition only.
c) Different algorithms perform differently on different sets of topics, so it is uncertain whether any single algorithm is good enough.

2) Enterprise Document Search:
Objective:
To build a web user interface for a given enterprise (CSIRO) for finding documents that match a query given by the user.
Challenges:
a) More than 370,715 documents packed into 266 bundle files.
b) Classifying or clustering the corpus to produce relevant and diverse result sets for resource finding.

3) Enterprise Search engine with clustered set of results:

This search engine returns several sets of results for a query. Each set contains the documents related to one topic of the query and is named after that topic. The user no longer has to scan a long list of documents and can easily find those related to the topic of interest.

4) Similarity-Based Fuzzy Clustering of web pages:

In this experiment, we examine the PFCM clustering method using a new approach based on textual information. We found that PFCM with the similarity metric is particularly effective, as demonstrated on three datasets of web query results. The result is also supported by hypothesis testing.

5) Clustering of web pages using Textual and Hyperlink similarity:

In this experiment, we examine the PFCM clustering method using a new approach that combines textual information, hyperlink structure and co-citation relations into a single similarity metric. We found that PFCM with the new similarity metric is particularly effective, as demonstrated on three datasets of web query results.
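One simple way to fold the three signals into a single score is a weighted sum, sketched below. This is only an illustration under stated assumptions, not the metric used in the experiment: the weighted linear form, the use of Jaccard overlap of out-links for hyperlink structure and of in-links for co-citation, and the weights themselves are all hypothetical choices.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of page identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(text_sim, out_a, out_b, in_a, in_b,
                        weights=(0.5, 0.25, 0.25)):
    """Blend text, hyperlink and co-citation similarity into one score.

    text_sim     : precomputed textual similarity in [0, 1]
    out_a, out_b : pages each document links to (hyperlink structure)
    in_a, in_b   : pages that link to each document (co-citation)
    weights      : assumed weights for the three parts, summing to 1
    """
    w_text, w_link, w_cocite = weights
    link_sim = jaccard(out_a, out_b)
    cocite_sim = jaccard(in_a, in_b)
    return w_text * text_sim + w_link * link_sim + w_cocite * cocite_sim


score = combined_similarity(
    0.8,
    out_a={"x", "y"}, out_b={"y", "z"},
    in_a={"p"}, in_b={"p"},
)
print(round(score, 4))
```

Because each component lies in [0, 1] and the weights sum to 1, the combined score also stays in [0, 1], so it can be fed to any clustering method that expects a normalized pairwise similarity.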