Software
Sparse Kernel Clustering for Large Number of Clusters
MATLAB implementation of the sparse kernel k-means algorithm, using the RBF kernel.
Usage: Unzip the downloaded file, unzip flann-*-src.zip into the resulting directory, and run main_sparse_kernel_kmeans.m. Provide the following parameters:
Buffer size parameters: Maximum buffer size m, Initial buffer size l
Neighborhood size for constructing the sparse kernel: p
RBF kernel width: lambda (To use a different kernel, change the kernel definition in lines 36 and 71, and also change the FLANN input parameters in lines 34 and 69.)
Batch size: batchSize
Number of clusters: k
Eigenvector re-orthogonalization interval (determines how often the updated eigenvectors are orthogonalized): reorth_count
Parameter for lazy clustering (determines how often the points added to the buffer are clustered): reclustercount
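The role of the neighborhood size p and the kernel width lambda can be illustrated with a minimal sketch, written in Python for brevity rather than taken from the MATLAB package. It keeps a kernel entry only between a point and its p nearest neighbours. Several things here are assumptions: the kernel form exp(-||x - y||^2 / (2 * lambda^2)), the brute-force neighbour search standing in for FLANN, the symmetrisation rule, and the function names rbf and sparse_rbf_kernel, which are hypothetical.

```python
import math


def rbf(x, y, lam):
    """RBF kernel value. The exact form used by the MATLAB code is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lam ** 2))


def sparse_rbf_kernel(points, p, lam):
    """Return {i: {j: k(x_i, x_j)}} restricted to p-nearest-neighbour pairs.

    Brute-force search is used here in place of an approximate
    nearest-neighbour library such as FLANN.
    """
    n = len(points)
    kernel = {i: {i: 1.0} for i in range(n)}  # k(x, x) = 1 for the RBF kernel
    for i in range(n):
        # Rank all other points by squared distance to x_i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])),
        )
        for j in others[:p]:
            value = rbf(points[i], points[j], lam)
            # Keep the entry if either point lists the other as a neighbour,
            # so the sparse matrix stays symmetric.
            kernel[i][j] = value
            kernel[j][i] = value
    return kernel


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
K = sparse_rbf_kernel(points, p=1, lam=1.0)
```

With p=1, the two well-separated groups of points share no kernel entries, which is what keeps the matrix sparse. Larger p trades memory for a closer approximation of the full kernel.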
Approximate Stream Kernel Clustering
MATLAB implementation of the stream kernel k-means algorithm, using the RBF kernel.
Usage: Unzip the downloaded file and run main_stream_kernel_kmeans.m. Provide the following parameters:
Buffer size parameters: Maximum buffer size m, Initial buffer size l
RBF kernel width: lambda (Change the kernel definition in lines 29 and 59 if a different kernel needs to be used)
Batch size: batchSize
Concept drift parameters: recency_threshold (determines how fast the concept changes) and recency_factor (rate of decay of the clusters; recommended value: 0.1)
Number of clusters: k
Eigenvector re-orthogonalization interval (determines how often the updated eigenvectors are orthogonalized): reorth_count
Parameter for lazy clustering (determines how often the points added to the buffer are clustered): reclustercount
Kernel k-means based on Random Fourier Features
This is a MATLAB implementation of the fast kernel clustering algorithm proposed in the ICDM 2012 paper "Efficient Kernel Clustering Using Random Fourier Features".
Code is available for download here.
The .m file has a simple example demonstrating the working of the algorithm.
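For readers unfamiliar with random Fourier features, the underlying idea can be sketched as follows. This is a Python illustration, not the downloadable MATLAB code. It approximates the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) by an inner product of explicit random features, so that a linear method such as ordinary k-means could be run on the mapped points. The function names make_rff_map and dot, and the parameters gamma, num_features and seed, are hypothetical and chosen only for this example.

```python
import math
import random


def make_rff_map(dim, num_features, gamma, seed=0):
    """Build a random Fourier feature map for k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # Frequencies are sampled from N(0, 2 * gamma * I), the spectral
    # density of this RBF kernel; offsets are uniform on [0, 2 * pi].
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    norm = math.sqrt(2.0 / num_features)

    def feature_map(x):
        # z(x)_i = sqrt(2 / D) * cos(w_i . x + b_i)
        return [
            norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, y = (0.3, -0.2), (0.5, 0.1)
gamma = 1.0
phi = make_rff_map(dim=2, num_features=2000, gamma=gamma)

approx = dot(phi(x), phi(y))  # inner product in the random feature space
exact = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

The value of approx should be close to exact, with an error that shrinks roughly as one over the square root of num_features. How the paper's algorithm clusters the mapped points is described in the paper itself and is not reproduced here.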
Approximate Kernel k-means
This is a MATLAB implementation of the approximate kernel k-means algorithm for large scale clustering.
Details about this algorithm are available in the KDD 2011 paper "Approximate Kernel k-means: Solution to Large Scale Kernel Clustering".
Code is available for download here.
Usage: Unzip the downloaded file. The archive contains two files: approx_kkmeans.m, the core implementation of the approximate kernel k-means algorithm, and example.m, which shows how to invoke the algorithm on a simple 2-D dataset.
Incremental Topic and User Modeling
Java code for topic modeling on growing documents and for modeling users based on their posts to the Edmodo portal (binaries available on request).
SVM integrated with two-level k-means
This is an implementation of the two-level k-means algorithm for fast linear classification. It is based on the SVMlight implementation of SVM by Thorsten Joachims.
Code is available for download here.
Details about the algorithm can be found in the paper "Two-level k-means clustering algorithm for k–t relationship establishment and linear-time classification".