Software

    • Sparse Kernel Clustering for a Large Number of Clusters

      • MATLAB implementation of the sparse kernel k-means algorithm, using the RBF kernel.

      • Usage: Unzip the downloaded file, unzip flann-*-src.zip into the same directory, and run main_sparse_kernel_kmeans.m. Provide the following parameters:

      • Buffer size parameters: Maximum buffer size m, Initial buffer size l

      • Neighborhood size for constructing the sparse kernel: p

      • RBF kernel width: lambda (to use a different kernel, change the kernel definition in lines 36 and 71; the FLANN input parameters are set in lines 34 and 69 and should be changed as well)

      • Batch size: batchSize

      • Number of clusters: k

      • Eigenvector re-orthogonalization interval (determines how often the updated eigenvectors are orthogonalized): reorth_count

      • Parameter for lazy clustering (determines how often the points added to the buffer are clustered): reclustercount
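
The role of the neighborhood size p and the kernel width lambda can be illustrated with a minimal Python sketch. It is not the MATLAB package's code: the function names are hypothetical, the RBF form exp(-||x - y||^2 / (2 * lam^2)) is an assumption (the package may scale lambda differently), and a brute-force neighbor search stands in for the approximate search that FLANN provides.

```python
import math

def rbf(x, y, lam):
    # Assumed RBF form; the package's own kernel definition may differ.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lam ** 2))

def sparse_rbf_kernel(points, p, lam):
    """Return {i: {j: K_ij}}, keeping kernel values only for near neighbors."""
    n = len(points)
    rows = {}
    for i in range(n):
        # Brute-force search (stand-in for FLANN's approximate search).
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        row = {i: 1.0}  # the RBF kernel of a point with itself is 1
        for _, j in dists[:p]:
            row[j] = rbf(points[i], points[j], lam)
        rows[i] = row
    # Symmetrize: if j is a neighbor of i, also store the entry in row j.
    for i in range(n):
        for j, value in list(rows[i].items()):
            rows[j].setdefault(i, value)
    return rows

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
K = sparse_rbf_kernel(points, p=1, lam=1.0)
```

With p = 1, each point keeps a kernel entry only for itself and its single nearest neighbor, so the two well-separated pairs above share no entries. A larger p gives a denser kernel and a closer approximation to the full kernel matrix, at a higher memory cost.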

    • Approximate Stream Kernel Clustering

      • MATLAB implementation of the stream kernel k-means algorithm, using the RBF kernel.

      • Usage: Unzip the downloaded file, and run the file: main_stream_kernel_kmeans.m. Provide the following parameters:

      • Buffer size parameters: Maximum buffer size m, Initial buffer size l

      • RBF kernel width: lambda (to use a different kernel, change the kernel definition in lines 29 and 59)

      • Batch size: batchSize

      • Concept drift parameters: recency_threshold (determines how fast the concept changes) and recency_factor (rate of decay of the clusters; best value: 0.1)

      • Number of clusters: k

      • Eigenvector re-orthogonalization interval (determines how often the updated eigenvectors are orthogonalized): reorth_count

      • Parameter for lazy clustering (determines how often the points added to the buffer are clustered): reclustercount
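
Swapping the kernel amounts to replacing one function, which the following Python sketch illustrates. It is not taken from the MATLAB package: all names are hypothetical, the RBF form is an assumption (the package may scale lambda differently), and the polynomial kernel is only one example of an alternative.

```python
import math

def rbf_kernel(x, y, lam=1.0):
    # Assumed form exp(-||x - y||^2 / (2 * lam^2)).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lam ** 2))

def polynomial_kernel(x, y, degree=2, offset=1.0):
    # Alternative kernel: (x . y + offset)^degree.
    return (sum(a * b for a, b in zip(x, y)) + offset) ** degree

def kernel_block(batch, buffer, kernel):
    """Kernel values between each point of a new batch and each buffered point."""
    return [[kernel(x, y) for y in buffer] for x in batch]

buffer = [(1.0, 0.0), (0.0, 1.0)]   # points already held in the buffer
batch = [(1.0, 0.0)]                # newly arrived points

rbf_values = kernel_block(batch, buffer, rbf_kernel)
poly_values = kernel_block(batch, buffer, polynomial_kernel)
```

Because the kernel is passed in as a function, the code that compares a new batch against the buffer stays the same when the kernel changes.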

    • Kernel k-means based on Random Fourier Features

    • Approximate Kernel k-means

      • MATLAB implementation of the approximate kernel k-means algorithm for large-scale clustering.

      • Details about this algorithm are available in the KDD 2011 paper "Approximate Kernel k-means: solution to Large Scale Kernel Clustering".

      • Code is available for download here.

      • Usage: Unzip the downloaded file. The archive contains two files: approx_kkmeans.m, the core implementation of the approximate kernel k-means algorithm, and example.m, which shows how to invoke the algorithm on a simple 2-D dataset.

    • Incremental Topic and User Modeling

      • Java code for topic modeling in growing documents and for modeling users based on their posts to the Edmodo portal (binaries available on request).

    • SVM integrated with two-level k-means