This page describes the work done at the Data Mining Research Laboratory at The Ohio State University on Locality Sensitive Hashing (LSH). LSH is a hashing scheme usually applied to high-dimensional data for (1) approximate similarity search and estimation and (2) approximate nearest neighbor search. LSH schemes exist for a variety of similarity and distance measures. Details of the original LSH work can be found here.
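To make the idea of similarity estimation concrete, here is a minimal sketch of MinHash, a standard LSH family for Jaccard similarity. It relies on the fact that, under a random min-wise hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity. This is an illustration only, not the code released with the papers: the function names (`minhash_signature`, `estimate_jaccard`, `jaccard`), the use of salted BLAKE2b as a stand-in for random hash functions, and the default of 256 hash functions are all assumptions made for this example.

```python
import hashlib


def minhash_signature(items, num_hashes=256):
    """Return a MinHash signature for a non-empty set of items.

    Each of the num_hashes positions holds the minimum 64-bit hash value
    of the set under a differently salted hash function.
    """
    signature = []
    for i in range(num_hashes):
        # A distinct salt stands in for a distinct random hash function.
        salt = i.to_bytes(8, "little")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        str(x).encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "little",
                )
                for x in items  # items are hashed via their str() form
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = set(range(0, 100))
    b = set(range(50, 150))
    sig_a = minhash_signature(a)
    sig_b = minhash_signature(b)
    print("exact    :", jaccard(a, b))
    print("estimated:", estimate_jaccard(sig_a, sig_b))
```

The estimate becomes more accurate as the number of hash functions grows, since each signature position acts as an independent trial. A full LSH search would additionally group signature positions into bands and hash each band into buckets so that only likely-similar pairs are compared. That step is omitted here.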
We have used the LSH technique to significantly speed up the All Pairs Similarity Search problem for the Jaccard, cosine, and kernel similarity measures. We have also carried out a precise theoretical analysis of the output quality (recall and estimation accuracy) of our proposed algorithms. The papers and source code follow.