Approximate DBSCAN
Overview
A method for density-based clustering in low to median dimensional space.
Avoids the quadratic running time of traditional DBSCAN implementations.
Especially suitable for iterative parameter tuning to maximize the quality of clustering.
Code
Version 2.0 (6 Jan 2017)
Binary: 64-bit Ubuntu, 64-bit Windows
What's new:
Incorporates plenty of new optimization.
Improved Version 1.0 by over 20 times at small epsilon values.
Specialized faster algorithms in 2D space.
Data generators for creating clusters with varying densities.
Performance documented in the long version of our SIGMOD'15 paper.
Old Versions
See this page for the past releases, as well as the source code of our SIGMOD'15 implementation (which is now obsolete).
Remark 1: All the binary and source code is released "as it is", WITHOUT ANY WARRANTY. All the copyrights are reserved.
Remark 2: See this page for some real datasets for testing the efficiency of the algorithms.
Publications
Junhao Gan and Yufei Tao. DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation. Proceedings of ACM Conference on Management of Data (SIGMOD), pages 519-530, 2015. (Winner of the best-paper award)
Junhao Gan and Yufei Tao. On the Hardness and Approximation of Euclidean DBSCAN. ACM Transactions on Database Systems (TODS): Volume 42 Issue 3, August 2017.
Contact us
Please email your questions, comments, and suggestions to approxdbscan@gmail.com. Limited technical support may be provided. We also welcome donation of datasets suitable for density-based clustering.