Suffix/LCP Array Construction
pSAscan
A suffix array construction algorithm that successfully incorporates parallelism into external memory computation, resulting in a record-breaking performance. Using the algorithm we computed the suffix array of a 1TiB file on a modest machine in a little over one week. The largest processed instance previously reported in the literature was 80GiB.
Published in:
Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi: Parallel External Memory Suffix Sorting, CPM 2015 DOI | Slides
fSAIS
External memory suffix array construction based on the inducing principle. fSAIS is currently the fastest and most space-efficient general suffix array construction running in O(sort(n)) I/O complexity. The key improvements, compared to previous implementations are using our own implementation of radix-heap as the main priority queue, and our own disk-block management, which drastically reduces disk space usage.
Published in:
Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi, Bella Zhukova: Engineering External Memory Induced Suffix Sorting, ALENEX 2017 DOI
LCPscan
The first external memory algorithm for constructing the LCP array given the text and its suffix array as input. The only previous way to compute the LCP array for data that is bigger than the RAM is to use a suffix array construction algorithm with complex modifications to produce the LCP array as a by-product. Compared to the best prior method, our algorithm needs much less disk space (by more than a factor of three) and is significantly faster.
Published in:
Juha Kärkkäinen, Dominik Kempa: LCP Array Construction in External Memory, ACM J. Exp. Algor. 2016 DOI
Juha Kärkkäinen, Dominik Kempa: LCP Array Construction in External Memory, SEA 2014 DOI
EM-SPhi, EM-SI
Two external memory algorithms, constructing the LCP array given the text and its suffix array. The new algorithms are around two times faster than LCPscan. In addition, both algorithms are parallelized (the parallel versions improve the speed up to another factor of two), both are in-place (i.e., do not need any extra disk space, except to accommodate the output), and both work for inputs over large alphabets.
Published in:
Juha Kärkkäinen, Dominik Kempa: Better External Memory LCP Array Construction, ACM J. Exp. Algor. 2019 DOI
Juha Kärkkäinen, Dominik Kempa: Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet, SEA 2017 DOI
Juha Kärkkäinen, Dominik Kempa: Faster External Memory LCP Array Construction, ESA 2016 DOI | Slides