Suffix/LCP Array Construction


pSAscan

A suffix array construction algorithm that successfully incorporates parallelism into external memory computation, resulting in a record-breaking performance. Using the algorithm we computed the suffix array of a 1TiB file on a modest machine in a little over one week. The largest processed instance previously reported in the literature was 80GiB.

Code

Published in:


fSAIS

External memory suffix array construction based on the inducing principle. fSAIS is currently the fastest and most space-efficient general suffix array construction running in O(sort(n)) I/O complexity. The key improvements, compared to previous implementations are using our own implementation of radix-heap as the main priority queue, and our own disk-block management, which drastically reduces disk space usage.

Code

Published in:


LCPscan

The first external memory algorithm for constructing the LCP array given the text and its suffix array as input. The only previous way to compute the LCP array for data that is bigger than the RAM is to use a suffix array construction algorithm with complex modifications to produce the LCP array as a by-product. Compared to the best prior method, our algorithm needs much less disk space (by more than a factor of three) and is significantly faster.

Code

Published in:


EM-SPhi, EM-SI

Two external memory algorithms, constructing the LCP array given the text and its suffix array. The new algorithms are around two times faster than LCPscan. In addition, both algorithms are parallelized (the parallel versions improve the speed up to another factor of two), both are in-place (i.e., do not need any extra disk space, except to accommodate the output), and both work for inputs over large alphabets.

Code

Published in: