sais

An implementation of the induced sorting algorithm

sais is a very simple and small library that provides an implementation of the induced sorting [1] based suffix array construction algorithm. The algorithm runs in O(n) worst-case time, and MAX(2n, 4k) worst-case extra working space, where n and k are the length of the input string and the number of alphabets.

Source codes:

These packages are released under the MIT/X11 license.

C/C++ (GNU Make) version
File: sais-lite-2.4.1.zip
Size: 17,097 bytes
SHA1: b3b7396da40022ed7dd7c19ffba81587020750c2
Last update: 2010/08/07 Improved performance
C/C++ (CMake) version
File: sais-2.4.1.zip
Size: 46,304 bytes
SHA1: cccf8d8bd18a893c58527d16ce6613923e22b411
Last update: 2010/08/07 Improved performance
Java version
File: sais-java.zip
Size: 19,603 bytes
SHA1: 79828ab134408a8b414aa517a1885cebf43418fe
Last update: 2010/09/17 Improved performance.
C# version
File: SAIS-CSharp.zip
Size: 29,875 bytes
SHA1: e1d52fb74c8551cb2881fc2ff3d92ad19a86db22
Last update: 2010/09/17 First release.

Reference:

  1. Ge Nong, Sen Zhang and Wai Hong Chan, Two Efficient Algorithms for Linear Suffix Array Construction, 2008.

Benchmarks:

Environment

CPU 2.66 GHz Intel Core 2 Duo E6750
RAM 2 Gb main memory
OS Ubuntu 10.04 x86_64
Compiler gcc version 4.4.3

All programs were compiled with gcc/g++ using '-O3 -fomit-frame-pointer -DNDEBUG' optimization options. The times are the average of five runs, in seconds, and were measured using the standard Unix 'time' command. (user + system). The spaces were measured using the 'memusage' command.

SACA programs

DC Difference-Cover algorithm (v = 32) http://www.cs.helsinki.fi/juha.karkkainen/publications/cpm03.tar.gz
DS Deep-Shallow sorting algorithm http://people.unipmn.it/manzini/lightweight/
KA Ko-Aluru algorithm http://ko.pang.cn.googlepages.com/software2 (local copy)
LS Larsson-Sadakane algorithm http://www.larsson.dogma.net/research.html
sais My implementation of the IS algorithm http://sites.google.com/site/yuta256/sais/

Testfiles

Manzini's Large Corpus
http://people.unipmn.it/manzini/lightweight/corpus/
Maniscalco's Gauntlet Corpus
http://www.michael-maniscalco.com/testset/gauntlet/

Results

Time in sec.
Files Size DC DS KA LS sais
totals 963869142 735.966 999.802 547.178 526.732 194.920
chr22.dna 34553758 20.112 7.062 19.378 9.474 9.076
etext99 105277340 69.886 35.544 80.868 58.282 31.430
gcc-3.0.tar 86630400 53.986 42.578 43.196 36.760 15.194
howto 39422105 21.976 7.776 21.740 14.936 8.422
jdk13c 69728899 64.454 33.744 37.912 38.900 11.012
linux-2.4.5.tar 116254720 74.984 24.176 61.260 49.140 21.646
rctail96 114711151 114.886 55.872 73.992 63.634 23.014
rfc 116421901 80.100 31.028 66.692 63.148 23.554
sprot34.dat 109617186 83.530 29.900 68.596 51.742 24.506
w3c2 104201579 91.496 60.716 52.586 70.746 19.238
abac 200000 0.066 26.446 0.020 0.022 0.010
abba 10500600 10.324 29.356 2.650 12.532 1.024
book1x20 15375420 12.820 116.468 8.444 18.772 2.756
fib_s14930352 14930352 15.954 223.756 3.910 16.234 1.828
fss10 12078908 13.370 98.098 3.608 13.136 1.446
fss9 2851443 1.804 7.170 0.580 2.506 0.218
houston 3840000 1.852 152.070 0.490 1.104 0.138
paper5x80 981924 0.406 0.596 0.206 0.386 0.054
test1 2097152 1.978 8.034 0.338 2.246 0.118
test2 2097152 0.952 8.076 0.342 2.258 0.122
test3 2097152 1.030 1.336 0.370 0.774 0.114

Space in MiB
Files Size DC DS KA LS sais
totals 963869142 5400.42 4610.47 8191.68 7353.76 4596.3
mean - 5.88 5.02 8.91 8.00 5.00
chr22.dna 34553758 193.60 165.18 289.97 263.62 164.77
etext99 105277340 589.85 503.23 907.34 803.20 502.01
gcc-3.0.tar 86630400 485.38 415.87 709.50 660.94 413.09
howto 39422105 220.88 188.45 331.54 300.77 187.98
jdk13c 69728899 390.68 333.40 609.71 531.99 332.50
linux-2.4.5.tar 116254720 651.36 555.76 977.81 886.95 554.35
rctail96 114711151 642.71 548.32 1004.98 875.18 546.99
rfc 116421901 652.29 556.53 956.52 888.23 555.15
sprot34.dat 109617186 614.17 524.03 930.06 836.31 522.70
w3c2 104201579 583.82 498.09 912.00 795.00 496.88
abac 200000 1.12 0.98 1.75 1.53 0.96
abba 10500600 58.83 50.21 86.20 80.11 50.07
book1x20 15375420 86.15 73.52 132.42 117.31 73.32
fib_s14930352 14930352 83.65 71.71 117.16 113.91 71.20
fss10 12078908 67.68 58.05 107.05 92.16 57.60
fss9 2851443 15.98 13.71 25.27 21.76 13.60
houston 3840000 21.52 18.46 28.79 29.30 18.31
paper5x80 981924 5.50 4.72 8.59 7.49 4.69
test1 2097152 11.75 10.10 18.34 16.00 10.00
test2 2097152 11.75 10.10 18.34 16.00 10.00
test3 2097152 11.75 10.05 18.34 16.00 10.13