An implementation of the induced sorting algorithm
sais is a very simple and small library that provides an implementation of the induced sorting [1] based suffix array construction algorithm. The algorithm runs in O(n) worst-case time, and MAX(2n, 4k) worst-case extra working space, where n and k are the length of the input string and the number of alphabets.
Source codes:
These packages are released under the MIT/X11 license.
- C/C++ (GNU Make) version
-
File: sais-lite-2.4.1.zip
Size: 17,097 bytes
SHA1: b3b7396da40022ed7dd7c19ffba81587020750c2
Last update: 2010/08/07 Improved performance - C/C++ (CMake) version
-
File: sais-2.4.1.zip
Size: 46,304 bytes
SHA1: cccf8d8bd18a893c58527d16ce6613923e22b411
Last update: 2010/08/07 Improved performance - Java version
-
File: sais-java.zip
Size: 19,603 bytes
SHA1: 79828ab134408a8b414aa517a1885cebf43418fe
Last update: 2010/09/17 Improved performance. - C# version
-
File: SAIS-CSharp.zip
Size: 29,875 bytes
SHA1: e1d52fb74c8551cb2881fc2ff3d92ad19a86db22
Last update: 2010/09/17 First release.
Reference:
- Ge Nong, Sen Zhang and Wai Hong Chan, Two Efficient Algorithms for Linear Suffix Array Construction, 2008.
Benchmarks:
Environment
CPU | 2.66 GHz Intel Core 2 Duo E6750 |
---|---|
RAM | 2 Gb main memory |
OS | Ubuntu 10.04 x86_64 |
Compiler | gcc version 4.4.3 |
All programs were compiled with gcc/g++ using '-O3 -fomit-frame-pointer -DNDEBUG' optimization options. The times are the average of five runs, in seconds, and were measured using the standard Unix 'time' command. (user + system). The spaces were measured using the 'memusage' command.
SACA programs
DC | Difference-Cover algorithm (v = 32) | http://www.cs.helsinki.fi/juha.karkkainen/publications/cpm03.tar.gz |
---|---|---|
DS | Deep-Shallow sorting algorithm | http://people.unipmn.it/manzini/lightweight/ |
KA | Ko-Aluru algorithm | http://ko.pang.cn.googlepages.com/software2 (local copy) |
LS | Larsson-Sadakane algorithm | http://www.larsson.dogma.net/research.html |
sais | My implementation of the IS algorithm | http://sites.google.com/site/yuta256/sais/ |
Testfiles
- Manzini's Large Corpus
- http://people.unipmn.it/manzini/lightweight/corpus/
- Maniscalco's Gauntlet Corpus
- http://www.michael-maniscalco.com/testset/gauntlet/
Results
Files | Size | DC | DS | KA | LS | sais |
---|---|---|---|---|---|---|
totals | 963869142 | 735.966 | 999.802 | 547.178 | 526.732 | 194.920 |
chr22.dna | 34553758 | 20.112 | 7.062 | 19.378 | 9.474 | 9.076 |
etext99 | 105277340 | 69.886 | 35.544 | 80.868 | 58.282 | 31.430 |
gcc-3.0.tar | 86630400 | 53.986 | 42.578 | 43.196 | 36.760 | 15.194 |
howto | 39422105 | 21.976 | 7.776 | 21.740 | 14.936 | 8.422 |
jdk13c | 69728899 | 64.454 | 33.744 | 37.912 | 38.900 | 11.012 |
linux-2.4.5.tar | 116254720 | 74.984 | 24.176 | 61.260 | 49.140 | 21.646 |
rctail96 | 114711151 | 114.886 | 55.872 | 73.992 | 63.634 | 23.014 |
rfc | 116421901 | 80.100 | 31.028 | 66.692 | 63.148 | 23.554 |
sprot34.dat | 109617186 | 83.530 | 29.900 | 68.596 | 51.742 | 24.506 |
w3c2 | 104201579 | 91.496 | 60.716 | 52.586 | 70.746 | 19.238 |
abac | 200000 | 0.066 | 26.446 | 0.020 | 0.022 | 0.010 |
abba | 10500600 | 10.324 | 29.356 | 2.650 | 12.532 | 1.024 |
book1x20 | 15375420 | 12.820 | 116.468 | 8.444 | 18.772 | 2.756 |
fib_s14930352 | 14930352 | 15.954 | 223.756 | 3.910 | 16.234 | 1.828 |
fss10 | 12078908 | 13.370 | 98.098 | 3.608 | 13.136 | 1.446 |
fss9 | 2851443 | 1.804 | 7.170 | 0.580 | 2.506 | 0.218 |
houston | 3840000 | 1.852 | 152.070 | 0.490 | 1.104 | 0.138 |
paper5x80 | 981924 | 0.406 | 0.596 | 0.206 | 0.386 | 0.054 |
test1 | 2097152 | 1.978 | 8.034 | 0.338 | 2.246 | 0.118 |
test2 | 2097152 | 0.952 | 8.076 | 0.342 | 2.258 | 0.122 |
test3 | 2097152 | 1.030 | 1.336 | 0.370 | 0.774 | 0.114 |
Files | Size | DC | DS | KA | LS | sais |
---|---|---|---|---|---|---|
totals | 963869142 | 5400.42 | 4610.47 | 8191.68 | 7353.76 | 4596.3 |
mean | - | 5.88 | 5.02 | 8.91 | 8.00 | 5.00 |
chr22.dna | 34553758 | 193.60 | 165.18 | 289.97 | 263.62 | 164.77 |
etext99 | 105277340 | 589.85 | 503.23 | 907.34 | 803.20 | 502.01 |
gcc-3.0.tar | 86630400 | 485.38 | 415.87 | 709.50 | 660.94 | 413.09 |
howto | 39422105 | 220.88 | 188.45 | 331.54 | 300.77 | 187.98 |
jdk13c | 69728899 | 390.68 | 333.40 | 609.71 | 531.99 | 332.50 |
linux-2.4.5.tar | 116254720 | 651.36 | 555.76 | 977.81 | 886.95 | 554.35 |
rctail96 | 114711151 | 642.71 | 548.32 | 1004.98 | 875.18 | 546.99 |
rfc | 116421901 | 652.29 | 556.53 | 956.52 | 888.23 | 555.15 |
sprot34.dat | 109617186 | 614.17 | 524.03 | 930.06 | 836.31 | 522.70 |
w3c2 | 104201579 | 583.82 | 498.09 | 912.00 | 795.00 | 496.88 |
abac | 200000 | 1.12 | 0.98 | 1.75 | 1.53 | 0.96 |
abba | 10500600 | 58.83 | 50.21 | 86.20 | 80.11 | 50.07 |
book1x20 | 15375420 | 86.15 | 73.52 | 132.42 | 117.31 | 73.32 |
fib_s14930352 | 14930352 | 83.65 | 71.71 | 117.16 | 113.91 | 71.20 |
fss10 | 12078908 | 67.68 | 58.05 | 107.05 | 92.16 | 57.60 |
fss9 | 2851443 | 15.98 | 13.71 | 25.27 | 21.76 | 13.60 |
houston | 3840000 | 21.52 | 18.46 | 28.79 | 29.30 | 18.31 |
paper5x80 | 981924 | 5.50 | 4.72 | 8.59 | 7.49 | 4.69 |
test1 | 2097152 | 11.75 | 10.10 | 18.34 | 16.00 | 10.00 |
test2 | 2097152 | 11.75 | 10.10 | 18.34 | 16.00 | 10.00 |
test3 | 2097152 | 11.75 | 10.05 | 18.34 | 16.00 | 10.13 |