sais is a very simple and small library that provides an implementation of the induced sorting [1] based suffix array construction algorithm. The algorithm runs in O(n) worst-case time, and MAX(2n, 4k) worst-case extra working space, where n and k are the length of the input string and the number of alphabets.
Source codes:
These packages are released under the MIT/X11 license.
- C/C++ (GNU Make) version
-
File: sais-lite-2.4.1.zip
Size: 17,097 bytes
SHA1: b3b7396da40022ed7dd7c19ffba81587020750c2
Last update: 2010/08/07 Improved performance
- C/C++ (CMake) version
-
File: sais-2.4.1.zip
Size: 46,304 bytes
SHA1: cccf8d8bd18a893c58527d16ce6613923e22b411
Last update: 2010/08/07 Improved performance
- Java version
-
File: sais-java.zip
Size: 19,603 bytes
SHA1: 79828ab134408a8b414aa517a1885cebf43418fe
Last update: 2010/09/17 Improved performance.
- C# version
-
File: SAIS-CSharp.zip
Size: 29,875 bytes
SHA1: e1d52fb74c8551cb2881fc2ff3d92ad19a86db22
Last update: 2010/09/17 First release.
Reference:
- Ge Nong, Sen Zhang and Wai Hong Chan, Two Efficient Algorithms for Linear Suffix Array Construction, 2008.
Benchmarks:
Environment
| CPU |
2.66 GHz Intel Core 2 Duo E6750 |
| RAM |
2 Gb main memory |
| OS |
Ubuntu 10.04 x86_64 |
| Compiler |
gcc version 4.4.3 |
All programs were compiled with gcc/g++ using '-O3 -fomit-frame-pointer -DNDEBUG' optimization options. The times are the average of five runs, in seconds, and were measured using the standard Unix 'time' command. (user + system). The spaces were measured using the 'memusage' command.
SACA programs
Testfiles
- Manzini's Large Corpus
- http://people.unipmn.it/manzini/lightweight/corpus/
- Maniscalco's Gauntlet Corpus
- http://www.michael-maniscalco.com/testset/gauntlet/
Results
Time in sec.
| Files |
Size |
DC |
DS |
KA |
LS |
sais |
| totals |
963869142 |
735.966 |
999.802 |
547.178 |
526.732 |
194.920 |
| chr22.dna |
34553758 |
20.112 |
7.062 |
19.378 |
9.474 |
9.076 |
| etext99 |
105277340 |
69.886 |
35.544 |
80.868 |
58.282 |
31.430 |
| gcc-3.0.tar |
86630400 |
53.986 |
42.578 |
43.196 |
36.760 |
15.194 |
| howto |
39422105 |
21.976 |
7.776 |
21.740 |
14.936 |
8.422 |
| jdk13c |
69728899 |
64.454 |
33.744 |
37.912 |
38.900 |
11.012 |
| linux-2.4.5.tar |
116254720 |
74.984 |
24.176 |
61.260 |
49.140 |
21.646 |
| rctail96 |
114711151 |
114.886 |
55.872 |
73.992 |
63.634 |
23.014 |
| rfc |
116421901 |
80.100 |
31.028 |
66.692 |
63.148 |
23.554 |
| sprot34.dat |
109617186 |
83.530 |
29.900 |
68.596 |
51.742 |
24.506 |
| w3c2 |
104201579 |
91.496 |
60.716 |
52.586 |
70.746 |
19.238 |
| abac |
200000 |
0.066 |
26.446 |
0.020 |
0.022 |
0.010 |
| abba |
10500600 |
10.324 |
29.356 |
2.650 |
12.532 |
1.024 |
| book1x20 |
15375420 |
12.820 |
116.468 |
8.444 |
18.772 |
2.756 |
| fib_s14930352 |
14930352 |
15.954 |
223.756 |
3.910 |
16.234 |
1.828 |
| fss10 |
12078908 |
13.370 |
98.098 |
3.608 |
13.136 |
1.446 |
| fss9 |
2851443 |
1.804 |
7.170 |
0.580 |
2.506 |
0.218 |
| houston |
3840000 |
1.852 |
152.070 |
0.490 |
1.104 |
0.138 |
| paper5x80 |
981924 |
0.406 |
0.596 |
0.206 |
0.386 |
0.054 |
| test1 |
2097152 |
1.978 |
8.034 |
0.338 |
2.246 |
0.118 |
| test2 |
2097152 |
0.952 |
8.076 |
0.342 |
2.258 |
0.122 |
| test3 |
2097152 |
1.030 |
1.336 |
0.370 |
0.774 |
0.114 |
Space in MiB
| Files |
Size |
DC |
DS |
KA |
LS |
sais |
| totals |
963869142 |
5400.42 |
4610.47 |
8191.68 |
7353.76 |
4596.3 |
| mean |
- |
5.88 |
5.02 |
8.91 |
8.00 |
5.00 |
| chr22.dna |
34553758 |
193.60 |
165.18 |
289.97 |
263.62 |
164.77 |
| etext99 |
105277340 |
589.85 |
503.23 |
907.34 |
803.20 |
502.01 |
| gcc-3.0.tar |
86630400 |
485.38 |
415.87 |
709.50 |
660.94 |
413.09 |
| howto |
39422105 |
220.88 |
188.45 |
331.54 |
300.77 |
187.98 |
| jdk13c |
69728899 |
390.68 |
333.40 |
609.71 |
531.99 |
332.50 |
| linux-2.4.5.tar |
116254720 |
651.36 |
555.76 |
977.81 |
886.95 |
554.35 |
| rctail96 |
114711151 |
642.71 |
548.32 |
1004.98 |
875.18 |
546.99 |
| rfc |
116421901 |
652.29 |
556.53 |
956.52 |
888.23 |
555.15 |
| sprot34.dat |
109617186 |
614.17 |
524.03 |
930.06 |
836.31 |
522.70 |
| w3c2 |
104201579 |
583.82 |
498.09 |
912.00 |
795.00 |
496.88 |
| abac |
200000 |
1.12 |
0.98 |
1.75 |
1.53 |
0.96 |
| abba |
10500600 |
58.83 |
50.21 |
86.20 |
80.11 |
50.07 |
| book1x20 |
15375420 |
86.15 |
73.52 |
132.42 |
117.31 |
73.32 |
| fib_s14930352 |
14930352 |
83.65 |
71.71 |
117.16 |
113.91 |
71.20 |
| fss10 |
12078908 |
67.68 |
58.05 |
107.05 |
92.16 |
57.60 |
| fss9 |
2851443 |
15.98 |
13.71 |
25.27 |
21.76 |
13.60 |
| houston |
3840000 |
21.52 |
18.46 |
28.79 |
29.30 |
18.31 |
| paper5x80 |
981924 |
5.50 |
4.72 |
8.59 |
7.49 |
4.69 |
| test1 |
2097152 |
11.75 |
10.10 |
18.34 |
16.00 |
10.00 |
| test2 |
2097152 |
11.75 |
10.10 |
18.34 |
16.00 |
10.00 |
| test3 |
2097152 |
11.75 |
10.05 |
18.34 |
16.00 |
10.13 |