Entropy Coder Benchmark

TurboBench : Single core in memory entropy coder benchmark
(
all entropy coders with latest version - Last update: 2017-05)

All libraries compiled with 64 bits gcc 6.3, Ubuntu OS 17.04
Uniform distribution:enwik9      (MB=1.000.000    bold/bold o1 = Pareto frontier on ratio %)

Size Ratio % C.MB/s D.MB/s Compressor CPU: Skylake 3.4 GHz
 634103911    63.4     11.70   1457.74   TurboANX,9     optimized
 634474177    63.4     33.52     20.98   subotin         
 635239420    63.5    559.23   1433.49   TurboANX        
 637630650    63.8    650.41    914.98   TurboHF         
 637963432    63.8    205.64    223.18   zlibh 13k
 639048689    63.9    375.96    502.42   fse             
 639740506    64.0    273.05    676.69   rans_static16   
 641412595    64.1    617.48    836.89   fsehuf
 642305248    64.2    459.58   1346.12   rans_static32

Size Ratio % C.MB/s D.MB/s Compressor CPU: i7-2600k 4.4 GHz
440163308 44.0 52.78 39.98 TurboAC bit o1 sliding context 477069134 47.7 145.88 191.55 TurboANXX o1 541944518 54.2 131.62 273.52 rans_static16o1 620044520 62.0 22.18 16.77 vecrc_sh 620061616 62.0 59.00 49.40 TurboAC bit o0 620097748 62.0 21.92 24.87 fpaq0p_sh 620278358 62.0 19.50 14.99 fpaqc_mm 625933204 62.6 47.26 31.58 TurboAC byte 634103875 63.4 11.87 1184.60 TurboANX,9 JIT switch scalar/sse/avx2 634474173 63.4 38.78 21.67 Subotin 635429360 63.5 412.97 1219.70 TurboANX 637630650 63.8 654.12 955.22 TurboHF 637813190 63.8 259.52 275.07 zlibh 639033488 63.9 184.66 54.54 FastAC 639048689 63.9 345.25 490.27 fse 639740506 64.0 338.37 597.83 rans_static16 640295271 64.0 711.43 1035.40 TurboHF 30K 640700579 64.1 208.21 170.72 fsc 641305739 64.1 246.75 358.98 rANS_ryg 641412595 64.1 633.27 876.62 fsehuf 642288251 64.3 140.46 634.95 rANS_ryg_sse4.1 643834665 64.4 129.31 100.62 FastHF 644090542 64.4 285.64 147.30 tornado huff 644883587 64.5 189.77 135.40 arith_static jamesB 648726912 64.9 84.87 48.89 polar tree 1000000000 100.0 7760.82 7832.90 memcpy
Skewed distribution: enwik9 bwt  generated w. libdivsufsort

Size Ratio % C.MB/s D.MB/s Compressor CPU: Skylake 3.4 GHz
 215874369    21.6     15.98   1267.40   TurboANX,9      optimized
 243033742    24.3    659.49   1675.86   TurboANX        
 265110032    26.5    402.88    537.83   fse            
 265538954    26.6    345.60    700.67   rans_static16  
 268113114    26.8    771.09   1078.33   TurboHF
 268176778    26.8    469.73   1465.28   rans_static32      
 284721687    28.5    255.95    275.84   zlibh 13k
 289879586    29.0    670.88   1035.05   fsehuf         

Size Ratio % C.MB/s D.MB/s Compressor CPU: i7-2600k 4.4 GHz

175943420 17.6 64.15 53.15 TurboAC bit o1 (sliding context) 183565924 18.4 37.41 49.21 fpaq0p_sh 183670473 18.4 22.38 14.95 fpaqc_mm 183896215 18.4 23.42 20.59 vecrc_sh 183901000 18.4 80.15 70.28 TurboAC bit o0 197653870 19.8 224.34 271.76 TurboANXXo1 215874337 21.6 15.59 1080.36 TurboANX,9 215877030 21.6 194.96 404.15 rans_static16o1 221038728 22.1 60.81 40.03 TurboAC byte 243033685 24.3 404.83 1625.87 TurboANX 247416080 24.7 260.13 412.21 rANS_Ryg 263031843 26.3 772.25 1100.41 TurboHF 8K 263945438 26.4 64.89 40.29 subotin 265110032 26.5 370.60 488.65 fse 265538954 26.6 404.19 615.34 rans_static16 266129293 26.6 220.00 137.00 fsc 268113114 26.8 783.67 1108.92 TurboHF 284721687 28.5 307.39 336.12 zlibh 285724002 28.6 233.46 179.11 arith_static jamesB 289879586 29.0 689.13 1079.40 fsehuf 297969227 29.8 247.55 72.38 FastAC 305848788 30.6 112.98 76.68 polar tree 335145308 33.5 170.13 123.45 FastHF 337038897 33.7 324.57 156.13 tornado huff 463797334 46.2 162.80 645.88 rANS_ryg_sse4.1 (SIMD) 1000000008 100.0 7943.18 7850.77 memcpy

Uniform distribution:enwik8    

Size Ratio % C.MB/s D.MB/s Compressor Hardware: ODROID C2-ARM 64bits-1.5GHz
44854452 44.9 9.74 8.97 TurboRC_o1 55333425 55.3 18.74 33.93 rans_static16o1 61245756 61.2 9.04 8.67 TurboRC 61571700 61.6 10.99 7.83 TurboAC_byte 62677385 62.7 10.20 6.63 subotin 62957995 63.0 116.58 164.92 TurboHF 63188022 63.2 31.23 18.03 FastAC 63193639 63.2 50.88 85.14 zlibh 63202025 63.2 93.46 121.08 fse 63292098 63.3 55.85 97.30 rans_static16 63420890 63.4 124.50 130.52 fsehuf 63648861 63.6 30.47 21.87 FastHF 64138215 64.1 42.63 45.75 arith_static 100000004 100.0 1680.95 1699.70 memcpy

Entropy Coder Benchmark

Only the fastest, newest, most efficient or popular entropy coders are tested.
We are testing with 2 practical uniform and skewed distributions like that found in
lz77, bwt, image and video compression.

- ANS : Asymmetric Numeral Systems

FSE/FSC/rans_static tested with default block size = 32K and builtin header functions.
Other ANS entropy coders tested w/ block size = 12k (good avg for uniform and skewed distributions) and header functions similar to rans_static. Header size/timing are included. Using larger block size make all ANS entropy coders
faster, but the rate will be worse, so the advantage over huffman coding will disappear. Additionally much more memory is required.

Only TurboANX compress better than huffman for all distributions. - TurboANX in LzTurbo v1.2 (JIT scalar/sse/avx2) -
World's fastest and most efficient entropy coder. Automatic JIT switching between Scalar/SSE/AVX2 depending on the detected CPU. All modes 32/64 bits scalar, SSE and AVX2 are binary compatible.
-  TurboANXX order o1 context - True order o1 context entropy coder 
  (context must not  be the previous character)
- rANS interleaved v14.02 - Range Asymmetric Nummeral System 
  (no header i/o functions) (13K)
- FSC Finite State Coder v15.04 - Table Asymmetric Numeral Systems (32K)
- FSE Finite State Entropy v16.09 & zstd 1.0 - Table Asymmetric Numeral Systems (32K)
- rANS Static v16.09 - Range Asymmetric Numeral Systems interleaved (32K).
  Note: rans_static o1 order can only use the prev. character in the input buffer as context!

All ANS are block based entropy coders: each block is encoded separately using the statistics from this block only

References: - Asymmetric Numeral Systems - The use of Asymmetric Numeral Systems as an accurate replacement for Huffman Coding - Models for adaptive arithmetic coding - List of Asymmetric Numeral Systems implementations - Wikipedia: Asymmetric Numeral Systems

- ABS : Asymmetric Binary Systems

- fpaqc_mm (Asymmetric Binary System) adaptive bitwise entropy coder
References:
-
Asymmetric Binary Coding

- Huffman Coding

- optimized TurboHF from LzTurbo v1.1 - Block based Huffman Coding. Decoding one Billion Symbols per second. - zlibh - Huffman Coding (tested w/ blocksize 13k) - FastHF - quasi-adaptive Huffman Coding - Tornado v0.6 - semi-adaptive Huffman Coder - FSE Huffman Coding v16.09 &
zstd 1.0 - block based Huffman Coding - Polar Tree - Non-Huffman binary tree

*quasi-adaptive = semi-adaptive model with "periodic rescale" or "deferred summation".
The benchmark shows that block based coders are more efficient
References:
-
It's Been 1,000,000 Years Since Huffman + Video - In-place calculation of minimum-redundancy codes + Software - A Memory-efficient Huffman Decoding Algorithm + Sildes - Huffman coding revisited part 1-5: [1],[2],[3],[4],[5]

- Arithmetic Coding / Range Coder

- TurboAC bit in LzTurbo v1.2 - bitwise Range Coder (Note: the first bitwise coder faster than bytewise RC!!! - vecrc_sh: vectorized range coder" - bitwise Range Coder - fpaq0p_sh: simple binary rangecoder - bitwise Range Coder
- TurboAC byte in LzTurbo - bytewise Range Coder
- Subotin - bytewise Range Coder
- FastAC - quasi-adaptive (or semi-adaptive) Range Coder

References:
-
Arithmetic Coding revealed A guided tour from theory to praxis incl. c++ code - Introduction to Arithmetic Coding - Theory and Practice (FastAC) - Context Adaptive Binary Arithmetic Coding - Arithmetic Coding - Data Compression With Arithmetic Coding - Entrop Coding/Non-binary Arithmetic Coding
Benchmark Forum
Last update: 16 JUL 2017
Comments