entropy-coding

TurboBench : Single core in memory entropy coder benchmark

All libraries compiled with 64 bits gcc 7.3

Uniform distribution: enwik9 Skewed distribution: enwik9 bwt generated w/ libdivsufsort

(MB=1.000.000 bold/bold o1 = Pareto frontier on ratio %)

Size Ratio % C.MB/s D.MB/s Compressor enwik9 CPU:Skylake 3.4 GHz 2018-05

620061620 62.0 49.23 43.64 TurboRC bit o0 621704996 62.2 74.89 101.36 TurboANXN (nibble) o0 *NEW* 622002650 62.2 17.94 14.34 AOMedia AV1 nibble o0 *NEW* 630257110 63.0 8.63 981.57 TurboHF 0 optimal 631936952 63.2 9.96 1457.74 TurboANX 0 optimal 634474177 63.4 33.52 20.98 subotin ac 635239420 63.5 559.23 1433.49 TurboANX 635943929 63.6 174.26 847.66 fpc 0 637466340 63.7 774.73 1077.41 TurboHF 12 637963432 63.8 205.64 223.18 zlibh 13k 639048689 63.9 379.92 501.70 fse 32 639740506 64.0 273.05 676.69 rans_static16 32 640334763 64.0 873.76 1195.74 TurboHF 32 641336375 64.1 601.32 928.53 fpc 32 641412595 64.1 604.16 831.45 fsehuf 32 642305248 64.2 459.58 1346.12 rans_static32 32

Size Ratio % C.MB/s D.MB/s Compressor enwik9bwt CPU: Skylake 3.4 GHz 2018-05

183901004 18.4 73.19 65.57 TurboRC bit o0 194954146 19.5 88.84 100.37 TurboANXN (nibble) o0 *NEW* 196471059 19.6 22.57 18.58 AOMedia AV1 nibble o0 *NEW* 215874369 21.6 15.98 1267.40 TurboANX,0 optimized 243033742 24.3 659.49 1675.86 TurboANX 246878022 24.7 4.96 1085.20 TurboHF 0 249576054 25.0 163.24 920.69 fpc 0 262677974 26.3 885.23 1240.18 TurboHF 8 263945442 26.4 64.14 38.52 subotin ac 265110032 26.5 402.88 537.83 fse 32 265538954 26.6 345.60 700.67 rans_static16 32 268176778 26.8 469.73 1465.28 rans_static32 32 284721687 28.5 255.95 275.84 zlibh 13k 289217878 28.9 974.10 1321.99 TurboHF 32 289879586 29.0 655.22 1017.13 fsehuf 32 289892417 29.0 667.01 1022.59 fpc 32

Size Ratio % C.MB/s D.MB/s Compressor enwik9 CPU: i7-2600k 4.4 GHz 2017-05

440163308 44.0 52.78 39.98 TurboAC bit o1 sliding context 477069134 47.7 145.88 191.55 TurboANXX o1 541944518 54.2 131.62 273.52 rans_static16o1 620044520 62.0 22.18 16.77 vecrc_sh 620061616 62.0 59.00 49.40 TurboRC bit o0 620097748 62.0 21.92 24.87 fpaq0p_sh 620278358 62.0 19.50 14.99 fpaqc_mm 625933204 62.6 47.26 31.58 TurboAC byte 634103875 63.4 11.87 1184.60 TurboANX,9 JIT switch scalar/sse/avx2 634474173 63.4 38.78 21.67 Subotin 635429360 63.5 412.97 1219.70 TurboANX 637630650 63.8 654.12 955.22 TurboHF 637813190 63.8 259.52 275.07 zlibh 13k 639033488 63.9 184.66 54.54 FastAC 639048689 63.9 345.25 490.27 fse 32k 639740506 64.0 338.37 597.83 rans_static16 32k 640295271 64.0 711.43 1035.40 TurboHF 30K 640700579 64.1 208.21 170.72 fsc 32k 641305739 64.1 246.75 358.98 rANS_ryg 641412595 64.1 633.27 876.62 fsehuf 32k 642288251 64.3 140.46 634.95 rANS_ryg_sse4.1 643834665 64.4 129.31 100.62 FastHF 644090542 64.4 285.64 147.30 tornado huff 644883587 64.5 189.77 135.40 arith_static jamesB 648726912 64.9 84.87 48.89 polar tree 1000000000 100.0 7760.82 7832.90 memcpy

Size Ratio % C.MB/s D.MB/s Compressor enwik9bwt CPU: i7-2600k 4.4 GHz 2017-05

175943420 17.6 64.15 53.15 TurboRC bit o1 (sliding context) 183565924 18.4 37.41 49.21 fpaq0p_sh 183670473 18.4 22.38 14.95 fpaqc_mm 183896215 18.4 23.42 20.59 vecrc_sh 183901000 18.4 80.15 70.28 TurboAC bit o0 197653870 19.8 224.34 271.76 TurboANXXo1 215874337 21.6 15.59 1080.36 TurboANX,9 215877030 21.6 194.96 404.15 rans_static16o1 221038728 22.1 60.81 40.03 TurboAC byte 243033685 24.3 404.83 1625.87 TurboANX 247416080 24.7 260.13 412.21 rANS_Ryg 263031843 26.3 772.25 1100.41 TurboHF 8k 263945438 26.4 64.89 40.29 subotin 265110032 26.5 370.60 488.65 fse 32k 265538954 26.6 404.19 615.34 rans_static16 266129293 26.6 220.00 137.00 fsc 268113114 26.8 783.67 1108.92 TurboHF 284721687 28.5 307.39 336.12 zlibh 285724002 28.6 233.46 179.11 arith_static jamesB 289879586 29.0 689.13 1079.40 fsehuf 32k 297969227 29.8 247.55 72.38 FastAC 305848788 30.6 112.98 76.68 polar tree 335145308 33.5 170.13 123.45 FastHF 337038897 33.7 324.57 156.13 tornado huff 463797334 46.2 162.80 645.88 rANS_ryg_sse4.1 (SIMD) 1000000008 100.0 7943.18 7850.77 memcpy

Uniform distribution:enwik8

Size Ratio % C.MB/s D.MB/s Compressor RASPBERRY PI3 1.2GHz ARM gcc 6.3 64 bits 2018-04

61245756 61.2 9.46 8.52 TurboRC bit 61571700 61.6 9.68 7.32 TurboAC_byte 62677385 62.7 8.38 5.07 subotin 62753950 62.8 37.28 196.18 TurboANX 12 62782152 62.8 19.73 176.24 fpc 0 63188022 63.2 23.91 13.64 FastAC 63193639 63.2 41.94 67.29 zlibh 63202025 63.2 73.33 94.10 fse 32k 63287917 63.3 58.02 92.96 rans_static16 63376580 63.4 190.21 301.99 TurboHF 32 63415358 63.4 96.14 276.92 fpc 32 63420890 63.4 97.88 189.14 fsehuf 32 63648861 63.6 23.71 16.89 FastHF 100000004 100.0 1016.00 1016.00 memcpy

Entropy Coder Benchmark

Only the most efficient, fast or popular entropy coders are tested. 2 practical uniform and skewed distributions like that found in lz77, bwt, image and video compression.

- ANS : Asymmetric Numeral Systems All the following ANS entropy coders are block based : each block is encoded separately using the statistics from this block only

- TurboANX in LzTurbo v1.2 (JIT scalar/sse/avx2) - World's fastest and most efficient entropy coder. Automatic JIT switching between Scalar/SSE/AVX2. All modes 32/64 bits scalar, SSE and AVX2 are binary compatible. TurboANX optimal = Shortest path driven optimal partitioning (Variable length blocks).

- TurboANXX order o1 context - order o1 context entropy coder - rANS interleaved v14.02 - Range Asymmetric Nummeral System - FSC Finite State Coder v15.04 - Table Asymmetric Numeral Systems - FSE Finite State Entropy v18.04 - Table Asymmetric Numeral Systems - rANS Static v18.04 - Range Asymmetric Numeral Systems

References: - Asymmetric Numeral Systems - The use of Asymmetric Numeral Systems as an accurate replacement for Huffman Coding - List of Asymmetric Numeral Systems implementations - Wikipedia: Asymmetric Numeral Systems

- rANS : Adaptive Range Asymmetric Numeral Systems - TurboANXN - Full adaptive multisymbol scalar/SSE/AVX2 rANS

References:

- Models for adaptive arithmetic coding

- AOMedia AV1 entropy coder - entenc/entdec - Multisymbol (non-binary) adaptive entropy encoder

References: - Daala_EC in AV1

- ABS : Asymmetric Binary Systems - fpaqc_mm (Asymmetric Binary System) adaptive bitwise entropy coder

- Huffman Coding - TurboHF from LzTurbo v1.1 - Block based Huffman Coding. Decoding one Billion Symbols per second. TurboHF optimal = variable length block partitioning - FSEHUF Huffman Coding v18.05 - Block based Huffman Coding - FastHF - quasi-adaptive Huffman Coding - FPC - Fast Prefix Coder - Block based Huffman Coding. - Polar Tree - Non-Huffman binary tree - Tornado v0.6 - semi-adaptive Huffman Coder - zlibh - Huffman Coding

*quasi-adaptive = semi-adaptive model with "periodic rescale" or "deferred summation".

The benchmark shows that block based coders are more efficient

References:

- It's Been 1,000,000 Years Since Huffman + Video - In-place calculation of minimum-redundancy codes + Software - A Memory-efficient Huffman Decoding Algorithm + Sildes

- Arithmetic Coding / Range Coder - TurboRC bit in LzTurbo v1.2 - bitwise Range Coder - vecrc_sh: vectorized range coder" - bitwise Range Coder - fpaq0p_sh: simple binary rangecoder - bitwise Range Coder - TurboAC byte in LzTurbo - bytewise Range Coder - Subotin - bytewise Range Coder - FastAC - quasi-adaptive (or semi-adaptive) Range Coder

References:

- Arithmetic Coding revealed A guided tour from theory to praxis incl. c++ code - Introduction to Arithmetic Coding - Theory and Practice (FastAC) - Context Adaptive Binary Arithmetic Coding - Arithmetic Coding - Data Compression With Arithmetic Coding - Entrop Coding/Non-binary Arithmetic Coding

Benchmark ForumLast update: 17 MAY 2018