Slide 1: Asymmetric Numeral Systems: adding fractional bits to the Huffman coder

Huffman coding – fast, but operates on an integer number of bits: it approximates probabilities with powers of 1/2, giving an inferior compression rate.
Arithmetic coding – accurate, but many times slower (computationally more demanding).
Asymmetric Numeral Systems (ANS) – accurate and faster than Huffman: we construct a low-state automaton optimized for a given probability distribution.

Jarek Duda, Kraków, 28-04-2014

Slide 2: Entropy coding

We need n bits of information to choose one of 2^n possibilities. For length-n 0/1 sequences with pn ones, how many bits do we need to choose one?

Entropy coder: encodes a sequence with probability distribution (p_s)_{s=1..m} using asymptotically at least
  H = ∑_s p_s · lg(1/p_s)  bits/symbol   (H ≤ lg(m)).
Seen as a weighted average: a symbol of probability p contains lg(1/p) bits.

Encoding the (p_s) distribution with an entropy coder optimal for a (q_s) distribution costs
  ΔH = ∑_s p_s · lg(1/q_s) − ∑_s p_s · lg(1/p_s) = ∑_s p_s · lg(p_s/q_s) ≈ (1/ln 4) · ∑_s (p_s − q_s)²/p_s
more bits/symbol – the so-called Kullback-Leibler "distance" (not symmetric).

Slide 3: English letters

Symbol frequencies from a version of the British National Corpus containing 67 million words (http://www.yorku.ca/mack/uist2011.html#f1):
  Uniform: lg(27) ≈ 4.75 bits/symbol  (x → 27x + s),
  Huffman uses: H′ = ∑_s p_s · r_s ≈ 4.12 bits/symbol,
  Shannon: H = ∑_s p_s · lg(1/p_s) ≈ 4.08 bits/symbol.
We can theoretically improve by ≈ 1% here: ΔH ≈ 0.04 bits/symbol.
Order-1 Markov: ~3.3 bits/symbol; order 2: ~3.1 bits/symbol; word level: ~2.1 bits/symbol.
Currently the best text compression, "durilca'kingsize" (http://mattmahoney.net/dc/text.html), packs 10^9 bytes of text from Wikipedia (enwik9) into 127784888 bytes: ≈ 1 bit/symbol.
… (lossy) video compression: ~1000× reduction.

Slide 4: Huffman codes – encode symbols as bit sequences

Perfect if symbol probabilities are powers of 2, e.g. p(A) = 1/2, p(B) = 1/4, p(C) = 1/4:
  A → 0,  B → 10,  C → 11.
We can reduce ΔH by grouping m symbols together (alphabet size 2^m or 3^m).
Generally, the maximal depth (R = max_s r_s) grows proportionally to m here, while ΔH drops approximately proportionally to 1/m:
  Huffman: ΔH ∝ 1/R (= 1/lg(L)),
  ANS: ΔH ∝ 4^(−R) (= 1/L², where L = 2^R is the number of states).

Slide 5: Fast Huffman decoding

Decoding step for maximal depth R; let X contain the last R bits (requires a decodingTable of size proportional to L = 2^R):

  t = decodingTable[X];              // X ∈ {0, …, 2^R − 1} is the current state
  useSymbol(t.symbol);               // use or store decoded symbol
  X = t.newX + readBits(t.nbBits);   // state transition

where t.newX for Huffman are the unused bits of the state, shifted to the oldest position:
  t.newX = (X << nbBits) & (2^R − 1)      (mod(X, 2^R) = X & (2^R − 1)).

tANS: the same decoder, different t.newX – it not only shifts the unused bits, but also modifies them according to the remaining fractional bits (lg(1/p)):
  x = X + L ∈ {L, …, 2L − 1} is a buffer containing ≈ lg(x) ∈ [R, R + 1) bits.

Slide 6: Operating on a fractional number of bits

Arithmetic coding view: restricting a range of size N to a length-l subrange contains lg(N/l) bits; adding a symbol of probability p (containing lg(1/p) bits): lg(N/l) + lg(1/p) ≈ lg(N/l′) bits for l′ ≈ p·l.
ANS view: a number x contains lg(x) bits; adding a symbol of probability p: lg(x) + lg(1/p) = lg(x′) for x′ ≈ x/p.
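To ground the x′ ≈ x/p rule before it is generalized on the next slide: for a uniform bit (p = 1/2) it is exactly the familiar "append a binary digit" step of the base-2 numeral system, C(s, x) = 2x + s. A minimal C sketch (my own toy illustration, not from the slides) showing that decoding recovers the bits in reverse (LIFO) order:

```c
/* Warm-up for the p = 1/2 case: appending a bit s is x' = 2x + s, which
   adds exactly lg(1/p) = 1 bit; decoding D(x) = (mod(x,2), floor(x/2))
   returns the bits last-in, first-out.                                 */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    const int bits[5] = { 1, 0, 1, 1, 0 };
    uint64_t x = 1;                          /* initial state           */
    for (int i = 0; i < 5; i++)
        x = 2 * x + bits[i];                 /* C(s,x) = 2x + s         */
    printf("state after encoding: %llu\n", (unsigned long long)x);
    for (int i = 4; i >= 0; i--) {           /* decode in reverse       */
        int s = x & 1;                       /* D(x) = (mod(x,2), x/2)  */
        x >>= 1;
        printf("decoded bit %d: %d (expected %d)\n", i, s, bits[i]);
    }
    return 0;
}
```

ANS keeps this interface and changes only how the "even" and "odd" positions are distributed over the naturals, as the next slide describes.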
Slide 7: Asymmetric numeral systems – redefine even/odd numbers

Symbol distribution s: ℕ → A (A – alphabet, e.g. {0,1});  (s(x) = mod(x, b) gives the base-b numeral system: C(s, x) = bx + s).
The symbols should still be uniformly distributed – but with density p_s:
  #{0 ≤ y < x : s(y) = s} ≈ x·p_s.
Then x′ becomes the x-th appearance of the given symbol:
  D(x′) = (s(x′), #{0 ≤ y < x′ : s(y) = s(x′)}),
  C(D(x′)) = x′,  D(C(s, x)) = (s, x),  x′ ≈ x/p_s.

Example: range asymmetric binary systems (rABS), Pr(0) = 1/4, Pr(1) = 3/4 – take the base-4 system and merge 3 digits: the cyclic (0123) symbol distribution becomes cyclic (0111):
  s(x) = 0 if mod(x, 4) = 0, else 1.
To decode or encode, localize the quadruple (⌊x/4⌋ or ⌊x/3⌋):
  if s(x) = 0: D(x) = (0, ⌊x/4⌋), else D(x) = (1, 3⌊x/4⌋ + mod(x, 4) − 1),
  C(0, x) = 4x,  C(1, x) = 4⌊x/3⌋ + mod(x, 3) + 1.

Slide 8: rANS – range variant for a large alphabet A = {0, …, m − 1}

Assume Pr(s) = f_s/2^n and let c_s := f_0 + f_1 + ⋯ + f_{s−1}. Start with the base-2^n numeral system and merge ranges of length f_s: for x ∈ {0, 1, …, 2^n − 1}, s(x) = max{s : c_s ≤ x}.
  encoding: C(s, x) = (⌊x/f_s⌋ << n) + mod(x, f_s) + c_s
  decoding: s = s(x & (2^n − 1)) (e.g. tabled, alias method),
            D(x) = (s, f_s · (x >> n) + (x & (2^n − 1)) − c_s)
Similar to Range Coding, but decoding has 1 multiplication (instead of 2) and the state is 1 number (instead of 2), making it convenient for SIMD vectorization (https://github.com/rygorous/ryg_rans). A toy C sketch follows slide 10 below.
Additionally, we can use the alias method to store s for very precise probabilities; it is also convenient for dynamical updating. 'Alias' method: rearrange the probability distribution into m buckets, each containing the primary symbol and possibly a single 'alias' symbol.

Slide 9: uABS – uniform binary variant (A = {0,1}) – extremely accurate

Assume a binary alphabet, p ≔ Pr(1), and denote x_s = #{y < x : s(y) = s} ≈ x·p_s. For a uniform symbol distribution we can choose:
  x_1 = ⌈xp⌉,  x_0 = x − x_1 = x − ⌈xp⌉,
  s(x) = 1 if there is a jump on the next position: s = s(x) = ⌈(x + 1)p⌉ − ⌈xp⌉.
Decoding function: D(x) = (s, x_s). Its inverse – the coding function:
  C(0, x) = ⌈(x + 1)/(1 − p)⌉ − 1,
  C(1, x) = ⌊x/p⌋.
(Figure: behavior for p = Pr(1) = 1 − Pr(0) = 0.3.)

Slide 10: Stream version – renormalization

Currently we encode using succeeding C functions into a huge number x, then decode (in reverse direction!) using succeeding D. Like in arithmetic coding, we need renormalization to limit the working precision – enforce x ∈ I = {L, …, bL − 1} by transferring the base-b youngest digits:

  ANS decoding step from state x:
    (s, x) = D(x); useSymbol(s);
    while x < L do x = bx + readDigit();

  encoding step for symbol s from state x:
    while x > maxX[s] do                  // maxX[s] = bL_s − 1
      { writeDigit(mod(x, b)); x = ⌊x/b⌋ };
    x = C(s, x);

For unique decoding, we need to ensure that there is a single way to perform the above loops:
  I = {L, …, bL − 1},  I_s = {L_s, …, bL_s − 1}  where I_s = {x : C(s, x) ∈ I}.
Fulfilled e.g. for:
- rABS/rANS when p_s = f_s/2^n has 1/L accuracy: 2^n divides L,
- uABS when p has 1/L accuracy: b⌈Lp⌉ = ⌈bLp⌉,
- tabled variants (tABS/tANS), where it will be fulfilled by construction.
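The following toy C sketch (my own illustration; the 3-symbol alphabet and the tables f, c are arbitrary choices) implements the rANS step of slide 8 in exactly the "huge number" regime described above: no renormalization, the state simply grows, and encoding runs backwards so that decoding can run forward over the message:

```c
/* Minimal rANS sketch without renormalization: alphabet {0,1,2} with
   Pr(s) = f[s]/2^N.  Toy parameters, not production code.             */
#include <stdio.h>
#include <stdint.h>
#include <assert.h>

#define N 4                          /* probabilities quantized to /2^4 */
static const uint32_t f[3] = { 4, 2, 10 };   /* frequencies, sum = 16   */
static const uint32_t c[3] = { 0, 4, 6 };    /* cumulative: c_s         */

/* encoding step: C(s,x) = (floor(x/f_s) << N) + mod(x,f_s) + c_s       */
static uint64_t C(int s, uint64_t x) {
    return ((x / f[s]) << N) + (x % f[s]) + c[s];
}

/* decoding step: read s from the low N bits, then invert C             */
static uint64_t D(uint64_t x, int *s) {
    uint32_t low = x & ((1u << N) - 1);
    *s = (low < c[1]) ? 0 : (low < c[2]) ? 1 : 2;   /* tiny linear scan */
    return f[*s] * (x >> N) + low - c[*s];
}

int main(void) {
    const int msg[8] = { 2, 0, 2, 1, 2, 2, 0, 2 };
    uint64_t x = 1;                       /* initial state              */
    for (int i = 7; i >= 0; i--)          /* encode backwards ...       */
        x = C(msg[i], x);
    for (int i = 0; i < 8; i++) {         /* ... so decoding is forward */
        int s;
        x = D(x, &s);
        assert(s == msg[i]);
    }
    printf("roundtrip ok, final state %llu\n", (unsigned long long)x);
    return 0;
}
```

Adding the renormalization loops above turns this into a practical streaming coder with a bounded state; the single step of that stream version is analyzed next.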
: π = 2, ππ = 3, πΏπ = 13, πΏ = 66, πππ π₯ + 3 = 115, π₯ = 14, ππ = 13/66: 11 General picture: encoder prepares before consuming succeeding symbol decoder produces symbol, then consumes succeeding digits Decoding is in reversed direction: we have stack of symbols (LIFO) - the final state has to be stored, but we can write information in initial state - encoding should be made in backward direction, but use context from perspective of future forward decoding. 12 In single step (πΌ = {πΏ, … , ππΏ − 1}): π₯π (π) → ≈ π₯π (π) + π₯π (π/π) modulo π₯π (π) Three sources of unpredictability/chaosity: 1) Asymmetry: behavior strongly dependent on chosen symbol – small difference changes decoded symbol and so the entire behavior. 2) Ergodicity: usually log π (1/π) is irrational – succeeding iterations cover entire range. 3) Diffusivity: πΆ(π , π₯) is close but not exactly π₯/ππ – there is additional ‘diffusion’ around expected value So lg(π₯) ∈ [lg(πΏ) , lg(πΏ) + lg(π)) has nearly uniform distribution – π₯ has approximately: ππ«(π) ∝ π/π probability distribution – contains lg(1/Pr(π₯)) ≈ lg(π₯) + const bits of information. 13 Redundancy: instead of ππ we use ππ = π₯/πΆ(π , π₯) probability: 2 2 ( ) (ππ − ππ ) (ππ π₯ ) 1 1 Δπ» ≈ ∑ … Δπ» ≈ ∑ Pr(π₯) ln(4) ππ ln(4) ππ π π ,π₯ where inaccuracy: ππ (π₯) = ππ − π₯/πΆ(π , π₯) drops like 1/πΏ: Δπ» ≤ 1 ln(4)⋅π³π 1 1 π 1−π ( + ) for uABS, Δπ» ≤ π ln(4)⋅π³π ∑π 1 ππ for rANS Denoting π³ = ππΉ , π«π― ≈ π−πΉ and grows proportionally to (ππ₯π©π‘ππππ π¬π’π³π)π . for uABS: 14 Tabled variants: tANS, tABS (choose πΏ = 2π ) πΌ = {πΏ, … ,2πΏ − 1}, πΌπ = {πΏπ , … ,2πΏπ − 1} where πΌπ = {π₯: πΆ (π , π₯) ∈ πΌ} we will choose π symbol distribution (symbol spread): #{π ∈ π°: π(π) = π} = π³π approximating probabilities: ππ ≈ πΏπ /πΏ Fast pseudorandom spread (https://github.com/Cyan4973/FiniteStateEntropy ): π π‘ππ = 5/8 πΏ + 3; // π π‘ππ chosen to cover the whole range π = 0; // πΏ = π − π³ ∈ {π, … , π³ − π} for table handling for π = 0 to π − 1 do {π π¦ππππ[π] = π ; π = mod(π + π π‘ππ, πΏ); } // ππππππ[πΏ] = π(πΏ + π³) Then finding table for {π‘ = πππππππππππππ(π); useSymbol(π‘. π π¦ππππ ); π = π‘. πππ€π + readBits(π‘. πππ΅ππ‘π )} decoding: for π = 0 to π − 1 do πππ₯π‘[π ] = πΏπ ; // symbol appearance number for π = 0 to πΏ − 1 do // fill all positions {π‘. π π¦ππππ = π π¦ππππ[π]; // from symbol spread π₯ = πππ₯π‘[π‘. π π¦ππππ] + +; // use symbol and shift appearance π‘. πππ΅ππ‘π = π − ⌊lg(π₯ )⌋; // π³ = ππΉ π‘. πππ€π = (π₯ βͺ π‘. πππ΅ππ‘π ) − πΏ; πππππππππππππ[π] = π‘; } 15 Precise initialization (heuresis) 0.5+π ππ = { : π = 0, … , πΏπ − 1} ππ are uniform – we need to shift them to natural numbers. (priority queue with put, getminπ£) for π = 0 to π − 1 do put((0.5/ππ , π )); for π = 0 to πΏ − 1 do {(π£, π ) = getminπ£; put((π£ + 1/ππ , π )); π π¦ππππ [π] = π ; } π«π― drops like π/π³π L ∝ alphabet size 16 tABS and tuning test all possible symbol distributions for binary alphabet store tables for quantized probabilities (p) e.g. 16 ⋅ 16 = 256 bytes ← for 16 state, Δπ» ≈ 0.001 H.264 “M decoder” (arith. coding) tABS π‘ = πππππππππππππ[π][π]; π = π‘. πππ€π + readBits(π‘. πππ΅ππ‘π ); useSymbol(π‘. π π¦ππππ); no branches, no bit-by-bit renormalization state is single number (e.g. SIMD) 17 Additional tANS advantage – simultaneous encryption we can use huge freedom while initialization: choosing symbol distribution – slightly disturb π(π) using PRNG initialized with cryptographic key ADVANTAGES comparing to standard (symmetric) cryptography: standard, e.g. 
Slide 18: Additional tANS advantage – simultaneous encryption

We can use the huge freedom in the initialization: while choosing the symbol distribution, slightly disturb s(x) using a PRNG initialized with a cryptographic key. Advantages compared to standard (symmetric) cryptography, e.g. DES, AES:
- operations: standard is based on XOR and permutations; initialized ANS uses highly nonlinear operations,
- blocks: standard uses fixed-length bit blocks; ANS uses pseudorandomly varying lengths,
- brute force or QC attacks: for standard, just start decoding to test a cryptokey; for ANS, initialization must be performed first for each new cryptokey and can be fixed to need e.g. 0.1 s,
- speed: standard calculates online; ANS does most calculations during initialization,
- entropy: standard operates on bits; ANS operates on any input distribution.
(A sketch of the key-dependent disturbance follows the summary.)

Slide 19: Summary

We can construct accurate and extremely fast entropy coders:
- an accurate (and faster) replacement for Huffman coding,
- a many times faster replacement for Range coding,
- faster decoding in the adaptive binary case (but more complex encoding),
- they can simultaneously encrypt the message.
Perfect for: DCT/wavelet coefficients, lossless image compression.
Perfect with: Lempel-Ziv, Burrows-Wheeler Transform ... and many others ...

Further research:
- finding the symbol distribution s(x) with minimal ΔH (and quickly),
- tuning according to the p_s ≈ L_s/L approximation (e.g. very low probability symbols should have a single appearance at the end of the table),
- optimal compression of the used probability distribution to reduce headers,
- maybe finding another low-state entropy coding family, e.g. forward decoded,
- applying the cryptographic capabilities (also without entropy coding),
- ... ?
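Returning to slide 18: a minimal sketch of the key-dependent disturbance idea, an illustration only and in no way a vetted cipher. Here rand()/srand() stand in for a cryptographically strong keyed PRNG, and disturb_spread/nswaps are hypothetical names of my own:

```c
/* Sketch of the slide-18 idea: slightly disturb the symbol spread with
   a PRNG seeded by a secret key, so only a decoder knowing the key can
   rebuild the same decodingTable.  Illustration only.                  */
#include <stdlib.h>

void disturb_spread(int *spread, int L, unsigned key, int nswaps) {
    srand(key);                           /* toy stand-in for a keyed   */
    for (int k = 0; k < nswaps; k++) {    /* cryptographic generator    */
        int i = rand() % L, j = rand() % L;
        int tmp = spread[i];              /* transposing entries keeps  */
        spread[i] = spread[j];            /* every L_s count intact, so */
        spread[j] = tmp;                  /* p_s ~ L_s/L still holds    */
    }
}

int main(void) {
    int spread[16] = { 0,0,0,0,0,0,0,0, 1,1,1,1,1, 2,2,2 };
    disturb_spread(spread, 16, 0xC0FFEEu, 8);
    return 0;
}
```

Because only the placement of symbols changes, not their counts, the disturbed coder keeps (approximately) the same compression rate while its state transitions become key-dependent.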