Asymmetric Numeral Systems (ANS)

ASYMMETRIC NUMERAL SYSTEMS: ADDING FRACTIONAL BITS TO HUFFMAN CODER Huffman coding – fast, but operates on integer number of bits: approximates probabilities with powers of ½, getting inferior compression rate Arithmetic coding – accurate, but many times slower (computationally more demanding) Asymmetric Numeral Systems (ANS) – accurate and faster than Huffman we construct low state automaton optimized for given probability distribution Jarek Duda Kraków, 28-04-2014 1 We need 𝒏 bits of information to choose one of 𝟐𝒏 possibilities. For length 𝑛 0/1 sequences with 𝑝𝑛 of “1”, how many bits we need to choose one? Entropy coder: encodes sequence with (𝑝𝑖 )𝑖=1..𝑚 probability distribution using asymptotically at least 𝑯 = ∑𝒎 𝒊=𝟏 𝒑𝒊 𝐥𝐠(𝟏/𝒑𝒊 ) bits/symbol (𝐻 ≤ lg(𝑚)) Seen as weighted average: symbol of probability 𝒑 contains 𝐥𝐠(𝟏/𝒑) bits. Encoding (𝑝𝑖 ) distribution with entropy coder optimal for (𝑞𝑖 ) distribution costs Δ𝐻 = ∑𝑖 𝑝𝑖 lg(1/𝑞𝑖 ) − ∑𝑖 𝑝𝑖 lg(1/𝑝𝑖 ) = 𝑝𝑖 ∑𝑖 𝑝𝑖 lg ( ) 𝑞𝑖 ≈ 1 ln(4) ∑𝑖 (𝑝𝑖 −𝑞𝑖 )2 𝑝𝑖 more bits/symbol - so called Kullback-Leibler “distance” (not symmetric). 2 Symbol frequencies from a version of the British National Corpus containing 67 million words ( http://www.yorku.ca/mack/uist2011.html#f1 ): Uniform: 𝐥𝐠(𝟐𝟕) ≈ 𝟒. 𝟕𝟓 bits/symbol (𝑥 → 27𝑥 + 𝑠) Huffman uses: 𝐻 ′ = ∑𝑖 𝑝𝑖 𝒓𝒊 ≈ 𝟒. 𝟏𝟐 bits/symbol Shannon: 𝐻 = ∑𝑖 𝑝𝑖 𝐥𝐠(𝟏/𝒑𝒊 ) ≈ 𝟒. 𝟎𝟖 bits/symbol We can theoretically improve by ≈ 𝟏% here 𝚫𝑯 ≈ 𝟎. 𝟎𝟒 bits/symbol order 1 Markov: ~3.3 bits/symbol order 2: ~3.1 bits/symbol, word: ~2.1 bits/symbol Currently the best text compression: “durlica’kingsize” (http://mattmahoney.net/dc/text.html ) 109 bytes of text from Wikipedia (enwik9) into 127784888 bytes: ≈ 1bit/symbol … (lossy) video compression: ~ 1000× reduction 3 Huffman codes – encode symbols as bit sequences Perfect if symbol probabilities are powers of 2 e.g. p(A) =1/2, p(B)=1/4, p(C)=1/4 0 A 1 0 1 B C A C 0 11 B 10 We can reduce Δ𝐻 by grouping 𝑚 symbols together (alphabet size 2𝑚 or 3𝑚 ): Generally, the maximal depth (𝑅 = max 𝑟𝑖 ) grows proportionally to 𝑚 here, 𝑖 Δ𝐻 drops approximately proportionally to 1/𝑚, 𝚫𝑯 ∝ 𝟏/𝑹 (= 1/lg(𝐿)) for ANS: 𝚫𝑯 ∝ 𝟒−𝑹 (= 1/𝐿2 for 𝐿 = 2R is the number of states) 4 Fast Huffman decoding step for maximal depth 𝑅: let 𝑋 contain last 𝑅 bits (requires 𝑑𝑒𝑐𝑜𝑑𝑖𝑛𝑔𝑇𝑎𝑏𝑙𝑒 of size proportional to 𝑳 = 𝟐𝑹 ): 𝑡 = 𝑑𝑒𝑐𝑜𝑑𝑖𝑛𝑔𝑇𝑎𝑏𝑙𝑒[𝑋]; // 𝑋 ∈ {0, … , 2𝑅 − 1} is current state useSymbol(𝑡. 𝑠𝑦𝑚𝑏𝑜𝑙); // use or store decoded symbol 𝑋 = 𝑡. 𝑛𝑒𝑤𝑋 + readBits(𝑡. 𝑛𝑏𝐵𝑖𝑡𝑠); // state transition where 𝑡. 𝑛𝑒𝑤𝑋 for Huffman are unused bits of the state, shifted to oldest position: 𝑡. 𝑛𝑒𝑤𝑋 = (𝑋 ≪ 𝑛𝑏𝐵𝑖𝑡𝑠) & (2𝑅 − 1) (mod(𝑎, 2𝑅 ) = 𝑎 & (2𝑅 − 1)) tANS: the same decoder, different 𝑡. 𝑛𝑒𝑤𝑋: not only shifts the unused bits, but also modifies them accordingly to the remaining fractional bits (lg(1/𝑝)): 𝒙 = 𝑿 + 𝑳 ∈ {𝑳, … , 𝟐𝑳 − 𝟏} buffer containing ≈ 𝐥𝐠(𝒙) ∈ [𝑹, 𝑹 + 𝟏) bits: 5 Operating on fractional number of bits restricting range 𝐿 to length 𝑙 subrange number 𝑥 contains lg(𝐿/𝑙) bits contains lg(𝑥) bits adding symbol of probability 𝑝 - containing lg(1/𝑝) bits lg(𝐿/𝑙) + lg(1/𝑝) ≈ lg(𝐿/𝑙′ ) bits lg(𝑥) + lg(1/𝑝) = lg(𝑥′) for 𝒍′ ≈ 𝒑 ⋅ 𝒍 for 𝒙′ ≈ 𝒙/𝒑 6 Asymmetric numeral systems – redefine even/odd numbers: symbol distribution: 𝑠: ℕ → 𝒜 (𝒜 – alphabet, e.g. {0,1}) ( 𝑠(𝑥 ) = mod(𝑥, 𝑏) for base 𝑏 numeral system: 𝐶 (𝑠, 𝑥 ) = 𝑏𝑥 + 𝑠) Should be still uniformly distributed – but with density 𝒑𝒔 : # { 0 ≤ 𝑦 < 𝑥: 𝑠(𝑦) = 𝑠} ≈ 𝑥𝑝𝑠 then 𝑥 becomes 𝑥-th appearance of given symbol: 𝐷(𝑥′) = (𝑠(𝑥 ′ ), # {0 ≤ 𝑦 < 𝑥 ′ : 𝑠(𝑦) = 𝑠(𝑥 ′ )}) 𝐶(𝐷(𝑥 ′ )) = 𝑥′ 𝐷(𝐶 (𝑠, 𝑥)) = (𝑠, 𝑥) 𝑥 ′ ≈ 𝑥/𝑝𝑠 example: range asymmetric binary systems (rABS) Pr(0) = 1/4 Pr(1) = 3/4 - take base 4 system and merge 3 digits, cyclic (0123) symbol distribution 𝑠 becomes cyclic (0111): 𝑠(𝑥) = 0 if mod(𝑥, 4) = 0, else 1 to decode or encode 1, localize quadruple (⌊𝑥/4⌋ or ⌊𝑥/3⌋) 𝑖𝑓 𝑠(𝑥) = 0, 𝐷(𝑥) = (0, ⌊𝑥/4⌋) else 𝐷(𝑥) = (1,3⌊𝑥/4⌋ + mod(𝑥, 4) − 1) 𝐶 (0, 𝑥) = 4𝑥 𝐶 (1, 𝑥) = 4⌊𝑥/3⌋ + mod(𝑥, 3) + 1 7 rANS - range variant for large alphabet 𝒜 = {0, . . , m − 1} assume Pr(𝑠) = 𝑓𝑠 /2𝑛 𝑐𝑠 : = 𝑓0 + 𝑓1 + ⋯ + 𝑓𝑠−1 start with base 2𝑛 numeral system and merge length 𝑓𝑠 ranges for 𝑥 ∈ {0,1, … , 2𝑛 − 1}, 𝒔(𝒙) = 𝐦𝐚𝐱{𝒔: 𝒄𝒔 ≤ 𝒙} encoding: 𝐶 (𝑠, 𝑥 ) = ⌊𝑥/𝑓𝑠 ⌋ ≪ 𝑛 + mod(𝑥, 𝑓𝑠 ) + 𝑐𝑠 decoding: 𝑠 = 𝑠(𝑥 & (2𝑛 − 1)) (e.g. tabled, alias method) 𝐷(𝑥 ) = (𝑠, 𝑓𝑠 ⋅ (𝑥 ≫ 𝑛) + ( 𝑥 & (2𝑛 − 1)) − 𝑐𝑠 ) Similar to Range Coding, but decoding has 1 multiplication (instead of 2), and state is 1 number (instead of 2), making it convenient for SIMD vectorization. ( https://github.com/rygorous/ryg_rans ). Additionally, we can use alias method to store 𝑠 for very precise probabilities. It is also convenient for dynamical updating. ‘Alias’ method: rearrange probability distribution into 𝑚 buckets: containing the primary symbol and eventually a single ‘alias’ symbol 8 uABS - uniform binary variant (𝒜 = {0,1}) - extremely accurate Assume binary alphabet, 𝒑 ≔ 𝐏𝐫(𝟏), denote 𝒙𝒔 = {𝒚 < 𝒙: 𝒔(𝒚) = 𝒔} ≈ 𝒙𝒑𝒔 For uniform symbol distribution we can choose: 𝒙𝟏 = ⌈𝒙𝒑⌉ 𝒙𝟎 = 𝒙 − 𝒙𝟏 = 𝒙 − ⌈𝒙𝒑⌉ 𝑠(𝑥) = 1 if there is jump on next position: 𝑠 = 𝑠(𝑥) = ⌈(𝑥 + 1)𝑝⌉ − ⌈𝑥𝑝⌉ decoding function: 𝐷 (𝑥) = (𝑠, 𝑥𝑠 ) its inverse – coding function: 𝑥+1 𝐶 (0, 𝑥) = ⌈ ⌉ − 1 1−𝑝 𝑥 𝐶 (1, 𝑥) = ⌊ ⌋ 𝑝 For 𝑝 = Pr(1) = 1 − Pr(0) = 0.3: 9 Stream version – renormalization Currently: we encode using succeeding 𝐶 functions into huge number 𝑥, then decode (in reverse direction!) using succeeding 𝐷. Like in arithmetic coding, we need renormalization to limit working precision enforce 𝒙 ∈ 𝑰 = {𝑳, … , 𝒃𝑳 − 𝟏} by transferring base 𝒃 youngest digits: ANS decoding step from state 𝑥 (𝑠, 𝑥) = 𝐷(𝑥); useSymbol(𝑠); while 𝒙 < 𝑳, 𝒙 = 𝒃𝒙 + 𝐫𝐞𝐚𝐝𝐃𝐢𝐠𝐢𝐭(); encoding step for symbol 𝑠 from state 𝑥 while 𝒙 > 𝒎𝒂𝒙𝑿[𝒔] // = 𝒃𝑳𝒔 − 𝟏 {writeDigit(mod(𝒙, 𝒃)); 𝒙 = ⌊𝒙/𝒃⌋}; 𝑥 = 𝐶(𝑠, 𝑥); For unique decoding, we need to ensure that there is a single way to perform above loops: 𝐼 = {𝐿, … , 𝑏𝐿 − 1}, 𝐼𝑠 = {𝐿𝑠 , … , 𝑏𝐿𝑠 − 1} where 𝐼𝑠 = {𝑥: 𝐶 (𝑠, 𝑥) ∈ 𝐼} Fulfilled e.g. for - rABS/rANS when 𝑝𝑠 = 𝑓𝑠 /2𝑛 has 1/𝐿 accuracy: 2𝑛 divides 𝐿, - uABS when 𝑝 has 1/𝐿 accuracy: 𝑏⌈𝐿𝑝⌉ = ⌈𝑏𝐿𝑝⌉, - in tabled variants (tABS/tANS) it will be fulfilled by construction. 10 Single step of stream version: to get 𝑥 ∈ 𝐼 to 𝐼𝑠 = {𝐿𝑠 , … , 𝑏𝐿𝑠 − 1}, we need to transfer 𝑘 digits: 𝑠 𝑥 → (𝐶(𝑠, ⌊𝑥/𝑏 𝑘 ⌋), mod(𝑥, 𝑏 𝑘 )) where 𝑘 = ⌊log 𝑏 (𝑥/𝐿𝑠 )⌋ for given 𝑠, observe that 𝑘 can obtain one of two values: 𝑘 = 𝑘𝑠 or 𝑘 = 𝑘𝑠 − 1 for 𝒌𝒔 = −⌊𝐥𝐨𝐠 𝒃 (𝒑𝒔 )⌋ = −⌊log 𝑏 (𝐿𝑠 /𝐿)⌋ e. g. : 𝑏 = 2, 𝑘𝑠 = 3, 𝐿𝑠 = 13, 𝐿 = 66, 𝑏𝑘𝑠 𝑥 + 3 = 115, 𝑥 = 14, 𝑝𝑠 = 13/66: 11 General picture: encoder prepares before consuming succeeding symbol decoder produces symbol, then consumes succeeding digits Decoding is in reversed direction: we have stack of symbols (LIFO) - the final state has to be stored, but we can write information in initial state - encoding should be made in backward direction, but use context from perspective of future forward decoding. 12 In single step (𝐼 = {𝐿, … , 𝑏𝐿 − 1}): 𝐥𝐠(𝒙) → ≈ 𝐥𝐠(𝒙) + 𝐥𝐠(𝟏/𝒑) modulo 𝐥𝐠(𝒃) Three sources of unpredictability/chaosity: 1) Asymmetry: behavior strongly dependent on chosen symbol – small difference changes decoded symbol and so the entire behavior. 2) Ergodicity: usually log 𝑏 (1/𝑝) is irrational – succeeding iterations cover entire range. 3) Diffusivity: 𝐶(𝑠, 𝑥) is close but not exactly 𝑥/𝑝𝑠 – there is additional ‘diffusion’ around expected value So lg(𝑥) ∈ [lg(𝐿) , lg(𝐿) + lg(𝑏)) has nearly uniform distribution – 𝑥 has approximately: 𝐏𝐫(𝒙) ∝ 𝟏/𝒙 probability distribution – contains lg(1/Pr(𝑥)) ≈ lg(𝑥) + const bits of information. 13 Redundancy: instead of 𝑝𝑠 we use 𝑞𝑠 = 𝑥/𝐶(𝑠, 𝑥) probability: 2 2 ( ) (𝑝𝑠 − 𝑞𝑠 ) (𝜖𝑠 𝑥 ) 1 1 Δ𝐻 ≈ ∑ … Δ𝐻 ≈ ∑ Pr(𝑥) ln(4) 𝑝𝑠 ln(4) 𝑝𝑠 𝑠 𝑠,𝑥 where inaccuracy: 𝜖𝑠 (𝑥) = 𝑝𝑠 − 𝑥/𝐶(𝑠, 𝑥) drops like 1/𝐿: Δ𝐻 ≤ 1 ln(4)⋅𝑳𝟐 1 1 𝑝 1−𝑝 ( + ) for uABS, Δ𝐻 ≤ 𝑚 ln(4)⋅𝑳𝟐 ∑𝑠 1 𝑝𝑠 for rANS Denoting 𝑳 = 𝟐𝑹 , 𝚫𝑯 ≈ 𝟒−𝑹 and grows proportionally to (𝐚𝐥𝐩𝐡𝐚𝐛𝐞𝐭 𝐬𝐢𝐳𝐞)𝟐 . for uABS: 14 Tabled variants: tANS, tABS (choose 𝐿 = 2𝑅 ) 𝐼 = {𝐿, … ,2𝐿 − 1}, 𝐼𝑠 = {𝐿𝑠 , … ,2𝐿𝑠 − 1} where 𝐼𝑠 = {𝑥: 𝐶 (𝑠, 𝑥) ∈ 𝐼} we will choose 𝑠 symbol distribution (symbol spread): #{𝒙 ∈ 𝑰: 𝒔(𝒙) = 𝒔} = 𝑳𝒔 approximating probabilities: 𝑝𝑠 ≈ 𝐿𝑠 /𝐿 Fast pseudorandom spread (https://github.com/Cyan4973/FiniteStateEntropy ): 𝑠𝑡𝑒𝑝 = 5/8 𝐿 + 3; // 𝑠𝑡𝑒𝑝 chosen to cover the whole range 𝑋 = 0; // 𝑿 = 𝒙 − 𝑳 ∈ {𝟎, … , 𝑳 − 𝟏} for table handling for 𝑠 = 0 to 𝑚 − 1 do {𝑠𝑦𝑚𝑏𝑜𝑙[𝑋] = 𝑠; 𝑋 = mod(𝑋 + 𝑠𝑡𝑒𝑝, 𝐿); } // 𝒔𝒚𝒎𝒃𝒐𝒍[𝑿] = 𝒔(𝑿 + 𝑳) Then finding table for {𝑡 = 𝑑𝑒𝑐𝑜𝑑𝑖𝑛𝑔𝑇𝑎𝑏𝑙𝑒(𝑋); useSymbol(𝑡. 𝑠𝑦𝑚𝑏𝑜𝑙 ); 𝑋 = 𝑡. 𝑛𝑒𝑤𝑋 + readBits(𝑡. 𝑛𝑏𝐵𝑖𝑡𝑠)} decoding: for 𝑠 = 0 to 𝑚 − 1 do 𝑛𝑒𝑥𝑡[𝑠] = 𝐿𝑠 ; // symbol appearance number for 𝑋 = 0 to 𝐿 − 1 do // fill all positions {𝑡. 𝑠𝑦𝑚𝑏𝑜𝑙 = 𝑠𝑦𝑚𝑏𝑜𝑙[𝑋]; // from symbol spread 𝑥 = 𝑛𝑒𝑥𝑡[𝑡. 𝑠𝑦𝑚𝑏𝑜𝑙] + +; // use symbol and shift appearance 𝑡. 𝑛𝑏𝐵𝑖𝑡𝑠 = 𝑅 − ⌊lg(𝑥 )⌋; // 𝑳 = 𝟐𝑹 𝑡. 𝑛𝑒𝑤𝑋 = (𝑥 ≪ 𝑡. 𝑛𝑏𝐵𝑖𝑡𝑠) − 𝐿; 𝑑𝑒𝑐𝑜𝑑𝑖𝑛𝑔𝑇𝑎𝑏𝑙𝑒[𝑋] = 𝑡; } 15 Precise initialization (heuresis) 0.5+𝑖 𝑁𝑠 = { : 𝑖 = 0, … , 𝐿𝑠 − 1} 𝑝𝑠 are uniform – we need to shift them to natural numbers. (priority queue with put, getmin𝑣) for 𝑠 = 0 to 𝑛 − 1 do put((0.5/𝑝𝑠 , 𝑠)); for 𝑋 = 0 to 𝐿 − 1 do {(𝑣, 𝑠) = getmin𝑣; put((𝑣 + 1/𝑝𝑠 , 𝑠)); 𝑠𝑦𝑚𝑏𝑜𝑙 [𝑋] = 𝑠; } 𝚫𝑯 drops like 𝟏/𝑳𝟐 L ∝ alphabet size 16 tABS and tuning test all possible symbol distributions for binary alphabet store tables for quantized probabilities (p) e.g. 16 ⋅ 16 = 256 bytes ← for 16 state, Δ𝐻 ≈ 0.001 H.264 “M decoder” (arith. coding) tABS 𝑡 = 𝑑𝑒𝑐𝑜𝑑𝑖𝑛𝑔𝑇𝑎𝑏𝑙𝑒[𝑝][𝑋]; 𝑋 = 𝑡. 𝑛𝑒𝑤𝑋 + readBits(𝑡. 𝑛𝑏𝐵𝑖𝑡𝑠); useSymbol(𝑡. 𝑠𝑦𝑚𝑏𝑜𝑙); no branches, no bit-by-bit renormalization state is single number (e.g. SIMD) 17 Additional tANS advantage – simultaneous encryption we can use huge freedom while initialization: choosing symbol distribution – slightly disturb 𝒔(𝒙) using PRNG initialized with cryptographic key ADVANTAGES comparing to standard (symmetric) cryptography: standard, e.g. DES, AES ANS based cryptography (initialized) based on XOR, permutations highly nonlinear operations bit blocks fixed length pseudorandomly varying lengths “brute force” just start decoding perform initialization first for new or QC attacks to test cryptokey cryptokey, fixed to need e.g. 0.1s speed online calculation most calculations while initialization entropy operates on bits operates on any input distribution 18 Summary – we can construct accurate and extremely fast entropy coders: - accurate (and faster) replacement for Huffman coding, - many times faster replacement for Range coding, - faster decoding in adaptive binary case (but more complex encoding), - can simultaneously encrypt the message. Perfect for: DCT/wavelet coefficients, lossless image compression Perfect with: Lempel-Ziv, Burrows-Wheeler Transform ... and many others ... Further research: - finding symbol distribution 𝑠(𝑥): with minimal Δ𝐻 (and quickly), - tune accordingly to 𝑝𝑠 ≈ 𝐿𝑠 /𝐿 approximation (e.g. very low probable symbols should have single appearance at the end of table), - optimal compression of used probability distribution to reduce headers, - maybe finding other low state entropy coding family, like forward decoded, - applying cryptographic capabilities (also without entropy coding), - ... ? 19

Asymmetric Numeral Systems (ANS)

Related documents

Products

Support

Asymmetric Numeral Systems (ANS)

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib