Asymmetric numeral systems

ASYMMETRIC NUMERAL SYSTEMS
AS ACCURATE REPLACEMENT FOR HUFFMAN CODING
Huffman coding (or any prefix codes: Golomb, Elias, unary)
– fast, but operates on an integer number of bits:
approximates probabilities with powers of ½, getting suboptimal compression ratio
Arithmetic coding (or Range coding)
– accurate probabilities, but many times slower (needs more computation, battery drain)
Asymmetric Numeral Systems (ANS) – accurate and faster than Huffman:
we construct a low-state automaton optimized for a given probability distribution
(also allows for simultaneous encryption)
Jarosław Duda
Warszawa, 31-01-2015
Huffman vs ANS in compressors (LZ + entropy coder):
from Matt Mahoney's benchmarks, http://mattmahoney.net/dc/text.html
enwik8 (10^8 bytes):

compressor                 compressed size   encode time   decode time
                           [bytes]           [ns/byte]     [ns/byte]
lzturbo 1.2 –39            26,915,461        582           2.8
LZA 0.70b –mx9 –b7 –h7     27,111,239        378           10
WinRAR 5.00 –ma5 –m5       27,835,431        1004          31
WinRAR 5.00 –ma5 –m2       29,758,785        228           30
lzturbo 1.2 –32            30,979,376        19            2.7
zhuff 0.97 –c2             34,907,478        24            3.5
gzip 1.3.5 –9              36,445,248        101           17
pkzip 2.0.4 –ex            36,556,552        171           50
ZSTD 0.0.1                 40,024,854        7.7           3.8
WinRAR 5.00 –ma5 –m1       40,565,268        54            31
zhuff, ZSTD (Yann Collet): LZ4 + tANS (switched from Huffman)
lzturbo (Hamid Bouzidi): LZ + tANS (switched from Huffman)
LZA (Nania Francesco): LZ + rANS (switched from range coding)
e.g. lzturbo vs gzip: better compression, 5x faster encoding, 6x faster decoding,
saving time and energy in a very frequent task
We need 𝑛 bits of information to choose one of 2ⁿ possibilities.
For length-𝑛 0/1 sequences with 𝑝𝑛 ones, how many bits do we need to choose one?
Entropy coder: encodes a sequence with probability distribution (𝑝𝑠)𝑠=0..𝑚−1
using asymptotically at least 𝐻 = ∑𝑠 𝑝𝑠 lg(1/𝑝𝑠) bits/symbol (𝐻 ≤ lg 𝑚).
Seen as a weighted average: a symbol of probability 𝑝 contains lg(1/𝑝) bits.
Encoding a (𝑝𝑠) distribution with an entropy coder optimal for a (𝑞𝑠) distribution costs
Δ𝐻 = ∑𝑠 𝑝𝑠 lg(1/𝑞𝑠) − ∑𝑠 𝑝𝑠 lg(1/𝑝𝑠) = ∑𝑠 𝑝𝑠 lg(𝑝𝑠/𝑞𝑠) ≈ (1/ln 4) ∑𝑠 (𝑝𝑠 − 𝑞𝑠)²/𝑝𝑠
more bits/symbol – the so-called Kullback-Leibler "distance" (not symmetric).
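For concreteness, a minimal C sketch (not from the slides) computing 𝐻 and Δ𝐻, e.g. for 𝑝 = (0.3, 0.7) approximated by the Huffman-like 𝑞 = (½, ½):

#include <math.h>
#include <stdio.h>

/* Entropy H = sum_s p_s lg(1/p_s), in bits/symbol. */
static double entropy(const double *p, int m) {
    double h = 0;
    for (int s = 0; s < m; s++)
        if (p[s] > 0) h += p[s] * log2(1.0 / p[s]);
    return h;
}

/* Kullback-Leibler penalty: extra bits/symbol when coding
   distribution p with a coder optimal for q. */
static double kl_penalty(const double *p, const double *q, int m) {
    double d = 0;
    for (int s = 0; s < m; s++)
        if (p[s] > 0) d += p[s] * log2(p[s] / q[s]);
    return d;
}

int main(void) {
    double p[2] = {0.3, 0.7};   /* true distribution */
    double q[2] = {0.5, 0.5};   /* prefix-code approximation: powers of 1/2 */
    printf("H      = %.4f bits/symbol\n", entropy(p, 2));       /* ~0.8813 */
    printf("DeltaH = %.4f bits/symbol\n", kl_penalty(p, q, 2)); /* ~0.1187 */
    return 0;
}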
Huffman coding (HC), prefix codes:
most of everyday compression,
e.g. zip, gzip, cab, jpeg, gif, png, tiff, pdf, mp3…
example of Huffman penalty for the truncated 𝜌(1 − 𝜌)ⁿ distribution
(a prefix code can't use less than 1 bit/symbol)
Zlibh (the fastest generally available implementation):
Encoding ~ 320 MB/s, Decoding ~ 300-350 MB/s (per core, 3 GHz)
Range coding (RC): large-alphabet arithmetic coding, needs multiplication;
e.g. 7-Zip, Google's VP video codecs (e.g. YouTube, Hangouts).
Encoding ~ 100-150 MB/s, Decoding ~ 80 MB/s
(binary) Arithmetic coding (AC): H.264, H.265 video, ultracompressors e.g. PAQ
Encoding/decoding ~ 20-30 MB/s
compression ratio for the truncated 𝜌(1 − 𝜌)ⁿ distribution (8/H = theoretical limit):

         𝜌 = 0.5   𝜌 = 0.6   𝜌 = 0.7   𝜌 = 0.8   𝜌 = 0.9   𝜌 = 0.95   𝜌 = 0.99
8/H      4.001     4.935     6.344     8.851     15.31     26.41      96.95
zlibh    3.99      4.78      5.58      6.38      7.18      7.58       7.90
FSE      4.00      4.93      6.33      8.84      15.29     26.38      96.43
Huffman: 1 byte → at least 1 bit, so the ratio is ≤ 8 here
Asymmetric Numeral Systems (ANS), tabled variant (tANS) – without multiplication.
FSE implementation of tANS: Encoding ~ 350 MB/s, Decoding ~ 500 MB/s
RC → ANS: ~7x decoding speedup, no multiplication (switched, e.g., in the LZA compressor)
HC → ANS: better compression and ~1.5x decoding speedup (e.g. zhuff, lzturbo)
Operating on a fractional number of bits
arithmetic coding: restricting a range 𝑁 to a length-𝐿 subrange contains lg(𝑁/𝐿) bits
ANS: a natural number 𝑥 contains lg(𝑥) bits
adding a symbol of probability 𝑝 – containing lg(1/𝑝) bits:
AC:  lg(𝑁/𝐿) + lg(1/𝑝) = lg(𝑁/𝐿′) for 𝐿′ ≈ 𝑝 ⋅ 𝐿
ANS: lg(𝑥) + lg(1/𝑝) = lg(𝑥′) for 𝑥′ ≈ 𝑥/𝑝
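For example, the standard binary numeral system is the uniform case 𝑝 = ½ of the ANS rule:
appending a digit 𝑠 ∈ {0,1} gives 𝑥′ = 2𝑥 + 𝑠 ≈ 𝑥/(1/2), so lg(𝑥) grows by exactly 1 bit;
ANS generalizes this step to 𝑥′ ≈ 𝑥/𝑝 for non-uniform 𝑝.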
uABS – uniform binary variant (𝒜 = {0,1}) – extremely accurate
Assume a binary alphabet, 𝑝 ≔ Pr(1); denote 𝑥𝑠 = #{𝑦 < 𝑥: 𝑠(𝑦) = 𝑠} ≈ 𝑥 ⋅ 𝑝𝑠
For uniform symbol distribution we can choose:
𝑥₁ = ⌈𝑥𝑝⌉,  𝑥₀ = 𝑥 − 𝑥₁ = 𝑥 − ⌈𝑥𝑝⌉
𝑠(𝑥) = 1 iff there is a jump at the next position: 𝑠 = 𝑠(𝑥) = ⌈(𝑥 + 1)𝑝⌉ − ⌈𝑥𝑝⌉
decoding function: 𝐷(𝑥) = (𝑠, 𝑥𝑠)
its inverse – coding function:
𝐶(0, 𝑥) = ⌈(𝑥 + 1)/(1 − 𝑝)⌉ − 1
𝐶(1, 𝑥) = ⌊𝑥/𝑝⌋
For 𝑝 = Pr(1) = 1 − Pr(0) = 0.3: [the slide tabulates 𝑠(𝑥), 𝐷 and 𝐶 for small 𝑥]
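A minimal C sketch of these exact formulas (illustration only, no renormalization; not the slides' code) that prints such a table and checks that the coding function inverts decoding:

#include <math.h>
#include <stdio.h>

/* uABS with p = Pr(1); exact formulas from above
   (double precision is fine for this small demo). */
static int s_of(int x, double p) {       /* s(x) = ceil((x+1)p) - ceil(xp) */
    return (int)(ceil((x + 1) * p) - ceil(x * p));
}
static int C(int s, int x, double p) {   /* coding: (s, x) -> x' */
    return s ? (int)floor(x / p)
             : (int)ceil((x + 1) / (1 - p)) - 1;
}
static void D(int x, double p, int *s, int *xs) { /* decoding: x -> (s, x_s) */
    *s = s_of(x, p);
    *xs = *s ? (int)ceil(x * p) : x - (int)ceil(x * p);
}

int main(void) {
    double p = 0.3;                      /* p = Pr(1) = 0.3 as above */
    for (int x = 0; x < 16; x++) {
        int s, xs;
        D(x, p, &s, &xs);
        /* C(s, x_s) must recover x: D and C are mutual inverses */
        printf("x=%2d  s(x)=%d  x_s=%2d  C(s,x_s)=%2d\n", x, s, xs, C(s, xs, p));
    }
    return 0;
}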
uABS / rABS / arithmetic coding – one decoding step each
(32-bit state, 16-bit renormalization, mask = 0xffff):

uABS:
xp = (uint64_t)x * p0;
out[i] = ((xp & mask) >= p0);
xp >>= 16;
x = out[i] ? xp : x - xp;
if (x < 0x10000) { x = (x << 16) | *ptr++; }

rABS:
xf = x & mask;
xn = p0 * (x >> 16);
if (xf < p0) { x = xn + xf;  out[i] = 0; }
else         { x -= xn + p0; out[i] = 1; }
if (x < 0x10000) { x = (x << 16) | *ptr++; }

arithmetic coding:
split = low + ((uint64_t)(hi - low) * p0 >> 16);
out[i] = (x > split);
if (out[i]) { low = split + 1; }
else        { hi = split; }
if ((low ^ hi) < 0x10000) {
  x = (x << 16) | *ptr++;
  low <<= 16;
  hi = (hi << 16) | mask;
}
RENORMALIZATION to prevent π‘₯ → ∞
Enforce 𝑥 ∈ 𝐼 = {𝐿, … , 2𝐿 − 1} by transmitting the lowest bits to the bitstream.
The "buffer" 𝑥 contains lg(𝑥) bits of information – produce bits once they accumulate.
A symbol of Pr = 𝑝 contains lg(1/𝑝) bits: lg(𝑥) → lg(𝑥) + lg(1/𝑝) (modulo 1)
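For illustration, a possible encoder step matching the slide's rABS decoder and this 16-bit renormalization (a sketch, not the slides' code: emit16 is a hypothetical output routine; note ANS is last-in-first-out, so the emitted 16-bit words must be consumed in reverse order by the decoder's *ptr++):

#include <stdint.h>

/* rABS encoding step: 32-bit state x kept in I = {2^16, ..., 2^32 - 1},
   p0 = Pr(0) scaled to 16 bits (0 < p0 < 0x10000). */
static uint32_t rabs_encode_bit(uint32_t x, int bit, uint32_t p0,
                                void (*emit16)(uint16_t)) {
    uint32_t p = bit ? (0x10000u - p0) : p0;   /* Pr(bit), scaled */
    while (x >= (p << 16)) {                   /* renormalize: keep x' < 2^32 */
        emit16((uint16_t)(x & 0xffffu));       /* transmit lowest bits */
        x >>= 16;
    }
    /* coding step x -> x' ~ x / Pr(bit); exact inverse of the decoder above */
    return bit ? ((x / p) << 16) + (x % p) + p0
               : ((x / p) << 16) + (x % p);
}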
tABS: test all possible symbol distributions for a binary alphabet and
store tables for quantized probabilities 𝑝 – e.g. 16 ⋅ 16 = 256 bytes;
for 16 states, Δ𝐻 ≈ 0.001
[plot on the slide: Δ𝐻, compared against the H.264 "M decoder" (arithmetic coding)]
tABS decoding step:
t = decodingTable[p][X];
X = t.newX + readBits(t.nbBits);
useSymbol(t.symbol);
no branches, no bit-by-bit renormalization;
the state is a single number (e.g. convenient for SIMD)
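A C sketch of what such a table entry and decoding step could look like (newX, nbBits, symbol are the slide's names; the 16×16 table shape and the readBits signature are assumptions):

#include <stdint.h>

/* One entry of decodingTable[p][X] for a 16-state tABS automaton. */
typedef struct { uint8_t newX; uint8_t nbBits; uint8_t symbol; } tabs_entry;

/* One decoding step: pure table lookup plus a bulk bit read -
   no branches, no bit-by-bit renormalization. */
static inline int tabs_decode_step(const tabs_entry table[16][16],
                                   int p,             /* quantized probability */
                                   uint8_t *X,        /* automaton state */
                                   uint32_t (*readBits)(int)) {
    tabs_entry t = table[p][*X];
    *X = (uint8_t)(t.newX + readBits(t.nbBits));
    return t.symbol;
}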
Additional tANS advantage – simultaneous encryption:
we can use the huge freedom while initializing – choosing the symbol distribution –
slightly disturbing 𝑠(𝑥) using a PRNG initialized with a cryptographic key.
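One conceivable toy sketch of that idea (an assumption-laden illustration, not the slides' construction: rand() stands in for a key-seeded cryptographic PRNG, and the disturbance is adjacent swaps that preserve each symbol's count, i.e. the coded probabilities):

#include <stdint.h>
#include <stdlib.h>

/* Build s(x) for a 16-state binary tANS automaton: start from a regular
   spread of L1 ones among 16 states, then slightly disturb it per key. */
static void keyed_spread(uint8_t s_of_x[16], int L1, uint64_t key) {
    for (int x = 0; x < 16; x++)        /* regular spread: "jump" rule */
        s_of_x[x] = (uint8_t)((((x + 1) * L1) >> 4) - ((x * L1) >> 4));
    srand((unsigned)key);               /* toy stand-in for a keyed PRNG */
    for (int x = 0; x + 1 < 16; x++)
        if (rand() & 1) {               /* slight, count-preserving swaps */
            uint8_t t = s_of_x[x];
            s_of_x[x] = s_of_x[x + 1];
            s_of_x[x + 1] = t;
        }
}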
ADVANTAGES compared to standard (symmetric) cryptography:

                          standard, e.g. DES, AES     ANS-based cryptography (initialized)
based on                  XOR, permutations           highly nonlinear operations
bit blocks                fixed length                pseudorandomly varying lengths
"brute force"/QC attacks  just start decoding         perform initialization first for each
                          to test a cryptokey         new cryptokey, fixed to need e.g. 0.1s
speed                     online calculation          most calculations while initialization
entropy                   operates on bits            operates on any input distribution
The entropy coder is usually the bottleneck of compression.
ANS offers fast, accurate processing of a large alphabet without multiplication:
- accurate and faster replacement for Huffman coding,
- many (~7) times faster replacement for Range coding,
- faster decoding in the adaptive binary case,
- can include a checksum for free (test whether the final decoding state is right),
- can simultaneously encrypt the message.
Perfect for: DCT/wavelet coefficients, lossless image compression
Perfect with: Lempel-Ziv, Burrows-Wheeler Transform
Further research:
- applications – ANS improves both speed and compression ratio,
- finding symbol distributions 𝑠(𝑥) with minimal Δ𝐻 (and quickly),
- optimal compression of the used probability distribution to reduce headers,
- maybe finding other low-state entropy coding families, e.g. forward-decoded,
- applying the cryptographic capabilities (also without entropy coding):
  cheap compression+encryption: HDD, wearables, sensors, IoT, RFIDs,
- ... ?