Why do we need compression?
• Why do we need image compression?
  – Example: a digital camera (4 Mpixel)
    Raw data: 24 bits/pixel, 5.3 M pixels → 16 MB per picture
    4 GB memory card ($10-30) → 250 pictures
  – raw image (16 MB) → JPEG encoder → compressed JPEG file (1 MB)
    With a compression ratio of 16 → 4000 pictures
Roadmap to Image Coding
• Introduction to data compression
– A modeling perspective
– Shannon’s entropy and Rate-Distortion Theory* (skipped)
– Arithmetic coding and context modeling
• Lossless image compression (covered in EE465)
– Spatially adaptive prediction algorithms
• Lossy image compression
– Before EZW era: first-generation wavelet coders
– After EZW era: second-generation wavelet coders
– A quick tour of JPEG2000
• New direction in image coding
Modeler’s View on Image Coding
• Spatial-domain models
  – Stationary process → conventional MED, GAP
  – Non-stationary process → least-square based, edge-directed prediction
  – Nonparametric (patch-based) → intra coding in H.264
• Transform-domain models
  – Stationary GGD → first-generation wavelet coders
  – Non-stationary GGD → second-generation wavelet coders
  – Patch-based transform models → next-generation coders
Two Regimes
• Lossless coding
  – No distortion is tolerable
  – The decoded signal is mathematically identical to the encoded one
• Lossy coding
  – Distortion is allowed for the purpose of achieving a higher compression ratio
  – The decoded signal should be perceptually similar to the encoded one
Data Compression Basics
Discrete source: X is a discrete random variable with x ∈ {1, 2, ..., N}
p_i = Prob(x = i), i = 1, 2, ..., N, and Σ_{i=1}^N p_i = 1

Shannon's source entropy formula:
H(X) = - Σ_{i=1}^N p_i log2 p_i   (bits/sample, or bps)
The probabilities p_i play the role of weighting coefficients.
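To make the formula concrete, here is a minimal Python sketch (the function name and the example distribution are illustrative, not from the course material) that evaluates H(X) for a given probability vector:

import math

def entropy(p):
    """Shannon entropy in bits/sample of a discrete distribution p (must sum to 1)."""
    assert abs(sum(p) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Example: the five-symbol source {e, a, i, o, u} used in the Huffman example later
print(entropy([0.4, 0.2, 0.2, 0.1, 0.1]))   # about 2.12 bits/sample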
Code Redundancy
Redundancy = practical performance - theoretical bound:
  r = l̄ - H(X) ≥ 0
Practical performance (average code length): l̄ = Σ_{i=1}^N p_i l_i, where l_i is the length of the codeword assigned to the i-th symbol
Theoretical bound (entropy): H(X) = Σ_{i=1}^N p_i log2(1/p_i)
Note: if we represent each symbol by q bits (fixed-length codes), the redundancy is simply q - H(X) bps.
How to achieve source entropy?
discrete source X → entropy coding (driven by the probability model P(X)) → binary bit stream

Note: The above entropy coding problem is posed under two simplifying assumptions: the discrete source X is memoryless, and P(X) is completely known. These assumptions often do not hold for real-world data such as images; we will discuss how to relax them later.
Two Goals of VLC design
• Achieve the optimal code length (i.e., minimal redundancy)
  For an event x with probability p(x), the optimal code length is ⌈-log2 p(x)⌉ bits, where ⌈x⌉ denotes the smallest integer not less than x (e.g., ⌈3.4⌉ = 4)
  Code redundancy: r = l̄ - H(X) ≥ 0
  Unless the probabilities of the events are all powers of 2, we often have r > 0
• Satisfy the prefix condition
Prefix condition
No codeword is allowed to be the prefix of any other codeword.
(Figure: two binary codeword trees. "Codeword 1" lists all binary strings 0, 1, 00, 01, 10, 11, ...; "codeword 2" shows a prefix code whose codewords 0, 10, 110, 111, ... sit only at the leaves.)
Huffman Codes (Huffman, 1952)
• Coding procedure for an N-symbol source
  – Source reduction
    • List all probabilities in descending order
    • Merge the two symbols with the smallest probabilities into a new compound symbol
    • Repeat the above two steps N-2 times
  – Codeword assignment
    • Start from the smallest reduced source and work back to the original source
    • Each merging point corresponds to a node in the binary codeword tree
A Toy Example
Source: symbols {e, a, i, o, u} with p(e) = 0.4, p(a) = 0.2, p(i) = 0.2, p(o) = 0.1, p(u) = 0.1

Source reduction (merge the two smallest probabilities at each step):
{0.4, 0.2, 0.2, 0.1, 0.1} → {0.4, 0.2, 0.2, (ou) = 0.2} → {0.4, (iou) = 0.4, 0.2} → {(aiou) = 0.6, 0.4}
compound symbols: (ou), (iou), (aiou)
Codeword assignment (walk back from the last reduction to the original source):
  root:   (aiou) = 0,  e = 1
  (aiou): (iou) = 00,  a = 01
  (iou):  i = 000,  (ou) = 001
  (ou):   o = 0010,  u = 0011
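To connect the coding procedure with this example, here is a minimal heap-based Python sketch of Huffman code construction (the function name is mine, not from the slides). Because of tie-breaking it may return a different but equally optimal set of codeword lengths than the tree above; the average length, 2.2 bits/symbol versus an entropy of about 2.12 bits, is the same:

import heapq

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> binary codeword string."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)                # smallest probability
        p2, _, c2 = heapq.heappop(heap)                # second smallest
        merged = {s: "0" + w for s, w in c1.items()}   # prepend one bit per branch
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"e": 0.4, "a": 0.2, "i": 0.2, "o": 0.1, "u": 0.1}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(code, avg_len)   # average length = 2.2 bits/symbol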
Arithmetic Coding
• One of the major milestones in data compression (just like the Lempel-Ziv coding used in WinZIP)
• A building block of almost all existing compression algorithms, including those for text, audio, image and video
• A remarkably simple idea that is easy to implement (especially computationally efficient in the special case of binary arithmetic coding)
Basic Idea
• The input sequence is mapped to a unique real number in [0,1]
  – The more symbols are coded, the smaller the interval becomes (and therefore the more bits it takes to represent the interval)
  – The size of the interval is proportional to the probability of the whole sequence
• Note that we still assume source X is memoryless; sources with memory will be handled by the context modeling techniques discussed next
Example
Alphabet: {E, Q, S, U, Z}
Input sequence: SQUEEZ…
Probability model: P(E) = 0.429, P(Q) = 0.142, P(S) = 0.143, P(U) = 0.143, P(Z) = 0.143
P(SQUEEZ) = P(S) P(Q) P(U) P(E)^2 P(Z)
Example (Cont'd)
The symbol sequence SQUEEZ… with probability P(X) is mapped to the interval [0.64769, 0.64777].
The mapping of a real number to a binary bit stream is easy: the first bit selects the half of [0,1] containing the interval, the second bit halves that interval again, and so on.
Notes:
• Any number between 0.64769 and 0.64777 will produce a sequence starting with SQUEEZ.
• How do we know when the sequence stops? That is, how can the decoder distinguish between SQUEEZ and SQUEEZE?
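As a rough illustration of the interval-narrowing idea, here is a minimal Python sketch of my own (not the course implementation). It assigns cumulative sub-intervals in the listed order of the alphabet, which is an assumption on my part, so the exact endpoints it prints may differ slightly from the numbers on the slide; the width of the final interval always equals P(SQUEEZ):

def arithmetic_interval(sequence, probs):
    """Return the final [low, high) interval for a symbol sequence under model probs."""
    # Cumulative sub-intervals of [0,1), in the (assumed) listed order of the alphabet
    cum, c = {}, 0.0
    for s, p in probs.items():
        cum[s] = (c, c + p)
        c += p
    low, width = 0.0, 1.0
    for s in sequence:
        lo_s, hi_s = cum[s]
        low, width = low + width * lo_s, width * (hi_s - lo_s)   # narrow the interval
    return low, low + width

probs = {"E": 0.429, "Q": 0.142, "S": 0.143, "U": 0.143, "Z": 0.143}
low, high = arithmetic_interval("SQUEEZ", probs)
print(low, high)   # a narrow interval whose width equals P(S)P(Q)P(U)P(E)^2 P(Z)

A practical coder (e.g., the Witten-Neal-Cleary implementation cited on the Implementation Issues slide) rescales the interval and emits bits incrementally to avoid this finite-precision bookkeeping.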
Another Example
Solution: use a special symbol to denote the end of block (EOB).
For example, if we use "!" as the EOB symbol, "eaii" becomes "eaii!".
In other words, we assign a nonzero probability to the EOB symbol.
Implementation Issues
Witten, I. H., Neal, R. M., and Cleary, J. G., "Arithmetic coding for data compression," Commun. ACM 30(6), June 1987, pp. 520-540.
Arithmetic Coding Summary
• Based on the given probability model P(X), AC maps the symbol sequence to a unique number between 0 and 1, which can then be conveniently represented by binary bits
• You will compare Huffman coding and arithmetic coding in your homework and learn how to use it in the computer assignment
Context Modeling
• Arithmetic coding (entropy coding) solves the problem under the assumption that P(X) is known
• In practice, we don't know P(X) and have to estimate it from the data
• More importantly, the memoryless assumption on source X does not hold for real-world data
Probability Estimation Problem
Given a sequence of symbols, how do we estimate the probability of each individual symbol?
Forward solution:
The encoder counts the frequency of each symbol over the whole sequence and transmits the frequency table to the decoder as overhead.
Backward solution (more popular in practice):
Both encoder and decoder count the frequency of each symbol on the fly from the causal past only (so no overhead is needed).
Examples
For simplicity, we consider a binary symbol sequence (the M-ary case is conceptually similar):
S = {0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1}
Forward approach: count 4 "1"s and 12 "0"s → P(0) = 3/4, P(1) = 1/4
Backward approach (counts N(0), N(1) initialized to 1):

            P(0)   P(1)   N(0)   N(1)
  start     1/2    1/2    1      1
  after 0   2/3    1/3    2      1
  after 0   3/4    1/4    3      1
  after 0   4/5    1/5    4      1
  ...
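A minimal Python sketch of the backward (on-the-fly) estimator tabulated above, with both counts initialized to 1 (the function name is mine). Encoder and decoder run the same code on the causal past, so no frequency table needs to be transmitted:

def backward_estimate(bits):
    """Yield the estimate (P(0), P(1)) available *before* coding each bit."""
    n0, n1 = 1, 1                      # counts initialized to 1, as in the table
    for b in bits:
        total = n0 + n1
        yield n0 / total, n1 / total   # model used to code the current bit
        if b == 0:
            n0 += 1
        else:
            n1 += 1

for p0, p1 in backward_estimate([0, 0, 0, 1]):
    print(round(p0, 3), round(p1, 3))  # 0.5/0.5, 0.667/0.333, 0.75/0.25, 0.8/0.2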
Backward Adaptive Estimation
The probability estimation is based on the causal past within a specified window of length T (i.e., the source is assumed to be Markovian).
Such adaptive estimation is particularly effective for handling sequences with dynamically varying statistics.
Example:
0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,1,1,0,0,0,0
Counting only the T most recent symbols inside a sliding window gives, for the window position illustrated, P(0) = 0.6, P(1) = 0.4.
Now Comes Context
• Importance of context
– Context is a fundamental concept that helps us resolve ambiguity
  • The best-known example: by quoting Darwin "out of context," creationists attempt to convince their followers that Darwin didn't believe the eye could evolve by natural selection.
• Why do we need context?
  – To handle the memory in the source
  – Context-based modeling often leads to better estimation of probability models
Order of Context
"q u o t e"  →  first-order context.  Note: P(u) << P(u|q)
"s h o c k w a v e"  →  second-order context
Context dilution problem: if source X has N different symbols, K-th order context modeling defines N^K different contexts (e.g., consider N = 256 for images).
Context-Adaptive Probability Estimation
Rule of thumb: in estimating probabilities, only those symbols occurring in the same context are used when counting frequencies.
1D example: 0,1,0,1,0,1,0,1, 0,1,0,1,0,1,0,1, 0,1,0,1,0,1,0,1
  Zero-order (no) context: P(0) = P(1) ≈ 1/2
  First-order context: P(1|0) = P(0|1) = 1, P(0|0) = P(1|1) = 0
2D Example (Binary Image)
000000
011110
011110
011110
011110
000000
Zero-order context: P(0) = 20/36 = 5/9, P(1) = 16/36 = 4/9
First-order context (W = west neighbor of the current pixel X): P(X|W) = ?
  W = 0 (20 occurrences): P(0|0) = 4/5, P(1|0) = 1/5
  W = 1 (16 occurrences): P(0|1) = 1/4, P(1|1) = 3/4
Fourth-order context (neighbors NW, N, NE, W): e.g., P(1|1111) = 1
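Here is a minimal Python sketch of my own that reproduces the zero-order and first-order counts on this slide; it assumes pixels outside the image are treated as 0, which is what makes the W = 0 context occur 20 times:

from collections import Counter

rows = ["000000", "011110", "011110", "011110", "011110", "000000"]
pixels = [[int(c) for c in row] for row in rows]

# Zero-order context: plain symbol frequencies
zero = Counter(x for row in pixels for x in row)
print(zero[0] / 36, zero[1] / 36)                 # 5/9, 4/9

# First-order context: condition on the west neighbor W (assumed 0 outside the image)
ctx = Counter()
for row in pixels:
    for c, x in enumerate(row):
        w = row[c - 1] if c > 0 else 0            # boundary pixels get W = 0
        ctx[(w, x)] += 1
for w in (0, 1):
    total = ctx[(w, 0)] + ctx[(w, 1)]
    print(w, ctx[(w, 0)] / total, ctx[(w, 1)] / total)   # P(0|W), P(1|W)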
Data Compression Summary
• Entropy coding is solved by arithmetic coding techniques
• Context plays an important role in the statistical modeling of sources with memory (there exists a context dilution problem, which can be handled by quantizing the context information)
• Quantization of a memoryless source is solved by the Lloyd-Max algorithm
Quantization Theory (Rate-Distortion Theory)
A quantizer Q maps x to x̂; the quantization noise is e = X - X̂.
For a continuous random variable with probability density function f(x), distortion is defined by
  D = ∫ f(x) (x - x̂)^2 dx
For a discrete random variable with probabilities p_i, distortion is defined by
  D = Σ_{i=1}^N p_i (x_i - x̂_i)^2
Recall: Quantization Noise of UQ
For a uniform quantizer (UQ) with step size Δ applied to an input uniformly distributed over [-A, A], the quantization noise e is also uniformly distributed, over [-Δ/2, Δ/2], with density f(e) = 1/Δ.
Recall that the variance of U[-Δ/2, Δ/2] is σ² = Δ²/12.
6dB/bit Rule of UQ
Signal: X ~ U[-A, A], so σ_s² = (2A)²/12 = A²/3
Choose N = 2^n (n-bit) codewords for X, so the quantization step size is Δ = 2A/N
Noise: e ~ U[-Δ/2, Δ/2], so σ_e² = Δ²/12
SNR = 10 log10(σ_s²/σ_e²) = 10 log10((A²/3)/(Δ²/12)) = 10 log10(N²) = 20 log10 N ≈ 6.02 n (dB)
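A quick numerical check of the 6 dB/bit rule, as a sketch of my own (a mid-rise uniform quantizer applied to uniformly distributed samples; not code from the course):

import math, random

def uq_snr(n_bits, A=1.0, num_samples=200_000):
    """Empirical SNR (dB) of an n-bit uniform quantizer on X ~ U[-A, A]."""
    N = 2 ** n_bits
    delta = 2 * A / N                              # quantization step size
    sig_pow = err_pow = 0.0
    for _ in range(num_samples):
        x = random.uniform(-A, A)
        idx = min(int((x + A) / delta), N - 1)     # cell index
        xq = -A + (idx + 0.5) * delta              # reconstruct at the cell center
        sig_pow += x * x
        err_pow += (x - xq) ** 2
    return 10 * math.log10(sig_pow / err_pow)

for n in (4, 6, 8):
    print(n, round(uq_snr(n), 2))   # close to 6.02 * n dB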
Shannon's R-D Function
The R-D function of source X determines the optimal tradeoff between rate R and distortion D:
  R(D) = min R  subject to  E[(X - X̂)²] ≤ D
(Figure: the R-D curve in the rate-distortion plane.)
A Few Cautionary Notes Regarding Distortion
• Unlike rate (how many bits are used?), the definition of distortion is not trivial at all
• Mean squared error (MSE) is widely used and will be our focus in this class
• However, for image signals MSE correlates poorly with subjective quality (the design of perceptual image coders is a very interesting research problem that is still largely open)
Gaussian Random Variable
X: a random variable with Gaussian distribution N(0, σ²). Its rate-distortion function is known in closed form:
  R(D) = (1/2) log2(σ²/D)   for 0 ≤ D ≤ σ²
  R(D) = 0                  for D > σ²
Equivalently, D(R) = σ² 2^(-2R)
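A small sketch (mine) evaluating the closed-form D(R) = σ² 2^(-2R) above; note that each additional bit of rate cuts the distortion by a factor of 4, i.e., about 6.02 dB, mirroring the 6 dB/bit rule for the uniform quantizer:

import math

def gaussian_distortion(rate, sigma2=1.0):
    """Distortion-rate function D(R) of an N(0, sigma2) source under MSE."""
    return sigma2 * 2.0 ** (-2.0 * rate)

for R in range(1, 5):
    D = gaussian_distortion(R)
    print(R, D, round(10 * math.log10(1.0 / D), 2))   # SNR grows by ~6.02 dB per bit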
Quantizer Design Problem
For a memoryless source X with pdf P(X), how do we design a quantizer (i.e., where do we place the L = 2^K codewords) to minimize the distortion?
Solution: the Lloyd-Max algorithm minimizes the MSE (we will study it in detail on the blackboard).
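The derivation is left for the blackboard; as a preview, here is a minimal sample-based sketch of my own (essentially 1-D k-means) that alternates the two Lloyd-Max conditions: decision boundaries at the midpoints of adjacent codewords, and each codeword at the centroid of its decision region:

import random

def lloyd_max(samples, num_levels, iters=50):
    """Sample-based Lloyd iteration; returns the reconstruction levels (codewords)."""
    lo, hi = min(samples), max(samples)
    # Initialize codewords uniformly over the sample range
    codes = [lo + (i + 0.5) * (hi - lo) / num_levels for i in range(num_levels)]
    for _ in range(iters):
        # Condition 1: decision boundaries are midpoints between adjacent codewords
        bounds = [(codes[i] + codes[i + 1]) / 2 for i in range(num_levels - 1)]
        # Partition the samples into decision regions
        cells = [[] for _ in range(num_levels)]
        for x in samples:
            cells[sum(x > b for b in bounds)].append(x)
        # Condition 2: each codeword is the centroid (mean) of its region
        codes = [sum(c) / len(c) if c else codes[k] for k, c in enumerate(cells)]
    return codes

samples = [random.gauss(0.0, 1.0) for _ in range(20_000)]
print(lloyd_max(samples, 4))   # approximately the optimal 4-level quantizer for N(0,1)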
Rate Allocation Problem*
Given a budget of R bits, how should we allocate them across the subbands (LL, HL, LH, HH) to minimize the overall MSE distortion?
Solution: the Lagrangian multiplier technique (we will study it in detail on the blackboard).
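As a preview of the Lagrangian (equal-slope) solution, here is a sketch under assumptions of my own: each subband i is modeled by the high-rate approximation D_i = σ_i² 2^(-2 R_i), for which setting all distortion slopes equal gives a closed-form allocation around the geometric mean of the variances; negative rates are simply clipped to zero here, a crude stand-in for the exact solution:

import math

def allocate_rates(variances, total_bits):
    """Equal-slope allocation assuming D_i = var_i * 2^(-2 R_i) and sum(R_i) = total_bits."""
    n = len(variances)
    avg = total_bits / n
    log_gm = sum(math.log2(v) for v in variances) / n       # log2 of the geometric mean
    rates = [avg + 0.5 * (math.log2(v) - log_gm) for v in variances]
    return [max(r, 0.0) for r in rates]                     # crude fix-up: no negative rates

# Hypothetical subband variances for LL, HL, LH, HH and a total budget of 8 bits
print(allocate_rates([100.0, 10.0, 10.0, 1.0], total_bits=8))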
Gap Between Theory and Practice
• Information-theoretic results offer little help in the practice of data compression
• What is the entropy of English text, audio, speech or images?
  – Curse of dimensionality
• Without exact knowledge of the subband statistics, how can we solve the rate allocation problem?
  – Image subbands are nonstationary and non-Gaussian
• What is the class of image data we want to model in the first place?
  – Importance of understanding the physical origin of the data and its implications for compression