10110100 Compression This part of the course ...

advertisement
Compression
Kurs INFMKT
Compression
10110100
Ifi, UiO
Norsk Regnesentral
Vårsemester 2003
Wolfgang Leister
This part of the course ...
• ... is held at Ifi, UiO ...
(Wolfgang Leister)
• … and at University College Karlsruhe
(Peter Oel, Clemens Knoerzer)
Overview
• Information Theory
• Data Compression
Preview
• Image Formats
– JPEG / JIFF
– Wavelet Compression
– Fractal Compression
• Video Formats
– MJPEG
– MPEG
Multimedia data types
•
•
•
•
•
•
•
Images
Graphics
Video / Image sequences
Audio
3D-Data
Text / Documents
others
Multimedia data types
•
•
•
•
•
•
•
Images
Graphics
Video / Image sequences
Audio
3D-Data
Text / Documents
others
Information Theory
• Symbols: A = a0,a1 , ... aN-1
• Coding: C = c 0,c1, ... cN-1
– trivial Coding (constant Code Length)
• ci = b i0,bi1 , ... biM-1
• i∈[0,N-1]
• bij∈{0,1}
• M = log2 N
(binary)
(Code length, number of bits)
• Distribution: P = p 0,p1, ... pN-1
– probability for symbols
Information Theory
Information content: (Entropy)
N −1
H0 = −∑ p i log p i
i= 0
(Basis = 2 ⇒ bit/code word)
Example: Equal distribution
p i = N1
N −1
H0 = −∑ N1 log N1 =− log N1 =logN
i= 0
Information theory
• Coding: C = c 0,c1, ... cN-1
– ci = b i0,b i1, ... bili-1
• i∈[0,N-1]
• bij∈{0,1}
• li = length of code word c i
• average code length:
N−1
L = ∑ pi l i ≥ H 0
i= 0
Information Theory
• When is L = H0 ?
li = − log 2 p i
p i = 21k
p i ≠ 21k ?
• What if
– Group events into one code word:
(ai1,........,ain) → ci
– Every (ai)-combination must be available:
Nn code words
–Better: arithmetic coding
Intermezzo ...
Why Data compression?
• An image says more than a thousand
words …
• An image needs more space than a
thousand words …
• Data are contain redundancy ...
• Humans love redundancy …
lossless
run length encoding
optimal Codes
adaptive Codes
with loss
Quantizing
Clustering
Descriptive
Transformation
Compression techniques
Compression techniques
lossless
run length encoding
with loss
Compression techniques
lossless
run length encoding
run length encoding:
000011100000000110000101111110000000000
with loss (4x0)(3x1)(8x0)(2x1)(4x0)(1x1)(1x0)(6x1)(10x0)
4,3,8,2,4,1,1,6,10
Examples:
PCX
Fax
JPEG
Compression techniques
lossless
run length encoding
optimal Codes
with loss
Compression techniques
lossless
run length encoding
optimal Codes
Code words have different lengths
Length of code word dependent on probability:
with loss
(high probability à short Codewort)
(low probability à long Codewort)
Global and fixed code word table
ñ
ò
Code word table is part of coding
Compression techniques
Example: Huffman-Coding:
lossless
run
length
encoding
1.Step:
Events to be coded are
sorted
by probabilities
(rising order).
optimal Codes
2.Step: The events with least probabilities are removed from list,
united to one event, and added sorted into list with sum of both
probabilities.
with loss
3.Step: Repeat Step 2 until only one element is contained in list.
4.Step: Code words are generated by marking the edges in the
binary tree by 0 and 1. Read the code word from top to bottom.
Compression techniques
Example Huffman-Coding:
lossless
a
b
0.2
0.15
c
0.4
d
0.05
c
0.4
a
0.2
e
0.2
b
0.15
with loss
run
e length encoding
0.2
optimal Codes
d
0.05
Compression techniques
Example: Huffman-Coding:
lossless
a
b
0.2
0.15
c
0.4
d
0.05
c
0.4
c
0.4
a
0.2
a
0.2
e
0.2
e
0.2
b
0.15
bd
0.2
with loss
run
e length encoding
0.2
optimal Codes
d
0.05
b
0.15
d
0.05
Compression techniques
Example: Huffman-Coding:
lossless
a
b
0.2
c
0.4
c
0.4
with
c
0.4
c
0.4
d
0.05
a
e
0.2 0.2
a
e
0.2 0.2
loss
bde
a
0.4 0.2
b
0.15
bd
0.2
0.15
run
e length encoding
0.2
optimal Codes
d
0.05
e
0.2
b
0.15
d
0.05
Compression techniques
Example: Huffman-Coding:
lossless
a
b
0.2
c
0.4
d
0.05
a
e
0.2 0.2
a
e
0.2 0.2
loss
bde
a
0.4 0.2
c
0.4
b
0.15
bd
0.2
0.15
c
0.4
c
0.4
with
c
0.4
bdea
0.6
run
e length encoding
0.2
optimal Codes
d
0.05
e
0.2
b
0.15
d
0.05
a
0.2
Compression techniques
Example: Huffman-Coding:
lossless
a
b
0.2
c
0.4
c
0.4
with
c
0.4
bdea
0.6
bdeac
1.0
c
0.4
d
0.05
a
e
0.2 0.2
a
e
0.2 0.2
loss
bde
a
0.4 0.2
c
0.4
b
0.15
bd
0.2
0.15
run
e length encoding
0.2
optimal Codes
d
0.05
e
0.2
b
0.15
d
0.05
a
0.2
c
0.4
Compression techniques
Example: Huffman-Coding:
lossless
a
b
0.2
c
0.4
d
0.05
a
e
0.2 0.2
a
e
0.2 0.2
loss
bde
a
0.4 0.2
c
0.4
b
0.15
bd
0.2
0.15
c
0.4
c
0.4
with
c
0.4
bdea
0.6
bdeac
1.0
run
e length encoding
0.2
optimal Codes
d
0.05
0
1
0
1
0
e
0.2
1
0
1
b
0.15
d
0.05
a
0.2
c
0.4
Compression techniques
Example: Huffman-Coding:
lossless
a
b
0.2
0.15
c a a
0.4 b 0.2
c c a
0.4 d 0.2
with
bde
c e loss
0.4
0.4
bdea
c
0.6
0.4
bdeac
1.0
c
0.4
d
0.05
01e
b
0010
0.2 0.15
1e
bd
0011
0.2
0.2
000
a
0.2
run
e length encoding
0.2
optimal Codes
d
0.05
0
1
0
1
0
e
0.2
1
0
1
b
0.15
d
0.05
a
0.2
c
0.4
Compression techniques
lossless
run length encoding
optimal Codes
adaptive Codes
with loss
Compression techniques
lossless
run length encoding
optimal Codes
adaptive Codes
Method by Ziv and Lempel
with loss
Code tabelle is generated while coding
No need to transfer code tabelle
Compression techniques
lossless
Example:
with
1
2
loss3
4
5
6
7
8
...
a
b
c
ab
run length encoding
aabboptimal
c a b a b aCodes
c
adaptive Codes
w: a b
Output:
1
Compression techniques
lossless
Example:
with
1
2
loss3
4
5
6
7
8
...
a
b
c
ab
bc
run length encoding
abboptimal
cc a b a b aCodes
c
adaptive Codes
w: b c
Output:
12
Compression techniques
lossless
Example:
with
1
2
loss3
4
5
6
7
8
...
a
b
c
ab
bc
ca
run length encoding
optimal
a bc
caa b a b aCodes
c
adaptive Codes
w: c a
Output:
12 3
Compression techniques
lossless
Example:
with
1
2
loss3
4
5
6
7
8
...
a
b
c
ab
bc
ca
aba
run length encoding
a boptimal
caabbaa b aCodes
c
adaptive Codes
w: a b a
Output:
12 3 4
Compression techniques
lossless
Example:
with
1
2
loss3
4
5
6
7
8
...
run length encoding
a boptimal
c a baabbaaCodes
cc
adaptive Codes
a
b
c
ab
bc
ca
aba
abac
w: a b a c
Output:
12 3 4 7
Arithmetic Coding
• Symbols are represented by probability
intervals
• Symbol chains are represented by a
concatenation of their probability
intervals
Arithmetic Coding
1.0
0.95
#
A
a
b
c
d
e
#
P
0.2 0.15 0.35 0.05 0.2 0.05
e
0.75
0.7
d
c
# = [0.95, 1.0)
e = [0.75, 0.95)
d = [0.7, 0.75)
c = [0.35, 0.7)
b = [0.2, 0.35)
a = [0.0, 0.2)
0.35
b
0.2
a
0.96..
0.75
0.71..
0.5
0.25
0.125
0.11111
0.11
0.10111
0.1
0.01
0.001
0.0
Arithmetic Coding
1.0
0.95
#
e
0.75
0.7
d
c
0.35
b
0.2
a
0.0
a#
ae
ad
ac
ab
aa
A
a
b
c
d
e
#
P
0.2 0.15 0.35 0.05 0.2 0.05
Arithmetic Coding
1.0
0.95
0.75
0.7
0.35
#
b#
e
be
d
bd
c
bc
b
bb
a
ba
A
a
b
c
d
e
#
P
0.2 0.15 0.35 0.05 0.2 0.05
0.35
0.2
0.0
0.2
Arithmetic Coding
bca#
1.0
0.95
0.75
0.7
#
0.35
b#
0.305
bc#
0.263
bca#
e
be
bce
bcae
d
bd
bcd
bcad
c
bc
bcc
bcac
b
bb
bcb
bcab
a
ba
bca
bcaa
0.263
0.35
0.2
0.0
0.2
0.2525
0.2525
0.262475
Arithmetic Coding
0.263
0.01000011010100111
0.262475 0.01000011001100001
0.35
0.305
1.0
#
0.95
0.0100001101
b#
be
e
0100001101
0.75
Huffman:
bd
d 0110110(0000)
0.7
bc#
0.263
bca#
bce
bcae
bcd
bcad
c
bc
bcc
bcac
b
bb
bcb
bcab
a
ba
bca
bcaa
0.263
0.35
0.2
0.0
0.2
0.2525
0.2525
Compression techniques
lossless
Quantising:
with loss
Analogue
8 Bit
RGB
Clustering
run length encoding
optimal Codes
adaptive Codes
à
Digital
à
4 Bit
Quantisierung
à
Colour palette
Clustering
0.262475
Compression techniques
lossless
run length encoding
optimal Codes
adaptive Codes
with loss
Quantising
Clustering
lossless
run length encoding
optimal Codes
adaptive Codes
with loss
Quantising
Clustering
Descriptive
Transformation
Compression techniques
Image Data formats
lossless
PBM+
GIF
PNG
(JPEG)
with loss
?
JPEG
Wavelet Compr.
Fractal Compr.
The End of Lecture
Download