Image and Video Compression
Wenwu Wang
Centre for Vision Speech and Signal Processing
Department of Electronic Engineering
University of Surrey
Email: w.wang@surrey.ac.uk
1
Introduction
• Course components
• A brief history
• Basic concepts
• Coding performance limits
• Coding of still images
2
Course Components
• Component Coding Algorithms I (By myself)
 Fundamentals of Compression
 Coding of Still Images
 JPEG standard
 Vector Quantisation
 Subband and Wavelet Coding
• Component Coding Algorithms II (By Dr Fernando)
 Coding of video sequences
 H.261/H.263 coding algorithms
 MPEG-1, -2, -4 coding algorithms
• Component Error Resilience in Video Communications (By Prof. Kondoz)
3
Further Reading
Component Coding Algorithms
• Ghanbari, M., Standard Codecs: Image Compression to Advanced Video Coding, IEE Telecommunication Series 49, 2003. ISBN 0-85296-710-1 (A)
• Clarke, R. J., Digital Compression of Still Images and Video, Academic Press, 1995. ISBN 0-12-175720-X (B)
• Haskell, B. G., Puri, A. and Netravali, A. N., Digital Video: An Introduction to MPEG-2, Chapman and Hall, 1997. ISBN 0-412-08411-2 (B)
Error Resilience
• Sadka, A. H., Compressed Video Communications, J. Wiley and Co, 2001. ISBN 0-470-84312-8 (A)
More References
4
A Brief History of Image Communication
• 1840  Louis J.M. Daguerre (France) and William Henry Fox Talbot (England): photography and photographic film
• 1895  First public motion picture presentation
• 1920s First television experiment
British TV pioneer J.L. Baird with Nipkow Disc (around 1926)
5
A Brief History of Image Comm. (Cont.)
• 1930s    Colour movies
• 1930-32  First experimental television broadcasting in the US
• 1935     First German television broadcasting in Berlin
• 1936     TV transmission during the Berlin Olympics
6
A Brief History of Image Comm. (Cont.)
• 1939   Regular monochrome TV service in the US
• 1952   Regular TV service in Germany
• 1954   NTSC colour television in the US
• 1967   PAL colour television in Germany
• 1970s  Consumer video cassette recorder (VCR)
• 1970s  Fax machines
• 1980s  Digital TV studios (ITU-R Rec. 601)
7
A Brief History of Image Comm. (Cont.)
• 1990s
JPEG and MPEG standards
Digital still cameras
Digital TV broadcasting
Digital video/versatile disk (DVD)
Integration of computers and video
World wide web
Internet video streaming
8
A Brief History of Image Comm. (Cont.)
Evolution of the video coding standards by the ITU-T and ISO/IEC committees
9
Fundamentals
10
What?
• The minimisation of the amount of information required
to represent an image/video signal
• The reduction of the overall signal bandwidth
11
Why?
• Applications for which bandwidth is a precious
commodity
• Storage applications:
Archiving, television production, home entertainment,
multimedia
• Transmission applications:
Radio and television broadcasting, internet video
streaming, multimedia for mobile phones
12
How?
• Image and video signals contain superfluous
(redundant) information
• Statistical redundancy associated with signal
predictability/correlation/smoothness:
Original signal can be recovered perfectly, therefore it is
called “lossless” or “information preserving” coding
• Subjective redundancy associated with the error
tolerance of human vision:
Original signal cannot be recovered perfectly, only an
approximate reconstruction is possible, therefore it is
called “lossy” or “error tolerant” coding
13
Performance Assessment
• Efficiency in image and video coding
(an indication of how much information has been reduced for the coded
signal)
 lossless systems: ratios of uncoded-to-coded
information, i.e. compression ratio
 lossy systems: the amount of coded information
expressed as a function of the distortion introduced by
the coding operation, i.e. rate/distortion function
• Distortion in image and video coding
(an indication of how close the coded signal is to the original)
 lossless systems: trivially zero distortion (infinite fidelity)
 lossy systems: distortion can be measured objectively
(computation of error between the original and the coded
representation) or subjectively (tests designed to
measure response of human vision to coding artefacts)
14
Coding Operation in the Image Chain
• Signal processing operations anywhere in the image chain
can be regarded as coding operations. Such operations
may be due to:
 Acquisition environment (such as lighting conditions and light
propagation, special effects in studio, and atmospheric conditions in
outside broadcasts)
 Acquisition systems (such as camera optics, scanning aperture and
field integration in electronic imaging, and chemical process in film)
 Post-production environment (such as special effects)
 Image/video display systems (such as display aperture in electronic
imaging, half-toning in printed media, and chemical process in film)
 Viewing environment (such as propagation of light, and optical paths)
 Human visual system (such as lens, and response of neurons to light
stimuli)
• We are not concerned with the above but need to be aware of their
coding effects.
• We are concerned with the processing of image/video signals after
acquisition/post-production and prior to display
15
Classification of Video Coding Systems
• Analogue (signals predominantly in analogue form)
 PAL (Phase Alternating Line, terrestrial television transmission)
 VHS (Video Home System, home video recording)
 MAC (Multiplexed Analogue Component, satellite television transmission)
 Betacam SP (Superior Performance, studio video recording)
• Digital (signals predominantly in digital form)
 ITU-R Rec. 601 (BT.601, or CCIR-601) (professional video recording)
 MPEG-1 (home video recording, CD-ROM)
 MPEG-2 (television transmission)
 MPEG-4 (multimedia)
 H.261/H.263 (video conferencing)
 JPEG (still images)
• We will be mainly concerned with digital signals in this module. For
more about analogue signals, please refer to textbooks; here we only
introduce a few fundamentals about analogue signals that are closely
related to digital signals.
16
Coding of Colour Signals
• One of the challenges facing the first colour television
systems was the inclusion of colour information without
increasing the video bandwidth.
• Colour cameras operate in the space of R,G,B primaries.
Each of these component signals is full-bandwidth (i.e. 6.75 MHz).
• Colour coding systems (e.g. PAL) typically involve the
conversion of component signals to composite form by
means of the following processing operations:
 R,G,B to Y,U,V co-ordinate transformation
 Low-pass filtering of the U and V components
 DSSC-AM modulation of U and V by two sub-carriers in phase
quadrature
 Sign alternation of the modulated V at every other line
17
Coding of Colour Signals (Cont.)
Y = W_R·R + W_G·G + W_B·B
U = 0.436 (B − Y) / (1 − W_B)
V = 0.615 (R − Y) / (1 − W_R)
where W_R = 0.299, W_G = 0.587, W_B = 0.114
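As a worked illustration, here is a minimal Python sketch of the transformation above, assuming R, G, B values normalised to [0, 1]; the function name is purely illustrative.

```python
# Minimal sketch of the R,G,B -> Y,U,V transformation above.
# Assumes R, G, B are normalised to [0, 1].
WR, WG, WB = 0.299, 0.587, 0.114

def rgb_to_yuv(r, g, b):
    y = WR * r + WG * g + WB * b      # luma: weighted sum of the primaries
    u = 0.436 * (b - y) / (1 - WB)    # scaled blue colour difference
    v = 0.615 * (r - y) / (1 - WR)    # scaled red colour difference
    return y, u, v

print(rgb_to_yuv(1.0, 0.0, 0.0))      # saturated red: y = 0.299, v = 0.615
```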
18
Coding of Colour Signals (Cont.)
• Y – Luma component, representing the brightness of
an image (i.e. the “black and white” or achromatic
portion of the image).
• U – Blue difference chroma (B-Y)
• V – Red difference chroma (R-Y)
• “Luma” and “chroma” are usually used in video
engineering, while “luminance” and “chrominance” are
used in color science.
• In the digital domain, YCbCr is used to represent the coded
colour, where DSSC-AM modulation is replaced by
subsampling of the chroma components (see the sketch below).
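A minimal sketch of chroma subsampling as used with YCbCr; the 2×2 averaging stands in for the proper decimation filters used in real systems, and the 4:2:0 choice is just an example.

```python
import numpy as np

def subsample_420(chroma):
    """4:2:0 subsampling sketch: halve a chroma plane in both directions
    by averaging 2x2 blocks (real systems use proper decimation filters)."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.arange(16, dtype=float).reshape(4, 4)
print(subsample_420(cb).shape)        # (2, 2): a quarter of the chroma samples
```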
19
Consequence of Colour Coding
• Compression ratio: 3:1
• Artefacts
 Visible line structure, and interline flicker,
 Combing (distortion of vertical detail moving horizontally due to interlace)
 Spatial aliasing (i.e. diagonal straight lines cause spatial “beat” frequencies
and jagged/staircase edges)
 Temporal aliasing (fast motion suffers from “judder”)
 Picture “softness” (aperture effects)
• Artefact frequency: low
• Artefact severity: high
• Remedies
 At the transmitter end, intelligent PAL encoding allowing better segregation
of colour and monochrome components with less crosstalk between them
 At the receiver end, intelligent PAL decoding possibly involving motion
adaptive filtering (may attenuate some frequency components)
20
Digital Video Formats – A Case Study of
Digital Television
• This format is standardised and is described in the document
"Recommendation ITU-R BT.601".
• Source signals: Y,U,V (one luminance and two colour-difference
components, gamma pre-corrected and filtered)
• Sampling structure (625-line/50 Hz analogue system)
 Orthogonal, line, field and frame repetitive
 U,V samples co-sited with odd Y samples in each line
 864 total (720 active) luminance samples per line
 432 total (360 active) chrominance samples per line
 625 total (576 active) lines
• Sampling frequency (Y: 13.5 MHz, U,V: 6.75 MHz)
• Quantisation
 Uniformly quantised PCM
 8 (optionally 10) bits per sample
 Scale 0-255
 Luminance black level defined as level 16
 Luminance peak white level defined as level 235
 Luminance total number of active levels: 220
 Chrominance total number of active levels: 225, with zero signal corresponding to level 128
21
Digital Video Formats – A Case Study of
Digital Television (Cont.)
• Total active bit-rate
 720 samples/line × 576 lines/frame × 25 frames/sec × 8
bits/sample/component × (1+0.5+0.5) components ≈ 166 Mbit/s
• Total raw bit-rate (Y: 13.5 MHz, U,V: 6.75 MHz)
 864 samples/line × 625 lines/frame × 25 frames/sec × 8
bits/sample/component × (1+0.5+0.5) components = 216 Mbit/s
 For television transmission purposes this amount of
information may require (depending on the modulation
scheme) a bandwidth of 40 MHz upwards
 Today this corresponds to the occupancy of 6-7 analogue
terrestrial television channels! Therefore, to make digital
television transmission a practical proposition, compression
in the digital domain is imperative (see the calculation below).
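A minimal sketch reproducing the two bit-rate figures above; the helper name and defaults are illustrative only.

```python
def bitrate_mbit_s(samples_per_line, lines, frames_per_sec, bits=8):
    """Bit-rate for one luminance plus two half-rate chrominance components."""
    components = 1 + 0.5 + 0.5
    return samples_per_line * lines * frames_per_sec * bits * components / 1e6

print(bitrate_mbit_s(720, 576, 25))   # active: ~165.9 Mbit/s
print(bitrate_mbit_s(864, 625, 25))   # raw:    216.0 Mbit/s
```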
22
Digital Video Formats – A Case Study of
Digital Television (Cont.)
• Note 1
 Unused samples and levels are actually used to convey auxiliary and
control information i.e. vertical and horizontal synchronisation
(blanking), colour reference (burst) etc. There are applications which
require this information in digital form
• Note 2
 The 601 standard is a specification of the output format only and is
not concerned with the practical implementation of the A/D
conversion. This is left to the system designer to implement but
should typically involve anti-aliasing pre-filtering and attention to the
effects of the non-ideal sampling aperture and pixel aspect ratio.
23
Digital Video Formats – Other Formats
• High-definition television (HDTV)
 1920 X 1152 X 50 Hz interlaced (16:9 aspect ratio)
 1440 X 1152 X 50 Hz interlaced (4:3 aspect ratio)
• Video-conferencing/Video-telephony
 352 X 288 X 30 Hz Progressive CIF (Common Interchange Format)
 352 X 288 (240) X 25 (30) Hz progressive SIF (Source Input Format-PAL
(NTSC))
 176 X 144 X 30 Hz Progressive QCIF (Quarter CIF)
• Composite (PAL) digital video (recording)
 922 X 576 X 50 Hz interlaced
 This results from sampling a composite (PAL) signal with a frequency which
is 4 times the colour subcarrier frequency and is used for the recording of
digital composite signals for studio applications
• Desktop
 800 X 600 Super VGA
 640 X 480 VGA (Video Graphics Array)
24
The Hierarchy of Video Sampling Format
25
Sampling Formats for Chrominance
26
Coding Performance Limits and Assessment
27
Self-Information
• A discrete source X with a finite alphabet A can be
modelled as a discrete random process, i.e. a sequence
of random variables x_i, i = 1, 2, ...
• Each random variable x_i takes a value from the
alphabet A = {a_k | k = 1, 2, ...}
• The information content of a symbol a_k is related to the
degree to which the symbol is unpredictable and
unexpected. Quantitatively this is expressed by
means of the self-information I(a_k) of symbol a_k:
I(a_k) = -log2( p(a_k) )   (bits)
28
Source Models
• Two useful source models are used for studying the coding
performance limits:
 The Discrete Memoryless Source (DMS)
Successive symbols are statistically independent, i.e. in a symbol
sequence the current symbol does not depend on any previous one
 The Markov K-th order Source (MKS)
Successive symbols are statistically dependent, i.e. in a symbol
sequence the current symbol depends on the K previous ones
The entropy of a DMS source X is defined as the average self-information:
H(X) = Σ_k p(a_k) I(a_k) = -Σ_k p(a_k) log2( p(a_k) )
The entropy is maximised for a uniform symbol distribution (see the sketch below).
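A minimal sketch of the DMS entropy above, also confirming that a uniform distribution maximises it.

```python
import math

def entropy(probs):
    """Entropy (bits/symbol) of a discrete memoryless source."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
print(entropy([0.25] * 4))                 # 2.0 bits/symbol: uniform gives log2(4), the maximum
```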
29
Markov-K Source
• The MKS model is a more realistic model for images and video
 Images (of natural scenes) are correlated in the spatial domain i.e. plain
areas (with little or no spatial detail)
 Video is correlated in the spatial domain as above and also in the temporal
domain i.e. static areas (with little or no motion)
• A MKS can be specified by the following conditional probabilities:
p( X_i = a_k | X_{i-1}, ..., X_{i-K} )   for all i, k
• The entropy of a MKS source is defined as
H(X) = Σ_{S_K} p( X_{i-1}, ..., X_{i-K} ) · H( X | X_{i-1}, ..., X_{i-K} )
where H( X | X_{i-1}, ..., X_{i-K} ) is the conditional entropy, i.e.
H( X | X_{i-1}, ..., X_{i-K} ) = -Σ_k p( X_i = a_k | X_{i-1}, ..., X_{i-K} ) log2 p( X_i = a_k | X_{i-1}, ..., X_{i-K} )
and S_K denotes all possible realisations of { X_{i-1}, ..., X_{i-K} }
30
Coding Theorem
31
Coding Theorem (cont.)
A typical rate distortion curve
32
Practical Considerations
• Information rate for coded still images:
 Bits per pixel (bpp) i.e. the ratio of coded information in bits to the total
number of pixels
 Compression ratio (dimensionless) i.e. the ratio of uncoded-to-coded
information
• Information rate for coded moving sequences:
 Bits per second (b/s) and its multiples (kb/s, Mb/s) i.e. the rate of flow of the
coded information
• Distortion of coded-and-decoded image/video:
 Objectively using the Peak Signal-to-Noise Ratio (PSNR):
PSNR = 10 log10 [ (number of active levels)² / coding error variance ]   (dB)
 Subjectively using quality and impairment scales designed to measure the
response of human vision. For television, subjective assessment
procedures are standardised and are described in Rec. ITU-R BT. 500
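A minimal sketch of a PSNR computation in the spirit of the formula above; here the peak is taken as 255 for an 8-bit image, whereas the slide's "number of active levels" would be 220 for Rec. 601 luminance. The test data are synthetic.

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """PSNR in dB between an original and its coded-and-decoded version."""
    mse = np.mean((original.astype(float) - decoded.astype(float)) ** 2)
    if mse == 0:
        return float("inf")            # lossless case: infinite fidelity
    return 10 * np.log10(peak ** 2 / mse)

orig = np.random.randint(0, 256, (64, 64))
noisy = np.clip(orig + np.random.normal(0, 2, orig.shape), 0, 255)
print(round(psnr(orig, noisy), 1))     # roughly 42 dB for noise of std 2
```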
33
Subjective Picture Assessment for Television
34
Human Visual System
Plot of the contrast sensitivity (just perceptible modulation) function
35
Human Visual System (Cont.)
36
Coding of Still Images
37
Classification of Compression Techniques
• Spatial (data) Domain
Elements are used “raw” in suitable combinations. The
frequency of occurrence of such combinations is used to
influence the design of the coder so that shorter
codewords are used for more frequent combinations and
vice versa (entropy coding).
• Transform Domain
Elements are mapped onto a different domain (i.e. the
frequency domain). The resulting coefficients are
quantised and entropy-coded.
• Hybrid
Combinations of the above.
38
Lossless Coding in the Spatial Domain
• Memoryless Coding
39
Lossless Coding in the Spatial Domain (Cont.)
• Conditional Coding
Construct “current” symbol histograms according to “previous”
symbols and use separate codebooks accordingly
40
Lossless Coding in the Spatial Domain (Cont.)
• Block (joint) Coding
 Define blocks of more than one symbol and record their occurrences using a
multi-dimensional histogram
 Code book grows exponentially with block size
 Useful when symbols in a block are correlated
An example using a block size
of 2 i.e. two consecutive
symbols.
41
Lossless Coding in the Spatial Domain (Cont.)
• Predictive Coding (previous symbol)
 “Previous” symbol used as a prediction of “current” symbol
 Prediction error coded in a memoryless fashion
 Prediction-error alphabet and codebook are almost twice the size (2N−1 symbols),
e.g. symbol alphabet {1, 2, 3, 4} gives prediction-error alphabet {-3, -2, -1, 0, 1, 2, 3}
 A good predictor will minimise the error (most occurrences will be zero), as in the sketch below
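A minimal sketch of previous-symbol prediction; for correlated data the prediction errors cluster around zero, which is what makes them cheap to entropy-code.

```python
def predict_previous(symbols):
    """Prediction errors when each symbol is predicted by its predecessor;
    the first symbol is sent as-is."""
    errors = [symbols[0]]
    errors += [cur - prev for prev, cur in zip(symbols, symbols[1:])]
    return errors

def reconstruct(errors):
    symbols = [errors[0]]
    for e in errors[1:]:
        symbols.append(symbols[-1] + e)
    return symbols

data = [3, 3, 3, 4, 4, 2, 2, 2]
err = predict_previous(data)
print(err)                        # [3, 0, 0, 1, 0, -2, 0, 0]: mostly zeros
assert reconstruct(err) == data   # lossless
```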
42
Lossless Coding in the Spatial Domain (Cont.)
• Predictive Coding (generalised)
 Prediction is based on combination of
previous symbols
 Prediction template needs to be “causal” i.e.
template should contain only “previous”
elements w.r.t the direction of scanning
(shown with arrows). This is important for
coding applications as the decoder will need
to have decoded the template elements first to
perform the prediction of the current element.
43
Lossless Coding in the Spatial Domain (Cont.)
• Run-length Coding
 Useful when consecutive symbols in a string are identical
 A symbol is followed by the number of its repetitions
A typical example
A general example
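A minimal run-length coding sketch along the lines described above; the (symbol, run length) pairing is one common convention.

```python
from itertools import groupby

def run_length_encode(symbols):
    """Each symbol paired with the length of its run of repetitions."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]

def run_length_decode(pairs):
    return [sym for sym, count in pairs for _ in range(count)]

s = "AAAABBBCCD"
print(run_length_encode(s))       # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
assert "".join(run_length_decode(run_length_encode(s))) == s
```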
44
Lossless Coding in the Spatial Domain (Cont.)
• Zero Run-length Coding
 Useful for strings that contain long runs of consecutive zeros and are
sparsely populated by non-zero symbols, e.g. quantised frame differences
 A non-zero symbol is followed by the number of consecutive zeros that follow it
A typical example
A general example
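A minimal zero run-length sketch following the convention stated above (each non-zero symbol followed by the count of zeros after it); the assumption that the string starts with a non-zero symbol is a simplification of this sketch.

```python
def zero_rle_encode(values):
    """Pair each non-zero symbol with the count of consecutive zeros after it.
    Assumes the string starts with a non-zero symbol (sketch simplification)."""
    pairs = []
    for v in values:
        if v == 0:
            sym, zeros = pairs[-1]
            pairs[-1] = (sym, zeros + 1)
        else:
            pairs.append((v, 0))
    return pairs

def zero_rle_decode(pairs):
    return [x for sym, zeros in pairs for x in [sym] + [0] * zeros]

seq = [7, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0]
print(zero_rle_encode(seq))       # [(7, 3), (3, 5), (1, 1)]
assert zero_rle_decode(zero_rle_encode(seq)) == seq
```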
45
Entropy Coding (Variable Length Coding)
• Assignment of codewords to individual symbols or collections of symbols
according to likelihood
• More probable symbols or collections of symbols are assigned shorter
codewords and vice-versa, so called variable length coding (VLC)
• There are two types of VLC, which are employed in the standard video
codecs: Huffman coding and arithmetic coding.
• Huffman coding is a simple VLC code. It is suboptimal since its
compression cannot in general reach the entropy, owing to the constraint
that each symbol must be assigned an integral number of bits. It is
employed in all the standard codecs.
• Arithmetic coding is an optimal coding method which can approach the
entropy, since the symbols are coded collectively using a code string
that represents a fractional value on the number line between 0 and 1. It
is employed in JPEG, JPEG2000, H.263 and MPEG-4, where extra
compression is demanded.
46
Huffman Coding
47
Huffman Coding (Cont.)
An example of a Huffman code for seven symbols, comparing the average
bits per symbol against the entropy
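A minimal sketch of the Huffman construction (repeatedly merging the two least probable nodes); the seven-symbol probabilities below are illustrative and not the ones from the slide's table.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code from {symbol: probability} by repeatedly
    merging the two least probable nodes."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie_break = len(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, tie_break, merged))
        tie_break += 1
    return heap[0][2]

# Illustrative seven-symbol source (not the probabilities from the slide's table)
p = {"a": 0.30, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.08, "f": 0.07, "g": 0.05}
code = huffman_code(p)
avg = sum(p[s] * len(code[s]) for s in p)
print(code)
print(round(avg, 2), "bits/symbol on average (entropy is the lower bound)")
```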
48
Arithmetic Coding
• A scale is used in which the coding intervals are real
numbers between 0 and 1. This is in fact the cumulative
probability of all the symbols, which adds up to 1.
• The interval is partitioned according to symbol likelihood.
• The interval is iteratively reduced by retaining, at each
iteration, the sub-interval corresponding to the currently
encoded input symbol.
49
Arithmetic Coding (cont.)
• An example: suppose the alphabet is {a, e, i, o, u, !} and a fixed model is used with the
probabilities shown in the following table.
• Each individual symbol needs to be assigned a portion of the [0, 1) range that
corresponds to its probability of appearance in the cumulative density function. For
example, the symbol u with probability 0.1, defined in the range [0.8, 0.9), can take
any value from 0.8 to 0.89999...
• Suppose the message eaii! needs to be coded. The first symbol to be encoded is e.
Hence, the final coded message has to be a number in the range [0.2, 0.5). The
second symbol is a, which lies in the range [0.0, 0.2), but now within the subrange
[0.2, 0.5), as it is not the first symbol to be encoded. Consequently, after the second
symbol, the number is restricted to the range
[0.2 + 0.0×(0.5−0.2), 0.2 + 0.2×(0.5−0.2)) = [0.2, 0.26)
50
Arithmetic Coding (cont.)
• The next symbol to be encoded is i, in the range [0.5, 0.6), which maps onto the
current subrange [0.2, 0.26). Hence, after this symbol, the coded number is restricted
to the range [0.2 + 0.5×(0.26−0.2), 0.2 + 0.6×(0.26−0.2)) = [0.23, 0.236). Applying the
same rule to the successive symbols, we obtain the following table.
• The final range [0.23354, 0.2336) represents the message eaii!. This means that if we
transmit any number in the range [0.23354, 0.2336), that number represents the
whole message eaii! (see the encoder sketch below).
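A minimal sketch of the interval-narrowing encoder for this example; the cumulative ranges for a, e, i, u and ! follow from the worked numbers above, while the range assumed for o is inferred.

```python
# Cumulative ranges for the example alphabet; a, e, i, u and ! follow from the
# worked numbers above, the range assumed for o ([0.6, 0.8)) is inferred.
MODEL = {"a": (0.0, 0.2), "e": (0.2, 0.5), "i": (0.5, 0.6),
         "o": (0.6, 0.8), "u": (0.8, 0.9), "!": (0.9, 1.0)}

def arithmetic_encode(message, model=MODEL):
    """Narrow [low, high) to the sub-interval of each successive symbol."""
    low, high = 0.0, 1.0
    for sym in message:
        sym_low, sym_high = model[sym]
        width = high - low
        low, high = low + sym_low * width, low + sym_high * width
    return low, high                  # any number in [low, high) encodes the message

print(arithmetic_encode("eaii!"))     # approximately (0.23354, 0.2336)
```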
51
Arithmetic Coding (cont.)
Representation of arithmetic coding process with the interval
scaled up at each stage for the message eaii!
52
Arithmetic Coding (cont.)
• Decoding process
 For the previous example, suppose a number 0.23355 in the range [0.23354, 0.2336)
is transmitted. The decoder, using the same probability intervals as the encoder,
performs a similar procedure.
 Only the interval [0.2, 0.5) of e envelops the transmitted code 0.23355, so the first
symbol can only be e. The new code for the second symbol is
(0.23355−0.2)/(0.5−0.2) = 0.11185, which is enveloped by the interval [0.0, 0.2) of
symbol a. The new code for the third symbol is (0.11185−0.0)/(0.2−0.0) = 0.55925,
which is enveloped by the range [0.5, 0.6) of symbol i. This is followed by
(0.55925−0.5)/(0.6−0.5) = 0.5925, again in the range [0.5, 0.6) of symbol i, and then by
(0.5925−0.5)/(0.6−0.5) = 0.925, which is in the range [0.9, 1) of symbol !. Therefore,
the decoded message is eaii!. The decoding process is shown in the following table
(and sketched in code below):
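A matching decoder sketch; it assumes the message length is known (in practice a terminating symbol such as ! would end decoding).

```python
MODEL = {"a": (0.0, 0.2), "e": (0.2, 0.5), "i": (0.5, 0.6),
         "o": (0.6, 0.8), "u": (0.8, 0.9), "!": (0.9, 1.0)}  # as in the encoder sketch

def arithmetic_decode(code, length, model=MODEL):
    """Pick the symbol whose interval envelops the code, then rescale the
    code into that interval and repeat for the next symbol."""
    message = []
    for _ in range(length):
        for sym, (low, high) in model.items():
            if low <= code < high:
                message.append(sym)
                code = (code - low) / (high - low)
                break
    return "".join(message)

print(arithmetic_decode(0.23355, 5))  # 'eaii!'
```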
53
Lossless Coding in Transform Domain
• Transforms commonly refer to expansions of
signals to series of coefficients using sets of
appropriate (i.e. orthonormal) basis functions so
that the following are achieved.
 Decorrelation of input data
 Optimal distribution of energy (variance) into the smallest number
of coefficients
• The optimal transform according to the above is
the Karhunen-Loeve (KL) transform. This is not
used in practice:
 Its basis functions are the eigenvectors of the covariance matrix of the
input signal, and hence data-dependent, and therefore need to be
computed and transmitted for each data set.
 There are no fast implementations for the KL transform
54
Lossless Coding in Transform Domain (cont.)
• In practice, sub-optimal transforms are used whose basis
functions are data-independent and their performance is
close to the KL transform, such as
55
Lossless Coding in Transform Domain (cont.)
• The DCT is the most widely used transform in image/video coding and is a
fundamental component of many standardised algorithms.
 KLT and DCT basis functions closely resemble each other for images modelled as
first-order Markov processes.
 An n-point DCT is equivalent to a 2n-point DFT obtained by reflection. This avoids
spurious harmonics due to discontinuities at the boundaries of the repetition period.
• The following example visualises the decorrelation and energy compaction
properties of transforms (a numerical sketch follows below):
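A minimal numerical sketch of decorrelation and energy compaction: an orthonormal 1-D DCT-II, built directly from its definition, applied to a smooth (correlated) signal; the signal itself is synthetic.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis built from its definition."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2)
    return basis * np.sqrt(2.0 / n)

n = 8
x = np.linspace(10, 20, n) + 0.3 * np.random.randn(n)  # smooth, correlated "pixels"
C = dct_matrix(n)
coeffs = C @ x
print(np.round(coeffs ** 2 / np.sum(coeffs ** 2), 3))  # energy mostly in the first coefficients
print(np.allclose(C.T @ coeffs, x))                    # True: orthonormal, perfectly invertible
```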
56
Lossless Coding in Transform Domain (cont.)
57
Comparison of Various Transforms
58
Comparison of Various Transforms (cont.)
(1) Energy concentration measured on typical natural images with a block size of 1-by-32.
(2) KLT is optimum; DCT performs only slightly worse than KLT.
59
Block Transform Coding
60
Block Transform Coding (cont.)
61
Block Transform Coding (cont.)
62
Lossy Coding
• For natural images the compression performance of lossless
coding schemes is fairly modest
 Compression ratios of 3:1 or 4:1 can be achieved using the best of the
above-mentioned schemes.
 This is comparable to the performance achieved by general purpose
data compression algorithms, e.g. Ziv-Lempel, which are not designed
specifically to exploit image structure.
• To improve performance some coding distortion will have to be
tolerated. The main aims of lossy coding are:
 To optimise rate/distortion performance i.e. achieve the best image quality
for a given target bit-rate
 To minimise the perceptual impact of distortion i.e. produce coding errors
that are likely to be imperceptible to the human viewer
63
Lossy Coding (cont.)
• The main tool for lossy coding is quantisation (a uniform scalar
quantiser is sketched after this list). This is applicable to most domains:
 Spatial (data) domain: applicable to raw pixels, pixel differences (predictive
coding), conditional pixel occurrences (conditional coding), ensembles of
pixels (joint coding). This is a special case of so-called vector quantisation
which will be studied separately.
 Transform domain: applicable to transform coefficients and ensembles of
coefficients (vector quantisation).
• Another important tool is sampling
 This is usually applicable to the data domain.
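A minimal sketch of a uniform scalar quantiser, the simplest case of the quantisation referred to above; the step size of 8 is arbitrary.

```python
import numpy as np

def quantise(x, step):
    """Uniform scalar quantiser: map each value to its level index."""
    return np.round(np.asarray(x, dtype=float) / step).astype(int)

def dequantise(indices, step):
    """Reconstruct at the centre of each quantisation interval."""
    return indices * step

pixels = np.array([12.3, 14.9, 130.2, 131.0, 250.7])
idx = quantise(pixels, step=8)
print(idx)                         # [ 2  2 16 16 31]: far fewer distinct values to code
print(dequantise(idx, step=8))     # [ 16  16 128 128 248]: approximation error introduced
```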
64
Quantisation (scalar)
65
Lossy Predictive Coding
Open-loop encoder (prediction based on past inputs)
Closed-loop encoder (prediction based on past outputs)
Decoder (prediction always based on past outputs)
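A minimal closed-loop (DPCM-style) sketch: the encoder predicts from the reconstructed sample rather than the original, so encoder and decoder predictions stay in step and quantisation errors do not accumulate. The previous-sample predictor and uniform quantiser are simplifications.

```python
def dpcm_encode(samples, step):
    """Closed-loop DPCM sketch: quantise the prediction error, then update
    the prediction from the reconstructed (not the original) sample."""
    indices, prediction = [], 0.0
    for x in samples:
        q = round((x - prediction) / step)   # quantised prediction error (transmitted)
        indices.append(q)
        prediction += q * step               # same reconstruction the decoder will form
    return indices

def dpcm_decode(indices, step):
    out, prediction = [], 0.0
    for q in indices:
        prediction += q * step
        out.append(prediction)
    return out

data = [100, 102, 104, 110, 111, 109]
idx = dpcm_encode(data, step=4)
print(idx)                        # after the start-up value, small integers around zero
print(dpcm_decode(idx, step=4))   # tracks the input to within about half a step
```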
66
Lossy Transform Coding
Coder
Decoder
67
Sampling: One-dimensional sampling
68
Sampling: One-dimensional sampling (cont.)
69
Sampling: Two-dimensional sampling
70
Sampling: Two-dimensional sampling (cont.)
71
Sampling: Two-dimensional sampling (cont.)
72
Non-ideal Sampling
73
Interpolation
74
Non-ideal Interpolation (sample-and-hold)
75
Non-ideal Interpolation (bi-linear)
76
Example of Non-ideal Interpolation
77
Summary
 A brief history of image communication and
coding standards
 Coding performance theorems
 Some fundamental concepts of compression
 Coding methods for still images
(This is the most important part of this lecture
session)
78
Acknowledgement
 Thanks to T. Vlachos, B. Girod for providing their
lecture notes that have been partly used in this
presentation.
 Thanks also to M. Ghanbari; part of the material
used here is from his textbook.
79