Intra-frame DCT

Intra-frame DCT coding
• A bit rate reduction system operates by removing
redundant information from the signal at the coder
prior to transmission and re-inserting it at the
decoder.
• A coder and decoder pair is referred to as a codec.
• In video signals, two distinct kinds of redundancy can be identified, namely 'spatial and temporal redundancy' and 'psychovisual redundancy'.
Intra-frame DCT coding
tMyn
1
Spatial and temporal redundancy
• Pixel values are not independent, but are correlated
with their neighbours both within the same frame and
across frames.
• So, to some extent, the value of a pixel is predictable
given the values of neighbouring pixels.
Psychovisual redundancy
• The human eye has a limited response to fine spatial
detail, and is less sensitive to detail near object
edges or around shot-changes.
• Consequently, controlled impairments introduced into
the decoded picture by the bit rate reduction process
should not be visible to a human observer.
• Two key techniques employed in an MPEG codec are
intra-frame Discrete Cosine Transform (DCT) coding
and motion-compensated inter-frame prediction.
• The DCT is an orthogonal mathematical transform that is used to remove spatial redundancy in the sampled signal components by concentrating the signal energy into only a few coefficients.
• The DCT and its inverse are easily implemented with digital signal processing (DSP) technology.
• The DCT does not directly reduce the number of bits
required to represent the block.
• The reduction in the number of bits follows from the
fact that, for typical blocks of natural images, the
distribution of coefficients is non-uniform – the
transform tends to concentrate the energy into the
low-frequency coefficients, and many of the other
coefficients are near zero.
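To make the energy-compaction argument concrete, here is a minimal sketch in plain Python. It computes an orthonormal 1-D DCT-II (a scaling chosen here so that the total energy is the same in both domains) of one hypothetical smooth row of eight samples. The sample values and the helper names `dct_1d` and `energy` are illustrative assumptions, not part of any codec.

```python
import math


def dct_1d(samples):
    """Orthonormal 1-D DCT-II: total energy is preserved between domains."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, s in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def energy(values):
    """Sum of the squared values."""
    return sum(v * v for v in values)


# Hypothetical smooth row of 8 samples (a gentle ramp), not taken from the figures.
row = [50, 52, 55, 59, 64, 70, 77, 85]
coeffs = dct_1d(row)

share = energy(coeffs[:2]) / energy(coeffs)
print([round(c, 1) for c in coeffs])
print(f"share of energy in the first 2 of 8 coefficients: {100 * share:.2f} %")
```

For this ramp the first two coefficients carry about 99.8 % of the energy, most of it in the DC term; the remaining six coefficients are small, which is what makes them cheap to quantize or drop.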
DCT (type II) compared with the DFT. For each transform, the magnitude of the spectrum is shown on the left and its histogram on the right; both spectra are cropped to 1/4 to zoom in on the behaviour at low frequencies.
The DCT concentrates most of the power in the lower frequencies.
Source: Wikipedia
• The bit-rate reduction is achieved by not transmitting
the near-zero coefficients, and by quantizing and
coding the remaining coefficients.
• The non-uniform distribution of the coefficients is a result of the spatial redundancy present in the original image block.
• Many different forms of transformation have been
investigated for bit-rate reduction.
• The best transforms are those which tend to
concentrate the energy of a picture block into a few
coefficients.
• The DCT is one of the best transforms in this respect.
• The choice of an 8*8 block size is a trade-off: a large picture area makes the energy compaction described above most efficient, while the content and movement of the picture vary spatially, which favours a smaller block size.
• When compressing video signals it is useful to define the terms macroblock and block (Figures 1a, 1b and 1c).
Figure 1a. One frame divided into macroblocks of 16 samples by 16 lines.
Figure 1b. One macroblock from the previous figure: 16 samples by 16 lines, i.e. four 8*8 blocks.
Figure 1c. One block from the previous figure: 8 samples by 8 lines.
• In the standard digital TV environment (625-line systems), the horizontal picture size is 720 pixels and the vertical picture size is 576 lines.
• One frame therefore contains (720/16)*(576/16) = 45*36 = 1620 macroblocks (Figure 1d).
• The number of luminance blocks in one frame is 4*1620 = 6480.
• These 6480 two-dimensional 8*8 blocks are each input into a DCT that maps the sampled values onto corresponding values in the frequency domain.
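The arithmetic above generalizes to any picture size that is a multiple of 16. A minimal sketch, using a hypothetical helper `macroblock_counts`; its `blocks_per_mb` parameter lets you include the chrominance blocks of Figures 4a and 4b (6 blocks per macroblock for 4:2:0, 8 for 4:2:2), which the count of 6480 luminance blocks leaves out.

```python
def macroblock_counts(width, height, blocks_per_mb=4, mb_size=16):
    """Return (macroblocks across, macroblocks down, macroblocks, 8*8 blocks)."""
    if width % mb_size or height % mb_size:
        raise ValueError("picture size must be a multiple of the macroblock size")
    across, down = width // mb_size, height // mb_size
    macroblocks = across * down
    return across, down, macroblocks, macroblocks * blocks_per_mb


print(macroblock_counts(720, 576))                    # (45, 36, 1620, 6480): luminance only
print(macroblock_counts(720, 576, blocks_per_mb=6))   # (45, 36, 1620, 9720): 4:2:0
print(macroblock_counts(720, 576, blocks_per_mb=8))   # (45, 36, 1620, 12960): 4:2:2
```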
Figure 1d. Digital TV frame, aspect ratio 16:9.
• MPEG 2 uses the term slice.
• A slice is a collection of macroblocks in scan order.
• MPEG 2 requires that a slice be contained within a single row of macroblocks.
• Figure 2 illustrates the relationship between slices and macroblocks in MPEG 2.
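A minimal sketch of the addressing, assuming macroblocks are numbered 0, 1, 2, ... in scan order (Figure 2 numbers them from 1) and using the 45*36 macroblock grid of a 720*576 frame. The helper names are hypothetical. Figure 2 uses one slice per row, but because a slice only has to stay inside one row, a row could also be split into several shorter slices; the second call shows such a split.

```python
MBS_PER_ROW = 720 // 16   # 45 macroblocks per row, as in Figure 2


def mb_position(address, mbs_per_row=MBS_PER_ROW):
    """Map a macroblock address (0, 1, 2, ... in scan order) to (row, column)."""
    return divmod(address, mbs_per_row)


def split_row_into_slices(row, slice_lengths, mbs_per_row=MBS_PER_ROW):
    """Split one macroblock row into consecutive slices of macroblock addresses.

    Each slice lies entirely inside the given row, as MPEG 2 requires.
    """
    if sum(slice_lengths) != mbs_per_row or min(slice_lengths) < 1:
        raise ValueError("slice lengths must be positive and cover exactly one row")
    slices, start = [], row * mbs_per_row
    for length in slice_lengths:
        slices.append(list(range(start, start + length)))
        start += length
    return slices


print(mb_position(1619))   # (35, 44): last macroblock of the frame
# A hypothetical split of row 1 into two slices of 20 and 25 macroblocks.
print([(s[0], s[-1]) for s in split_row_into_slices(1, [20, 25])])   # [(45, 64), (65, 89)]
```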
Figure 2. The relationship between slice and macroblock in MPEG 2. A 720-sample by 576-line frame is divided into Slice 1 to Slice 36; each slice is one row of macroblocks numbered 1 to 45, and each macroblock is 16 samples by 16 lines.
• For most MPEG 2 coding applications, 4:2:0
sampling is likely to be used rather than 4:2:2,
Figures 3a and 3b.
• In the digital TV environment, sampling takes place in the component YCbCr video signal format, Y being the luminance and Cb and Cr the chrominance components.
• As was mentioned earlier, there is a difference
between component representation YUV and YCbCr.
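The MATRIX stage in Figures 3a and 3b converts R, G and B into a luminance signal and two colour-difference signals. As a sketch, assuming the ITU-R BT.601 coefficients for 8-bit studio-range video (the source does not give the matrix), the conversion from gamma-corrected R'G'B' in the range 0 to 1 looks like this; `rgb_to_ycbcr_601` is a hypothetical helper name.

```python
def rgb_to_ycbcr_601(r, g, b):
    """Convert gamma-corrected R'G'B' in [0, 1] to 8-bit Y'CbCr (ITU-R BT.601).

    Y' spans 16..235 and Cb, Cr span 16..240, with 128 meaning 'no colour'.
    """
    y = 16 + 65.481 * r + 128.553 * g + 24.966 * b
    cb = 128 - 37.797 * r - 74.203 * g + 112.0 * b
    cr = 128 + 112.0 * r - 93.786 * g - 18.214 * b
    return round(y), round(cb), round(cr)


print(rgb_to_ycbcr_601(1, 1, 1))   # (235, 128, 128): white
print(rgb_to_ycbcr_601(0, 0, 0))   # (16, 128, 128): black
print(rgb_to_ycbcr_601(1, 0, 0))   # (81, 90, 240): saturated red
```

Note how white and black land on the nominal 235 and 16 luminance levels used in the luminance example later in this section.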
Figure 3a. MPEG 2, YCbCr 4:2:0 sampling. GREEN, RED and BLUE (720*576 each) pass through a MATRIX giving Y, R-Y and B-Y (720*576 each); FILTER AND SCALE reduces R-Y and B-Y to Cr and Cb of 360*288 each; Y, Cb and Cr are each fed to an MPEG-2 ENCODE stage.
Figure 3b. MPEG 2, YCbCr 4:2:2 sampling. GREEN, RED and BLUE (720*576 each) pass through a MATRIX giving Y, R-Y and B-Y (720*576 each); FILTER AND SCALE reduces R-Y and B-Y to Cr and Cb of 360*576 each; Y, Cb and Cr are each fed to an MPEG-2 ENCODE stage.
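The FILTER AND SCALE stage reduces the colour-difference signals to 360*288 for 4:2:0 or 360*576 for 4:2:2. A minimal sketch, assuming plain box averaging over 2*2 (4:2:0) or 2*1 (4:2:2) samples; real encoders use better filters and a defined chroma sample position, so treat this only as an illustration of the resolution change. The function name is hypothetical.

```python
def subsample_chroma(plane, scheme):
    """Reduce a chroma plane (list of equal-length rows) by simple averaging.

    '4:2:2' halves the horizontal resolution only; '4:2:0' halves it in both
    directions. Width and height are assumed to be even.
    """
    height, width = len(plane), len(plane[0])
    if scheme == "4:2:2":
        return [[(row[x] + row[x + 1]) / 2 for x in range(0, width, 2)]
                for row in plane]
    if scheme == "4:2:0":
        return [[(plane[y][x] + plane[y][x + 1]
                  + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
                 for x in range(0, width, 2)]
                for y in range(0, height, 2)]
    raise ValueError(f"unsupported scheme: {scheme}")


cb = [[128] * 720 for _ in range(576)]          # a flat 720*576 chroma plane
for scheme in ("4:2:0", "4:2:2"):
    out = subsample_chroma(cb, scheme)
    print(scheme, f"{len(out[0])}*{len(out)}")  # 4:2:0 360*288, then 4:2:2 360*576
```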
• Spatial redundancy is removed by processing the
digitized signals in 2-D blocks of 8 pixels by 8 lines,
Figures 4a and 4b.
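Figure 4a. Colour subsampling 4:2:0, DCT blocks. The 16*16 luminance (Y) area of a macroblock gives four 8*8 blocks, numbered 0 to 3; Cb and Cr each give one 8*8 block, numbered 4 (Cb) and 5 (Cr).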
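Figure 4b. Colour subsampling 4:2:2, DCT blocks. The 16*16 luminance (Y) area of a macroblock gives four 8*8 blocks, numbered 0 to 3; Cb gives two 8*8 blocks, numbered 4 and 6, and Cr gives two 8*8 blocks, numbered 5 and 7.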
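The block numbering in Figures 4a and 4b can be expressed as a small routine that cuts one macroblock out of the Y, Cb and Cr planes. This is a sketch under stated assumptions: planes are lists of rows, the chroma planes are already subsampled, and the 4:2:2 order (Cb, Cr, Cb, Cr for blocks 4 to 7) follows the numbering in Figure 4b. All names are hypothetical.

```python
def blocks_8x8(plane, top, left, height, width):
    """Cut a height*width region of a plane into 8*8 blocks, in raster order."""
    return [[row[left + bx:left + bx + 8] for row in plane[top + by:top + by + 8]]
            for by in range(0, height, 8)
            for bx in range(0, width, 8)]


def macroblock_blocks(y, cb, cr, mb_row, mb_col, scheme="4:2:0"):
    """Return one macroblock's 8*8 blocks in the numbering of Figures 4a and 4b."""
    luma = blocks_8x8(y, 16 * mb_row, 16 * mb_col, 16, 16)        # blocks 0-3
    if scheme == "4:2:0":
        # Chroma planes are half size both ways: one block each (4 = Cb, 5 = Cr).
        return (luma + blocks_8x8(cb, 8 * mb_row, 8 * mb_col, 8, 8)
                + blocks_8x8(cr, 8 * mb_row, 8 * mb_col, 8, 8))
    if scheme == "4:2:2":
        # Chroma planes are half width, full height: two blocks each,
        # numbered 4 and 6 for Cb, 5 and 7 for Cr.
        cb_top, cb_bottom = blocks_8x8(cb, 16 * mb_row, 8 * mb_col, 16, 8)
        cr_top, cr_bottom = blocks_8x8(cr, 16 * mb_row, 8 * mb_col, 16, 8)
        return luma + [cb_top, cr_top, cb_bottom, cr_bottom]
    raise ValueError(f"unsupported scheme: {scheme}")


# Tiny hypothetical 32*32 picture (2*2 macroblocks); Y sample = 100*row + column.
y = [[100 * r + c for c in range(32)] for r in range(32)]
cb420 = [[1] * 16 for _ in range(16)]
cr420 = [[2] * 16 for _ in range(16)]
blocks = macroblock_blocks(y, cb420, cr420, 1, 1)
print(len(blocks), [b[0][0] for b in blocks])   # 6 [1616, 1624, 2416, 2424, 1, 2]
```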
• The DCT is a reversible process which maps between the normal 2-D representation of the image and one which represents the same information in what may be thought of as the frequency domain.
• Each coefficient in the 8*8 DCT-domain block indicates the contribution of a different DCT "basis" function to the original image block.
• The coefficient of the lowest-frequency basis function (top left) is called the DC coefficient and may be thought of as representing the average brightness of the block (Figure 5).
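Figure 5. Block-based 8*8 DCT transform pairs. The DCT maps an image block to a block of coefficients and the IDCT maps it back; the DC coefficient is at the top left, horizontal frequency increases to the right and vertical frequency increases downwards.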
• The next example is a generic 8*8 DCT block of luminance (Y) information.
• Y is assumed to take the nominal 8-bit range 16-235 (Figure 6).
 70  72  70  70  72  68  68  64
103 101 103 100  99  97  94  94
132 132 132 130 129 129 125 121
157 157 155 154 153 150 148 145
168 163 164 162 163 161 161 156
172 170 165 166 163 163 162 158
174 170 167 167 164 163 164 159
174 173 170 167 167 166 166 160
Figure 6. One 8*8 set of luminance information values after sampling.
• White corresponds to a sample value of approximately 235 and black to approximately 16.
• One of the first steps in the DCT process is to subtract an offset from each of the 64 samples, so that the values are centred around zero and most of the average (DC) level is removed.
• Here the offset is 128, the midpoint of the 8-bit range; the true average of this block is about 138.8, so a residual DC level remains (see the DC coefficient in Figure 9).
• Note that the process can be more complex than simply subtracting an average.
• This results in a new matrix of 64 integer values, some of which are negative (Figure 7).
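The level shift is a one-line operation. A minimal sketch that reproduces Figure 7 from Figure 6, with a hypothetical helper `level_shift`; it also prints the block mean before and after the shift, which shows why a DC term survives: the true average is about 138.8, not 128.

```python
# Figure 6: one 8*8 block of luminance samples, row by row.
FIGURE_6 = [
    [70, 72, 70, 70, 72, 68, 68, 64],
    [103, 101, 103, 100, 99, 97, 94, 94],
    [132, 132, 132, 130, 129, 129, 125, 121],
    [157, 157, 155, 154, 153, 150, 148, 145],
    [168, 163, 164, 162, 163, 161, 161, 156],
    [172, 170, 165, 166, 163, 163, 162, 158],
    [174, 170, 167, 167, 164, 163, 164, 159],
    [174, 173, 170, 167, 167, 166, 166, 160],
]


def level_shift(block, offset=128):
    """Subtract a fixed offset from every sample so values are centred near zero."""
    return [[sample - offset for sample in row] for row in block]


def block_mean(block):
    """Average of all samples in the block."""
    return sum(map(sum, block)) / sum(len(row) for row in block)


shifted = level_shift(FIGURE_6)                     # reproduces Figure 7
print(shifted[0])                                   # [-58, -56, -58, -58, -56, -60, -60, -64]
print(block_mean(FIGURE_6), block_mean(shifted))    # 138.78125 10.78125
```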
-58 -56 -58 -58 -56 -60 -60 -64
-25 -27 -25 -28 -29 -31 -34 -34
  4   4   4   2   1   1  -3  -7
 29  29  27  26  25  22  20  17
 40  35  36  34  35  33  33  28
 44  42  37  38  35  35  34  30
 46  42  39  39  36  35  36  31
 46  45  42  39  39  38  38  32
Figure 7. The first step is to subtract the offset (128) from the sample values of Figure 6.
• The DCT can be regarded as a derivative of Fourier analysis.
• Conventional Fourier analysis applies to a continuous, periodic signal.
• In general, the signals of interest here are not periodic, and we have to deal with samples rather than a continuous signal.
• Sampling presents no problem: the discrete Fourier transform deals with a sampled signal.
• In the sampled domain, the samples are meaningful only if the frequencies represented are below the Nyquist frequency (half the sampling frequency).
• Any amplitude and phase of a given frequency may be obtained by summing appropriate amplitudes of sine and cosine waves of that frequency.
• Mirroring a waveform fragment (with the center point as the origin) creates a signal with even symmetry.
• Mirroring doubles the number of samples (but of course the mirrored half adds no new information and can be discarded).
• Because the mirrored signal is even, the transform now gives only cosine components.
• This is the basis of the DCT.
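The mirroring argument can be checked numerically. The sketch below, in plain Python with hypothetical helper names, mirrors one block row (the first row of Figure 6) into 2N samples, takes an ordinary DFT, and removes the half-sample phase shift; what is left is real and equals the DCT-II computed directly. It assumes the half-sample mirror (the last sample is repeated), which is the variant that yields the DCT used here; other mirror points give other DCT types.

```python
import cmath
import math


def dct_ii_direct(x):
    """Unscaled DCT-II: X[k] = sum over n of x[n] * cos(pi * k * (2n + 1) / (2N))."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
            for k in range(n)]


def dct_ii_via_mirroring(x):
    """Same transform, obtained from the DFT of the mirrored sequence."""
    n = len(x)
    mirrored = list(x) + list(reversed(x))          # 2N samples with even symmetry
    result = []
    for k in range(n):                              # only the first N bins are needed
        dft_k = sum(mirrored[m] * cmath.exp(-2j * math.pi * k * m / (2 * n))
                    for m in range(2 * n))
        # Undo the half-sample offset of the symmetry point; the result is real.
        result.append(0.5 * cmath.exp(-1j * math.pi * k / (2 * n)) * dft_k)
    return result


row = [70, 72, 70, 70, 72, 68, 68, 64]              # first row of Figure 6
direct = dct_ii_direct(row)
via_dft = dct_ii_via_mirroring(row)
print(max(abs(z.imag) for z in via_dft))            # essentially zero: cosines only
print(max(abs(a - z.real) for a, z in zip(direct, via_dft)))
```

Because the DFT of the mirrored sequence has no sine part left after the phase correction, only the first N cosine amplitudes need to be kept, exactly as the bullets above describe.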
• The DCT and the inverse DCT are defined by the following rather fearsome-looking equations:
$$F(u,v) = \frac{C(u)\,C(v)}{4} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad u, v = 0, \ldots, 7$$
$$f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\,C(v)\,F(u,v) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad x, y = 0, \ldots, 7$$
• In those equations the C terms are defined by
$$C(u) = \frac{1}{\sqrt{2}} \ \text{for}\ u = 0, \qquad C(u) = 1 \ \text{otherwise}$$
$$C(v) = \frac{1}{\sqrt{2}} \ \text{for}\ v = 0, \qquad C(v) = 1 \ \text{otherwise}$$
• The DCT equation can be illustrated with Figure 8.
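The equations translate almost literally into code. The following unoptimized sketch implements F(u, v) and f(x, y) term by term, with hypothetical function names `dct_8x8` and `idct_8x8`, and applies them to the level-shifted block of Figure 7. Real codecs use fast, separable factorizations instead of this direct 64-term sum per coefficient.

```python
import math


def c(k):
    """Normalization term C(k) from the DCT definition."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


# COS[x][u] = cos((2x + 1) * u * pi / 16) for x, u = 0..7
COS = [[math.cos((2 * x + 1) * u * math.pi / 16) for u in range(8)] for x in range(8)]


def dct_8x8(f):
    """Forward 8*8 DCT, written directly from the equation for F(u, v)."""
    return [[c(u) * c(v) / 4 * sum(f[x][y] * COS[x][u] * COS[y][v]
                                   for x in range(8) for y in range(8))
             for v in range(8)]
            for u in range(8)]


def idct_8x8(F):
    """Inverse 8*8 DCT, written directly from the equation for f(x, y)."""
    return [[sum(c(u) * c(v) * F[u][v] * COS[x][u] * COS[y][v]
                 for u in range(8) for v in range(8)) / 4
             for y in range(8)]
            for x in range(8)]


# Figure 7: the level-shifted luminance block.
FIGURE_7 = [
    [-58, -56, -58, -58, -56, -60, -60, -64],
    [-25, -27, -25, -28, -29, -31, -34, -34],
    [4, 4, 4, 2, 1, 1, -3, -7],
    [29, 29, 27, 26, 25, 22, 20, 17],
    [40, 35, 36, 34, 35, 33, 33, 28],
    [44, 42, 37, 38, 35, 35, 34, 30],
    [46, 42, 39, 39, 36, 35, 36, 31],
    [46, 45, 42, 39, 39, 38, 38, 32],
]

F = dct_8x8(FIGURE_7)
print([round(v) for v in F[0]])              # first row of coefficients
print([round(F[u][0]) for u in range(8)])    # first column of coefficients
restored = idct_8x8(F)
```

After rounding, the largest terms (86, 27, -247, -117 and -40) agree with Figure 9, and the inverse transform returns the original block up to floating-point error: the DCT itself loses nothing.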
Figure 8. DCT visualized. The DCT maps the 8*8 block of samples f(x, y), with row index x and column index y (f00 top left, f07 top right, f70 bottom left, f77 bottom right), onto the 8*8 block of coefficients F(u, v), with row index u and column index v (F00 top left, F07 top right, F70 bottom left, F77 bottom right), using the forward DCT equation.
• In this approach the DCT transforms a block of 64 intensity values into a block of 64 coefficients.
• The top-left coefficient represents the DC level, or average intensity, of the block.
• Moving to the right, the coefficients represent higher horizontal frequencies; moving down, they represent higher vertical frequencies. Applying the DCT to the block of Figure 7 gives Figure 9:
  86   27   -3    6   -2    2   -2    2
-247   -4   -5   -3    0   -3    1    1
-117   -1    1   -1   -1   -1   -1    0
 -40   -2    2    1    2    1   -2    0
  -7   -2   -1    1    0   -1   -1    2
  -6    1    0    0    0    0   -2   -1
  -4   -1   -1    1   -2   -1   -1   -1
  -3   -3   -1    1    0    1    1    1
Figure 9. The DCT is performed, yielding a single DC coefficient and 63 AC coefficients.
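A worked check of the DC term, using the definition of F(u, v) with C(0) = 1/√2. Both cosine factors equal 1 when u = v = 0, and the 64 level-shifted values of Figure 7 add up to 690, so:

$$F(0,0) = \frac{C(0)\,C(0)}{4} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) = \frac{1}{8} \cdot 690 = 86.25 \approx 86$$

This is 8 times the mean of the level-shifted block (690/64 ≈ 10.78), which is why the DC coefficient can be read as the average brightness of the block, scaled by 8 and measured relative to the offset of 128.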
• Figure 9 shows that the DCT concentrates energy into the top-left corner.
• Many of the DCT coefficients have values close to zero: in Figure 9 there are 56 AC coefficients with magnitudes of 5 or less.
• The human visual system (HVS) is quite insensitive to errors in these coefficients: values close to zero may be set to zero with very little effect on image quality.
• By contrast, the HVS is fairly sensitive even to small errors in the DC and low-frequency coefficients, but much less sensitive to amplitude errors in the higher-frequency coefficients.
• We can therefore afford to quantize the higher-frequency coefficients much more coarsely.
• As stated earlier, this not only means that fewer bits are needed to transmit each of the non-zero values, but also that more coefficients can be treated as zero.
Coefficient quantization
• After a block has been transformed, the transform
coefficients are quantized.
• Different quantization is applied to each coefficient
depending on the spatial frequency within the block
that it represents.
• The objective is to minimize the number of bits which
must be transmitted to the decoder, so that it can
perform the inverse transform and reconstruct the
image.
• Reduced quantization accuracy reduces the number
of bits which need to be transmitted to represent a
given DCT coefficient, but increases the possible
quantization error for that coefficient.
• The quantization noise introduced by the coder is not
reversible in the decoder, so the coding and decoding
process is lossy.
• More quantization error can be tolerated in the high-frequency coefficients, because high-frequency (HF) quantization noise is less visible than low-frequency (LF) quantization noise.
• Also, quantization noise is less visible in the
chrominance components than in the luminance
component.
• MPEG uses weighting matrices to define the relative
accuracy of the quantization of the different
coefficients.
• Different weighting matrices can be used for different
frames, depending on the prediction mode used.
• The weighted coefficients are then passed through a
fixed quantization law, which is usually a linear law.
• For some prediction modes there is an increased
threshold level around zero.
• The effect of this threshold is to maximize the number
of coefficients which are quantized to zero.
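A minimal sketch of a linear quantizer with an adjustable threshold around zero. The parameter `dead_zone` and the particular rule (zero below `dead_zone * step`, nearest-level rounding above it) are illustrative assumptions, not the exact MPEG 2 quantization law; the example only shows how widening the zero bin increases the number of coefficients quantized to zero.

```python
import math


def quantize(coeff, step, dead_zone=0.5):
    """Linear quantizer with an adjustable threshold around zero.

    Magnitudes below dead_zone * step become 0; larger ones are rounded to the
    nearest multiple of step (halves away from zero). dead_zone=0.5 is plain
    rounding; a larger value widens the zero bin.
    """
    level = abs(coeff) / step
    if level < dead_zone:
        return 0
    return int(math.copysign(math.floor(level + 0.5), coeff))


coeffs = [-7, 4, 6, 9, -15, 30]       # hypothetical coefficient values
step = 8
plain = [quantize(v, step) for v in coeffs]
widened = [quantize(v, step, dead_zone=1.0) for v in coeffs]
print(plain)     # [-1, 1, 1, 1, -2, 4]
print(widened)   # [0, 0, 0, 1, -2, 4]
```

The larger coefficients are quantized identically in both cases; only the small ones near zero are pushed to zero by the wider threshold.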
• Quantization noise is more visible in some blocks
than in others, for example, in blocks which contain a
high-contrast edge between plain areas.
• After quite aggressive quantization, the block in Figure 9 could look like Figure 10.
  5   2   0   0   0   0   0   0
-21   0   0   0   0   0   0   0
 -8   0   0   0   0   0   0   0
 -3   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
Figure 10. After quantization the number of non-zero coefficients is drastically reduced.
• Figure 10 shows that quantization has reduced the number of values that must be transmitted.
• The DCT concentrates energy into the top-left coefficients.
• The quantization matrix reinforces this trend: a coefficient is much more likely to be non-zero in the top left than in the bottom right.
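Quantization with a matrix divides each coefficient by its own step size. The source does not say which matrix produced Figure 10; as an assumption, this sketch uses the example luminance table from the JPEG standard (ITU-T T.81, Annex K), whose step sizes grow towards the bottom right. Applied to the rounded coefficients of Figure 9, it yields the levels shown in Figure 10. The helper names are hypothetical.

```python
import math

# Figure 9: DCT coefficients of the example block (rounded), row by row.
FIGURE_9 = [
    [86, 27, -3, 6, -2, 2, -2, 2],
    [-247, -4, -5, -3, 0, -3, 1, 1],
    [-117, -1, 1, -1, -1, -1, -1, 0],
    [-40, -2, 2, 1, 2, 1, -2, 0],
    [-7, -2, -1, 1, 0, -1, -1, 2],
    [-6, 1, 0, 0, 0, 0, -2, -1],
    [-4, -1, -1, 1, -2, -1, -1, -1],
    [-3, -3, -1, 1, 0, 1, 1, 1],
]

# Assumption: example luminance table from the JPEG standard (ITU-T T.81, Annex K).
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(x):
    """Round to the nearest integer, halves away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_block(block, table):
    """Divide each coefficient by its matrix entry and round: larger entries, coarser steps."""
    return [[round_half_away(block[u][v] / table[u][v]) for v in range(8)]
            for u in range(8)]


def dequantize_block(levels, table):
    """Decoder side: multiply back; the rounding error cannot be recovered."""
    return [[levels[u][v] * table[u][v] for v in range(8)] for u in range(8)]


levels = quantize_block(FIGURE_9, Q)
for row in levels:
    print(row)
print(sum(v != 0 for row in levels for v in row), "non-zero coefficients")
```

`dequantize_block` shows the decoder side: for example, the level -21 becomes -21 * 12 = -252 instead of the original -247, which is the irreversible quantization error described earlier.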
Zig-Zag coefficient scanning
• After quantization, the 8*8 blocks of DCT coefficients
are scanned in a zigzag pattern, Figure 11, to turn the
2-D array into a serial string of quantized coefficients.
• The pattern shown in Figure 11 is not the only one.
Figure 11. Zigzag scanning pattern.
• Scanning the quantized coefficients of Figure 10 with the pattern shown in Figure 11 gives the following string of values:
• (5) 2 -21 -8 0 0 0 0 0 -3, followed by 54 zeros (64 values in all)
• The first value is the DC coefficient; it is shown in parentheses because it is separated from the AC coefficients for entropy encoding.
• The most striking characteristic of the sequence is that after a relatively small number of values, all remaining values are zero.
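Assuming Figure 11 shows the classic zigzag (the figure itself is not reproduced here), the scan order can be generated diagonal by diagonal instead of being stored as a table. This minimal sketch, with a hypothetical `zigzag_order` helper, rebuilds the quantized block of Figure 10 and reproduces the string of values given in the text.

```python
def zigzag_order(n=8):
    """(row, column) positions of an n*n block in classic zigzag order."""
    order = []
    for s in range(2 * n - 1):                     # s = row + column: one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()                     # even diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order


# Figure 10: quantized block (all entries not set here are zero).
FIGURE_10 = [[0] * 8 for _ in range(8)]
FIGURE_10[0][0], FIGURE_10[0][1] = 5, 2
FIGURE_10[1][0], FIGURE_10[2][0], FIGURE_10[3][0] = -21, -8, -3

scanned = [FIGURE_10[r][c] for r, c in zigzag_order()]
print(scanned[:10], "+", len(scanned) - 10, "more values")
# [5, 2, -21, -8, 0, 0, 0, 0, 0, -3] + 54 more values
```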
Run-length coding
• The strings of coefficients produced by the zigzag
scanning are coded by counting the number of zero
coefficients preceding a non-zero coefficient.
• Run-length encoding (RLE) produces [run, amplitude] pairs: run is the number of zero values, and amplitude is the non-zero value that ends the run of zeros.
• The run-length value, and the value of the non-zero
coefficient which the run of zero coefficients
precedes, are then combined and coded using a
variable-length code (VLC).
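A minimal sketch of the run-length step for the example block, taking the 63 AC values from the zigzag string (the DC value is coded separately). The trailing zeros are represented by a single end-of-block marker, written here as the string "EOB"; MPEG uses an end-of-block code for this purpose, but the marker value and the function name in this sketch are assumptions.

```python
def run_length_pairs(ac_coefficients):
    """Turn zigzag-ordered AC coefficients into (run, amplitude) pairs plus 'EOB'."""
    pairs, run = [], 0
    for value in ac_coefficients:
        if value == 0:
            run += 1                      # count zeros preceding the next non-zero value
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")                   # trailing zeros are not coded one by one
    return pairs


ac = [2, -21, -8, 0, 0, 0, 0, 0, -3] + [0] * 54     # the 63 AC values of the example
print(run_length_pairs(ac))   # [(0, 2), (0, -21), (0, -8), (5, -3), 'EOB']
```

Sixty-three values shrink to four pairs and one marker, which is what the variable-length code then has to represent.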
Variable-Length Coding, VLC
• The VLC exploits the fact that short runs of zeros are
more likely than long ones, and small coefficients are
more likely than large ones.
• The VLC allocates codes of different lengths, depending on the expected frequency of occurrence of each combination of zero run length and non-zero coefficient value.
• Common combinations use short code words, less
common combinations use long code words.
• All the VLCs are designed so that no complete codeword is the prefix of any other codeword.
• Thus the decoder can identify where one variable-length codeword ends and the next starts.
• The VLCs are similar to the well-known Huffman code.
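The sketch below builds a prefix-free code with the Huffman procedure and decodes a bit string that contains no separators between codewords. It is illustrative only: MPEG 2 uses fixed VLC tables defined in the standard rather than building a code for each picture, and the event counts and function names here are hypothetical.

```python
import heapq
from itertools import count


def build_prefix_code(frequencies):
    """Huffman construction: returns {symbol: bit string} from {symbol: count}."""
    tie = count()                              # tie-breaker so the heap never compares dicts
    heap = [(freq, next(tie), {sym: ""}) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                         # degenerate case: a single symbol
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)       # the two least frequent subtrees
        f2, _, high = heapq.heappop(heap)
        merged = {sym: "0" + word for sym, word in low.items()}
        merged.update({sym: "1" + word for sym, word in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def decode(bits, code):
    """Read bits until they match a codeword; valid because the code is prefix-free."""
    lookup = {word: sym for sym, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            symbols.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return symbols


# Hypothetical counts of [run, amplitude] events; common events get short codes.
counts = {"EOB": 40, (0, 1): 30, (0, -1): 30, (0, 2): 12, (0, -2): 12,
          (1, 1): 8, (0, -8): 2, (5, -3): 1, (0, -21): 1}
code = build_prefix_code(counts)

message = [(0, 2), (0, -21), (0, -8), (5, -3), "EOB"]   # the example block
bits = "".join(code[sym] for sym in message)
print(bits, decode(bits, code) == message)
```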