Intra-frame DCT coding

• A bit-rate reduction system operates by removing redundant information from the signal at the coder before transmission and re-inserting it at the decoder.
• A coder and decoder pair is referred to as a codec.
• In video signals, two distinct kinds of redundancy can be identified: spatial and temporal redundancy, and psychovisual redundancy.
Intra-frame DCT coding tMyn 1

Spatial and temporal redundancy
• Pixel values are not independent: they are correlated with their neighbours both within the same frame and across frames.
• So, to some extent, the value of a pixel is predictable given the values of neighbouring pixels.

Psychovisual redundancy
• The human eye has a limited response to fine spatial detail, and is less sensitive to detail near object edges or around shot changes.
• Consequently, controlled impairments introduced into the decoded picture by the bit-rate reduction process should not be visible to a human observer.

• Two key techniques employed in an MPEG codec are intra-frame Discrete Cosine Transform (DCT) coding and motion-compensated inter-frame prediction.
• The DCT is an orthogonal mathematical transform that removes spatial redundancy in the sampled signal components by concentrating the signal energy into only a few coefficients.
• The DCT and its inverse are easily implemented with digital signal processing (DSP) technology.

• The DCT does not directly reduce the number of bits required to represent the block.
• The reduction in the number of bits follows from the fact that, for typical blocks of natural images, the distribution of coefficients is non-uniform: the transform tends to concentrate the energy into the low-frequency coefficients, and many of the other coefficients are near zero.

DCT (type II) compared to the DFT.
For both transforms, the magnitude of the spectrum is shown on the left and its histogram on the right; both spectra are cropped to 1/4 to zoom in on the behaviour at low frequencies. The DCT concentrates most of the power in the lower frequencies. Source: Wikipedia

• The bit-rate reduction is achieved by not transmitting the near-zero coefficients, and by quantizing and coding the remaining coefficients.
• The non-uniform distribution of the coefficients is a result of the spatial redundancy present in the original image block.

• Many different forms of transformation have been investigated for bit-rate reduction.
• The best transforms are those which tend to concentrate the energy of a picture block into a few coefficients.
• The DCT is one of the best transforms in this respect.
• The choice of an 8*8 block size is a trade-off: a large picture area makes the energy compaction described above most efficient, whereas the content and movement of the picture vary spatially, which favours a smaller block size.

• When compressing video signals it is useful to define the terms macroblock and block, Figures 1a, 1b and 1c.
Figure 1a. One frame divided into macroblocks of 16 samples by 16 lines.
Figure 1b. One macroblock of 16 samples by 16 lines contains four 8*8 luminance blocks.
Figure 1c. One block is 8 samples by 8 lines.

• In the standard digital TV environment the horizontal picture size is 720 pixels and the vertical picture size is 576 lines.
• One 720*576 frame therefore contains (720/16)*(576/16) = 45*36 = 1620 macroblocks, Figure 1d.
• The number of luminance blocks in one frame is 4*1620 = 6480.
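The macroblock and block counts above can be reproduced with a few lines of Python. This is a minimal sketch; the constant and function names (MB_SIZE, BLOCK_SIZE, macroblock_count, luma_blocks_per_macroblock) are hypothetical, chosen only for illustration, and, as in the text, only the luminance blocks are counted.

```python
# Macroblock and block counts for one 720*576 picture (luminance only).
MB_SIZE = 16     # hypothetical name: a macroblock is 16 samples by 16 lines
BLOCK_SIZE = 8   # hypothetical name: a DCT block is 8 samples by 8 lines


def macroblock_count(width: int, height: int) -> int:
    """Number of 16*16 macroblocks; both dimensions must be multiples of 16."""
    if width % MB_SIZE or height % MB_SIZE:
        raise ValueError("picture size must be a multiple of 16")
    return (width // MB_SIZE) * (height // MB_SIZE)


def luma_blocks_per_macroblock() -> int:
    """A 16*16 luminance area splits into (16/8)*(16/8) = 4 blocks of 8*8."""
    return (MB_SIZE // BLOCK_SIZE) ** 2


mbs = macroblock_count(720, 576)
luma_blocks = mbs * luma_blocks_per_macroblock()
print(720 // MB_SIZE, 576 // MB_SIZE, mbs, luma_blocks)  # 45 36 1620 6480
```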
• These 6480 two-dimensional (8*8) blocks are each input to a DCT that maps the sampled values onto corresponding values in the frequency domain.
Figure 1d. Digital TV frame, aspect ratio 16:9.

• MPEG-2 uses the term slice.
• A slice is a collection of macroblocks in scan order.
• MPEG-2 requires that a slice be contained within a single row of macroblocks.
• Figure 2 illustrates the relationship between slices and macroblocks in MPEG-2.
[Diagram: a frame of 720 samples by 576 lines divided into slices 1 to 36; each slice is a row of macroblocks numbered 1 to 45, each macroblock 16 samples by 16 lines.]
Figure 2. The relationship between slice and macroblock in MPEG-2.

• For most MPEG-2 coding applications, 4:2:0 sampling is likely to be used rather than 4:2:2, Figures 3a and 3b.
• In the digital TV environment, sampling takes place in the component YCbCr video signal format, Y being the luminance and Cb and Cr the chrominance components.
• As mentioned earlier, there is a difference between the component representations YUV and YCbCr.
[Diagram: RED, GREEN and BLUE (720*576 each) -> MATRIX -> Y, R-Y and B-Y (720*576 each); Y goes directly to MPEG-2 ENCODE, while R-Y and B-Y pass through FILTER AND SCALE to give Cb and Cr of 360*288 each, which are then MPEG-2 encoded.]
Figure 3a. MPEG-2, YCbCr 4:2:0 sampling.
[Diagram: as in Figure 3a, except that FILTER AND SCALE gives Cb and Cr of 360*576 each.]
Figure 3b. MPEG-2, YCbCr 4:2:2 sampling.

• Spatial redundancy is removed by processing the digitized signals in 2-D blocks of 8 pixels by 8 lines, Figures 4a and 4b.
[Diagram: four 8*8 Y blocks numbered 0 to 3, one 8*8 Cb block numbered 4 and one 8*8 Cr block numbered 5.]
Figure 4a. Colour subsampling 4:2:0, DCT blocks.
[Diagram: four 8*8 Y blocks numbered 0 to 3, two 8*8 Cb blocks numbered 4 and 6, and two 8*8 Cr blocks numbered 5 and 7.]
Figure 4b. Colour subsampling 4:2:2, DCT blocks.
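To connect Figures 3 and 4, the sketch below computes the chroma plane size and the number of 8*8 DCT blocks per macroblock for 4:2:0 and 4:2:2. It assumes the usual subsampling factors (chroma halved horizontally and vertically for 4:2:0, horizontally only for 4:2:2); the SUBSAMPLING table and the function names are hypothetical.

```python
# (horizontal, vertical) chroma subsampling factors; illustrative table only.
SUBSAMPLING = {
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
    "4:2:2": (2, 1),  # chroma halved horizontally only
}


def chroma_size(fmt: str, width: int, height: int) -> tuple[int, int]:
    """Size of each chroma plane (Cb or Cr) for a given luminance size."""
    sx, sy = SUBSAMPLING[fmt]
    return width // sx, height // sy


def blocks_per_macroblock(fmt: str) -> int:
    """8*8 DCT blocks in one 16*16 macroblock: Y blocks plus Cb and Cr blocks."""
    sx, sy = SUBSAMPLING[fmt]
    luma = (16 // 8) * (16 // 8)                     # four Y blocks
    chroma_each = (16 // sx // 8) * (16 // sy // 8)  # blocks per Cb (or Cr)
    return luma + 2 * chroma_each


for fmt in SUBSAMPLING:
    print(fmt, chroma_size(fmt, 720, 576), blocks_per_macroblock(fmt))
# 4:2:0 (360, 288) 6
# 4:2:2 (360, 576) 8
```

The block counts match the numbering in Figures 4a and 4b: blocks 0 to 5 for 4:2:0 and 0 to 7 for 4:2:2.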
• The DCT is a reversible process which maps between the normal 2-D representation of the image and one which represents the same information in what may be thought of as the frequency domain.
• Each coefficient in the 8*8 DCT-domain block indicates the contribution of a different DCT "basis" function to the original image block.
• The lowest-frequency basis function (top left) corresponds to the DC coefficient, which may be thought of as representing the average brightness of the block, Figure 5.
[Diagram: DCT and IDCT mapping between an image block and its 8*8 coefficient block; the DC coefficient is at the top left, horizontal frequency increases to the right and vertical frequency increases downwards.]
Figure 5. Block-based 8*8 DCT transform pairs.

• Next there is a generic example with one 8*8 DCT block of luminance (Y) information.
• It has been assumed that Y can take a nominal 8-bit range of 16-235, Figure 6.

 70  72  70  70  72  68  68  64
103 101 103 100  99  97  94  94
132 132 132 130 129 129 125 121
157 157 155 154 153 150 148 145
168 163 164 162 163 161 161 156
172 170 165 166 163 163 162 158
174 170 167 167 164 163 164 159
174 173 170 167 167 166 166 160
Figure 6. One 8*8 set of luminance values after sampling.

• White corresponds to a sample value of approximately 235 and black to approximately 16.

• One of the first steps in the DCT process is to remove the average, or DC, level from each of the 64 samples.
• In this example a fixed value of 128, the midpoint of the 8-bit range, is subtracted rather than the true block average (about 138.8), so a small residual DC level remains.
• Note: the process can be more complex than simply subtracting an average.
• This results in a new matrix of 64 integer values, some of which are negative, Figure 7.
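The level shift from Figure 6 to Figure 7 is a one-line operation. The sketch below, assuming the fixed level of 128 used in the example, also computes the true block average, which shows that a residual DC level of about 10.8 per sample remains after the shift. The names BLOCK and LEVEL are illustrative.

```python
# Figure 6: one 8*8 block of luminance samples (rows top to bottom).
BLOCK = [
    [70, 72, 70, 70, 72, 68, 68, 64],
    [103, 101, 103, 100, 99, 97, 94, 94],
    [132, 132, 132, 130, 129, 129, 125, 121],
    [157, 157, 155, 154, 153, 150, 148, 145],
    [168, 163, 164, 162, 163, 161, 161, 156],
    [172, 170, 165, 166, 163, 163, 162, 158],
    [174, 170, 167, 167, 164, 163, 164, 159],
    [174, 173, 170, 167, 167, 166, 166, 160],
]
LEVEL = 128  # fixed level used in the example: midpoint of the 8-bit range

# Subtract the level from every sample; this gives the matrix of Figure 7.
shifted = [[s - LEVEL for s in row] for row in BLOCK]

true_mean = sum(map(sum, BLOCK)) / 64       # actual block average
residual_dc = sum(map(sum, shifted)) / 64   # average left after the shift

print(shifted[0])    # [-58, -56, -58, -58, -56, -60, -60, -64]
print(true_mean)     # 138.78125
print(residual_dc)   # 10.78125
```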
-58 -56 -58 -58 -56 -60 -60 -64
-25 -27 -25 -28 -29 -31 -34 -34
  4   4   4   2   1   1  -3  -7
 29  29  27  26  25  22  20  17
 40  35  36  34  35  33  33  28
 44  42  37  38  35  35  34  30
 46  42  39  39  36  35  36  31
 46  45  42  39  39  38  38  32
Figure 7. The first step is to subtract the level (here 128) from the sample values.

• The DCT can be characterized as a derivative of Fourier analysis.
• Conventional Fourier analysis applies to a continuous, periodic signal.
• In general, the signals of interest here are not periodic, and they consist of samples rather than a continuous signal.
• Sampling presents no problem: the discrete Fourier transform deals with a sampled signal.
• In the sampled domain, the samples are meaningful only if the frequencies represented are below the Nyquist frequency (half the sampling rate).

• Any amplitude and phase of a given frequency may be obtained by summing appropriate amplitudes of sine and cosine waves of that frequency.
• Mirroring a waveform fragment about its centre point (taken as the origin) creates even symmetry.
• Mirroring doubles the number of samples (but of course the redundant ones can be discarded).
• An even-symmetric signal contains no sine components, so the transform gives only cosine components!
• This is the basis of the DCT.

• The DCT (and the inverse DCT) is defined by rather fearsome-looking equations:

$$F(u,v) = \frac{C(u)\,C(v)}{4} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad u,v = 0,\ldots,7$$

$$f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\,C(v)\,F(u,v) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad x,y = 0,\ldots,7$$

• In these equations the C terms are defined by

$$C(u) = \begin{cases} \frac{1}{\sqrt{2}} & u = 0 \\ 1 & \text{otherwise} \end{cases} \qquad C(v) = \begin{cases} \frac{1}{\sqrt{2}} & v = 0 \\ 1 & \text{otherwise} \end{cases}$$

• The DCT equation can be illustrated with Figure 8.
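As a check on the forward and inverse equations, the following sketch evaluates them term by term and applies them to the level-shifted block of Figure 7. Direct evaluation costs 64 multiply-adds per coefficient; practical implementations normally use fast factorized algorithms instead. The function names dct_8x8, idct_8x8, c and basis are hypothetical. With this normalization F(0,0) is the sum of the 64 samples divided by 8, i.e. 690/8 = 86.25, which rounds to the DC value in Figure 9.

```python
import math


def c(k: int) -> float:
    """Normalization term: C(0) = 1/sqrt(2), C(k) = 1 for k > 0."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(a: int, k: int) -> float:
    """cos((2a + 1) k pi / 16): the cosine term shared by the DCT and IDCT."""
    return math.cos((2 * a + 1) * k * math.pi / 16)


def dct_8x8(f):
    """Forward 8*8 DCT evaluated directly from the defining equation; F[u][v]."""
    return [[c(u) * c(v) / 4 * sum(f[x][y] * basis(x, u) * basis(y, v)
                                  for x in range(8) for y in range(8))
             for v in range(8)]
            for u in range(8)]


def idct_8x8(F):
    """Inverse 8*8 DCT evaluated directly from the defining equation; f[x][y]."""
    return [[sum(c(u) * c(v) * F[u][v] * basis(x, u) * basis(y, v)
                 for u in range(8) for v in range(8)) / 4
             for y in range(8)]
            for x in range(8)]


# Figure 6 samples; subtracting 128 gives the block of Figure 7.
FIG6 = [
    [70, 72, 70, 70, 72, 68, 68, 64],
    [103, 101, 103, 100, 99, 97, 94, 94],
    [132, 132, 132, 130, 129, 129, 125, 121],
    [157, 157, 155, 154, 153, 150, 148, 145],
    [168, 163, 164, 162, 163, 161, 161, 156],
    [172, 170, 165, 166, 163, 163, 162, 158],
    [174, 170, 167, 167, 164, 163, 164, 159],
    [174, 173, 170, 167, 167, 166, 166, 160],
]
f = [[s - 128 for s in row] for row in FIG6]

F = dct_8x8(f)
print(round(F[0][0]), round(F[1][0]), round(F[0][1]))  # 86 -247 27

g = idct_8x8(F)
err = max(abs(g[x][y] - f[x][y]) for x in range(8) for y in range(8))
print(err < 1e-9)  # True: invertible up to floating-point rounding
```

The three printed values agree with the DC coefficient and the two neighbouring AC coefficients in Figure 9, which suggests that the figure was produced with this normalization and with the row index running vertically.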
[Diagram: the 8*8 sample block f(x,y), with row index x and column index y (corners f00, f07, f70, f77), is mapped by the DCT onto the 8*8 coefficient block F(u,v), with row index u and column index v (corners F00, F07, F70, F77); the figure is annotated with the forward DCT equation.]
Figure 8. DCT visualized.

• In this approach the DCT transforms a block of 64 intensity values into a block of 64 coefficients.
• The top-left coefficient represents the DC level, or average intensity, of the block.
• Moving to the right, the coefficients represent higher horizontal frequencies; moving down, they represent higher vertical frequencies. Applying the DCT to Figure 7 gives Figure 9:

  86   27   -3    6   -2    2   -2    2
-247   -4   -5   -3    0   -3    1    1
-117   -1    1   -1   -1   -1   -1    0
 -40   -2    2    1    2    1   -2    0
  -7   -2   -1    1    0   -1   -1    2
  -6    1    0    0    0    0   -2   -1
  -4   -1   -1    1   -2   -1   -1   -1
  -3   -3   -1    1    0    1    1    1
Figure 9. The DCT is performed, yielding a single DC coefficient and 63 AC coefficients.

• From Figure 9 it can be seen that the DCT concentrates the energy into the top-left corner; here it lies mostly in the first column, because the sample values in Figure 6 change mainly from top to bottom.
• Many of the DCT coefficients have values close to zero: in Figure 9, 56 of the 63 AC coefficients have magnitudes of 5 or less.
• The human visual system (HVS) is quite insensitive to errors in these coefficients: values close to zero may be set to zero with very little effect on image quality.
• On the other hand, the HVS is fairly sensitive even to small errors in the DC and low-frequency coefficients, but much less sensitive to amplitude errors in higher-frequency coefficients.
• We can therefore afford to quantize the higher-frequency coefficients much more coarsely.
• As stated earlier, this not only means that fewer bits are needed to transmit each non-zero value, but also that more coefficients can be treated as zero.
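The statistics quoted for Figure 9 can be checked directly from its 64 values. In this sketch "energy" is taken to mean the sum of squared coefficients, and the name FIG9 is illustrative.

```python
# Figure 9: DCT coefficients (row = vertical frequency u, column = horizontal v).
FIG9 = [
    [86, 27, -3, 6, -2, 2, -2, 2],
    [-247, -4, -5, -3, 0, -3, 1, 1],
    [-117, -1, 1, -1, -1, -1, -1, 0],
    [-40, -2, 2, 1, 2, 1, -2, 0],
    [-7, -2, -1, 1, 0, -1, -1, 2],
    [-6, 1, 0, 0, 0, 0, -2, -1],
    [-4, -1, -1, 1, -2, -1, -1, -1],
    [-3, -3, -1, 1, 0, 1, 1, 1],
]

# All coefficients except the DC term at (0, 0).
ac = [FIG9[u][v] for u in range(8) for v in range(8) if (u, v) != (0, 0)]
small_ac = sum(1 for a in ac if abs(a) <= 5)

# Squared coefficients, largest first; share held by the 8 largest.
energies = sorted((a * a for row in FIG9 for a in row), reverse=True)
share_top8 = sum(energies[:8]) / sum(energies)

print(len(ac), small_ac)     # 63 56
print(round(share_top8, 4))  # 0.9979
```

In other words, 8 of the 64 coefficients carry almost all of the block's energy in this example.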
Coefficient quantization
• After a block has been transformed, the transform coefficients are quantized.
• Different quantization is applied to each coefficient depending on the spatial frequency within the block that it represents.
• The objective is to minimize the number of bits which must be transmitted to the decoder so that it can perform the inverse transform and reconstruct the image.

• Reduced quantization accuracy reduces the number of bits needed to represent a given DCT coefficient, but increases the possible quantization error for that coefficient.
• The quantization noise introduced by the coder cannot be reversed in the decoder, so the coding and decoding process is lossy.
• More quantization error can be tolerated in the high-frequency coefficients, because HF quantization noise is less visible than LF quantization noise.
• Also, quantization noise is less visible in the chrominance components than in the luminance component.

• MPEG uses weighting matrices to define the relative accuracy of the quantization of the different coefficients.
• Different weighting matrices can be used for different frames, depending on the prediction mode used.
• The weighted coefficients are then passed through a fixed quantization law, which is usually a linear law.
• For some prediction modes there is an increased threshold level around zero (a "dead zone").
• The effect of this threshold is to maximize the number of coefficients which are quantized to zero.

• Quantization noise is more visible in some blocks than in others, for example in blocks which contain a high-contrast edge between plain areas.
• After quite aggressive quantization, the block in Figure 9 could look like Figure 10.
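A minimal quantization sketch follows. The step sizes are an assumption: they are the example luminance quantization table published in the JPEG standard (ITU-T T.81, Annex K), used here only as a stand-in because, with rounding to nearest, it maps Figure 9 exactly onto Figure 10. MPEG-2 itself uses its own weighting matrices together with a quantizer scale, and the dead zone mentioned above is not modelled. All function names are hypothetical.

```python
import math

# Figure 9 DCT coefficients.
FIG9 = [
    [86, 27, -3, 6, -2, 2, -2, 2],
    [-247, -4, -5, -3, 0, -3, 1, 1],
    [-117, -1, 1, -1, -1, -1, -1, 0],
    [-40, -2, 2, 1, 2, 1, -2, 0],
    [-7, -2, -1, 1, 0, -1, -1, 2],
    [-6, 1, 0, 0, 0, 0, -2, -1],
    [-4, -1, -1, 1, -2, -1, -1, -1],
    [-3, -3, -1, 1, 0, 1, 1, 1],
]

# ASSUMPTION: JPEG (ITU-T T.81, Annex K) example luminance table used as a
# stand-in; this is NOT an MPEG-2 weighting matrix.
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(x: float) -> int:
    """Round to nearest, halves away from zero (Python's round() rounds halves to even)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, table):
    """Encoder side: divide each coefficient by its step size and round."""
    return [[round_half_away(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply back; the rounding error cannot be recovered."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


levels = quantize(FIG9, Q)
for row in levels[:4]:
    print(row)
# [5, 2, 0, 0, 0, 0, 0, 0]
# [-21, 0, 0, 0, 0, 0, 0, 0]
# [-8, 0, 0, 0, 0, 0, 0, 0]
# [-3, 0, 0, 0, 0, 0, 0, 0]

recon = dequantize(levels, Q)
print(recon[1][0])  # -252, versus -247 before quantization
```

The dequantized value -252 instead of -247 is an example of the irreversible quantization error described above.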
  5   2   0   0   0   0   0   0
-21   0   0   0   0   0   0   0
 -8   0   0   0   0   0   0   0
 -3   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
Figure 10. After quantization the number of non-zero coefficients is drastically reduced.

• From Figure 10 it can be seen that quantization has reduced the number of values that must be transmitted: only 5 of the 64 coefficients are non-zero.
• The DCT concentrates energy into the top-left coefficients.
• The quantization matrix emphasizes this trend, and we can see that a coefficient is much more likely to be non-zero in the top left than in the bottom right.

Zig-zag coefficient scanning
• After quantization, the 8*8 blocks of DCT coefficients are scanned in a zigzag pattern, Figure 11, to turn the 2-D array into a serial string of quantized coefficients.
• The pattern shown in Figure 11 is not the only one; MPEG-2, for example, also defines an alternate scan intended mainly for interlaced pictures.
Figure 11. Zigzag scanning pattern.

• Scanning the quantized coefficients with the pattern shown in Figure 11 results in the following string of values:
(5) 2 -21 -8 0 0 0 0 0 -3, followed by 54 zeros (64 values in total)
• The first value is the DC coefficient; it is shown in parentheses because it is separated from the AC coefficients for entropy coding.
• The most striking characteristic of the sequence is that after a relatively small number of values, all remaining values are zero.

Run-length coding
• The strings of coefficients produced by the zigzag scanning are coded by counting the number of zero coefficients preceding each non-zero coefficient.
• RLE: [run, amplitude], where run = the number of zero values and amplitude = the non-zero value which ends the run of zeros.
• For the string above, the AC coefficients give the pairs [0, 2], [0, -21], [0, -8], [5, -3]; the trailing zeros are not sent but are signalled by an end-of-block (EOB) code.
• The run length and the value of the non-zero coefficient which ends the run are then combined and coded using a variable-length code (VLC).
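The zigzag scan and the [run, amplitude] pairing can be sketched as follows. The order generated here is the conventional zigzag of Figure 11 (the same as the default scan in JPEG and MPEG-2); the DC handling is reduced to splitting off the first value, and the EOB marker and helper names are hypothetical simplifications, not the MPEG-2 bitstream syntax.

```python
# Figure 10: quantized coefficients (only the first four rows are non-zero).
FIG10 = [[5, 2, 0, 0, 0, 0, 0, 0],
         [-21, 0, 0, 0, 0, 0, 0, 0],
         [-8, 0, 0, 0, 0, 0, 0, 0],
         [-3, 0, 0, 0, 0, 0, 0, 0]] + [[0] * 8 for _ in range(4)]

EOB = "EOB"  # hypothetical end-of-block marker


def zigzag_order(n: int = 8):
    """(row, col) positions in zigzag order: walk the anti-diagonals row+col = s,
    moving down-left on odd diagonals and up-right on even ones."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else -p[0]))


def run_length(ac):
    """[run, amplitude] pairs: run = number of zeros before each non-zero value."""
    pairs, run = [], 0
    for a in ac:
        if a == 0:
            run += 1
        else:
            pairs.append((run, a))
            run = 0
    pairs.append(EOB)  # the trailing zeros are replaced by a single marker
    return pairs


scan = [FIG10[i][j] for i, j in zigzag_order()]
dc, ac = scan[0], scan[1:]
print(scan[:10])           # [5, 2, -21, -8, 0, 0, 0, 0, 0, -3]
print(dc, run_length(ac))  # 5 [(0, 2), (0, -21), (0, -8), (5, -3), 'EOB']
```

Four pairs and one marker replace the 63 AC values of the block.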
Variable-Length Coding, VLC
• The VLC exploits the fact that short runs of zeros are more likely than long ones, and small coefficients are more likely than large ones.
• The VLC allocates codes of different lengths depending on the expected frequency of occurrence of each zero-run-length/non-zero-coefficient combination.
• Common combinations use short codewords; less common combinations use long codewords.

• All the VLCs are designed such that no complete codeword is the prefix of any other codeword.
• Thus the decoder can identify where one variable-length codeword ends and the next starts.
• They are similar to the well-known Huffman code.
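To illustrate the prefix property, the sketch below builds a Huffman code for a handful of [run, amplitude] symbols and decodes a bit string one codeword at a time. The symbol frequencies are invented for illustration, and the function names are hypothetical; MPEG-2 does not build codes per picture but uses fixed VLC tables defined in the standard.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Build a prefix-free code {symbol: bit string} from {symbol: frequency}.
    Assumes at least two symbols."""
    tie = count()  # unique tie-breaker, so heapq never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)   # the two least frequent subtrees
        f2, _, high = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in low.items()}
        merged.update({s: "1" + w for s, w in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def is_prefix_free(code):
    """True if no codeword is the prefix of another codeword."""
    words = list(code.values())
    return not any(i != j and b.startswith(a)
                   for i, a in enumerate(words) for j, b in enumerate(words))


def decode(bits, code):
    """Read bits until they match a codeword; because the code is prefix-free,
    the first match is the right one and no separators are needed."""
    inverse = {w: s for s, w in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    if word:
        raise ValueError("bit string ends in the middle of a codeword")
    return out


# HYPOTHETICAL frequencies for a few (run, amplitude) symbols plus EOB.
freqs = {(0, 1): 40, (0, -1): 40, "EOB": 30, (1, 1): 15, (0, 2): 10, (5, -3): 2}
code = huffman_code(freqs)
message = [(0, 2), (0, 1), (5, -3), "EOB"]
bits = "".join(code[s] for s in message)

print(is_prefix_free(code))           # True
print(decode(bits, code) == message)  # True
print({s: len(w) for s, w in code.items()})
```

The last line shows the codeword lengths: the frequent symbols such as (0, 1) get shorter codewords than the rare (5, -3).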