CM0340/CMT502 Solutions
CARDIFF UNIVERSITY
EXAMINATION PAPER
Academic Year: 2012/2013
Examination Period: Spring
Examination Paper Number: CM0340/CMT502 Solutions
Examination Paper Title: Multimedia
Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator.

Structure of Examination Paper:
There are 14 pages.
There are 4 questions in total.
There are no appendices.
The maximum mark for the examination paper is 81 and the mark obtainable for a question or part of a question is shown in brackets alongside the question.

Students to be provided with:
The following items of stationery are to be provided: ONE answer book.

Instructions to Students:
Answer 3 questions.
The use of calculators is permitted in this examination.
The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.

PLEASE TURN OVER

Q1. (a) How does the human eye sense colour? What characteristics of the human visual system can be exploited for the compression of colour images and video?

The eye is basically sensitive to colour and intensity.
• The retina of the eye has 'neurons' on which light is focused. Each neuron is either a rod or a cone. [1]
• Rods are not sensitive to colour; they sense intensity (monochrome). [1]
• Cones come in 3 types. The first responds most to light of long wavelengths (red/yellowish colours). The second responds most to light of medium wavelength, peaking at a green colour. The third responds most to short-wavelength light, of a bluish colour. [1]
• Each type responds differently to the various frequencies of light: non-linearly, and not equally for R, G and B. [1]
• Compression of images and video uses the fact that intensity (monochrome) can be modelled at high resolution while colour is modelled at lower resolution, and non-linearly with respect to colour sensitivity.
[1]
5 Marks - Bookwork

(b) Different colour models are often used in different applications. What is the CMYK colour model? Give an application in which this colour model is mostly used and explain the reason.

The CMYK colour model uses Cyan, Magenta, Yellow and Black as primaries (components). [1]
The CMYK colour model is mostly used in printing: the colour pigments on the paper absorb certain colours, so a subtractive model is suitable. Black is added to produce a darker black than can be obtained by simply mixing C, M and Y. [2]
3 Marks — Bookwork

Given a colour represented in RGB colour space as R = 0.2, G = 0.6, B = 0.3, what is its representation in the CMYK colour model?

First convert to CMY:
$$\begin{bmatrix}\bar{C}\\ \bar{M}\\ \bar{Y}\end{bmatrix} = \begin{bmatrix}1\\ 1\\ 1\end{bmatrix} - \begin{bmatrix}R\\ G\\ B\end{bmatrix} = \begin{bmatrix}0.8\\ 0.4\\ 0.7\end{bmatrix}$$
Then K = min(C̄, M̄, Ȳ) = 0.4, C = C̄ − K = 0.4, M = M̄ − K = 0, Y = Ȳ − K = 0.3. [2]
2 Marks — Unseen problem

(c) What is a colour look-up table and how is it used to represent colour?
Colour Look-Up Tables (LUTs):
• Store only the index into the colour LUT for each pixel.
• Look up the table to find the colour (RGB) for the index.
[1] [1] [3]
5 Marks - Bookwork

Give an advantage and a disadvantage of this representation with respect to true (24-bit) colour.
Advantage: uses significantly less memory than full 24-bit colour. [1]
Disadvantage: restricted number of colours available. [1]
2 Marks - Bookwork

How do you convert from 24-bit colour to an 8-bit colour look-up table representation?
• The LUT needs to be built when converting 24-bit colour images to 8-bit, by grouping similar colours (each group is assigned one colour entry). [1]
1 Mark - Bookwork

(d) What is chroma subsampling? Why is chroma subsampling meaningful? What is the benefit of doing chroma subsampling?
Chroma subsampling is a method that stores colour information at lower resolution than intensity information.
[1]
Chroma subsampling is meaningful because the human visual system is less sensitive to variations in colour than to variations in brightness. [1]
Chroma subsampling can reduce the bandwidth needed for colour detail with almost no perceivable visual difference. [1]
3 Marks — Bookwork

For the following array of colour values, give chroma subsampling results with 4:2:2, 4:1:1 and 4:2:0 schemes. Note: Listing the formulae to obtain the entries without calculating the final numbers is acceptable.

 90 100  96  42
 80  18  82  78
 44  62  52  38
 28  23  48  22

Chroma subsampling result for 4:2:2 scheme (keep every second column):
90 96
80 82
44 52
28 48
[2]
Chroma subsampling result for 4:1:1 scheme (keep every fourth column):
90
80
44
28
[2]
Chroma subsampling result for 4:2:0 scheme (average each 2×2 block):
(90 + 100 + 80 + 18)/4 = 72          (96 + 42 + 82 + 78)/4 = 74.5 ≈ 75
(44 + 62 + 28 + 23)/4 = 39.25 ≈ 39   (52 + 38 + 48 + 22)/4 = 40
[2]
6 Marks — Unseen problem
Question 1 Total Marks 27

Q2. (a) GIF and JPEG are two commonly used image representations. Do they usually use lossless or lossy compression? State the major compression algorithm (if lossless) or the lossy steps of the algorithm (if lossy) for each representation.
Lossless or lossy:
GIF: Lossless. [1]
JPEG: Lossy. [1]
Key algorithms:
GIF: the key algorithm is LZW (lossless). [1]
JPEG: the lossy steps are quantisation and chroma subsampling. [1]
4 Marks — Bookwork

(b) Briefly describe the four basic types of data redundancy that data compression algorithms can apply to audio, image and video signals.
4 Types of Compression:
• Temporal – correlation along the time axis: in 1D signals (audio) and between successive frames of 3D (spatio-temporal) video. [2]
• Spatial – correlation between neighbouring pixels or data items. [2]
• Spectral – correlation between colour or luminance components. This uses the frequency domain to exploit relationships between the frequencies of change in the data. [2]
• Psycho-visual, psycho-acoustic – exploit perceptual properties of the human visual system or aural system to compress data.
[2]
8 Marks Bookwork

(c) Given the following string as input, /TAN/HAN/HAN/AN/, with the initial dictionary below, encode the sequence with the LZW algorithm, showing the intermediate steps.

Index  1  2  3  4  5
Entry  /  H  A  N  T

RECAP: (Not explicitly required for solution) The LZW Compression Algorithm:
w = NIL;
while ( read a character k ) {
    if wk exists in the dictionary
        w = wk;
    else {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
output the code for w;   /* flush the final match */

The steps to encode the above string are as follows:
• wk is: /, EXISTS, w = wk: /
• wk is: /T, NEW: add to table, output 1 (/), w = k: T. New Table Entry, 6 : /T
• wk is: TA, NEW: add to table, output 5 (T), w = k: A. New Table Entry, 7 : TA
• wk is: AN, NEW: add to table, output 3 (A), w = k: N. New Table Entry, 8 : AN
• wk is: N/, NEW: add to table, output 4 (N), w = k: /. New Table Entry, 9 : N/
• wk is: /H, NEW: add to table, output 1 (/), w = k: H. New Table Entry, 10 : /H
• wk is: HA, NEW: add to table, output 2 (H), w = k: A. New Table Entry, 11 : HA
• wk is: AN, EXISTS, w = wk: AN
• wk is: AN/, NEW: add to table, output 8 (AN), w = k: /. New Table Entry, 12 : AN/
• wk is: /H, EXISTS, w = wk: /H
• wk is: /HA, NEW: add to table, output 10 (/H), w = k: A. New Table Entry, 13 : /HA
• wk is: AN, EXISTS, w = wk: AN
• wk is: AN/, EXISTS, w = wk: AN/
• wk is: AN/A, NEW: add to table, output 12 (AN/), w = k: A. New Table Entry, 14 : AN/A
• wk is: AN, EXISTS, w = wk: AN
• wk is: AN/, EXISTS, w = wk: AN/
• End of input: output the final token, which is 12 (AN/)

To summarise, the output table (new elements):
6 : /T
7 : TA
8 : AN
9 : N/
10 : /H
11 : HA
12 : AN/
13 : /HA
14 : AN/A

So the output will be: 1 5 3 4 1 2 8 10 12 12
10 Marks — Unseen problem applying algorithms covered in lectures.
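The encoding trace can be checked mechanically. The following is a minimal Python sketch, not part of the model answer, assuming the exam's 1-based initial dictionary (codes 1 to 5 for /, H, A, N, T) and Python strings as dictionary entries; the function names `lzw_encode` and `lzw_decode` are illustrative only. The decoder also handles the case where a received code has not yet been added to the dictionary, which does not arise for this particular string.

```python
# Minimal LZW sketch, assuming a 1-based initial dictionary built from `alphabet`
# (for this question: "/HANT" gives 1:'/', 2:'H', 3:'A', 4:'N', 5:'T').

def lzw_encode(text, alphabet):
    """Encode text with LZW (assumes every character of text is in alphabet).

    Returns the list of output codes and the final string-to-code dictionary.
    """
    table = {ch: i + 1 for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    w = ""
    output = []
    for k in text:
        wk = w + k
        if wk in table:
            w = wk                       # keep extending the current match
        else:
            table[wk] = next_code        # add wk to the dictionary
            next_code += 1
            output.append(table[w])      # output the code for w
            w = k
    if w:
        output.append(table[w])          # flush the final match
    return output, table


def lzw_decode(codes, alphabet):
    """Invert lzw_encode for the same initial alphabet."""
    if not codes:
        return ""
    table = {i + 1: ch for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    w = table[codes[0]]
    result = [w]
    for k in codes[1:]:
        if k in table:
            entry = table[k]
        else:
            # Code not yet in the table: it must be w + first char of w.
            entry = w + w[0]
        result.append(entry)
        table[next_code] = w + entry[0]  # add w + entry[0] to the dictionary
        next_code += 1
        w = entry
    return "".join(result)


codes, table = lzw_encode("/TAN/HAN/HAN/AN/", "/HANT")
print(codes)                       # [1, 5, 3, 4, 1, 2, 8, 10, 12, 12]
print(lzw_decode(codes, "/HANT"))  # /TAN/HAN/HAN/AN/
```

Keeping the encoder's dictionary keyed by string and the decoder's keyed by code mirrors the two directions of lookup each side actually needs.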
3 marks for keeping w, 2 marks for appropriate allocation of index, 3 marks for symbol table and 3 marks for output.

(d) Briefly describe the LZW decoding process, and illustrate your answer with the above string sequence.

RECAP: (Not explicitly required for solution) The LZW Decompression Algorithm:
read a character k;
output k;
w = k;
while ( read a character k )   /* k could be a character or a code. */
{
    entry = dictionary entry for k;   /* if k is not yet in the dictionary, entry = w + w[0] */
    output entry;
    add w + entry[0] to dictionary;
    w = entry;
}

Decoding:
Have sequence: 1 5 3 4 1 2 8 10 12 12
And Code Book:
Index  1  2  3  4  5
Entry  /  H  A  N  T

So we get:
• Input: (w = k) 1 : Output (k Table entry): /
• Input k: 5 : Output (k Table entry): T. New Table Entry, 6 : /T
• Input k: 3 : Output (k Table entry): A. New Table Entry, 7 : TA
• Input k: 4 : Output (k Table entry): N. New Table Entry, 8 : AN
• Input k: 1 : Output (k Table entry): /. New Table Entry, 9 : N/
• Input k: 2 : Output (k Table entry): H. New Table Entry, 10 : /H
• Input k: 8 : Output (k Table entry): AN. New Table Entry, 11 : HA
• Input k: 10 : Output (k Table entry): /H. New Table Entry, 12 : AN/
• Input k: 12 : Output (k Table entry): AN/. New Table Entry, 13 : /HA
• Input k: 12 : Output (k Table entry): AN/. New Table Entry, 14 : AN/A

Decoded Stream is (as expected): /TAN/HAN/HAN/AN/
Note the Output Table (New Elements) is as before:
6 : /T
7 : TA
8 : AN
9 : N/
10 : /H
11 : HA
12 : AN/
13 : /HA
14 : AN/A
5 Marks — Unseen problem
Question 2 Total Marks 27

Q3. (a) Briefly outline, with the aid of suitable diagrams, the JPEG/MPEG I-Frame compression pipeline and list the constituent compression algorithms employed at each stage in the pipeline.
The Major Steps in JPEG/MPEG Coding involve:
(Diagrams: JPEG pipeline; MPEG pipeline) [2]
• Colour Space Transform and subsampling
• DCT (Discrete Cosine Transformation)
• Quantization
• Zigzag Scan
• Differential Pulse Code Modulation (DPCM) on the DC component (in JPEG)
• Run length encoding (RLE) on the AC components (JPEG), or on all of the zig-zag scan (MPEG)
• Entropy Coding — Huffman or Arithmetic
[7]
9 Marks Bookwork

What are the key differences between the JPEG and MPEG I-Frame compression pipelines?
Four main differences:
• JPEG uses YIQ whilst MPEG uses YUV (YCrCb) colour space. [1]
• MPEG uses larger block-size DCT windows, 16 or even 32, as opposed to JPEG's 8. [1]
• Different quantisation: MPEG usually uses a constant quantisation value. [1]
• Only JPEG applies Differential Pulse Code Modulation (DPCM), to the DC component of the zig-zag scan; the AC components (JPEG) and the complete zig-zag scan (MPEG) are run-length encoded. [1]
4 Marks Applied Bookwork: some lateral thinking is needed to compare JPEG and MPEG, as they are not directly compared in the course notes (at least).

(b) Motion JPEG (or M-JPEG) is a video format that uses JPEG picture compression for each frame of the video. Why is M-JPEG not widely used as a video compression standard?
Compressing each frame on its own does not yield the high compression ratio required for general video needs. Exploiting the temporal aspect of video gives better compression. [2]
2 Marks Bookwork

Briefly state what additional approaches are used by MPEG video compression algorithms to improve on M-JPEG.
Adopt some form of temporal compression: use P-frames and B-frames to do differencing between frames, together with motion estimation. [2]
2 Marks Bookwork

(c) What processes above give rise to the lossy nature of JPEG/MPEG video compression?
Lossy steps:
• Colour space subsampling in the IQ or UV components.
• Quantisation reduces the bits needed for the DCT components.
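To see why these two steps cannot be undone, the following Python sketch (an illustration, not part of the model answer) applies 4:2:0 subsampling by averaging each 2×2 block of a chroma array, and uniform quantisation by integer division with a constant step, then attempts to reverse both. The block-averaging form of 4:2:0 and the floor-division quantiser follow the worked examples in this paper; the helper names and sample values are hypothetical.

```python
# Illustrative sketch of the two lossy steps (hypothetical helper names).

def subsample_420(chroma):
    """4:2:0 by averaging each 2x2 block (assumes even width and height)."""
    rows, cols = len(chroma), len(chroma[0])
    return [[(chroma[r][c] + chroma[r][c + 1]
              + chroma[r + 1][c] + chroma[r + 1][c + 1]) / 4
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

def upsample_420(sub):
    """Best-effort reconstruction: replicate each average over its 2x2 block."""
    out = []
    for row in sub:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def quantise(coeffs, q):
    """Uniform quantisation by integer (floor) division with constant step q.

    Python's // floors towards minus infinity, so negative coefficients are
    rounded down too; real codecs usually round to the nearest integer.
    """
    return [[v // q for v in row] for row in coeffs]

def dequantise(levels, q):
    """Decoder side: multiply back by q; the discarded remainders are lost."""
    return [[v * q for v in row] for row in levels]

chroma = [[10, 20, 30, 40],
          [50, 60, 70, 80]]
print(subsample_420(chroma))                 # [[35.0, 55.0]]
print(upsample_420(subsample_420(chroma)))   # [[35.0, 35.0, 55.0, 55.0], [35.0, 35.0, 55.0, 55.0]]

coeffs = [[118, 42], [30, 150]]
print(dequantise(quantise(coeffs, 32), 32))  # [[96, 32], [0, 128]]
```

In both cases the reconstruction differs from the input: the averaging discards per-pixel colour variation and the division discards remainders, which is exactly what makes these stages lossy.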
[2] [2]
4 Marks Bookwork

(d) Given the following portion from a block (assumed to be 4x4 pixels to simplify the problem) from an image after the Discrete Cosine Transform stage of the compression pipeline has been applied:

118  42 100  44
 42  32  60  39
 54 150  30  34
 43  98  40  31

i. What is the result of the quantisation step of the MPEG video compression method assuming that a constant quantisation value of 32 is used?
The trick to remember from the notes is that we divide the matrix by the quantisation table, or in this case by a constant. So divide all values by 32 and round down (integer division):

3 1 3 1
1 1 1 1
1 4 0 1
1 3 1 0
[3]

ii. What is the output of the following zig-zag step being applied to the resulting quantised block?
The trick to remember from the notes is that the zig-zag scan reads off the values from the DCT in order of increasing frequency (better than row by row), creating a vector rather than a matrix. Starting at the top-left (DC) value and moving right first, we get the vector:
3 1 1 1 1 3 1 1 4 1 3 0 1 1 1 0
[3]
6 Marks: Unseen Problem
Question 3 Total Marks 27

Q4. (a) In MPEG audio compression, what is
i. frequency masking?
When an audio signal consists of multiple frequencies, the sensitivity of the ear changes with the relative amplitude of the signals. If two frequencies are close and the amplitude of one is lower than that of the other, the quieter frequency may not be heard. [2]
2 Marks: Bookwork
ii. temporal masking?
After the ear hears a loud sound, consisting of multiple frequencies, it takes a further short while before it can hear a quieter sound close in frequency. [2]
2 Marks: Bookwork

Briefly describe the cause of each kind of masking in the human auditory system.
Frequency Masking:
• Stereocilia in the inner ear get excited as fluid pressure waves flow over them.
[1]
• Stereocilia of different length and tightness lie along the basilar membrane, so they resonate in sympathy with different frequencies of fluid waves (banks of stereocilia at each frequency band). [1]
• Stereocilia already excited by a frequency cannot be further excited by a lower-amplitude wave of a nearby frequency. [1]
3 Marks: Bookwork
Temporal Masking:
• (Like frequency masking) Stereocilia in the inner ear get excited as fluid pressure waves flow over them and respond to different frequencies. [1]
• Stereocilia already excited by a certain frequency take a while to return to their rest state, as the inner ear is a closed fluid chamber and the pressure waves only eventually dampen down. [1]
• Similar to frequency masking, stereocilia in a 'dampening state' may not respond to a lower-amplitude wave of a nearby frequency. [1]
3 Marks: Bookwork
6 Marks: subtotal
10 Marks: Q4(a) Total

(b) Briefly describe, using a suitable diagram if necessary, the MPEG-1 audio compression algorithm, outlining how frequency masking and temporal masking are encoded.
MPEG audio compression basically works by:
• Dividing the audio signal up into a set of frequency subbands (Filtering). [1]
• Using filter banks to achieve this.
• Sub-bands approximate critical bands.
• Each band is quantised according to the audibility of its quantisation noise.
[2] [1] [1] [1]
Frequency masking and temporal masking are encoded by:
Frequency Masking: MPEG Audio encodes this by quantising each filter bank with adaptive values derived from the energy in neighbouring bands, defined by a look-up table. [2]
Temporal Masking — Not as easy to model as frequency masking. MP3 achieves this with a 50% overlap between successive transform windows, which gives window sizes of 36 or 12 samples, and then applies basic frequency masking as above. [2]
10 Marks: Bookwork

(c) In MPEG-4 Audio an alternative synthesis-based approach may be adopted to achieve compression.
Briefly discuss how the following may be compressed with MPEG-4 Audio:
• Musical Audio Signals.
• Spoken Word Audio.
What are the advantages and disadvantages of such approaches?
• Musical Audio Signals — Use the MIDI-type Structured Audio facilities in MPEG-4. "Compose" music from scratch using software tools, or use pitch-to-MIDI or other transcription tools. [1]
• Spoken Word Audio. Use the Text-to-Speech (TTS) facilities in MPEG-4. Again, one could transcribe the audio by hand or use speech-to-text (speech recognition) analysis tools. [1]
Advantages: Very low bitrate streams/compression. [1]
Disadvantages: Lose control of the true nature of the sounds, so the audio won't sound like the given speaker or the source music. [1]
4 Marks: Applied Bookwork, Text-to-Speech UNSEEN

(d) Assume that after analysis, the critical band filters of MPEG-1 Audio have output the levels of 3 consecutive critical bands as:

Band         1   2   3
Level (dB)  20  90  55

Assume that the signal-to-mask ratios for bands 1, 2 and 3 are as follows: a signal above 80 dB in band 2 gives a masking of 30 dB in band 1 and 40 dB in band 3. Show how frequency masking is implemented in MPEG audio compression. What is the saving in bits to transmit the masked value in each masked band?
This relies on simple thresholding above or below given values (look-up table):
• In band 1, 20 dB < 30 dB, so ignore it: don't send any bits; the saving is clearly 4 bits. [1]
• In band 3, 55 dB > 40 dB, so it must be sent, but only the difference above the masking value: 15 dB (suitably coded). This takes 4 bits instead of 6 bits: a saving of 2 bits (= 12 dB). [2]
3 Marks: Unseen problem
Question 4 Total Marks 27

END OF EXAMINATION