Polytechnic University, Dept. of Electrical and Computer Engineering
EE4414 Multimedia Communication System II, Fall 2005, Yao Wang
___________________________________________________________________________________
First Exam (10/20/2005, 11-12:50)
Closed-book; 1 sheet of notes (single or double sided) allowed; no peeking at neighbors!
Please write your answers directly in the provided space.

SOLUTION

Your Name: ______________________________________________

Prob:   1 | 2 | 3 | 4 | 5 | Total
Score:    |   |   |   |   |

1. (25 pt) Consider the following two raster scan formats: progressive scan using 30 frames/second, 400 lines/frame, and interlaced scan using 60 fields/second, 200 lines/field. For each scan format, determine
a. The overall line rate (lines/second).
b. The maximum possible temporal frequency (number of cycles per second) of the signal.
c. The maximum possible vertical frequency (number of cycles per picture height) of the signal.
d. The maximum possible frequency (number of cycles per second) in the 1D waveform of the raster signal, assuming the image aspect ratio is 1:1.
e. Based on your results, explain the pros and cons of these two scan formats.

Solution:

For progressive scan:
a. 30*400 = 12,000 lines/s.
b. The maximum temporal frequency occurs when successive frames alternate between black and white, giving temporal frequency = frame rate/2 = 15 cycles/s.
c. The maximum vertical frequency occurs when successive lines alternate between black and white, giving 400/2 = 200 cycles/picture-height.
d. The maximum frequency of the 1D waveform = sampling rate/2 = 400*200*30/2 = 1,200,000 cycles/s.

For interlaced scan:
a. 60*200 = 12,000 lines/s.
b. The maximum temporal frequency occurs when successive fields alternate between black and white, giving temporal frequency = field rate/2 = 30 cycles/s.
c. The maximum vertical frequency occurs when successive lines alternate between black and white, giving 200/2 = 100 cycles/picture-height.
d.
The maximum frequency of the 1D waveform = sampling rate/2 = 200*200*60/2 = 1,200,000 cycles/s.

e. From the answers above, we see that the interlaced scan supports higher temporal resolution but lower vertical resolution.

Note that for the answers to b-d we do not need to apply a Kell factor, because the question asks for the maximum possible frequency of the signal, not the maximum frequency that can be supported by the system.

2. (25 pt) NTSC color TV system

a. (10 pt) Figure 2(a) shows a simplified block diagram of an NTSC color TV transmitter. Briefly explain the function of each component. Draw a dashed box on this figure to include all components that comprise a QAM modulator that multiplexes the I and Q signals together.

[Figure 2(a): NTSC transmitter block diagram. R(t), G(t), B(t) enter an RGB-to-YIQ converter, producing Y(t), I(t), Q(t). LPF1 (0-4.2 MHz) filters Y, LPF2 (0-1.5 MHz) filters I, LPF3 (0-0.5 MHz) filters Q. I and Q modulate the color subcarrier A cos(2*pi*fc*t), with a -pi/2 phase shift on the Q branch; the sum passes through a BPF (2-4.2 MHz). An FM modulator places the audio at 4.5 MHz. The filtered Y, the multiplexed I/Q, and the modulated audio are added and fed to a VSB modulator, then to the transmit antenna.]

The RGB-to-YIQ converter converts the original signal from RGB to YIQ coordinates. The three low-pass filters (LPF1, LPF2, LPF3) bandlimit each signal according to its expected bandwidth and the allocated frequency band. The dashed box implements the QAM function, which multiplexes the I and Q components into a single signal at the color subcarrier frequency fc. The BPF is a bandpass filter that limits the multiplexed I and Q signal to the range 2-4.2 MHz. This is necessary because, with fc = 3.58 MHz and an I-signal bandwidth of 1.5 MHz, the multiplexed signal would occupy 3.58-1.5 = 2.08 MHz to 3.58+1.5 = 5.08 MHz; without the BPF, it would interfere with the audio signal. The BPF essentially leaves the Q signal as is, but cuts off some of the upper sideband of the I signal. The FM modulator shifts the audio signal to the audio subcarrier frequency of 4.5 MHz. Then the modulated audio, the Y signal, and the multiplexed I and Q signal are added together.
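As an aside, the QAM operation in the dashed box can be sketched numerically. The sketch below is illustrative only: the sample rate, subcarrier frequency, test signals, and the windowed-sinc low-pass filter are made-up stand-ins for the real NTSC parameters, and the demodulation side is included simply to show that I and Q are recoverable from the single multiplexed signal.

```python
import numpy as np

# Illustrative QAM multiplex/demultiplex of two baseband signals.
# All parameters below are chosen for demonstration, not NTSC values.
fs = 1_000_000            # sample rate (Hz)
fc = 100_000              # "color subcarrier" (Hz), stand-in for 3.58 MHz
t = np.arange(0, 0.01, 1/fs)

# Slowly varying baseband "I" and "Q" signals (bandwidth << fc).
I = np.cos(2*np.pi*500*t)
Q = np.sin(2*np.pi*300*t)

# QAM modulator: I on the cosine carrier, Q on the -pi/2-shifted (sine) carrier.
s = I*np.cos(2*np.pi*fc*t) + Q*np.sin(2*np.pi*fc*t)

def lowpass(x, cutoff, fs, ntaps=501):
    """Simple windowed-sinc FIR low-pass filter (unity DC gain)."""
    n = np.arange(ntaps) - (ntaps - 1)/2
    h = np.sinc(2*cutoff/fs*n) * np.hamming(ntaps)
    h /= h.sum()
    return np.convolve(x, h, mode="same")

# QAM demodulator: multiply by 2*cos / 2*sin of the same carrier,
# then low-pass to remove the double-frequency (2*fc) terms.
I_hat = lowpass(s*2*np.cos(2*np.pi*fc*t), 5_000, fs)
Q_hat = lowpass(s*2*np.sin(2*np.pi*fc*t), 5_000, fs)

# Away from the filter's edge effects, the recovered signals match.
mid = slice(2000, 8000)
print(np.max(np.abs(I_hat[mid] - I[mid])) < 0.02)  # True
print(np.max(np.abs(Q_hat[mid] - Q[mid])) < 0.02)  # True
```

The key identity is that 2*cos^2 = 1 + cos(2x) and 2*sin*cos = sin(2x): after multiplying by the synchronized carrier, the wanted component sits at baseband and everything else sits near 2*fc, where the low-pass filter removes it.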
Finally, the combined signal goes through the VSB modulator, which modulates it to the designated picture carrier frequency and removes most of the lower sideband.

Note that if you included LPF2, LPF3, or the BPF in the dashed box for QAM, that is OK; no points were deducted.

b. (10 pt) Figure 2(b) shows a simplified block diagram of an NTSC color TV receiver. Briefly explain the function of each component. Draw a dashed box on the figure to include all components that comprise a QAM demodulator that separates the I and Q signals.

[Figure 2(b): NTSC receiver block diagram. The signal from the antenna enters a VSB demodulator. A BPF (4.4-4.6 MHz) extracts the modulated audio, which passes through an FM demodulator to the speaker; a BPF (0-4.2 MHz) extracts the composite video. LPF1 (0-4.2 MHz) extracts Y(t); subtracting Y from the composite video leaves the multiplexed chrominance, which is multiplied by 2A cos(2*pi*fc*t) (with a -pi/2 phase shift on the Q branch) and filtered by LPF2 (0-1.5 MHz) for I(t) and LPF3 (0-0.5 MHz) for Q(t). A YIQ-to-RGB converter produces R(t), G(t), B(t) for the monitor.]

The VSB demodulator brings the received signal from its picture carrier frequency down to baseband. The BPF at the left extracts the audio signal; the BPF at the right extracts the composite video signal. The extracted audio signal is still at the audio subcarrier frequency; the FM demodulator brings it back to baseband, and it is then sent to the speaker. LPF1 extracts the luminance signal (Y) from the composite video signal. The difference between the composite video and the extracted Y signal is the multiplexed I and Q signal. The dashed box implements the QAM demodulator, which separates the I and Q signals and brings each back to its baseband. The YIQ-to-RGB converter converts the YIQ components to RGB components, which are then sent to the monitor.

c. (5 pt) In Figure 2(b), there are three low-pass filters (indicated by LPF1, LPF2, LPF3). To improve the received video quality, which filter should you change, and how? Briefly explain why.

LPF1 should be changed to a comb filter with a bandwidth of 0-4.2 MHz. This is because the spectra of both Y and the multiplexed I and Q signal have harmonic peaks, with the peaks of the I/Q signal sitting in between the peaks of the Y signal.
When a LPF is used to extract the Y component, the extracted Y signal will contain the high-frequency portion of the I/Q signal, and the extracted I/Q signal will contain the high-frequency portion of the Y signal. With a carefully designed comb filter, the Y signal can be extracted more accurately.

Note that LPF1's passband should have been lower than 4.2 MHz; that was a typo. I meant to write 3 MHz. Some answers suggested reducing the bandwidth of LPF1, which is correct; I gave such answers a partial credit of 2 pt.

3. (15 pt) Figure 3(a) below shows two interlaced video frames.
a. (3 pt) Generate the field data associated with each frame. Write down your results in the graph provided in Figure 3(b).
b. (3 pt) Deinterlace field 1 of frame 2 using field averaging. Write down the deinterlaced field. To save time, you only need to fill in the second line in field 1 of frame 2.
c. (3 pt) Now try line averaging. Write down the deinterlaced field.

Figure 3(a):

  Frame 1:          Frame 2:
  100 100  50       100 100 100
  100  50  50       100 100  50
  100  50  50       100 100  50
   50  50  50       100  50  50
   50  50  50       100  50  50

Figure 3(b) (a 0 marks a line that is absent from the field):

  Field 1, frame 1:   Field 2, frame 1:   Field 1, frame 2:   Field 2, frame 2:
  100 100  50           0   0   0         100 100 100           0   0   0
    0   0   0         100  50  50           0   0   0         100 100  50
  100  50  50           0   0   0         100 100  50           0   0   0
    0   0   0          50  50  50           0   0   0         100  50  50
   50  50  50           0   0   0         100  50  50           0   0   0

Second line of field 1, frame 2, deinterlaced by field averaging (average of the corresponding lines in field 2 of frame 1 and field 2 of frame 2):
  100  75  50

Second line of field 1, frame 2, deinterlaced by line averaging (average of lines 1 and 3 of field 1, frame 2):
  100 100  75

d. (3 pt) In general, which method is better for what type of scene content?
In general, line averaging is better for fast-moving scenes with vertical patterns, and field averaging is better for slow-moving scenes.

e. (3 pt) Each of the preceding deinterlacing methods requires one to store the available pixel values from one or more frames in memory. State the number of video frames that each method has to store to perform deinterlacing. Based on your result, compare the complexity of the two methods.
Field averaging requires storage of two frames (the current and the past frame), whereas line averaging requires storage of only the current frame. Therefore, field averaging is more complex.

4. (15 pt) For a video of fs frames/second and WxH pixels/frame, what is the number of operations needed per second to accomplish half-pel EBMA with a block size of BxB and a search range of -R to R in both the horizontal and vertical directions? (Count one subtraction plus taking the absolute value, or one sum of two numbers, as one operation. Ignore the computation needed to interpolate the target frame initially. Please briefly explain your reasoning. Express your result in terms of the parameters fs, W, H, B, R.) Which parameters (among fs, W, H, B, R) affect the accuracy of the predicted image, and why?

Comparison with each candidate block takes N1 = B^2 operations. With a search range of -R to R at half-pel accuracy, there are N2 = ((2R+1)*2)^2 candidates for each image block, so the total number of operations per block is N3 = ((2R+1)*2)^2 * B^2. A frame of size WxH has N4 = (W/B)*(H/B) blocks. Therefore the total number of operations for one frame is N5 = N4*N3 = W*H*((2R+1)*2)^2. With a frame rate of fs, the total number of operations per second is N5*fs = fs*W*H*((2R+1)*2)^2.

The block size and the search range affect the prediction accuracy. Typically, a smaller block size (B) and a larger search range (R) lead to more accurate prediction, but a larger R also leads to more computation. The frame rate fs also affects the accuracy: when fs is high, the motion between adjacent frames is small, and prediction is more accurate.

5. (20 pt) Video coding and motion estimation:
a. (5 pt) What is the main benefit of using motion-compensated temporal prediction in video coding, compared to coding a video frame directly? What are some of the problems due to motion-compensated temporal prediction?
b.
(5 pt) Propose some ways to reduce the computation for motion estimation using EBMA (possibly at the expense of prediction accuracy).
c. (5 pt) EBMA assumes that the pixels in a block undergo the same translation from one frame to another. Give some examples of when this assumption is inaccurate.
d. (5 pt) What is the difference between unidirectional temporal prediction and bidirectional temporal prediction? What are the benefits of bidirectional temporal prediction? Give an example where bidirectional prediction from a past and a future frame will clearly outperform unidirectional prediction from a past frame. What are the disadvantages associated with bidirectional prediction?

Answers:

a. With motion-compensated temporal prediction, we only code the prediction errors and the motion vectors. The prediction error generally has a variance significantly smaller than that of the original signal, and can therefore be coded with fewer bits. One major problem with temporal prediction is that if some of the encoded bits for one frame are corrupted during transmission, that decoded frame will be wrong; even if the prediction-error signals for the following frames are received correctly, those decoded frames will also be wrong, because they are decoded from a wrong reference frame. This is known as transmission error propagation. Other problems include the high computational cost of motion estimation and the difficulty of random access.

Note that some of you said that if the prediction is not accurate, there will be error propagation. This is not correct. Remember that the prediction error is coded and sent: when the prediction is not accurate, the prediction error takes more bits to code (hence a less efficient video encoder), but this does not lead to error propagation.

b. Please note that I am asking how you would reduce the complexity for a given search range and step size. Some of you suggested reducing the search range or using an integer step size.
Such answers get a partial credit of 2 pts. For a given search range and step size, one way to reduce the complexity is to first search with a large step size over the specified search range to find an initial candidate motion vector, and then search in a small neighborhood of this initial candidate with the specified step size.

c. When a block contains parts of two different objects with different motions, this assumption is inaccurate. Some of you mentioned the case when a scene change occurs; this gets a partial credit of 2 pts.

d. Unidirectional prediction predicts the current frame from a past frame. Bidirectional prediction predicts the current frame from both a past frame and a future frame (each previously coded), and uses a weighted average of the two predictions. Bidirectional prediction is generally more accurate, as it contains unidirectional prediction as a special case (with weight 0 for the prediction from the future frame). One example where bidirectional prediction outperforms unidirectional prediction is a scene change: the current frame is completely different from the past frames, but the future frame is similar to the current frame. The disadvantages of bidirectional prediction include encoding delay and complexity.
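As an illustration of the fast-search idea in answer 5(b), the sketch below compares exhaustive EBMA with a coarse-then-fine two-step search. It uses integer-pel search and the SAD criterion for simplicity; the function names, test image, and parameters are all illustrative choices, not part of the exam.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-size blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def ebma(cur, ref, x, y, B, R, step=1, center=(0, 0)):
    """Search displacements around `center` within [-R, R] at the given step;
    return the best (dx, dy) and the number of candidates examined."""
    block = cur[y:y+B, x:x+B]
    best, best_mv, n = None, (0, 0), 0
    for dy in range(center[1]-R, center[1]+R+1, step):
        for dx in range(center[0]-R, center[0]+R+1, step):
            ry, rx = y+dy, x+dx
            if 0 <= ry and 0 <= rx and ry+B <= ref.shape[0] and rx+B <= ref.shape[1]:
                n += 1
                cost = sad(block, ref[ry:ry+B, rx:rx+B])
                if best is None or cost < best:
                    best, best_mv = cost, (dx, dy)
    return best_mv, n

# Smooth synthetic reference frame; current frame is a shifted copy,
# so the true displacement of every interior block is (dx, dy) = (2, -3).
yy, xx = np.mgrid[0:64, 0:64]
ref = (128 + 60*np.sin(xx/6) + 60*np.cos(yy/7)).astype(np.uint8)
cur = np.roll(ref, (3, -2), axis=(0, 1))

# Exhaustive search: (2R+1)^2 candidates per block at integer-pel.
mv_full, n_full = ebma(cur, ref, 16, 16, B=8, R=7)

# Two-step search: coarse grid first, then refine around the coarse winner.
mv_c, n_c = ebma(cur, ref, 16, 16, B=8, R=7, step=4)
mv_f, n_f = ebma(cur, ref, 16, 16, B=8, R=2, center=mv_c)

print(mv_full)        # (2, -3)
print(mv_f)           # (2, -3)
print(n_c + n_f < n_full)  # True: far fewer candidates examined
```

On this smooth test image the two-step search finds the same motion vector while examining only a fraction of the candidates; on less smooth content the coarse step can be misled into a wrong neighborhood, which is exactly the accuracy trade-off the question alludes to.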