
midterm F05 solution

Polytechnic University, Dept. Electrical and Computer Engineering
EE4414 Multimedia Communication System II, Fall 2005, Yao Wang
___________________________________________________________________________________
First Exam (10/20/2005 11-12:50)
Closed-book, 1 sheet of notes (single or double sided) allowed, no peeking into neighbors!
Please write your answers directly on the provided space.
SOLUTION
Your Name: ______________________________________________
Prob     1     2     3     4     5     Total
Score
1. (25 pt) Consider the following two raster scan formats: progressive scan using 30 frames/second, 400
lines/frame, and interlaced scan using 60 fields/second, 200 lines/field. For each scan format, determine
a. The overall line rate (lines/second)
b. The maximum possible temporal frequency (number of cycles per second) of the signal
c. The maximum possible vertical frequency (number of cycles per picture height) of the signal
d. The maximum possible frequency (the number of cycles per second) in the 1D waveform of the
raster signal, assuming the image aspect ratio is 1:1.
e. Based on your results, explain the pros and cons of these two scan formats.
Solution:
For progressive scan:
a. 30*400=12000 lines/s.
b. The maximum temporal frequency occurs when successive frames alternate between black and white, giving a
temporal frequency = frame rate/2 = 15 cycles/s.
c. The maximum vertical frequency occurs when successive lines alternate between black and white, giving a
frequency of 400/2=200 cycles/picture-height.
d. the maximum frequency of the 1D waveform = sampling rate/2=400*200*30/2=1200000 cycles/s.
For interlaced scan:
a. 60*200=12000 lines/s.
b. The maximum temporal frequency occurs when successive fields alternate between black and white, giving a
temporal frequency = field rate/2 = 30 cycles/s.
c. The maximum vertical frequency occurs when successive lines alternate between black and white, giving a
frequency of 200/2=100 cycles/picture-height.
d. the maximum frequency of the 1D waveform = sampling rate/2=200*200*60/2=1200000 cycles/s.
From the answers above, we see that the interlaced scan can support higher temporal resolution, but lower vertical
resolution, than the progressive scan at the same line rate.
Note that for the answers to b-d, we don’t need to use a Kell factor, as I am asking you about the maximum possible
frequency of the signal, not the maximum frequency that can be supported by the system.
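The arithmetic for parts a-c can be checked with a short sketch. The function name `scan_stats` and its argument names are illustrative only; part d is left out because it depends on how the number of samples per line is counted.

```python
def scan_stats(rate_hz, lines_per_picture):
    """Return (line rate, max temporal freq, max vertical freq) for one scan format.

    rate_hz is frames/s for progressive scan or fields/s for interlaced scan;
    lines_per_picture is lines/frame or lines/field, respectively.
    """
    line_rate = rate_hz * lines_per_picture   # lines/second
    max_temporal = rate_hz / 2                # cycles/second (alternating pictures)
    max_vertical = lines_per_picture / 2      # cycles/picture-height (alternating lines)
    return line_rate, max_temporal, max_vertical


progressive = scan_stats(30, 400)
interlaced = scan_stats(60, 200)
print("progressive:", progressive)
print("interlaced: ", interlaced)
```

Both formats give the same line rate; the interlaced format doubles the maximum temporal frequency and halves the maximum vertical frequency.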
2. (25 pt) NTSC color TV system
a. (10 pt) Figure 2(a) shows a simplified block diagram of an NTSC color TV transmitter. Briefly
explain the function of each component. Draw a dashed box on this figure to include all
components that comprise a QAM modulator that multiplexes the I and Q signals together.
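[Figure 2(a): simplified block diagram of an NTSC color TV transmitter. Blocks shown: RGB to YIQ converter
(inputs R(t), G(t), B(t); outputs Y(t), I(t), Q(t)); LPF1 (0-4.2 MHz) on Y(t); LPF2 (0-1.5 MHz) on I(t);
LPF3 (0-0.5 MHz) on Q(t); carrier Acos(2πfct) with a -π/2 phase shifter; adder; BPF (2-4.2 MHz); audio FM
modulator (4.5 MHz); VSB modulator; output to transmit antenna.]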
The RGB to YIQ converter converts the original signal from RGB to YIQ coordinates. The three low-pass
filters (LPF1, LPF2, LPF3) bandlimit each signal according to its expected bandwidth and the allocated
frequency band. The dashed box implements the QAM function, which multiplexes the I and Q components into a
single signal at a color sub-carrier frequency of fc. The BPF is a bandpass filter, which bandlimits the
multiplexed I and Q signal to the range of 2-4.2 MHz. This is necessary because at fc=3.58 MHz, the
multiplexed signal would occupy the band from 3.58-1.5=2.08 MHz to 3.58+1.5=5.08 MHz, since the bandwidth of
the I signal is 1.5 MHz. Without the BPF, the multiplexed signal would interfere with the audio signal. This BPF
essentially leaves the Q signal as is, but cuts off some of the upper sideband of the I signal. The FM modulator
shifts the audio signal to the audio subcarrier frequency of 4.5 MHz. Then the modulated audio, the Y signal,
and the multiplexed I and Q signal are added together. Finally they go through the VSB modulator, which
modulates the combined signal to a designated picture carrier frequency and removes most of the lower side
band.
Note that if you include LPF2 and LPF3 or BPF in the dashed box for QAM, it is OK, and no points are
deducted.
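The band arithmetic in the explanation above can be sketched as follows. The helper names `occupied_band` and `clip_to_passband` are made up for this illustration, and the audio check only tests whether the 4.5 MHz subcarrier falls inside the unfiltered I band.

```python
def occupied_band(fc_mhz, bandwidth_mhz):
    """Band (low, high) in MHz of a double-sideband signal centred at fc."""
    return fc_mhz - bandwidth_mhz, fc_mhz + bandwidth_mhz


def clip_to_passband(band, passband):
    """Intersect a band with a filter passband, both given as (low, high) in MHz."""
    return max(band[0], passband[0]), min(band[1], passband[1])


FC = 3.58          # color subcarrier, MHz
BPF = (2.0, 4.2)   # bandpass filter passband, MHz
AUDIO = 4.5        # audio subcarrier, MHz

i_band = occupied_band(FC, 1.5)   # I signal, 1.5 MHz bandwidth
q_band = occupied_band(FC, 0.5)   # Q signal, 0.5 MHz bandwidth
i_after = clip_to_passband(i_band, BPF)
q_after = clip_to_passband(q_band, BPF)
overlaps_audio = i_band[0] <= AUDIO <= i_band[1]

print("I band before BPF:", [round(f, 2) for f in i_band])
print("I band after BPF: ", [round(f, 2) for f in i_after])
print("Q band after BPF: ", [round(f, 2) for f in q_after])
print("unfiltered I band reaches the audio subcarrier:", overlaps_audio)
```

The Q band passes through the BPF unchanged, while the upper sideband of I is cut at 4.2 MHz, below the audio subcarrier.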
b. (10 pt) Figure 2(b) shows a simplified block diagram of an NTSC color TV receiver. Briefly
explain the function of each component. Draw a dashed box on the figure to include all
components that comprise a QAM demodulator that separates the I and Q signals.
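[Figure 2(b): simplified block diagram of an NTSC color TV receiver. Blocks shown: input from antenna; VSB
demodulator; BPF (4.4-4.6 MHz) followed by the audio FM demodulator, output to speaker; BPF (0-4.2 MHz)
giving the composite video; LPF1 (0-4.2 MHz) giving Y(t); subtractor (composite video minus Y(t)); carrier
2Acos(2πfct) with a -π/2 phase shifter; LPF2 (0-1.5 MHz) giving I(t); LPF3 (0-0.5 MHz) giving Q(t); YIQ to
RGB converter with outputs R(t), G(t), B(t) to monitor.]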
The VSB demodulator brings the received signal from its picture carrier frequency to the baseband. The BPF at
the left extracts the audio signal, and the BPF at the right extracts the composite video signal. The extracted audio signal
is still at the audio sub-carrier frequency. The FM demodulator brings the audio signal back to the baseband,
which is then sent to the speaker. LPF1 extracts the luminance signal (Y) from the composite video signal.
The difference between the composite video and the extracted Y signal is the multiplexed I and Q signal. The
dashed box implements the QAM demodulator, which separates the I and Q signals and brings each back to its
baseband. The YIQ to RGB converter converts the YIQ components to RGB components, which are then sent
to the monitor.
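The QAM multiplexing and demultiplexing described for Figures 2(a) and 2(b) can be illustrated with a minimal numerical sketch. It assumes a carrier amplitude of 1, constant I and Q values (standing in for signals that vary slowly compared with the carrier), and uses averaging over a whole number of carrier periods as a crude stand-in for LPF2 and LPF3. All function names are illustrative.

```python
import math

FC = 3.58e6               # color subcarrier frequency, Hz
N_PERIODS = 50            # average over a whole number of carrier periods
SAMPLES_PER_PERIOD = 40   # time samples per carrier period


def qam_modulate(i_val, q_val, t):
    """Multiplex I and Q onto in-phase and quadrature carriers at FC."""
    w = 2 * math.pi * FC * t
    return i_val * math.cos(w) + q_val * math.sin(w)


def qam_demodulate(i_val, q_val):
    """Recover (I, Q): multiply by 2cos / 2sin, then average (crude low-pass filter)."""
    n = N_PERIODS * SAMPLES_PER_PERIOD
    dt = 1.0 / (FC * SAMPLES_PER_PERIOD)
    i_sum = q_sum = 0.0
    for k in range(n):
        t = k * dt
        s = qam_modulate(i_val, q_val, t)
        w = 2 * math.pi * FC * t
        i_sum += s * 2 * math.cos(w)   # I + terms at 2*FC, removed by averaging
        q_sum += s * 2 * math.sin(w)   # Q + terms at 2*FC, removed by averaging
    return i_sum / n, q_sum / n


i_hat, q_hat = qam_demodulate(0.3, -0.2)
print(round(i_hat, 6), round(q_hat, 6))
```

Multiplying by the cosine carrier leaves I at baseband plus components at twice the subcarrier frequency; the low-pass filter removes the latter. The same holds for Q with the phase-shifted carrier.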
c. (5 pt) In Figure 2(b), there are three low-pass filters (indicated by LPF1, LPF2, LPF3). To
improve the received video quality, which filter should you change and how? Briefly explain
why.
LPF1 should be changed to a comb filter with a bandwidth of 0-4.2 MHz. This is because the spectra of both Y
and the multiplexed I and Q signal have harmonic peaks, with the peaks of the I/Q signal sitting in between the
peaks of the Y signal. When an LPF is used to extract the Y component, the extracted Y signal will contain
the high-frequency portion of the I/Q signal, and the extracted I/Q signal will contain the high-frequency portion of the
Y signal. By using a carefully designed comb filter, the Y signal can be more accurately extracted.
Note that LPF1’s passband should have been lower than 4.2 MHz. It was a typo. I meant to write 3 MHz.
Some answers to this question suggested reducing the bandwidth of LPF1, which is correct. I gave such answers
partial credit of 2 pt.
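The solution does not specify a comb filter design. One simple form, shown here only as an assumed illustration, is a one-line-delay comb: because the I/Q peaks sit halfway between the Y peaks, the modulated chroma changes sign from one line to the next when the picture content is the same on both lines, so adding two successive lines cancels chroma and subtracting them cancels luma. The sample values below are hypothetical.

```python
def comb_separate(prev_line, cur_line):
    """Split two successive composite lines into luma and chroma estimates.

    Assumes the luma and chroma content is identical on both lines and that
    the modulated chroma flips sign from one line to the next.
    """
    luma = [(a + b) / 2 for a, b in zip(prev_line, cur_line)]
    chroma = [(b - a) / 2 for a, b in zip(prev_line, cur_line)]
    return luma, chroma


y = [0.5, 0.6, 0.7, 0.6]      # hypothetical luma samples along a line
c = [0.1, -0.1, 0.1, -0.1]    # hypothetical modulated chroma samples
prev_line = [yi - ci for yi, ci in zip(y, c)]   # chroma inverted on the previous line
cur_line = [yi + ci for yi, ci in zip(y, c)]

luma, chroma = comb_separate(prev_line, cur_line)
print([round(v, 3) for v in luma])
print([round(v, 3) for v in chroma])
```

Unlike a low-pass filter, this separation does not depend on the frequency of the luma detail, only on the line-to-line similarity of the picture.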
3. (15 pt) Figure 3(a) below shows two interlaced video frames.
a. (3pt) Generate the field data associated with each frame. Write down your results in the
graph provided in Figure 3(b)
b. (3pt) Deinterlace field 1 of frame 2 using field averaging. Write down the deinterlaced
field. To save time, you only need to fill in the second line in field 1 of frame 2.
c. (3pt) Now try line averaging. Write down the deinterlaced field.
Figure 3(a): two interlaced video frames

Frame 1           Frame 2
100 100  50       100 100 100
100  50  50       100 100  50
100  50  50       100 100  50
 50  50  50       100  50  50
 50  50  50       100  50  50

Figure 3(b): field data (lines that do not belong to a field are shown as 0)

Field 1, frame 1  Field 2, frame 1  Field 1, frame 2  Field 2, frame 2
100 100  50         0   0   0       100 100 100         0   0   0
  0   0   0       100  50  50         0   0   0       100 100  50
100  50  50         0   0   0       100 100  50         0   0   0
  0   0   0        50  50  50         0   0   0       100  50  50
 50  50  50         0   0   0       100  50  50         0   0   0

Second line of field 1, frame 2, deinterlaced by field averaging: 100 75 50
Second line of field 1, frame 2, deinterlaced by line averaging: 100 100 75
d. (3pt) In general, which method is better for what type of scene content?
In general, line averaging is better for fast-moving scenes with vertical patterns, and field averaging is
better for slow-moving scenes.
e. (3pt) Each of the preceding deinterlacing methods requires one to store the available pixel
values from one or more frames in memory. State the number of video frames that each
method has to store to perform deinterlacing. Based on your result, compare the complexity
of the two methods.
Field averaging requires storage of two frames (the current and past frame), whereas line averaging requires
storage of only the current frame. Therefore, field averaging is more complex.
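The answers to parts b and c can be reproduced with a small sketch. It assumes the two frames of Figure 3(a) are 5 lines by 3 pixels with the values listed in the code, that odd-numbered lines form field 1 and even-numbered lines form field 2, and that field averaging uses the same line in the preceding and following fields. The function names are illustrative.

```python
FRAME1 = [
    [100, 100, 50],
    [100, 50, 50],
    [100, 50, 50],
    [50, 50, 50],
    [50, 50, 50],
]
FRAME2 = [
    [100, 100, 100],
    [100, 100, 50],
    [100, 100, 50],
    [100, 50, 50],
    [100, 50, 50],
]


def average(row_a, row_b):
    """Pixel-wise average of two lines."""
    return [(a + b) / 2 for a, b in zip(row_a, row_b)]


def field_average(prev_frame, cur_frame, line):
    """Fill a missing line of field 1 of cur_frame from the same line in the
    previous field (field 2 of prev_frame) and the next field (field 2 of cur_frame)."""
    return average(prev_frame[line], cur_frame[line])


def line_average(cur_frame, line):
    """Fill a missing line from the lines directly above and below in the same field."""
    return average(cur_frame[line - 1], cur_frame[line + 1])


MISSING = 1   # second line (index 1) belongs to field 2, so field 1 lacks it
by_field = field_average(FRAME1, FRAME2, MISSING)
by_line = line_average(FRAME2, MISSING)
print("field averaging:", by_field)
print("line averaging: ", by_line)
```

Field averaging reads from two different frames, while line averaging only touches the current one, which matches the storage comparison in part e.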
4. (15 pt) For a video of fs frames/second and WxH pixels/frame, what is the number of operations needed per
second to accomplish half-pel EBMA if we use a block size of BxB and a search range of –R to R in both the
horizontal and vertical directions? Count one subtraction, taking the absolute value, and the sum of two
numbers as one operation. Ignore the computation necessary for interpolating the target frame initially.
Please briefly explain your reasoning, and express your result in terms of the parameters fs, W, H, B, R. Which
parameters (among fs, W, H, B, R) affect the accuracy of the predicted image and why?
Comparison with each candidate block takes N1=B^2 operations. With a search range of –R to R at half-pel
accuracy, there are N2=((2R+1)*2)^2 candidates for each image block in a frame, so the total number of operations for each
image block is N3=N1*N2=((2R+1)*2)^2 B^2. A frame of size WxH has N4=W/B * H/B blocks. Therefore the total
number of operations for one frame is N5=N4*N3=W*H*((2R+1)*2)^2, which does not depend on B. With a frame rate of fs, the
total number of operations per second is N5*fs=fs*W*H*((2R+1)*2)^2. The block size and the search range affect the
prediction accuracy. Typically, a smaller block size (B) and a larger search range (R) lead to more accurate
prediction. But a larger R also leads to more computation. The frame rate fs also affects the accuracy. When fs is high,
the motion between adjacent frames is small, and the prediction is more accurate.
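The counting argument above can be written out step by step. The sketch follows the solution's quantities N1 to N5; the function name and the example values (a 352x288 video at 30 frames/second with B=16 and R=16) are assumptions chosen only so that B divides W and H.

```python
def ebma_half_pel_ops(fs, W, H, B, R):
    """Operations per second for half-pel EBMA, following N1..N5 in the solution."""
    n1 = B * B                      # operations to compare one candidate block
    n2 = ((2 * R + 1) * 2) ** 2     # half-pel candidates per block
    n3 = n1 * n2                    # operations per block
    n4 = (W // B) * (H // B)        # blocks per frame (B assumed to divide W and H)
    n5 = n4 * n3                    # operations per frame
    return fs * n5


ops = ebma_half_pel_ops(fs=30, W=352, H=288, B=16, R=16)
closed_form = 30 * 352 * 288 * ((2 * 16 + 1) * 2) ** 2
print(ops, ops == closed_form)

# Changing the block size leaves the count unchanged, as the closed form predicts.
print(ebma_half_pel_ops(fs=30, W=352, H=288, B=8, R=16) == ops)
```

The block size B cancels out of the operation count, so it affects prediction accuracy but not the cost of exhaustive search under this counting.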
5. (20 pt) Video coding and motion estimation:
a. (5 pt) What is the main benefit of using motion-compensated temporal prediction in video coding,
compared to coding a video frame directly? What are some of the problems due to motion-compensated temporal
prediction?
b. (5 pt) Propose some ways to reduce the computation for motion estimation using EBMA (possibly at
the expense of the prediction accuracy).
c. (5 pt) The EBMA assumes the pixels in a block undergo the same translation from one frame to
another. Give some examples when this assumption is inaccurate.
d. (5 pt) What is the difference between uni-directional temporal prediction and bi-directional temporal
prediction? What are the benefits of bi-directional temporal prediction? Give an example when bi-directional
prediction from a past and a future frame will clearly outperform uni-directional
prediction from a past frame. What are the disadvantages associated with bi-directional prediction?
Answers:
a. With motion-compensated temporal prediction, we only code the prediction errors and the motion
vectors. The prediction error generally has a variance significantly smaller than the original signal, and
can be coded with fewer bits than the original signal. One major problem with using temporal prediction
is that if some of the encoded bits in one frame are corrupted during transmission, the decoded version of this
frame will be wrong. Even if the prediction error signals for the following frames are received correctly, the
decoded frames will be wrong because they are decoded based on a wrong reference frame. This is
known as transmission error propagation. Other problems include the high computation cost of
motion estimation and difficulty in random access.
Note that some of you say that if prediction is not accurate, there will be error propagation. This is not
correct. Remember that prediction error will be coded and sent. When the prediction is not accurate,
prediction error takes more bits to code (therefore a less efficient video encoder), but this will not lead to
error propagation.
b. Please note I am asking how you would reduce the complexity for a given search range and step size.
Some of you suggested reducing the search range and using an integer step size. Such answers get partial
credit of 2 pts.
For a given search range and step size, one way to reduce the complexity is to first search with a large step
size in the specified search range to find an initial candidate motion vector, and then search in a small
neighborhood of this initial candidate with the specified step size.
c. When a block contains parts of two different objects with different motions, this assumption is
inaccurate.
Some of you mentioned when a scene change occurs. This gets partial credit of 2 pts.
d. Uni-directional prediction predicts a current frame from a past frame. Bi-directional prediction predicts a
current frame from both a past frame and a future frame (previously coded), and uses a weighted average of
the two predictions. Bi-directional prediction generally is more accurate, as it contains uni-directional
prediction as a special case (with weight 0 for the prediction from the future frame). One example where
bi-directional prediction will outperform uni-directional prediction is when there is a scene change, so that the
current frame is completely different from the past frames, but the future frame is similar to the current frame.
The disadvantages of bi-directional prediction include encoding delay and complexity.
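The two-stage search described in answer b can be sketched as follows. This is a minimal illustration, not a tuned implementation: it assumes an integer-pel specified step size of 1, uses the sum of absolute differences as the matching error, and the frame data in the demo is a synthetic intensity ramp whose content is shifted by (3, -2) pixels. All names are illustrative.

```python
def sad(anchor, target, bx, by, dx, dy, B):
    """Sum of absolute differences between an anchor block and a displaced target block."""
    total = 0
    for y in range(B):
        for x in range(B):
            total += abs(anchor[by + y][bx + x] - target[by + dy + y][bx + dx + x])
    return total


def best_vector(anchor, target, bx, by, B, candidates):
    """Return the candidate (dx, dy) with the smallest SAD that stays inside the target."""
    h, w = len(target), len(target[0])
    valid = [(dx, dy) for dx, dy in candidates
             if 0 <= bx + dx and bx + dx + B <= w and 0 <= by + dy and by + dy + B <= h]
    return min(valid, key=lambda v: sad(anchor, target, bx, by, v[0], v[1], B))


def two_stage_search(anchor, target, bx, by, B, R, coarse_step):
    """Coarse search over the full range, then a fine search around the coarse winner."""
    coarse = [(dx, dy) for dy in range(-R, R + 1, coarse_step)
              for dx in range(-R, R + 1, coarse_step)]
    cx, cy = best_vector(anchor, target, bx, by, B, coarse)
    r = coarse_step - 1   # small neighborhood searched at the specified step size (1)
    fine = [(cx + dx, cy + dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if abs(cx + dx) <= R and abs(cy + dy) <= R]
    return best_vector(anchor, target, bx, by, B, fine)


# Synthetic demo: the anchor frame equals the target frame displaced by (3, -2).
SIZE = 16
target = [[10 * x + 100 * y for x in range(SIZE)] for y in range(SIZE)]
anchor = [[10 * (x + 3) + 100 * (y - 2) for x in range(SIZE)] for y in range(SIZE)]
mv = two_stage_search(anchor, target, bx=6, by=6, B=4, R=4, coarse_step=2)
print(mv)
```

With R=4 and a coarse step of 2, the sketch evaluates 25 coarse candidates plus 9 fine ones instead of the 81 candidates of an exhaustive integer-pel search. The price is that the coarse stage can settle in the wrong neighborhood when the matching error is not smooth.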