ELE 488 F 06
Midterm Test II - Solution
1. The standard quantization table for JPEG is given below as Q, along with the DCT
of an 8x8 block, given below as D.
Q =
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99
D =
1329  23   3 -20  14   3  -8   6
 -19  56  32 -12   8  -3  10  -7
  -1  85  -3 -33  -2  -5  -6   8
  13  -8 -29 -12   2  -6   2   1
  -1 -10 -18   4  21   0  -4   5
   4   0  -9  27   7  -6   2   3
  -7   6   5   0   2  -5  -1   1
   5   4  -6  -3  -7   2  -1   2
(a) (10 pts) Use the table Q with a multiplier of 2 to quantize D. What are the
quantized DCT coefficients?
The actual quantization step is 2 times the corresponding entry of Q.
For the first row, 1329/(2x16) = 41.53, round to 42. 23/(2x11) = 1.05, round to 1.
3/(2x10) = 0.15, round to 0. -20/(2x16) = -0.625, round to -1, . . .
Proceeding in this way, the first 4 rows of quantized DCT coefficients are:
42 1 0 -1 0 0 0 0
-1 2 1 0 0 0 0 0
0 3 0 -1 0 0 0 0
0 0 -1 0 0 0 0 0
The remaining 4 rows are all zeros.
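The arithmetic in part (a) can be checked with a short script. This is only a sketch: the helper name quantize is made up for illustration, and rounding to the nearest integer (halves away from zero) is an assumption about the intended rounding rule.

```python
import math

# Quantization table Q and DCT block D, as given in the problem statement.
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]
D = [
    [1329, 23, 3, -20, 14, 3, -8, 6],
    [-19, 56, 32, -12, 8, -3, 10, -7],
    [-1, 85, -3, -33, -2, -5, -6, 8],
    [13, -8, -29, -12, 2, -6, 2, 1],
    [-1, -10, -18, 4, 21, 0, -4, 5],
    [4, 0, -9, 27, 7, -6, 2, 3],
    [-7, 6, 5, 0, 2, -5, -1, 1],
    [5, 4, -6, -3, -7, 2, -1, 2],
]


def quantize(d, q, multiplier=2):
    """Divide d by the step multiplier*q and round to the nearest integer.

    Hypothetical helper; ties are rounded away from zero (an assumption).
    """
    x = d / (multiplier * q)
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


quantized = [[quantize(D[i][j], Q[i][j]) for j in range(8)] for i in range(8)]
for row in quantized:
    print(row)
```

Printing the whole block also confirms that every entry in the last four rows is smaller than half its quantization step.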
(b) (7 pts) What is the mean squared quantization error of the (1,1)-th DCT coefficient?
The question can be interpreted in two different ways:
1. Counting rows and columns from 0, the unquantized coefficient is 56. The step size is 2x12 = 24,
the quantized level is 2, and the reconstructed value is 2x24 = 48.
So the squared error is (56 - 48)^2 = 64.
2. The step size is 24, so the mean squared error of a uniform quantizer is 24^2/12 = 48.
That the value 48 appears in both calculations (as the reconstructed value in 1 and as the MSE in 2) is a coincidence.
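Both readings of part (b) reduce to a few lines of arithmetic. The sketch below uses made-up variable names, and the second figure assumes the quantization error is uniformly distributed over one step.

```python
step = 2 * 12                 # multiplier times the Q entry at (1,1)
coef = 56                     # unquantized DCT coefficient at (1,1)

# Interpretation 1: squared error for this particular coefficient.
level = round(coef / step)    # 56/24 = 2.33, so the level is 2
recon = level * step          # reconstructed value
squared_error = (coef - recon) ** 2

# Interpretation 2: expected squared error of a uniform quantizer,
# assuming the error is uniform on [-step/2, step/2].
uniform_mse = step ** 2 / 12

print(recon, squared_error, uniform_mse)
```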
(c) (8 pts) Describe the run length coding in JPEG using part (a) for illustration.
The zigzag scan gives 42 1 -1 0 2 0 -1 1 3 0 0 0 0 0 0 0 0 -1 -1, followed by zeros to the end of the block.
The DC coefficient 42 is differentially coded relative to the DC coefficient of the previous block.
The symbols for the rest are of the form 'number of zeros before a nonzero quantized
coefficient'. So here we have: no zero before 1, no zero before -1, one zero before
2, one zero before -1, no zero before 1, no zero before 3, 8 zeros before -1, and no zero before the final -1.
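The scan and the run lengths in part (c) can be reproduced as follows. This is a simplified sketch with hypothetical helper names: it only produces (run of zeros, value) pairs for the AC coefficients and leaves out the size categories and Huffman codes that an actual JPEG coder attaches to them.

```python
def zigzag_order(n=8):
    """Return (row, col) index pairs in zigzag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even anti-diagonals are walked bottom-left to top-right,
        # odd ones top-right to bottom-left.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def run_length(ac):
    """Pair each nonzero AC value with the number of zeros preceding it."""
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs  # trailing zeros are left to an end-of-block symbol


# Quantized block from part (a); the DC entry is round(1329/32).
block = [
    [42, 1, 0, -1, 0, 0, 0, 0],
    [-1, 2, 1, 0, 0, 0, 0, 0],
    [0, 3, 0, -1, 0, 0, 0, 0],
    [0, 0, -1, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]

scan = [block[i][j] for i, j in zigzag_order()]
pairs = run_length(scan[1:])  # skip the DC coefficient
print(scan[:19])
print(pairs)
```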
2. (a) (10 pts) What is a B-frame in MPEG encoding? Why use B-frames?
MPEG frames form a sequence of the form: I B B P B B P B B P B B I . . .
I’s are ‘intra’ frames, each coded by itself without reference to other frames. P’s are predicted
frames, coded differentially using the previous I or P frame with motion compensation. B’s are
bidirectional frames, coded differentially using both the previous and the following P or I frames with
motion compensation. The most obvious advantage of using B frames is in handling occlusion: an area that
is hidden in the previous reference frame but visible in the following one can still be predicted.
(b) (10 pts) What are motion vectors in MPEG encoding? How are they determined?
For each macroblock (16x16) in a P frame, we look for a 16x16 array of pixels in the
previous P or I frame that best matches the given macroblock. The goodness of
match is computed most commonly using the sum of absolute pixel differences. The
relative position of the macroblock and the best matched macroblock is the motion
vector of that macroblock. Usually the search for best match is made only over a
selected area.
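The search described above can be sketched as an exhaustive block match. The function names, the default search range of 7 pixels and the sign convention (displacement from the macroblock's position to the matched block in the reference frame) are assumptions made for illustration; practical encoders usually use faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of cur at
    (row by, col bx) and the block of ref displaced by (dy, dx)."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(n)
        for j in range(n)
    )


def motion_vector(cur, ref, bx, by, n=16, search=7):
    """Full search over displacements within +/- search pixels.

    Returns (dx, dy, cost) of the best match found.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# Toy example: a 16x16 reference frame and a current frame whose content
# is the reference shifted, so each block has an exact match.
ref = [[x * x + 10 * y * y for x in range(16)] for y in range(16)]
cur = [[ref[(y + 1) % 16][(x - 2) % 16] for x in range(16)] for y in range(16)]
result = motion_vector(cur, ref, bx=6, by=6, n=4, search=3)
print(result)
```

The toy frames use a 4x4 block only to keep the example small; for MPEG macroblocks n would be 16.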
3. Consider a triangle with its three vertices at (0,0), (1,1) and (0,1). Suppose f(x,y) = 1
inside this triangle and f(x,y) = 0 outside. For a given θ, the Radon transform pθ(r) is
zero outside an interval a(θ) < r < b(θ), and the peak of pθ(r) is at r = c(θ).
(a) (10 pts) Determine the analytic expression for a(θ), b(θ), and c(θ).
Denote by A the point (0,0), B the point (0,1), and C the point (1,1).
A point (x,y) projects to r = x cosθ + y sinθ, so A maps to r = 0 for all θ, B maps to r = sinθ, and C maps to r = cosθ + sinθ.
So a = min (0, sinθ, cosθ + sinθ), b = max (0, sinθ, cosθ + sinθ), and c is the remaining (middle) value.
See (b) below.
(b) (10 pts) Sketch the three trajectories r = a(θ), r = b(θ), and r = c(θ) in the (r, θ) plane.
First sketch the three curves r = 0, r = sinθ, and r = cosθ + sinθ for 0 < θ < π.
It will be seen that for 0 < θ < π/2, 0 < sinθ < cosθ + sinθ.
For π/2 < θ < 3π/4, 0 < cosθ + sinθ < sinθ .
For 3π/4 < θ < π, cosθ + sinθ < 0 < sinθ .
So a(θ) = 0 for 0 < θ < 3π/4, a(θ) = cosθ + sinθ for 3π/4 < θ < π.
b(θ) = cosθ + sinθ for 0 < θ < π/2, b(θ) = sinθ for π/2 < θ < π.
c(θ) = sinθ for 0 < θ < π/2, c(θ) = cosθ + sinθ for π/2 < θ < 3π/4, c(θ) = 0 for 3π/4 < θ < π.
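The piecewise expressions can be checked numerically against the min / middle / max rule from part (a). The function names below are made up for this sketch.

```python
import math


def projections(theta):
    """r = x cos(theta) + y sin(theta) for A(0,0), B(0,1), C(1,1)."""
    return (0.0, math.sin(theta), math.cos(theta) + math.sin(theta))


def abc(theta):
    """Return (a, b, c): smallest, largest and middle projected vertex."""
    a, c, b = sorted(projections(theta))
    return a, b, c


def abc_piecewise(theta):
    """The same three values from the piecewise formulas, for 0 < theta < pi."""
    s = math.sin(theta)
    cs = math.cos(theta) + math.sin(theta)
    if theta < math.pi / 2:
        return 0.0, cs, s
    if theta < 3 * math.pi / 4:
        return 0.0, s, cs
    return cs, s, 0.0


# Compare the two on a grid of angles strictly between 0 and pi.
max_diff = max(
    abs(u - v)
    for k in range(1, 200)
    for u, v in zip(abc(k * math.pi / 200), abc_piecewise(k * math.pi / 200))
)
print(max_diff)
```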
(c) (10 pts) Sketch pθ(r) for θ=0, 3π/8, and 5π/8.
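pθ(r) is triangular shaped in the (r, pθ) plane, with base from a(θ) to b(θ) and peak at c(θ).
Since the area under pθ(r) equals the area of the triangle, 1/2, the peak height is 1/(b(θ) - a(θ)).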
For θ=0, the three vertices are at (0,0), (0, 1) and (1,0).
For θ=3π/8, the three vertices are at (0,0), (cos(π/8), 1/(√2 cos(π/8) ) ) and
(√2 cos(π/8), 0).
For θ=5π/8, the three vertices are at (0,0), (√2 sin(π/8), 1/( cos(π/8) ) ) and
(cos(π/8), 0).
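The vertices quoted in part (c) can be reproduced numerically. In this sketch the helper names are hypothetical; profile_vertices uses the area argument (area under pθ equals the triangle's area of 1/2), and chord_length is an independent check that measures the chord cut by the line x cosθ + y sinθ = r directly.

```python
import math

VERTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # A, B, C


def profile_vertices(theta):
    """Vertices of the triangular profile in the (r, p) plane."""
    a, c, b = sorted(x * math.cos(theta) + y * math.sin(theta) for x, y in VERTS)
    peak = 1.0 / (b - a)  # from area 1/2 = (1/2) * base * height
    return (a, 0.0), (c, peak), (b, 0.0)


def chord_length(theta, r):
    """Length of the intersection of the line at offset r with the triangle."""
    ct, st = math.cos(theta), math.sin(theta)
    pts = []
    for k in range(3):
        (x0, y0), (x1, y1) = VERTS[k], VERTS[(k + 1) % 3]
        p0, p1 = x0 * ct + y0 * st, x1 * ct + y1 * st
        if (p0 - r) * (p1 - r) < 0:  # the edge strictly crosses the line
            t = (r - p0) / (p1 - p0)
            pts.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    if len(pts) != 2:
        return 0.0
    return math.hypot(pts[0][0] - pts[1][0], pts[0][1] - pts[1][1])


for theta in (0.0, 3 * math.pi / 8, 5 * math.pi / 8):
    print(theta, profile_vertices(theta))
```

Because the profile is linear on each side of the peak, the chord halfway between a(θ) and c(θ) should be half the peak height, which gives a simple consistency check.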
4. (a) (13 pts) The binary watermark “I” is to be inserted in an 8x8 image to get a watermarked
image. The pixel value of the original image is shown below. How do you determine the
pixel value of the (1,3)-th element and of the (2,1)-th element? What additional
information do you need to accomplish this?
(b) (6 pts) How can the watermark “I” be retrieved from the watermarked image?
(c) (6 pts) What happens when the bottom half of the watermarked image is replaced by
an image of size 4x8?
See lecture notes L19.