Tutorial_12_E

Visual and Auditory Systems
Tutorial 12
The Technion - Israel Institute of Technology
Electrical engineering faculty
Tutorial 12 – Signal Representation, Visual Communication
Question 1:
N N
is a Hilbert space with the following inner product:
N 1 N 1
a, b   a  m, n   b  m, n 
m 0 n 0
Additionally, there is a collection of N 2 matrixes  k ,l  :
   2m  1  k   
   2n  1  l  
   C  l   cos 

2N
2N

  

 

k ,l  m, n   C  k   cos 

 1 N
C k   
 2 N
k 0
k 0
a. Show that in this space the collection of matrices $\{\varphi_{k,l}\}$ is an orthonormal set.
Definition:
A picture $a \in \mathbb{R}^{N \times N}$ is given by:
$$a = \sum_{k,l=0}^{N-1} \hat{a}_{k,l}\, \varphi_{k,l}$$
The DCT (Discrete Cosine Transform) of $a$ is $\hat{a}$.
b. Given the following matrix (N=8):
$$a = \begin{pmatrix}
44 & 38 & 43 & 23 & 18 & 9 & 3 & 10 \\
48 & 49 & 4 & -44 & -57 & -60 & -59 & -57 \\
41 & 44 & -2 & -69 & -73 & -84 & -77 & -68 \\
53 & 45 & -20 & -73 & -74 & -89 & -96 & -78 \\
41 & 51 & 12 & -70 & -84 & -92 & -98 & -85 \\
44 & 42 & 16 & -56 & -71 & -71 & -70 & -79 \\
44 & 41 & 30 & -32 & -60 & -48 & -56 & -56 \\
40 & 49 & 41 & 7 & -3 & -24 & -15 & -15
\end{pmatrix}$$
What is the DCT of the matrix?
c. Assuming the aforementioned matrix describes a picture, we wish to describe it
using only $N^2/2$ pixels. To do so we shall keep the most significant entries
(largest in absolute value) of the matrix $a$. What is the average reconstruction
error per pixel, in grey levels? What is the reconstruction error when
representing the picture using $N^2/2$ DCT coefficients of the picture?
d. A pixel in the picture has a diameter $d$. The distance from the viewer to the
picture is $D$. What is the connection between the matrix $\varphi_{k,l}$ and the spatial
frequency (radial angular) which it represents?
Find the values of the following MTF function at the matching frequencies
(weight matrix).
$$MTF(\omega) = C\left(\alpha + \frac{\omega}{\omega_0}\right) e^{-\left(\omega/\omega_0\right)^{\beta}}$$
$$C = 2.2,\quad \alpha = 0.192,\quad \beta = 1.1,\quad \omega_0 = 8\ [\text{cpd}],\quad D = 3072 \cdot d$$
e. Based on the MTF function, how can the DCT coefficients be quantized
while causing minimal damage to the quality of the received
picture?
Solution:
Section a:
The set $\{\varphi_{k,l}\}$ defines a collection of $N^2$ lattices. Each lattice is a product of a
vertical lattice and a horizontal lattice, unlike a directional lattice defined by
$\cos(\omega_x x + \omega_y y)$. For N=8 this gives the following 64 lattices:
[Figure: the 64 lattices, indexed by l along the horizontal axis and k along the vertical axis]
The inner product of two lattices $\varphi_{k_1,l_1}$, $\varphi_{k_2,l_2}$ is:
$$\begin{aligned}
\langle \varphi_{k_1,l_1}, \varphi_{k_2,l_2} \rangle
&= C(k_1)C(l_1)C(k_2)C(l_2) \sum_{m=0}^{N-1}\sum_{n=0}^{N-1}
\cos\!\left(\frac{\pi(m+\frac12)k_1}{N}\right)
\cos\!\left(\frac{\pi(n+\frac12)l_1}{N}\right)
\cos\!\left(\frac{\pi(m+\frac12)k_2}{N}\right)
\cos\!\left(\frac{\pi(n+\frac12)l_2}{N}\right) \\
&= C(k_1)C(l_1)C(k_2)C(l_2)
\underbrace{\sum_{m=0}^{N-1}\cos\!\left(\frac{\pi(m+\frac12)k_1}{N}\right)\cos\!\left(\frac{\pi(m+\frac12)k_2}{N}\right)}_{\rho(k_1,k_2)}
\;\underbrace{\sum_{n=0}^{N-1}\cos\!\left(\frac{\pi(n+\frac12)l_1}{N}\right)\cos\!\left(\frac{\pi(n+\frac12)l_2}{N}\right)}_{\rho(l_1,l_2)} \\
&= C(k_1)C(l_1)C(k_2)C(l_2)\,\rho(k_1,k_2)\,\rho(l_1,l_2) = \delta(k_1-k_2)\,\delta(l_1-l_2)
\end{aligned}$$
where
$$\rho(k_1,k_2) = \begin{cases} N, & k_1 = k_2 = 0 \\ N/2, & k_1 = k_2 \neq 0 \\ 0, & \text{else} \end{cases}$$
The last equality holds because $C(0)^2 \cdot N = 1$ and $C(k)^2 \cdot N/2 = 1$ for $k \neq 0$.
Meaning we are looking at an orthonormal set.
The function $\rho(k_1,k_2)$ can be found:
$$\begin{aligned}
\rho(k_1,k_2) &= \sum_{m=0}^{N-1}\cos\!\left(\frac{\pi(m+\frac12)k_1}{N}\right)\cos\!\left(\frac{\pi(m+\frac12)k_2}{N}\right) \\
&= \frac12\sum_{m=0}^{N-1}\cos\!\left(\frac{\pi(m+\frac12)(k_1+k_2)}{N}\right) + \frac12\sum_{m=0}^{N-1}\cos\!\left(\frac{\pi(m+\frac12)(k_1-k_2)}{N}\right) \\
&= \frac12\sum_{m=0}^{N-1}\operatorname{Re}\left\{\exp\!\left(\frac{j\pi m(k_1+k_2)}{N}+\frac{j\pi(k_1+k_2)}{2N}\right)\right\} + \frac12\sum_{m=0}^{N-1}\operatorname{Re}\left\{\exp\!\left(\frac{j\pi m(k_1-k_2)}{N}+\frac{j\pi(k_1-k_2)}{2N}\right)\right\} \\
&= \frac12\operatorname{Re}\left\{\exp\!\left(\frac{j\pi(k_1+k_2)}{2N}\right)\sum_{m=0}^{N-1}\exp\!\left(\frac{j\pi m(k_1+k_2)}{N}\right)\right\} + \frac12\operatorname{Re}\left\{\exp\!\left(\frac{j\pi(k_1-k_2)}{2N}\right)\sum_{m=0}^{N-1}\exp\!\left(\frac{j\pi m(k_1-k_2)}{N}\right)\right\} \\
&= \underbrace{\frac12\operatorname{Re}\left\{\exp\!\left(\frac{j\pi(k_1+k_2)}{2N}\right)\frac{\exp\!\left(j\pi(k_1+k_2)\right)-1}{\exp\!\left(\frac{j\pi(k_1+k_2)}{N}\right)-1}\right\}}_{A} + \underbrace{\frac12\operatorname{Re}\left\{\exp\!\left(\frac{j\pi(k_1-k_2)}{2N}\right)\frac{\exp\!\left(j\pi(k_1-k_2)\right)-1}{\exp\!\left(\frac{j\pi(k_1-k_2)}{N}\right)-1}\right\}}_{B}
\end{aligned}$$
For expression A we receive:
$k_1 = 0,\ k_2 = 0$ (l'Hôpital's rule): $A = N/2$
$k_1 + k_2 \neq 0$, $k_1 + k_2$ even: $A = 0$
$k_1 + k_2$ odd:
$$\begin{aligned}
A &= \frac12\operatorname{Re}\left\{\exp\!\left(\frac{j\pi(k_1+k_2)}{2N}\right)\frac{-2}{\exp\!\left(\frac{j\pi(k_1+k_2)}{N}\right)-1}\right\} \\
&= -\operatorname{Re}\left\{\frac{\exp\!\left(-\frac{j\pi(k_1+k_2)}{2N}\right)-\exp\!\left(\frac{j\pi(k_1+k_2)}{2N}\right)}{\left(\exp\!\left(\frac{j\pi(k_1+k_2)}{N}\right)-1\right)\left(\exp\!\left(-\frac{j\pi(k_1+k_2)}{N}\right)-1\right)}\right\} \\
&= -\operatorname{Re}\left\{\frac{-2j\sin\!\left(\frac{\pi(k_1+k_2)}{2N}\right)}{2-2\cos\!\left(\frac{\pi(k_1+k_2)}{N}\right)}\right\} = 0
\end{aligned}$$
Likewise, for expression B we receive:
$k_1 = k_2$: $B = N/2$
$k_1 \neq k_2$ ($k_1 - k_2$ even or odd): $B = 0$
Summing A and B gives $\rho(k_1,k_2) = N$ for $k_1 = k_2 = 0$, $N/2$ for $k_1 = k_2 \neq 0$, and 0 otherwise.
Conclusions:
1. The set $\{\varphi_{k,l}\}$ includes $N^2$ linearly independent matrices; therefore it is an
orthonormal basis of the $N \times N$ space.
2. Representing a picture in terms of the set $\{\varphi_{k,l}\}$ is equivalent to finding the
lattices (harmonics) composing the picture, and also determining their relative
weight.
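The orthonormality shown analytically above can also be checked numerically. The following is a minimal sketch in plain Python; the function names (`dct_basis`, `inner`, `max_orthonormality_error`) are illustrative choices, not part of the tutorial.

```python
import math

def dct_basis(k, l, N):
    """Return the N x N basis matrix phi_{k,l} as a list of rows."""
    def C(i):
        return math.sqrt(1.0 / N) if i == 0 else math.sqrt(2.0 / N)
    return [[C(k) * math.cos(math.pi * (2 * m + 1) * k / (2 * N)) *
             C(l) * math.cos(math.pi * (2 * n + 1) * l / (2 * N))
             for n in range(N)] for m in range(N)]

def inner(a, b):
    """Inner product <a, b> = sum over m, n of a[m][n] * b[m][n]."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def max_orthonormality_error(N):
    """Largest deviation of <phi_{k1,l1}, phi_{k2,l2}> from the Kronecker delta."""
    basis = {(k, l): dct_basis(k, l, N) for k in range(N) for l in range(N)}
    worst = 0.0
    for i1, p1 in basis.items():
        for i2, p2 in basis.items():
            expected = 1.0 if i1 == i2 else 0.0
            worst = max(worst, abs(inner(p1, p2) - expected))
    return worst

# For N = 8 the deviation should be at the level of floating-point rounding.
print(max_orthonormality_error(8))
```

The check compares all $N^2 \times N^2$ pairs, which is affordable for N=8 but grows quickly with N.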
Section b:
The matrix a is based on the following picture:
In order to find the representation coefficients, the inner products between
the picture $a$ and the members of the set $\{\varphi_{k,l}\}$ should be computed. For
instance, the DC component of the picture is described by:
$$\hat{a}_{0,0} = \frac{1}{N}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} a[m,n] = -158$$
And the weight of the $\varphi_{2,4}$ lattice is:
$$\hat{a}_{2,4} = \langle a, \varphi_{2,4} \rangle = \sum_{m=0}^{N-1}\sum_{n=0}^{N-1} a[m,n]\,\varphi_{2,4}[m,n] = 12$$
All the representation coefficients can be calculated in a similar manner, as in the
following matrix (the picture describes absolute values):
$$\hat{a} = \begin{pmatrix}
-158 & 318 & 121 & -23 & -44 & -33 & 6 & 16 \\
7 & -17 & 7 & 12 & 14 & -7 & -4 & -4 \\
166 & -90 & -53 & -8 & 12 & 20 & 2 & 1 \\
25 & 7 & -15 & -3 & -9 & 9 & 13 & 0 \\
57 & -28 & -21 & -1 & 17 & 7 & 3 & 1 \\
14 & -15 & 4 & 6 & 7 & 3 & 7 & -3 \\
27 & -21 & -10 & -2 & 2 & 2 & 3 & 9 \\
2 & 1 & -3 & -15 & -8 & 0 & 8 & 3
\end{pmatrix}$$
As can be seen in picture $a$, there is a sharp horizontal transition from a bright area to
a dark area, which is expressed in the $\hat{a}_{0,1}$ coefficient. Additionally, the dominant
lattices composing picture $a$ are the horizontal lattices. This is consistent with the
horizontal trend of the grey levels in the picture itself.
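The coefficients can be reproduced directly from the definition $\hat{a}_{k,l} = \langle a, \varphi_{k,l} \rangle$. The sketch below is a straightforward (not fast) implementation in plain Python; the names `A` and `dct2` are illustrative. Run on the matrix as printed in this tutorial, it gives $\hat{a}_{0,1} \approx 318$, $\hat{a}_{2,0} \approx 166$ and $\hat{a}_{0,2} \approx 121$, in agreement with the coefficient matrix, while the DC term evaluates to about -154.4 (the entries sum to -1235), so the quoted DC value may come from a slightly different version of the picture.

```python
import math

# The 8x8 picture from section b.
A = [
    [44, 38, 43, 23, 18, 9, 3, 10],
    [48, 49, 4, -44, -57, -60, -59, -57],
    [41, 44, -2, -69, -73, -84, -77, -68],
    [53, 45, -20, -73, -74, -89, -96, -78],
    [41, 51, 12, -70, -84, -92, -98, -85],
    [44, 42, 16, -56, -71, -71, -70, -79],
    [44, 41, 30, -32, -60, -48, -56, -56],
    [40, 49, 41, 7, -3, -24, -15, -15],
]

def dct2(a):
    """Return a_hat[k][l] = <a, phi_{k,l}> for a square matrix a."""
    N = len(a)
    # T[k][m] = C(k) * cos(pi * (2m + 1) * k / (2N)) is the 1D basis.
    T = [[math.sqrt((1.0 if k == 0 else 2.0) / N) *
          math.cos(math.pi * (2 * m + 1) * k / (2 * N))
          for m in range(N)] for k in range(N)]
    return [[sum(a[m][n] * T[k][m] * T[l][n]
                 for m in range(N) for n in range(N))
             for l in range(N)] for k in range(N)]

a_hat = dct2(A)
for row in a_hat:
    print(" ".join(f"{v:7.1f}" for v in row))
```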
Section c:
We shall choose the $N^2/2$ most significant entries of the matrix $a$, according to
absolute value. (The rest of the entries are set to 0.)
The average reconstruction error per pixel is:
$$MSE_a = \frac{1}{N^2}\,\| b - a \| = 2.8$$
$b$ is the truncated matrix, meaning there is an average error of 2.8 grey levels in each
pixel. On the other hand, the reconstruction error for the DCT coefficients is:
$$MSE_{\hat{a}} = \frac{1}{N^2}\,\| c - a \| = 0.4$$
$c$ is the inverse DCT of the truncated coefficient matrix $\hat{a}$.
We received $MSE_{\hat{a}} < MSE_a$, meaning most of the picture's energy is described by a
small number of DCT coefficients, unlike the grey-level representation. This is an
important characteristic of the transform, since it allows describing pictures by a
relatively small number of coefficients while still maintaining a small representation error.
For example, the following picture was obtained by dividing the picture into blocks of size NxN,
calculating the DCT of each block, zeroing half of the coefficients and
performing an inverse transform:
[Figure: reconstructed picture]
The difference picture between the reconstructed picture and the original picture is
given below. As expected, the reconstruction error is concentrated where there are
edges (high frequencies) in the picture.
[Figure: difference picture]
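The comparison in this section can be replayed numerically. The sketch below keeps the $N^2/2$ largest-magnitude entries once in the pixel domain and once in the DCT domain, and evaluates the error measure $\frac{1}{N^2}\|\cdot\|$ used above. Function names are illustrative, and ties between equal magnitudes are broken arbitrarily. On the matrix as printed, the pixel-domain figure comes out at about 2.8, and the DCT-domain figure should land close to the quoted 0.4.

```python
import math

# The 8x8 picture from section b.
A = [
    [44, 38, 43, 23, 18, 9, 3, 10],
    [48, 49, 4, -44, -57, -60, -59, -57],
    [41, 44, -2, -69, -73, -84, -77, -68],
    [53, 45, -20, -73, -74, -89, -96, -78],
    [41, 51, 12, -70, -84, -92, -98, -85],
    [44, 42, 16, -56, -71, -71, -70, -79],
    [44, 41, 30, -32, -60, -48, -56, -56],
    [40, 49, 41, 7, -3, -24, -15, -15],
]

def basis_table(N):
    """T[k][m] = C(k) * cos(pi * (2m + 1) * k / (2N)), the 1D DCT basis."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / N) *
             math.cos(math.pi * (2 * m + 1) * k / (2 * N))
             for m in range(N)] for k in range(N)]

def dct2(a):
    """Coefficients a_hat[k][l] = <a, phi_{k,l}>."""
    N, T = len(a), basis_table(len(a))
    return [[sum(a[m][n] * T[k][m] * T[l][n]
                 for m in range(N) for n in range(N))
             for l in range(N)] for k in range(N)]

def idct2(a_hat):
    """Picture a[m][n] = sum over k, l of a_hat[k][l] * phi_{k,l}[m][n]."""
    N, T = len(a_hat), basis_table(len(a_hat))
    return [[sum(a_hat[k][l] * T[k][m] * T[l][n]
                 for k in range(N) for l in range(N))
             for n in range(N)] for m in range(N)]

def keep_largest(mat, count):
    """Zero all but the `count` largest-magnitude entries of a square matrix."""
    N = len(mat)
    order = sorted(((m, n) for m in range(N) for n in range(N)),
                   key=lambda mn: abs(mat[mn[0]][mn[1]]), reverse=True)
    kept = set(order[:count])
    return [[mat[m][n] if (m, n) in kept else 0.0 for n in range(N)]
            for m in range(N)]

def error(x, y):
    """(1 / N^2) * ||x - y||, the error measure used in this section."""
    N = len(x)
    total = sum((x[m][n] - y[m][n]) ** 2 for m in range(N) for n in range(N))
    return math.sqrt(total) / N ** 2

half = len(A) ** 2 // 2
err_pixels = error(keep_largest(A, half), A)
err_dct = error(idct2(keep_largest(dct2(A), half)), A)
print(f"pixel domain: {err_pixels:.2f}, DCT domain: {err_dct:.2f}")
```

Because the basis is orthonormal, the DCT-domain error equals the norm of the discarded coefficients, so the inverse transform is only needed to obtain the reconstructed picture itself.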
Section d:
We shall find the angular frequency in 1D: the parameter $k$ determines the lattice's
number of cycles over N pixels. The period $M$, in pixels, is obtained from:
$$\frac{\pi\left(m+\frac12+M\right)k}{N} = \frac{\pi\left(m+\frac12\right)k}{N} + 2\pi
\;\Rightarrow\; \frac{\pi M k}{N} = 2\pi
\;\Rightarrow\; M = \frac{2N}{k}$$
The viewing angle under which N pixels are seen is:
$$\theta = \frac{N \cdot d}{D}\ [\text{rad}] = \frac{N \cdot d}{D}\cdot\frac{180}{\pi}\ [\text{deg}]$$
The number of cycles seen in this case is $N/M = k/2$.
Now, the angular frequency describes the number of cycles seen in one degree.
Therefore:
$$\omega_k = \frac{k/2}{\theta} = \frac{\pi k D}{360\, N \cdot d}\ [\text{cpd}]$$
In 2D:
$$\omega_{k,l} = \sqrt{\omega_k^2 + \omega_l^2} = \frac{\pi D}{360\, N \cdot d}\sqrt{k^2 + l^2}\ [\text{cpd}]$$
Therefore the coefficients $\hat{a}_{k,l}$ describe the contribution of the lattice $\varphi_{k,l}$ to the
picture's assembly.
The given MTF function is shown graphically:
[Figure: MTF as a function of angular frequency]
The suitable angular frequencies:
$$\omega_{k,l} = \frac{\pi D}{360\, N \cdot d}\sqrt{k^2 + l^2} =
\begin{pmatrix}
0 & 3.4 & 6.7 & 10.1 & 13.4 & 16.8 & 20.1 & 23.5 \\
3.4 & 4.7 & 7.5 & 10.6 & 13.8 & 17.1 & 20.4 & 23.7 \\
6.7 & 7.5 & 9.5 & 12.1 & 15.0 & 18.0 & 21.2 & 24.4 \\
10.1 & 10.6 & 12.1 & 14.2 & 16.8 & 19.5 & 22.5 & 25.5 \\
13.4 & 13.8 & 15.0 & 16.8 & 19.0 & 21.5 & 24.2 & 27.0 \\
16.8 & 17.1 & 18.0 & 19.5 & 21.5 & 23.7 & 26.2 & 28.8 \\
20.1 & 20.4 & 21.2 & 22.5 & 24.2 & 26.2 & 28.4 & 30.9 \\
23.5 & 23.7 & 24.4 & 25.5 & 27.0 & 28.8 & 30.9 & 33.2
\end{pmatrix}\ [\text{cpd}]$$
And the matching weight matrix:
$$W = \begin{pmatrix}
0.42 & 0.92 & 0.99 & 0.88 & 0.70 & 0.53 & 0.38 & 0.26 \\
0.92 & 0.98 & 0.98 & 0.85 & 0.68 & 0.51 & 0.37 & 0.26 \\
0.99 & 0.98 & 0.91 & 0.78 & 0.62 & 0.47 & 0.34 & 0.24 \\
0.88 & 0.85 & 0.78 & 0.66 & 0.53 & 0.40 & 0.29 & 0.21 \\
0.70 & 0.68 & 0.62 & 0.53 & 0.43 & 0.33 & 0.24 & 0.17 \\
0.53 & 0.51 & 0.47 & 0.40 & 0.33 & 0.26 & 0.19 & 0.14 \\
0.38 & 0.37 & 0.34 & 0.29 & 0.24 & 0.19 & 0.15 & 0.11 \\
0.26 & 0.26 & 0.24 & 0.21 & 0.17 & 0.14 & 0.11 & 0.08
\end{pmatrix}$$
As expected, low-angular-frequency lattices are more significant in the picture
assembly than high-frequency lattices. Therefore, changing the DCT coefficients
that match high frequencies (by quantizing them, for example) will not
damage the picture's quality significantly.
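Both matrices of this section follow from two short formulas: $\omega_{k,l} = \frac{\pi D}{360 N d}\sqrt{k^2+l^2}$ and $MTF(\omega) = C(\alpha + \omega/\omega_0)\,e^{-(\omega/\omega_0)^\beta}$ with the constants given in the question. A minimal sketch, with illustrative function names, that regenerates them:

```python
import math

def angular_frequency(k, l, N=8, D_over_d=3072.0):
    """Angular frequency of lattice (k, l) in cycles per degree."""
    return math.pi * D_over_d / (360.0 * N) * math.hypot(k, l)

def mtf(omega, C=2.2, alpha=0.192, beta=1.1, omega0=8.0):
    """MTF of the visual system, with the constants given in the question."""
    x = omega / omega0
    return C * (alpha + x) * math.exp(-(x ** beta))

N = 8
omega = [[angular_frequency(k, l, N) for l in range(N)] for k in range(N)]
W = [[mtf(w) for w in row] for row in omega]

for row in W:
    print(" ".join(f"{v:.2f}" for v in row))
```

Only the ratio D/d enters the computation, so the pixel size and viewing distance do not need to be known separately.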
Section e:
Since the picture's quality is judged by the human viewer, the MTF function can
be used to quantize the DCT coefficients themselves. $q$ is defined as the index of the
quantization process (a large index indicates a large quantization step). The weighted
quantization steps are:
$$Q_{k,l} = \frac{q}{W_{k,l}}$$
Meaning there is an inverse ratio: the smaller the lattice's weight is, the larger the
quantization step can be. The quantization is described by:
$$\hat{a}^{q}_{k,l} = \operatorname{round}\!\left(\frac{\hat{a}_{k,l}}{Q_{k,l}}\right)$$
In this manner, by determining a quantization step for each lattice, every DCT
coefficient can be quantized separately, thus achieving minimal damage to the
picture's quality as seen by the human viewer.
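As a small illustration of the weighted quantization, the sketch below applies $Q_{k,l} = q / W_{k,l}$ to the top-left 2x2 corner of the coefficient and weight matrices from sections b and d. The index `q = 4` is an arbitrary value chosen for the demonstration, and `dequantize` (multiplying the levels back by the step) is an assumed decoder step that the tutorial does not spell out.

```python
def quantization_steps(W, q):
    """Step size Q[k][l] = q / W[k][l]: small weight, large step."""
    return [[q / w for w in row] for row in W]

def quantize(a_hat, Q):
    """Integer levels round(a_hat / Q), computed coefficient by coefficient."""
    return [[round(c / s) for c, s in zip(row_c, row_s)]
            for row_c, row_s in zip(a_hat, Q)]

def dequantize(levels, Q):
    """Assumed decoder step: scale the integer levels back by the step size."""
    return [[v * s for v, s in zip(row_v, row_s)]
            for row_v, row_s in zip(levels, Q)]

# Top-left 2x2 corner of the coefficient and weight matrices.
a_hat_corner = [[-158.0, 318.0], [7.0, -17.0]]
W_corner = [[0.42, 0.92], [0.92, 0.98]]

Q = quantization_steps(W_corner, q=4.0)   # q = 4 is a hypothetical choice
levels = quantize(a_hat_corner, Q)
restored = dequantize(levels, Q)
print(levels)
print(restored)
```

Each restored coefficient differs from the original by at most half a quantization step, so coefficients with small weight absorb the larger errors.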
In conclusion:
The DCT of pictures has a few characteristics:
- The representation coefficients are real.
- The representation is done according to an orthonormal basis.
- The DCT is not the real part of a discrete Fourier transform (DFT).
Instead, it describes a Fourier transform of a symmetrical signal obtained by
mirrored replication of the original signal.
- The transform can be computed efficiently (similarly to the FFT algorithm).
- Energy concentration: most of the picture's energy is described by a small
number of coefficients.
- Correlation reduction: grey-level values of adjacent pixels are highly
correlated. The DCT coefficients, on the other hand, are not. This
characteristic is very important in the encoding of the coefficients' values
(entropy encoding, which will not be covered in this tutorial).
- The DCT coefficients can be linked to the MTF function of the human visual
system.
The approach described in this question is a fundamental description of the JPEG
algorithm (Joint Photographic Experts Group). In this algorithm the picture is divided
into 8x8 blocks and the DCT is performed separately on each block. The
DCT coefficients are quantized in a manner similar to the weight matrix W, and only
a small part is sent to the receiver. The process causes loss of information (it is lossy)
since not all the DCT coefficients are sent to the receiver, but the human viewer
receives a picture with minor distortion.
Question 2:
A TV transmitter produces the following signal: $r(t) = D\cos(\omega_0 t) + E\sin(\omega_0 t) + C$,
used to describe color information.
a. Express C, E, D using R, G, B (the correction coefficients connecting
$R - Y$, $B - Y$ and $V$, $U$ can be omitted), assuming this is a normal color
broadcast.
b. Due to a mishap the transmitted signal is $\hat{r}(t) = E\cos(\omega_0 t) + D\sin(\omega_0 t) + C$
(meaning E and D are exchanged). The signals which appear in the picture below
are the R, G, B components of the camera that feeds the impaired broadcast.
Calculate the R, G, B output received in a normal receiver at each time. Which
color in the 3D color space (a cube with R, G, B axes) will the received color
resemble (proximity to the colors at the cube's corners)?
Solution:
a.
$$r(t) = D\cos(\omega_0 t) + E\sin(\omega_0 t) + C$$
A normal broadcast is according to:
$$r(t) = y + \rho\cos(\omega_0 t - \phi), \qquad \rho = \sqrt{u^2 + v^2}, \qquad \phi = \tan^{-1}\!\left(\frac{v}{u}\right)$$
Using trigonometric identities:
$$r(t) = y + \rho\left[\cos(\omega_0 t)\cos(\phi) + \sin(\omega_0 t)\sin(\phi)\right]$$
$$C = y, \qquad D = \rho\cos(\phi) = u = R - y, \qquad E = \rho\sin(\phi) = v = B - y$$
In conclusion:
$$C = y = 0.11B + 0.59G + 0.3R$$
$$D = R - y = -0.11B - 0.59G + 0.7R$$
$$E = B - y = 0.89B - 0.59G - 0.3R$$
b. Due to the mishap the transmitted signal is $\hat{r}(t) = E\cos(\omega_0 t) + D\sin(\omega_0 t) + C$.
The values the receiver "comprehends" are marked with a hat:
$$\hat{y} = C$$
$$\hat{u} = E = \hat{R} - \hat{y} \;\Rightarrow\; \hat{R} = \hat{u} + \hat{y} = E + C = B$$
$$\hat{v} = D = \hat{B} - \hat{y} \;\Rightarrow\; \hat{B} = \hat{y} + \hat{v} = D + C = R$$
$$\hat{G} = \frac{1}{0.59}\left(\hat{y} - 0.11\hat{B} - 0.3\hat{R}\right) = G + 0.322\,(R - B)$$
In conclusion:
$$\hat{R} = B, \qquad \hat{G} = G + 0.322\,(R - B), \qquad \hat{B} = R$$
Graphically:
A summarizing table:
Time            Transmitted   Received
0 < t < T       Red           Blue
T < t < 2T      Blue          Red
2T < t < 3T     Green         Green
3T < t < 4T     Cyan          Yellow
4T < t < 5T     White         White
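The table can be reproduced by simulating the faulty channel: encode (R, G, B) into (C, D, E), swap D and E, decode as a normal receiver would, and pick the closest corner of the color cube. This is a minimal sketch with illustrative names; primary colors are represented as 0/1 triplets and the closest corner is chosen by Euclidean distance, which is an assumption about how "resemblance" is measured.

```python
def encode(r, g, b):
    """Transmitter: C = y, D = R - y, E = B - y."""
    y = 0.3 * r + 0.59 * g + 0.11 * b
    return y, r - y, b - y

def decode(C, D, E):
    """Normal receiver: recover R and B from D and E, then G from y."""
    r = D + C
    b = E + C
    g = (C - 0.3 * r - 0.11 * b) / 0.59
    return r, g, b

def faulty_channel(r, g, b):
    """Channel in which D and E are exchanged before reception."""
    C, D, E = encode(r, g, b)
    return decode(C, E, D)

CORNERS = {
    "Black": (0, 0, 0), "Red": (1, 0, 0), "Green": (0, 1, 0), "Blue": (0, 0, 1),
    "Yellow": (1, 1, 0), "Cyan": (0, 1, 1), "Magenta": (1, 0, 1), "White": (1, 1, 1),
}

def nearest_corner(rgb):
    """Name of the cube corner closest (Euclidean distance) to rgb."""
    return min(CORNERS, key=lambda name: sum((p - q) ** 2
                                             for p, q in zip(CORNERS[name], rgb)))

for name in ("Red", "Blue", "Green", "Cyan", "White"):
    received = faulty_channel(*CORNERS[name])
    print(name, "->", nearest_corner(received), [round(v, 3) for v in received])
```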
Question 3:
A Hilbert space $H = L^2[0,1]$ is given with the standard inner product:
$$\langle f, g \rangle = \int_0^1 f(t)\, g^*(t)\, dt, \qquad f, g \in H$$
Given the following vectors:
$$\varphi_1(t) = t, \qquad \varphi_2(t) = t^2$$
a. Is $\{\varphi_1, \varphi_2\}$ an orthogonal set?
b. What is the biorthonormal set of $\{\varphi_1, \varphi_2\}$?
Solution:
a. We demand $\langle \varphi_1, \varphi_2 \rangle = 0$:
$$\langle \varphi_1, \varphi_2 \rangle = \int_0^1 t \cdot t^2\, dt = \frac14 \neq 0$$
Not orthogonal.
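b. A general approach for finding the biorthonormal set using a Gram matrix:
Given a sequence $\{\varphi_n\} \in H$, a biorthonormal sequence $\{\psi_n\} \in H$ that fulfils the
requirement $\langle \varphi_n, \psi_m \rangle = \delta_{mn}$ should be found in the following manner:
1. Finding the Gram matrix, defined by $G_{ij} = \langle \varphi_j, \varphi_i \rangle$.
2. Finding the inverse Gram matrix $Q = G^{-1}$.
3. The biorthonormal set is found by $\psi_n = \sum_m Q_{mn}\, \varphi_m$.
Back to our question, finding the Gram matrix:
$$G = \begin{pmatrix} \langle \varphi_1, \varphi_1 \rangle & \langle \varphi_2, \varphi_1 \rangle \\ \langle \varphi_1, \varphi_2 \rangle & \langle \varphi_2, \varphi_2 \rangle \end{pmatrix}$$
Calculating the inner products:
$$\langle \varphi_1, \varphi_1 \rangle = \frac13, \qquad \langle \varphi_1, \varphi_2 \rangle = \langle \varphi_2, \varphi_1 \rangle = \frac14, \qquad \langle \varphi_2, \varphi_2 \rangle = \frac15$$
The biorthonormal set is found according to $\underline{\psi} = G^{-1}\underline{\varphi}$,
therefore:
$$G = \begin{pmatrix} \frac13 & \frac14 \\ \frac14 & \frac15 \end{pmatrix}
\;\Rightarrow\;
G^{-1} = \begin{pmatrix} 48 & -60 \\ -60 & 80 \end{pmatrix}$$
$$\underline{\psi} = G^{-1}\underline{\varphi} \;\Rightarrow\;
\begin{cases} \psi_1 = 48\varphi_1 - 60\varphi_2 \\ \psi_2 = -60\varphi_1 + 80\varphi_2 \end{cases}$$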
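The Gram-matrix recipe of Question 3 can be verified with exact rational arithmetic. The sketch below uses the fact that $\langle t^p, t^q \rangle = 1/(p+q+1)$ on $[0,1]$; the helper names are illustrative and the 2x2 inverse is written out by hand.

```python
from fractions import Fraction

def monomial_inner(p, q):
    """<t^p, t^q> on [0, 1], which equals 1 / (p + q + 1)."""
    return Fraction(1, p + q + 1)

powers = [1, 2]  # phi_1(t) = t, phi_2(t) = t^2

# Gram matrix G[i][j] = <phi_j, phi_i>.
G = [[monomial_inner(pj, pi) for pj in powers] for pi in powers]

# Inverse of a 2x2 matrix via the adjugate formula.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
Q = [[G[1][1] / det, -G[0][1] / det],
     [-G[1][0] / det, G[0][0] / det]]

def psi_phi_inner(n, k):
    """<psi_n, phi_k> where psi_n = sum over m of Q[m][n] * phi_m."""
    return sum(Q[m][n] * monomial_inner(powers[m], powers[k]) for m in range(2))

print(Q)
print([[psi_phi_inner(n, k) for k in range(2)] for n in range(2)])
```

The second printed matrix is the identity, confirming that $\psi_1 = 48t - 60t^2$ and $\psi_2 = -60t + 80t^2$ are biorthonormal to $t$ and $t^2$.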
Question 4: Alternative solution - tutorial 11, using the Gram method:
We need to find the biorthonormal set for $\{f^1_{mn}, f^2_{mn}\}$, given:
$$\begin{cases}
f^1_{mn} = g(x - mD)\cos(nWx), & 0 \le n < \infty, \quad -\infty < m < \infty \\
f^2_{mn} = g(x - mD)\sin(nWx), & 1 \le n < \infty, \quad -\infty < m < \infty
\end{cases}$$
Solution:
In order to find the Gram matrix, first we calculate the inner products:
$$\langle f^1_{mn}, f^1_{mn} \rangle = \langle f^2_{mn}, f^2_{mn} \rangle = \int_{(m-\frac12)D}^{(m+\frac12)D} \cos^2(nWx)\, dx = \frac{D}{2}$$
$$\langle f^1_{mn}, f^2_{mn} \rangle = \int_{(m-\frac12)D}^{(m+\frac12)D} \cos(nWx)\cos(nWx + \phi_0)\, dx
= \frac12 \int_{(m-\frac12)D}^{(m+\frac12)D} \left[\cos(2nWx + \phi_0) + \cos(\phi_0)\right] dx
= \frac{D}{2}\cos(\phi_0) = \frac{D}{4}$$
The rest of the inner products are equal to zero due to the orthogonality of the cosines
and the windows' lack of overlap. Since there is an infinite number of functions in the
set $\{f^1_{mn}, f^2_{mn}\}$, the Gram matrix is of infinite size, but has a separable nature.
A set of sub-matrices $G_{mn}$ can be defined for $m \in \mathbb{Z},\ n \neq 0$:
$$G_{mn} = \begin{pmatrix}
\langle f^1_{mn}, f^1_{mn} \rangle & \langle f^2_{mn}, f^1_{mn} \rangle \\
\langle f^1_{mn}, f^2_{mn} \rangle & \langle f^2_{mn}, f^2_{mn} \rangle
\end{pmatrix} = \frac{D}{4}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$$
The inverse matrices:
$$Q_{mn} = \left(G^{-1}\right)_{mn} = \frac{4}{3D}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$$
The biorthonormal set is given by:
$$\left(\psi^1_{mn},\ \psi^2_{mn}\right) = \left(f^1_{mn},\ f^2_{mn}\right)\left(G^{-1}\right)_{mn}
= \left(f^1_{mn},\ f^2_{mn}\right)\frac{4}{3D}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$$
Therefore:
$$\psi^1_{mn} = \frac{4}{3D}\left[2\cos(nWx) - \cos(nWx + \phi_0)\right] g(x - mD)$$
Likewise:
$$\psi^2_{mn} = \frac{4}{3D}\left[2\cos(nWx + \phi_0) - \cos(nWx)\right] g(x - mD)$$
Here $\phi_0$ is the phase of the second family relative to the first, with $\cos(\phi_0) = \frac12$ as used in the inner product above.
The only case left to be handled is when $m \in \mathbb{Z},\ n = 0$:
$$G_{m0} = \langle f^1_{m0}, f^1_{m0} \rangle = D \;\Rightarrow\; \psi^1_{m0} = \frac{1}{D}\, g(x - mD)$$
Question 5 – Gabor functions:
Responses from three different cells in the visual system, r1 , r2 , r3 , were measured in a
physiological experiment. For simplicity we shall assume 1D light excitation, as a
function of x alone.
The following results were received ($B_0 > B_1$):
Excitation $E_1(x) = B_0 + B_1\sin(2\pi \cdot 10x)$ caused a reaction only from cell $r_1$.
Excitation $E_2(x) = B_0 + B_1\sin(2\pi \cdot 20x)$ caused a reaction only from cell $r_2$.
Excitation $E_3(x) = B_0 + B_1\cos(2\pi \cdot 10x)$ caused a reaction only from cell $r_3$.
It is assumed that the cells respond to the presented signal according to the real or
imaginary part of an inner product with Gabor functions $f_{mn}(x)$:
$$r_i = \operatorname{Re}\left\{\int_{-\infty}^{\infty} E(x)\, f_{mn}(x)\, dx\right\}
\quad\text{or}\quad
r_i = \operatorname{Im}\left\{\int_{-\infty}^{\infty} E(x)\, f_{mn}(x)\, dx\right\}$$
where $B_0$ is omitted prior to performing the inner product.
The Gabor functions in this question are of the form $f_{mn}(x) = g(x - mD)\, e^{jnWx}$, where
$W = 2\pi$, $D = 1$, and $g(x)$ is a normalized square envelope with width D, meaning
$g(x) = 1$ when $|x| \le 0.5$, otherwise $g(x) = 0$.
a. According to the results, characterize the specific Gabor function which is related
to each of the cells ($m$ and/or $n$) as much as you can, and specify which part (real
or imaginary) the cell reacts to.
b. Give an example of an excitation which will cause a reaction from all three cells.
Is there a signal that can excite only two of the cells without affecting the third
one? If there is, give an example. If not, explain why.
c. A light point approximated by $E_4(x) = B_1\,\delta(x - vt)$ is moved along the X axis.
Given that $v = 1$ is the movement velocity, the results are:
$r_1$ reacts only in the time frame $-2.5 \le t \le -1.5$.
$r_2$ reacts only in the time frame $4.5 \le t \le 5.5$.
$r_3$ reacts only in the time frame $4.5 \le t \le 5.5$.
Complete the information known regarding the identification of the Gabor
functions involved.
d. Sketch the reaction of cell $r_2$ as a function of the time $t$, as a reaction to the signal
in section c. What is the name, in physiology, of the reaction you received?
e. Qualitatively describe the range of sensitivity of the three cells in the space-frequency domain.
Solution:
a. The excitations presented using the given Gabor functions (with $B_0$ omitted):
$$E_1(x) = \sum_{m=-\infty}^{\infty} g(x - m)\, B_1 \sin(2\pi \cdot 10x)
= B_1 \sum_{m=-\infty}^{\infty} g(x - m)\sin(2\pi \cdot 10x)
= B_1 \sum_{m=-\infty}^{\infty} g(x - m)\,\frac{e^{j2\pi \cdot 10x} - e^{-j2\pi \cdot 10x}}{2j}$$
Therefore, in case of an excitation $E_1(x)$, the cells which will react are those with
$n = 10$, imaginary part:
$$r_1 = \operatorname{Im}\{a_{m,10}\}$$
Likewise for $E_2(x)$, $E_3(x)$:
$$r_2 = \operatorname{Im}\{a_{m,20}\}$$
$$r_3 = \operatorname{Re}\{a_{m,10}\}$$
Since the excitations are periodic in the space domain, $m$ cannot be determined!
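The selectivity derived above can be checked by numerical integration. The sketch below approximates $\int E(x)\, g(x - mD)\, e^{jnWx}\, dx$ with a midpoint rule over the window of the envelope, taking the coefficient without conjugation (conjugating would only flip the sign of the imaginary part). The function names, the amplitude $B_1 = 1$ and the number of samples are illustrative assumptions.

```python
import cmath
import math

B1 = 1.0  # modulation amplitude (arbitrary value for the demonstration)

def E1(x):
    return B1 * math.sin(2 * math.pi * 10 * x)

def E2(x):
    return B1 * math.sin(2 * math.pi * 20 * x)

def E3(x):
    return B1 * math.cos(2 * math.pi * 10 * x)

def gabor_coefficient(excitation, m, n, samples=4000):
    """Midpoint-rule value of the integral of E(x) g(x - mD) exp(j n W x) dx."""
    D, W = 1.0, 2.0 * math.pi
    dx = D / samples
    total = 0j
    for i in range(samples):
        x = (m - 0.5) * D + (i + 0.5) * dx   # g(x - mD) = 1 on this window
        total += excitation(x) * cmath.exp(1j * n * W * x)
    return total * dx

for name, E in (("E1", E1), ("E2", E2), ("E3", E3)):
    a10 = gabor_coefficient(E, 0, 10)
    a20 = gabor_coefficient(E, 0, 20)
    print(name, "Im a10:", round(a10.imag, 3),
          "Im a20:", round(a20.imag, 3), "Re a10:", round(a10.real, 3))
```

Repeating the computation with a different `m` gives the same values, which is the numerical counterpart of the statement that $m$ cannot be determined from periodic excitations.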
b. Since the Gabor representation is linear, the reaction to a sum of excitations
will be the sum of the reactions, meaning the excitation $E = E_1 + E_2 + E_3$ will cause a
response from all three cells.
There is an excitation which will cause a reaction in $r_1$, $r_2$ without affecting $r_3$,
for example $E = E_1 + E_2$, by the same linearity argument.
c. According to the location of the light point in the time frames where each cell
reacted, $m$ can now be determined:
$$r_i = \left\{\begin{matrix}\operatorname{Re}\\ \operatorname{Im}\end{matrix}\right\}\left\{\int_{-\infty}^{\infty} B_1\,\delta(x - vt)\, f_{mn}(x)\, dx\right\}
= B_1\, g(vt - mD)\left\{\begin{matrix}\operatorname{Re}\\ \operatorname{Im}\end{matrix}\right\}\left\{e^{jnWvt}\right\}$$
The reaction is different from zero only for $|t - m| \le \frac12$, meaning:
$$m - \tfrac12 \le t \le m + \tfrac12$$
Therefore:
$r_1$: $m = -2$
$r_2$: $m = 5$
$r_3$: $m = 0$
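d.
$$r_2(t) = \operatorname{Im}\left\{\int_{-\infty}^{\infty} B_1\,\delta(x - t)\, g(x - 5)\, e^{j2\pi \cdot 20x}\, dx\right\}
= \begin{cases} B_1 \sin(2\pi \cdot 20t), & 4.5 \le t \le 5.5 \\ 0, & \text{else} \end{cases}$$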
20 sine cycles of uniform amplitude.
This is actually the receptive field of cell $r_2$!
e. In the space-frequency domain: along the space axis, the function is limited
to bands of width D. The envelope of every band is uniform in the $x$ direction, and
sinc-like in the direction of the frequency axis $\omega$. Every choice of an effective limit
in the frequency domain (for example, 3 dB) will result in rectangular ranges:
[Figure: rectangular sensitivity ranges in the space-frequency plane]
Last update – January 2011