Lecture08_Saliency

Presented to: Prof. Hagit Hel-Or
Presented by: Avner Gidron
SALIENCY – DEFINITION
Saliency is defined as the most prominent part of the picture. In the last lecture Reem defined it as a part that takes up at least one half of the pixels in the picture. We'll see that this is not always the case, and that saliency has more than one definition.
SALIENCY – DEFINITION
What is salient here?
SALIENCY – DEFINITION
Answer:
SALIENCY – DEFINITION
Here we can see that although the grass has more variance in color and texture, the horse is the salient part.
SALIENCY – DEFINITION
An image can have more than one salient area, and as a result, areas that are more salient than others:
Salient areas:
Also salient, but less so.
SALIENCY – DEFINITION
Our objective – saliency map:

How would you divide this picture into segments?
A possible answer:
Two segments:
• The swimmer
• The background
Motivation - application
Image mosaicking: the salient details are preserved by using smaller building blocks.
Motivation - application
Painterly rendering – the fine details of the dominant objects are maintained, while the background is abstracted. (Left: input; right: painterly rendering.)
So, what are we going to see today?
• Explanation of saliency in the human eye.
• Automatically detecting single objects (local).
• Automatically detecting fixation points (global).
• Global + local approach.
Saliency in human eyes
Saliency in human eyes
Our eyes detect saliency in two stages.
First, the parallel, fast, but simple pre-attentive process, which is attracted to:
• Movement.
• High contrast.
• Intensity.
Saliency in human eyes
Then, the serial, slow, but complex attention process takes the points found in the first stage and chooses which one to focus on while detecting new information.
Saliency in human eyes
Slow attention process – example:
First focus here, and then notice the cat and the baby.
Saliency in human eyes
Example for saliency map by eye tracking:

Detecting single objects
One approach to saliency is to consider saliency as a single prominent object in the image.
An algorithm using this approach is the Spectral Residual Approach.
Spectral Residual Approach
Try to remember from the Image Processing (IP) lessons: what did we say an image consists of?
That's right!!!
Frequencies.
Spectral Residual Approach (1)
It turns out that if we take the average frequency domain of many natural images, it will look like this:
Spectral Residual Approach (2)
Based on this notion, if we take the average frequency domain and subtract it from a specific image's frequency domain, we will get the spectral residual.
Spectral Residual Approach
The log spectrum $\ell$ of an image is defined in MATLAB as:
ImageTransform = fft2(Image);
logSpec = log(1+ abs(ImageTransform));
Spectral Residual Approach - example
(Example: an image, its Fourier transform $F$, and its log spectrum.)
Spectral Residual Approach
$h_n(f)$ will be defined as an $n \times n$ blurring (averaging) matrix:

$h_n(f) = \dfrac{1}{n^2}\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}$
Spectral Residual Approach
Generally, one takes the average over many images to get the average spectrum, but because we have only one image, we can convolve its log spectrum with $h_n$ to get an approximation. Then we can get:

$\text{spectral residual} = \ell - h_n * \ell$
Spectral Residual Approach
At this stage, we'll perform an inverse FFT and go back to the spatial domain. In MATLAB:
SaliencyImage = ifft2(ImageSpecResidual);
Spectral Residual Approach
And we will take a threshold to determine the object map:

$O(x) = \begin{cases} 1 & \text{if spectral residual}(x) > \text{threshold} \\ 0 & \text{otherwise} \end{cases}$

The saliency map:
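Putting the steps above together, here is a minimal MATLAB sketch of the whole pipeline. The file name, filter sizes, and threshold are illustrative choices, and keeping the original phase when inverting follows Hou and Zhang's paper (the slides simplify this step):

% Spectral residual saliency - minimal sketch (assumes a color input image)
Image          = im2double(rgb2gray(imread('input.jpg')));   % hypothetical file name
ImageTransform = fft2(Image);
logSpec        = log(1 + abs(ImageTransform));
phase          = angle(ImageTransform);
avgSpec        = imfilter(logSpec, fspecial('average', 3), 'replicate');  % h_n * logSpec
specResidual   = logSpec - avgSpec;
% back to the spatial domain, keeping the original phase
SaliencyImage  = abs(ifft2(exp(specResidual + 1i*phase))).^2;
SaliencyImage  = imfilter(SaliencyImage, fspecial('gaussian', 9, 2.5));   % smooth
ObjectMap      = SaliencyImage > 3*mean(SaliencyImage(:));                % threshold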
Detecting fixation points
Another approach is to detect points in the image that the human eye would fixate on.
Unlike the spectral residual approach, which finds a single salient object, this approach may find more than one point.
One algorithm that uses this approach is the one
based on Information Maximization.
Information Maximization
Before we start, let's define a few things.
Self-information: for a probabilistic event with probability $p(x)$, the self-information is defined as:

$\log\!\left(\dfrac{1}{p(x)}\right) = -\log\big(p(x)\big)$
Information Maximization
A property of self-information is that the smaller the probability, the larger the self-information.
For example:

$p(X_1) = 0.5 > 0.25 = p(X_2)$

but in self-information:

$-\log\big(p(X_1)\big) = -\log(0.5) \approx 0.3 < 0.6 \approx -\log(0.25) = -\log\big(p(X_2)\big)$
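A quick check of these values in MATLAB (the slide's numbers match a base-10 logarithm):

-log10(0.5)    % ~0.301
-log10(0.25)   % ~0.602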
Information Maximization
Another thing we'll explain is what the Independent Component Analysis (ICA) algorithm does.
Given $x = (x_1, x_2, \dots, x_m)^T$, a random vector representing the data, and $s = (s_1, s_2, \dots, s_m)^T$, a random vector representing the components, the task is to transform the observed data $x$, using a linear static transformation $W$ as $s = Wx$, into maximally independent components $s$.
Information Maximization
ICA numeric example:

$s = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad W = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix}, \quad x = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 3 & 0 & 1 \end{pmatrix}, \qquad s = Wx$

We can see that $s$ is independent, and we would like to find $W$.
Information Maximization
The answer:

$W = \begin{pmatrix} -\tfrac{5}{4} & \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{4} & \tfrac{1}{2} & \tfrac{1}{4} \end{pmatrix}$

which indeed satisfies $Wx = s$.
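In MATLAB, $W$ can be recovered numerically for this toy example (a sketch, assuming the matrices as reconstructed above):

s = [1 0 0; 0 1 0];
x = [1 2 1; 0 5 1; 3 0 1];
W = s / x        % solves W*x = s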
Information Maximization
And in signals:
(Two mixed signals go through ICA and come out as independent source signals.)
Information Maximization – ICA
vs PCA
PCA, Principal Component Analysis – a statistical method for finding a low-dimensional representation of high-dimensional data.
* Fourier basis functions are the PCA components of natural images.
Information Maximization – ICA
vs PCA
The difference between them is that PCA finds its components one after the other, in a greedy way, finding the largest component each time while maintaining orthogonality, whereas ICA works in parallel, finding all the components at once while aiming for independence.
Information Maximization – ICA
vs PCA
(Figure: PCA basis functions vs. ICA basis functions.)
Information Maximization – max info
algorithm
We start with a collection of 360,000 random patches and run ICA on them to get $A$, which is a set of basis functions.
Information Maximization – max info
algorithm
Now we have the basis functions that "created" the image, and we would like to know the coefficient of each basis function per pixel. We take the pseudoinverse of $A$ and multiply it with the image:

$\text{coefficients of the basis functions} = \text{pseudoinverse}(A) \cdot \text{image}$
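In MATLAB this step is a single multiplication. A small sketch with stand-in sizes (a real run would use the ICA basis A learned above and the actual image patches):

A            = randn(49, 25);        % stand-in basis: 7x7 patches, 25 basis functions
patches      = randn(49, 1000);      % stand-in vectorized image patches, one per column
coefficients = pinv(A) * patches;    % 25 coefficients per patch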
Information Maximization – max info
algorithm
The result of the unmixing is a set of $N$ coefficients.
For the pixel at location $(j,k)$, denote the $i$'th coefficient $w_{i,j,k}$, whose value is $\nu_{i,j,k}$.
In one dimension:

$w = (w_1, w_2, \dots, w_N), \qquad w_1 = \nu_1,\; w_2 = \nu_2,\; \dots,\; w_N = \nu_N$
Information Maximization – max info
algorithm
For each pixel at location $(j,k)$, we denote the probability that $w_{i,j,k} = \nu_{i,j,k}$ by $p(w_{i,j,k})$.
$p(w_{i,j,k})$ evaluates how "likely" the coefficient values at pixel $(j,k)$ are, compared to the neighboring pixels' coefficients.
We first compute the likelihood of each coefficient of $(j,k)$ separately.
Information Maximization – max info
algorithm
A little bit of math:

$p(w_{i,j,k}) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \displaystyle\sum_{(s,t)\in\Psi} \omega(s,t)\, e^{-\frac{(\nu_{i,j,k}-\nu_{i,s,t})^2}{2\sigma^2}}$

This Gaussian measures how "stable" the coefficients are: $\Psi$ is the pixel's neighborhood, $\omega(s,t)$ describes the distance of $(s,t)$ to $(j,k)$, and the exponent measures the similarity of the coefficients.
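As a concrete illustration, here is a small MATLAB sketch of this likelihood for a single coefficient i at a single pixel (j,k). The array sizes, window size, sigma, and the Gaussian form of the spatial weights omega are all illustrative stand-ins:

N = 25; rows = 64; cols = 64;
v = randn(N, rows, cols);                 % stand-in for the basis-function coefficients
i = 1; j = 32; k = 32;                    % coefficient and pixel of interest
win = 7; half = floor(win/2); sigma = 0.5;
[S, T]  = meshgrid(-half:half, -half:half);
omega   = exp(-(S.^2 + T.^2) / (2*(win/3)^2));           % weight by distance of (s,t) to (j,k)
omega   = omega / sum(omega(:));
nbhd    = squeeze(v(i, j-half:j+half, k-half:k+half));   % coefficients in the neighborhood Psi
diff2   = (v(i,j,k) - nbhd).^2;
p_wijk  = sum(omega(:) .* exp(-diff2(:) / (2*sigma^2))) / (sqrt(2*pi)*sigma);
selfInfo = -log(p_wijk);                  % large when the coefficient "stands out"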
Information Maximization – max info
algorithm
We can see that for pixel $(j,k)$ the coefficients are different from those of its surroundings. That is why $(\nu_{i,j,k}-\nu_{i,s,t})^2$ is big and the probability is low. On the contrary, for pixel $(m,l)$ the coefficients are similar to the ones in its surroundings, and that is why its probability is high:

$p(w_{i,j,k}) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \displaystyle\sum_{(s,t)\in\Psi} \omega(s,t)\, e^{-\frac{(\nu_{i,j,k}-\nu_{i,s,t})^2}{2\sigma^2}}$
Information Maximization – max info
algorithm
After computing the likelihood of each coefficient of $(j,k)$ separately, we denote

$p(w_{1,j,k}{=}\,\nu_{1,j,k} \wedge w_{2,j,k}{=}\,\nu_{2,j,k} \wedge \dots \wedge w_{N,j,k}{=}\,\nu_{N,j,k})$

as:

$p(w_{1,j,k}{=}\,\nu_{1,j,k}) \cdot p(w_{2,j,k}{=}\,\nu_{2,j,k}) \cdot \ldots \cdot p(w_{N,j,k}{=}\,\nu_{N,j,k})$

(this factorization into a product is justified because the ICA coefficients are, by construction, independent).
Information Maximization
– max info algorithm
The more similar a pixel's coefficients are to its neighbors' coefficients, the higher the probability, and thus the smaller the self-information, and vice versa.
Information Maximization
For example, in the following image we can see that the white area has little "stability" in its coefficients, and therefore a small $p(x)$, so it has large self-information. We can also notice that this goes hand in hand with this area being prominent. (The marked area has large self-information.)
Information Maximization – max info
algorithm
Now we can take the values of the self-information and turn them into a saliency map!
And we get:
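A sketch of this last step, assuming p_w holds the per-coefficient likelihoods p(w_{i,j,k}) as an N x rows x cols array (stand-in values here):

p_w         = rand(25, 64, 64);                          % stand-in likelihoods
saliencyMap = squeeze(-sum(log(p_w), 1));                % joint self-information per pixel
saliencyMap = (saliencyMap - min(saliencyMap(:))) ./ ...
              (max(saliencyMap(:)) - min(saliencyMap(:)));   % normalize for display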
Information Maximization – max info
algorithm
And the results are: the original image, the information-maximization saliency map, and the human-eye fixation map.
Global + Local approach
This approach uses information from both the pixel's close surroundings and the entire picture, because sometimes one of them alone isn't enough.
(Figure: input image, local-only result, global-only result.)
Context aware saliency
One algorithm that does so uses a new kind of definition for saliency, where the salient part of the picture is not only a single object but its surroundings too.
This definition is named context-aware saliency.
What do you see?
And now?
Context aware saliency algorithm
(1) Local low-level considerations, including factors such as contrast and color.
(2) Global considerations, which suppress frequently occurring features.
(3) Visual organization rules, which state that visual forms may possess one or several centers of attention.
(4) High-level factors, such as priors on the salient object's location.
A little math reminder:
The Euclidean distance between two vectors X,Y
is defined as:
$d(X,Y) = \|X - Y\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Context aware saliency algorithm
The basic idea is to determine the similarity of a patch of size $r$ around each pixel to other patches, both locally and globally.
We define $d_{color}(p_i, p_j)$ as the Euclidean distance between the vectorized patches $p_i$ and $p_j$ in CIE L*a*b color space, normalized to [0,1].
Context aware saliency algorithm
CIE values of $X$: $(5,4,3)$. CIE values of $Y$: $(3,4,5)$.

$d(X,Y) = \|X - Y\| = \sqrt{\sum_{i=1}^{3}(x_i - y_i)^2} = \sqrt{4 + 0 + 4} = \sqrt{8} \approx 2.83$
Context aware saliency algorithm
CIE values of $X$: $(5,4,3)$. CIE values of $Y$: $(60,30,90)$.

$d(X,Y) = \|X - Y\| = \sqrt{\sum_{i=1}^{3}(x_i - y_i)^2} = \sqrt{3025 + 676 + 7569} = \sqrt{11270} \approx 106.2$
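The same computation in MATLAB:

X = [5 4 3];  Y = [60 30 90];
d = norm(X - Y)        % sqrt(3025 + 676 + 7569) = sqrt(11270), about 106.2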
Context aware saliency algorithm
Now we can see that pixel $i$ is considered salient when $d_{color}(p_i, p_j)$ is high for all $j$.
Context aware saliency algorithm
Actually, we don't really need to compare each patch to all other patches, but only to its $K (=64)$ most similar patches $\{q_k\}_{k=1}^{K}$.
How do we find the $K$ most similar patches? We'll come back to that.
Context aware saliency algorithm
According to principle 3, which states that visual forms may possess one or several centers of attention, we define $d_{position}(p_i, p_j)$ as the Euclidean distance between the positions of $p_i$ and $p_j$, normalized by the image dimensions.
Context aware saliency algorithm
$d_{position}$ is introduced because, as we can notice, background pixels have similar patches at multiple scales (pixels $i$ and $j$ in the figure). That is in contrast to salient pixels (pixel $l$).
Context aware saliency algorithm
Now we can define dissimilarity as:

$d(p_i, q_k) = \dfrac{d_{color}(p_i, q_k)}{1 + 3 \cdot d_{position}(p_i, q_k)} \qquad (1)$
Context aware saliency algorithm
Now, because we know that pixel $i$ is salient if it differs from its $K$ most similar patches, we can define the single-scale saliency value:

$S_i^r = 1 - \exp\!\left(-\dfrac{1}{K}\sum_{k=1}^{K} d(p_i^r, q_k^r)\right) \qquad (2)$

The equation sums all the dissimilarities between the patch $p_i$ of size $r$ and its $K$ most similar patches, normalized by $K$.
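A tiny MATLAB sketch of equations (1) and (2) for one patch (the distance values are stand-ins; in the real algorithm they come from the K most similar patches found in CIE L*a*b space):

K          = 64;
d_color    = rand(1, K);                      % stand-in color distances, in [0,1]
d_position = rand(1, K);                      % stand-in position distances
d          = d_color ./ (1 + 3*d_position);   % dissimilarity, equation (1)
S_i        = 1 - exp(-mean(d));               % single-scale saliency, equation (2)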
Context aware saliency algorithm
We can see that the larger the dissimilarity between the patches, the larger the saliency:

$S_i^r = 1 - \exp\!\left(-\dfrac{1}{K}\sum_{k=1}^{K} d(p_i^r, q_k^r)\right)$
Context aware saliency algorithm
The patches don't all have to be the same size; we can use multiple patch sizes: size $r$, size $\frac{r}{2}$, size $\frac{r}{4}$.
Context aware saliency algorithm
So for a patch $p_i$ at scale $r$ we consider as candidates patches whose scales are $R_q = \{r, \frac{r}{2}, \frac{r}{4}\}$. Now we'll change equation (2) to fit:

$S_i^r = 1 - \exp\!\left(-\dfrac{1}{K}\sum_{k=1}^{K} d(p_i^r, q_k^{r_t})\right), \quad r_t \in R_q \qquad (3)$
Context aware saliency algorithm
And we define the temporary saliency of pixel $i$ as:

$\bar{S}_i = \dfrac{1}{M}\sum_{r \in R} S_i^r \qquad (4)$

for the set of scales used, $R = \{r_1, \dots, r_M\}$, where $M$ is the number of scales.
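A sketch of this averaging step (assumes S_r is a cell array holding the single-scale saliency maps computed with equation (3), one per scale):

S_r   = {rand(64, 96), rand(64, 96), rand(64, 96)};   % stand-in per-scale maps
M     = numel(S_r);
S_bar = mean(cat(3, S_r{:}), 3);                      % temporary saliency, equation (4)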
Context aware saliency algorithm
Centers of attention – the centers of attention are the pixels with the strongest saliency. All their surroundings will be salient too. We find them by thresholding the saliency map.
For example: input, saliency map, and centers of attention.
Context aware saliency algorithm
One more thing we want to consider is the salient pixels' surroundings, because as we saw before they may be important to us.
$d_{foci}(i)$ – the Euclidean distance between pixel $i$ and the closest center of attention.
Context aware saliency algorithm
We also define $d_{ratio}$ as:

$d_{ratio} = \dfrac{\max_j d_{foci}(j)}{\max(\text{image dimensions})}$
Context aware saliency algorithm
Drop-off – the drop-off is a parameter that controls the rate at which pixels lose their saliency in relation to $d_{foci}(i)$. That means that if the drop-off is big, a pixel $i$ will need to be closer to a center of attention for $d_{foci}(i)$ to have a saliency effect, and vice versa.
Small drop-off:
Large drop-off:
Context aware saliency algorithm
We also define $\gamma(i)$ as:

$\gamma(i) = -\log\big(d_{foci}(i) + c_{drop\text{-}off}\big)$

where $c_{drop\text{-}off}$ is a constant that controls the drop-off rate. $\gamma(i)$ actually expresses the proximity of pixel $i$ to a center of attention.
Context aware saliency algorithm
We also define $\delta(i)$ as:

$\delta(i) = d_{ratio} \cdot \dfrac{\gamma(i)}{\max_i \gamma(i)}$

To understand it, let's simplify it:

$\delta(i) = \underbrace{\dfrac{\max_j d_{foci}(j)}{\max(\text{image dimensions})}}_{\text{constant for all } i} \cdot \dfrac{-\log\big(d_{foci}(i) + c_{drop\text{-}off}\big)}{\max_i\big({-}\log(d_{foci}(i) + c_{drop\text{-}off})\big)}$

That's why the bigger $d_{foci}(i)$ is, the smaller $\delta(i)$.
Context aware saliency algorithm
Don't panic!! It's just their way to express the distance of pixel $i$ to the nearest center of attention, in relation to the entire picture:

$R(i) = \dfrac{\delta(i)}{\max_i \delta(i)}$
Context aware saliency algorithm
And now the temporary saliency is:

$\hat{S}_i = \bar{S}_i \cdot R(i)$
Context aware saliency algorithm
Now, if you think about how you usually take pictures, you will notice that in most cases the prominent object is in the center of the image.
Context aware saliency algorithm
Using that assumption, we can give a pixel priority based on its closeness to the middle.
Let $G_{\sigma_x,\sigma_y}$ be a two-dimensional Gaussian centered at the image center, where $\sigma_x = \frac{\#\text{columns}}{6}$ and $\sigma_y = \frac{\#\text{rows}}{6}$.
So the final saliency is:

$S_i = \hat{S}_i \cdot G_i$
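A short MATLAB sketch of this center-prior step (S_hat stands for the saliency map from the previous step; stand-in values here):

S_hat        = rand(64, 96);                              % stand-in saliency map
[rows, cols] = size(S_hat);
[X, Y]       = meshgrid(1:cols, 1:rows);
sig_x = cols/6;  sig_y = rows/6;
G       = exp(-((X - cols/2).^2 / (2*sig_x^2) + (Y - rows/2).^2 / (2*sig_y^2)));
S_final = S_hat .* G;                                     % final saliency map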
Context aware saliency algorithm
How do we find the $K$ closest patches to a given patch?
Instead of looking at the full-size image, let's build a pyramid.
Context aware saliency algorithm
The idea is to search in a small version of the image, and then use the result to focus our search in the full-size image.
Context aware saliency algorithm
Let’s see some results and rest a little from all that math:
A few more Saliency uses:
Puzzle-like collage:
A few more Saliency uses:
Movie Time
REFERENCES
X. Hou and L. Zhang, "Saliency Detection: A Spectral Residual Approach," CVPR, pages 1-8, 2007.
N. Bruce and J. Tsotsos, "Saliency Based on Information Maximization," NIPS, volume 18, page 155, 2006.
REFERENCES
S. Goferman, L. Zelnik-Manor, and A. Tal, "Context-Aware Saliency Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 34(10): 1915-1926, Oct. 2012.
R. Margolin, L. Zelnik-Manor, and A. Tal, "Saliency for Image Manipulation," Computer Graphics International (CGI), 2012.