Presented to: Prof. Hagit Hel-Or
Presented by: Avner Gidron

SALIENCY – DEFINITION
Saliency is defined as the most prominent part of the picture. In the last lecture Reem defined it as a part that takes up at least half of the pixels in the picture. We'll see that this is not always the case, and that saliency has more than one definition.

What is salient here? [example image]
Answer: [the horse]
Here we can see that although the grass has more variance in color and texture, the horse is the salient part.

An image can have more than one salient area, and as a result, areas that are more salient than others.
[figure: the most salient areas, and areas that are also salient, but less]

Our objective – a saliency map:
[figure: input image ("Sometimes all you need are a few words of encouragement") and its saliency map]

How would you divide this picture into segments? A possible answer: two segments – the swimmer and the background.

Motivation - application
Image mosaicking: the salient details are preserved, with the use of smaller building blocks.
Painterly rendering: the fine details of the dominant objects are maintained, abstracting the background. [figure: input vs. painterly rendering]

So, what are we going to see today?
- An explanation of saliency in human eyes.
- Automatic detection of single objects (local).
- Automatic detection of fixation points (global).
- A global + local approach.

Saliency in human eyes
Our eyes detect saliency in two stages. First comes the parallel, fast, but simple pre-attentive process, which is attracted to:
- Movement.
- High contrast.
- Intensity.
[figure: a high-contrast spot the eye will be attracted to]

Then comes the serial, slow but complex attention process, which takes the points found in the first stage and chooses which one to focus on while detecting new information.

Slow attention process – example: first focus here, and then notice the cat and the baby. [figure]

Example of a saliency map obtained by eye tracking: [figure]

Detecting single objects
One approach to saliency is to consider saliency as a single object prominent in the image. An algorithm using this approach is the Spectral Residual approach.

Spectral Residual Approach
Try to remember from IP lessons: what did we say an image consists of? That's right!!! Frequencies.

It turns out that if we take the average frequency domain of many natural images, it looks like this: [figure: average log spectrum of natural images]

Based on this notion, if we take the average frequency domain and subtract it from a specific image's frequency domain, we get the spectral residual.

The log spectrum 𝓁 of an image is defined in MATLAB as:
ImageTransform = fft2(Image);
logSpec = log(1 + abs(ImageTransform));

[figure: example image and its log spectrum]

ℎ𝑛 will be defined as a blurring (averaging) matrix of size 𝑛⨉𝑛:
h_n = (1/n²) · [1 1 ⋯ 1; 1 1 ⋯ 1; ⋯ ; 1 1 ⋯ 1]

Generally one would average over many images to get the average spectrum, but because we have only one image, we can convolve its log spectrum with ℎ𝑛 to get an approximation. Then we can get:
spectral residual = 𝓁 − h_n * 𝓁

At this stage, we perform an inverse FFT and go back to the space domain. In MATLAB:
SaliencyImage = ifft2(ImageSpecResidual);

And we take a threshold to determine the object map:
O(x) = 1 if SaliencyImage(x) > threshold, 0 otherwise
[figure: the resulting saliency map]
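Putting the steps above together, here is a minimal MATLAB sketch of the pipeline, not the authors' code. The file name, the filter size n = 3, the final Gaussian smoothing, and the threshold of 3 times the mean are assumptions; also note that Hou and Zhang keep the phase spectrum and recombine it with the residual before the inverse transform, a detail the slides gloss over.

% Spectral residual saliency - a minimal sketch with assumed parameters.
Image = im2double(rgb2gray(imread('input.jpg')));   % 'input.jpg' is a placeholder name
Image = imresize(Image, [64 64]);                   % the method works on a small copy of the image

ImageTransform = fft2(Image);
logSpec   = log(1 + abs(ImageTransform));           % the log spectrum, as defined above
phaseSpec = angle(ImageTransform);                  % phase spectrum, kept for the reconstruction

n  = 3;                                             % assumed size of the averaging filter h_n
hn = fspecial('average', n);                        % h_n: n-by-n blurring matrix
avgSpec = imfilter(logSpec, hn, 'replicate');       % approximation of the average log spectrum
ImageSpecResidual = logSpec - avgSpec;              % the spectral residual

% Back to the space domain; recombining the phase and squaring follows Hou & Zhang.
SaliencyImage = abs(ifft2(exp(ImageSpecResidual + 1i*phaseSpec))).^2;
SaliencyImage = imfilter(SaliencyImage, fspecial('gaussian', 9, 2.5));  % assumed smoothing
SaliencyImage = mat2gray(SaliencyImage);

ObjectMap = SaliencyImage > 3*mean(SaliencyImage(:));   % threshold; the factor 3 is assumed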
Detecting fixation points
Another approach is to detect points in the image on which the human eye would fixate. Unlike the spectral residual approach, which finds a single object, this approach may find more than one point. One algorithm that uses this approach is based on Information Maximization.

Information Maximization
Before we start, let's define a few things.

Self-information: for a probabilistic event with probability p(x), the self-information is defined as:
I(x) = log(1/p(x)) = −log p(x)

An attribute of self-information is that the smaller the probability, the larger the self-information. For example:
p(X1) = 0.5 > 0.25 = p(X2)
But in self-information:
−log p(X1) = −log(0.5) ≈ 0.3 < 0.6 ≈ −log(0.25) = −log p(X2)

Another thing we'll explain is what the Independent Component Analysis (ICA) algorithm does. Given x = (x1, x2, …, xm)ᵀ, a random vector representing the data, and s = (s1, s2, …, sm)ᵀ, a random vector representing the components, the task is to transform the observed data x, using a linear static transformation W, as s = Wx, into maximally independent components s.

ICA numeric example: [figure: a source matrix s, an unknown unmixing matrix W with entries w11 … w23, and an observed data matrix x]. We can see that s is independent, and we would like to find W.
The answer: [figure: the recovered unmixing matrix W]

And in signals: [figure: mixed signals separated into independent components by ICA]

Information Maximization – ICA vs PCA
PCA, Principal Component Analysis, is a statistical method for finding a low-dimensional representation of high-dimensional data.
* Fourier basis functions are the PCA components of natural images.

The difference between them is that PCA finds its components one after the other, in a greedy way, finding the largest component each time while paying attention to orthogonality, whereas ICA works in parallel, finding all the components at once while paying attention to independence.
[figure: PCA basis vs. ICA basis]

Information Maximization – max info algorithm
We start with a collection of 360,000 random patches and run ICA on them to get A, which is a set of basis functions.

Now we have the basis functions that "created" the image, and we would like to know the coefficients of each basis function per pixel. We take the pseudoinverse of A and multiply it with the image:
coefficients of the basis functions = pseudoinverse(A) × image

The result of the unmixing is a set of N coefficients. For the pixel at location (j,k), denote the i'th coefficient w_{i,j,k}, whose value is ν_{i,j,k}. In one dimension:
w = (w_1, w_2, …, w_N), with values (ν_1, ν_2, …, ν_N)

For each pixel at location (j,k), we denote the probability that w_{i,j,k} = ν_{i,j,k} by p(w_{i,j,k}). p(w_{i,j,k}) evaluates how "likely" the coefficient values at pixel (j,k) are, compared to the neighboring pixels' coefficients. We first compute the likelihood of each coefficient of (j,k) separately.

A little bit of math (up to normalization):
p(w_{i,j,k}) ∝ Σ_{(s,t)∈Ψ} ω(s,t) · exp( −(ν_{i,j,k} − ν_{i,s,t})² / (2σ²) )
where Ψ is the pixel neighborhood, the Gaussian term measures how "stable" the coefficients are (the similarity of the coefficients), and ω(s,t) describes the distance of (s,t) to (j,k).
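Here is a minimal MATLAB sketch of this likelihood for a single coefficient map, under assumptions: the function name, the Gaussian form of ω(s,t), the neighborhood radius psi, and the normalization by the sum of weights are all choices made for illustration, not the authors' implementation.

% Likelihood of the i'th coefficient at pixel (j,k) - a minimal sketch (assumed file coeffLikelihood.m).
% V is the H-by-W map of the i'th coefficient values nu_{i,.,.} over the image;
% psi is the neighborhood radius and sigma the Gaussian width (both assumed).
function p = coeffLikelihood(V, j, k, psi, sigma)
    [H, W] = size(V);
    v0 = V(j, k);
    p = 0;
    Z = 0;
    for s = max(1, j-psi) : min(H, j+psi)
        for t = max(1, k-psi) : min(W, k+psi)
            if s == j && t == k, continue; end
            w = exp(-((s-j)^2 + (t-k)^2) / (2*psi^2));        % omega(s,t): assumed distance weight
            p = p + w * exp(-(v0 - V(s,t))^2 / (2*sigma^2));  % similarity of the coefficients
            Z = Z + w;
        end
    end
    p = p / Z;   % normalization (the proportionality constant in the formula above)
end

As the next slides do, the joint likelihood of a pixel is then taken as the product of this quantity over all N coefficient maps, and the saliency is its self-information, −Σ_i log p(w_{i,j,k}).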
[figure: two marked pixels, (j,k) on the salient object and (m,l) in the background]
We can see that for pixel (j,k), its coefficients are different from those in its surround; that's why (ν_{i,j,k} − ν_{i,s,t})² is big and the probability is low. On the contrary, for pixel (m,l), its coefficients are similar to the ones in its surroundings, and that's why its probability is high.

Information Maximization – max info algorithm
After computing the likelihood of each coefficient of (j,k) separately, we denote
p(w_{1,j,k} = ν_{1,j,k} ∧ w_{2,j,k} = ν_{2,j,k} ∧ … ∧ w_{N,j,k} = ν_{N,j,k})
as the product
p(w_{1,j,k} = ν_{1,j,k}) · p(w_{2,j,k} = ν_{2,j,k}) · … · p(w_{N,j,k} = ν_{N,j,k}).

The more similar a pixel's coefficients are to its neighbors' coefficients, the higher the probability, and thus the smaller the self-information, and vice versa.

For example, in the following image we can see that the white area has little "stability" in its coefficients, and therefore a small p(X), and so it has large self-information. We can also notice that this goes hand in hand with the area being prominent. [figure: the region marked as having large self-information]

Now we can take the values of the self-information and turn them into a saliency map!! And we get: [figure: saliency map]

And the results are: [figure: original image, information maximization saliency map, human eye fixation map]

Global + Local approach
This approach uses the information from both the pixel's close surroundings and the information in the entire picture, because sometimes one of them alone isn't enough. [figure: input, local result, global result]

Context aware saliency
One algorithm that does so uses a new kind of definition for saliency, where the salient part of the picture is not only a single object but its surroundings too. This definition is called context-aware saliency.
[figure: "What do you see?" / "And now?" – the object with and without its context]

Context aware saliency algorithm
The algorithm is based on four principles:
(1) Local low-level considerations, including factors such as contrast and color.
(2) Global considerations, which suppress frequently occurring features.
(3) Visual organization rules, which state that visual forms may possess one or several centers of attention.
(4) High-level factors, such as priors on the salient object's location.

A little math reminder: the Euclidean distance between two vectors X, Y is defined as:
d(X,Y) = ||X − Y|| = √( Σ_{i=1}^{n} (x_i − y_i)² )

The basic idea is to determine the similarity of a patch of size r around pixel i to other patches, both locally and globally. We define d_color(p_i, p_j) as the Euclidean distance between the vectorized patches p_i and p_j in CIE L*a*b color space, normalized to [0,1].

For example, for CIE values X = (5,4,3) and Y = (3,4,5):
d(X,Y) = ||X − Y|| = √(4 + 0 + 4) = √8
And for X = (5,4,3) and Y = (60,30,90):
d(X,Y) = ||X − Y|| = √(3025 + 676 + 7569) = √11270

Now we can see that pixel i is considered salient when d_color(p_i, p_j) is high for all j.

Actually, we don't need to compare each patch to all other patches, but only to its K (= 64) most similar patches {q_k}, k = 1…K. How do we find the K most similar patches? We'll come back to that.

According to principle (3), which states that visual forms may possess one or several centers of attention, we define d_position(p_i, p_j) as the Euclidean distance between the positions of p_i and p_j, normalized by the image dimensions.

d_position is introduced because, as we can notice, background pixels have similar patches at multiple scales (pixels i and j), in contrast to salient pixels (pixel l). [figure: pixels i and j in the background, pixel l on the salient object]
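Before combining the two distances, here is a minimal MATLAB sketch of d_color, using assumed names and an assumed patch size; the [0,1] normalization choice is also an assumption rather than the authors' exact scheme.

% d_color between two vectorized L*a*b patches - a minimal sketch.
Lab = rgb2lab(im2double(imread('input.jpg')));                 % 'input.jpg' is a placeholder
r = 7;                                                         % assumed patch size (r-by-r)
patchAt = @(I, y, x) reshape(I(y:y+r-1, x:x+r-1, :), [], 1);   % vectorize the patch at (y,x)

pi_patch = patchAt(Lab, 10, 10);
pj_patch = patchAt(Lab, 50, 80);
dcolor = norm(pi_patch - pj_patch);   % Euclidean distance between the vectorized patches
% (the algorithm then normalizes d_color to [0,1], e.g. by the largest distance observed)

% The toy numbers from the slides, treating each triplet as a one-pixel patch:
norm([5 4 3] - [3 4 5])      % = sqrt(8): similar colors, small distance
norm([5 4 3] - [60 30 90])   % = sqrt(11270): very different colors, large distance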
Context aware saliency algorithm
Now we can define the dissimilarity as:
d(p_i, q_k) = d_color(p_i, q_k) / (1 + 3·d_position(p_i, q_k))    (1)

Now, because we know that pixel i is salient if it differs from its K most similar patches, we can define the single-scale saliency value:
S_i^r = 1 − exp( −(1/K) · Σ_{k=1}^{K} d(p_i^r, q_k^r) )    (2)
The equation sums the dissimilarities between the patch p_i of size r and its K most similar patches, normalized by K. We can see that the larger the dissimilarity between the patches, the larger the saliency.

Patches don't all have to be the same size; we can use multiple patch sizes: r, r/2, r/4.

So for a patch p_i at scale r, we consider as candidates patches whose scales are R_q = {r, r/2, r/4}. Now we'll change equation (2) to fit:
S_i^r = 1 − exp( −(1/K) · Σ_{k=1}^{K} d(p_i^r, q_k^{r_t}) ),  r_t ∈ R_q    (3)

And we define the temporary saliency of pixel i as the average over the scales used, R = {r_1, …, r_M}:
S̄_i = (1/M) · Σ_{r∈R} S_i^r    (4)
where M is the number of scales.

Centers of attention – the centers of attention are the pixels with the strongest saliency; all of their surroundings will be salient too. We find them by applying a threshold to the salient pixels. [figure: input, saliency map, centers of attention]

One more thing we want to consider is the salient pixels' surroundings, because as we saw before, they may be important to us. We define:
d_foci(i) – the Euclidean distance between pixel i and the closest center of attention.

We also define d_ratio as:
d_ratio = max_j d_foci(j) / max(image dimensions)

Drop-off – the drop-off is a parameter that states the rate at which pixels lose their saliency in relation to d_foci(i). That means that if the drop-off is big, a pixel i needs to be closer to a center of attention for its saliency to be kept, and vice versa. [figure: small drop-off vs. large drop-off]

We also define γ(i) as:
γ(i) = log( d_foci(i) + c_dropoff )
where c_dropoff is a constant that controls the drop-off rate. γ(i) actually expresses the proximity of pixel i to a center of attention.

We also define δ(i) as:
δ(i) = d_ratio ^ ( γ(i) / max_i γ(i) )
To understand it, let's simplify it:
δ(i) = [ max_j d_foci(j) / max(image dimensions) ] ^ [ log(d_foci(i) + c_dropoff) / log(max_j d_foci(j) + c_dropoff) ]
The base is a constant for all i's; that's why the bigger d_foci(i) is, the smaller δ(i).

Don't panic!! It's just their way of expressing the distance of pixel i to the nearest center of attention, in relation to the entire picture:
R(i) = δ(i) / max_i δ(i)

And now the temporary saliency becomes:
Ŝ_i = S̄_i · R(i)

Now, if you think about how you usually take pictures, you will notice that in most cases the prominent object is in the center of the image.

Using that assumption we can give a pixel priority based on its closeness to the middle. Let G_{σx,σy} be a two-dimensional Gaussian centered at the image center, where σx = #columns/6 and σy = #rows/6. So the final saliency is:
S_i = Ŝ_i · G_i
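Going back to equations (1) and (2), here is a minimal MATLAB sketch of the single-scale saliency of one pixel. The function name, the input layout, and the normalization of d_color by the maximum observed distance are assumptions made for illustration, not the authors' code.

% Single-scale saliency S_i^r of one pixel - a minimal sketch (assumed inputs and names).
% P   : N-by-D matrix, one vectorized L*a*b patch per row (one row per pixel)
% pos : N-by-2 matrix of patch positions, normalized by the image dimensions
% i   : index of the pixel of interest;  K = 64 as in the slides
function Sr = singleScaleSaliency(P, pos, i, K)
    c = 3;                                       % the constant in the denominator of eq. (1)
    dcol = sqrt(sum((P - P(i,:)).^2, 2));        % d_color(p_i, p_j) for every j
    dcol = dcol / max(dcol);                     % normalize to [0,1]; normalization choice assumed
    dpos = sqrt(sum((pos - pos(i,:)).^2, 2));    % d_position(p_i, p_j)
    d = dcol ./ (1 + c*dpos);                    % dissimilarity, eq. (1)
    d(i) = Inf;                                  % exclude the patch itself
    dK = mink(d, K);                             % the K most similar patches q_1..q_K
    Sr = 1 - exp(-mean(dK));                     % single-scale saliency, eq. (2)
end

Searching all N patches this way is expensive, which is exactly why the next slide asks how to find the K closest patches efficiently.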
Context aware saliency algorithm
How do we find the K closest patches to a given patch??? Instead of searching the full-size image, let's build a pyramid. The idea is to search in a small version of the image, and then use that result to focus our search in the full-resolution image.

Let's see some results and rest a little from all that math: [figure: results]

A few more Saliency uses
Puzzle-like collage. [figure]
Movie time. [video]

REFERENCES
X. Hou and L. Zhang, "Saliency Detection: A Spectral Residual Approach", CVPR, pages 1-8, 2007.
N. Bruce and J. Tsotsos, "Saliency Based on Information Maximization", NIPS, volume 18, page 155, 2006.
S. Goferman, L. Zelnik-Manor, and A. Tal, "Context-Aware Saliency Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 34(10): 1915-1926, Oct. 2012.
R. Margolin, L. Zelnik-Manor, and A. Tal, "Saliency for Image Manipulation", Computer Graphics International (CGI), 2012.