Seminar on Image Similarity and Image Retrieval
Presentation by Feliks Beilis

Background
Object categorization and object class detection – how to find images in a database that match a specific query, for example any red car or any brown horse. Methods used: EMD on histograms, EMD on signatures.
Texture classification. Methods used: Gabor filters, PatchMatch, EMD.
[Figure: three ranked texture-retrieval lists (distance, image ID) for the same query under different measures; the left texture is the source. Are these 3 textures the same as the source?]
Image recognition – finding a specific object, for example the face of a person. Methods used: SIFT, color interest points.
[Figure: the SIFT points that describe this picture.]
Image editing with patch-matching algorithms – how to change an image using existing data, how to reconstruct an image. Methods used: NNF (nearest-neighbor field), editing tools with constraints.

Object categorization and object class detection
In this section we will talk mostly about EMD – the Earth Mover's Distance – but there are other methods for comparing histograms as well. We chose to focus on EMD because it matches perceptual similarity for image retrieval better than the other methods. In the IP course we discussed why EMD is better than the alternatives, so I will not explain that part of the argument thoroughly here. Just a reminder of the other dissimilarity measures: Minkowski-form distance; Kullback-Leibler and Jeffrey divergence; χ² statistics, where m_i = (h_i + k_i) / 2; quadratic-form distance.

Histogram vs. Signature
Signatures are derived from histograms and are represented as {s_j = (m_j, w_j)}, where m_j is the cluster mean (a d-dimensional vector playing the role of a bin center), w_j is the number of pixels that belong to that cluster, and j is the index of the corresponding bin. The EMD methods described earlier for histograms can now be used on signatures. A more intuitive illustration follows.
[Figure: histogram of image A, histogram of image B, signature of image A, signature of image B.]

The Experiment
Our database contains 20,000 images. In our first experiment we identified 75 images of red cars; from this set we chose 10 "good" images, in which the background was green/grey. We performed ten queries, using a different "good" car each time. For this experiment we used histograms with coarse binning and with fine binning. Over the 20,000 images, coarse binning left us with 15.3 non-zero bins on average and fine binning with 39 non-zero bins on average. EMD outperformed the other methods, and the results with signatures are much better than with histograms.
[Figure: retrieval results; middle – coarse binning, bottom – fine binning.]

The Experiment
In our second experiment the colors of the objects and of the background are quite similar: we took 157 images of brown horses in green fields, and again 10 "good" images were chosen, again with both coarse and fine histograms. For coarse binning, EMD on signatures outperformed the others, but Jeffrey divergence and χ² statistics outperformed EMD on histograms (this can be explained by the distances being computed between more distant bin centers and therefore being less meaningful). For fine binning EMD outperformed the other measures, with EMD on signatures doing best of all.
[Figure: retrieval results; middle – coarse binning, bottom – fine binning.]

Conclusion
EMD has desirable properties for image retrieval; compared to the other methods it has advantages in all the parameters we examined. As we saw, signatures give better retrieval results than histograms.
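To make the signature-based EMD concrete, here is a minimal sketch that computes it by solving the underlying transportation problem as a linear program. The signature layout (cluster means plus weights) follows the definition above; the function name emd_signatures and the use of SciPy's linprog solver are my own illustrative choices, not the reference implementation from the EMD paper.

import numpy as np
from scipy.optimize import linprog

def emd_signatures(means_p, weights_p, means_q, weights_q):
    """EMD between two signatures {(m_j, w_j)}; the means are d-dimensional cluster centers."""
    means_p, means_q = np.atleast_2d(means_p), np.atleast_2d(means_q)
    weights_p = np.asarray(weights_p, dtype=float)
    weights_q = np.asarray(weights_q, dtype=float)
    m, n = len(weights_p), len(weights_q)

    # Ground distance: Euclidean distance between every pair of cluster means.
    d = np.linalg.norm(means_p[:, None, :] - means_q[None, :, :], axis=2)
    c = d.ravel()                                  # cost of moving one unit of "earth"

    # Flow out of cluster i of P must not exceed w_p[i].
    a_rows = np.zeros((m, m * n))
    for i in range(m):
        a_rows[i, i * n:(i + 1) * n] = 1.0
    # Flow into cluster j of Q must not exceed w_q[j].
    a_cols = np.zeros((n, m * n))
    for j in range(n):
        a_cols[j, j::n] = 1.0

    # The total flow must equal the smaller of the two total weights.
    total = min(weights_p.sum(), weights_q.sum())
    res = linprog(c,
                  A_ub=np.vstack([a_rows, a_cols]),
                  b_ub=np.concatenate([weights_p, weights_q]),
                  A_eq=np.ones((1, m * n)), b_eq=[total],
                  method="highs")
    return res.fun / total                         # normalize the work by the total flow

For example, two color signatures obtained by clustering the pixels of two images can be compared directly with emd_signatures(means_a, weights_a, means_b, weights_b); a histogram is just the special case where the cluster means are the fixed bin centers and the weights are the bin counts.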
Texture classification
We will focus on texture classification, mostly using Gabor texture features. While color is a purely pointwise property, texture involves a notion of spatial extent: a single point has no texture. If texture is defined in the frequency domain, the information of a texture is carried by a point together with its neighbors.

Short background on Gabor filtering
Gabor filters are similar to Fourier filters but are limited to certain frequency bands, and they do an excellent job of image compaction. Gabor filters are defined by harmonic functions modulated by Gaussian distributions.
[Figure: the Fourier transform of a Gabor filter.]

Texture representation
After applying Gabor filters to an image at different orientations and different scales, we obtain an array of magnitudes; these magnitudes describe the energy content of the image at each scale and orientation. The main purpose of texture-based retrieval is to find images or regions with similar textures. Since this similarity is not rotation invariant, similar textures with different orientations may be missed in retrieval or get a low rank (an example is on the next slide). To solve this problem a simple circular shift was suggested: the orientation with the highest total energy is called dominant, and the other images are rotated (shifted) to align with the dominant orientation.

Results
Our database included 1,000 images with different kinds of texture; it contained both natural images and texture images. In the first retrieval experiment all 15 similar textures were retrieved within the first 18 images, and only one image was irrelevant. The same experiment on a color image database with 360 different images retrieved the same images within the first 25 results.

How to apply EMD to textures
As we saw before, a texture can be represented with Gabor filters as 24 bins (4 for scale and 6 for orientation). Once the texture is represented this way, it can be used as a histogram or a signature with the other similarity methods.

Results
The database was constructed from 1,744 texture patches. Using EMD we can find partial matches in textures: the query was 20% texture and 80% "don't care"; 16 patches were identical to the original, followed by patches that partially contained the original texture.
We created a 250-image database with 25 zebras, then cropped a block of the zebra-stripe pattern and asked for images containing at least 20% of that pattern; the best 8 matches are shown in the figure. We also cropped a block of cheetah pattern and asked for images with at least 10% of that pattern; the best 12 matches are shown in the figure.

Conclusion
In this chapter we talked about Gabor texture retrieval and mostly focused on rotations, but this approach can be extended to other methods as well. We also saw the EMD measure used on Gabor features for texture retrieval; textures are usually homogeneous and correspond to different parts of images, so partial queries of this kind are very useful for image retrieval.
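The sketch below shows one way to build such a Gabor energy descriptor (4 scales by 6 orientations) and apply the circular shift to the dominant orientation, using OpenCV's Gabor kernels. The kernel size, the mapping of scale to filter wavelength and the function name gabor_descriptor are illustrative assumptions rather than the exact filter bank used in the cited papers.

import cv2
import numpy as np

def gabor_descriptor(gray, wavelengths=(4, 8, 16, 32), n_orient=6):
    """Return a (scales x orientations) array of mean Gabor filter magnitudes."""
    gray = gray.astype(np.float32)
    feats = np.zeros((len(wavelengths), n_orient))
    for s, lambd in enumerate(wavelengths):          # wavelength stands in for scale
        for o in range(n_orient):
            theta = o * np.pi / n_orient
            # Arguments: ksize, sigma, theta, lambda, gamma, psi.
            kernel = cv2.getGaborKernel((31, 31), lambd / 2.0, theta, lambd, 0.5, 0)
            response = cv2.filter2D(gray, cv2.CV_32F, kernel)
            feats[s, o] = np.abs(response).mean()    # energy at this scale/orientation

    # Circular shift so the dominant orientation (highest total energy) is column 0.
    dominant = int(np.argmax(feats.sum(axis=0)))
    return np.roll(feats, -dominant, axis=1)

Two descriptors produced this way can be compared directly, or flattened into the 24-bin histograms and signatures used with EMD above.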
Reminder
Patch matching can also be achieved with SIFT-style algorithms or with histogram distances (as described earlier).

Image recognition
SIFT – scale-invariant feature transform. What is SIFT? SIFT is an algorithm used to detect and describe local features in images; here we will mostly talk about the Harris corner detector. Corners have long been considered useful interest points, and therefore they are used in many different algorithms. Color also has great importance for matching images. In the RGB color cube most interest points are found using just intensity (luminance, the light returned from bright objects), which is useful with studio photography or artificial images. However, in natural images high-contrast color changes can occur, and such changes are much less noticeable with an intensity-based approach.

Harris corner detector reminder
Intuitively, Harris corner detection is based on the image derivatives along the X and Y axes, obtained by convolving with (-1, 0, 1) and (-1, 0, 1)^T. More formally, the derivatives are gathered into the second-moment matrix M, summed over a window around each pixel:
M = Σ_window [ Ix²  Ix·Iy ; Ix·Iy  Iy² ]
det(M) = λ1 · λ2
trace(M) = λ1 + λ2
and the cornerness is R = det(M) - k · trace(M)²: both eigenvalues must be large for a corner.

RGB vs. normalized RGB
With plain RGB the corners are spread all over the image and do not concentrate on a specific area. With normalized RGB the corners are found around the silhouette of the parrot, but in dark areas the detector is unstable, as can be seen at the bottom of the image.

Harris detector with other color spaces
Quasi-invariant colors are derived from RGB with dedicated transformations (HSI, OCS, spherical color space). Can we improve corner detection?

Harris detector with scale invariance
As we will see, scale has a huge impact on corner detection, so we use a "fixed scale" to improve the results; there are drawbacks with images that are too large or too small. We use a function to set the "fixed scale", where E is the cornerness measurement for each pixel (part of the Harris algorithm), M is the second-moment matrix, the operator is a convolution, and t is the amount of scale change. The optimal rescaling, determined by experiments with the Harris detector, is 1.2 < t < √2. After "fixing the scale" we can see the change in the Harris detector: the parrot is highly prioritized. Some more improvement is coming next.

Colored scale-invariant Harris corner detection
Now let us add color information to the scale decision. We build a function from the 3-dimensional color space to a 1-dimensional data set and combine it with the already known scale-invariant function. When we combine this information we get a different definition of interest points. Working in a quasi-invariant color space, the interest points are now free of shading, illumination and specular changes, so the lighting conditions do not affect the detection. Natural cluttered animal images have very different lighting conditions, and this method overcomes that, as we will now see. The background is structured with strong illumination changes, and the quasi-invariant HSI detector found the exact scorpion image.

Image retrieval
For the retrieval experiment we capture 1,000 images; for every image, 18 additional images are taken at different rotations, resulting in a database of 18,000 images. As can be seen, the quasi-invariant color approach outperformed the others.

Conclusion
Using the methods explained above, we saw that they are much better than luminance-based methods. Color-based scale selection leads to better stability; it can also be transferred to various color spaces, so we can take advantage of their different color properties. In retrieval scenarios our approach was much more stable, which leads to higher retrieval rates.
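As a concrete illustration of the cornerness measure above, here is a minimal intensity-only sketch of the Harris response R = det(M) - k · trace(M)². The Gaussian window size, the constant k = 0.04 and the helper name harris_response are illustrative choices; the scale- and color-adapted variants discussed above build on the same second-moment matrix.

import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(gray, sigma=1.5, k=0.04):
    """Harris cornerness R = det(M) - k * trace(M)^2 for every pixel of a grayscale image."""
    gray = gray.astype(np.float64)
    # Image derivatives (central differences stand in for the (-1, 0, 1) kernels).
    iy, ix = np.gradient(gray)
    # Entries of the second-moment matrix M, averaged over a Gaussian window.
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    det_m = ixx * iyy - ixy ** 2
    trace_m = ixx + iyy
    return det_m - k * trace_m ** 2      # large positive values mark corners

Thresholding this response and keeping local maxima gives the interest points; the color variants build the same matrix from derivatives of the (quasi-invariant) color channels instead of intensity alone.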
Image editing and reconstruction
What is image editing? As digital and computational photography have matured, researchers have developed methods for high-level editing. We can now resize an image while keeping a good likeness of the original; we can erase an unwanted portion of an image and automatic image completion will fill in the missing data. Image reshuffling algorithms allow us to take a portion of an image and move it around so that the remainder still resembles the original image. These algorithms depend on user intervention to obtain the best results, because the user knows what he expects from the modified image.

NNF – nearest-neighbor field
The NNF stores, for every patch in image A, the most similar patch in image B. Our algorithm computes it efficiently by relying on 3 key observations:
– It searches the 2D space of possible patch offsets, achieving greater speed and efficiency than a standard kd-tree search over patch appearance.
– It exploits the natural structure of images: neighboring patches tend to have coherent offsets, so a good match found for one pixel can be reused for its neighbors, which improves efficiency.
– A single random choice of offset for a patch is usually a bad guess, but the larger the field, the better the chance that some patches get a correct offset purely by chance.

Example of patch matching: a good estimate of the match is enough; it does not need to be perfect.

Phases of the algorithm
Propagation searches for good matches among the neighboring patches. In the illustration, the blue patch (b) propagates offsets from the patches above (red) and to the left (green), and then (c) searches in neighborhoods of a certain radius around its current best match.

The Algorithm
The outcome of the PatchMatch algorithm is an offset map: a 2D field of 2D vectors with the same dimensions as the source image. Each vector stores the location of the currently best known match.
1. Initialization – random, except in areas where we have initial information, called constraints (we talk about them later).
2. Propagation step.
3. Random search.
Steps 2 and 3 are executed consecutively for each pixel.
Propagation step – the natural correlation between neighbors is exploited. Assume the red patch is the best match for a neighbor of our target, the black patch; when we move from that neighbor to our target we can try its offset (shifted accordingly) as our guess for the current patch, and there is a good chance that this offset will also be a good match. The same is done with the other adjacent patch. We take the best of these three patches, the current match and the two propagated candidates (using the patch distance); in this manner matches propagate across the image. We then move on to the search step.
Search step – we take a random unit vector and scale it by a decreasing radius; if this radius falls below a certain threshold, we stop. The scaled random vector is added to the current best offset and the patches are compared; if the new candidate is better, it becomes the current best from that moment on; if not, the search step is repeated with a different random vector. The whole algorithm is then applied for multiple iterations.
[Figure: the top image is reconstructed using patches from the bottom image; after 5 iterations the image is complete.]

Real-world implementation
Our algorithm is much faster and uses much less memory than a kd-tree. For a 7x7 patch size we found our algorithm 20x to 100x faster, using about 20x less memory. For smaller patches we obtain smaller speedups. We also made a GPU implementation of the NNF computation (on an 8800 GTS video card) that is 7x faster than the CPU implementation.
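The sketch below puts the three phases together for grayscale images: random initialization, propagation from two neighbors, and a random search with a shrinking radius. The patch size, iteration count, fixed scan order and function names are simplifying assumptions; the published algorithm alternates scan directions and includes further refinements omitted here.

import numpy as np

def patch_dist(a, b, ay, ax, by, bx, p):
    """Sum of squared differences between the p x p patches at (ay, ax) in a and (by, bx) in b."""
    return np.sum((a[ay:ay + p, ax:ax + p] - b[by:by + p, bx:bx + p]) ** 2)

def patchmatch(a, b, p=7, iters=5):
    """Return an NNF mapping each p x p patch of a to the coordinates of a similar patch in b."""
    a, b = a.astype(np.float64), b.astype(np.float64)
    ah, aw = a.shape[0] - p + 1, a.shape[1] - p + 1   # valid top-left corners in a
    bh, bw = b.shape[0] - p + 1, b.shape[1] - p + 1   # valid top-left corners in b
    rng = np.random.default_rng(0)

    # 1. Initialization: a random candidate match in b for every patch of a.
    nnf = np.stack([rng.integers(0, bh, (ah, aw)),
                    rng.integers(0, bw, (ah, aw))], axis=-1)
    cost = np.array([[patch_dist(a, b, y, x, nnf[y, x, 0], nnf[y, x, 1], p)
                      for x in range(aw)] for y in range(ah)])

    for _ in range(iters):
        for y in range(ah):
            for x in range(aw):
                # 2. Propagation: try the shifted offsets of the upper and left neighbors.
                for dy, dx in ((-1, 0), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if ny >= 0 and nx >= 0:
                        cy = min(nnf[ny, nx, 0] - dy, bh - 1)
                        cx = min(nnf[ny, nx, 1] - dx, bw - 1)
                        c = patch_dist(a, b, y, x, cy, cx, p)
                        if c < cost[y, x]:
                            nnf[y, x], cost[y, x] = (cy, cx), c
                # 3. Random search: sample around the current best with a shrinking radius.
                radius = max(bh, bw)
                while radius >= 1:
                    cy = int(np.clip(nnf[y, x, 0] + rng.integers(-radius, radius + 1), 0, bh - 1))
                    cx = int(np.clip(nnf[y, x, 1] + rng.integers(-radius, radius + 1), 0, bw - 1))
                    c = patch_dist(a, b, y, x, cy, cx, p)
                    if c < cost[y, x]:
                        nnf[y, x], cost[y, x] = (cy, cx), c
                    radius //= 2
    return nnf

After convergence, nnf[y, x] gives the coordinates in b whose patch best matches the patch of a at (y, x); this offset map is what the editing tools below constrain and reuse.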
Editing tools
Now we will talk about the novel interactive editing tools enabled by our algorithm. By modifying the search in various ways we can introduce local constraints on the offsets, giving the user control over the synthesis process. We will mostly focus on the constraints described next. (Demo video: http://gfx.cs.princeton.edu/pubs/Barnes_2009_PAR/)

Search space constraints
Image completion of large regions is a challenging task; the boundaries of the missing region provide few or no constraints. In our work we adopted a user-interaction approach, allowing the user to draw curves across the missing region. Our algorithm then synthesizes the curves and the texture simultaneously in the same unified framework. The user provides the completion region and a mask.

Deformation constraints
Many recent retargeting methods allow the user to mark important regions. One important cue that has been overlooked is lines: objects with straight edges, such as buildings, roads and trucks, are very common in images, and keeping those lines straight is important. In our algorithm we overcome these problems by constraining the domain of possible nearest-neighbor locations in the output. We impose these constraints with "gradual scaling": the deformations become gradual where there is a lack of space, and we are able to correct them. Another example is scaling.

Hard constraints (reshuffling)
The user wants to keep a region of an image as a hard constraint, without changing it during the process. We achieve that by fixing the NN field of the relevant region's points; after each iteration we simply correct those offsets to the output position, so the other objects gradually rearrange to align with the constrained regions.
[Figure: (a) the object is moved, (b) the background is "reshuffled" and all the missing data is completed, (c) comparison with the patch transform.]

Summary
We saw different methods for image retrieval, some feature-based and some region-based; every method has its advantages and disadvantages, and which one to use depends on the application. We also saw a new editing-tool algorithm that showed what can be done with these image-retrieval methods.

References
PatchMatch project page: http://gfx.cs.princeton.edu/pubs/Barnes_2009_PAR/
OpenCV Harris corner detector tutorial: http://docs.opencv.org/doc/tutorials/features2d/trackingmotion/harris_detector/harris_detector.html
"Content-based Image Retrieval Using Gabor Texture Features", 2000. http://www.gscit.monash.edu.au/~dengs/resource/papers/pcm00.pdf
B. S. Manjunath and W. Y. Ma. "Texture Features for Browsing and Retrieval of Large Image Data". IEEE Transactions on Pattern Analysis and Machine Intelligence (Special Issue on Digital Libraries), 18(8), August 1996, pp. 837-842. http://jamf.eu/jamf/export/2097/trunk/doc/papers/96PAMITrans.pdf
Y. Rubner, C. Tomasi and L. J. Guibas. "The Earth Mover's Distance as a Metric for Image Retrieval". International Journal of Computer Vision, 40(2), November 2000, pp. 99-121. http://www.cs.duke.edu/~tomasi/papers/rubner/rubnerIjcv00.pdf
Julian Stottinger, Nicu Sebe, Theo Gevers and Allan Hanbury. "Colour Interest Points for Image Retrieval". Computer Vision Winter Workshop 2007. http://oldwww.prip.tuwien.ac.at/people/julian/publications1/data/Stoettinger_et_al_CVWW07.pdf
J. Huang and R. Zabih. "Combining Color and Spatial Information for Content-based Image Retrieval". http://www.cs.cornell.edu/rdz/Papers/ecdl2/spatial.htm
"Learning Image Patch Similarity" (book chapter; very detailed, explains background on point matching, patches and features). http://ttic.uchicago.edu/~gregory/thesis/thesisChapter6.pdf
Connelly Barnes, Eli Shechtman, Adam Finkelstein and Dan B Goldman. "PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing". ftp://194.153.101.105/Faculty/arik/Seminar2009/papers/patchMatch.pdf