Texture and Shape for Image Retrieval – Multimedia Analysis and Indexing

Winston H. Hsu
National Taiwan University, Taipei
Office: R512, CSIE Building
Communication and Multimedia Lab (通訊與多媒體實驗室)
http://www.csie.ntu.edu.tw/~winston
October 23, 2007

MMAI, Fall 07 - Winston Hsu, NTU

Outline
- Texture
  - Statistical features
  - Spectral features
- Edge
- Shape

Reminder
- Homework #2
  - Due: TA@501 (noon, Tuesday, November 13)
  - Rule: “deliver quality work on time with integrity!!”
- Midterm
  - A small recap of what we mentioned (major literature)
  - High-level concepts mentioned in the course
  - Open book (no computer), but no print-outs are required
- Mailing list
  - http://cmlmail.csie.ntu.edu.tw/mailman/listinfo/mmai

Syllabus (tentative)
Week  Date      Topic
1     9/25/07   holiday
2     10/02/07  introduction
3     10/09/07  mpeg; shot detection
4     10/16/07  cbr overview; color
5     10/23/07  texture+shape; relevance feedback
6     10/30/07  multidimensional indexing; feature reduction
7     11/06/07  midterm
8     11/13/07  gmm+cbir; svm+cbir (graphical/discriminative models)
9     11/20/07  structure discovery (sports; story)
10    11/27/07  TRECVID; concept detection; image annotation
11    12/04/07  concept detection; image annotation
12    12/11/07  un-/supervised clustering (clustering)
13    12/18/07  video retrieval intro
14    12/25/07  audio/music
15    01/01/08  holiday
16    01/08/08  project presentation #1, #2
17    01/15/08  final (no course)
18    01/22/08  project report due

Scenario of Content-Based Image Retrieval
[Figure labels: query image, image database, feature extraction, feature (vector) space, distance metric, retrieved images]

Fusion of Multimodal Features
- How to weigh the feature significance?
  - Cross-validation approach
  - User-selected
  - Automatic weighting by relevance feedback
- Fusion approaches such as:
  - Sum (Borda fuse)
  - WtSum (weighted Borda fuse)
  - Max (Round-Robin)
[Figure: retrieval results by different features (scores and rankings) mapped to normalised results; axis ticks 0, 1, N. From Kieran Mc Donald]

Texture
- What is texture?
  - Has structures or repetitious patterns, e.g., checkerboard
  - Has statistical patterns, e.g., grass, sand, rock
- Why texture?
  - Applications to satellite images, medical images
  - Describes contents of real-world images, e.g., clouds, fabrics, surfaces, wood, stone
- Data set
  - e.g., Brodatz: famous texture photographs for image texture analysis
  - Man-made textures & natural objects

Mosaic of Brodatz Texture
[Figure]

Types of Computational Texture Features
- Structural: describing the arrangement of texture elements
- Statistical: characterizing texture in terms of statistical features
  - Co-occurrence matrix
  - Tamura (coarseness, directionality, contrast)
  - Multiresolution simultaneous autoregressive model (MRSAR)
  - Edge histogram
- Spectral: based on analysis in the spatial-frequency domain
  - Fourier domain energy distribution
  - Gabor
  - Pyramid-structured wavelet transform (PWT)
  - Tree-structured wavelet transform (TWT)
  - Laws filter

Co-occurrence Matrix
- Co-occurrence matrix C_d
  - Specified with a displacement vector d = (row, column)
  - Entry C_d(i, j) indicates how many times a pixel with gray level i is separated from a pixel of gray level j by the displacement vector d
- Usually use the normalized version of C_d
  - Physical meaning?
- Sometimes use the symmetric version of C_d
[Figure: example with d = (1, 1)]

Co-occurrence Matrix (cont.)
- Examples
[Figure. From Prof. Leow Wee Kheng, NUS]

Co-occurrence Matrix (cont.)
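The definition of C_d above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `cooccurrence` and `normalize` and the 4×4 checkerboard test image are assumptions made here, not part of the lecture material.

```python
import numpy as np

def cooccurrence(image, d, levels):
    """Count how often gray level i at (r, c) co-occurs with
    gray level j at (r + d_row, c + d_col)."""
    d_row, d_col = d
    rows, cols = image.shape
    C = np.zeros((levels, levels), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                C[image[r, c], image[r2, c2]] += 1
    return C

def normalize(C):
    """Turn counts into relative frequencies that sum to 1."""
    total = C.sum()
    return C / total if total > 0 else C.astype(float)

# Hypothetical test image: a 4x4 checkerboard with gray levels 0 and 1.
board = np.indices((4, 4)).sum(axis=0) % 2

C_diag = cooccurrence(board, (1, 1), levels=2)   # diagonal displacement
C_horiz = cooccurrence(board, (0, 1), levels=2)  # horizontal displacement
N_horiz = normalize(C_horiz)
```

On the checkerboard, the diagonal displacement only ever pairs equal gray levels, while the horizontal displacement only pairs different ones, so the two matrices have complementary zero patterns.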
- Consider the following example (black = 1, white = 0)
  - For d = (1, 1), the only non-zero entries are at (0, 0) and (1, 1): this captures diagonal structure
  - For d = (0, 1), the only non-zero entries are at (0, 1) and (1, 0): this captures horizontal structure

Co-occurrence Matrix (cont.)
- Measures on the following features [equations]
  - What does it mean that entropy takes its largest value when all N_d(i, j) are equal?
- An almost obsolete feature
  - Not effective for classification and retrieval
  - Expensive to compute

Tamura – Selected Textural Properties
- fine / coarse
- high contrast / low contrast
- rough / smooth
- directional / non-directional
- line-like / blob-like
- regular / irregular

Usefulness in Describing Texture
- Psychophysical experiments: high correlation between some groups of properties
  - Orientation, Line-like, Regularity, Coarseness, Contrast, Roughness
- Similar correlations for the computational measures
  - Coarseness
  - Contrast
  - Orientation

Tamura – Coarseness
- Goal: pick a large size as best when coarse texture is present, or a small size when only fine texture is present
- Step 1: compute averages at different scales at every point

Tamura – Coarseness (cont.)
- Step 2: compute the neighborhood difference at each scale, between opposite sides in different directions

Tamura – Coarseness (cont.)
- Step 3: select the scale with the largest variation
- Step 4: compute the coarseness crs

Tamura – Contrast
- Gaussian-like histogram distribution: low contrast
- Histogram polarization
  - Is it Gaussian? How many peaks does it have? Where are they?
- Polarization can be estimated by the kurtosis (曲率度)

Tamura – Contrast (cont.)
[Figure panels: distribution with two separate peaks; unimodal distribution]
- Contrast estimate is given by: [equation]

Tamura – Orientation
- Building the histogram of local edges at different orientations
  - By deriving the edge magnitude in the X and Y directions

Tamura – Orientation (cont.)
- Compute the estimate from the sharpness of the peaks
  - By summing the second moments around each peak
  - e.g., flat histogram: large 2nd moment (variance), hence small orientation

(MR)SAR
- Each pixel is a random variable whose value is estimated from its neighboring pixels + noise
  - A kind of Markov Random Field model
- SAR model (Simultaneous Autoregressive) [Mao’92]
  - Describes each pixel in terms of its neighboring pixels
- MRSAR model (MultiResolution SAR)
  - Describes granularities by representing textures at a variety of resolutions
  - SAR applied at various image levels
  - Metric: differences between SAR model parameters
[Figure labels: input image, image pyramid, SAR at each level, SAR model parameters]

Edge Histogram
- Edge histogram (EHD)
  - Captures the spatial distribution of edges in six states: 0º, 45º, 90º, 135º, non-directional, and no edge
  - Utilizing the filters [figure: 90° edge, 0° edge, 45° edge, 135° edge, non-directional edge]
- Global EHD of an image:
  - Concatenating 16 sub-EHDs into 96 bins
- Local EHD of a segment
  - Grouping the edge histograms of the image-blocks that fall into the segment
[Figure labels: macro-block, image-block]

Vector Space Concept
- Orthonormal bases (d-dim.
vectors)
- Any vector in a vector space can be expanded by a set of orthonormal signals
- Response for basis k [equation]
- Transform to the new bases (1D/2D)
- Fourier bases are sets of orthonormal signals

The Fourier Transform
- Represent the function on a new basis
  - Think of functions as vectors with many components
  - We now apply a linear transformation to transform the basis
    - dot product with each basis element
- In the expression, u and v select the basis element, so a function of x and y becomes a function of u and v
- Basis elements have the form $e^{-i 2\pi (ux + vy)}$

$$F(g(x, y))(u, v) = \iint_{\mathbb{R}^2} g(x, y)\, e^{-i 2\pi (ux + vy)}\, dx\, dy$$

Visual Sinus Pattern*
[Figure. *The following 5 slides are from Jaap van de Loosdrecht, Noordelijke Hogeschool Leeuwarden]

Visual Sinus Pattern w/ Low Frequency
[Figure]

Sinus Pattern Rotated 45 Deg.
[Figure]

2D Sinus Pattern
[Figure]

2D Rectangle
- Difference in spatial vs. frequency domain
- 1D sinc function at different scales

Interpreting the Power Spectrum
- Explain the structures in the power spectrum
[Figure labels: regions 1, 2, 3; DC; low frequency; high frequency; dark; bright]

Phase and Magnitude
- The Fourier transform of a real function is complex
  - difficult to plot and visualize
  - instead, we can think of the phase and magnitude of the transform
- Phase is the phase of the complex transform
- Magnitude is the magnitude of the complex transform
- Curious fact
  - all natural images have roughly similar magnitude transforms
  - hence, phase seems to matter, but magnitude largely doesn’t
  - Same for audio?
- Demonstration
  - Take two pictures, swap the phase transforms, compute the inverse - what does the result look like?
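The phase-swap demonstration can be tried with NumPy's FFT. This is a minimal sketch under stated assumptions: the function name `swap_phase` is made up here, and two random arrays stand in for the two pictures (real grayscale images of equal size would be used in practice).

```python
import numpy as np

def swap_phase(img_a, img_b):
    """Return two reconstructions:
    (magnitude of A with phase of B, magnitude of B with phase of A)."""
    Fa = np.fft.fft2(img_a)
    Fb = np.fft.fft2(img_b)
    mag_a, phase_a = np.abs(Fa), np.angle(Fa)
    mag_b, phase_b = np.abs(Fb), np.angle(Fb)
    # Recombine magnitude and phase, then invert. For real inputs the
    # result is real up to rounding error, so keep only the real part.
    a_mag_b_phase = np.fft.ifft2(mag_a * np.exp(1j * phase_b)).real
    b_mag_a_phase = np.fft.ifft2(mag_b * np.exp(1j * phase_a)).real
    return a_mag_b_phase, b_mag_a_phase

# Placeholder inputs (assumption): random arrays instead of real pictures.
rng = np.random.default_rng(0)
img_a = rng.random((64, 64))
img_b = rng.random((64, 64))

mixed_ab, mixed_ba = swap_phase(img_a, img_b)

# Sanity check: an image's own magnitude and phase reproduce the image.
same_a, _ = swap_phase(img_a, img_a)
```

Displaying `mixed_ab` and `mixed_ba` for two real photographs is the experiment the slide asks about.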
[Figures]
- This is the magnitude transform of the zebra pic
- This is the phase transform of the zebra pic
- This is the magnitude transform of the cheetah pic
- This is the phase transform of the cheetah pic
- Reconstruction with zebra phase, cheetah magnitude
- Reconstruction with cheetah phase, zebra magnitude

Natural Images and Their FT
- What happens to the FT patterns when the texture scale and orientation are changed?

Frequency Domain Features
- Fourier domain energy distribution
  - Angular features (directionality) [equation]
  - Radial features (coarseness) [equation]
- Uniform division may not be the best!!

Gabor Texture
- Fourier coefficients depend on the entire image (global), so we lose spatial information
- Objective: local spatial frequency analysis
- Gabor kernels: look like a Fourier basis multiplied by a Gaussian
  - The product of a symmetric (even) Gaussian with an oriented sinusoid
  - Gabor filters come in pairs: symmetric (even) and anti-symmetric (odd)
  - Each pair recovers the symmetric and anti-symmetric components in a particular direction
- (kx, ky): the spatial frequency to which the filter responds strongly
- σ: the scale of the filter
- When σ = infinity, the filter is similar to the FT
- We need to apply a number of Gabor filters at different scales, orientations, and spatial frequencies

Example – Gabor Kernel
- Zebra stripes at different scales and orientations, convolved with the Gabor kernel
- The response falls off when the stripes are larger or smaller
- The response is large when the spatial frequency of the bars roughly matches that of the sinusoid windowed by the Gaussian in the Gabor kernel
- Local spatial frequency analysis
[Figure panels: zebra image; Gabor kernel; magnitude of the filtered image]

Gabor Texture (cont.)
- Image I(x, y) convolved with Gabor filters h_mn (M × N in total)
- Using the first and second moments for each scale and orientation
- Features: e.g., 4 scales, 6 orientations: 48 dimensions
[Figure: odd and even Gabor kernels]

Gabor Texture (cont.)
- Arranging the mean energy in a 2D form (orientation × scale)
  - structured: localized pattern
  - oriented (or directional): column pattern
  - granular: row pattern
  - random: random pattern
[Figure axes: orientation, scale; frequency domain]

Laws Texture Energy Features
- Non-Fourier type bases
- Match better to intuitive texture features
- The filter algorithm
  - Filter the input image using texture filters
  - Compute texture energy by summing the absolute values of the filtered results in local neighborhoods around each pixel
  - Combine features to achieve rotational invariance

Laws’ Texture Masks (1)
- Basic 1D masks can be extended to create 2D masks
  - L5 (Level) = [ 1 4 6 4 1 ] (Gaussian): gives a center-weighted local average
  - E5 (Edge) = [ -1 -2 0 2 1 ] (gradient): responds to row or column step edges
  - S5 (Spot) = [ -1 0 2 0 -1 ] (LoG): detects spots
  - R5 (Ripple) = [ 1 -4 6 -4 1 ] (Gabor): detects ripples

Laws’ Texture Masks (2)
- Create a 2D mask: E5 (as a column) times L5 (as a row) gives E5L5

Laws Filters (2D)
[Figure]
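The Laws steps described above (build 2D masks from the 1D masks, filter, sum absolute responses locally) can be sketched as follows. Several choices are assumptions made for illustration only: the helper names, the window size, plain "valid" correlation without border padding, and averaging the E5L5 and L5E5 energies as one common way to combine a mask with its transposed counterpart.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# 1D Laws masks as listed on the slide.
L5 = np.array([1, 4, 6, 4, 1])     # Level
E5 = np.array([-1, -2, 0, 2, 1])   # Edge
S5 = np.array([-1, 0, 2, 0, -1])   # Spot
R5 = np.array([1, -4, 6, -4, 1])   # Ripple

def mask_2d(vertical, horizontal):
    """Outer product: `vertical` runs down the rows, `horizontal` along the columns."""
    return np.outer(vertical, horizontal)

def filter_valid(image, mask):
    """Correlate the image with the mask at every position where the mask fits."""
    windows = sliding_window_view(image, mask.shape)
    return np.einsum("ijkl,kl->ij", windows, mask)

def texture_energy(filtered, size=15):
    """Sum of absolute filter responses in a size x size neighborhood."""
    windows = sliding_window_view(np.abs(filtered), (size, size))
    return windows.sum(axis=(2, 3))

def symmetric_energy(image, a, b, size=15):
    """Average the energies of masks a-by-b and b-by-a (assumed combination rule)."""
    e_ab = texture_energy(filter_valid(image, mask_2d(a, b)), size)
    e_ba = texture_energy(filter_valid(image, mask_2d(b, a)), size)
    return (e_ab + e_ba) / 2.0

# Hypothetical test image: a vertical step edge (left half 0, right half 1).
step = np.zeros((32, 32))
step[:, 16:] = 1.0

E5L5 = mask_2d(E5, L5)   # gradient down the rows, smoothing along the columns
L5E5 = mask_2d(L5, E5)   # smoothing down the rows, gradient along the columns
resp_rows = filter_valid(step, E5L5)
resp_cols = filter_valid(step, L5E5)
energy = symmetric_energy(step, E5, L5, size=7)
```

On this step image only `L5E5` responds, because the intensity changes along the columns; averaging the two energies makes the feature respond equally to the same edge rotated by 90 degrees.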
Laws Process
[Figure]

Wavelet Features (PWT, TWT)
- Wavelet
  - Decomposition of a signal with a family of basis functions, using recursive filtering and sub-sampling
  - At each level, decomposes the 2D signal into 4 subbands: LL, LH, HL, HH (L = low, H = high)
- PWT: pyramid-structured wavelet transform
  - Recursively decomposes the LL band
  - Feature dimension (3x3x1+1)x2 = 20
- TWT: tree-structured wavelet transform
  - Some information is in the middle frequency channels
  - Feature dimension 40x2 = 80
[Figure panels: original image; PWT; TWT]

Texture Comparisons [Ma’98]
- Retrieval performance of different texture features according to the number of relevant images retrieved at various scopes, using Corel Photo galleries
[Figure: # of relevant images vs. # of top matches considered; curves: MRSAR (M), Gabor, TWT, PWT, MRSAR, Tamura (improved), coarseness histogram, directionality, edge histogram, Tamura]

Texture Comparisons (cont.) [Ma’98]
- Retrieval performance of texture features in terms of the number of top matches considered, using the Brodatz album
[Figure: recall vs. # of top matches considered; curves: Gabor, MRSAR (M), TWT, PWT, MRSAR, Tamura (improved), Tamura, coarseness histogram, directionality, edge histogram]

Texture Comparisons (cont.) [Li’00]
- Images of rock samples in applications related to oil exploitation
- Gabor descriptors outperform the others

Learned Similarity [Ma’96]
- Distance metrics DO matter
- All based on Gabor features
- Euclidean vs.
learned (supervised) distance metric
- The latter was maintained with a texture thesaurus
[Figure panels: Euclidean distance; learned (supervised) distance]

Shape [Bober’01]
- Region-based descriptor
- Contour-based shape descriptor
- 2D/3D shape descriptor
- Some relevant ones are included in MPEG-7
- Not easy to derive automatically

Region-based vs. Contour-based Descriptor
- Columns indicate contour similarity
  - Outline of contours
- Rows indicate region similarity
  - Distribution of pixels

Region-based Descriptor
- Expresses the pixel distribution within a 2D object region
- Employs a complex 2D Angular Radial Transformation (ART)
  - 35 fields, each of 4 bits
- Rotational and scale invariance
- Robust to some non-rigid transformations
- L1 metric on transformed coefficients
- Advantages
  - Describes complex shapes with disconnected regions
  - Robust to segmentation noise
  - Small size
  - Fast extraction and matching

Contour-based Descriptor
- Based on the Curvature (曲率) Scale-Space (CSS) representation
- Found to be superior to
  - Zernike moments
  - ART
  - Fourier-based
  - Turning angles
  - Wavelets
- Rotational and scale invariance
- Robust to some non-rigid transformations
- For example
  - Applicable to (a)
  - Discriminating differences in (b)
  - Finding similarities in (c)-(e)
[Figure panels: (a) (b) (c) (d) (e)]

Problems in Shape-based Indexing
- Many existing approaches assume
  - Segmentation is given
  - A human operator circles the object of interest
  - Lack of clutter and shadows
  - Objects are rigid
  - Planar (2-D) shape models
  - Models are known in advance

Summary
- Texture features
  - Texture computation is time-consuming
  - Statistical
  - Spectral
  - Compressed-domain features?
- Shape features
- Multimodal fusion is quite helpful
- Next week
  - Efficient indexing on high-dimensional data
  - Feature reduction