Image Representation

Global vectorial representations
• Image features have a feature vector representation that collects the numerical values of the feature descriptor. Features can therefore be regarded as points in a multidimensional feature space.
• Representations of images based on global features in vectorial form have a dimension determined by the number of feature properties used to describe the image patterns.

Feature histogram representations
• Histograms are a common feature vector representation that measures the frequency with which a feature appears in an image or in an image window. With histograms, features are quantized into a finite number of bins.
• Histogram-based holistic representations are widely used with colors, edges and lines.
• Being orderless, the histogram representation is invariant to viewing conditions and is also tolerant, to some extent, to partial occlusions.
• The main drawbacks of histogram representations are:
– Global representation of the image (window) content: it may fail to account for local properties and the spatial configuration of regions.
– Dimensionality curse: histograms must have a high number of bins (>64) for a meaningful representation. This requires high-dimensional indexes for similarity search.
– Histogram distances: several histogram distances can be defined, but their choice is critical for a meaningful and sound comparison of feature distributions.
– Histogram binning: hard assignment of feature values to bins may result in boundary effects.

Color histogram
• The color histogram of an image defines the image color distribution. Color histograms tessellate the color space and hold the count of pixels for each color zone. For gray-level images the gray-level histogram is built similarly.
• Main issues:
– Uniform tessellation can be inappropriate for a correct representation of color content.
– Color histogram size is typically 256 bins or more for meaningful representations.
– Quadratic distances are often more appropriate for color distribution matching.
– Accounting for spatial information is often fundamental for meaningful color comparison.

Color Correlogram
• The correlogram is a variant of the histogram that accounts for the local spatial correlation of colors. It is based on the estimation of the probability of finding a pixel of color cj at distance k from a pixel of color ci in the image.
• Definitions:
– Image: I; quantized colors: c1, c2, …, cm
– Distance between two pixels: |p1 − p2| = max(|x1 − x2|, |y1 − y2|)
– Pixel set with color c: Ic = {p | I(p) = c}
• Correlogram: dimensionality m × m × d, if the number of different distances k is d.
• Auto-correlogram (same color at both pixels): dimensionality m × d.
• Practical consideration: use the auto-correlogram with m ≤ 64, d = 1, k = 1.

Edge histograms
• Edge histograms represent the distribution of edge properties in an image.
• Edge intensity histograms provide good discrimination between scenes.
• Histograms of edge orientations can also be computed to describe edge directionality. Each histogram bin accounts for the number of edges at a specific orientation.
– 8 orientations are typically sufficient.
– Interpolation of bin assignments can be necessary for meaningful representations.

Line histograms
• Histograms of line length and orientation provide a useful characterization of image content.
– Binning can be critical for discriminative representations.

Human localization capability
• Localization is the process of identifying landmarks of a scene.
• We as humans can distinguish:
– Indoors: strong assumptions of flat walls, narrow hallways, …
– Outdoors: a less conforming set of surfaces
• For localization we as humans can use:
– Objects
– Regions
– The scene as a whole

Human vision architecture
• In the human visual system there is evidence of place-recognizing cells in the parahippocampal place area. Context information is obtained within an eye saccade, in approximately 150 ms.
• Two basic models: Gist and Saliency.
• Visual Cortex: low-level filters, center-surround operations, and normalization.
– Saliency model: attends to pertinent regions.
– Gist model: computes the general characteristics of the image.
• High-Level Vision:
– Object recognition
– Layout recognition
– Scene understanding

Gist versus Saliency
• Gist is the term used to signify the essence, the holistic characteristics of an image. "It is an abstract representation of the scene that spontaneously activates memory representations of scene categories (a city, a mountain, etc.)" [A. Oliva and A. Torralba 2001]
• Gist utilizes the same visual cortex raw features as the Saliency model, but is theoretically non-redundant with Saliency.
• Gist versus Saliency:
– instead of looking at the most conspicuous locations in the image, Gist looks at the scene as a whole
– detects regularities (not irregularities)
– exploits cooperation (accumulation) instead of competition (winner-takes-all) among locations
– there is more spatial emphasis in Saliency

GIST global representation
• With images, an approximate global representation of the human gist can be obtained by partitioning the image into a 4x4 grid (3x3 for small images such as 32x32) and taking orientations at different scales and center-surround differences of color and intensity at each grid cell.
• It is equivalent to computing gradient magnitude and orientation for each grid cell, plus color and intensity gradients. It does not imply any segmentation.
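As a toy illustration of this grid pooling, the sketch below averages a single gradient-magnitude channel over each cell of a 4x4 grid. It assumes a grayscale image given as a list of lists of floats; the real model pools 34 Gabor/center-surround sub-channels per cell, not just one.

```python
import math

def gist_sketch(img, grid=4):
    """Toy gist-like descriptor: mean gradient magnitude per grid cell.

    `img` is a grayscale image (list of lists of floats). The full gist
    model averages 34 sub-channels per cell (-> 544 features); here one
    gradient-magnitude channel illustrates the 4x4 grid pooling idea.
    """
    h, w = len(img), len(img[0])
    # Simple forward-difference gradient magnitude.
    mag = [[math.hypot(img[y][x + 1] - img[y][x],
                       img[y + 1][x] - img[y][x])
            for x in range(w - 1)] for y in range(h - 1)]
    ch, cw = (h - 1) // grid, (w - 1) // grid
    desc = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [mag[y][x]
                    for y in range(gy * ch, (gy + 1) * ch)
                    for x in range(gx * cw, (gx + 1) * cw)]
            desc.append(sum(cell) / len(cell))
    return desc  # grid*grid values, one per cell
```

With a 4x4 grid this yields a 16-dimensional vector per channel; stacking all 34 sub-channels would give the 544-dimensional gist vector described below.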
Gist model implementation
• V1 raw image feature maps:
– Orientation channel: Gabor filters at 4 angles (0°, 45°, 90°, 135°) on 4 scales = 16 sub-channels
– Color channel: red-green and blue-yellow center-surround with 6 scale combinations = 12 sub-channels
– Intensity channel: dark-bright center-surround with 6 scale combinations = 6 sub-channels
• Total of 34 sub-channels.
• Gist feature extraction: average values on the predetermined grid form the gist vector.
• Dimension reduction:
– Original: 34 sub-channels x 16 features = 544 features
– PCA/ICA reduction: 80 features keep >95% of the variance
• Place classification: three-layer neural network.

The MPEG-7 standard
• MPEG-7, formally named Multimedia Content Description Interface, is a standard for describing multimedia content that supports some degree of interpretation of the information meaning, which can be passed onto or accessed by a device or computer code. MPEG-7 is composed of:
– MPEG-7 Visual: the Description Tools dealing with visual descriptions.
– MPEG-7 Audio: the Description Tools dealing with audio descriptions.
• The goal of the MPEG-7 standard is to allow interoperable searching, indexing, filtering and access of audio-visual content by enabling interoperability among devices and applications. Ideally, MPEG-7 facilitates the exchange and reuse of multimedia content across different application domains.
• MPEG-7 complements the compression/coding standards (MPEG-1, -2, -4): where those address transmission and streaming, MPEG-7 addresses searching, indexing, browsing, navigation, filtering and management of content across acquisition, authoring and editing.

MPEG-7 description elements
• MPEG-7 provides four types of normative description elements:
– Descriptors
– Description Schemes (DSs)
– Description Definition Language (DDL)
– System Tools (coding schemes)
• A description consists of a Description Scheme and a set of Descriptor values:
– Descriptor: a representation of a feature. A Descriptor defines the syntax and the semantics of the feature representation.
– Description Scheme: the structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes.

MPEG-7 Descriptors
• MPEG-7 Descriptors support a range of abstraction levels, from low-level signal characteristics to high-level semantic information. The abstraction level relates to the way the features are extracted: most low-level features can be extracted automatically, whereas high-level features usually need human supervision and annotation.
• Only the description format is fixed, not the extraction methodologies: description production (extraction) and description consumption lie outside the normative part of the MPEG-7 standard.

MPEG-7 Description Scheme
• A Description Scheme deals with the structure of the description and describes both the structure and semantics of the audio-visual content. In addition, an MPEG-7 Description Scheme also supports the description of other types of information about the multimedia data, such as the coding scheme used, the data size, place and time of recording, classification, and links to other relevant material.

MPEG-7 Segment Description Scheme tree
• Spatial segmentation can be described at different levels: the whole image is annotated with a StillRegion, and Segment Relationship description tools can be used among different regions.

MPEG-7 Segment Relationship Description Scheme graph
• Video Segment Relationship description tools can be used to model video shot segments and the relationships between regions within video shots.

MPEG-7 Description Definition Language and System Tools
• Basic tools of MPEG-7 are:
– Description Definition Language: "A language that allows the creation of new Description Schemes and, possibly, Descriptors.
It also allows the extension and modification of existing Description Schemes."
– Systems Tools: tools to support multiplexing of descriptions, synchronization of descriptions with content, delivery mechanisms, coded representations for efficient storage and transmission, and the management and protection of intellectual property in MPEG-7 descriptions.
• MPEG-7 descriptions take two possible forms: a textual XML form, suitable for editing, searching and filtering, and the BiM binary form, suitable for storage, transmission and streaming delivery.
• To reduce the space occupied by stored MPEG-7 descriptors, due to the verbosity of the XML format, it is possible to use the BiM (Binary Format for MPEG-7) framework. BiM enables compression of any generic XML document, reaching an average 85% compression ratio on MPEG-7 data, and allows parsing of BiM-encoded files without requiring their decompression.

Application areas of MPEG-7
• Broadcast media selection (e.g., radio channel, TV channel)
• Cultural services (history museums, art galleries, etc.)
• Digital libraries (e.g., image catalogues, musical dictionaries, film, video and radio archives)
• E-commerce (e.g., personalised advertising, on-line catalogues)
• Education (e.g., repositories of multimedia courses, multimedia search for material)
• Multimedia directory services (e.g., yellow pages, tourist information, geographical information systems)
• Remote sensing (e.g., cartography, natural resources management)
• Surveillance and investigation services (e.g., human recognition, forensics, traffic control, surface transportation)
• …
• MPEG-7 will also make the web as searchable for multimedia content as it is searchable for text today.
This would apply especially to large content archives being made accessible to the public, as well as to multimedia catalogues enabling people to identify content for purchase.

MPEG-7 Visual: Visual Descriptors
• Color Descriptors
• Texture Descriptors
• Shape Descriptors
• Motion Descriptors for Video

Color Descriptors
• MPEG-7 defines four main color descriptors, each on a constrained color space:
– Scalable Color Descriptor: HSV space (histogram-based; extended to Group of Frames / Pictures)
– Dominant Color Descriptor
– Color Structure Descriptor: HMMD space
– Color Layout Descriptor: YCbCr space

Scalable Color Descriptor
• The Scalable Color Descriptor (SCD) is a color histogram in the HSV color space, encoded using a Haar transform. H is quantized to 16 bins; S and V are quantized to 4 bins each. The binary representation is scalable in the number of bins used and the number of bits per bin.
• After all the pixels are processed, the histogram is calculated with the probability for each bin, truncated to an 11-bit value. These values are then non-uniformly quantized to 4-bit values, according to the table provided in the ISO specification, for more efficient encoding, giving higher significance to small values.

Haar Wavelet Transform
• In numerical analysis and functional analysis, the Discrete Wavelet Transform refers to wavelet transforms for which the wavelets are discretely sampled.
• The first Discrete Wavelet Transform was invented by the mathematician Alfréd Haar:
– for an input represented by a list of 2^n numbers, the Haar wavelet transform may be considered to pair up input values, storing the difference and passing the sum.
– This process is repeated recursively, pairing up the sums to provide the next scale, finally resulting in 2^n − 1 differences and 1 final sum.
• The Haar wavelet can be described as a step function:
ψ(x) = 1 for 0 ≤ x < ½, −1 for ½ ≤ x < 1, 0 otherwise.
In the discrete domain it is defined by the 2x2 matrix
H = | 1  1 |
    | 1 −1 |
• Given a sequence (a0, a1, a2, a3, …, a2n+1) of even length, this can be transformed into a sequence of two-component vectors (a0, a1), …, (a2n, a2n+1).
• Multiplying each vector by the matrix H gives the result (s0, d0), …, (sn, dn) of one stage of the Haar wavelet transform (sum, difference). The two sequences s and d are separated and the process is repeated with the sequence s = (s0, s1, s2, s3, …, sn).
• The discrete wavelet transform has nice properties:
– It can be performed in O(n) operations.
– It captures not only some notion of the frequency content of the input, by examining it at different scales, but also the temporal content, i.e. the times at which these frequencies occur.

SCD computation
• With the SCD, summing pairs of adjacent histogram lines is equivalent to calculating a histogram with half the number of bins. This is performed iteratively, starting along the H axis, then S, then V, then H again, and so on.
• Using subsets of the coefficients in the Haar representation is equivalent to histograms of 128, 64, 32 or 16 bins, calculated from the 256-bin source histogram.
• The source histogram is organized as 16-H-bin groups, one for each (S, V) pair: S=0…3 for V=0, then S=0…3 for V=1, V=2 and V=3.

Bin scaling
• The result of applying the Haar transform is a set of 16 low-pass coefficients and up to 240 high-pass coefficients. The high-pass (difference) coefficients of the Haar transform express the information contained in finer-resolution levels of the histogram.
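The recursive sum/difference pairing described above can be sketched in a few lines. This is the unnormalized form (sum and difference, no scaling), not the exact SCD bitstream encoding:

```python
def haar_1d(seq):
    """Full 1-D Haar transform by recursive pairing (unnormalized).

    Each stage stores (sum, difference) for every pair of values and
    recurses on the sums, ending with one final sum followed by the
    coarse-to-fine difference coefficients: 2^n - 1 differences + 1 sum.
    """
    out = []
    s = list(seq)
    while len(s) > 1:
        sums = [s[i] + s[i + 1] for i in range(0, len(s), 2)]
        diffs = [s[i] - s[i + 1] for i in range(0, len(s), 2)]
        out = diffs + out   # finer-scale differences stay at the end
        s = sums            # pass the sums to the next stage
    return s + out          # [total sum, coarse ... fine differences]
```

For example, `haar_1d([1, 2, 3, 4])` yields `[10, -4, -1, -1]`: the total sum, one coarse difference (3 − 7), and two fine differences.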
• Natural image signals usually exhibit high redundancy between adjacent histogram lines. This can be explained by the slight variation of colors caused by variable illumination and shadowing effects. Hence, the high-pass coefficients expressing differences between adjacent histogram bins can be expected to have only small values. Exploiting this property, it is possible to truncate the high-pass coefficients to integer representations with a low number of bits.
• SCD representations can be stored at different resolutions, ranging from 256 down to 16 coefficients per histogram. The table below shows the relationship between the number of Haar coefficients specified in the SCD and the partitions of the components of a corresponding HSV histogram that could be reconstructed from the coefficients:

No. coeff | # bins: H | # bins: S | # bins: V
16        | 4         | 2         | 2
32        | 8         | 2         | 2
64        | 8         | 2         | 4
128       | 8         | 4         | 4
256       | 16        | 4         | 4

Bit scaling
• The high-pass (difference) coefficients in the Haar transform can take either positive or negative values. The sign part is always retained, whereas the magnitude part can be scaled by skipping the least significant bits.
• Using the sign bit only (1 bit/coefficient) leads to an extremely compact representation, while good retrieval efficiency is retained. At the highest accuracy level, 1-8 bits are defined for integer representations of the magnitude part, depending on the relevance of the respective coefficients. In between these extremes, it is possible to scale to different resolution levels.

Matching with SCD
• With the SCD, reconstructing the color histogram from the Haar coefficients allows matching with the highest retrieval efficiency. Matching in the histogram domain is only useful to achieve high quality, i.e. when all coefficients are available.
• It is recommended to perform the matching directly in the Haar coefficient domain, which induces only a marginal loss in the precision of the similarity matching, with considerable savings in computational cost. For matching in the Haar coefficient domain the L1 norm is recommended; the L1 norm is also recommended for matching in the histogram domain.

GoF/GoP Color Descriptor
• The GoF/GoP Color Descriptor extends the Scalable Color Descriptor to a video segment or a group of pictures: a joint color histogram is computed and then processed as in the SCD (Haar transform encoding).
• Two additional bits define how the joint histogram is calculated before applying the Haar transform. The standard allows average, median or intersection histogram aggregation:
– Average: sensitive to outliers (lighting changes, occlusions, text overlays)
– Median: increased computational complexity due to sorting
– Intersection: a "least common" color trait viewpoint
• Applications:
– Browsing a large collection of images to find similar images
– Using histogram intersection as a color similarity measure for clustering a collection of images
– Representing each cluster by its GoP descriptor histogram

Dominant Color Descriptor
• The Dominant Color Descriptor (DCD) assumes that a given image is described in terms of a set of region labels and the associated color descriptors:
– Each pixel has a unique region label
– Each region is characterized by a color histogram
• Colors in a given region are clustered into a small number of representative colors.
• For each representative color the descriptor consists of:
– ci: representative color identifier
– pi: its percentage in the region
– vi: its color variance in the region
– s: the overall spatial coherency of the dominant colors in the region
F = {{ci, pi, vi}, s},  i = 1, 2, …, N

DCD computation
• The DCD variance vi is computed as the variance of the pixels belonging to each dominant color (with perceptual weights h).
• Spatial coherency for each dominant color captures how coherent the pixels corresponding to that dominant color are, i.e. whether they appear as a solid color in the given image region. It is computed as the normalized average connectivity of the pixels of the corresponding dominant color (8-connectedness: neighboring pixels are counted if connected to the corresponding dominant-color pixel).
• DCD spatial coherency gives an idea of the spatial homogeneity of the dominant colors of a region. It is computed as a single value by the weighted sum of the per-dominant-color spatial coherencies, with weights proportional to the number of pixels of each dominant color.

Matching with DCD
• DCD is suitable for local (object or region) features, when a small number of colors is enough to characterize the color information. Before feature extraction, images must be segmented into regions:
– a maximum of 8 dominant colors can be used to represent a region (3 bits)
– percentage values are quantized to 5 bits each
– variance: 3 bits per dominant color
– spatial coherence: 5 bits
• The color quantization depends on the color space specification defined for the entire database and need not be specified with each descriptor. The Luv uniform color space is recommended.
• The dominant color representation is sufficiently accurate and compact compared to the traditional color histogram:
– color bins are quantized from each image region instead of being fixed
– 3 bins on average instead of 256 or more
• It supports efficient database indexing and search.
• Typically, when using the DCD, image similarity is evaluated simply by comparing the corresponding dominant color percentages and dominant color similarities (color distances):

D²(F1, F2) = Σ_{i=1..N1} p1i² + Σ_{j=1..N2} p2j² − Σ_{i=1..N1} Σ_{j=1..N2} 2·a(1i,2j)·p1i·p2j

where a(k,l) is the similarity coefficient between two colors ck and cl:

a(k,l) = 1 − d(k,l)/dmax   if d(k,l) ≤ Td
a(k,l) = 0                 if d(k,l) > Td

d(k,l) is the Euclidean distance between the two colors ck and cl, Td is the maximum distance for two colors to be considered similar, and dmax = α·Td, with α values 1.0 - 1.5 and Td values 10 - 20 in the Luv color space.

Color Structure Descriptor
• Similar to a histogram, the Color Structure Descriptor (CSD) represents an image by both its color distribution and its local structure. Two images with the same color distribution but different spatial structure may be indistinguishable for the Scalable Color Descriptor, whereas the Color Structure Descriptor can distinguish them.
• The CSD is obtained by scanning the image with an 8x8 structuring element in a sliding-window fashion: at each shift of the structuring element, the number of times a particular color is contained in the structuring element is counted, and a color histogram is constructed. The HMMD color space is used.

HMMD Color space
• The HMMD color space regards the colors adjacent to a given color in the color space as its neighboring colors. It is closely related to HSV:
– Hue is the same as in the HSV space (0-360°)
– Max and Min are the maximum and minimum among the R, G and B values, i.e. how much black and how much white are present, respectively
– the Diff component is the difference between Max and Min, i.e. how close a color is to a pure color
– Sum = (Max + Min) / 2 can also be defined, i.e. how much brightness
• Only three of the four components are sufficient to describe the HMMD space: (H, Max, Min) or (H, Diff, Sum).
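The dominant-color distance above can be sketched directly from the formula. Colors are assumed to be Luv 3-tuples and percentages to sum to 1 per descriptor; `td` and `alpha` are the threshold values suggested on the slide:

```python
import math

def dcd_distance(f1, f2, td=20.0, alpha=1.0):
    """Sketch of the MPEG-7 dominant-color quadratic distance.

    f1, f2: lists of (color, percentage) pairs, color an (L, u, v)
    tuple. td and dmax = alpha * td follow the slide's suggested
    ranges (Td = 10-20, alpha = 1.0-1.5); both are assumptions here.
    """
    dmax = alpha * td

    def sim(c1, c2):
        # a(k,l) = 1 - d/dmax if d <= Td, else 0
        d = math.dist(c1, c2)
        return 1.0 - d / dmax if d <= td else 0.0

    d2 = sum(p * p for _, p in f1) + sum(p * p for _, p in f2)
    d2 -= sum(2.0 * sim(c1, c2) * p1 * p2
              for c1, p1 in f1 for c2, p2 in f2)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative round-off
```

Identical descriptors give distance 0; descriptors whose colors are farther apart than Td give the maximum distance for their percentage vectors.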
• The HMMD color space can be depicted using a double-cone structure, with black and white at the two apexes.
• HMMD can accomplish a color quantization close to the change of color sensed by the human eye, thereby enhancing the performance of content-based image searching.

HMMD subspace quantization
• Four non-uniform quantizations are defined, partitioning the space into 256, 128, 64 or 32 cells.
• Each quantization is defined via five subspaces (0-4), obtained by splitting the Diff axis into the 5 subintervals [0,6), [6,20), [20,60), [60,110), [110,255]. Within each subspace, Sum and Hue are allowed to take all values in their ranges and are partitioned into uniform intervals according to a table. For example, in the 128-cell quantization the (Hue x Sum) partitions per subspace are 1x16, 4x4, 8x4, 8x4, 8x4.

CSD computation
• The color structure histogram allows m quantized colors cm, where m ∈ {256, 128, 64, 32}.
• The bin value h(m) is the number of structuring-element positions containing one or more pixels with color cm:
– consider the set of quantized color indexes of the image and the set of quantized color indexes inside the sub-image region covered by the structuring element
– as the 8x8 structuring element scans the image, the color histogram bins are accumulated: each color present in the element increments its bin by 1
– the final value of h(m) is determined by the number of positions at which the structuring element contains color cm

Matching with CSD
• Given two images with CSD representations A and B, matching is performed by computing the L1 distance between the CSDs:
dist(A, B) = Σ_i |hA(i) − hB(i)|

Color Layout Descriptor
• The Color Layout Descriptor (CLD) is a very compact descriptor (63 bits per image) based on:
– grid-based dominant colors in the YCbCr color space (the dominant color may also be the average color)
– a DCT (Discrete Cosine Transform) applied to the 2D array of dominant colors
– final quantization to 63 bits
F = {CoefPattern, Y-DC_coef, Cb-DC_coef, Cr-DC_coef, Y-AC_coef, Cb-AC_coef, Cr-AC_coef}
Y = 0.299·R + 0.587·G + 0.114·B
Cb = −0.169·R − 0.331·G + 0.500·B
Cr = 0.500·R − 0.419·G − 0.081·B

DCT (Discrete Cosine Transform)
• The DCT is applied to 8x8 image blocks. For each block, the DCT shifts from the spatial domain to the frequency domain: f(i,j) is the value in position (i,j) of the 8x8 block of the original image, and F(u,v) is the DCT coefficient in position (u,v) of the 8x8 matrix of transformed coefficients. There are 64 (8x8) DCT basis functions; F(0,0) is the DC (average) coefficient.

CLD computation
• The image is partitioned into 64 (8x8) blocks and a single representative color is selected from each block (the average of the pixel colors in a block is suggested as the representative color). The selection results in an 8x8 image.
• The derived average colors are transformed into a series of coefficients by performing the DCT on each channel.
• A few low-frequency coefficients are selected using zigzag scanning and quantized to form the CLD (a large quantization step for the AC coefficients, a small quantization step for the DC coefficient). If the spatial-domain data is smooth (little variation), the frequency-domain data concentrates in the low-frequency coefficients, while the high-frequency coefficients stay small.

Matching with CLD
• CLD is efficient for:
– sketch-based image retrieval
– content filtering using image indexing
• The distance of two Color Layout Descriptors CLD and CLD' with 12 coefficients (6 Y, 3 Cb, 3 Cr), CLD = {Y0, …, Y5, Cr0, Cr1, Cr2, Cb0, Cb1, Cb2}, is defined as the sum over the three channels of the weighted Euclidean distances between corresponding coefficients:

D = sqrt(Σ_i λYi (Yi − Y'i)²) + sqrt(Σ_i λCbi (Cbi − Cb'i)²) + sqrt(Σ_i λCri (Cri − Cr'i)²)

where the weights λ give more importance to the lower-frequency coefficients.

What applications
• The Scalable Color descriptor is useful for image-to-image matching and retrieval based on color. Retrieval accuracy increases with the number of bits used in the representation.
• The Dominant Color descriptor is most suitable for representing local (object or image region) features where a small number of colors is enough to characterize the color information.
A spatial coherency on the entire descriptor is also defined and used in similarity retrieval.
• The Color Structure descriptor is suited to image-to-image matching and its intended use is still natural image retrieval, where an image may consist of either a single rectangular frame or arbitrarily shaped, possibly disconnected, regions.
• The Color Layout descriptor allows image-to-image matching at very small computational cost and ultra-high-speed sequence-to-sequence matching, also at different resolutions. It is feasible for mobile terminal applications, where the available resources are strictly limited. Users can easily introduce the perceptual sensitivity of the human visual system into the similarity calculation.

Texture Descriptors
• Homogeneous Texture Descriptor
• Non-Homogeneous Texture Descriptor (Edge Histogram)

Homogeneous Texture Descriptor
• The Homogeneous Texture Descriptor (HTD) is composed of 62 numbers:
– #1, 2: the mean and the standard deviation of the image, respectively
– #3-62: the energy (e) and the energy deviation (d) of the 30 Gabor-filtered channel responses, in a subdivision layout of the frequency domain with 6 orientations and 5 scales
• This design is based on the fact that the response of the visual cortex is band-limited, and the brain decomposes the spectrum into bands in spatial frequency (from 4 to 8 frequency bands and approximately as many orientations).
F = {fDC, fSD, e1, …, e30, d1, …, d30}

Gabor filter
• The Gabor filter assumes that the function is first multiplied by a Gaussian window and the resulting function is then subjected to a Fourier transform to derive the time-frequency analysis (fixed Gaussian and variable frequency of the modulating wave).
• The characteristic of optimal joint resolution in both space and frequency suggests that these filters are appropriate operators for tasks requiring simultaneous measurement in these domains, e.g. texture discrimination.
• In 1D the window function means that the signal near the time being analyzed has higher weight. The Gabor transform of a signal x(t) is defined by:

Gx(τ, ω) = ∫ x(t) e^(−π(t−τ)²) e^(−jωt) dt

2D Gabor filter for HTD
• The extension to 2D is satisfied by a family of functions which can be realized as spatial filters consisting of sinusoidal plane waves within two-dimensional elliptical Gaussian envelopes.
• The corresponding Fourier transforms contain elliptical Gaussians displaced from the origin in the direction of orientation, with major and minor axes inversely proportional to those of the spatial Gaussian envelopes.
• Each channel filters a specific type of texture. The energy in the i-th channel is defined as:

ei = log10(1 + pi),   pi = Σ_{ω=0+}^{1} Σ_{θ=0°}^{360°} [G_{s,r}(ω, θ) · P(ω, θ)]²

where P(ω, θ) is the Fourier transform of the image represented in the polar frequency domain and G_{s,r} is a Gaussian channel function:

G_{s,r}(ω, θ) = exp(−(ω − ω_s)² / (2σ_s²)) · exp(−(θ − θ_r)² / (2σ_r²))

The center frequencies of the channels in the angular and radial directions are such that:
θ_r = 30° × r, with 0 ≤ r ≤ 5
ω_s = ω_0 · 2^(−s), with 0 ≤ s ≤ 4, ω_0 = 3/4

Matching with HTD
• With the HTD one can perform:
– rotation-invariant matching
– intensity-invariant matching (fDC removed from the feature vector)
– scale-invariant matching
F = {fDC, fSD, e1, …, e30, d1, …, d30}

Texture Browsing Descriptor
• The Texture Browsing Descriptor (TBD) requires the same spatial filtering as the HTD and captures the regularity (or the lack of it) of the texture pattern. Its computation is based on the following observations:
– Structured textures usually consist of dominant periodic patterns.
– A periodic or repetitive pattern, if it exists, can be captured by the filtered images.
– The dominant scale and orientation information can be captured by analyzing projections of the filtered images.
• The Texture Browsing Descriptor can be used to find a set of candidates with similar perceptual properties; the HTD can then be used to get a precise similarity match list among the candidate images.
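The HTD channel energies above are defined in the polar frequency domain; as a rough spatial-domain illustration of the same idea, the sketch below builds a real (cosine) Gabor kernel, convolves it with an image, and computes e = log10(1 + p) from the mean squared response. Kernel size, σ and frequency are assumed toy values, not the HTD's normative parameters:

```python
import math

def gabor_kernel(size, theta, freq, sigma):
    """Real Gabor kernel: Gaussian envelope times a cosine plane wave."""
    half = size // 2
    k = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            env = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(env * math.cos(2 * math.pi * freq * xr))
        k.append(row)
    return k

def channel_energy(img, kernel):
    """HTD-style energy e = log10(1 + p), p = mean squared response."""
    h, w, s = len(img), len(img[0]), len(kernel)
    half = s // 2
    total, n = 0.0, 0
    for y in range(half, h - half):       # valid positions only
        for x in range(half, w - half):
            r = sum(kernel[j][i] * img[y + j - half][x + i - half]
                    for j in range(s) for i in range(s))
            total += r * r
            n += 1
    return math.log10(1.0 + total / n)
```

A texture of vertical stripes yields a much larger energy in the 0° channel than in the 90° channel, which is exactly the orientation selectivity the 30 HTD channels exploit.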
TBD computation • The TBD descriptor is defined as follows: TBD = [ v1, v2, v3, v4, v5 ] – Regularity (v1): v1 represents the degree of regularity or structuredness of the texture. A larger value of v1 indicates a more regular pattern. – Scale (coarseness) (v3, v5): These represent the two dominant scales of the texture. Similar to directionality, the more structured the texture, the more robust the computation of these two components. – Direction (v2, v4 ): these values represent the two dominant orientations of the texture. The accuracy of computing these two components often depends on the level of regularity of the texture pattern. The orientation space is divided into 30 intervals. E.g look for textures that are very regular and oriented at 300 Regularity (periodic to random) Coarseness (grain to coarse) Scale and orientation selective band-pass filters Directionality (/300) Non-Homogenous Texture Descriptor Edge Histogram Descriptor • Edge Histogram Descriptor (EHD) represents the spatial distribution of five types of edges: vertical, horizontal, 45°, 135°, and non-directional – Dividing the image into 16 (4x4) blocks – Generating a 5-bin histogram for each block • EHD cannot be used for object-based image retrieval. Thedgeif set to 0 EHD applies for binary edge images (sketch-based retrieval) • EHD is scale invariant. The Extended EHD achieves better results than HTD but does not exhibits rotation invariant property • image represents the relative frequency of occurrence of the 5 types of edges in the corresponding sub-image. As a EHD computation result, as shown in figure 5, each local histogram contains 5 bins. Each bin corresponds to one of 5 edge types. Since there are 16 sub-images in the image, a total of 5x16=80 histogram bins is required (figure 6). 
Note that each of the Algorithm 80- histogram bins 4x4 hasnonitsoverlapping own semantics in terms of – Divide the image into sub-images location histogram and edgeof edge type.distribution For example, bin forusing the 2x2 filter masks to – Generate for eachthe sub-image horizontal type edgehorizontal, in the sub-image located at (0, 0) innon-directional The semantics of the 1-D h bin edges into vertical, 45°diagonal, 135°diagonal, . figure 3 carries the information of the relative population of part of the MPEG-7 s the horizontal edges in the top-left local region of the image. starting from the sub-imag sub-images are visited corresponding local h accordingly. Within each arranged in the followin degree diagonal, 135-degr Table 1 summarizes the with 80 histogram bins. O should be normalized and number of edge occurrenc total number of image-blo The image-block is a ba information. That is, for whether there is at leas predominant. Global Journal of Computer Science and Technolo 9 • Edge map is obtained by using Canny edge operator. EFINITION AND SEMANTICS OF THE EDGE ISTOGRAM DESCRIPTOR (EHD) ally represents the distribution of 5 types of ocal area called a sub-image. As shown in ub-image is defined by dividing the image non-overlapping blocks. Thus, the image s yields 16 equal-sized sub-images regardless he original image. To characterize the subgenerate a histogram of edge distribution for e. Edges in the sub-images are categorized • The basic EHD45-degree uses 5 binsdiagonal, for each sub-image. In total we have 80 bins. The histogram bin values are ertical, horizontal, 135by the total number of the 4) image-blocks. al andnormalized non-directional edges (figure 997]. Thus, the histogram for each sub• relative The bin values are then non-linearlyofquantized to keep the size of the histogram as small as possible. s the frequency of occurrence the bits/bin, 240 bits are neededAs in total ges in With the 3corresponding sub-image. a per sub-image. 
Extended EHD computation
• For good performance, we need the global edge distribution for the whole image as well as semi-global horizontal and vertical edge distributions.
• The Extended EHD is obtained by accumulating EHD bins at the basic, semi-global and global levels. The global level uses 5 EHD bins accumulated over all the sub-images. For the semi-global level, groups of four connected sub-images are clustered, giving 13 clusters with 5 bins each. In total, we have 150 bins (80 basic + 65 semi-global + 5 global).
[Figure: basic (80 bins) vs extended (150 bins) layout; 13 clusters for the semi-global level]
What applications
• The Homogeneous Texture descriptor is for searching and browsing through large collections of similar looking patterns. An image can be considered as a mosaic of homogeneous textures, so that the texture features associated with the regions can be used to index the image data.
• The Texture Browsing descriptor is useful for representing homogeneous texture for browsing-type applications. It provides a perceptual characterization of texture, similar to a human characterization, in terms of regularity, coarseness and directionality.
• The Edge Histogram descriptor, since edges play an important role in image perception, can retrieve images with similar semantic meaning. It targets image-to-image matching (by example or by sketch), especially for natural images with non-uniform edge distribution. The image retrieval performance can be significantly improved if the edge histogram descriptor is combined with other descriptors such as the color histogram descriptor.
Shape Descriptors
• Region-based Shape Descriptor
• Contour-based Shape Descriptor
• 2D/3D Shape Descriptor
• 3D Shape Descriptor
• A shape is the outline or characteristic surface configuration of a thing: a contour; a form. A shape cannot be described through text.
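Before moving to shape, the Extended EHD accumulation can be sketched as follows. The 13-cluster layout used here (four rows, four columns and five 2x2 square groups on the 4x4 grid) is one common reading of "four connected sub-images" and is an assumption of this sketch, as is averaging as the accumulation rule.

```python
def extended_ehd(basic):
    """basic: list of 16 local histograms (5 bins each), one per
    sub-image of the 4x4 grid, in row-major order.  Returns the
    150-bin Extended EHD: 80 basic + 65 semi-global + 5 global."""
    assert len(basic) == 16 and all(len(h) == 5 for h in basic)

    def accumulate(indices):
        # Average the 5 bins over the clustered sub-images.
        return [sum(basic[i][b] for i in indices) / len(indices)
                for b in range(5)]

    # Assumed 13 clusters of four connected sub-images:
    rows    = [[4 * r + c for c in range(4)] for r in range(4)]
    cols    = [[4 * r + c for r in range(4)] for c in range(4)]
    squares = [[4 * r + c, 4 * r + c + 1, 4 * (r + 1) + c, 4 * (r + 1) + c + 1]
               for r, c in ((0, 0), (0, 2), (2, 0), (2, 2), (1, 1))]
    clusters = rows + cols + squares            # 13 clusters x 5 bins = 65

    flat_basic  = [v for h in basic for v in h]              # 80 bins
    semi_global = [v for idx in clusters for v in accumulate(idx)]
    global_bins = accumulate(range(16))                      # 5 bins
    return flat_basic + semi_global + global_bins

# With 16 identical sub-image histograms, every accumulated bin equals
# the common histogram:
desc = extended_ehd([[0.1, 0.2, 0.3, 0.2, 0.2]] * 16)
print(len(desc))  # 150
```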
• Shape representation and matching is one of the major and oldest research topics of Pattern Recognition and Computer Vision. Invariance of the representation - such that shape representations are left unaltered under a set of transformations - plays a very important role in order to recognize the same object even in a translated / rotated / scaled / shrunk view.
Region Based Descriptor
• The Region Based Descriptor (RBD) expresses the pixel distribution within a 2D object region. It employs the 2D Angular Radial Transformation (ART), defined on a unit disk in polar coordinates.
ART Algorithm
‒ Perform edge detection
‒ Calculate ART_nm for m = 0..M, n = 0..N according to:

F_nm = ∫₀^2π ∫₀^1 V*_nm(ρ, θ) f(ρ, θ) ρ dρ dθ

‒ Scale the coefficients by |ART_00| to normalize
‒ Perform matching on the features ART_nm.
Here f(ρ, θ) is the image function in polar coordinates and V_nm(ρ, θ) is the ART basis function. The ART basis functions are separable along the angular and radial directions:

V_nm(ρ, θ) = A_m(θ) R_n(ρ)
A_m(θ) = (1/2π) exp(jmθ)
R_n(ρ) = 1 if n = 0; 2 cos(πnρ) if n ≠ 0

The descriptor collects the magnitudes of ART_nm for m = 0, …, 11 (12 angular functions) and n = 0, …, 2 (3 radial functions).
Matching with RBD
• Applicable to figures (a) – (e)
• Distinguishes (i) from (g) and (h); (j)
• Finds similarities in (k) and (l)
• Advantages:
– Describes complex shapes with disconnected regions
– Robust to segmentation noise
– Fast extraction and matching
Contour Based Descriptor
• The Contour-Based Descriptor (CBD) captures perceptually meaningful features of the shape contour. It is based on the Curvature Scale Space (CSS) representation.
• Curvature Scale Space
– Finds the curvature zero-crossing points of the shape's contour (key points)
– Reduces the number of key points step by step by applying Gaussian smoothing (the contour is gradually smoothed by repetitive application of a low-pass filter with Gaussian kernel to the X and Y coordinates of the selected N contour points).
– The positions of the key points are expressed relative to the length of the contour curve
Scale space diagram
• The number of curvature zero crossings is a decreasing function of the smoothing scale σ.
[Figure: Gaussian filtered signal and its first derivative peaks at increasing scales]
• The diagram of the zero-crossing positions as σ varies is known as the scale space diagram.
• Properties of the scale space diagram:
− edge position may shift with increasing scale
− two edges may merge with increasing scale
− an edge may not split into two with increasing scale
• Comparison between two scale space diagrams can be made by considering only the points of maxima of the two diagrams.
• To obtain shape rotation invariance, invariance of the scale space diagram to horizontal shifting must be assured. Peaks are aligned to the zero of the diagram and the others are shifted accordingly.
Matching with CBD
• Applicable to (a)
• Distinguishes differences in (b)
• Finds similarities in (c) - (e)
• Advantages:
‒ Captures the shape very well
‒ Robust to noise, scale, and orientation
‒ Fast and compact
RBD versus CBD
• Blue: similar shapes according to the Region-Based descriptor
• Yellow: similar shapes according to the Contour-Based descriptor
Global Curvature Vector
• The Global Curvature Vector (GCV) specifies global parameters of the contour, namely the Eccentricity and Circularity:

circularity = perimeter² / area; for a circle, circularity = (2πr)² / (πr²) = 4π

eccentricity = [ i20 + i02 + √( (i20 − i02)² + 4 i11² ) ] / [ i20 + i02 − √( (i20 − i02)² + 4 i11² ) ]

with the second-order central moments i20 = Σᵢ (xᵢ − x_c)², i02 = Σᵢ (yᵢ − y_c)², i11 = Σᵢ (xᵢ − x_c)(yᵢ − y_c).
What applications
• The Region Shape descriptor makes use of all pixels constituting the shape within a frame and can describe any shape.
– It is characterized by small size and fast extraction and matching. The data size for this representation is fixed to 17.5 bytes.
– The feature extraction and matching processes have low computational complexity and are suitable for tracking shapes in video data processing.
• The Contour Shape descriptor captures perceptually meaningful features of the shape, enabling similarity-based retrieval.
– It is robust to non-rigid motion.
– It is robust to partial occlusion of the shape.
– It is robust to perspective transformations, which result from changes of the camera parameters and are common in images and video.
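The GCV quantities defined earlier (circularity and moment-based eccentricity) can be checked with a short sketch; the sample point set below is invented for illustration, and the function names are not part of any standard API.

```python
import math

def circularity(perimeter, area):
    """circularity = perimeter^2 / area; equals 4*pi for a circle,
    since (2*pi*r)^2 / (pi*r^2) = 4*pi."""
    return perimeter ** 2 / area

def eccentricity(points):
    """Moment-based eccentricity of a set of contour samples:
    (i20 + i02 + s) / (i20 + i02 - s),  s = sqrt((i20 - i02)^2 + 4*i11^2),
    with i20, i02, i11 the second-order central moments."""
    n = len(points)
    xc = sum(x for x, _ in points) / n
    yc = sum(y for _, y in points) / n
    i20 = sum((x - xc) ** 2 for x, _ in points)
    i02 = sum((y - yc) ** 2 for _, y in points)
    i11 = sum((x - xc) * (y - yc) for x, y in points)
    s = math.sqrt((i20 - i02) ** 2 + 4 * i11 ** 2)
    return (i20 + i02 + s) / (i20 + i02 - s)

r = 1.0
print(circularity(2 * math.pi * r, math.pi * r ** 2))  # 4*pi ≈ 12.566

# Four contour samples of an axis-aligned elongated shape:
# i20 = 32, i02 = 2, i11 = 0  ->  (34 + 30) / (34 - 30) = 16
print(eccentricity([(4, 0), (-4, 0), (0, 1), (0, -1)]))  # 16.0
```

For a circularly symmetric point set, i20 = i02 and i11 = 0, so s = 0 and the eccentricity reduces to 1, its minimum.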