Improving Retrieval Performance Using Region Constraints
Tao Wang
Yong Rui
Jia-Guang Sun
Dept of Computer Sci. and Tech.
Tsinghua University
Beijing 100084, P. R. China
Microsoft Research
One Microsoft Way
Redmond, WA 98052, USA
Dept of Computer Sci. and Tech.
Tsinghua University
Beijing 100084, P. R. China
wangtao72@263.net
yongrui@microsoft.com
sunjg@mail.tsinghua.edu.cn
ABSTRACT
There are two possible ways to narrow down the semantic gap
difficulty we are facing in image retrieval. One is to use relevance
feedback to leverage users’ knowledge and the other is to develop
more expressive features and more accurate similarity models. In
this paper, we primarily focus on the visual feature and similarity
model aspect, but we also present our initial results of integrating
relevance feedback. Specifically, this paper proposes a constraint-based region matching approach to image retrieval. Unlike
existing region-based approaches where either individual regions
are used or only first-order constraints are modeled, the proposed
approach formulates the problem in a probabilistic framework and
simultaneously models both first-order region properties and
second-order spatial relationships for all the regions in the image.
We present a complete system that includes image segmentation,
local feature extraction, first- and second-order constraints, and
probabilistic region weight estimation. Furthermore, we discuss
how we can further improve the retrieval performance by
integrating a Wiener filter based relevance feedback. Extensive
experiments have been carried out on a large heterogeneous image
collection with 17,000 images. The proposed approach achieves
significantly better performance than the state-of-the-art
approaches.
Keywords
content-based image retrieval, region matching, probabilistic weight estimation, relevance feedback, similarity model.
1. INTRODUCTION
Content-based image retrieval (CBIR) is the process of retrieving
images based on visual features that are automatically extracted
from images. It has been one of the most active research areas in
computer science during the past decade. There exist rich
literature on CBIR, including QBIC [14], PhotoBook [16],
VisualSEEK [24], MARS [19], Netra [12], BlobWorld [1] and
SIMPLIcity [11], among many others. This is by no means a
complete list, but rather an illustration on how active this area has
been. For recent surveys on CBIR systems and techniques, refer to
Rui et al. [21] and Smeulders et al. [23].
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
Conference ’02, Dec. 1-6, 2002, Juan Les Pins, France.
Copyright 2002 ACM 1-58113-000-0/00/0000…$5.00.
Despite years of extensive research, assisting users to find their
desired images accurately and quickly is still an open problem.
The main challenge is the semantic gap between high-level
concepts users want and low-level features that we can extract.
According to a recent user study result [18], what average users
really want are systems that support queries based on high-level
concepts (e.g., an apple on a table), rather than low-level features
(e.g., 30% red color and a blob with vertical edges).
While closing the semantic gap remains out of reach given today's
technology, there are two possible ways to narrow it. One is
to introduce relevance feedback into the system and take
advantage of users’ knowledge to guide the search, e.g., [2] and
[20], and the other is to directly improve the similarity models and
features themselves used in the system. The combination of better
relevance feedback and better similarity models/features will bear
fruit. In this paper, we primarily focus on the similarity
model and visual features, but we also present our initial results
integrating the two.
In most existing systems, global features such as global color
histogram, texture and shape are used [14][24][20]. While they
are easy to implement, they often fail to narrow down the
semantic gap because of their limited description power. Local
features, on the other hand, have strong correlations with real-world objects. Compared with global features, local features
provide a big step towards semantic-based retrieval.
Existing local-feature-based approaches fall into three categories:
fixed-layout-based [26], salient-point-based [6][27] and region-based approaches [1][7][9][13][29]. The region-based approach
so far has been the winning approach among the various local-feature-based approaches. This approach first segments an image
into multiple regions that have high correlation with real-world
objects. Then the total similarity between two images is calculated
based on all the corresponding regions. Because of imperfect
image segmentation (e.g., over or under segmentation), care must
be taken when comparing two images. The Netra system compares
images based on individual regions [3][12]. Although queries
based on multiple regions are allowed, the retrieval is done by
merging individual single-region query results. This approach is
therefore less robust to imperfect segmentation. The SaFe system,
on the other hand, uses a 2D-string approach [25]. While this
system is more robust than Netra, it is sensitive to region shifting.
Xu et al. takes a different approach [30]. They represent visual
objects (e.g., cars) as composite nodes in a hierarchical tree
scheme. Compared with previous approaches, this approach is
more robust, but less generalizable to other domains. A more
recent technique, integrated region matching (IRM), is developed
by Li et al. [11]. This approach reduces the influence of inaccurate
segmentation by allowing a region to be matched to multiple
regions from another image. It, however, suffers from a greedy
algorithm when estimating inter-region weights, and does not take
into account the spatial relationship between regions in an image.
In this paper, based on spatial constraints between image regions, a
novel probabilistic region matching technique is presented.
Different from other approaches, we take into account not only
the first-order (e.g., region features) constraints but also the
second-order ones (e.g., spatial relationship between them). More
importantly, unlike other ad hoc approaches, our similarity model
between two images is based on a principled probabilistic
framework. The proposed approach achieves significantly better
retrieval performance than the best existing approaches.
Furthermore, we present a framework that incorporates Wiener
filter based relevance feedback into our region based image
retrieval.
The rest of the paper is organized as follows. In the related work of
Section 2, we focus on the integrated region matching (IRM)
approach [11], one of the best approaches available and the
one that we will compare against in this paper. In Section 3, we
give a detailed description of our proposed constraint-based region
matching (CRM) approach. In Section 4, we present a framework
that integrates Wiener filter relevance feedback into CRM. In
Section 5, extensive experiments over a large heterogeneous
image collection with 17,000 images are reported. Concluding
remarks are given in Section 6.
2. RELATED WORK
Among the various existing region-based approaches discussed in
Section 1, IRM [11] is one of the best, and is the one we will
compare against in this paper. Compared with other approaches,
IRM is more accurate in similarity calculation and more robust to
imperfect image segmentation, because it allows a region to be
matched to multiple regions from another image.

After image segmentation, the overall similarity between two
images can be calculated based on all the regions in the two
images. Let images 1 and 2 be represented by region sets
R_1 = {r_1, r_2, …, r_M} and R_2 = {r'_1, r'_2, …, r'_N}
respectively, shown in Figure 1, where M and N are the numbers of
regions in the two images. Let the similarity between regions r_i
and r'_j be S(r_i, r'_j), where r_i is the ith region in R_1 and
r'_j is the jth region in R_2. To increase region matching's
robustness against inaccurate segmentation, a region in one image
can be matched to multiple regions from another image. The total
similarity between two images can therefore be defined as the
similarity between the two region sets, S(R_1, R_2):

    S(R_1, R_2) = Σ_{i=1}^{M} Σ_{j=1}^{N} w_ij S(r_i, r'_j)    (1)
    s.t. Σ_{i=1}^{M} Σ_{j=1}^{N} w_ij = 1

where the weight w_ij indicates the importance of the region pair
r_i and r'_j with respect to the overall similarity. This region
matching scheme is illustrated in Figure 1. Every vertex in the
graph corresponds to a region in an image. If two regions r_i and
r'_j are matched, the two vertices are connected by an edge with
weight w_ij.

Figure 1. Region matching between region sets R_1 and R_2.

It becomes clear that estimating w_ij is a key step of the region
matching algorithm. IRM uses the "most similar highest priority"
(MSHP) principle to estimate w_ij, i.e., it iteratively gives the
highest priority to the region pair having the minimum distance.
Let the normalized importances of r_i and r'_j be p_i and p'_j; the
following constraints hold:

    Σ_{j=1}^{N} w_ij = p_i,   Σ_{i=1}^{M} w_ij = p'_j    (2)
    Σ_{i=1}^{M} p_i = Σ_{j=1}^{N} p'_j = 1    (3)

The IRM approach can be summarized as follows:
1. Initially, set the assigned region-pair set L = { }. Let the
   unassigned region-pair set E = {(i, j) | i = 1, 2, …, M;
   j = 1, 2, …, N}.
2. Choose the minimum d_ij for (i, j) ∈ E − L. Label the
   corresponding (i, j) as (i', j').
3. Set w_i'j' = min(p_i', p'_j').
4. If p_i' < p'_j', set w_i'j = 0 for all j ≠ j'; otherwise set
   w_ij' = 0 for all i ≠ i'.
5. Set p_i' = p_i' − w_i'j'.
6. Set p'_j' = p'_j' − w_i'j'.
7. Set L = L + {(i', j')}.
8. If Σ_{i=1}^{M} p_i > 0 and Σ_{j=1}^{N} p'_j > 0, go to Step 2;
   otherwise stop.

In IRM, p_i is defined as the area percentage of region r_i in R_1,
p'_j is the area percentage of region r'_j in R_2, and d_ij is the
distance between r_i and r'_j. While IRM makes a significant step in
region-based matching, its accuracy and robustness suffer from the
fact that it does not take into account the spatial relationships
between regions. In addition, it uses a greedy algorithm to estimate
the inter-region weights w_ij, which again reduces the robustness of
the similarity model. For example, the greedy algorithm can easily
be trapped in a local minimum, and all the subsequent weight
estimation becomes unreliable (see Section 3.5).

3. CONSTRAINT-BASED REGION MATCHING (CRM)
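For concreteness, the MSHP weight estimation described above can be sketched in code. This is an illustrative reconstruction from the eight listed steps, not the IRM authors' implementation; region importances and the distance matrix are assumed precomputed by the caller.

```python
# Sketch of IRM's "most similar highest priority" (MSHP) weight
# estimation. p and p_prime hold the normalized region importances
# (area percentages) of the two images; d[i][j] is the distance
# between region i of image 1 and region j of image 2.

def mshp_weights(p, p_prime, d):
    M, N = len(p), len(p_prime)
    p = p[:]            # remaining importance of regions in image 1
    q = p_prime[:]      # remaining importance of regions in image 2
    w = [[0.0] * N for _ in range(M)]
    unassigned = {(i, j) for i in range(M) for j in range(N)}
    eps = 1e-12
    while sum(p) > eps and sum(q) > eps and unassigned:
        # Step 2: pick the unassigned pair with minimum distance.
        i, j = min(unassigned, key=lambda ij: d[ij[0]][ij[1]])
        # Step 3: assign as much weight as both regions can supply.
        m = min(p[i], q[j])
        w[i][j] = m
        # Step 4: the exhausted region drops out of further matching.
        if p[i] < q[j]:
            unassigned -= {(i, jj) for jj in range(N)}
        else:
            unassigned -= {(ii, j) for ii in range(M)}
        # Steps 5-7: update remaining importances; mark pair assigned.
        p[i] -= m
        q[j] -= m
        unassigned.discard((i, j))
    return w
```

Because each iteration commits irrevocably to the currently closest pair, the procedure is greedy, which is exactly the weakness discussed above.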
To improve the robustness of region matching, we propose a new
technique that depends on not only the region properties but also
the spatial relationships between regions. More importantly, it
formulates the weight estimation problem in a probabilistic
framework and solves the problem in a principled way. In the rest
of this section, we present a detailed description of our system,
including image segmentation, region feature extraction, first- and
second-order constraints, and probabilistic matching of region
sets.
3.1 Image Segmentation
Image segmentation tries to divide an image into regions that have
strong correlations with real-world objects. Our goal in this paper
is not to achieve perfect segmentation, but rather to develop good
region matching techniques that are robust to imperfect
segmentation.

We use an improved region growing approach to segment images
based on HSV color histograms. The approach is summarized as
follows:
• Use a Gaussian filter (σ = 2) to smooth out image noise and
  small texture effects.
• Perform a pixel gradient operation, and use the local minima
  as starting seeds.
• Start growing regions around each of the starting seeds.
  Neighboring pixels are examined and added to a region if
  they are similar, in HSV color space, to the region and no
  edges are detected between them.
• If the HSV color histograms of two adjacent regions are
  similar, merge the two regions.

More sophisticated segmentation techniques can be used, e.g.,
Pavlidis and Liow [15] and Felzenszwalb et al. [4], but that is
beyond the scope of this paper. An example segmentation result
is shown in Figure 2.

Figure 2. A landscape image (a) and its segmentation (b).

3.2 Region Features
After image segmentation, the color, size, shape and position
features of each region are extracted to represent the content of an
image. The spatial relationship between regions is not stored, as it
can be derived from the size, shape and position features.

(1) Color feature: The color feature of a region is its mean color
in HSV color space:

    C = (s·sin(h), s·cos(h), v)    (4)

(2) Size feature: The size feature σ is the area percentage of the
region relative to the whole image:

    σ = Area of the region / Area of the image    (5)

(3) Shape feature: The shape feature is the region's eccentricity e,
which is the ratio of the minor axis length I_min to the major axis
length I_max of the best-fit ellipse of the region [10]:

    e = I_min / I_max
      = [u_20 + u_02 − √((u_20 − u_02)² + 4u_11²)]
        / [u_20 + u_02 + √((u_20 − u_02)² + 4u_11²)] ∈ [0, 1]    (6)

where u_pq = Σ_{(x,y)∈region} (x − x̄)^p (y − ȳ)^q, and (x̄, ȳ) is
the center of the region. Given that regions may not have one-to-one
correspondence with real-world objects, little benefit will be
gained by using more sophisticated shape features, such as the
region's contour and its Fourier descriptor [5].

(4) Position feature: The position of the region is the normalized
region center:

    O = (x̄ / W, ȳ / H)    (7)

where W and H are the image width and height.

3.3 Regional and Spatial Constraints
Object and relationship modeling has been extensively studied for
the past three decades in spatial layout planning [17][8].
Pfefferkorn [17] defines three important components: (1) Objects;
examples are tables and floors. (2) Specified areas; they encode
domain knowledge of the positions of objects, e.g., a floor is at the
bottom. (3) Spatial relationship constraints; for example, a table
is on top of a floor.

Because the first two kinds of constraints concern only the
individual objects, we call them first-order constraints. Similarly,
we call the spatial relationship constraints second-order
constraints, as they involve two objects. Even though the above
concepts were first developed in spatial layout planning, they
apply equally well to image content modeling. Let r_i, r_k be
regions in image 1 and r'_j, r'_l be regions in image 2. The
constraints and their associated region similarities are defined as
follows:

(1) Region property constraint
• Color: The matched regions, i.e., r_i and r'_j, should have
  similar colors:

    S_c(r_i, r'_j) = exp(−||C_i − C'_j||² / 2σ_c²)    (8)

  where σ_c controls the penalty of difference.
• Shape: The matched regions, i.e., r_i and r'_j, should have
  similar basic shapes:

    S_e(r_i, r'_j) = exp(−||e_i − e'_j||² / 2σ_e²)    (9)

  where σ_e controls the penalty of difference.

(2) Region position constraint
• Position: The matched regions, i.e., r_i and r'_j, can have
  similar positions:

    S_p(r_i, r'_j) = exp(−||O_i − O'_j||² / 2σ_p²)    (10)

  where σ_p controls the penalty of difference.
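The features and first-order constraints above can be sketched as follows: the moment-based eccentricity follows (6), and the three similarities (8)-(10) share one Gaussian-kernel form. This is an illustrative sketch; the σ values are left as caller-supplied parameters since the paper does not fix them.

```python
import math

# Sketch of the region shape feature (6) and the shared Gaussian form
# of the first-order constraint similarities (8)-(10).

def eccentricity(pixels):
    """Eccentricity e = I_min / I_max in [0, 1] from the central
    moments u20, u02, u11 of a region's pixel list, as in eq. (6)."""
    n = len(pixels)
    cx = sum(x for x, y in pixels) / n
    cy = sum(y for x, y in pixels) / n
    u20 = sum((x - cx) ** 2 for x, y in pixels)
    u02 = sum((y - cy) ** 2 for x, y in pixels)
    u11 = sum((x - cx) * (y - cy) for x, y in pixels)
    root = math.sqrt((u20 - u02) ** 2 + 4 * u11 ** 2)
    return (u20 + u02 - root) / (u20 + u02 + root)

def gaussian_similarity(f1, f2, sigma):
    """Shared form of (8)-(10): exp(-||f1 - f2||^2 / (2 sigma^2)).
    sigma controls the penalty of difference."""
    dist2 = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return math.exp(-dist2 / (2 * sigma ** 2))
```

A perfectly symmetric region (e.g., a square of pixels) gives e = 1, while a degenerate line of pixels gives e = 0, matching the stated range [0, 1].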
(3) Spatial relationship constraint
• Orientation: The orientation of a pair of regions r_i, r_k in
  image 1 should be similar to that of the pair of matched
  regions r'_j, r'_l in image 2 (see Figure 4):

    S_o(r_i, r_k, r'_j, r'_l)
      = ( (O_i − O_k)·(O'_j − O'_l)
          / (||O_i − O_k|| ||O'_j − O'_l||) + 1 ) / 2    (11)

• Inside/outside: If r_i is inside/outside r_k in image 1, the
  matched region r'_j should be inside/outside r'_l in image 2:

    S_i(r_i, r_k, r'_j, r'_l)
      = ¬[(r_i in/out r_k) XOR (r'_j in/out r'_l)]    (12)

• Size ratio: The area ratio of a pair of regions r_i, r_k in
  image 1 should be similar to the area ratio of the pair of
  matched regions r'_j, r'_l in image 2:

    S_s(r_i, r_k, r'_j, r'_l)
      = ( (σ_i, σ_k)·(σ'_j, σ'_l)
          / (||(σ_i, σ_k)|| ||(σ'_j, σ'_l)||) + 1 ) / 2    (13)

The total similarity based on the first-order constraints, e.g.,
(8)-(10), for two regions r_i, r'_j is defined as:

    S_1(r_i, r'_j) = w_c S_c(r_i, r'_j) + w_e S_e(r_i, r'_j)
                     + w_p S_p(r_i, r'_j)    (14)
    s.t. w_c + w_e + w_p = 1

Similarly, the total similarity based on the second-order
constraints, e.g., (11)-(13), is

    S_2(r_i, r_k, r'_j, r'_l) = w_o S_o(r_i, r_k, r'_j, r'_l)
                                + w_i S_i(r_i, r_k, r'_j, r'_l)
                                + w_s S_s(r_i, r_k, r'_j, r'_l)    (15)
    s.t. w_o + w_i + w_s = 1

where w_c, w_e, w_p, w_o, w_i, and w_s are proper weights.

For all the constraints (8)-(13), it is worth mentioning that they
are translation and scaling invariant, except (10). Whether this
invariance is desired is application and user dependent, and our
proposed similarity model works in both cases; for example, it can
easily meet the invariance requirement by relaxing constraint (10).
Detailed experiments on the robustness of the similarity model are
reported in Section 5.5.

3.4 Probabilistic Weight Estimation
The overall similarity between two region sets, and thus two
images, is defined as follows:

    S(R_1, R_2) = Σ_{i=1}^{M} Σ_{j=1}^{N} w_ij S_1(r_i, r'_j)    (16)
    s.t. Σ_{i=1}^{M} Σ_{j=1}^{N} w_ij = 1

It is therefore clear that the weight w_ij plays an important role in
determining the overall similarity value. As discussed in Section
2, there are several problems with how IRM estimates the
weights. First, it ignores the second-order constraints. Second, it
uses a greedy search algorithm, which reduces matching
robustness. To address these issues, we cast the weight estimation
problem into a probabilistic framework and solve it in a
principled way.

Note that the similarity between two entities can be interpreted as
the probability of the two entities being similar. Note also that all
the similarities defined in (8)-(15) are in the range [0, 1], and
can readily be used to estimate probabilities. Let x ~ y and
x ≈ y denote that x matches y based on the first-order and
second-order constraints, respectively. We therefore have

    P(r_i ~ r'_j) ≐ S_1(r_i, r'_j)    (17)
    P(r_i ≈ r'_j, r_k ≈ r'_l | r_i ~ r'_j, r_k ~ r'_l)
      ≐ S_2(r_i, r_k, r'_j, r'_l)    (18)

where ≐ stands for "can be estimated by". That is, the
probability that r_i matches r'_j in terms of the first-order
constraints can be estimated by S_1(r_i, r'_j). Similarly, the
probability that r_i and r_k match r'_j and r'_l in terms of the
second-order constraints, given that r_i matches r'_j and r_k
matches r'_l in terms of the first-order constraints, can be
estimated by S_2(r_i, r_k, r'_j, r'_l).

It is intuitive that a region pair that has high similarity should
receive a high weight in the similarity model [11]. P(r_i ~ r'_j,
r_i ≈ r'_j) is therefore a good estimate for w_ij. We have

    w_ij = W_ij / Σ_{i=1}^{M} Σ_{j=1}^{N} W_ij

    W_ij = P(r_i ~ r'_j, r_i ≈ r'_j)
         = Σ_{k=1}^{M} Σ_{l=1}^{N} P(r_i ~ r'_j, r_k ~ r'_l,
                                     r_i ≈ r'_j, r_k ≈ r'_l)
         = Σ_{k=1}^{M} Σ_{l=1}^{N} P(r_i ≈ r'_j, r_k ≈ r'_l |
                                     r_i ~ r'_j, r_k ~ r'_l)
                                   P(r_i ~ r'_j, r_k ~ r'_l)
         ≐ Σ_{k=1}^{M} Σ_{l=1}^{N} S_2(r_i, r_k, r'_j, r'_l)
                                   P(r_i ~ r'_j) P(r_k ~ r'_l)
         ≐ Σ_{k=1}^{M} Σ_{l=1}^{N} S_2(r_i, r_k, r'_j, r'_l)
                                   S_1(r_i, r'_j) S_1(r_k, r'_l)

where in the above derivation we have used Bayes' rule and the
independence assumption between the region pairs (r_i, r'_j) and
(r_k, r'_l). Examining this new weight estimation technique, we can
see that it seamlessly integrates both first-order and second-order
constraints. More importantly, it avoids the greedy algorithm
used in IRM. Instead, it estimates the weights in a principled way
based on probabilities, which is more robust to inaccurate
image segmentation.
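A minimal sketch of this weight estimation, assuming S1 is a precomputed M×N matrix of first-order similarities and S2 a callable returning the second-order similarity; the function name and interface are hypothetical, not the authors' code.

```python
# Sketch of CRM's probabilistic weight estimation derived above:
# W_ij = sum_{k,l} S2(i,k,j,l) * S1[i][j] * S1[k][l], followed by
# normalization so the weights w_ij sum to one, per the constraint
# attached to the overall similarity.

def crm_weights(S1, S2):
    M, N = len(S1), len(S1[0])
    W = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            # Accumulate second-order evidence over all other pairs.
            W[i][j] = sum(
                S2(i, k, j, l) * S1[i][j] * S1[k][l]
                for k in range(M) for l in range(N)
            )
    # Normalize so that the weights sum to 1.
    total = sum(sum(row) for row in W)
    if total > 0:
        W = [[wij / total for wij in row] for row in W]
    return W
```

Unlike the greedy MSHP loop, every weight here is computed in closed form from the similarity values, so no iteration order can trap the estimate in a local minimum.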
3.5 Illustrations and Simulations
In this sub-section, we illustrate the effectiveness of the proposed
CRM approach using two synthetic images. As shown in Figure
3, image 2 is an over-segmented version of image 1, shifted 25%
to the right and scaled down 25%. To highlight CRM's
probabilistic weight estimation framework and the second-order
constraints, we assign the same color to all the regions in the two
images. Let r_0 and r'_0 represent the background regions in the two
images. Tables 1 and 2 show the estimated weights based on
CRM and IRM, respectively. The following observations can be
made:
• Because CRM takes into account the second-order
  constraints, it is much more robust to translation and scaling
  changes. For example, in CRM, r_1 is well matched to both
  r'_1 and r'_2. But in IRM, r_1 is not matched to r'_2 at all.
• Because CRM estimates the weights using a probabilistic
  framework, it is much less likely to be trapped in a local
  minimum. For example, in CRM, even though other regions
  pose distractions, r_3 can still match well with r'_4. On the
  other hand, IRM uses a greedy algorithm, i.e., MSHP, to
  estimate the weights, and is therefore easily trapped in local
  minima. For example, r_3 is matched to r'_2 instead of r'_4
  because of the local minimum created by the background
  regions.

Similar results are obtained when we assign different colors to
different corresponding regions, e.g., red to regions r_1, r'_1,
and r'_2, and yellow to regions r_3 and r'_4. The above
simulations clearly demonstrate the advantage of CRM over IRM.
4. RELEVANCE FEEDBACK
As discussed in Section 1, two important techniques can narrow
down the semantic gap in image retrieval. One is relevance
feedback and the other is more expressive features and more
accurate similarity models. In Section 3, we have presented in
detail how CRM supports a better image similarity model. In this
section, we will present our initial attempt to integrating the two,
i.e., a relevance feedback enhanced CRM.
Let R1 represent the region set in the query image and R2 represent
the region set of an image in the database. Refer to (16) – it is
clear that in Section 3’s definition of similarity, we have assumed
that all the regions in R1 are of equal importance to a user. This is
a reasonable assumption to start with, but in reality a user may
have different emphases on different regions. For example, he or
she may be more interested in some part of the image than the
other. This is where the relevance feedback can come into play.
Let’s first define a more generic similarity between the query
image and an image in the database:
    S(R_1, R_2) = Σ_{i=1}^{M} Σ_{j=1}^{N} q_i w_ij S_1(r_i, r'_j)
                = Σ_{i=1}^{M} q_i s_i    (19)
    s.t. Σ_{i=1}^{M} q_i = 1,  Σ_{i=1}^{M} Σ_{j=1}^{N} w_ij = 1

Figure 3. Two images with segmented region sets R_1, R_2.

Table 1. Estimated weights using CRM
       r'_0    r'_1    r'_2    r'_3    r'_4    r'_5
r_0    0.171   0.000   0.000   0.000   0.000   0.000
r_1    0.000   0.087   0.060   0.025   0.024   0.014
r_2    0.000   0.035   0.040   0.070   0.037   0.012
r_3    0.000   0.028   0.042   0.034   0.056   0.044
r_4    0.000   0.011   0.022   0.017   0.053   0.106

Table 2. Estimated weights using IRM
       r'_0    r'_1    r'_2    r'_3    r'_4    r'_5
r_0    0.790   0.000   0.000   0.000   0.000   0.000
r_1    0.043   0.027   0.000   0.000   0.015   0.000
r_2    0.014   0.000   0.000   0.014   0.000   0.000
r_3    0.000   0.000   0.032   0.000   0.019   0.000
r_4    0.000   0.000   0.000   0.012   0.000   0.024
where the term qi models a user’s degree of interest on region i ,
and si is the aggregated similarity of region i. The retrieval system
can learn the interest weights qi using relevance feedback. It has
been proven in [28] that the adaptive least-mean-square (LMS)
Wiener filter outperforms other linear learning-based relevance
feedback techniques on a large-scale database. It has the
additional benefit of supporting on-line learning [28]. This is
particularly useful in that when a new example arrives, the
algorithm can learn without starting from scratch.
Before we go into the details of the LMS Wiener filter algorithm,
we first define some notation. Refer to Figure 4. X(n) is the input
signal to both the underlying system to be simulated and our
simulator, the adaptive Wiener filter. The output from the
underlying unknown system is d(n), and the output from the
Wiener filter is y(n) = W(n)^T X(n). The difference between y(n)
and d(n) is the error signal e(n). Our goal is to use e(n) to adjust
the weights W(n) such that the error is minimized. The LMS
Wiener filter algorithm can be summarized as follows:

(A) Initialization: choose the step size 0 < μ < 2, and set the
filter coefficients to

    W(0) = [1/M, 1/M, …, 1/M]^T    (20)

(B) For each of the n = 1, 2, …, N feedback samples:

1. Compute the distance y(n) based on the current weights:

    y(n) = W^T(n) X(n)    (21)

2. Compute the error signal:

    e(n) = d(n) − y(n)    (22)

   Note that compared with standard Wiener filters, we have an
   extra step to convert from the similarity to the distance d(n).

3. Compute the updated weights:

    W(n) = W(n−1) + μ X(n) e(n) / (a + X(n)^T X(n))    (23)

   where a is a small positive constant that keeps the denominator
   from being 0.
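The initialization and update steps (20)-(23) can be sketched as follows; the default step size μ and regularizer a are illustrative assumptions, and the conversion from similarity to the desired distance d(n) is left to the caller.

```python
# Sketch of the normalized LMS update (20)-(23). w holds the filter
# coefficients, x the input vector X(n), and d the desired output d(n).

def lms_init(M):
    # Eq. (20): start with uniform coefficients.
    return [1.0 / M] * M

def lms_step(w, x, d, mu=0.5, a=1e-6):
    # Eq. (21): filter output with the current weights.
    y = sum(wi * xi for wi, xi in zip(w, x))
    # Eq. (22): error between desired and actual output.
    e = d - y
    # Eq. (23): normalized update; a keeps the denominator positive.
    norm = a + sum(xi * xi for xi in x)
    return [wi + mu * xi * e / norm for wi, xi in zip(w, x)]
```

Each call to `lms_step` moves y(n) toward d(n) along the input direction, which is why a single feedback example can be incorporated on-line without retraining from scratch.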
Because real-world users want to retrieve images based on highlevel concepts, not low-level features (Rodden, K., et al[18]), the
ground truth we use in the experiments is based on high-level
categories. That is, we consider a retrieved image as a relevant
image only if it is in the same category as the query image. This is
a difficult task to tackle, but this is what users want, thus our
ultimate goal.
The Corel data set have also been used in other systems and
relatively high retrieval performance has been reported. However,
those systems only use pre-selected categories with distinctive
visual characteristics (e.g., cars vs. mountains).
In our
experiments, no pre-selection is made. We believe that only in
this manner can we obtain practical and useful evaluation.
In our specific context, we have

W  [q1 , q 2 ,..., q M ]T ,

X  [ s1 , s 2 ,..., s M ]T ,
d(n)= S(R1,R2).
5.2 Queries
The LMS Wiener filter relevance feedback algorithm is elegant in
theory, easy to implement and requires very little computation
[28]. Detailed experiments on the performance of this technique
are reported in Section 5.6.
5. EXPERIMENTS
To validate the efficacy of the relevance-feedback enhanced
CRM, in this section, we will carry out two sets of experiments
using large-scale real-world images. First, we will examine and
compare the retrieval performance of CRM against one of the best
existing approaches, IRM. We then compare the relevancefeedback enhanced CRM against the plain CRM.
5.1 Data Set
For all the experiments reported in this section, the Corel image
collection is used as the test data set. We choose this data set due
to the following considerations:

It is a large-scale data set. Compared with the data sets used
in some systems that contain only a few hundred images, the
Corel data set includes 17,000 images.

It is heterogeneous. Unlike the data sets used in some
systems that are all texture images or car images, the Corel
data set covers a wide variety of content from animals and
plants to natural and cultural images.

Corel professionals and there are 100 images in each
category.
Some existing systems only test a few pre-selected query images.
It is not clear if those systems will still perform well on other notselected images. To fairly evaluate the retrieval performance of
different techniques, we randomly generated 400 queries for each
retrieval condition. For all the performance data reported in this
section, they are the average of the 400 query results.
5.3 Performance Measures
The most widely used performance measures for information
retrieval are precision (Pr) and recall (Re) [22]. Pr is defined as
the number of retrieved relevant objects over the number of total
retrieved objects, say the top 20 images. Re is defined as the
number of retrieved relevant objects over the total number of
relevant objects in the image collection (in the case of Corel data
set, 99). The performance of an "ideal" system is to have both
Table 3. Retrieval performance comparison between
CRM and IRM with 400 queries on 17,000 images.
Precision
(percentage)
Return
top 20
Return
top 100
Return
top 180
CRM
17.30
9.83
6.91
IRM
14.06
5.87
4.95
Increase
23.04
67.46
39.60
It is professional-annotated. Instead of using the less
meaningful low-level features as the evaluation criterion, the
Corel data set has human annotated ground truth. The whole
image collection has been classified into many categories by

X (n)
System
Adaptive Filter
d(n)
y(n) e(n)
Figure 4. Human vision system model
+
Σ e(k)
Figure 5. Comparison between CRM and IRM. The
Solid curve represents CRM, and dashed curve
represents IRM.
high Pr and high Re. Unfortunately, Pr and Re are conflicting
Table 4. Retrieval performance comparison between
relevance feedback enhanced CRM and plain CRM
with 400 queries on 17,000 images.
Precision
(percentage)
Return
top 20
Return
top 100
Return
top 180
0 feedback
17.30
9.83
6.91
1 feedback
18.73
11.69
7.20
2 feedbacks
19.83
12.10
7.53
quite different spatial layout from the query image, but are still
returned as similar images.
5.5 Robustness of CRM
Images with similar content can be taken at quite different
conditions. An important criterion for a good similarity model is
therefore its robustness to different imaging conditions and image
alterations. We have performed extensive tests on this aspect and
a retrieval example is shown in Figure 8. The query images are
the original image (Figure 8(a)) and the altered images (Figure
8(b)-8(l)). The different imaging conditions include corruption by
40% Gaussion noise, sharpened 20%, blurred 20%, 25% less
saturated, 25% more saturated, brightened 20%, and darkened
10%. The image alterations include translation 25%, scaling up
40%, and rotation of 20 degrees. It can be seen that the proposed
CRM technique is robust to both different imaging conditions and
image alterations.
5.6 Relevance Feedback Enhanced CRM
Figure 9. Comparison between CRM with and
without feedback. The Solid curve represents 0
feedback, dashed curve represents 1 iteration of
feedback and dotted curve represents 2 iterations of
feedback.
criteria and can not be at high values at the same time. Because of
this, instead of using a single value for Pr and Re, the Pr(Re)
curve is normally used to characterize the performance of a
retrieval system.
5.4 Comparisons Between CRM and IRM
Table 3 compares the image retrieval results using CRM and
IRM, when the top 20, 100, and 180 most similar images are
returned. To better compare the two approaches, we also plot the
Pr(Re) curve for CRM and IRM in Figure 5. Because CRM
simultaneously takes into account both first-order constraints,
e.g., region properties and positions, and second-order constraints,
e.g., spatial relationships, it is more accurate in modeling image
content and more robust to imperfect image segmentation. It is
20% - 60% (in relative percentage) better than IRM in terms of
retrieval accuracy. Figures 6 and 7 show example comparisons
between CRM and IRM, where the top-left image is the query
image. In Figure 6, a sunset image is submitted as a query.
Because CRM models spatial relationships between regions, e.g.,
the sun is inside the lower part of the sky and the landscape is
underneath the sky, it achieves good retrieval results (see Figure
6(a)). On the other hand, IRM only models the first-order
constraints and uses a greedy weight-estimating algorithm. It
therefore does not have enough description power to distinguish
generic orange-color images (e.g., the first, third, fourth, fifth and
sixth images in the second row of Figure 6(b)) from the query
image, which has a well-defined inter-region spatial relationship.
Similar observations can be made in Figure 7. For example, the
third, fifth and sixth images in the second row of Figure 7(b) have
Table 4 compares relevance-feedback-enhanced CRM with plain CRM when the top 20, 100, and 180 most similar images are returned, and Figure 9 shows their precision-recall curves after two iterations of relevance feedback. Based on the table and figure, the following observations can be made:
- With more feedback iterations, the retrieval performance increases.
- The performance increase in the first iteration is the biggest. This is important because users prefer to have as few iterations as possible.
It is clear that combining relevance feedback with CRM is a promising direction: relevance feedback leverages users' knowledge, while CRM provides more expressive features and a more accurate similarity model. An example relevance feedback result is shown in Figure 10.
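The paper's Wiener-filter feedback formulation is not reproduced here, but the iterative mechanism it exploits can be illustrated with the classic Rocchio update, in which each feedback round moves the query vector toward user-marked relevant images and away from non-relevant ones. This is a generic sketch under our own parameter choices, not the authors' method.

```python
import numpy as np

def rocchio_update(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.15):
    """One relevance-feedback iteration on a feature-vector query.

    Classic Rocchio reweighting (a stand-in for the paper's Wiener-filter
    scheme): the query shifts toward the centroid of relevant examples
    and away from the centroid of non-relevant ones.
    """
    query = alpha * np.asarray(query, dtype=float)
    if positives:
        query += beta * np.mean(positives, axis=0)   # attract to relevant centroid
    if negatives:
        query -= gamma * np.mean(negatives, axis=0)  # repel from non-relevant centroid
    return query

# Toy 2-D feature space: feedback pulls the query toward relevant examples.
q0 = [0.0, 0.0]
q1 = rocchio_update(q0, positives=[[1.0, 0.0], [1.0, 2.0]], negatives=[[-2.0, 0.0]])
print(q1)  # [1.05, 0.75]: moved toward (1, 1), away from (-2, 0)
```

Iterating this update mirrors the behavior reported above: the first iteration produces the largest shift, and subsequent iterations refine the query more modestly as it settles near the relevant region of feature space.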
6. CONCLUSIONS
In this paper, we have proposed a novel constraint-based region
matching approach to image retrieval. This approach is based on a
principled probabilistic framework and simultaneously models
both first-order region properties and second-order spatial
relationships. Simulations and real-world experiments show that its retrieval performance is significantly better than that of a state-of-the-art technique, i.e., IRM. Furthermore, we have presented our initial results on how the retrieval performance can be further improved by integrating a Wiener filter based relevance feedback. As for future work, we believe that both local-feature-based approaches, e.g., IRM and CRM, and relevance feedback techniques are required to narrow down the semantic gap in image retrieval. Local features provide good object modeling power, and relevance feedback brings in users' knowledge to guide the search. The combination of the two is a promising direction. We are currently developing a unified framework that integrates relevance feedback with CRM to provide users with a more accurate, robust and adaptive retrieval experience.
7. ACKNOWLEDGMENTS
The first author would like to thank Prof. Wenping Wang for his helpful comments and discussions about the paper. The first
author is supported in part by China NSF grant 69902004 and the
National Project 2001BA201A07. The Corel image set was
obtained from Corel and used in accordance with their copyright
statement.
Figure 6. Retrieving sunset images, where (a) uses CRM and (b) uses IRM.
Figure 7. Retrieving car images, where (a) uses CRM and (b) uses IRM.
8. REFERENCES
[1] Carson, C., Belongie, S., Greenspan, H., and Malik, J. Region-based image querying, Proc. of IEEE Workshop on Content-Based Access of Image and Video Libraries, pp. 42-49, 1997.
[2] Cox, I.J., Miller, M.L., Minka, T.P., Papathomas, T.V., and Yianilos, P.N. The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments. IEEE Trans. on Image Processing, Vol. 9(3), pp. 524-524, 2000.
[3] Deng, Y., and Manjunath, B.S. An efficient low-dimensional color indexing scheme for region-based image retrieval, Proc. of IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 3017-3020, 1999.
[4] Felzenszwalb, P.F., and Huttenlocher, D.P. Image segmentation using local variation, Proc. of IEEE Computer Vision and Pattern Recognition, Santa Barbara, CA, USA, pp. 98-104, 1998.
[5] Gorman, J.W., Mitchell, O.R., and Kuhl, F.P. Partial Shape Recognition using Dynamic Programming, IEEE Trans. PAMI, vol. 10, pp. 257-266, 1988.
[6] Gouet, V., and Boujemaa, N. Object-Based Queries Using Color Points of Interest. Proc. of IEEE Workshop on Content-Based Access of Images and Videos (CBAIVL), Kauai, Hawaii, 2001.
[7] Kam, A.H., Ng, T.T., Kingsbury, N.G., and Fitzgerald, W.J. Content based image retrieval through object extraction and querying, Proc. IEEE Workshop on Content-based Access of Image and Video Libraries (CBAIVL), pp. 91-95, 2000.
[8] Kazuyoshi, H., and Fumio, M. Constraint-based approach for automatic spatial layout planning, Proceedings of the 11th Conference on Artificial Intelligence for Applications, Los Angeles, CA, USA, pp. 38-45, 1995.
[9] Ko, B.C., Lee, H.S., and Byun, H. Region-based image retrieval system using efficient feature description, Proc. of 15th International Conference on Pattern Recognition, vol. 4, pp. 283-286, 2000.
[10] Leu, J.-G. Computing a shape's moments from its boundary, Pattern Recognition, vol. 10, pp. 949-957, 1991.
[11] Li, J., Wang, J.Z., and Wiederhold, G. IRM: Integrated Region Matching for Image Retrieval. Proceedings ACM Multimedia 2000, Los Angeles, CA, pp. 147-156, 2000.
[12] Ma, W.Y., and Manjunath, B.S. Netra: A Toolbox for Navigating Large Image Databases, Proc. of IEEE ICIP, pp. 568-571, 1997.
[13] Moghaddam, B., Biermann, H., and Margaritis, D. Image retrieval with local and spatial queries, Proc. IEEE ICIP 2000, Vol. 2, pp. 542-545, 2000.
[14] Niblack, W. et al. The QBIC project: querying images by content using color, texture and shape, In SPIE Proc. Storage and Retrieval for Image and Video Databases, San Jose, vol. 1908, pp. 173-187, 1993.
[15] Pavlidis, T., and Liow, Y.T. Integrating Region Growing and Edge Detection. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12(3), pp. 225-233, 1990.
[16] Pentland, A., Picard, R., and Sclaroff, S. Photobook: Content-based manipulation of image databases. Int. J. Comput. Vis., 18(3), pp. 233-254, 1996.
[17] Pfefferkorn, C. A heuristic problem solving design system for equipment or furniture layouts, Communications of the ACM, vol. 18(5), pp. 286-297, 1975.
[18] Rodden, K., Basalaj, W., Sinclair, D., and Wood, K. Does organization by similarity assist image browsing, Proc. ACM Computer-Human Interaction (CHI), pp. 190-197, 2001.
[19] Rui, Y., Huang, T., and Mehrotra, S. Content-based image retrieval with relevance feedback in MARS. Proc. IEEE Int. Conf. on Image Processing, 1997.
[20] Rui, Y., and Huang, T.S. Optimizing learning in image retrieval. Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR2000), Vol. 1, pp. 236-243, 2000.
[21] Rui, Y., Huang, T.S., and Chang, S.F. Image retrieval: current techniques, promising directions and open issues, Journal of Visual Communication and Image Representation, Vol. 10, pp. 39-62, 1999.
[22] Salton, G., and McGill, M.J. Introduction to modern information retrieval, McGraw-Hill Book Company, New York, 1982.
[23] Smeulders, A.W.M., Worring, M., and Santini, S. Content-based image retrieval at the end of the early years, IEEE Trans. PAMI, Vol. 22(12), pp. 1349-1380, 2000.
Figure 8. Illustration of CRM's robustness to both different imaging conditions and image alterations: (a) original image; (b) shifted left 25%; (c) scaled up 40%; (d) rotated +20 degrees; (e) rotated -20 degrees; (f) corrupted by 40% Gaussian noise; (g) sharpened by 20%; (h) blurred by 20%; (i) 25% less saturated; (j) 25% more saturated; (k) brightened by 20%; (l) darkened by 10%.
Figure 10. A relevance feedback example with CRM: (a) no feedback, (b) one iteration of feedback, and (c) two iterations of feedback.
[24] Smith, J.R., and Chang, S.F. VisualSEEK: A Fully
Automated Content-Based Image Query System, ACM
Multimedia 1996, Boston, MA, 1996.
[25] Smith, J.R., and Chang S.F. SaFe: a general framework for
integrated spatial and feature image search, IEEE First
Workshop on Multimedia Signal Processing, pp.301-306,
1997.
[26] Tian, Q., Wu, Y., and Huang, T. S. Combine user defined
region-of-interest and spatial layout for image retrieval, IEEE
International Conference on Image Processing, Vancouver,
BC, vol. 3, pp. 746-749, 2000.
[27] Tian, Q., Sebe, N., Lew, M. S., Loupias, E., and Huang, T. S.
Image Retrieval Using Wavelet-Based Salient Points, in
Journal of Electronic Imaging, Special Issue on Storage and
Retrieval of Digital Media, Vol. 10(4), pp. 835-849, 2001.
[28] Wang, T., Rui Y., and Hu, S.M. Optimal Adaptive Learning
for Image Retrieval, Proc. of IEEE Computer Vision and
Pattern Recognition (CVPR2001), Kauai, Hawaii, December
11-13, pp. 1140-1147, 2001.
[29] Wang, W., Song, Y., and Zhang, A., Semantics Retrieval by
Content and Context of Image Regions, Proc. the 15th
International Conference on Vision Interface, Calgary,
Canada, May 27-29, 2002.
[30] Xu, Y., Duygulu, P., Saber, E., Tekalp, A.M., and Yarman-Vural, F.T. Object based image retrieval based on multi-level segmentation, Proc. IEEE ICASSP, Istanbul, Turkey, vol. 6, pp. 2019-2022, 2000.