Noise Resistant Graph Ranking for Improved Web Image Search

Wei Liu†, Yu-Gang Jiang†, Jiebo Luo‡, Shih-Fu Chang†
† Electrical Engineering Department, Columbia University, New York, NY, USA. {wliu,yjiang,sfchang}@ee.columbia.edu
‡ Kodak Research Laboratories, Eastman Kodak Company, Rochester, NY, USA. jiebo.luo@kodak.com

Abstract

In this paper, we explore a novel ranking mechanism that processes query samples with noisy labels, motivated by the practical application of web image search re-ranking, where the originally highest ranked images are usually posed as pseudo queries for subsequent re-ranking. Availing ourselves of the low-frequency spectrum of a neighborhood graph built on the samples, we propose a graph-theoretical framework amenable to noise resistant ranking. The proposed framework consists of two components: spectral filtering and graph-based ranking. The former leverages sparse bases, progressively selected from a pool of smooth eigenvectors of the graph Laplacian, to reconstruct the noisy label vector associated with the query sample set and accordingly filter out the query samples with less authentic positive labels. The latter applies a canonical graph ranking algorithm with respect to the filtered query sample set. Quantitative image re-ranking experiments carried out on two public web image databases show that our re-ranking approach compares favorably with the state of the art and improves web image search engines by a large margin, even though we harvest the noisy queries from the top-ranked images returned by these search engines.

[Figure 1: flowchart with the stages pseudo visual query set, image graph construction, spectral decomposition, smooth eigenbasis, spectral filter, filtered visual query set, graph ranker, and re-ranked search result list.]
Figure 1. The flowchart of the proposed noise resistant graph ranking framework for web image search re-ranking. Our approach takes the noisy search results returned by a web image search engine as pseudo visual queries, and uses a spectral filter to produce a cleaner set of visual queries which are then cast into a graph ranker to output the re-ranked (better) search results.

1. Introduction

Nowadays, image search has become an indispensable feature of all popular web search engines such as Google, Yahoo! and Bing, as well as most photo sharing websites such as Flickr, Photobucket and ImageShack. However, these image search engines are not sufficiently effective because images are too complicated to handle. On account of the remarkable success of text retrieval for searching web pages, most of the search engines still return images solely based on the associated or surrounding text on the web pages containing the images, and visual content analysis has rarely been explored so far.

Clearly, text-based image search yields unsatisfactory search results due to the difficulty in automatically associating the content of images with text. Thus, researchers started to design algorithms incorporating visual features of images to improve the search results of text-based image search engines. The problem, generally referred to as web image search re-ranking, studies mechanisms for re-ranking the images returned by web search engines so that the visually relevant images appear prominently higher. Some recent works along this direction [1][14][2][7][6][10] have used visual features for re-ranking web images, while [3][18][11][15] used hybrid textual+visual features.

In the meantime, a surge of effort has gone into theory and algorithms for graph-based learning, especially spectral clustering [17] and semi-supervised learning [23]. Recently, researchers found that the data graph, behaving as an informative platform, can also be utilized to rank practical data including text [24], images [24][13], and videos [12].
One obvious advantage of graph-based methods is that the data graph is capable of reflecting the intrinsic manifold structure which is collectively hidden in data such as images and tends to capture the underlying semantics of data. Therefore, performing ranking on image graphs is well suited to ranking images by their semantics.

Inspired by this desirable advantage of ranking on image graphs, we intend to devise a graph-theoretical framework for effective image search re-ranking. We consider a particular ranking scenario where the query samples are noisy, i.e., they do not all belong to the same class. Such a scenario is pervasive in the web image search re-ranking problem, for which some re-ranking approaches [18][13][21][8] employ a few top-ranked images from web search engines as pseudo visual queries and conduct re-ranking on a larger working set composed of hundreds to thousands of images returned by the search engines. Inevitably, the pseudo visual queries, which take pseudo relevance labels (noisy labels), contain outliers unrelated to the user's search intent, as shown in Fig. 1.

To this end, we propose a noise resistant graph ranking framework integrating outlier removal and graph ranking. As shown in Fig. 1, our framework first builds a neighborhood graph on the working set to be ranked, then leverages a pool of smooth eigenbases stemming from the normalized graph Laplacian to discover the high-density regions, and lastly selects properly sparse eigenbases to construct a spectral filter which automatically filters the low-density outliers out of the noisy query sample set (query set in short). With respect to the filtered query set, a canonical graph ranker, e.g., manifold ranking [24], is used to rank the samples in the working set.

The remainder of this paper is organized as follows. In Section 2, we review recent works on web image search re-ranking.
After that, we introduce the background knowledge about graph ranking in Section 3, and present our graph ranking framework, designed specifically for robust ranking with noisy queries, in Section 4. Superior image search performance on two public web image databases is shown in Section 5, and conclusions are drawn in Section 6.

2. Related Work

We roughly divide the existing works on web image search re-ranking into four categories: clustering-based methods, topic models, supervised methods, and graph-based methods.

The clustering-based methods [1][14][2] apply clustering algorithms such as mean-shift, K-means, and K-medoids on the images returned by search engines to seek significant clusters. The images are then re-ranked based on their distances to the cluster centers. This family of methods introduces some uncertain factors such as how to merge clusters, how to drop small clusters, and how to give weights to clusters for distance computation.

The topic models [7][6][10] employ pLSA [6] or LDA [10] to learn the topics which are latent in the images returned by search engines and reveal the visual target of the user's textual query. Re-ranking images is then done on the basis of the dominating topics using the topic membership proportions of every image. This family of methods tends to be more suited to object-like text queries, and some of them [7][6] require separate cross-validation sets to determine the dominating topics. Besides, training topic models is nontrivial.

The supervised methods [3][18][11][15] usually acquire binary textual features indicating the occurrence of query terms in the surrounding text on web pages and in the image metadata (e.g., image tags) of the searched images. Such textual features can be used together with visual features to train a classifier such as an SVM [18] or logistic regression [15] that predicts the relevance label of each image with respect to the text query. However, these supervised methods train query-specific [3][11] or query-relative [18][15] models that may suffer from overfitting when applied to unseen text queries. Our experiments show that the overfitting issue affects the methods in [18][15].

Our approach to image search re-ranking falls into the fourth category of graph-based methods [12][13][21], building on the idea of ranking on image graphs. Our core contribution, discussed in Section 4, is to formulate a spectral filter for effective outlier removal in the noisy query set. In comparison, [12][13] did not consider outlier removal and directly used the noisy query set. [21] handled outliers but did not remove them very effectively, as will be validated in our experiments.

3. Background: Ranking on Graphs

In this section, we briefly discuss two popular and theoretically related graph ranking methods: Personalized PageRank (PPageRank) [9][13] and Manifold Ranking (MRank) [24]. PPageRank tailors graph ranking towards the user's interest and is in essence a query-biased version of the well-known PageRank algorithm. In contrast, MRank explicitly brings the query-biased order to data samples by making use of the inherent manifold structure.

Without loss of generality, suppose that we have $n$ data points $\mathcal{X} = \{\mathbf{x}_1, \cdots, \mathbf{x}_q, \mathbf{x}_{q+1}, \cdots, \mathbf{x}_n\} \subset \mathbb{R}^d$, where the first $q$ points are the query samples and the remaining ones are the samples to be ranked. The target of either PPageRank or MRank is to rank the samples based on their relevance to the query samples via a neighborhood graph $G = (V, E, W)$. $V$ is a vertex set composed of $n$ vertices representing the $n$ raw data points, $E \subseteq V \times V$ is an edge set connecting nearby data points, and $W \in \mathbb{R}^{n \times n}$ is a matrix measuring the strength of edges.
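As a rough illustration of the graph $G = (V, E, W)$ described above, the following minimal NumPy sketch builds a dense symmetric kNN weight matrix with the Gaussian weights of Eq. (1) and Euclidean distance. The function name `knn_graph` and the default bandwidth heuristic (root mean squared distance to the k-th neighbor) are assumptions made for this example; the paper does not state how the bandwidth is chosen.

```python
import numpy as np

def knn_graph(X, k=20, sigma=None):
    """Return (W, degrees) for a symmetric kNN graph over the rows of X.

    Hypothetical helper: W[i, j] = exp(-d(x_i, x_j)^2 / sigma^2) when
    i is among the k nearest neighbors of j or vice versa, else 0.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances, clipped at 0 for numerical safety.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor

    # Indices of the k nearest neighbors of every point.
    nbrs = np.argsort(d2, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    mask |= mask.T  # keep an edge if either endpoint selects the other

    if sigma is None:
        # Assumption (not from the paper): bandwidth set from the
        # k-th neighbor distances.
        sigma = np.sqrt(np.mean(d2[np.arange(n), nbrs[:, -1]]))

    W = np.zeros((n, n))
    W[mask] = np.exp(-d2[mask] / sigma ** 2)
    return W, W.sum(axis=1)

# Usage on random data: 30 points in 5 dimensions, 5 neighbors each.
X_demo = np.random.default_rng(0).normal(size=(30, 5))
W_demo, deg_demo = knn_graph(X_demo, k=5)
```

A dense matrix is used here only for brevity; for working sets of up to about 1000 images this is affordable, while a sparse representation would be the natural choice for larger sets.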
For convenience, we assume the graph is connected, which is satisfied if an adequate number of edges are included in $E$. The (sparse) weight matrix of the widely used $k$NN graph is defined by

$$ W_{ij} = \begin{cases} \exp\left(-\dfrac{d(\mathbf{x}_i, \mathbf{x}_j)^2}{\sigma^2}\right), & i \in N_j \text{ or } j \in N_i \\ 0, & \text{otherwise} \end{cases} \tag{1} $$

where $d(\cdot, \cdot)$ is a distance measure in $\mathbb{R}^d$, e.g., Euclidean distance, $N_i \subset [1:n]$ consists of the $k$ indexes of the $k$ nearest neighbors of $\mathbf{x}_i$ in $\mathcal{X}$, and $\sigma > 0$ is the bandwidth parameter. Let $D = \mathrm{diag}(W\mathbf{1})$ be the degree matrix whose diagonal elements are $D_{ii} = \sum_{j=1}^{n} W_{ij}$.

For the query-based ranking problem, we define a query indicator vector (query vector in short) $\mathbf{y} \in \mathbb{R}^n$ with $y_i = 1$ if $i \in Q$ ($Q = [1:q]$ denotes the query index set) and $y_i = 0$ otherwise. The $q$ query samples are supposed to come from the same class, taking the ground-truth label 1.

The PPageRank algorithm defines a random walk on $G$ using the stochastic transition matrix $P = \alpha D^{-1} W + (1-\alpha)\mathbf{1}\mathbf{y}^\top / q$ ($0 < \alpha < 1$ is the bias parameter), and the resulting ranking scores constitute a stationary probability distribution $\boldsymbol{\pi} \in \mathbb{R}^n$ over $V$ determined by the linear system $\boldsymbol{\pi} = P^\top \boldsymbol{\pi}$, from which PPageRank solves

$$ \boldsymbol{\pi} = \frac{1-\alpha}{q}\left(I - \alpha W D^{-1}\right)^{-1}\mathbf{y}. \tag{2} $$

The MRank algorithm defines an iterative label diffusion process as follows:

$$ \mathbf{f}(t+1) = \alpha D^{-1/2} W D^{-1/2}\, \mathbf{f}(t) + (1-\alpha)\mathbf{y}, \tag{3} $$

in which $t$ is the time stamp and $\mathbf{f}(t) \in \mathbb{R}^n$ receives the diffused labels at time $t$. The solution of MRank at convergence is given by

$$ \mathbf{f} = (1-\alpha)\left(I - \alpha D^{-1/2} W D^{-1/2}\right)^{-1}\mathbf{y}. \tag{4} $$

Interestingly, when replacing $W D^{-1}$ with $D^{-1/2} W D^{-1/2}$ and ignoring the constant factors, PPageRank yields the same analytic solution as MRank. Therefore, we unify PPageRank and MRank into the single solution

$$ \mathbf{f} = (I - \alpha S)^{-1}\mathbf{y}, \tag{5} $$

where $\mathbf{f}$ holds the final rank scores and the matrix $S \in \mathbb{R}^{n \times n}$ determines the ranking type: $S = W D^{-1}$ represents PPageRank, whereas $S = D^{-1/2} W D^{-1/2}$ corresponds to MRank. In practice, the ranking performance of PPageRank and MRank is very comparable, as will be shown in the later experiments.

4. Noise Resistant Graph Ranking

In this section, we address a particular ranking scenario where the query sample set needed by a graph ranker is noisy. Under this setting, directly applying an existing graph ranker (PPageRank or MRank) is very likely to fail. Once the number of outliers, i.e., images irrelevant to the user's interest, exceeds the number of relevant images in the query set, a graph ranker will end up ranking many irrelevant samples higher. To attain the goal of noise resistant graph ranking, we propose a spectral filter for automatic outlier removal.

4.1. Spectral Filter

As a noisy query set may ruin graph ranking, it is necessary to eliminate the outliers as much as possible and feed a cleaner query set to graph rankers. The previous method Label Diagnosis (LabelDiag) in [21] removed an asserted outlier and simultaneously added an asserted positive sample in each iteration of a greedy gradient search algorithm, and it simply set the number of iterations to half of the query set size. This method is very likely to bring in more outliers owing to sample addition. Given that adding samples is risky, we are only concerned with removing outliers. We desire an outlier filter which removes the outliers thoroughly and results in a filtered query set that is precise.

As mentioned in [8], the spectrum of the graph Laplacian $L = D - W$ has the potential to suppress the noisy labels in semi-supervised learning. Likewise, we believe that the graph spectrum also has potential in handling the noisy labels of query samples in ranking. The spectrum of the $k$NN graph $G = (V, E, W)$ built in Section 3 is a set of eigenvalue-eigenvector pairs $\{(\lambda_i, \mathbf{u}_i)\}_{i=1}^{n}$ of the normalized graph Laplacian $\mathcal{L} = D^{-1/2} L D^{-1/2}$, in which the eigenvalues are sorted in nondecreasing order such that $\mathbf{u}_1$ represents the lowest frequency eigenvector and $\mathbf{u}_n$ represents the highest frequency eigenvector. When the graph is connected, only one eigenvalue is zero, that is, $\lambda_1 = 0$. [19] pointed out that when the data points have formed clusters, each high density region implicitly corresponds to some low-frequency (smooth) eigenvector which takes relatively large absolute values for points in the region (cluster) and whose values are close to zero elsewhere. Note that we exclude the first eigenvector $\mathbf{u}_1$ because it is nearly constant and does not form clusters.

We assume that the true positive samples in the query set reside in multiple high-density regions. This "multi-region" assumption makes sense for web image re-ranking since the in-class images matching the user's textual query may fall into a few categories due to polysemy. Fig. 2 gives a schematic illustration of the multi-region assumption. We only consider $m$ smooth eigenvectors $\mathbf{u}_2, \cdots, \mathbf{u}_{m+1}$ associated with the $m$ lowest eigenvalues, wrapped in $\Lambda = \mathrm{diag}(\lambda_2, \cdots, \lambda_{m+1})$, to explore the high density regions.

[Figure 2: schematic with the labels true positive sample, outlier, noisy query sample set, and high density region.]
Figure 2. The multi-region assumption for outlier removal.

Let us inspect the noisy label vector $\mathbf{y}_q = \mathbf{1}$ that is defined on the noisy query set $Q$ and contains the first $q$ entries of the vector $\mathbf{y}$. The desired outlier filter aims at refining $\mathbf{y}_q$ by pruning its unreliable 1 entries and then producing a smaller yet cleaner query set $\tilde{Q}$ by picking up the remaining 1 entries. Ideally, the exact label vector $\mathbf{y}_q^*$ takes 1 for the true positive samples and 0 for the outliers. Within the label value space $\mathbb{R}^q$, we seek $m$ smooth eigenbases $\mathbf{u}_{q,2}, \cdots, \mathbf{u}_{q,m+1}$, each of which consists of the upper $q$ entries of the corresponding eigenvector $\mathbf{u}_i$. According to the multi-region assumption, $\mathbf{y}_q^*$ should nearly lie in some subspace spanned by sparse eigenbases out of $\mathbf{u}_{q,2}, \cdots, \mathbf{u}_{q,m+1}$ (we wrap them in the matrix $U_q = [\mathbf{u}_{q,2}, \cdots, \mathbf{u}_{q,m+1}] \in \mathbb{R}^{q \times m}$). Consequently, we formulate a Spectral Filter (SpecFilter) via sparse eigenbase fitting, that is, reconstructing the noisy label vector $\mathbf{y}_q$ with sparse eigenbases:

$$ \min_{\mathbf{a} \in \mathbb{R}^m} \; \|U_q \mathbf{a} - \mathbf{y}_q\|^2 + \rho \|\mathbf{a}\|_1 + \gamma\, \mathbf{a}^\top \Lambda \mathbf{a}, \tag{6} $$

where $\mathbf{a}$ is the sparse coefficient vector, $\|\mathbf{a}\|_1 = \sum_{i=1}^{m} |a_i|$ is the $\ell_1$-norm encouraging sparsity of $\mathbf{a}$, and $\rho > 0$, $\gamma > 0$ are two regularization parameters. Note that the last term in eq. (6) is actually a weighted $\ell_2$-norm since $\mathbf{a}^\top \Lambda \mathbf{a} = \sum_{i=1}^{m} \lambda_{i+1} a_i^2$, which imposes that the smoother eigenbases with smaller $\lambda_i$ are preferred in the reconstruction of $\mathbf{y}_q$. In nature, eq. (6) is the sparse linear regression model Lasso [20] augmented by weighted $\ell_2$-regularization. To control the sparsity of $\mathbf{a}$ more conveniently, we convert eq. (6) to the following equivalent problem, as done for Lasso:

$$ \min_{\mathbf{a} \in \mathbb{R}^m} \; \mathcal{J}(\mathbf{a}, \mathbf{y}_q) = \|U_q \mathbf{a} - \mathbf{y}_q\|^2 + \gamma\, \mathbf{a}^\top \Lambda \mathbf{a} \quad \text{s.t.} \quad \|\mathbf{a}\|_1 \le z, \tag{7} $$

where the sparsity level parameter $z > 0$ maps to the parameter $\rho$ in one-to-one correspondence.

Eq. (7) is a convex optimization problem and can thus be solved accurately by the first-order optimization method Projected Gradient Descent [4]. To exploit this method, we must handle the $\ell_1$ constraint in eq. (7) via the $\ell_1$-ball projection operator

$$ \mathbb{B}_z(\mathbf{a}) = \arg\min_{\|\mathbf{b}\|_1 \le z} \|\mathbf{b} - \mathbf{a}\|, \tag{8} $$

which has been implemented in $O(m \log m)$ time [5]. We describe it in Algorithm 1.

Algorithm 1. $\ell_1$-Ball Projection $\mathbb{B}_z()$
  Input: A vector $\mathbf{a} \in \mathbb{R}^m$.
  If $\sum_{i=1}^{m} |a_i| \le z$, set $\mathbf{a}' = \mathbf{a}$;
  else
    sort $|\mathbf{a}|$ into $\mathbf{v}$ such that $v_1 \ge v_2 \ge \cdots \ge v_m$,
    find $r = \max\{ j \in [1:m] : v_j - \frac{1}{j}\left(\sum_{j'=1}^{j} v_{j'} - z\right) > 0 \}$,
    compute $\theta = \frac{1}{r}\left(\sum_{j=1}^{r} v_j - z\right)$ and form $\mathbf{a}' = [a'_1, \cdots, a'_m]^\top$ such that $a'_i = \mathrm{sign}(a_i) \cdot \max\{|a_i| - \theta, 0\}$ for $i \in [1:m]$.
  Output: A vector $\mathbf{a}' \in \mathbb{R}^m$.

By leveraging the $\ell_1$-ball projection operator, the iterative updating rule applied in projected gradient descent is

$$ \mathbf{a}(t+1) = \mathbb{B}_z\big(\mathbf{a}(t) - \beta_t \nabla_{\mathbf{a}} \mathcal{J}(\mathbf{a}(t), \mathbf{y}_q)\big), \tag{9} $$

where $t$ denotes the time stamp, $\beta_t > 0$ denotes the appropriate step size, and $\nabla_{\mathbf{a}} \mathcal{J}(\mathbf{a}, \mathbf{y}_q)$ denotes the gradient of the cost function $\mathcal{J}$ with respect to $\mathbf{a}$. To expedite the projected gradient method, we offer it a good starting point $\mathbf{a}(0) = (U_q^\top U_q + \gamma \Lambda)^{-1} U_q^\top \mathbf{y}_q$, which is the globally optimal solution to the unconstrained counterpart of eq. (7). Because the dimension $m$ of the solution space is very low in practice (no more than 40 in all our experiments), the projected gradient method converges fast (no more than 100 iterations throughout our experiments).

Now we are ready to filter out the outliers by pruning the reconstructed vector $U_q \mathbf{a}$, with $\mathbf{a}$ being the optimal solution to eq. (7). We simply obtain the denoised label vector $\hat{\mathbf{y}}_q = \mathrm{rnd}(U_q \mathbf{a})$ by the rounding function $\mathrm{rnd}: \mathbb{R}^q \to \{1, 0\}^q$ defined as follows:

$$ (\mathrm{rnd}(\mathbf{v}))_i = \begin{cases} 1, & v_i \ge \delta \cdot \max\{\mathbf{v}\} \\ 0, & \text{otherwise} \end{cases} \tag{10} $$

where the parameter $0 < \delta < 1$ is properly chosen to prune the low-valued entries in $U_q \mathbf{a}$, which correspond to the unreliable 1 entries (i.e., noisy labels) in $\mathbf{y}_q$.

In order to achieve thorough outlier removal, we deploy SpecFilter in a consecutive mode. Specifically, we generate a sequence $\{\mathbf{y}_q^j\}$ via successive filtering and take the convergent one as the ultimate denoised label vector $\hat{\mathbf{y}}_q$. Setting $\mathbf{y}_q^0 = \mathbf{y}_q$, we launch a successive alternating updating process: given $\mathbf{y}_q^j$, update $\mathbf{a}^{j+1}$ by

$$ \mathbf{a}^{j+1} = \arg\min_{\|\mathbf{a}\|_1 \le z} \mathcal{J}(\mathbf{a}, \mathbf{y}_q^j), \quad j = 0, 1, 2, \cdots; \tag{11} $$

and given $\mathbf{a}^{j+1}$, update $\mathbf{y}_q^{j+1}$ by

$$ \mathbf{y}_q^{j+1} = \mathrm{rnd}(U_q \mathbf{a}^{j+1}), \quad j = 0, 1, 2, \cdots. \tag{12} $$

We detail the proposed SpecFilter, which encompasses solving the sparse eigenbase fitting problem eq. (7) and the successive filtering of eqs. (11) and (12), in Algorithm 2. In our experiments, Algorithm 2 achieves a convergent denoised label vector $\hat{\mathbf{y}}_q$ in about 10 iterations, i.e., $\hat{\mathbf{y}}_q \cong \mathbf{y}_q^{10}$.

Algorithm 2. Spectral Filter (SpecFilter)
  Input: A noisy label vector $\mathbf{y}_q \in \mathbb{R}^q$ and model parameters $\gamma, z > 0$, $0 < \delta < 1$.
  Set $\beta = 0.5$, $\eta = 0.01$, $\epsilon = 10^{-4}$;
  define the functions $\mathcal{J}(\mathbf{a}, \mathbf{y}_q) = \|U_q \mathbf{a} - \mathbf{y}_q\|^2 + \gamma\, \mathbf{a}^\top \Lambda \mathbf{a}$ and $\nabla_{\mathbf{a}} \mathcal{J}(\mathbf{a}, \mathbf{y}_q) = 2(U_q^\top U_q + \gamma \Lambda)\mathbf{a} - 2 U_q^\top \mathbf{y}_q$;
  initialize $\mathbf{y}_q^0 = \mathbf{y}_q$ and $j = 0$;
  repeat
    initialize $\mathbf{a}(0) = (U_q^\top U_q + \gamma \Lambda)^{-1} U_q^\top \mathbf{y}_q^j$,
    for $t = 0, \cdots$ do
      $\mathbf{a}(t+1) := \mathbb{B}_z\big(\mathbf{a}(t) - \beta_t \nabla_{\mathbf{a}} \mathcal{J}(\mathbf{a}(t), \mathbf{y}_q^j)\big)$, where $\beta_t = \beta^{\omega}$ such that $\omega$ is the smallest nonnegative integer satisfying
        $\mathcal{J}(\mathbf{a}(t+1), \mathbf{y}_q^j) - \mathcal{J}(\mathbf{a}(t), \mathbf{y}_q^j) \le \eta \big(\nabla_{\mathbf{a}} \mathcal{J}(\mathbf{a}(t), \mathbf{y}_q^j)\big)^\top (\mathbf{a}(t+1) - \mathbf{a}(t))$,
      if $|\mathcal{J}(\mathbf{a}(t+1), \mathbf{y}_q^j) - \mathcal{J}(\mathbf{a}(t), \mathbf{y}_q^j)| < \epsilon$ then
        $\mathbf{a}^{j+1} := \mathbf{a}(t+1)$ and break,
      end if
    end for
    $\mathbf{y}_q^{j+1} := \mathrm{rnd}(U_q \mathbf{a}^{j+1})$, $j := j + 1$,
  until $\mathbf{y}_q^j$ converges;
  Output: The denoised label vector $\hat{\mathbf{y}}_q = \mathbf{y}_q^j$.

Remarks: 1) The idea of sparse eigenbase fitting exploited by SpecFilter implements the multi-region assumption, and the sparsity of eigenbases naturally unveils the multiple high-density regions to which the true positive query samples belong. 2) SpecFilter removes the low-density outliers that correspond to the low-valued entries in $U_q \mathbf{a}$, the reconstructed version of $\mathbf{y}_q$. Hence, we know which and how many outliers should be removed from the noisy query set.

4.2. Algorithmic Framework

We can now integrate the proposed spectral filter into a noise resistant graph ranking framework which ranks the samples with respect to query samples with noisy labels. We outline this algorithmic framework, termed SpecFilter+MRank, below.

1. Use eq. (1) to build a $k$NN graph $G = (V, E, W)$ on $n$ samples $\{\mathbf{x}_i\}_{i=1}^{n}$, of which the first $q$ are queries. Compute $S = D^{-1/2} W D^{-1/2}$. Compute $I - D^{-1/2} W D^{-1/2}$ and its low-frequency eigenvectors $U = [\mathbf{u}_2, \cdots, \mathbf{u}_{m+1}]$ corresponding to the $m$ lowest eigenvalues $\Lambda = \mathrm{diag}(\lambda_2, \cdots, \lambda_{m+1})$, excluding $\lambda_1 = 0$.

2. Given the noisy label vector $\mathbf{y}_q = \mathbf{1} \in \mathbb{R}^q$, run SpecFilter (Algorithm 2) using $U_q$ and $\Lambda$ to output the denoised label vector $\hat{\mathbf{y}}_q \in \mathbb{R}^q$.

3. Compute the rank score vector $\mathbf{f} = (I - \alpha S)^{-1}\mathbf{y}$ based on the query vector $\mathbf{y} = \begin{bmatrix} \hat{\mathbf{y}}_q \\ \mathbf{0} \end{bmatrix} \in \mathbb{R}^n$.

One can also acquire a slightly different framework, SpecFilter+PPageRank, by substituting $S$ with $W D^{-1}$.

5. Experiments

We evaluate the proposed noise resistant ranking framework on two public web image databases: Fergus dataset [6] and INRIA dataset [15], which contain 4,091 and 71,478 web images, respectively. Given a text query, each image across the two datasets has an initial ranking score from a web search engine and a ground-truth label indicating whether it is relevant to the query.

For visual feature extraction of web images, we adopt Locality-constrained Linear Coding (LLC) [22] to obtain image representations which have demonstrated state-of-the-art performance. In detail, we use the SIFT descriptors [16], which were computed from 16×16 pixel densely sampled image patches with a step size of 8 pixels. The images were all preprocessed into gray scale, and a pre-trained codebook with 1024 bases¹ was used to generate LLC codes. Following the scheme of Spatial Pyramid Matching (SPM) [16], we used 1×1 and 2×2 sub-regions for LLC and max-pooled the LLC codes for each sub-region. Finally, these pooled features from each sub-region were concatenated and $\ell_2$-normalized as the final 1024 × 5-dimensional image feature representation.

As the working set to be re-ranked for each text query across the two datasets has 1000 images at most, we keep a small number of eigenbases to run SpecFilter. Specifically, we use $m = 40$ eigenbases for Fergus dataset since its images are more diverse, and $m = 20$ eigenbases for INRIA dataset. Accordingly, we set $\gamma = 1$, $z = 6$, $\delta = 0.5$ for Fergus dataset and $\gamma = 1$, $z = 3$, $\delta = 0.5$ for INRIA dataset. We build a 20NN graph on the working set for each text query with $d$ in eq. (1) defined by the Euclidean distance. We compare six graph-based ranking methods, namely EigenFunc [8]², LabelDiag [21], PPageRank [13], MRank [24], and our proposed SpecFilter+PPageRank and SpecFilter+MRank, using the same LLC features. To compare them with the existing re-ranking methods, we follow the same evaluation settings and metrics as [6] and [15] on Fergus dataset and INRIA dataset, respectively.

¹ Courtesy of http://www.ifp.illinois.edu/~jyang29/LLC.htm
² This is a graph-based semi-supervised learning method, Eigenfunction. Here we use its one-class version for ranking.

5.1. Fergus Dataset

On this benchmark, we calculate precision at 15% recall of the raw search engine Google and ten re-ranking methods for seven object categories in Table 1. All of the six graph-based ranking methods work on an initial noisy query set comprised of the top-50 images returned by Google for each category. Table 1 exhibits SpecFilter+MRank > SpecFilter+PPageRank > LabelDiag > MRank > PPageRank = EigenFunc in terms of mean precision, which testifies that outlier removal is vital to graph ranking faced with noisy queries. Among the six methods, the proposed SpecFilter+MRank is the best, improving 86% over Google in mean precision, which is a substantial gain.

Table 1. Comparisons on Fergus dataset. Ranking precision (%) at 15% recall corresponding to seven object categories: airplane, cars (rear), face, guitar, leopard, motorbike, and wrist watch. The two best results in each column were set in bold in the original typeset table; n/a marks entries for which no value is reported.

| Method | airplane | cars (rear) | face | guitar | leopard | motorbike | wrist watch | Mean |
|---|---|---|---|---|---|---|---|---|
| Google | 50 | 41 | 19 | 31 | 41 | 46 | 70 | 43 |
| SVM [18] | 35 | n/a | n/a | 29 | 50 | 63 | 93 | 54 |
| LogReg [15] | 65 | 55 | 72 | 28 | 44 | 49 | 79 | 56 |
| TSI-pLSA [6] | 57 | 77 | 82 | 50 | 59 | 72 | 88 | 69 |
| LDA [10] | 100 | 83 | 100 | 91 | 65 | 97 | 100 | 91 |
| EigenFunc [8] | 60 | 94 | 47 | 36 | 21 | 48 | 73 | 54 |
| LabelDiag [21] | 50 | 54 | 68 | 33 | 42 | 79 | 83 | 58 |
| PPageRank [13] | 32 | 53 | 64 | 32 | 48 | 79 | 72 | 54 |
| MRank [24] | 39 | 53 | 66 | 32 | 50 | 79 | 75 | 56 |
| SpecFilter+PPageRank | 80 | 94 | 75 | 61 | 47 | 79 | 97 | 76 |
| SpecFilter+MRank | 86 | 100 | 75 | 58 | 63 | 79 | 100 | 80 |

We attribute the success of SpecFilter+MRank to SpecFilter, which produces a cleaner query set than both the initial noisy query set harvested from Google and the filtered query set achieved by LabelDiag. Additionally, we find that SpecFilter using sparse eigenbases is superior to using dense eigenbases (all $m$ eigenbases). We plot the query set precision of LabelDiag and two versions of SpecFilter in Fig. 3, which discloses that SpecFilter using sparse eigenbases consistently produces the cleanest query set across the seven object categories. The precision of the filtered query set of LabelDiag is almost the same as that of Google, so we can say that LabelDiag brings in outliers during sample addition although it filters out some outliers.

[Figure 3: plot of query set precision (vertical axis, 0 to 1) for Google top-50, LabelDiag, SpecFilter (dense eigenbases), and SpecFilter (sparse eigenbases).]
Figure 3. Precision of visual query sets corresponding to seven object categories in Fergus dataset. Both LabelDiag and SpecFilter work on initial noisy query sets composed of top-50 images from Google and yield filtered query sets.

We also report the precision of two topic models, TSI-pLSA [6] and LDA [10], as well as two supervised models, SVM [18] and logistic regression (LogReg) [15], in Table 1. We find that our approach SpecFilter+MRank surpasses the topic model TSI-pLSA and the two supervised models by a large margin in mean precision. Although the topic model LDA achieves a higher mean precision than our approach (higher precision except on 'cars'), it seems to prefer object-like text queries since its primary purpose is object detection, not web image re-ranking. In contrast, our re-ranking approach can perform well for diversified text queries, which is verified on the larger INRIA dataset.

5.2. INRIA Dataset

We also test our re-ranking approach on the recently released INRIA dataset, which contains a diversified group of 353 text queries including object names ('cloud', 'flag', 'car'), celebrity names ('jack black', 'will smith', 'dustin hoffman'), and abstract terms ('tennis course', 'golf course', '4x4'). We report the mean average precision (MAP) of the supervised model LogReg and six graph-based ranking methods including ours over the 353 text queries in Table 2. SpecFilter+MRank is still the best among the six graph-based methods when using the top-100 images returned by the raw search engine as a pseudo visual query set, improving 29% over the raw search engine in MAP. Moreover, we list the precision of filtered query sets achieved by LabelDiag and SpecFilter in Table 3, and conclude that SpecFilter using sparse eigenbases consistently yields the cleanest visual query sets.

It is important to note that the supervised model LogReg using both textual and visual features is even inferior to the baselines PPageRank and MRank, which confirms our suspicion that supervised models tend to overfit the training data. In contrast, our re-ranking approach needs neither large training sets nor accurate labels, and applies to unseen text queries adaptively. Remarkably, SpecFilter+MRank using only visual features improves by 10% over LogReg using hybrid textual+visual features in MAP.

Some re-ranked image lists are displayed in Fig. 6, which shows that SpecFilter+MRank consistently outperforms the raw search engine and MRank.
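For readers who want to reproduce figures of the kind reported as AP and MAP in this section, the following minimal sketch computes them from ranking scores and binary relevance labels. It assumes the standard non-interpolated definition of average precision; the exact evaluation protocol of [15] is not restated in this paper and may differ in details. The function names are hypothetical.

```python
import numpy as np

def average_precision(scores, labels):
    """Non-interpolated AP of one ranked list.

    scores: ranking scores (higher means ranked earlier).
    labels: binary ground-truth relevance (1 relevant, 0 irrelevant).
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    rel = np.asarray(labels, dtype=float)[order]
    if rel.sum() == 0:
        return 0.0  # convention chosen here for queries with no relevant image
    hits = np.cumsum(rel)                            # relevant items seen so far
    prec_at_k = hits / np.arange(1, len(rel) + 1)    # precision at each rank
    return float((prec_at_k * rel).sum() / rel.sum())

def mean_average_precision(per_query):
    """MAP over an iterable of (scores, labels) pairs, one pair per text query."""
    return float(np.mean([average_precision(s, l) for s, l in per_query]))

# Toy example: relevant images at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
ap_demo = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

In the setting of this paper, `scores` would be the rank score vector $\mathbf{f}$ restricted to the working set of one text query, and `labels` the ground-truth relevance labels supplied with the dataset.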
For the text query 'jack black', the top images ranked by the search engine suffer seriously from polysemy, as 'jack black' is semantically ambiguous with 'Black Jack card game'. As such, MRank performs worse than the search engine because the initial noisy query set contains a lot of outliers from 'Black Jack card game', whereas SpecFilter+MRank is resistant to the outliers owing to SpecFilter.

We further study the effect of search engine result quality on our approach by choosing the 20 text queries having the best search engine average precision (AP) and the 20 text queries having the worst AP. Fig. 4 shows AP for the 20 best queries and Fig. 5 shows AP for the 20 worst queries. SpecFilter+MRank is more robust to the search engine result quality than MRank, while MRank works worse for some queries on which the search engine works poorly.

Table 2. INRIA dataset: mean average precision (MAP, %) of ranking over 353 text queries with visual query sets of different sizes. Search Engine and LogReg do not use a visual query set, so their single reported value is repeated across the three columns.

| Method | top-20 images | top-50 images | top-100 images |
|---|---|---|---|
| Search Engine | 56.99 | 56.99 | 56.99 |
| LogReg (textual) [15] | 57.00 | 57.00 | 57.00 |
| LogReg (visual) [15] | 64.90 | 64.90 | 64.90 |
| LogReg (t+v) [15] | 67.30 | 67.30 | 67.30 |
| EigenFunc | 48.49 | 45.38 | 41.08 |
| LabelDiag | 69.51 | 70.12 | 69.68 |
| PPageRank | 69.31 | 69.09 | 68.86 |
| MRank | 69.80 | 69.46 | 69.04 |
| SpecFilter+PPageRank | 71.17 | 71.35 | 71.35 |
| SpecFilter+MRank | 72.75 | 73.58 | 73.76 |

Table 3. INRIA dataset: mean precision (%) of visual query sets over 353 text queries. Both LabelDiag and SpecFilter work on initial noisy query sets from the search engine and yield filtered query sets.

| Query set | top-20 images | top-50 images | top-100 images |
|---|---|---|---|
| Search Engine | 63.35 | 56.94 | 50.91 |
| LabelDiag | 63.06 | 56.82 | 50.91 |
| SpecFilter (dense eigenbases) | 68.54 | 60.28 | 51.20 |
| SpecFilter (sparse eigenbases) | 73.51 | 62.83 | 54.30 |

[Figure 4: plot titled "20 best text queries"; vertical axis: average precision; series: Search Engine, MRank, SpecFilter + MRank.]
Figure 4. Average precision of ranking for 20 best text queries in INRIA dataset. Initial noisy query sets are composed of top-100 images from search engine.

[Figure 5: plot titled "20 worst text queries"; vertical axis: average precision; series: Search Engine, MRank, SpecFilter + MRank.]
Figure 5. Average precision of ranking for 20 worst text queries in INRIA dataset. Initial noisy query sets are composed of top-100 images from search engine.

[Figure 6: re-ranked image lists.]
Figure 6. INRIA dataset: in each subfigure, the first row shows top-20 images ranked by raw search engine, the second row ranked by MRank, and the third row ranked by SpecFilter+MRank; the initial noisy query set is composed of top-100 images from search engine.

6. Conclusions

The conclusions drawn by this paper are three-fold. First, a filtered visual query set produced by the proposed spectral filter, a core component of our graph ranking framework, boosts the performance of existing graph rankers in terms of re-ranked image search results. Second, we find that web image search re-ranking can be addressed using image content alone, and that our re-ranking approach using visual features surpasses supervised re-ranking models relying upon multiple cues including surrounding text and image metadata in addition to visual features [15]. Finally, compared to alternative re-ranking approaches, our proposed approach is less affected by search engine result quality (e.g., the polysemy issue shown in the last example of Fig. 6).

Acknowledgement

This work has been supported in part by Eastman Kodak Research, NSF (CNS-07-51078), and ONR (N00014-10-10242).

References

[1] N. Ben-Haim, B. Babenko, and S. Belongie. Improving web-based image search via content based clustering. In SLAM Workshop of CVPR, 2006.
[2] T. Berg and A. Berg. Finding iconic images. In the 2nd Internet Vision Workshop of CVPR, 2009.
[3] T. Berg and D. Forsyth. Animals on the web. In Proc. CVPR, 2006.
[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.
[5] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proc. ICML, 2008.
[6] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. Learning object categories from google's image search. In Proc. ICCV, 2005.
[7] R. Fergus, P. Perona, and A. Zisserman. A visual category filter for google images. In Proc. ECCV, 2004.
[8] R. Fergus, Y. Weiss, and A. Torralba. Semi-supervised learning in gigantic image collections. In NIPS 22, 2010.
[9] D. Fogaras, B. Rácz, K. Csalogány, and T. Sarlós. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333–358, 2005.
[10] M. Fritz and B. Schiele. Decomposition, discovery and detection of visual categories using topic models. In Proc. CVPR, 2008.
[11] D. Grangier and S. Bengio. A discriminative kernel-based model to rank images from text queries. IEEE Trans. PAMI, 30(8):1371–1384, 2008.
[12] W. Hsu, L. Kennedy, and S.-F. Chang. Reranking methods for visual search. IEEE MultiMedia, 14(3):14–22, 2007.
[13] Y. Jing and S. Baluja. Visualrank: Applying pagerank to large-scale image search. IEEE Trans. PAMI, 30(11):1877–1890, 2008.
[14] L. Kennedy and M. Naaman. Generating diverse and representative image search results for landmarks. In Proc. WWW, 2008.
[15] J. Krapac, M. Allan, J. Verbeek, and F. Jurie. Improving web image search results using query-relative classifiers. In Proc. CVPR, 2010.
[16] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR, 2006.
[17] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS 14, 2002.
[18] F. Schroff, A. Criminisi, and A. Zisserman. Harvesting image databases from the web. In Proc. ICCV, 2007.
[19] T. Shi, M. Belkin, and B. Yu. Data spectroscopy: Eigenspaces of convolution operators and clustering. The Annals of Statistics, 37(6B):3960–3984, 2009.
[20] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58(1):267–288, 1996.
[21] J. Wang, Y.-G. Jiang, and S.-F. Chang. Label diagnosis through self tuning for web image search. In Proc. CVPR, 2009.
[22] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In Proc. CVPR, 2010.
[23] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In NIPS 16, 2004.
[24] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf. Ranking on data manifolds. In NIPS 16, 2004.