Noise Resistant Graph Ranking for Improved Web Image Search

Wei Liu†   Yu-Gang Jiang†   Jiebo Luo‡   Shih-Fu Chang†
†Electrical Engineering Department, Columbia University, New York, NY, USA
{wliu,yjiang,sfchang}@ee.columbia.edu
‡Kodak Research Laboratories, Eastman Kodak Company, Rochester, NY, USA
jiebo.luo@kodak.com
Abstract
In this paper, we exploit a novel ranking mechanism that processes query samples with noisy labels, motivated by the practical application of web image search re-ranking, where the originally highest ranked images are usually posed as pseudo queries for subsequent re-ranking. Availing ourselves of the low-frequency spectrum of a neighborhood graph built on the samples, we propose a graph-theoretical framework amenable to noise resistant ranking. The proposed framework consists of two components: spectral filtering and graph-based ranking. The former leverages sparse bases, progressively selected from a pool of smooth eigenvectors of the graph Laplacian, to reconstruct the noisy label vector associated with the query sample set and accordingly filter out the query samples with less authentic positive labels. The latter applies a canonical graph ranking algorithm with respect to the filtered query sample set. Quantitative image re-ranking experiments carried out on two public web image databases bear out that our re-ranking approach compares favorably with the state of the art and improves web image search engines by a large margin, even though we harvest the noisy queries from the top-ranked images returned by these search engines.
Figure 1. The flowchart of the proposed noise resistant graph ranking framework for web image search re-ranking. Our approach takes the noisy search results returned by a web image search engine as pseudo visual queries, and uses a spectral filter to produce a cleaner set of visual queries which are then cast into a graph ranker to output the re-ranked (better) search results. (Pipeline: pseudo visual query set → image graph construction → spectral decomposition → smooth eigenbasis → spectral filter → filtered visual query set → graph ranker → re-ranked search result list.)
1. Introduction

Nowadays, image search has become an indispensable feature of all popular web search engines such as Google, Yahoo! and Bing, as well as most photo sharing websites such as Flickr, Photobucket and ImageShack. However, these image search engines are not sufficiently effective because images are too complicated to handle. On account of the remarkable success of text retrieval for searching web pages, most search engines still return images solely based on the associated or surrounding text on the web pages containing the images, and visual content analysis has rarely been explored so far.

Clearly, text-based image search yields unsatisfactory search results due to the difficulty of automatically associating the content of images with text. Researchers have therefore designed algorithms that incorporate visual features of images to improve the search results of text-based image search engines. The problem, generally referred to as web image search re-ranking, studies mechanisms for re-ranking the images returned by web search engines so that the visually relevant images appear prominently higher. Some recent works along this direction [1][14][2][7][6][10] have used visual features for re-ranking web images, while [3][18][11][15] used hybrid textual+visual features.

In the meantime, a surge of effort has been made in theory and algorithms for graph-based learning, especially in spectral clustering [17] and semi-supervised learning [23]. Recently, researchers found that the data graph,
behaving as an informative platform, can also be utilized
to rank practical data including text [24], images [24][13],
and videos [12]. One obvious advantage of graph-based methods is that the data graph is capable of reflecting the intrinsic manifold structure collectively hidden in data such as images, and thus tends to capture the underlying semantics of the data. Ranking on image graphs is therefore well suited to semantics-oriented image ranking. Inspired by this desirable property, we devise a graph-theoretical framework for effective image search re-ranking.
We consider a particular ranking scenario where the query samples are noisy, i.e., they do not all belong to the same class. Such a scenario is pervasive in the web image search re-ranking problem, for which some re-ranking approaches [18][13][21][8] employ a few top-ranked images from web search engines as pseudo visual queries and conduct re-ranking on a larger working set composed of hundreds to thousands of images returned by the search engines. Inevitably, the pseudo visual queries carrying pseudo relevance labels (noisy labels) contain outliers unrelated to the user's search intent, as shown in Fig. 1. To address this, we propose a noise resistant graph ranking framework integrating outlier removal and graph ranking. As shown in Fig. 1, our framework first builds a neighborhood graph on the working set to be ranked, then leverages a pool of smooth eigenbases stemming from the normalized graph Laplacian to discover the high-density regions, and lastly selects properly sparse eigenbases to construct a spectral filter which automatically filters the low-density outliers out of the noisy query sample set (query set in short). With respect to the filtered query set, a canonical graph ranker, e.g., manifold ranking [24], is used to rank the samples in the working set.
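As a concrete sketch of the ranking step, the canonical manifold-ranking solution [24] used here as the graph ranker can be written in a few lines of numpy; the function name, the default α value, and the dense-matrix solve are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def rank_scores(W, query_idx, alpha=0.9):
    """Closed-form manifold ranking: f = (I - alpha*S)^(-1) y,
    where S = D^(-1/2) W D^(-1/2) is the symmetrically normalized affinity."""
    n = W.shape[0]
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    y = np.zeros(n)
    y[query_idx] = 1.0            # query samples carry pseudo label 1
    return np.linalg.solve(np.eye(n) - alpha * S, y)
```

Samples strongly connected to the query set receive larger scores; for a chain graph with the query at one end, the scores decay with graph distance from the query.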
The remainder of this paper is organized as follows. In Section 2, we review recent works on web image search re-ranking. After that, we introduce the background knowledge about graph ranking in Section 3, and present our graph ranking framework exclusively designed for robust ranking with noisy queries in Section 4. Superior image search performance on two public web image databases is shown in Section 5, and conclusions are drawn in Section 6.
2. Related Work

We roughly divide the existing works on web image search re-ranking into four categories: clustering-based methods, topic models, supervised methods, and graph-based methods.

The clustering-based methods [1][14][2] apply clustering algorithms such as mean-shift, K-means, and K-medoids to the images returned by search engines to seek significant clusters. The images are then re-ranked based on their distances to the cluster centers. This family of methods introduces uncertain factors such as how to merge clusters, how to drop small clusters, and how to weight clusters for distance computation.

The topic models [7][6][10] employ pLSA [6] or LDA [10] to learn the topics latent in the images returned by search engines, which reveal the visual target of the user's textual query. Images are then re-ranked on the basis of the dominating topics using the topic membership proportions of every image. This family of methods tends to be better suited to object-like text queries, and some of the methods [7][6] require separate cross-validation sets to determine the dominating topics. Besides, training topic models is nontrivial.

The supervised methods [3][18][11][15] usually acquire binary textual features indicating the occurrence of query terms in the surrounding text on web pages and in the image metadata (e.g., image tags) of the searched images. Such textual features can be used together with visual features to train a classifier such as an SVM [18] or logistic regression [15] that predicts the relevance label of each image with respect to the text query. However, these supervised methods train query-specific [3][11] or query-relative [18][15] models that may suffer from overfitting when applied to unseen text queries. Our experiments show that this overfitting issue affects the methods in [18][15].

Our approach to image search re-ranking falls into the fourth category of graph-based methods [12][13][21], taking into account the sensible idea of ranking on image graphs. Our core contribution, discussed in Section 4, is to formulate a spectral filter for effective outlier removal in the noisy query set. In comparison, [12][13] did not consider outlier removal and directly used the noisy query set, while [21] handled outliers but did not remove them effectively, as will be validated in our experiments.

3. Background: Ranking on Graphs

In this section, we briefly discuss two popular and theoretically related graph ranking methods: Personalized PageRank (PPageRank) [9][13] and Manifold Ranking (MRank) [24]. PPageRank tailors graph ranking towards the user's interest and is in essence a query-biased version of the well-known PageRank algorithm. In contrast, MRank explicitly brings a query-biased order to data samples by making use of the inherent manifold structure.

Without loss of generality, suppose that we have 𝑛 data points 𝒳 = {x1, ⋅⋅⋅, xπ‘ž, xπ‘ž+1, ⋅⋅⋅, x𝑛} ⊂ ℝ𝑑, where the first π‘ž points are the query samples and the rest are the samples to be ranked. The target of either PPageRank or MRank is to rank the samples based on their relevance to the query samples via a neighborhood graph
𝐺 = (𝑉, 𝐸, π‘Š ). 𝑉 is a vertex set composed of 𝑛 vertices
representing 𝑛 raw data points, 𝐸 ⊆ 𝑉 × π‘‰ is an edge set
connecting nearby data points, and π‘Š ∈ ℝ𝑛×𝑛 is a matrix measuring the strength of edges. For convenience, we
assume the graph is connected, which is satisfied if an adequate number of edges are included in 𝐸. The (sparse)
weight matrix of the widely used π‘˜NN graph is defined by
$$W_{ij} = \begin{cases} \exp\!\left(-\dfrac{d(\mathbf{x}_i, \mathbf{x}_j)^2}{\sigma^2}\right), & i \in N_j \ \text{or}\ j \in N_i \\ 0, & \text{otherwise}, \end{cases} \qquad (1)$$
where 𝑑(⋅,⋅) is a distance measure in ℝ𝑑, e.g., the Euclidean distance, 𝑁𝑖 ⊂ [1 : 𝑛] consists of the indexes of the π‘˜ nearest neighbors of x𝑖 in 𝒳, and 𝜎 > 0 is the bandwidth parameter. Let 𝐷 = diag(π‘Š1) be the degree matrix whose diagonal elements are $D_{ii} = \sum_{j=1}^{n} W_{ij}$.

Figure 2. The multi-region assumption for outlier removal: the true positive samples in the noisy query sample set reside in high-density regions, while the outliers fall outside them.
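As a minimal sketch, the π‘˜NN affinity matrix of eq. (1) and the degree matrix 𝐷 can be built as follows; the function name and the dense brute-force distance computation are illustrative (a real implementation would use a spatial index for large 𝑛):

```python
import numpy as np

def knn_graph(X, k=20, sigma=1.0):
    """Symmetric kNN affinity following eq. (1):
    W_ij = exp(-d(x_i, x_j)^2 / sigma^2) if i in N_j or j in N_i, else 0."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared Euclidean distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                # k nearest neighbors, excluding i
        W[i, nbrs] = np.exp(-d2[i, nbrs] / sigma ** 2)
    W = np.maximum(W, W.T)                               # realizes 'i in N_j or j in N_i'
    D = np.diag(W.sum(axis=1))                           # degree matrix D = diag(W1)
    return W, D
```

Taking the elementwise maximum with the transpose symmetrizes the neighborhood relation without changing any edge weight, since the Gaussian weight itself is symmetric in 𝑖 and 𝑗.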
For the query-based ranking problem, we define a query
indicator vector (query vector in short) y ∈ ℝ𝑛 with 𝑦𝑖 = 1
if 𝑖 ∈ 𝑄 (𝑄 = [1 : π‘ž] denotes the query index set) and
𝑦𝑖 = 0 otherwise. The π‘ž query samples are supposed to
come from the same class, taking the ground-truth label 1.
The PPageRank algorithm defines a random walk on 𝐺
using the stochastic transition matrix 𝑃 = 𝛼𝐷−1 π‘Š + (1 −
𝛼)1y⊀ /π‘ž (0 < 𝛼 < 1 is the bias parameter), and the
resulting ranking scores constitute a stationary probability
distribution πœ‹ ∈ ℝ𝑛 over 𝑉 decided by the linear system
πœ‹ = 𝑃 ⊀ πœ‹ from which PPageRank solves
πœ‹=
1−𝛼
(𝐼 − π›Όπ‘Š 𝐷−1 )−1 y.
π‘ž
4. Noise Resistant Graph Ranking
In this section, we address a particular ranking scenario
where the query sample set needed by a graph ranker is
noisy. Under this setting, directly applying an existing
graph ranker (PPageRank or MRank) is very likely to fail.
Once the amount of outliers, i.e., those images irrelevant
to user’s interest, exceeds the amount of the relevant ones
in the query set, a graph ranker will end up ranking many
irrelevant samples higher. To attain the goal of noise resistant graph ranking, we propose a spectral filter for automatic
outlier removal.
4.1. Spectral Filter
As a noisy query set may ruin graph ranking, it is necessary to eliminate the outliers as much as possible and feed a
cleaner query set to graph rankers. The previous method Label Diagnosis (LabelDiag) in [21] removed an asserted outlier and simultaneously added an asserted positive sample in
each iteration of a greedy gradient search algorithm, and it
simply set the number of iterations to the half of the query
set size. This method is very likely to bring in more outliers owing to sample addition. Given that adding samples
is risky, we are only concerned about removing outliers. We
desire an outlier filter which can remove the outliers thoroughly and result in a filtered query set that is precise.
As mentioned in [8], the spectrum of the graph Laplacian 𝐿 = 𝐷 − π‘Š has the potential to suppress the noisy
labels in semi-supervised learning. Likewise, we believe
that graph spectrum also has potential in handling the noisy
labels of query samples in ranking. The spectrum of the
π‘˜NN graph 𝐺 = (𝑉, 𝐸, π‘Š ) built in Section 3 is a set of
eigenvalue, eigenvector pairs {(πœ†π‘— , u𝑗 )}𝑛𝑗=1 of the normalized graph Laplacian β„’ = 𝐷−1/2 𝐿𝐷−1/2 , in which the
eigenvalues are sorted in a nondecreasing order such that
u1 represents the lowest frequency eigenvector and u𝑛 represents the highest frequency eigenvector. When the graph
is connected, only one eigenvalue is zero, that is, πœ†1 = 0.
[19] pointed out that when the data points have formed
clusters, each high density region implicitly corresponds to
some low-frequency (smooth) eigenvector which takes rela-
(2)
The MRank algorithm defines an iterative label diffusion
process as follow
𝑓 (𝑑 + 1) = 𝛼𝐷−1/2 π‘Š 𝐷−1/2 𝑓 (𝑑) + (1 − 𝛼)y,
(3)
in which 𝑑 is the time stamp and 𝑓 (𝑑) ∈ ℝ𝑛 receives the
diffused labels at time 𝑑. The solution of MRank at convergence is given by
𝑓 = (1 − 𝛼)(𝐼 − 𝛼𝐷−1/2 π‘Š 𝐷−1/2 )−1 y.
(4)
Interestingly, when replacing π‘Š 𝐷−1 with 𝐷−1/2 π‘Š 𝐷−1/2
and ignoring the constant factors, PPageRank yields the
same analytic solution as MRank. Therefore, we unify
PPageRank and MRank into the single solution:
f = (𝐼 − 𝛼𝑆)−1 y,
(5)
where f saves the final rank scores, and matrix 𝑆 ∈
ℝ𝑛×𝑛 reveals the ranking type. 𝑆 = π‘Š 𝐷−1 represents
PPageRank whereas 𝑆 = 𝐷−1/2 π‘Š 𝐷−1/2 corresponds to
MRank. Actually, the ranking performance of PPageRank
and MRank is very comparable, as will be shown in the later
experiments.
851
Algorithm 1 β„“1 -Ball Projection 𝔹𝑧 ()
tively large absolute values for points in the region (cluster)
and whose values are close to zero elsewhere. Note that
we exclude the first eigenvector u1 because it is nearly constant and does not form clusters. We would assume that
the true positive samples in the query set reside in multiple high-density regions. Note that this “multi-region” assumption makes sense for web image re-ranking since the
in-class images matching user’s textual query may fall in a
few categories due to polysemy. Fig. 2 gives a schematic
illustration of the multi-region assumption.
We only consider π‘š smooth eigenvectors u2 , ⋅ ⋅ ⋅ , uπ‘š+1
associated with π‘š lowest eigenvalues wrapped in Λ =
diag(πœ†2 , ⋅ ⋅ ⋅ , πœ†π‘š+1 ) to explore the high density regions.
Let us inspect the noisy label vector yπ‘ž = 1 that is defined on the noisy query set 𝑄 and contains the first π‘ž entries in vector y. The desired outlier filter aims at refining
yπ‘ž through pruning its unreliable 1 entries and then pro˜ through picking
ducing a smaller yet cleaner query set 𝑄
up the remained 1 entries. Ideally, the exact label vector yπ‘ž∗ takes 1 for the true positive samples whereas 0 for
the outliers. Within the label value space β„π‘ž , we seek π‘š
smooth eigenbases uπ‘ž,2 , ⋅ ⋅ ⋅ , uπ‘ž,π‘š+1 each of which originates from upper π‘ž entries in each eigenvector u𝑗 . According to the multi-region assumption, yπ‘ž∗ should nearly
lie in some subspace spanned by sparse eigenbases out
of uπ‘ž,2 , ⋅ ⋅ ⋅ , uπ‘ž,π‘š+1 (wrap them in the matrix π‘ˆπ‘ž =
[uπ‘ž,2 , ⋅ ⋅ ⋅ , uπ‘ž,π‘š+1 ] ∈ β„π‘ž×π‘š ). Consequently, we formulate
a Spectral Filter (SpecFilter) via sparse eigenbase fitting,
that is, reconstructing the noisy label vector yπ‘ž with sparse
eigenbases:
min βˆ₯π‘ˆπ‘ž a − yπ‘ž βˆ₯2 + 𝜌βˆ₯aβˆ₯1 + 𝛾a⊀ Λa,
a∈β„π‘š
π‘š
Input:
∑π‘š A vector a ∈′ ℝ .
If 𝑗=1 βˆ£π‘Žπ‘— ∣ ≤ 𝑧, a = a;
else
⋅ ⋅ ≥ π‘£π‘š ,
sort ∣a∣ into v such that 𝑣1 ≥ 𝑣2 ≥ ⋅ ∑
find π‘Ÿ = max{𝑗 ∈ [1 : π‘š] : 𝑣𝑗 − 1𝑗 ( 𝑗𝑗 ′ =1 𝑣𝑗 ′ − 𝑧) > 0},
∑
compute πœƒ = 1π‘Ÿ ( π‘Ÿπ‘—=1 𝑣𝑗 −𝑧) and form a′ = [π‘Ž′1 , ⋅ ⋅ ⋅ , π‘Ž′π‘š ]⊀
such that π‘Ž′𝑗 = sign(π‘Žπ‘— ) ⋅ max{βˆ£π‘Žπ‘— ∣ − πœƒ, 0} for 𝑗 ∈ [1 : π‘š].
Output: A vector a′ ∈ β„π‘š .
constraint in eq. (7) via the β„“1 -ball projection operator
𝔹𝑧 (a) = arg min βˆ₯b − aβˆ₯
βˆ₯bβˆ₯1 ≤𝑧
which has been implemented in 𝑂(π‘š log π‘š) [5]. We describe it in Algorithm 1. By leveraging the β„“1 -ball projection operator, the iterative updating rule applied in projected
gradient descent is
a(𝑑 + 1) = 𝔹𝑧 (a(𝑑) − 𝛽𝑑 ∇a π’₯ (a(𝑑), yπ‘ž )) ,
(9)
where 𝑑 denotes the time stamp, 𝛽𝑑 > 0 denotes the appropriate step size, and ∇a π’₯ (a, yπ‘ž ) denotes the gradient
of the cost function π’₯ with respect to a. To expedite the
projected gradient method, we offer it a good starting point
a(0) = (π‘ˆπ‘žβŠ€ π‘ˆπ‘ž + 𝛾Λ)−1 π‘ˆπ‘žβŠ€ yπ‘ž that is the globally optimal
solution to the unconstrained counterpart of eq. (7). Because the dimension of the solution space π‘š is very low in
practice (no more than 40 in our all experiments), the projected gradient method converges fast (no more than 100
iterations throughout our experiments).
Now we are ready to filter out the outliers by pruning the
reconstructed vector π‘ˆπ‘ž a with a being the optimal solution
to eq. (7). We simply obtain the denoised label vector yΜƒπ‘ž =
π‘Ÿπ‘›π‘‘(π‘ˆπ‘ž a) by the rounding function π‘Ÿπ‘›π‘‘ : β„π‘ž → {1, 0}π‘ž
defined as follows
{
1, 𝑣𝑖 ≥ 𝛿 ⋅ max{v}
(π‘Ÿπ‘›π‘‘(v))𝑖 =
(10)
0, otherwise
(6)
∑π‘š
where a is the sparse coefficient vector, βˆ₯aβˆ₯1 = 𝑗=1 βˆ£π‘Žπ‘— ∣
is the β„“1 -norm encouraging sparsity of a, and 𝜌 > 0, 𝛾 > 0
are two regularization parameters. Note that the last term
in
eq. (6) is actually a weighted β„“2 -norm since a⊀ Λa =
∑π‘š
2
𝑗=1 πœ†π‘—+1 π‘Žπ‘— , which imposes that the smoother eigenbases
with smaller πœ†π‘— are preferred in reconstruction of yπ‘ž .
The nature of eq. (6) is a sparse linear regression model
Lasso [20] augmented by weighted β„“2 -regularization. To
control the sparsity of a more conveniently, we convert
eq. (6) to the following equivalent problem as done for
Lasso:
where the parameter 0 < 𝛿 < 1 is properly chosen to prune
the low-valued entries in π‘ˆπ‘ž a which correspond to the unreliable 1 entries (i.e., noisy labels) in yπ‘ž . In order to achieve
thorough outlier removal, we deploy SpecFilter in a consecutive mode. Specifically, we generate a sequence of {yπ‘žπ‘— }
via successive filtering and seek the convergent one as the
ultimate denoised label vector yΜƒπ‘ž . Setting yπ‘ž0 = yπ‘ž , we
launch a successive alternating updating process:
given yπ‘žπ‘— , update a𝑗+1 by
min π’₯ (a, yπ‘ž ) = βˆ₯π‘ˆπ‘ž a − yπ‘ž βˆ₯2 + 𝛾a⊀ Λa
a∈β„π‘š
s.t. βˆ₯aβˆ₯1 ≤ 𝑧
(8)
(7)
a𝑗+1 = arg min π’₯ (a, yπ‘žπ‘— ), 𝑗 = 0, 1, 2, ⋅ ⋅ ⋅ ;
where the sparsity level parameter 𝑧 > 0 maps to the parameter 𝜌 in one-to-one correspondence. Eq. (7) is a convex optimization problem and can thus be solved accurately
by the first-order optimization method Projected Gradient
Descent [4]. To exploit this method, we must handle the β„“1
βˆ₯aβˆ₯1 ≤𝑧
(11)
and given a𝑗+1 , update yπ‘žπ‘—+1 by
yπ‘žπ‘—+1 = π‘Ÿπ‘›π‘‘(π‘ˆπ‘ž a𝑗+1 ), 𝑗 = 0, 1, 2, ⋅ ⋅ ⋅ .
852
(12)
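The core of the procedure above can be sketched compactly: the exact β„“1-ball projection of Algorithm 1 [5], plus one filtering pass that fits eq. (7) by projected gradient descent and rounds the reconstruction as in eq. (10). The function names and parameter defaults are illustrative, and a fixed step size replaces the paper's backtracking line search:

```python
import numpy as np

def l1_ball_projection(a, z):
    """Algorithm 1: Euclidean projection of a onto {b : ||b||_1 <= z}."""
    if np.abs(a).sum() <= z:
        return a.copy()
    v = np.sort(np.abs(a))[::-1]                 # v_1 >= v_2 >= ... >= v_m
    cum = np.cumsum(v)
    j = np.arange(1, len(a) + 1)
    r = np.max(j[v - (cum - z) / j > 0])         # largest j with v_j - (1/j)(sum_{j'<=j} v_j' - z) > 0
    theta = (cum[r - 1] - z) / r
    return np.sign(a) * np.maximum(np.abs(a) - theta, 0.0)

def spec_filter_pass(Uq, lambdas, yq, z, gamma=1.0, delta=0.5, n_iter=100, step=0.01):
    """One SpecFilter pass: solve eq. (7) by projected gradient descent
    (fixed step size, unlike the paper's line search) and round via eq. (10)."""
    A = Uq.T @ Uq + gamma * np.diag(lambdas)
    b = Uq.T @ yq
    a = np.linalg.solve(A, b)                    # warm start: unconstrained optimum
    for _ in range(n_iter):
        grad = 2.0 * (A @ a - b)                 # gradient of ||Uq a - yq||^2 + gamma a^T Lam a
        a = l1_ball_projection(a - step * grad, z)
    recon = Uq @ a
    return (recon >= delta * recon.max()).astype(float)  # rnd(): keep confident positives
```

The successive filtering of eqs. (11)(12) then simply re-runs `spec_filter_pass` on its own output until the binary label vector stops changing.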
We detail the proposed SpecFilter, encompassing the solution of the sparse eigenbase fitting problem eq. (7) and the successive filtering eqs. (11)(12), in Algorithm 2. In our experiments, Algorithm 2 achieves a convergent denoised label vector yΜƒπ‘ž in about 10 iterations, i.e., yΜƒπ‘ž ≅ yπ‘ž10.

Algorithm 2 Spectral Filter (SpecFilter)
Input: A noisy label vector yπ‘ž ∈ β„π‘ž and model parameters 𝛾, 𝑧 > 0, 0 < 𝛿 < 1.
Set 𝛽 = 0.5, πœ‚ = 0.01, πœ– = 10−4;
define functions π’₯(a, yπ‘ž) = βˆ₯π‘ˆπ‘ža − yπ‘žβˆ₯2 + 𝛾a⊀Λa and ∇aπ’₯(a, yπ‘ž) = 2(π‘ˆπ‘žβŠ€π‘ˆπ‘ž + 𝛾Λ)a − 2π‘ˆπ‘žβŠ€yπ‘ž;
initialize yπ‘ž0 = yπ‘ž and 𝑗 = 0;
repeat
  initialize a(0) = (π‘ˆπ‘žβŠ€π‘ˆπ‘ž + 𝛾Λ)−1π‘ˆπ‘žβŠ€yπ‘žπ‘—,
  for 𝑑 = 0, ⋅⋅⋅ do
    a(𝑑+1) := 𝔹𝑧(a(𝑑) − 𝛽𝑑∇aπ’₯(a(𝑑), yπ‘žπ‘—)), where 𝛽𝑑 = π›½πœ” such that πœ” is the smallest nonnegative integer satisfying π’₯(a(𝑑+1), yπ‘žπ‘—) − π’₯(a(𝑑), yπ‘žπ‘—) ≤ πœ‚(∇aπ’₯(a(𝑑), yπ‘žπ‘—))⊀(a(𝑑+1) − a(𝑑)),
    if ∣π’₯(a(𝑑+1), yπ‘žπ‘—) − π’₯(a(𝑑), yπ‘žπ‘—)∣ < πœ– then a𝑗+1 := a(𝑑+1) and break, end if
  end for
  yπ‘žπ‘—+1 := π‘Ÿπ‘›π‘‘(π‘ˆπ‘ža𝑗+1),
  𝑗 := 𝑗 + 1,
until yπ‘žπ‘— converges;
Output: The denoised label vector yΜƒπ‘ž = yπ‘žπ‘—.

We conclude this subsection with two remarks on SpecFilter. Remarks: 1) The idea of sparse eigenbase fitting exploited by SpecFilter implements the multi-region assumption, and the sparsity of eigenbases naturally unveils the multiple high-density regions to which the true positive query samples belong. 2) SpecFilter removes the low-density outliers that correspond to the low-valued entries in yπ‘ž's reconstructed version π‘ˆπ‘ža. Hence, we know which and how many outliers should be removed from the noisy query set.

4.2. Algorithmic Framework

So far, we can integrate the proposed spectral filter into a noise resistant graph ranking framework which ranks the samples with respect to the query samples with noisy labels. We outline this algorithmic framework, termed SpecFilter+MRank, below.

1. Use eq. (1) to build a π‘˜NN graph 𝐺 = (𝑉, 𝐸, π‘Š) on the 𝑛 samples {x𝑖}𝑛𝑖=1, of which the first π‘ž are queries. Compute 𝑆 = 𝐷−1/2π‘Šπ·−1/2. Compute 𝐼 − 𝐷−1/2π‘Šπ·−1/2 and its low-frequency eigenvectors π‘ˆ = [u2, ⋅⋅⋅, uπ‘š+1] corresponding to the π‘š lowest eigenvalues Λ = diag(πœ†2, ⋅⋅⋅, πœ†π‘š+1), excluding πœ†1 = 0.

2. Given the noisy label vector yπ‘ž = 1 ∈ β„π‘ž, run SpecFilter (Algorithm 2) using π‘ˆπ‘ž and Λ to output the denoised label vector yΜƒπ‘ž ∈ β„π‘ž.

3. Compute the rank score vector f = (𝐼 − 𝛼𝑆)−1y based on the query vector y = [yΜƒπ‘ž⊀, 0⊀]⊀ ∈ ℝ𝑛.

One can also obtain a slightly different framework, SpecFilter+PPageRank, by substituting π‘Šπ·−1 for 𝑆.

5. Experiments

We evaluate the proposed noise resistant ranking framework on two public web image databases: the Fergus dataset [6] and the INRIA dataset [15], which contain 4,091 and 71,478 web images, respectively. Given a text query, each image across the two datasets has an initial ranking score from a web search engine and a ground-truth label indicating whether it is relevant to the query.

For visual feature extraction of web images, we adopt Locality-constrained Linear Coding (LLC) [22] to obtain image representations, which have demonstrated state-of-the-art performance. In detail, we use SIFT descriptors [16] computed from 16×16-pixel image patches densely sampled with a step size of 8 pixels. The images were all preprocessed into gray scale, and a pre-trained codebook with 1024 bases¹ was used to generate the LLC codes. Following the Spatial Pyramid Matching (SPM) scheme [16], we used 1×1 and 2×2 sub-regions for LLC and max-pooled the LLC codes in each sub-region. Finally, the pooled features from all sub-regions were concatenated and β„“2-normalized to form the final (1024×5)-dimensional image feature representation.

As the working set to be re-ranked for each text query across the two datasets has at most 1000 images, we keep a small number of eigenbases to run SpecFilter. Specifically, we use π‘š = 40 eigenbases for the Fergus dataset, since its images are more diverse, and π‘š = 20 eigenbases for the INRIA dataset. Accordingly, we set 𝛾 = 1, 𝑧 = 6, 𝛿 = 0.5 for the Fergus dataset and 𝛾 = 1, 𝑧 = 3, 𝛿 = 0.5 for the other. We build a 20NN graph on the working set for each text query with π‘Š in eq. (1) defined by the Euclidean distance. We compare six graph-based ranking methods, namely EigenFunc [8]², LabelDiag [21], PPageRank [13], MRank [24], and our proposed SpecFilter+PPageRank and SpecFilter+MRank, using the same LLC features. To compare them with the existing re-ranking methods, we follow the same evaluation settings and metrics as [6] and [15] on the Fergus dataset and the INRIA dataset, respectively.

¹ Courtesy of http://www.ifp.illinois.edu/~jyang29/LLC.htm
² This is the graph-based semi-supervised learning method Eigenfunction. Here we use its one-class version for ranking.

5.1. Fergus Dataset

On this benchmark, we report the precision at 15% recall of the raw search engine (Google) and ten re-ranking methods for seven object categories in Table 1. All
of the six graph-based ranking methods work on an initial noisy query set comprised of the top-50 images returned by Google for each category. In terms of mean precision, Table 1 exhibits SpecFilter+MRank > SpecFilter+PPageRank > LabelDiag > MRank > PPageRank = EigenFunc, which testifies that outlier removal is vital to graph ranking faced with noisy queries. Among the six methods, the proposed SpecFilter+MRank is the best, improving 86% over Google in mean precision. Such an improvement is substantial.

We attribute the success of SpecFilter+MRank to SpecFilter, which produces a cleaner query set than both the initial noisy query set harvested from Google and the filtered query set achieved by LabelDiag. Additionally, we find that SpecFilter using sparse eigenbases is superior to SpecFilter using dense eigenbases (all π‘š eigenbases). We plot the query set precision of LabelDiag and the two versions of SpecFilter in Fig. 3, which discloses that SpecFilter using sparse eigenbases consistently produces the cleanest query set across the seven object categories. The precision of the filtered query set of LabelDiag is almost the same as that of Google, so we can say that LabelDiag brings in outliers during sample addition although it filters out some outliers.

We also report the precision of two topic models, TSI-pLSA [6] and LDA [10], as well as two supervised models, SVM [18] and logistic regression (LogReg) [15], in Table 1. Our approach SpecFilter+MRank surpasses the topic model TSI-pLSA and the two supervised models by a large margin in mean precision. Although the topic model LDA achieves a higher mean precision than our approach (higher precision except on 'cars'), it seems to prefer object-like text queries, since its primary purpose is object detection, not web image re-ranking. In contrast, our re-ranking approach performs well for diversified text queries, which is verified on the larger INRIA dataset.

Table 1. Comparisons on Fergus dataset. Ranking precision at 15% recall corresponding to seven object categories: airplane, cars (rear), face, guitar, leopard, motorbike, and wrist watch. For each column, the two best results are shown in bold. (Two cells, marked "–", were lost in extraction.)

Precision (%)          airplane  cars rear  face  guitar  leopard  motorbike  wrist watch  Mean
Google                    50        41       19     31      41        46          70        43
SVM [18]                  35        55        –     29      50        63          93        54
LogReg [15]               65        77       72     28      44        49          79        56
TSI-pLSA [6]              57        83       82     50      59        72          88        69
LDA [10]                 100        94      100     91      65        97         100        91
EigenFunc [8]             60         –       47     36      21        48          73        54
LabelDiag [21]            50        54       68     33      42        79          83        58
PPageRank [13]            32        53       64     32      48        79          72        54
MRank [24]                39        53       66     32      50        79          75        56
SpecFilter+PPageRank      80        94       75     61      47        79          97        76
SpecFilter+MRank          86       100       75     58      63        79         100        80

Figure 3. Precision of visual query sets corresponding to seven object categories in Fergus dataset. Both LabelDiag and SpecFilter work on initial noisy query sets composed of the top-50 images from Google and yield filtered query sets. Compared: Google top-50, LabelDiag, SpecFilter (dense eigenbases), and SpecFilter (sparse eigenbases).

5.2. INRIA Dataset

We also test our re-ranking approach on the recently released INRIA dataset, which contains a diversified group of up to 353 text queries, including object names ('cloud', 'flag', 'car'), celebrity names ('jack black', 'will smith', 'dustin hoffman'), and abstract terms ('tennis course', 'golf course', '4x4'). We report the mean average precision (MAP) of the supervised model LogReg and the six graph-based ranking methods including ours over the 353 text queries in Table 2. SpecFilter+MRank is still the best among the six graph-based methods when using the top-100 images returned by the raw search engine as a pseudo visual query set, improving 29% over the raw search engine in MAP. Moreover, we list the precision of the filtered query sets achieved by LabelDiag and SpecFilter in Table 3, and conclude that SpecFilter using sparse eigenbases consistently yields the cleanest visual query sets.

It is important to note that the supervised model LogReg using both textual and visual features is even inferior to the baselines PPageRank and MRank, which confirms our suspicion that supervised models tend to overfit the training data. In contrast, our re-ranking approach needs neither large training sets nor accurate labels, applying to unseen
text queries adaptively. Remarkably, SpecFilter+MRank using only visual features improves by 10% over LogReg using hybrid textual+visual features in MAP.

Some re-ranked image lists are displayed in Fig. 6, which shows that SpecFilter+MRank outperforms the raw search engine and MRank consistently. For the text query 'jack black', the top images ranked by the search engine suffer seriously from polysemy, as 'jack black' is semantically ambiguous with the 'Black Jack' card game. As such, MRank performs worse than the search engine because the initial noisy query set contains many outliers from the card game, whereas SpecFilter+MRank is resistant to the outliers owing to SpecFilter. We further study the effect of search engine result quality on our approach by choosing the 20 text queries having the best search engine average precision (AP) and the 20 text queries having the worst AP. Fig. 4 shows AP for the 20 best queries and Fig. 5 shows AP for the 20 worst queries; SpecFilter+MRank is more robust to search engine result quality than MRank, while MRank works worse on some queries for which the search engine works poorly.

Figure 4. Average precision of ranking for the 20 best text queries in INRIA dataset. Initial noisy query sets are composed of the top-100 images from the search engine. Compared: Search Engine, MRank, and SpecFilter+MRank.

Figure 5. Average precision of ranking for the 20 worst text queries in INRIA dataset. Initial noisy query sets are composed of the top-100 images from the search engine. Compared: Search Engine, MRank, and SpecFilter+MRank.

Table 2. INRIA dataset: mean average precision (MAP) of ranking over 353 text queries with visual query sets of different sizes. The first four methods do not use the pseudo visual query set, so their MAP is independent of its size.

MAP (%)                 top-20 images  top-50 images  top-100 images
Search Engine               56.99          56.99          56.99
LogReg (textual) [15]       57.00          57.00          57.00
LogReg (visual) [15]        64.90          64.90          64.90
LogReg (t+v) [15]           67.30          67.30          67.30
EigenFunc                   48.49          45.38          41.08
LabelDiag                   69.51          70.12          69.68
PPageRank                   69.31          69.09          68.86
MRank                       69.80          69.46          69.04
SpecFilter+PPageRank        71.17          71.35          71.35
SpecFilter+MRank            72.75          73.58          73.76

Table 3. INRIA dataset: mean precision of visual query sets over 353 text queries. Both LabelDiag and SpecFilter work on initial noisy query sets from the search engine and yield filtered query sets.

Query Set Precision (%)          top-20 images  top-50 images  top-100 images
Search Engine                        63.35          56.94          50.91
LabelDiag                            63.06          56.82          50.91
SpecFilter (dense eigenbases)        68.54          60.28          51.20
SpecFilter (sparse eigenbases)       73.51          62.83          54.30

Figure 6. INRIA dataset: in each subfigure, the first row shows the top-20 images ranked by the raw search engine, the second row those ranked by MRank, and the third row those ranked by SpecFilter+MRank; the initial noisy query set is composed of the top-100 images from the search engine.

6. Conclusions

The conclusions drawn by this paper are three-fold. First, a filtered visual query set produced by the proposed spectral filter, a core component of our graph ranking framework, boosts the performance of existing graph rankers in terms of re-ranked image search results. Second, we find that web image search re-ranking can be addressed using image content alone, and that our re-ranking approach using visual features surpasses supervised re-ranking models relying on multiple cues, including surrounding text and image metadata in addition to visual features [15]. Finally, compared to alternative re-ranking approaches, the effect of search engine result quality (e.g., the polysemy issue shown in the last example of Fig. 6) is less significant on our proposed approach.

Acknowledgement

This work has been supported in part by Eastman Kodak Research, NSF (CNS-07-51078), and ONR (N00014-10-1-0242).

References

[1] N. Ben-Haim, B. Babenko, and S. Belongie. Improving web-based image search via content based clustering. In SLAM Workshop of CVPR, 2006.
[2] T. Berg and A. Berg. Finding iconic images. In the 2nd
Internet Vision Workshop of CVPR, 2009.
[3] T. Berg and D. Forsyth. Animals on the web. In Proc. CVPR,
2006.
[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.
[5] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the β„“1 -ball for learning in high dimensions. In Proc. ICML, 2008.
[6] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. Learning object categories from google’s image search. In Proc.
ICCV, 2005.
[7] R. Fergus, P. Perona, and A. Zisserman. A visual category
filter for google images. In Proc. ECCV, 2004.
[8] R. Fergus, Y. Weiss, and A. Torralba. Semi-supervised learning in gigantic image collections. In NIPS 22, 2010.
[9] D. Fogaras, B. Rácz, K. Csalogány, and T. Sarlós. Towards
scaling fully personalized pagerank: Algorithms, lower
bounds, and experiments. Internet Mathematics, 2(3):333–
358, 2005.
[10] M. Fritz and B. Schiele. Decomposition, discovery and detection of visual categories using topic models. In Proc.
CVPR, 2008.
[11] D. Grangier and S. Bengio. A discriminative kernel-based
model to rank images from text queries. IEEE Trans. PAMI,
30(8):1371–1384, 2008.
[12] W. Hsu, L. Kennedy, and S.-F. Chang. Reranking methods
for visual search. IEEE MultiMedia, 14(3):14–22, 2007.
[13] Y. Jing and S. Baluja. Visualrank: Applying pagerank to
large-scale image search. IEEE Trans. PAMI, 30(11):1877–
1890, 2008.
[14] L. Kennedy and M. Naaman. Generating diverse and representative image search results for landmarks. In Proc. WWW,
2008.
[15] J. Krapac, M. Allan, J. Verbeek, and F. Jurie. Improving web
image search results using query-relative classifiers. In Proc.
CVPR, 2010.
[16] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of
features: Spatial pyramid matching for recognizing natural
scene categories. In Proc. CVPR, 2006.
[17] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering:
Analysis and an algorithm. In NIPS 14, 2002.
[18] F. Schroff, A. Criminisi, and A. Zisserman. Harvesting image databases from the web. In Proc. ICCV, 2007.
[19] T. Shi, M. Belkin, and B. Yu. Data spectroscopy: Eigenspaces of convolution operators and clustering. The Annals of Statistics, 37(6B):3960–3984, 2009.
[20] R. Tibshirani. Regression shrinkage and selection via the
lasso. Journal of the Royal Statistical Society Series B,
58(1):267–288, 1996.
[21] J. Wang, Y.-G. Jiang, and S.-F. Chang. Label diagnosis
through self tuning for web image search. In Proc. CVPR,
2009.
[22] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong.
Locality-constrained linear coding for image classification.
In Proc. CVPR, 2010.
[23] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf.
Learning with local and global consistency. In NIPS 16,
2004.
[24] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and
B. Schölkopf. Ranking on data manifolds. In NIPS 16, 2004.