Using Large-Scale Web Data to Facilitate
Textual Query Based Retrieval of
Consumer Photos
Motivation
• Digital cameras and mobile phone cameras are becoming popular rapidly:
– More and more personal photos;
– Retrieving images from enormous collections of personal photos becomes a more and more important topic.
How to retrieve them?
Prior Work: CBIR
• Content-Based Image Retrieval (CBIR):
– Users provide images as queries to retrieve personal photos.
• The paramount challenge -- the semantic gap:
– The gap between the low-level visual features and the high-level semantic concepts.
[Figure: a query image with a high-level concept is mapped to a low-level feature vector and compared against the feature vectors in the DB; the semantic gap lies between the two levels.]
Prior Work: Image Annotation
• Image annotation is used to classify images w.r.t. high-level semantic concepts.
• It is more convenient for the user to retrieve the desirable personal photos using textual queries.
• Image annotation is an intermediate stage for textual query based image retrieval.
– Semantic concepts are analogous to the textual terms describing document contents.
[Figure: photos are annotated with high-level concepts and stored in a database; a textual query such as "Sunset" is compared against the annotation results to retrieve matching photos.]
Idea
• Leverage information from web images to retrieve consumer photos in personal photo collections.
• Web images are accompanied by tags, categories and titles.
– Google and Flickr exploit them to index web images.
• But raw consumer photos from digital cameras do not contain such semantic textual descriptions.
[Figure: web images come with contextual information ("building", "people, family", "people, wedding", "sunset", …), while consumer photos do not; no intermediate image annotation process is needed.]
Framework
[Framework diagram: a Textual Query drives Automatic Web Image Retrieval (guided by WordNet) over a Large Collection of Web Images (with descriptive words), yielding Relevant/Irrelevant Images; a Classifier trained on them performs Consumer Photo Retrieval over the Raw Consumer Photos to produce Top-Ranked Consumer Photos; Relevance Feedback then yields Refined Top-Ranked Consumer Photos.]
• When a user provides a textual query, it is used to find relevant/irrelevant images in web image collections.
• Then, a classifier is trained based on these web images.
• The consumer photos can then be ranked based on the classifier's decision value.
• And the user can also provide relevance feedback to refine the retrieval results.
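The four steps above can be sketched end to end as follows. This is a toy illustration only: the substring tag match stands in for the WordNet-based retrieval, and the nearest-centroid scorer stands in for the real stump-ensemble classifier; all function names are assumptions, not from the poster.

```python
import numpy as np

def retrieve_consumer_photos(query, web_images, consumer_photos, top_k=2):
    """web_images: list of (tag_string, feature_vector) pairs.
    consumer_photos: array of feature vectors, one row per photo."""
    # Step 1: split web images into relevant/irrelevant by the query word.
    relevant = np.array([f for tags, f in web_images if query in tags])
    irrelevant = np.array([f for tags, f in web_images if query not in tags])
    # Step 2: "train" a classifier (toy stand-in: nearest-centroid direction).
    direction = relevant.mean(axis=0) - irrelevant.mean(axis=0)
    # Step 3: rank consumer photos by the classifier's decision value.
    scores = consumer_photos @ direction
    return np.argsort(-scores)[:top_k]
```

Step 4 (relevance feedback) would then relabel some of the returned photos and retrain, as described in the later slides.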
Automatic Web Image Retrieval
[Figure: for the query "boat", an inverted file maps "boat" and its descendants (ark, barge, dredger, houseboat) to the relevant web images; web images without these words are the irrelevant web images.]
Semantic Word Trees
Based on WordNet
• For the user's textual query, first search it in the two-level semantic word trees.
• The web images containing the query word and its descendants are considered as "relevant web images".
• The web images which do not contain the query word are considered as "irrelevant web images".
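A minimal sketch of this relevant/irrelevant split, assuming the two-level semantic word trees have already been flattened into a word → descendants mapping (the toy tree below is illustrative, not the actual WordNet data):

```python
# Toy stand-in for the two-level semantic word trees built from WordNet.
word_tree = {"boat": {"ark", "barge", "dredger", "houseboat"}}

def split_web_images(query, web_images, tree):
    """web_images: list of (image_id, tag_set) pairs.
    Returns (relevant_ids, irrelevant_ids)."""
    # Expand the query with its descendants in the semantic word tree.
    expanded = {query} | tree.get(query, set())
    relevant = [i for i, tags in web_images if tags & expanded]
    irrelevant = [i for i, tags in web_images if not (tags & expanded)]
    return relevant, irrelevant
```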
Classifier Training
[Figure: each training set pairs the relevant web images (positive samples) with a random sample of irrelevant web images (negative samples); decision stumps (ds) trained on each set are combined into the classifier f s(x).]
• Construct 100 smaller training sets:
– Negative Samples: Randomly sample a fixed number of irrelevant web images, repeated 100 times;
– Positive Samples: The relevant web images.
• Based on each training set, train decision stumps on each dimension.
• Finally, linearly combine all decision stumps based on their training errors.
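A NumPy sketch of this bagged stump training. The exhaustive threshold search and the 1 − error weighting are illustrative assumptions; the poster only says the stumps are combined "based on their training errors".

```python
import numpy as np

rng = np.random.default_rng(0)

def train_stump(x, y):
    """Best threshold/sign decision stump on one feature column x."""
    best = (1.0, 1, 0.0)  # (error, sign, theta)
    for theta in np.unique(x):
        for sign in (1, -1):
            err = np.mean(np.sign(sign * (x - theta) + 1e-12) != y)
            if err < best[0]:
                best = (err, sign, theta)
    return best

def train_bagged_stumps(pos, neg, n_bags=100, bag_size=None):
    """pos/neg: feature arrays of relevant / irrelevant web images.
    Returns a list of (weight, dim, sign, theta) stumps over all bags."""
    bag_size = bag_size or len(pos)
    stumps = []
    for _ in range(n_bags):
        # Negative samples: a random draw of irrelevant web images.
        sampled = neg[rng.choice(len(neg), size=bag_size, replace=True)]
        X = np.vstack([pos, sampled])
        y = np.hstack([np.ones(len(pos)), -np.ones(bag_size)])
        # One stump per feature dimension, weighted by training accuracy.
        for d in range(X.shape[1]):
            err, sign, theta = train_stump(X[:, d], y)
            stumps.append((1.0 - err, d, sign, theta))
    return stumps
```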
Relevance Feedback via
Cross-Domain Regularized Regression
• For the user-labeled images x1,…,xl: f T(x) should be close to +1 (labeled as positive) or −1 (labeled as negative).
• For the other images: f T(x) should be close to f s(x).
• A regularizer controls the complexity of the target classifier f T(x).
• Design a target linear classifier f T(x) = wTx.
• This problem can be solved with a least-squares solver.
Source Classifiers
• Decision Stump Ensemble:
– Trained on each dimension for each bag;
– Decision values are fused after a sigmoid mapping: fd(x) = ∑i γid h(sid(xd−θid));
– Pros:
• Non-linear;
• Easy to parallelize;
– Cons:
• Testing is time-consuming.
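An illustrative evaluation of this fused ensemble, taking h to be the logistic sigmoid (an assumption; the poster does not define h). It makes the cost concern concrete: one `exp` call per stump, i.e. N·D calls per test image.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def eval_stump_ensemble(x, stumps):
    """f_d(x) = sum_i gamma_i * h(s_i * (x[d_i] - theta_i)).
    stumps: list of (gamma, d, s, theta). Non-linear, but every stump
    costs one exp call, which is what makes testing time-consuming."""
    return sum(g * sigmoid(s * (x[d] - theta)) for g, d, s, theta in stumps)
```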
Accelerating Source Classifiers
• One possible solution:
– Remove the sigmoid mapping:
• fd(x) = ∑i γid sid(xd−θid) = (∑i γid sid)xd − (∑i γid sidθid);
• Assume there are N bags and D dimensions:
– Testing complexity: O(ND) --> O(D)
– Cons:
• The classifier becomes linear -- too weak.
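A sketch of the collapse above: with the sigmoid removed, all stumps fold into one weight vector plus a bias, so testing is a single dot product (O(D)). Function names are illustrative.

```python
import numpy as np

def collapse_to_linear(stumps, D):
    """Fold f_d(x) = sum_i g_i s_i (x_d - theta_i) into
    w_d = sum_i g_i s_i  and  b = -sum_i g_i s_i theta_i,
    so the whole ensemble is evaluated as w @ x + b."""
    w = np.zeros(D)
    b = 0.0
    for g, d, s, theta in stumps:
        w[d] += g * s
        b -= g * s * theta
    return w, b
```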
Accelerating Source Classifiers
• Another possible solution:
– Use a linear SVM instead of the decision stump ensemble:
• Train 1 linear SVM classifier for each bag;
• Fuse the decision values with a sigmoid mapping;
– Pros:
• Fewer bags will likely suffice to achieve a satisfying retrieval precision;
• Although the testing complexity is still O(ND), there are far fewer ``exp'' function calls (ND --> N);
• Each individual classifier is just a vector dot product, which can be computed efficiently with SIMD instructions.
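A sketch of this fusion step, assuming the N per-bag linear SVMs have already been trained (their weights stacked into a matrix). The single matrix-vector product is the SIMD-friendly part; only N exp calls remain, and the uniform averaging of the sigmoid outputs is an assumed fusion rule.

```python
import numpy as np

def fuse_linear_svms(x, W, b):
    """W: (N, D) weights of N per-bag linear SVMs; b: (N,) biases.
    One matrix-vector product gives all N decision values;
    the sigmoid fusion then needs only N exp calls."""
    decisions = W @ x + b
    return np.mean(1.0 / (1.0 + np.exp(-decisions)))
```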
Comparison on Time Cost
Performance Comparison
Relevance Feedback
[Figure: in relevance feedback, the user labels retrieved photos as positive (+1) or negative (−0.1).]
Error Rate Refinement during RF
• Assume that there are M training data, of which E instances are incorrectly classified:
– err_rate = E / M;
• For f s(x), when the user labels one instance x as y ∈ {−1, +1}:
– If f s(x) = y, then
• err_rate = E / (M + α);
– If f s(x) = −y, then
• err_rate = (E + α) / (M + α).
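The update rule above as a one-step function (a direct transcription; the default α = 1 is an assumption):

```python
def refine_err_rate(E, M, fs_pred, y, alpha=1.0):
    """Refine the error-rate estimate of the source classifier after the
    user labels one instance: a correct prediction (fs_pred == y) grows
    only the denominator; an incorrect one grows both numerator and
    denominator, so the estimate rises."""
    if fs_pred == y:
        return E / (M + alpha)
    return (E + alpha) / (M + alpha)
```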
The End