Multimedia Retrieval
Outline
• Overview: Indexing Multimedia
• Generative Models & MMIR
– Probabilistic Retrieval
– Language models, GMMs
• Experiments
– Corel experiments
– TREC Video benchmark
Indexing Multimedia
A Wealth of Information
[Diagram: a video database combines many information sources – speech, audio, images, and temporal composition.]
Associated Information
[Diagram: a player profile as associated information – id, name, gender, country, picture, history, biography.]
User Interaction
[Diagram: the user interaction loop with the database – the user poses a query (text, video segments), gives examples, views results, and evaluates them (feedback).]
Indexing Multimedia
• Manually added descriptions
– ‘Metadata’
• Analysis of associated data
– Speech, captions, OCR, …
• Content-based retrieval
– Approximate retrieval
– Domain-specific techniques
Limitations of Metadata
• Vocabulary problem
– Dark vs. somber
• Different people describe different
aspects
– Dark vs. evening
Limitations of Metadata
• Encoding Specificity Problem
– A single person describes different aspects
in different situations
• Many aspects of multimedia simply
cannot be expressed unambiguously
– Processes in left (analytic, verbal) vs. right
brain (aesthetics, synthetic, nonverbal)
Approximate Retrieval
• Based on similarity
– Find all objects that are similar to this one
– Distance function
– Representations capture some (syntactic)
meaning of the object
• ‘Query by Example’ paradigm
[Diagram: query-by-example pipeline – feature extraction maps the query image into an N-dimensional space of low-level features, followed by ranking and display.]
So, … Retrieval?!
IR is about satisfying vague information needs provided by users (imprecisely specified in ambiguous natural language) by matching them approximately against information provided by authors (specified in the same ambiguous natural language).
Smeaton
No ‘Exact’ Science!
• Evaluation is not done analytically, but
experimentally
– real users (specifying requests)
– test collections (real document collections)
– benchmarks (TREC: text retrieval conference)
– Precision
– Recall
– ...
Known Item
[Figure: two example known-item queries with their ranked result lists.]
Semantic gap…
[Diagram: the semantic gap – the open question ('?') of mapping from features, extracted from raw multimedia data, up to concepts.]
Observation
• Automatic approaches are successful
under two conditions:
– the query example is derived from the
same source as the target objects
– a domain-specific detector is at hand
1. Generic Detectors
Retrieval Process
[Diagram: the query is parsed into a query type, nouns, and adjectives; these drive detector/feature selection – camera operations, people and names, invariant color spaces, natural/physical objects, parameterized detectors – followed by filtering and ranking against the database.]
Example
[Figure: Topic 41 – the query text triggers the people detector (<1, 2, 3, many>), yielding the results.]
Query Parsing
Find the query type, nouns, and adjectives (+ names), e.g. in:
'Other examples of overhead zooming in views of canyons in the Western United States'
Detectors
Starting from 'the universe and everything', detectors FOCUS on:
• Camera operations (pan, zoom, tilt, …)
• People (face based)
• Names (VideoOCR)
• Natural objects (color space selection)
• Physical objects (color space selection)
• Monologues (specifically designed)
• Press conferences (specifically designed)
• Interviews (specifically designed)
• Domain-specific detectors
2. Domain knowledge
Player Segmentation
[Figure: original image, initial segmentation, and final segmentation.]
Advanced Queries
Show clips from tennis matches,
starring Sampras,
playing close to the net;
3. Get to know your users
Mirror Approach
• Gather User’s Knowledge
– Introduce semi-automatic processes for selection
and combination of feature models
• Local Information
– Relevance feedback from a user
• Global Information
– Thesauri constructed from all users
[Diagram: extended pipeline – feature extraction into an N-dimensional space of low-level features, clustering into concepts supported by thesauri, then ranking and display.]
Identify Groups
Representation
• Groups of feature vectors are
conceptually equivalent to words in text
retrieval
• So, techniques from text retrieval can
now be applied to multimedia data as if
these were text!
Query Formulation
• Clusters are internal representations,
not suited for user interaction
• Use automatic query formulation based
on global information (thesaurus) and
local information (user feedback)
Interactive Query Process
• Select relevant clusters from thesaurus
• Search collection
• Improve results by adapting the query
– Remove clusters occurring in irrelevant images
– Add clusters occurring in relevant images
Assign Semantics
Visual Thesaurus
[Figure: example clusters – Glcm_47, a correct cluster representing 'Tree', 'Forest'; Fractal_23, an 'incoherent' cluster; Gabor_20, a mis-labeled cluster.]
Learning
• Short-term: Adapt query to better reflect
this user’s information need
• Long-term: Adapt thesaurus and
clustering to improve system for all
users
Thesaurus Only
After Feedback
4. Nobody is unique!
Collaborative Filtering
• Also: social information filtering
– Compare user judgments
– Recommend differences between similar users
• People’s tastes are not randomly distributed
• You are what you buy (Amazon)
Collaborative Filtering
• Benefits over content-based approach
– Overcomes problems with finding suitable
features to represent e.g. art, music
– Serendipity
– Implicit mechanism for qualitative aspects
like style
• Problems: large groups, broad domains
5. Ask for help
Query Articulation
[Diagram: feature extraction into an N-dimensional space – but how to articulate the query, and what is the query semantics?]
Details matter
Problem Statement
• Feature vectors capture ‘global’ aspects of
the whole image
• Overall image characteristics dominate the
feature-vectors
• Hypothesis: users are interested in details
Irrelevant Background
[Figure: a query image and a result matching mainly on the irrelevant background.]
Image Spots
• Image-spots articulate desired image details
– Foreground/background colors
– Colors forming ‘shapes’
– Enclosure of shapes by background colors
• Multi-spot queries define the spatial relations
between a number of spots
Query Images and Results
[Figure: retrieval ranks of the target images for four query images, compared across four methods – Hist, 16Hist, Spot, and Spot+Hist.]
A: Simple Spot Query – 'Black sky'
B: Articulated Multi-Spot Query – 'Black sky' above 'Monochrome ground'
C: Histogram Search in 'Black sky' images
Complicating Factors
• What are Good Feature Models?
• What are Good Ranking Functions?
• Queries are Subjective!
Probabilistic Approaches
Generative Models…
• A statistical model for generating data
– Probability distribution over samples in a given 'language', aka 'Language Modelling'
[Diagram: a model M assigns probabilities to observations – P(sample | M), possibly conditioned on previously generated samples.]
© Victor Lavrenko, Aug. 2002
… in Information Retrieval
• Basic question:
– What is the likelihood that this document is
relevant to this query?
• P(rel|I,Q) = P(I,Q|rel)P(rel) / P(I,Q)
• P(I,Q|rel) = P(Q|I,rel)P(I|rel)
‘Language Modelling’
• Not just ‘English’!
• But also, the language of
– an author (Hiemstra or Robertson?)
– a newspaper (Guardian or Times?)
– a text document
– an image (which of two example images?)
• ‘Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing.’
Unigram and higher-order models
P(q1 q2 q3 q4) = P(q1) P(q2 | q1) P(q3 | q1 q2) P(q4 | q1 q2 q3)
• Unigram Models
P(q1) P(q2) P(q3) P(q4)
• N-gram Models (e.g. bigram)
P(q1) P(q2 | q1) P(q3 | q2) P(q4 | q3)
• Other Models
– Grammar-based models, etc.
– Mixture models
The fundamental problem
• Usually we don’t know the model M
– But have a sample representative of that model
P(observation | M(sample))
• First estimate a model from the sample
• Then compute the observation probability
Indexing: determine models
[Diagram: documents are mapped to models.]
• Indexing
– Estimate Gaussian Mixture Models from images using EM
– Based on feature vectors with colour, texture and position information from pixel blocks
– Fixed number of components
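The EM estimation step above can be sketched in miniature. This is a toy 1-D, two-component version in plain NumPy (the slides use multi-dimensional colour/texture/position features and a fixed, larger component count); the synthetic data, initial values, and iteration count are illustrative assumptions, not values from the slides.

```python
# Minimal EM for a 2-component 1-D Gaussian mixture (toy version of the
# indexing step; real indexing uses multi-dimensional block features).
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 1.0, 200)])

K = 2
w = np.full(K, 1.0 / K)        # mixing weights
mu = np.array([-1.0, 1.0])     # initial component means
var = np.ones(K)               # initial component variances

for _ in range(50):
    # E-step: responsibility of each component for each sample
    dens = (w / np.sqrt(2 * np.pi * var)
            * np.exp(-(data[:, None] - mu) ** 2 / (2 * var)))
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities
    n_k = resp.sum(axis=0)
    w = n_k / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / n_k
    var = (resp * (data[:, None] - mu) ** 2).sum(axis=0) / n_k

print(np.sort(mu))  # means recovered near the true -2 and 3
```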
Retrieval: use query likelihood
• Query: a set of samples drawn from the query image
• Which of the models is most likely to generate these 24 samples?
Probabilistic Image Retrieval
[Diagram: rank candidate images by P(Q|M), comparing P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4).]
Topic Models
[Diagram: the query is scored against topic models (P(Q|M1) … P(Q|M4)); the resulting query model QM then scores documents (P(D1|QM) … P(D4|QM)).]
Probabilistic Retrieval Model
• Text
– Rank using probability of drawing query terms
from document models
• Images
– Rank using probability of drawing query blocks
from document models
• Multi-modal
– Rank using joint probability of drawing query
samples from document models
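Ranking by generation probability can be sketched as follows. For brevity each "document model" here is a single diagonal Gaussian rather than a full mixture, and the document names and data are made up for illustration.

```python
# Rank document models by the (log-)likelihood of generating the query samples.
import numpy as np

def log_likelihood(samples, mean, var):
    # Sum of independent diagonal-Gaussian log-densities over all samples
    return np.sum(-0.5 * (np.log(2 * np.pi * var)
                          + (samples - mean) ** 2 / var))

rng = np.random.default_rng(1)
query = rng.normal([0.0, 0.0], 0.5, size=(24, 2))   # 24 query samples

docs = {
    'doc1': (np.array([0.0, 0.0]), np.array([1.0, 1.0])),   # matches the query
    'doc2': (np.array([5.0, 5.0]), np.array([1.0, 1.0])),   # far from the query
}
ranked = sorted(docs, key=lambda d: log_likelihood(query, *docs[d]),
                reverse=True)
print(ranked)  # doc1 ranks first
```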
Text Models
• Unigram Language Models (LM)
– Urn metaphor
• P(query) ~ P(q1) P(q2) P(q3) P(q4) = 4/9 * 2/9 * 4/9 * 3/9
Generative Models and IR
• Rank models (documents) by probability of generating the query
• Q: four query terms
• P(Q | D1) = 4/9 * 2/9 * 4/9 * 3/9 = 96/9⁴
• P(Q | D2) = 3/9 * 3/9 * 3/9 * 3/9 = 81/9⁴
• P(Q | D3) = 2/9 * 3/9 * 2/9 * 4/9 = 48/9⁴
• P(Q | D4) = 2/9 * 5/9 * 2/9 * 2/9 = 40/9⁴
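The urn computation above can be reproduced numerically: a document is a bag of 9 term occurrences and the query likelihood is the product of per-term draw probabilities. The terms are abstract symbols and the helper name is made up.

```python
# Query likelihood under a maximum-likelihood unigram model (urn metaphor).
from collections import Counter

def query_likelihood(query, doc_terms):
    counts = Counter(doc_terms)
    n = len(doc_terms)
    p = 1.0
    for t in query:
        p *= counts[t] / n
    return p

doc = ['a'] * 4 + ['b'] * 2 + ['c'] * 3   # 9 terms in the "urn"
print(query_likelihood(['a', 'b', 'a', 'c'], doc))  # 4/9 * 2/9 * 4/9 * 3/9
```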
The Zero-frequency Problem
• Suppose some event not in our example
– Model will assign zero probability to that event
– And to any set of events involving the unseen
event
• Happens frequently with language
• It is incorrect to infer zero probabilities
– Especially when dealing with incomplete samples
Smoothing
• Idea: shift part of probability mass to
unseen events
• Interpolation with background (General
English)
– Reflects expected frequency of events
– Plays role of IDF
– λ·P(t|D) + (1-λ)·P(t|C)
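The interpolation can be sketched as a minimal Jelinek-Mercer-style smoother: the document model is mixed with the background collection model so unseen terms no longer get zero probability. The λ = 0.8 value and toy texts are illustrative assumptions.

```python
# Linear interpolation smoothing: lam * P(t|D) + (1 - lam) * P(t|C).
from collections import Counter

def smoothed_prob(term, doc, collection, lam=0.8):
    d, c = Counter(doc), Counter(collection)
    p_doc = d[term] / len(doc)            # document model
    p_col = c[term] / len(collection)     # background (collection) model
    return lam * p_doc + (1 - lam) * p_col

doc = ['cat', 'sat', 'mat']
collection = doc + ['dog', 'ran', 'far', 'dog']
print(smoothed_prob('dog', doc, collection))  # unseen in doc, still > 0
```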
Hierarchical Language Model
• MNM: smoothed over multiple levels
α·P(T|Shot) + β·P(T|‘Scene’) + γ·P(T|Video) + (1-α-β-γ)·P(T|Collection)
• Also common in XML retrieval
– Element score smoothed with containing
article
Image Models
• Urn metaphor not useful
– Drawing pixels useless
• Pixels carry no semantics
– Drawing pixel blocks not effective
• the chance of drawing the exact query blocks from a document model is slim
• Use Gaussian Mixture Models (GMM)
– Fixed number of Gaussian
components/clusters/concepts
Key-frame representation
[Diagram: split the key-frame into Y, Cb, and Cr colour channels, then take samples – position plus DCT coefficients per block.]
Query model
[Figure: example DCT-coefficient sample vectors taken from the query key-frame.]
EM algorithm
[Figure: the samples being grouped into mixture components by EM.]
Image Models
• Expectation-Maximisation (EM) algorithm
– iteratively
• estimate component assignments (E step)
• re-estimate component parameters (M step)
[Figure: alternating E and M steps assigning samples to Components 1, 2, and 3.]
VisualSEEk
• Querying by image regions and spatial layout
– Joint content-based/spatial querying
– Automated region extraction
– Direct indexing of color features
Image Query Process
Color Similarity
• Color space similarity
– measure the closeness in the HSV color space
• Color histograms distance
– Minkowski metric between hq and ht
• Histogram quadratic distance
– used in QBIC project
– measure the weighted similarity between
histograms
– compute the cross similarity between colors
• It is computationally expensive
Color sets
• Color sets
– A compact alternative to color histograms
• Color set example
– Transform RGB to HSV
• Quantize the HSV color space to 2 hues, 2 saturations, and 2 values
• Assign a unique index m to each quantized HSV color => an eight-dimensional binary space
• Color set: a selection from the eight colors
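The quantization step can be sketched as follows, assuming HSV channels scaled to [0, 1); the 0.5 thresholds and the example region are illustrative assumptions rather than the exact quantization boundaries used in VisualSEEk.

```python
# Quantize HSV to 2 hues x 2 saturations x 2 values (8 colors), then
# record which quantized colors occur in a region as a binary color set.
def color_index(h, s, v):
    # one bit per channel -> index in 0..7 (thresholds are illustrative)
    return (int(h >= 0.5) << 2) | (int(s >= 0.5) << 1) | int(v >= 0.5)

def color_set(pixels):
    present = [False] * 8
    for h, s, v in pixels:
        present[color_index(h, s, v)] = True
    return present

region = [(0.1, 0.9, 0.8), (0.7, 0.2, 0.3), (0.12, 0.85, 0.9)]
print(color_set(region))  # two of the eight quantized colors are present
```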
Color sets and Back-Projection
• Processing stages
– Color selection
• 1 color, 2 colors, …, until salient regions are extracted
– Back-projection onto the image
• map I[x,y] to the most similar color in the color set
– Thresholding
– Labeling
Color Set Query Strategy
• Given the query color set
– perform several range queries on the query
color set’s colors
– take the intersection of these lists
– minimize the sum of attributes in the
intersection list
Single Region Query
• Region absolute location
– fixed query location
• The Euclidean distance of centroids
– bounded query location
• falls within a designated area: dq,t = 0
• otherwise: the Euclidean distance of the centroids
Index Structure
• Centroid location spatial access
– Spatial quad-tree
• Rectangle (MBR) location spatial access
– R-trees
Size Comparison
• Area
– the absolute difference between the areas of two regions
• Spatial Extent
– measures the distance between the widths and the heights of the MBRs
– much simpler than shape information
Single Region Query Strategy
Multiple Regions Query
Region Relative Location
• Use 2D-string
Spatial Invariance
• Provide scaling/rotation
– approximate rotation invariance by
providing different projections
Multiple Regions Query Strategy
• Relative locations
– perform query on all attributes except
location
– find the intersection of the region lists
– for each candidate image
• the 2D-string is generated
• compared to the 2D-string of the query image
Query Formulation
• User tools
– sketches regions
– position them on the
query grid
– assign colors, size, and
absolute location
– may assign boundaries
Query Examples (1)
Query Examples (2)
Query Examples (3)
Blobworld
• Image is treated as a few “blobs”
– Image regions are roughly homogeneous with respect to color and texture
Grouping Pixels into Regions
• For each pixel
– assign a vector consisting of color, texture, and
position features (8-D space)
• Model the distribution of pixels
– Use EM algorithm to fit the mixture of Gaussians
model to the data
• Perform spatial grouping
– Connected pixels belonging to the same
color/texture cluster
Describe the Regions
• For each region
– store its color histogram
• Match the color of two regions
– Use the quadratic distance between their histograms x and y
– d²hist(x, y) = (x - y)ᵀ A (x - y)
– Matrix A
• a symmetric matrix of weights between 0 and 1
• A(i, j) represents the similarity between bin i and bin j
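The quadratic distance above, d²(x, y) = (x - y)ᵀ A (x - y), can be written directly in NumPy. The similarity matrix A here is a toy assumption: 1 on the diagonal, 0.5 between adjacent bins, 0 elsewhere.

```python
# Quadratic histogram distance between two normalized color histograms.
import numpy as np

def quadratic_distance(x, y, A):
    d = x - y
    return float(d @ A @ d)

# Toy 4-bin similarity matrix: adjacent bins are half-similar.
A = np.eye(4) + 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1))
x = np.array([0.5, 0.5, 0.0, 0.0])
y = np.array([0.0, 0.5, 0.5, 0.0])
print(quadratic_distance(x, y, A))
```

Because A credits cross-bin similarity, shifting mass to a neighboring bin costs less than moving it to a distant bin, which is exactly why QBIC preferred this over a plain Minkowski metric.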
Querying in Blobworld
• The user composes a query
– By submitting an image
• see its Blobworld representation
• select the relevant blobs to match
• specify the relative importance of the blob features
– Atomic query
• Specify a particular blob with feature vector vi to match
• For each blob vj in the database image
• Find the distance between vi and vj
– Measure the similarity between bi and bj
– Take the maximum value
Querying in Blobworld
• Compound query
– calculated using fuzzy-logic operator
– user may specify a weight for each atomic
query
– rank the images according to the overall
score
• indicate the blobs provided the highest score
• helps the user refine the query
– Change the weighting of blob features
– Specify new blobs to match
Indexing of Blobworld
• Indexing the color feature vectors
– to speed up atomic queries
– use R*-trees
– higher dimensional data
• require larger data entries => lower fanout
– Need a low dimensional approximation to
the full color feature vectors
• use SVD to find Ak: the best rank-k approximation to the weight matrix A
• project the feature space into the subspace
spanned by the rows of Ak
• index the low-dimensional vector
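The rank-k step can be sketched with a truncated SVD: keep only the k dominant singular directions of the weight matrix A. The matrix and k below are illustrative.

```python
# Best rank-k approximation of a matrix via truncated SVD (Eckart-Young).
import numpy as np

def rank_k_approx(A, k):
    U, s, Vt = np.linalg.svd(A)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.eye(5) + 0.5 * (np.eye(5, k=1) + np.eye(5, k=-1))  # toy weight matrix
A2 = rank_k_approx(A, 2)
# spectral-norm error equals the first dropped singular value
print(np.linalg.norm(A - A2, 2))
```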
Multimedia Cross-modal Correlation
• Case study: automatic
image captioning
– Each image has a set
of extracted regions
– Some of the regions
have a caption
Feature Extraction
• Discover image regions
– Extracted by a standard segmentation
algorithm
– Each region is mapped into a 30-dim feature
vector
• Mean, standard deviation of RGB values, average
responses to various texture filters, position in the
entire image layout, shape description
• How to capture cross-media correlations?
Mixed Media Graph (MMG)
• Graph construction
– V(O): the vertex of object O
– V(ai): the vertex of the attribute value A= ai
– Add an edge if and only if the two token
values are close enough
• for each feature-vector, choose its k nearest
neighbors => add the corresponding edges
– the nearest neighbor relationship is not symmetric
• NN-links: the nearest neighbor links
• OAV-links: the object-attribute-value-links
Example of MMG
Correlation Discovery
• Correlation discovery by random walk
– random walk with restart
– To compute the affinity of node “B” for node
“A”
• a random walker starts from node “A”
• choose randomly among the available edges
every time
– with probability c the walker returns to node “A” (restart)
• the steady-state probability that the random walker is found at node “B”
– is the affinity of “B” with respect to “A”
RWR Algorithm
• Given a query object Oq
– do an RWR from node q=V(Oq)
– compute the steady state probability vector
uq=(uq(1), …, uq(N)).
• Example
– Given an image I3
– Estimate ui3 for all nodes in the GMMG
– Report the top few caption words for the image
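The RWR computation above can be sketched by power iteration: u = (1 - c)·W·u + c·e_q, where W is the column-normalised adjacency matrix and c the restart probability. The 4-node graph and c = 0.15 are illustrative assumptions, not the MMG itself.

```python
# Random walk with restart on a small undirected graph.
import numpy as np

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W = adj / adj.sum(axis=0)          # column-normalise: columns sum to 1

def rwr(q, c=0.15, iters=100):
    e = np.zeros(len(W))
    e[q] = 1.0                     # restart distribution at query node q
    u = e.copy()
    for _ in range(iters):
        u = (1 - c) * (W @ u) + c * e
    return u

u = rwr(q=0)
print(u)  # steady-state affinity of every node with respect to node 0
```

The steady-state vector u plays the role of uq in the slides: its largest entries among caption-word nodes would be reported as the predicted captions.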