Slides - Tamara L Berg

Words & Pictures
Clustering and Bag of Words Representations
Many slides adapted from Svetlana Lazebnik, Fei-Fei Li, Rob Fergus, and Antonio Torralba
Announcements
• HW1 due Thurs, Sept 27 @ 12pm
– By email to cse595@gmail.com. No need to
include shopping image set.
– Write-up can be webpage or pdf.
Document Vectors
• Represent document as a “bag of words”
Origin: Bag-of-words models
• Orderless document representation: frequencies
of words from a dictionary Salton & McGill (1983)
US Presidential Speeches Tag Cloud
http://chir.ag/phernalia/preztags/
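A minimal Python sketch of the orderless document vector, assuming a simple lowercase regex tokenizer and a small hand-picked dictionary (both hypothetical, for illustration only). Each entry is the count of one dictionary word, so any reordering of the words yields the same vector.

```python
from collections import Counter
import re

def bag_of_words(text, vocabulary):
    """Count how often each dictionary word occurs; word order is discarded."""
    # Assumption: a crude tokenizer (lowercase letters and apostrophes only).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

vocab = ["war", "peace", "economy", "people"]  # hypothetical dictionary
doc = "The people want peace, not war. Peace and the economy matter to people."
print(bag_of_words(doc, vocab))  # [1, 2, 1, 2]
```

Words outside the dictionary (such as "the" or "want") are simply ignored, which is how a fixed dictionary turns documents of any length into vectors of the same size.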
Bag-of-features models
Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Bags of features for image classification
1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
1. Feature extraction
• Regular grid
– Vogel & Schiele, 2003
– Fei-Fei & Perona, 2005
• Interest point detector
– Csurka et al. 2004
– Fei-Fei & Perona, 2005
– Sivic et al. 2005
• Other methods
– Random sampling (Vidal-Naquet & Ullman, 2002)
– Segmentation-based patches (Barnard et al. 2003)
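A sketch of regular-grid sampling in Python/NumPy, assuming a 2-D grayscale image and using raw pixel patches as the feature vectors. In practice each patch would usually be summarized by a descriptor such as SIFT; the `patch_size` and `stride` defaults here are illustrative, not values from the slides.

```python
import numpy as np

def grid_patches(image, patch_size=8, stride=8):
    """Extract square patches on a regular grid from a 2-D grayscale image.

    Returns an array of shape (num_patches, patch_size * patch_size),
    one flattened patch per row (row-major order over the grid).
    """
    image = np.asarray(image, dtype=float)
    if image.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    h, w = image.shape
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size].ravel())
    # reshape keeps a 2-D result even when the image is smaller than one patch
    return np.array(patches).reshape(-1, patch_size * patch_size)

image = np.random.default_rng(0).random((32, 32))
print(grid_patches(image).shape)  # (16, 64)
```

Setting `stride` smaller than `patch_size` gives overlapping patches, i.e. denser sampling at higher cost.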
2. Learning the visual vocabulary
Visual vocabulary
…
Clustering
Slide credit: Josef Sivic
Clustering
– The assignment of objects into groups (called clusters)
so that objects from the same cluster are more similar
to each other than objects from different clusters.
– Often similarity is assessed according to a distance
measure.
– Clustering is a common technique for statistical data
analysis, which is used in many fields, including
machine learning, data mining, pattern recognition,
image analysis and bioinformatics.
Any of the similarity metrics discussed earlier can be used (e.g., SSD or the angle between vectors).
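The two measures named above can be written in a few lines of NumPy. This is an illustrative sketch; note that SSD depends on vector magnitude while the angle depends only on direction.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: 0 for identical vectors; larger means less similar."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))

def angle_between(a, b):
    """Angle in radians between two nonzero vectors; 0 means same direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # clip guards against rounding pushing cos slightly outside [-1, 1]
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

print(ssd([1, 2, 3], [1, 2, 5]))      # 4.0
print(angle_between([1, 0], [0, 1]))  # about 1.5708 (pi / 2)
```

For example, [1, 2] and [2, 4] have an angle of (essentially) 0 but an SSD of 5, so the choice of measure decides whether scaled copies count as similar.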
Feature Clustering
Clustering is the process of grouping a set of
features into clusters of similar features.
Features within a cluster should be similar.
Features from different clusters should be
dissimilar.
source: Dan Klein
K-means clustering
• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k:
$$D(X, M) = \sum_{\text{cluster } k} \; \sum_{\text{point } i \text{ in cluster } k} \lVert x_i - m_k \rVert^2$$
source: Svetlana Lazebnik
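A compact sketch of Lloyd's algorithm, the standard iterative procedure for locally minimizing D(X, M): assign each point to its nearest center, move each center to the mean of its assigned points, and repeat until the centers stop moving. Initializing from k random data points and keeping the old center when a cluster empties are choices made for this sketch, not details from the slide.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm. Returns (centers M, labels, objective D(X, M))."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Sketch's initialization choice: k distinct data points chosen at random.
    M = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_M = M.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_M[j] = members.mean(axis=0)
        if np.allclose(new_M, M):
            break
        M = new_M
    # Final assignment and objective for the returned centers.
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    objective = float(d2[np.arange(len(X)), labels].sum())
    return M, labels, objective

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
M, labels, D = kmeans(X, 2)
print(labels, D)
```

Each step can only lower (or keep) D(X, M), so the loop converges, but only to a local minimum; in practice k-means is often run from several initializations and the lowest-D result kept.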
source: Dan Klein
Source: Hinrich Schutze
Hierarchical clustering strategies
• Agglomerative clustering
• Start with each point in a separate cluster
• At each iteration, merge two of the “closest” clusters
• Divisive clustering
• Start with all points grouped into a single cluster
• At each iteration, split the “largest” cluster
source: Svetlana Lazebnik
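A naive agglomerative sketch, suitable only for small inputs since every merge rescans all cluster pairs. The slide leaves "closest" open; here it is made concrete with single linkage (closest pair of points) or complete linkage (farthest pair), and that choice is an assumption of the sketch.

```python
import numpy as np

def agglomerative(X, n_clusters, linkage="single"):
    """Bottom-up clustering: start with singletons, repeatedly merge the two closest clusters.

    linkage='single' uses the closest pair of points between two clusters;
    any other value uses the farthest pair (complete linkage).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = [[i] for i in range(len(X))]
    link = np.min if linkage == "single" else np.max
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link(D[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

print(agglomerative([[0], [1], [5], [6], [20]], 3))  # [[0, 1], [2, 3], [4]]
```

Recording the sequence of merges instead of stopping at `n_clusters` would yield the full hierarchy (dendrogram).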
source: Dan Klein
Divisive Clustering
• Top-down (instead of bottom-up as in
Agglomerative Clustering)
• Start with all docs in one big cluster
• Then recursively split clusters
• Eventually each node forms a cluster on its
own.
Source: Hinrich Schutze
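A divisive sketch in the style of bisecting k-means. It assumes that "largest" means the cluster with the most points and that each split is made with 2-means seeded from the two farthest-apart points; both are assumptions, and other splitting rules are possible. The helper names `two_means` and `divisive` are hypothetical.

```python
import numpy as np

def two_means(X, n_iters=50):
    """Split points into two groups with k-means (k=2); returns a 0/1 label per point."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    i, j = np.unravel_index(d2.argmax(), d2.shape)
    if d2[i, j] == 0:  # all points identical: cannot split
        return np.zeros(len(X), dtype=int)
    centers = X[[i, j]].astype(float)  # assumption: seed with the farthest pair
    for _ in range(n_iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) for c in (0, 1)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def divisive(X, n_clusters):
    """Top-down clustering: start with one cluster, repeatedly split the largest in two."""
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        # Assumption: "largest" = most points.
        idx = max(range(len(clusters)), key=lambda c: len(clusters[c]))
        members = clusters.pop(idx)
        labels = two_means(X[members])
        if labels.min() == labels.max():  # nothing left to split
            clusters.append(members)
            break
        clusters += [members[labels == 0], members[labels == 1]]
    return clusters

print([p.tolist() for p in divisive([[0], [1], [10], [11], [30]], 3)])
```

Continuing to split until every cluster holds a single point would reproduce the slide's description of recursing until each node forms a cluster on its own.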
Flat or hierarchical clustering?
• For high efficiency, use flat clustering (e.g., k-means)
• For deterministic results: hierarchical clustering
• When a hierarchical structure is desired: hierarchical algorithm
• Hierarchical clustering can also be applied if K cannot be predetermined (can start without knowing K)
Source: Hinrich Schutze
2. Learning the visual vocabulary
Visual vocabulary
…
Clustering
Slide credit: Josef Sivic
From clustering to vector quantization
• Clustering is a common method for learning a visual
vocabulary or codebook
– Unsupervised learning process
– Each cluster center produced by k-means becomes a
codebook entry
– Codebook can be learned on separate training set
– Provided the training set is sufficiently representative,
the codebook will be “universal”
• The codebook is used for quantizing features
– A vector quantizer takes a feature vector and maps it to
the index of the nearest entry in the codebook
– Codebook = visual vocabulary
– Codebook entry = visual word
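A minimal vector quantizer in NumPy: for each feature it computes squared Euclidean distances to every codebook entry and returns the index of the nearest one. The toy codebook below is hypothetical and stands in for k-means centers; this exhaustive search grows expensive for large vocabularies.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry (visual word)."""
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # (num_features, num_words) matrix of squared distances
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])  # hypothetical 3-word vocabulary
features = np.array([[0.4, -0.2], [9.0, 1.0], [4.0, 6.0], [1.0, 1.0]])
print(quantize(features, codebook))  # [0 2 1 0]
```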
Example visual vocabulary
Fei-Fei et al. 2005
Visual vocabularies: Issues
• How to choose vocabulary size?
– Too small: visual words not
representative of all patches
– Too large: quantization artifacts,
overfitting
• Computational efficiency
– Vocabulary trees
(Nister & Stewenius, 2006)
3. Image representation
[Figure: histogram of visual-word frequencies (x-axis: codewords, y-axis: frequency)]
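Given the visual-word indices for one image (for example, the output of a vector quantizer), the bag-of-features vector is a histogram over the vocabulary. The optional normalization to relative frequencies is an assumption of this sketch, convenient when images yield different numbers of features.

```python
import numpy as np

def bag_of_visual_words(word_indices, vocab_size, normalize=True):
    """Histogram of visual-word occurrences: the image's bag-of-features vector."""
    hist = np.bincount(word_indices, minlength=vocab_size).astype(float)
    if normalize and hist.sum() > 0:
        # Assumption: relative frequencies, so images with different feature counts are comparable.
        hist /= hist.sum()
    return hist

words = np.array([0, 2, 1, 0])  # visual-word indices for one image
print(bag_of_visual_words(words, vocab_size=4))
```

Every image, whatever its number of patches, maps to a vector of length `vocab_size`, which is what lets a standard classifier compare images.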
Image classification (next)
• Given the bag-of-features representations of
images from different classes, how do we
learn a model for distinguishing them?
Clustering in Action
Names and Faces
President George W. Bush makes a
statement in the Rose Garden while
Secretary of Defense Donald Rumsfeld
looks on, July 23, 2003. Rumsfeld said
the United States would release graphic
photographs of the dead sons of
Saddam Hussein to prove they were
killed by American troops. Photo by
Larry Downing/Reuters
Who’s in the picture?
T.L. Berg, A.C. Berg, J. Edwards, D.A. Forsyth
Intuition
George Bush
500k News Corpora
Actress Winona Ryder (news) reacts to
remarks by prosecutor Ann Rundle during
the sentencing hearing in her felony
shoplifting case Friday, Dec. 6, 2002 at the
Beverly Hills, Calif., courthouse. At right is
Ryder's attorney Mark Geragos. Ryder was
sentenced to three years of probation and
was ordered to perform 480 hours of
community service. (AP
Photo/Steve Grayson, POOL)
Producer and director Bruce Paltrow has
died at the age of 58 in Rome, Italy, the
U.S. Consulate said on October 3, 2002.
Paltrow had suffered from throat cancer for
several years, but the cause of his death
was not immediately known. He is seen
with his daughter actress Gwyneth Paltrow
after the Academy Awards in Los Angeles in
March 21, 1999 file photo. (Fred
Prouser/Reuters)
Name & Face Extraction
Detected Faces
President George W. Bush makes a
statement in the Rose Garden while
Secretary of Defense Donald
Rumsfeld looks on, July 23, 2003.
Rumsfeld said the United States
would release graphic photographs
of the dead sons of Saddam
Hussein to prove they were killed by
American troops. Photo by Larry
Downing/Reuters
Detected Names:
President George W. Bush,
Defense Donald Rumsfeld,
Saddam Hussein.
Goal
Each name in the dataset is a potential
cluster. Want to simultaneously:
1.) Learn image model for each person.
2.) Learn depiction model across names.
Both are achieved by solving one large assignment (clustering) problem.
Assignment Problem
Language indicates depiction:
P(Depicted | Context), Yes/No
Cues (multiple independent cues):
• Part-of-speech (POS) tags before and after the name
• Location of the name in the caption
• Distance to the closest of: ( ), (L), (C), (R), left, right, center, shown, pictured, above
President George W. Bush
makes a statement in the
Rose Garden while
Secretary of Defense
Donald Rumsfeld looks on,
July 23, 2003. Rumsfeld
said the United States would
release graphic photographs
of the dead sons of Saddam
Hussein to prove they were
killed by American troops.
Photo by Larry
Downing/Reuters
Method
1.) Update assignments
2.) Update:
appearance model for each person.
language model of depiction across names.
Iterate 1-2
Results
British director Sam Mendes and
his partner actress Kate Winslet
arrive at the London premiere of
'The Road to Perdition', September
18, 2002. The film stars Tom
Hanks as a Chicago hit man who
has a separate family life and costars Paul Newman and Jude Law.
REUTERS/Dan Chung
World number one Lleyton Hewitt of
Australia hits a return to Nicolas
Massu of Chile at the Japan Open
tennis championships in Tokyo
October 3, 2002. REUTERS/Eriko
Sugita
Results
US President George W. Bush (L) makes
remarks while Secretary of State Colin
Powell (R) listens before signing the US
Leadership Against HIV/AIDS,
Tuberculosis and Malaria Act of 2003 at the
Department of State in Washington, DC. The
five-year plan is designed to help prevent
and treat AIDS, especially in more than a
dozen African and Caribbean
nations (AFP/Luke Frazza)
German supermodel Claudia Schiffer gave
birth to a baby boy by Caesarian section
January 30, 2003, her spokeswoman said. The
baby is the first child for both Schiffer, 32, and
her husband, British film producer Matthew
Vaughn, who was at her side for the birth.
Schiffer is seen on the German television show
'Bet It...?!' ('Wetten Dass...?!') in
Braunschweig, on January 26, 2002.
(Alexandra Winkler/Reuters)
Results
Labels without vs. with the language model:
Without – CEO Summit; With – Martha Stewart
Without – James Bond; With – Pierce Brosnan
Without – Dick Cheney; With – George W. Bush
Accuracy of labeling
• Vision model, no language model: 67%
• Vision model + language model: 78%
Face Dictionary
http://tamaraberg.com/faces/faceDict/NIPSdict/index.html
Results - Depiction
Classifier: % correct
• Baseline (all pictured): 67%
• Learned language model: 86%
(IN = pictured, OUT = not pictured)