Learning Scalable Discriminative
Dictionaries with Sample Relatedness a.k.a.
“Infinite Attributes”
Jiashi Feng, Stefanie Jegelka,
Shuicheng Yan, Trevor Darrell
[Figure: example images labeled with attributes such as striped, water, white, furry, bright, wheels]
• generalizable vs. discriminative
• which attributes to use?
(Lampert, Nickisch & Harmeling, 2009; Farhadi & Forsyth, 2009; Parikh & Grauman, 2011 …)
Attribute Generative Model Cartoon
[Figure: objects (cup, face, car) are generated from shared attributes (eye, nose, mouth, …), which are in turn built from low-level features (edges, …)]
• Flexibility: automatically determine the attributes
– as expressive as needed, as compact as possible
– approach: non-parametric Bayesian
[Figure: attribute sets (striped, water, white, furry) for Animals vs. Humans]
• Efficiently learnable: few positive training samples
– reduce sample complexity via related samples
[Figure: knowledge transfer via attributes between related samples: Pug dog, Samoyed dog, Corgi dog]
• Discriminative: object classification task
– max-margin classification
[Figure: a max-margin boundary separating positive (+) and negative (−) samples]
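As a minimal, hypothetical sketch of the max-margin criterion above (the classifier w, binary attribute codes Z, and labels y here are illustrative placeholders, not the paper's exact formulation):

import numpy as np

def hinge_objective(w, Z, y, C=1.0):
    # 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * <w, z_i>)
    margins = 1.0 - y * (Z @ w)
    return 0.5 * w @ w + C * np.maximum(0.0, margins).sum()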
• Non-parametric Bayesian for flexible attribute learning
• Sample relatedness for knowledge transfer
• Discriminative generative model
Preliminaries: Non-parametric Bayesian
• Bayes' rule applied in machine learning: the posterior of parameters $\theta$ given data $\mathcal{D}$ combines the likelihood of $\theta$ and its prior probability,
$P(\theta \mid \mathcal{D}, m) = \dfrac{P(\mathcal{D} \mid \theta, m)\, P(\theta \mid m)}{P(\mathcal{D} \mid m)}$
• Model comparison for model selection: $P(m \mid \mathcal{D}) \propto P(\mathcal{D} \mid m)\, P(m)$, with $P(\mathcal{D} \mid m) = \int P(\mathcal{D} \mid \theta, m)\, P(\theta \mid m)\, d\theta$
• Prediction: $P(x \mid \mathcal{D}, m) = \int P(x \mid \theta, m)\, P(\theta \mid \mathcal{D}, m)\, d\theta$
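As a toy worked instance of the model comparison above (numbers are illustrative, not from the slides): compare a fair coin m1 against m2, a coin whose bias has a uniform Beta(1,1) prior, via their marginal likelihoods P(D | m).

from math import comb

heads, flips = 8, 10
p_d_m1 = comb(flips, heads) * 0.5 ** flips  # fair coin: Binomial(10, 0.5) at 8 heads
p_d_m2 = 1 / (flips + 1)                    # Binomial integrated over a uniform prior is 1/(n+1)
print(p_d_m1, p_d_m2)                       # ~0.044 vs ~0.091: the flexible model wins here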
• Inflexible models yield unreasonable inferences.
• Non-parametric models can automatically infer an adequate model size/complexity from the data, without needing to explicitly do Bayesian model comparison.
• Many can be derived by starting with a finite parametric model and taking the limit as the number of parameters goes to infinity.
• Set of observations: $X = \{x_1, \dots, x_N\}$
• Constant number of clusters $K$, with mixing weights $\theta \sim \text{Dirichlet}(\alpha/K, \dots, \alpha/K)$
• Cluster assignment for $x_i$ is $c_i$, with $P(c_i = k \mid \theta) = \theta_k$
• The probability of each sample: $P(x_i \mid \theta) = \sum_{k=1}^{K} \theta_k\, P(x_i \mid c_i = k)$
• The likelihood of the cluster assignments, integrating out $\theta$:
$P(\mathbf{c}) = \int \Big(\prod_{i=1}^{N} P(c_i \mid \theta)\Big)\, p(\theta)\, d\theta = \frac{\Gamma(\alpha)}{\Gamma(N + \alpha)} \prod_{k=1}^{K} \frac{\Gamma(m_k + \alpha/K)}{\Gamma(\alpha/K)}$, where $m_k = |\{i : c_i = k\}|$
• Infinite clusters: take $K \to \infty$ in this likelihood
• Since we always have limited samples in reality, only a limited number of clusters is used; so we define two sets of clusters:
– $K_+$: number of classes for which $m_k > 0$
– $K_0$: number of classes for which $m_k = 0$
• Assume a reordering such that $m_k > 0$ for $k \le K_+$ and $m_k = 0$ otherwise, with $K = K_+ + K_0$
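A small sketch (assuming the symmetric Dirichlet(α/K) formula above) that evaluates log P(c) for fixed cluster counts as K grows; each particular labeling becomes ever less probable, which is why the infinite limit is taken over partitions rather than labelings:

import numpy as np
from scipy.special import gammaln

def log_p_assignments(counts, K, alpha=1.0):
    # log P(c) = log Gamma(alpha) - log Gamma(N + alpha)
    #            + sum_k [log Gamma(m_k + alpha/K) - log Gamma(alpha/K)]
    m = np.zeros(K)
    m[:len(counts)] = counts
    N = m.sum()
    return (gammaln(alpha) - gammaln(N + alpha)
            + np.sum(gammaln(m + alpha / K) - gammaln(alpha / K)))

for K in (3, 10, 100, 1000):
    print(K, log_p_assignments([5, 3, 2], K))  # 10 samples in clusters of sizes 5, 3, 2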
• Generating $Z$: an $N \times K$ binary matrix
– For each column $k$, draw $\pi_k$ from a beta distribution: $\pi_k \sim \text{Beta}(\alpha/K, 1)$
– For each customer (row) $i$, flip a coin with bias $\pi_k$ for each feature: $z_{ik} \sim \text{Bernoulli}(\pi_k)$
• Distribution of $Z$: integrating out $\pi$ leaves
$P(Z) = \prod_{k=1}^{K} \frac{\frac{\alpha}{K}\, \Gamma(m_k + \frac{\alpha}{K})\, \Gamma(N - m_k + 1)}{\Gamma(N + 1 + \frac{\alpha}{K})}$, where $m_k = \sum_i z_{ik}$
• $Z$ is sparse: even as $K \to \infty$, the matrix is expected to have a finite number of non-zero elements; the expected number of ones is $\frac{N\alpha}{1 + \alpha/K} \le N\alpha$.
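A quick simulation of the finite beta-Bernoulli model above, checking that the number of ones stays near Nα/(1 + α/K) even for large K (a sketch, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
N, alpha = 50, 2.0
for K in (10, 100, 10000):
    pi = rng.beta(alpha / K, 1.0, size=K)  # one coin bias per column
    Z = rng.random((N, K)) < pi            # Bernoulli draws for all N rows
    print(K, int(Z.sum()), "expected ~", N * alpha / (1 + alpha / K))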
From Finite to Infinite Binary Matrices
• A technical difficulty: the probability of any particular matrix goes to zero as $K \to \infty$
• However, if we consider equivalence classes $[Z]$ of matrices in left-ordered form, obtained by reordering the columns:
– $P([Z]) = \frac{\alpha^{K_+}}{\prod_{h > 0} K_h!} \exp\{-\alpha H_N\} \prod_{k=1}^{K_+} \frac{(N - m_k)!\,(m_k - 1)!}{N!}$
– $K_+$ is the number of features assigned ($m_k > 0$), $H_N = \sum_{j=1}^{N} 1/j$ is the $N$-th harmonic number, and $K_h$ counts the columns with binary history $h$
– This distribution is exchangeable: independent of the ordering of the rows
[Figure: (a) the binary matrix on the left is transformed into the binary matrix on the right by the function lof(), which rearranges columns into left-ordered form; (b) a left-ordered binary matrix generated by the Indian Buffet Process]
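A plausible implementation of lof() from the caption, assuming the standard definition (sort columns by their binary history, read top-to-bottom, in decreasing order):

import numpy as np

def lof(Z):
    # Interpret each column as a binary number whose most significant
    # bit is the first row, then sort columns in decreasing order.
    N = Z.shape[0]
    weights = 2.0 ** np.arange(N - 1, -1, -1)
    history = Z.T @ weights
    return Z[:, np.argsort(-history, kind="stable")]

Z = np.array([[0, 1, 0, 1],
              [1, 1, 0, 0],
              [1, 0, 1, 0]])
print(lof(Z))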
Indian Buffet Process
• “Many Indian restaurants offer lunchtime buffets with an apparently infinite number of dishes.”
• The first customer starts at the left of the buffet and takes a serving from each dish, stopping after a Poisson($\alpha$) number of dishes as her plate becomes overburdened.
• The $i$-th customer moves along the buffet, sampling dishes in proportion to their popularity, taking dish $k$ with probability $m_k / i$, and then trying a Poisson($\alpha / i$) number of new dishes.
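The metaphor maps directly onto a generative sampler; a minimal sketch (dish counts m_k, Poisson(α/i) new dishes per customer):

import numpy as np

def sample_ibp(N, alpha, rng=np.random.default_rng(0)):
    counts = []                                  # m_k: servings taken of dish k
    rows = []
    for i in range(1, N + 1):
        taken = [k for k, m in enumerate(counts) if rng.random() < m / i]
        for k in taken:
            counts[k] += 1
        new = rng.poisson(alpha / i)             # brand-new dishes for this customer
        taken += range(len(counts), len(counts) + new)
        counts += [1] * new
        rows.append(taken)
    Z = np.zeros((N, len(counts)), dtype=int)
    for i, taken in enumerate(rows):
        Z[i, taken] = 1
    return Z

print(sample_ibp(N=8, alpha=2.0))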
Infinite attributes – Indian Buffet Process prior
• prob(image $n$ samples attribute $k$) $= \frac{m_k}{n}$
• sample Poisson($\alpha/n$) new attributes
• Likelihood: $x_n = A z_n + \varepsilon_n$, e.g. Gaussian noise, $p(X \mid Z, A) \propto \exp\{-\frac{1}{2\sigma^2} \sum_n \|x_n - A z_n\|^2\}$
[Figure: images annotated with attributes such as striped, white, furry, bright, wheels]
(Griffiths & Ghahramani, 2006)
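In inference, each entry z_nk is resampled with posterior odds combining the IBP prior m_k/n and the Gaussian likelihood above; a rough sketch (for simplicity m_k includes image n itself, whereas a proper Gibbs step would use m_{-n,k}):

import numpy as np

def resample_znk(x_n, z_n, A, k, m_k, n, sigma=1.0, rng=np.random.default_rng()):
    log_p = np.empty(2)
    for val in (0, 1):
        z_n[k] = val
        resid = x_n - A @ z_n
        prior = m_k / n if val else 1.0 - m_k / n
        log_p[val] = np.log(prior + 1e-12) - resid @ resid / (2 * sigma ** 2)
    p_one = 1.0 / (1.0 + np.exp(log_p[0] - log_p[1]))
    z_n[k] = int(rng.random() < p_one)
    return z_n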
• prob(image $n$ samples attribute $k$) $= \frac{m_k}{n}$; sample Poisson($\alpha/n$) new attributes, as above
• Asymptotics: as the noise variance $\sigma^2 \to 0$, MAP inference collapses to a simple discrete objective, and the number of attributes $K$ is determined automatically
(Broderick, Kulis & Jordan, ICML 2013)
[Figure: image annotated with attributes bright, wheels, furry, striped, white]
• Mixture of Gaussians:
– Bayesian non-parametric DP mixture: flexible, principled
– as the covariance goes to zero: simple, efficient, “practical” k-means
• Principled discrete criteria from BNP:
– Dirichlet Process → k-means + penalty on the number of clusters (Kulis & Jordan, ICML 2012)
– Beta Process → squared loss + penalty on the number of features (Broderick, Kulis & Jordan, ICML 2013)
– Dependent Dirichlet Process → dynamic clustering (Campbell, Liu, Kulis, How & Carin, NIPS 2013)
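For instance, the first row (Dirichlet Process → k-means + penalty) yields DP-means; a hedged sketch following Kulis & Jordan (ICML 2012), where λ is the cost of opening a new cluster:

import numpy as np

def dp_means(X, lam, n_iter=25):
    centers = [X.mean(axis=0)]                   # start from the global mean
    for _ in range(n_iter):
        assign = []
        for x in X:
            d2 = np.array([np.sum((x - c) ** 2) for c in centers])
            k = int(d2.argmin())
            if d2[k] > lam:                      # farther than lam from every center:
                centers.append(x.copy())         # open a new cluster at x
                k = len(centers) - 1
            assign.append(k)
        assign = np.array(assign)
        centers = [X[assign == k].mean(axis=0)   # recenter non-empty clusters
                   for k in range(len(centers)) if np.any(assign == k)]
    C = np.array(centers)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return C, d2.argmin(axis=1)                  # final centers and assignments

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(40, 2)) for m in (-3.0, 0.0, 3.0)])
centers, labels = dp_means(X, lam=4.0)
print(len(centers), "clusters")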
Related samples
• How related are two categories (e.g. polar bear, clown fish, motorbike)? Measure relatedness by path length in WordNet (Christiane Fellbaum, WordNet, 1998)
[Figure: example relatedness queries between polar bear, clown fish, and motorbike]
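With NLTK's WordNet interface this is one call (assumes nltk is installed and the 'wordnet' corpus has been fetched via nltk.download('wordnet')); path_similarity is NLTK's standard hypernym-path-length measure:

from nltk.corpus import wordnet as wn

dog = wn.synsets('dog')[0]          # dog.n.01
cat = wn.synsets('cat')[0]          # cat.n.01
motorcycle = wn.synsets('motorcycle')[0]
print(dog.path_similarity(cat))         # relatively high: nearby in the taxonomy
print(dog.path_similarity(motorcycle))  # low: far apart in WordNet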
Model overview: … discriminative … attributes with sample relatedness
[Figure: input features feed a dictionary of attributes, whose binary codes feed max-margin classifiers; training uses positive samples (Pug dog), related samples (Samoyed dog), and negative samples (Cat)]
Joint Learning of Dictionary & Classifiers
• BCD: alternatingly update classifiers & dictionary

while not converged do
  1: update $z_{ik} \in \{0, 1\}$ greedily
  2: $A \leftarrow X Z^{\top} (Z Z^{\top})^{-1}$
  3: sample a new attribute $a_{K+1}$: $p(a_{K+1} = x_i) \propto \|x_i - A z_i\|^2$
  4: $A \leftarrow [A, a_{K+1}]$
end while
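A rough, runnable sketch of the loop above, with the caveat that it keeps only the reconstruction part (the full objective also includes the max-margin classifier and relatedness terms, omitted here); λ is an assumed penalty controlling when a new attribute is added:

import numpy as np

def learn_dictionary(X, lam=1.0, n_iter=20, rng=np.random.default_rng(0)):
    D, N = X.shape
    A = X[:, [rng.integers(N)]].copy()       # init: one attribute, a random sample
    Z = np.ones((1, N))
    for _ in range(n_iter):
        # 1: greedy binary update of each z_ik
        for i in range(N):
            for k in range(A.shape[1]):
                errs = []
                for val in (0, 1):
                    Z[k, i] = val
                    errs.append(np.sum((X[:, i] - A @ Z[:, i]) ** 2))
                Z[k, i] = int(np.argmin(errs))
        # 2: least-squares dictionary update, A <- X Z^T (Z Z^T)^{-1}
        A = X @ Z.T @ np.linalg.pinv(Z @ Z.T)
        # 3-4: sample a new attribute with probability proportional to the
        # residual error, if the worst residual exceeds the penalty lam
        res = np.sum((X - A @ Z) ** 2, axis=0)
        if res.max() > lam:
            i = rng.choice(N, p=res / res.sum())
            A = np.hstack([A, X[:, [i]]])
            Z = np.vstack([Z, np.zeros(N)])
    return A, Z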
Experiments
• Classification accuracy on ImageNet: sample-efficient (higher accuracy with fewer training samples); generalization (better representation of new classes)
[Figure: accuracy curves on AwA data]
• More related information: using related samples increases sample-efficiency
• Non-parametric: adapts to the complexity of the data; representation-efficient
• Flexible attribute learning method
– generalizes to new categories
– adapts to the dataset complexity
• Efficiently learnable
– sample efficient
– reduces user annotation effort
• Performs well
– recognizes existing and new categories well
Q&A