Learning Scalable Discriminative
Dictionaries with Sample Relatedness a.k.a.
“Infinite Attributes”
Jiashi Feng, Stefanie Jegelka,
Shuicheng Yan, Trevor Darrell
[Figure: example images labeled with attributes such as striped, water, white, furry, bright, wheels]
• generalizable vs. discriminative
• which attributes to use?
(Lampert, Nickisch & Harmeling, 2009; Farhadi & Forsyth, 2009; Parikh & Grauman, 2011 …)
Attribute Generative Model Cartoon
[Figure: objects (cup, face, car) are generated from shared attributes (eye, nose, mouth, …), which are in turn built from low-level features (edges, …)]
• Flexibility: automatically determine the attributes
– as expressive as needed, as compact as possible
– approach: non-parametric Bayesian
[Figure: attribute sets (striped, water, white, furry) for Animals vs. Humans]
• Efficiently learnable: few positive training samples
– reduce sample complexity via related samples
[Figure: knowledge transfer via attributes between related samples: Pug dog, Samoyed dog, Corgi dog]
• Discriminative: object classification task
– max-margin classification
[Figure: a max-margin boundary separating positive (+) and negative (−) samples]
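As a minimal, hypothetical sketch of the max-margin criterion above (the classifier w, binary attribute codes Z, and labels y here are illustrative placeholders, not the paper's exact formulation):

import numpy as np

def hinge_objective(w, Z, y, C=1.0):
    # 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * <w, z_i>)
    margins = 1.0 - y * (Z @ w)
    return 0.5 * w @ w + C * np.maximum(0.0, margins).sum()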
• Non-parametric Bayesian for flexible attribute learning
• Sample relatedness for knowledge transfer
• Discriminative generative model
Preliminaries: Non-parametric Bayesian
• Bayes' rule applied in machine learning: the posterior of parameters $\theta$ given data $\mathcal{D}$ combines the likelihood of $\theta$ and its prior probability,
$P(\theta \mid \mathcal{D}, m) = \dfrac{P(\mathcal{D} \mid \theta, m)\, P(\theta \mid m)}{P(\mathcal{D} \mid m)}$
• Model comparison for model selection: $P(m \mid \mathcal{D}) \propto P(\mathcal{D} \mid m)\, P(m)$, with $P(\mathcal{D} \mid m) = \int P(\mathcal{D} \mid \theta, m)\, P(\theta \mid m)\, d\theta$
• Prediction: $P(x \mid \mathcal{D}, m) = \int P(x \mid \theta, m)\, P(\theta \mid \mathcal{D}, m)\, d\theta$
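As a toy worked instance of the model comparison above (numbers are illustrative, not from the slides): compare a fair coin m1 against m2, a coin whose bias has a uniform Beta(1,1) prior, via their marginal likelihoods P(D | m).

from math import comb

heads, flips = 8, 10
p_d_m1 = comb(flips, heads) * 0.5 ** flips  # fair coin: Binomial(10, 0.5) at 8 heads
p_d_m2 = 1 / (flips + 1)                    # Binomial integrated over a uniform prior is 1/(n+1)
print(p_d_m1, p_d_m2)                       # ~0.044 vs ~0.091: the flexible model wins here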
• Inflexible models yield unreasonable inferences.
• Non-parametric models can automatically infer an adequate model size/complexity from the data, without needing to explicitly do Bayesian model comparison.
• Many can be derived by starting with a finite parametric model and taking the limit as the number of parameters goes to infinity.
• Set of observations: $X = \{x_1, \dots, x_N\}$
• Constant number of clusters $K$, with mixing weights $\theta \sim \text{Dirichlet}(\alpha/K, \dots, \alpha/K)$
• Cluster assignment for $x_i$ is $c_i$, with $P(c_i = k \mid \theta) = \theta_k$
• The probability of each sample: $P(x_i \mid \theta) = \sum_{k=1}^{K} \theta_k\, P(x_i \mid c_i = k)$
• The likelihood of the cluster assignments, integrating out $\theta$:
$P(\mathbf{c}) = \int \Big(\prod_{i=1}^{N} P(c_i \mid \theta)\Big)\, p(\theta)\, d\theta = \frac{\Gamma(\alpha)}{\Gamma(N + \alpha)} \prod_{k=1}^{K} \frac{\Gamma(m_k + \alpha/K)}{\Gamma(\alpha/K)}$, where $m_k = |\{i : c_i = k\}|$
• Infinite clusters: take $K \to \infty$ in this likelihood
• Since we always have limited samples in reality, only a limited number of clusters is used; so we define two sets of clusters:
– $K_+$: number of classes for which $m_k > 0$
– $K_0$: number of classes for which $m_k = 0$
• Assume a reordering such that $m_k > 0$ for $k \le K_+$ and $m_k = 0$ otherwise, with $K = K_+ + K_0$
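A small sketch (assuming the symmetric Dirichlet(α/K) formula above) that evaluates log P(c) for fixed cluster counts as K grows; each particular labeling becomes ever less probable, which is why the infinite limit is taken over partitions rather than labelings:

import numpy as np
from scipy.special import gammaln

def log_p_assignments(counts, K, alpha=1.0):
    # log P(c) = log Gamma(alpha) - log Gamma(N + alpha)
    #            + sum_k [log Gamma(m_k + alpha/K) - log Gamma(alpha/K)]
    m = np.zeros(K)
    m[:len(counts)] = counts
    N = m.sum()
    return (gammaln(alpha) - gammaln(N + alpha)
            + np.sum(gammaln(m + alpha / K) - gammaln(alpha / K)))

for K in (3, 10, 100, 1000):
    print(K, log_p_assignments([5, 3, 2], K))  # 10 samples in clusters of sizes 5, 3, 2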
• Generating $Z$: an $N \times K$ binary matrix
– For each column $k$, draw $\pi_k$ from a beta distribution: $\pi_k \sim \text{Beta}(\alpha/K, 1)$
– For each customer (row) $i$, flip a coin with bias $\pi_k$ for each feature: $z_{ik} \sim \text{Bernoulli}(\pi_k)$
• Distribution of $Z$: integrating out $\pi$ leaves
$P(Z) = \prod_{k=1}^{K} \frac{\frac{\alpha}{K}\, \Gamma(m_k + \frac{\alpha}{K})\, \Gamma(N - m_k + 1)}{\Gamma(N + 1 + \frac{\alpha}{K})}$, where $m_k = \sum_i z_{ik}$
• $Z$ is sparse: even as $K \to \infty$, the matrix is expected to have a finite number of non-zero elements; the expected number of ones is $\frac{N\alpha}{1 + \alpha/K} \le N\alpha$.
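A quick simulation of the finite beta-Bernoulli model above, checking that the number of ones stays near Nα/(1 + α/K) even for large K (a sketch, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
N, alpha = 50, 2.0
for K in (10, 100, 10000):
    pi = rng.beta(alpha / K, 1.0, size=K)  # one coin bias per column
    Z = rng.random((N, K)) < pi            # Bernoulli draws for all N rows
    print(K, int(Z.sum()), "expected ~", N * alpha / (1 + alpha / K))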
From Finite to Infinite Binary Matrices
• A technical difficulty: the probability of any particular matrix goes to zero as $K \to \infty$
• However, if we consider equivalence classes $[Z]$ of matrices in left-ordered form, obtained by reordering the columns:
– $P([Z]) = \frac{\alpha^{K_+}}{\prod_{h > 0} K_h!} \exp\{-\alpha H_N\} \prod_{k=1}^{K_+} \frac{(N - m_k)!\,(m_k - 1)!}{N!}$
– $K_+$ is the number of features assigned ($m_k > 0$), $H_N = \sum_{j=1}^{N} 1/j$ is the $N$-th harmonic number, and $K_h$ counts the columns with binary history $h$
– This distribution is exchangeable: independent of the ordering of the rows
[Figure: (a) the binary matrix on the left is transformed into the binary matrix on the right by the function lof(), which rearranges columns into left-ordered form; (b) a left-ordered binary matrix generated by the Indian Buffet Process]
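A plausible implementation of lof() from the caption, assuming the standard definition (sort columns by their binary history, read top-to-bottom, in decreasing order):

import numpy as np

def lof(Z):
    # Interpret each column as a binary number whose most significant
    # bit is the first row, then sort columns in decreasing order.
    N = Z.shape[0]
    weights = 2.0 ** np.arange(N - 1, -1, -1)
    history = Z.T @ weights
    return Z[:, np.argsort(-history, kind="stable")]

Z = np.array([[0, 1, 0, 1],
              [1, 1, 0, 0],
              [1, 0, 1, 0]])
print(lof(Z))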
Indian Buffet Process
• “Many Indian restaurants offer lunchtime buffets with an apparently infinite number of dishes.”
• The first customer starts at the left of the buffet and takes a serving from each dish, stopping after a Poisson($\alpha$) number of dishes as her plate becomes overburdened.
• The $i$-th customer moves along the buffet, sampling dishes in proportion to their popularity, taking dish $k$ with probability $m_k / i$, and then trying a Poisson($\alpha / i$) number of new dishes.
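The metaphor maps directly onto a generative sampler; a minimal sketch (dish counts m_k, Poisson(α/i) new dishes per customer):

import numpy as np

def sample_ibp(N, alpha, rng=np.random.default_rng(0)):
    counts = []                                  # m_k: servings taken of dish k
    rows = []
    for i in range(1, N + 1):
        taken = [k for k, m in enumerate(counts) if rng.random() < m / i]
        for k in taken:
            counts[k] += 1
        new = rng.poisson(alpha / i)             # brand-new dishes for this customer
        taken += range(len(counts), len(counts) + new)
        counts += [1] * new
        rows.append(taken)
    Z = np.zeros((N, len(counts)), dtype=int)
    for i, taken in enumerate(rows):
        Z[i, taken] = 1
    return Z

print(sample_ibp(N=8, alpha=2.0))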
Infinite attributes – Indian Buffet Process prior
• prob(image $n$ samples attribute $k$) $= \frac{m_k}{n}$
• sample Poisson($\alpha/n$) new attributes
• Likelihood: $x_n = A z_n + \varepsilon_n$, e.g. Gaussian noise, $p(X \mid Z, A) \propto \exp\{-\frac{1}{2\sigma^2} \sum_n \|x_n - A z_n\|^2\}$
[Figure: images annotated with attributes such as striped, white, furry, bright, wheels]
(Griffiths & Ghahramani, 2006)
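In inference, each entry z_nk is resampled with posterior odds combining the IBP prior m_k/n and the Gaussian likelihood above; a rough sketch (for simplicity m_k includes image n itself, whereas a proper Gibbs step would use m_{-n,k}):

import numpy as np

def resample_znk(x_n, z_n, A, k, m_k, n, sigma=1.0, rng=np.random.default_rng()):
    log_p = np.empty(2)
    for val in (0, 1):
        z_n[k] = val
        resid = x_n - A @ z_n
        prior = m_k / n if val else 1.0 - m_k / n
        log_p[val] = np.log(prior + 1e-12) - resid @ resid / (2 * sigma ** 2)
    p_one = 1.0 / (1.0 + np.exp(log_p[0] - log_p[1]))
    z_n[k] = int(rng.random() < p_one)
    return z_n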
• prob(image $n$ samples attribute $k$) $= \frac{m_k}{n}$; sample Poisson($\alpha/n$) new attributes, as above
• Asymptotics: as the noise variance $\sigma^2 \to 0$, MAP inference collapses to a simple discrete objective, and the number of attributes $K$ is determined automatically
(Broderick, Kulis & Jordan, ICML 2013)
[Figure: image annotated with attributes bright, wheels, furry, striped, white]
• Mixture of Gaussians:
– Bayesian non-parametric DP mixture: flexible, principled
– as the covariance goes to zero: simple, efficient, “practical” k-means
• Principled discrete criteria from BNP:
– Dirichlet Process → k-means + penalty on the number of clusters (Kulis & Jordan, ICML 2012)
– Beta Process → squared loss + penalty on the number of features (Broderick, Kulis & Jordan, ICML 2013)
– Dependent Dirichlet Process → dynamic clustering (Campbell, Liu, Kulis, How & Carin, NIPS 2013)
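For instance, the first row (Dirichlet Process → k-means + penalty) yields DP-means; a hedged sketch following Kulis & Jordan (ICML 2012), where λ is the cost of opening a new cluster:

import numpy as np

def dp_means(X, lam, n_iter=25):
    centers = [X.mean(axis=0)]                   # start from the global mean
    for _ in range(n_iter):
        assign = []
        for x in X:
            d2 = np.array([np.sum((x - c) ** 2) for c in centers])
            k = int(d2.argmin())
            if d2[k] > lam:                      # farther than lam from every center:
                centers.append(x.copy())         # open a new cluster at x
                k = len(centers) - 1
            assign.append(k)
        assign = np.array(assign)
        centers = [X[assign == k].mean(axis=0)   # recenter non-empty clusters
                   for k in range(len(centers)) if np.any(assign == k)]
    C = np.array(centers)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return C, d2.argmin(axis=1)                  # final centers and assignments

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(40, 2)) for m in (-3.0, 0.0, 3.0)])
centers, labels = dp_means(X, lam=4.0)
print(len(centers), "clusters")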
Related samples
• How related are two categories (e.g. polar bear, clown fish, motorbike)? Measure relatedness by path length in WordNet (Christiane Fellbaum, WordNet, 1998)
[Figure: example relatedness queries between polar bear, clown fish, and motorbike]
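With NLTK's WordNet interface this is one call (assumes nltk is installed and the 'wordnet' corpus has been fetched via nltk.download('wordnet')); path_similarity is NLTK's standard hypernym-path-length measure:

from nltk.corpus import wordnet as wn

dog = wn.synsets('dog')[0]          # dog.n.01
cat = wn.synsets('cat')[0]          # cat.n.01
motorcycle = wn.synsets('motorcycle')[0]
print(dog.path_similarity(cat))         # relatively high: nearby in the taxonomy
print(dog.path_similarity(motorcycle))  # low: far apart in WordNet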
Model overview: … discriminative … attributes with sample relatedness
[Figure: input features feed a dictionary of attributes, whose binary codes feed max-margin classifiers; training uses positive samples (Pug dog), related samples (Samoyed dog), and negative samples (Cat)]
Joint Learning of Dictionary & Classifiers
• BCD: alternatingly update classifiers & dictionary

while not converged do
  1: update $z_{ik} \in \{0, 1\}$ greedily
  2: $A \leftarrow X Z^{\top} (Z Z^{\top})^{-1}$
  3: sample a new attribute $a_{K+1}$: $p(a_{K+1} = x_i) \propto \|x_i - A z_i\|^2$
  4: $A \leftarrow [A, a_{K+1}]$
end while
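A rough, runnable sketch of the loop above, with the caveat that it keeps only the reconstruction part (the full objective also includes the max-margin classifier and relatedness terms, omitted here); λ is an assumed penalty controlling when a new attribute is added:

import numpy as np

def learn_dictionary(X, lam=1.0, n_iter=20, rng=np.random.default_rng(0)):
    D, N = X.shape
    A = X[:, [rng.integers(N)]].copy()       # init: one attribute, a random sample
    Z = np.ones((1, N))
    for _ in range(n_iter):
        # 1: greedy binary update of each z_ik
        for i in range(N):
            for k in range(A.shape[1]):
                errs = []
                for val in (0, 1):
                    Z[k, i] = val
                    errs.append(np.sum((X[:, i] - A @ Z[:, i]) ** 2))
                Z[k, i] = int(np.argmin(errs))
        # 2: least-squares dictionary update, A <- X Z^T (Z Z^T)^{-1}
        A = X @ Z.T @ np.linalg.pinv(Z @ Z.T)
        # 3-4: sample a new attribute with probability proportional to the
        # residual error, if the worst residual exceeds the penalty lam
        res = np.sum((X - A @ Z) ** 2, axis=0)
        if res.max() > lam:
            i = rng.choice(N, p=res / res.sum())
            A = np.hstack([A, X[:, [i]]])
            Z = np.vstack([Z, np.zeros(N)])
    return A, Z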
Experiments
• Classification accuracy on ImageNet: sample-efficient (higher accuracy with fewer training samples); generalization (better representation of new classes)
[Figure: accuracy curves on AwA data]
• More related information: using related samples increases sample-efficiency
• Non-parametric: adapts to the complexity of the data; representation-efficient
• Flexible attribute learning method
– generalizes to new categories
– adapts to the dataset complexity
• Efficiently learnable
– sample efficient
– reduces user annotation effort
• Performs well
– recognizes existing and new categories well
Q&A