Dirichlet Processes and
Mixture Models
An interactive tutorial: Part 1
Pfunk Meeting 1
Fall ’11
*some content adapted from El-Arini 2008 & Teh 2007
J. Scholz (RIM@GT)
Friday, September 2, 2011
08/26/2011
1
Purpose
• Introduction to the Dirichlet Process with MINIMAL prerequisites
• Set up for next week’s hands-on exposure to training mixture models using EM and DP priors
Demo Code: http://www.cc.gatech.edu/~jscholz6/resources/code/DP_Tutorial/
Topics
• Discrete N-D probability distributions (categorical, multinomial, Dirichlet)
• Dirichlet Process Definition
• Dirichlet Process Metaphors
  • Polya Urn
  • Chinese Restaurant Process
  • Stick-breaking process
Motivation
• We are given a bunch of data points and told they were generated by a mixture of Gaussians
• Unfortunately, no one said how many Gaussians produced the data
Motivation
• Could it be this?
[figure: one candidate mixture of Gaussians for the data]
Motivation
• Or perhaps this?
[figure: another candidate mixture of Gaussians for the data]
What to do?
• We can guess the number of components,
run Expectation Maximization (EM) for
Gaussian Mixture Models, look at the
results, and then try again...
• We can run hierarchical agglomerative
clustering, and cut the tree at a visually
appealing level...
What to do?
• Really, we want to cluster the data in a
statistically principled manner, without
resorting to hacks...
>> for a preview, run demo 5.2
Real examples
• Brain Imaging: Model an unknown number
of spatial activation patterns in fMRI images
[Kim and Smyth, NIPS 2006]
• Topic Modeling: Model an unknown number
of topics across several corpora of
documents [Teh et al. 2006]
• Filtering and planning in unknown
state spaces (iHMM [Beal et al. 2003],
iPOMDP [Doshi et al. 2009])
Preliminaries: N-D Distributions
• Categorical
  • Definition: $X \sim \mathrm{Cat}(p) \Rightarrow P(X = x_i) = p_i$
  • i.e., it gives the probability of each of k possible events occurring in a single trial
• Semantics:
  • A draw from a categorical RV is a single event
try it! >> Prelim 1.1
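The "Prelim" exercises refer to the author's demo code, which is not reproduced here. As a rough stand-in, here is a minimal Python sketch (the name `sample_categorical` is hypothetical) that draws one event index by inverting the cumulative distribution, then checks that empirical frequencies approach p:

```python
import random
from itertools import accumulate


def sample_categorical(p, rng=random):
    """Draw one event index i with probability p[i], by inverting the CDF."""
    cdf = list(accumulate(p))       # running totals p_1, p_1 + p_2, ...
    u = rng.random() * cdf[-1]      # scaling by the total tolerates unnormalized p
    for i, c in enumerate(cdf):
        if u < c:
            return i
    return len(p) - 1               # only reached through floating-point round-off


rng = random.Random(0)
draws = [sample_categorical([0.2, 0.5, 0.3], rng) for _ in range(10_000)]
freqs = [draws.count(i) / len(draws) for i in range(3)]
print(freqs)  # each close to the corresponding p_i
```

In everyday code, `random.choices(range(k), weights=p)` from the standard library does the same job.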
Preliminaries: N-D Distributions
• Multinomial
  • Definition
$$X \sim \mathrm{Multi}(p) \Rightarrow P(X = x) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}, \qquad x \in \mathbb{Z}_{\ge 0}^{k},\ \sum_i x_i = n$$
  • i.e., X counts the occurrences of each of k possible events over n total trials
• Semantics
  • A draw from a multinomial RV is a vector of event counts
  • think: goes from event probs to event counts
try it! >> Prelim 1.2
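A minimal sketch in the spirit of Prelim 1.2 (function names are hypothetical): `multinomial_pmf` evaluates the formula above term by term, and `sample_multinomial` builds a draw exactly as the semantics describe, by tallying n categorical trials.

```python
import math
import random


def multinomial_pmf(x, p):
    """P(X = x) = n! / (x_1! ... x_k!) * p_1^x_1 ... p_k^x_k, with n = sum(x)."""
    coeff = math.factorial(sum(x))
    for xi in x:
        coeff //= math.factorial(xi)  # exact: the multinomial coefficient is an integer
    return coeff * math.prod(pi ** xi for pi, xi in zip(p, x))


def sample_multinomial(n, p, rng=random):
    """A multinomial draw is the vector of event counts from n categorical trials."""
    counts = [0] * len(p)
    for idx in rng.choices(range(len(p)), weights=p, k=n):
        counts[idx] += 1
    return counts


p = [0.2, 0.5, 0.3]
print(sample_multinomial(10, p, random.Random(0)))  # a count vector summing to 10
print(multinomial_pmf([1, 1], [0.5, 0.5]))          # 0.5
```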
Preliminaries: N-D Distributions
• Dirichlet
  • Definition (multinomial shown for comparison)
$$X \sim \mathrm{Multi}(p) \Rightarrow P(X = x) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$$
$$\Pi \sim \mathrm{Dir}(\alpha) \Rightarrow P(\Pi = \pi) = \frac{\Gamma\left(\sum_k \alpha_k\right)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^{K} \pi_k^{\alpha_k - 1}$$
  • i.e., Π is a distribution over the event probabilities in a categorical/multinomial RV
  • Hence it can be thought of as a distribution on distributions
• Semantics
  • A draw from a Dirichlet RV is a vector of event probabilities
  • think: goes from event counts to event probs
try it! >> Prelim 1.3
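A minimal sketch in the spirit of Prelim 1.3, assuming the standard construction of a Dirichlet draw by normalizing independent Gamma(α_k, 1) variables (the helper name is hypothetical). Each draw is itself a probability vector, and the average of many draws should approach α / Σ_k α_k:

```python
import random


def sample_dirichlet(alpha, rng=random):
    """Draw pi ~ Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [gk / total for gk in g]


rng = random.Random(0)
alpha = [2.0, 5.0, 3.0]
draws = [sample_dirichlet(alpha, rng) for _ in range(20_000)]
# Every draw is a distribution over the 3 events; average them per component.
mean = [sum(d[k] for d in draws) / len(draws) for k in range(3)]
print(mean)  # close to alpha / sum(alpha) = [0.2, 0.5, 0.3]
```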
When is the Dirichlet useful?
• Often appears in a Bayesian setting when we need a prior on multinomial parameters (it’s conjugate to the multinomial)
• It is the N-D analogue of the Beta distribution, which plays the same role for the binomial
• E.g.: say we want to figure out the probability of heads for a trick coin, and we observe only 3 flips, all heads
  • The ML estimate of p is 1, but that’s a bit strong, no?
  • Solution? Place a Beta prior on p, and use Bayes’ rule*
* for more, see: http://en.wikipedia.org/wiki/Checking_whether_a_coin_is_fair
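The coin example can be worked exactly. A sketch, assuming a Beta(2, 2) prior (an illustrative choice, not from the slides): by conjugacy, a Dir(α) prior plus observed counts gives a Dir(α + counts) posterior, and Beta(a, b) is just Dir(a, b) with two events. Exact fractions keep the arithmetic readable.

```python
from fractions import Fraction


def ml_estimate(counts):
    """Maximum-likelihood event probabilities: the observed frequencies."""
    n = sum(counts)
    return [Fraction(c, n) for c in counts]


def dirichlet_posterior(alpha, counts):
    """Conjugacy: a Dir(alpha) prior plus multinomial counts gives Dir(alpha + counts)."""
    return [a + c for a, c in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """E[pi_k] = alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [Fraction(a, total) for a in alpha]


counts = [3, 0]   # (heads, tails): three flips, all heads
prior = [2, 2]    # assumed Beta(2, 2) == Dir(2, 2) prior, mildly favoring a fair coin
post = dirichlet_posterior(prior, counts)
print(ml_estimate(counts))   # [Fraction(1, 1), Fraction(0, 1)]
print(post)                  # [5, 2], i.e. Beta(5, 2)
print(dirichlet_mean(post))  # [Fraction(5, 7), Fraction(2, 7)]
```

The posterior mean of 5/7 is pulled back from the ML estimate of 1 toward the prior's 1/2, which is the "less strong" answer the slide is after.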
Visualizing the Dirichlet Distribution
[figure: “Examples of Dirichlet distributions”, from Yee Whye Teh (Gatsby), DP and HDP Tutorial, Mar 1, 2007 / CUED, slide 4/53]
Dirichlet: from Distribution to Process
• “A Dirichlet Process (DP) is a distribution over probability measures”
• “A DP is a distribution over probability measures such that marginals on finite partitions are Dirichlet distributed” (this is key; we’ll get back to it)
• “A probability measure is a function from subsets of a space X to [0, 1] satisfying certain properties”
• If you’re thinking “WTF??”, hang on!
Dirichlet Processes
• Definition: A Dirichlet Process (DP) is a distribution over probability measures.
• A DP has two parameters:
  • Base distribution H, which is like the mean of the DP.
  • Strength parameter α, which is like an inverse-variance of the DP.
• We write $G \sim \mathrm{DP}(\alpha, H)$ if for any partition $(A_1, \ldots, A_n)$ of X:
$$(G(A_1), \ldots, G(A_n)) \sim \mathrm{Dirichlet}(\alpha H(A_1), \ldots, \alpha H(A_n))$$
[figure: examples of partitions of X into cells A_1, A_2, ..., A_6; from Yee Whye Teh (Gatsby), DP and HDP Tutorial, Mar 1, 2007 / CUED, slide 5/53]
try it! >> Prelim 2.1
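To make the partition definition concrete, here is a sketch under explicit assumptions: X is the real line, H = N(0, 1), and the partition is given by hypothetical cut points. It computes H(A_i) for each cell and samples (G(A_1), ..., G(A_n)) from the Dirichlet above. This only illustrates the marginal on one fixed partition; it does not construct G itself, which must satisfy the condition for every partition at once.

```python
import math
import random


def normal_cdf(x):
    """CDF of the standard normal, used here as the (assumed) base distribution H."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def partition_masses(cdf, cuts):
    """H(A_i) for the cells (-inf, c_1], (c_1, c_2], ..., (c_m, inf) of the real line."""
    edges = [0.0] + [cdf(c) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def sample_dp_marginal(alpha, h_masses, rng=random):
    """(G(A_1), ..., G(A_n)) ~ Dirichlet(alpha*H(A_1), ..., alpha*H(A_n)).

    Uses normalized Gamma draws; every cell needs H(A_i) > 0.
    """
    g = [rng.gammavariate(alpha * h, 1.0) for h in h_masses]
    total = sum(g)
    return [x / total for x in g]


h = partition_masses(normal_cdf, cuts=[-1.0, 0.0, 1.0])  # 4 cells
rng = random.Random(0)
print(h)                                     # approx [0.159, 0.341, 0.341, 0.159]
print(sample_dp_marginal(1.0, h, rng))       # small alpha: typically far from h
print(sample_dp_marginal(1000.0, h, rng))    # large alpha: concentrated near h
```

The two calls illustrate the slide's reading of the parameters: the expected value of each G(A_i) is H(A_i) (H as the "mean"), and larger α shrinks the spread around it (α as an "inverse-variance").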
A closer look
• A DP has two parameters:
  • Base distribution H, which is like the mean of the DP
  • Strength parameter α, which is like an inverse-variance of the DP
• We write $G \sim \mathrm{DP}(\alpha, H)$ if for any partition $(A_1, \ldots, A_n)$ of X:
$$(G(A_1), \ldots, G(A_n)) \sim \mathrm{Dirichlet}(\alpha H(A_1), \ldots, \alpha H(A_n))$$
• What is the form of H?
[figure: the space X divided into partition cells A_1, ..., A_6]
What is the form of H?
• Can be any distribution defined over our event space (e.g. a Gaussian)
• Continuous or discrete: both legal
• The only condition is that it must assign a probability H(A) to every set A in any partition we give it
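For a discrete H, computing H(A) just means adding up the probability of the atoms that fall in A. A small sketch with a hypothetical three-atom H on the real line and a three-cell partition (the continuous Gaussian case works the same way through its CDF):

```python
import math


def discrete_h_mass(atoms, probs, lo, hi):
    """H((lo, hi]) for a discrete H putting probability probs[j] on the point atoms[j]."""
    return sum(p for a, p in zip(atoms, probs) if lo < a <= hi)


atoms, probs = [-2.0, 0.5, 3.0], [0.25, 0.5, 0.25]       # hypothetical discrete H
cells = [(-math.inf, 0.0), (0.0, 1.0), (1.0, math.inf)]  # a partition of the real line
masses = [discrete_h_mass(atoms, probs, lo, hi) for lo, hi in cells]
print(masses)  # [0.25, 0.5, 0.25]
```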
So where is the Process in a DP?
• 3 Metaphors:
  • Polya-urn
    • Involves drawing and replacing balls from an urn
  • Chinese Restaurant
    • Involves customers sitting at tables in proportion to their popularity
  • Stick-breaking
    • Involves breaking off pieces of a stick of unit length
The Polya-urn Scheme
• Pólya’s urn scheme produces a sequence $\theta_1, \theta_2, \ldots$ with the following conditionals:
eq 1:
$$\theta_n \mid \theta_{1:n-1} \sim \frac{\sum_{i=1}^{n-1} \delta_{\theta_i} + \alpha H}{n - 1 + \alpha}$$
(the sum of point masses counts the number of $\theta_i$-colored balls in the urn)
• Imagine picking balls of different colors from an urn:
  • Start with no balls in the urn.
  • With probability ∝ α, draw $\theta_n \sim H$, and add a ball of that color into the urn.
  • With probability ∝ n − 1, pick a ball at random from the urn, record $\theta_n$ to be its color, return the ball into the urn, and place a second ball of the same color into the urn.
(adapted from Yee Whye Teh (Gatsby), DP and HDP Tutorial, Mar 1, 2007 / CUED, slide 10/53)
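A minimal simulation of the urn story, assuming H = N(0, 1) for the colors; `polya_urn` and `base_sampler` are hypothetical names. Because each step adds exactly one ball (either a new color or a copy), the urn holds n − 1 balls when θ_n is drawn, and picking a ball uniformly is the same as picking a color in proportion to its count.

```python
import random


def polya_urn(n, alpha, base_sampler, rng=random):
    """Generate theta_1..theta_n from Polya's urn with strength alpha and base H.

    base_sampler(rng) must return one draw from H.
    """
    urn = []  # one ball per previous draw, so len(urn) == n - 1 at step n
    for _ in range(n):
        # New color with prob alpha / (n - 1 + alpha); otherwise copy an existing ball.
        if rng.random() < alpha / (len(urn) + alpha):
            theta = base_sampler(rng)
        else:
            theta = rng.choice(urn)  # uniform over balls == proportional to color counts
        urn.append(theta)  # the new ball, or the "second ball" of the same color
    return urn


rng = random.Random(0)
draws = polya_urn(100, alpha=2.0, base_sampler=lambda r: r.gauss(0.0, 1.0), rng=rng)
print(len(set(draws)), "distinct colors among", len(draws), "draws")
```

Even though H is continuous, the sequence reuses a handful of values many times: that repetition is the clustering behavior the DP is used for.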
Polya sampling in practice
• Equation 1 is of the form $p\, f(\Omega) + (1 - p)\, g(\Omega)$, with $p = \frac{n-1}{n-1+\alpha}$, f the empirical distribution of the existing balls, and g = H
• This implies that a proportion p of the probability mass is associated with f, so we can split the task in two:
  • first flip a Bernoulli(p) coin. If heads, draw from f; if tails, draw from g
  • for the Polya urn, this gives us either a sample from the existing balls (f) or a new color (g)*
*If g is a continuous density on Ω, then the probability of sampling an existing cluster from g is zero. (Why?)
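The coin-flip split can be checked numerically. A sketch with a hypothetical urn of n − 1 = 4 balls, α = 1, and H = N(0, 1): the fraction of steps that produce a new color should be close to 1 − p = α / (n − 1 + α) = 0.2. Detecting "new" by testing whether θ is already in the urn relies on the footnote's point: a continuous g essentially never reproduces an existing value.

```python
import random


def sample_two_part_mixture(p, sample_f, sample_g, rng=random):
    """Sample from p*f + (1-p)*g: flip a Bernoulli(p) coin, then draw from f or g."""
    return sample_f(rng) if rng.random() < p else sample_g(rng)


urn = [0.7, 0.7, -1.3, 0.7]            # hypothetical urn: n - 1 = 4 balls, 2 colors
alpha = 1.0
p = len(urn) / (len(urn) + alpha)      # weight on f, the existing balls
draw_f = lambda r: r.choice(urn)       # f: copy a uniformly chosen ball
draw_g = lambda r: r.gauss(0.0, 1.0)   # g = H: assumed standard normal

rng = random.Random(0)
trials = 100_000
new = sum(sample_two_part_mixture(p, draw_f, draw_g, rng) not in urn for _ in range(trials))
print(new / trials)  # close to alpha / (n - 1 + alpha) = 0.2
```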
Analyzing Polya Urn
• A draw $G \sim \mathrm{DP}(\alpha, H)$ is a random probability measure (one infinitely long “run” of our process)
• Treating G as a distribution, consider i.i.d. draws from G (each one a component drawn from G):
$$\theta_i \mid G \sim G$$
• Marginalizing out G, marginally each $\theta_i \sim H$, while the conditional distributions are:
$$\theta_n \mid \theta_{1:n-1} \sim \frac{\sum_{i=1}^{n-1} \delta_{\theta_i} + \alpha H}{n - 1 + \alpha}$$
• This is precisely what we did in the Polya urn scheme*
* This is why people say that the DP is the “De Finetti distribution” underlying the urn process. It’s what makes the $\theta_i$ exchangeable. (Since the $\theta_i$ are i.i.d. ∼ G, their joint distribution is invariant to permutations.)
(adapted from Yee Whye Teh (Gatsby), DP and HDP Tutorial, Mar 1, 2007 / CUED, slide 9/53)
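Exchangeability can be seen in a small calculation. For a continuous H, P(θ_2 = θ_1) = 1/(1 + α). For θ_3, either θ_2 copied θ_1 (then 2 of 2 balls match) or θ_2 was new (then 1 of 2 matches): P(θ_3 = θ_1) = 1/(1+α) · 2/(2+α) + α/(1+α) · 1/(2+α) = 1/(1+α), the same value, as exchangeability requires. A Monte Carlo sketch, assuming H = Uniform(0, 1) and α = 1 (so both probabilities are 0.5), also checks that θ_3 is marginally distributed as H by estimating its mean:

```python
import random


def polya_urn_uniform(n, alpha, rng):
    """Polya urn with base distribution H = Uniform(0, 1); returns theta_1..theta_n."""
    urn = []
    for _ in range(n):
        if rng.random() < alpha / (len(urn) + alpha):
            urn.append(rng.random())     # new color: a fresh draw from H
        else:
            urn.append(rng.choice(urn))  # reuse the color of a uniformly chosen ball
    return urn


rng = random.Random(0)
alpha, runs = 1.0, 20_000
samples = [polya_urn_uniform(3, alpha, rng) for _ in range(runs)]
p12 = sum(t[1] == t[0] for t in samples) / runs  # estimate of P(theta_2 = theta_1)
p13 = sum(t[2] == t[0] for t in samples) / runs  # estimate of P(theta_3 = theta_1)
mean3 = sum(t[2] for t in samples) / runs        # estimate of E[theta_3]
print(p12, p13, mean3)  # all three should be near 0.5 for alpha = 1
```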
The Chinese Restaurant Process
• Generating from the CRP:
  • First customer sits at the first table.
  • Customer n sits at:
    • Table k with probability $\frac{n_k}{\alpha + n - 1}$, where $n_k$ is the number of customers at table k.
    • A new table K + 1 with probability $\frac{\alpha}{\alpha + n - 1}$.
• Customers ⇔ integers, tables ⇔ clusters.
• The CRP exhibits the clustering property of the DP:
[figure: customers 1–9 seated at a few tables]
• Exhibits a rich-get-richer effect
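A sketch of the seating rule above (`crp` is a hypothetical name; tables and customers are 0-based here, while the slide counts customers from 1). `random.choices` normalizes the weights, so the unnormalized weights n_k and α give exactly the probabilities n_k / (α + n − 1) and α / (α + n − 1).

```python
import random


def crp(n, alpha, rng=random):
    """Seat n customers by the Chinese Restaurant Process; return (tables, counts).

    tables[i] is the 0-based table of customer i + 1; counts[k] is n_k.
    """
    tables, counts = [], []
    for _ in range(n):
        # Unnormalized weights: n_k for each existing table, alpha for a new one.
        k = rng.choices(range(len(counts) + 1), weights=counts + [alpha])[0]
        if k == len(counts):
            counts.append(0)  # open table K + 1
        counts[k] += 1
        tables.append(k)
    return tables, counts


tables, counts = crp(20, alpha=1.0, rng=random.Random(0))
print(tables)
print(counts)  # typically a few large tables and some small ones
```

Popular tables attract new customers in proportion to their size, which is the rich-get-richer effect.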
The Chinese Restaurant Process
• Closely related to the Polya Urn process:
  • The CRP is the distribution over partitions induced by an urn process
  • Just take all the balls and sort them by color
  • This defines a partition of $\{1, \ldots, n\}$ into K clusters, such that if i is in cluster k, then $\theta_i = \theta^*_k$
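"Sort the balls by color" can be written as a short function (a hypothetical helper) that turns a sequence of urn draws into cluster labels plus the distinct values θ*_k. Numbering clusters in order of first appearance matches how the CRP opens tables; the draws need to be hashable values such as floats.

```python
def partition_from_draws(thetas):
    """Sort urn draws by color: return (labels, unique_values).

    labels[i] = k means ball i is in cluster k, and thetas[i] == unique_values[k]
    (the theta*_k of the slide). Clusters are numbered in order of first appearance.
    """
    index = {}
    labels, unique_values = [], []
    for theta in thetas:
        if theta not in index:
            index[theta] = len(unique_values)
            unique_values.append(theta)
        labels.append(index[theta])
    return labels, unique_values


labels, theta_star = partition_from_draws([0.3, 0.3, 1.2, 0.3, 1.2, -0.5])
print(labels)      # [0, 0, 1, 0, 1, 2]
print(theta_star)  # [0.3, 1.2, -0.5]
```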
The Stick-Breaking Construction
• But what do draws $G \sim \mathrm{DP}(\alpha, H)$ look like?
• G is discrete with probability one, so:
$$G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k^*}$$
($\pi_k$ is a mixing proportion; $\delta_{\theta_k^*}$ is a point mass at $\theta_k^*$)
• The stick-breaking construction shows that $G \sim \mathrm{DP}(\alpha, H)$ if:
$$\pi_k = \beta_k \prod_{l=1}^{k-1} (1 - \beta_l), \qquad \beta_k \sim \mathrm{Beta}(1, \alpha), \qquad \theta_k^* \sim H$$
[figure: a unit-length stick broken into successive pieces π(1), π(2), ..., π(6)]
• We write $\pi \sim \mathrm{GEM}(\alpha)$ if $\pi = (\pi_1, \pi_2, \ldots)$ is distributed as above.
(adapted from Yee Whye Teh (Gatsby), DP and HDP Tutorial, Mar 1, 2007 / CUED, slide 15/53)
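An infinite sum cannot be sampled directly, so a common workaround (an approximation, not part of the construction itself) is to stop after a fixed number of breaks `k_max` and track the unbroken remainder. A sketch, assuming H = N(0, 1) for the atoms; the function names are hypothetical:

```python
import random


def gem_truncated(alpha, k_max, rng=random):
    """First k_max stick-breaking weights of pi ~ GEM(alpha), plus the unbroken remainder."""
    weights, remaining = [], 1.0
    for _ in range(k_max):
        beta = rng.betavariate(1.0, alpha)  # fraction of what is left to break off
        weights.append(beta * remaining)    # pi_k = beta_k * prod_{l<k} (1 - beta_l)
        remaining *= 1.0 - beta
    return weights, remaining


def sample_dp_truncated(alpha, base_sampler, k_max, rng=random):
    """Approximate G ~ DP(alpha, H) as a list of (pi_k, theta*_k) atoms."""
    weights, _ = gem_truncated(alpha, k_max, rng)
    return [(w, base_sampler(rng)) for w in weights]


rng = random.Random(0)
atoms = sample_dp_truncated(2.0, lambda r: r.gauss(0.0, 1.0), k_max=50, rng=rng)
print(sum(w for w, _ in atoms))  # at most 1: the unbroken remainder is dropped
```

The truncation level trades accuracy for cost; since the expected leftover stick shrinks geometrically in `k_max`, a moderate truncation usually leaves very little mass unassigned.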
The Stick-Breaking Construction
Why does this make sense?
• Each draw $\beta_k \sim \mathrm{Beta}(1, \alpha)$ lies in the interval (0, 1), and we can think of it as where to break the (remaining) stick
• In $\pi_k = \beta_k \prod_{l=1}^{k-1} (1 - \beta_l)$, the product scales the k-th sample according to how much has already been broken off
• In the limit we get another infinite partitioning of our interval [0, 1], and therefore a (discrete) probability measure
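A short derivation, not on the original slides, of how α controls the piece sizes. Assuming the β_k are drawn independently (as in the standard construction) and using E[β_k] = 1/(1 + α) for β_k ∼ Beta(1, α):

```latex
\mathbb{E}[\pi_k]
  = \mathbb{E}[\beta_k] \prod_{l=1}^{k-1} \mathbb{E}[1 - \beta_l]
  = \frac{1}{1+\alpha} \left( \frac{\alpha}{1+\alpha} \right)^{k-1},
\qquad
\mathbb{E}\Big[ 1 - \sum_{j=1}^{k} \pi_j \Big]
  = \prod_{l=1}^{k} \mathbb{E}[1 - \beta_l]
  = \left( \frac{\alpha}{1+\alpha} \right)^{k}.
```

So a small α puts most of the mass on the first few atoms (few clusters), while a large α spreads it over many small pieces; for α = 1, E[π_k] = 2^{-k}.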
End Part 1
Questions?