Chapter 1 Dirichlet distribution

advertisement
Chapter 1
Dirichlet distribution
The Dirichlet distribution is intensively used in various fields: biology ([11]), astronomy
([24]), text mining ([4]), ...
It can be seen as a random distribution on a finite set. Dirichlet distribution is a very
popular prior in Bayesian statistics because the posterior distribution is also a Dirichlet
distribution. In this chapter we give a complete presentation of this interesting law: representation by Gamma’s distribution, limit distribution in a contamination model. (The
Polya urn scheme), ...
1.1
Random probability vectors
Consider a partition of a nonvoid finite set E with cardinality ♯E = n ∈ N∗ into d nonvoid disjoint subsets. To such a partition corresponds a partition of the integer n, say
c1 , . . . , cd , that is a finite family of positive integers, such that c1 + . . . + cd = n. Thus,
if pj =
cj
,
n
we have p1 + . . . + pd = 1.
In biology for example, pj can represent the percentage of the j th specy in a population.
1
CHAPTER 1. DIRICHLET DISTRIBUTION
2
So we are lead to introduce the following d-dimentional simplex:
△d−1 = {(p1 , . . . , pd ) : pj ≥ 0,
d
X
pj = 1}.
j=1
When n tends to infinity, this yields to the following notion:
Definition 1.1.1. One calls mass-partition any infinite numerical sequence
p = (p1 , p2 , . . .)
such that p1 ≥ p2 ≥ . . . and
P∞
1
pj = 1.
The space of mass-partitions is denoted by
∇∞ = {(p1 , p2 , . . .) : p1 ≥ p2 ≥ . . . ; pj ≥ 0, j ≥ 1,
∞
X
pj = 1}.
j=1
Lemma 1.1.1. (Bertoin [28] page 63) Let x1 , . . . , xd−1 be d − 1 i.i.d. random variables
uniformly distributed on [0, 1] and let x(1) < . . . < x(d−1) denote its order statistic, then
the random vector
(x(1) , . . . , x(d−1) − x(d−2) , 1 − x(d−1) )
is uniformly distributed on △d−1 .
1.2
Polya urn (Blackwell and MacQueen ) [3]
We consider an urn that contains d colored balls numbered from 1 to d. Initially, there is
only one ball of each color in the urn. We draw a ball, we observe its color and we put
it back in the urn with another ball having the same color. Thus at the instant n we have
n + d balls in the urn and we have added n = N1 + . . . + Nd balls with Nj balls of color
j.
We are going to show that the distribution of ( Nn1 ,
distribution.
N2
, . . . , Nnd )
n
converges to a limit
1.2. POLYA URN (BLACKWELL AND MACQUEEN ) [3]
3
1.2.1 Markov chain
Proposition 1.2.1.
lim (
n−→∞
Nd d
N1
,...,
) = (Z1 , Z2 , . . . , Zd )
n
n
where (Z1 , Z2 , . . . , Zd ) have a uniform distribution on the simplex △d−1 .
Proof
Denote the projection operation
πi : Rd → R
x = (x1 , . . . , xd ) 7→ xi
and
θi (x) = (x1 , . . . , xi−1 , xi + 1, xi+1 , . . . , xd ).
Let
S(x) =
d
X
xi
i=1
and
fi (x) =
πi (x) + 1
.
S(x) + d
Define a transition kernel as follows
P (x, θi (x)) =
P (x, y) = 0,
πi (x) + 1
S(x) + d
y 6∈ {θ1 (x), . . . , θd (x)}.
if
Recall that for any non-negative (resp. bounded) measurable function g defined on Rd ,
the function P g is defined as
P g(x) =
Z
Rd
g(y)P (x, dy).
CHAPTER 1. DIRICHLET DISTRIBUTION
4
Here we see that
P g(x) =
d
X
g(θj (x))
i=1
First step :
πi (x) + 1
.
S(x) + d
Consider Yn = (Yn1 , . . . , Ynd ) where (Yni )0≤i≤d is the number of balls of color i added to
the urn at nth step. We clearly see that (Yn+1 ) only depends on the nth step so that (Yn )n
is Markov chain with transition kernel
P (Yn , θi (Yn )) =
πi (Yn ) + 1
S(Yn ) + d
and
Y0 = (0, . . . , 0).
On the other hand,
P fi (Yn ) =
since
Pd
πi (Yn )+1 πi (θj (Yn ))+1
j=1 S(Yn )+d S(θj (Yn ))+d



 πi (θj (Yn )) = πi (Yn ) if i 6= j,
πi (θi (Yn )) = πi (Yn ) + 1 if i = j,



S(θi (Yn )) = S(Yn ) + 1.
Then
P fi (Yn ) =
P
πi (Yn )+1 πj (Yn )+1
i6=j S(Yn )+d S(Yn )+d+1
+
πi (Yn )+1 πi (Yn )+2
S(Yn )+d S(Yn )+d+1
P
=
πi (Yn )+1
[π (Y )
(S(Yn )+d)(S(Yn )+d+1) i n
+2+
=
πi (Yn )+1
[π (Y )
(S(Yn )+d)(S(Yn )+d+1) i n
+ 2 + (S(Yn ) + d − 1 − πi (Yn ))]
= fi (Yn ).
i6=j
πj (Yn ) + 1]
(1.1)
1.2. POLYA URN (BLACKWELL AND MACQUEEN ) [3]
5
implies that fi (Yn ) is a positive martingale which converges almost sure towards a random variable Zi . Since fi (Yn ) is bounded by 1, it is also convergent in the Lp spaces,
according to the bounded convergence theorem. We then see that :
n+d
1
πi (Yn )
=
fi (Yn ) −
n
n
n
converges to the same limit Zi almost surely and in Lp .
By the martingale properties we have moreover that
E(fi (Yn )) = E(fi (Y0 )).
Consequently
E(limn→∞ f (Yn )) = limn→∞ E(f (Yn ))
= E(f (Y0 )),
so
1
E(Zi ) = E(fi (Y0 )) = .
d
Second step:
Let
∧d−1 = {(p1 , . . . , pd−1 ) : pi ≥ 0
d−1
X
pi ≤ 1},
i=1
and
hu (Yn ) =
(S(Yn ) + d − 1)! π1 (Yn )
π (Y )
u1
. . . ud d n
Qd
i=1 πi (Yn )!
The uniform measure λd on △d−1 is defined as follows: for any borelian bounded function F (u1 , . . . , ud ) we have:
Z
Z
F (u)λd (du) =
F (u1 , . . . , ud−1 , 1 − u1 − u2 − . . . − ud−1 )du1 . . . . .dud−1
△d−1
∧d−1
CHAPTER 1. DIRICHLET DISTRIBUTION
6
Now, let us compare the moments of (Z1 , Z2 , . . . , Zd ) with the ones of λd .
Using formula (1.1)
hu (θi (Yn )) =
S(Yn )+d
u h (Y ).
πi (Yn )+1 i u n
hence
P
P hu (Yn ) = hu (Yn )( di ui ) = hu (Yn ).
implies that (hu (Yn )) is a martingale and similarly
Z
gk (Yn ) =
hu (Yn )uk11 . . . ukdd λd (du)
△d−1
is a martingale because
P gk (Yn ) =
=
Pd
i
R
P (Yn , θi (Yn ))
△d−1
R
△d−1
hu (Yn )uk11 . . . ukdd λd (du)
P hu (Yn )uk11 . . . ukdd λd (du)
= gk (Yn ).
This gives
E(gk (Yn )) = E(gk (Y0 )).
On the other hand
gk (Yn ) =
Qi=d
=
Qi=d
i=1 [πi (Yn )
+ 1] . . . [πi (Yn ) + ki ]
(n + d) . . . (n + s(k) + d − 1)
[πi (Yn )+1]
. . . [πi (Ynn)+ki ]
i=1
n
(n+d)
. . . (n+s(k)+d−1)
n
n
so that
0 ≤ gk (Yn ) ≤
d
Y
2ki = 2S(k) .
i=1
Therefore by the bounded convergence theorem
1.2. POLYA URN (BLACKWELL AND MACQUEEN ) [3]
7
limn→∞ E(gk (Yn )) = E(limn→∞ gk (Yn ))
= E(Z1k1 . . . Zdkd )
=
Q
(d−1)! di=1 ki !
(S(k)+d−1)!
= cd
where cd = (d − 1)!
Indeed if
mk =
Z
△d−1
R
△d−1
uk11 . . . ukdd λd (du),
uk11 . . . ukdd cd λd (du)
integrations and recurrences yield,
mk =
Qd
ki !
.
(S(k) + d − 1)!
i=1
Taken (k1 , . . . , kd ) = (0, . . . , 0), we see that cd = (d − 1)!.
Further, if µ is the distribution of (Z1 , . . . , Zd ), then cd λd and µ have the same moments
and since △d−1 is compact, the theorem of monotone class yields, µ = cd λd .
1.2.2 Gamma, Beta and Dirichlet densities
Let α > 0, the gamma distribution with parameter α, denoted Γ(α, 1), is defined by the
probability density function:
f (y) = y α−1
e−y
11{y>0} .
Γ(α)
Let Z1 , . . . , Zd be d independent real random variables with gamma distributions Γ(α1 , 1), . . . , Γ(αd , 1),
respectively, then it is well-known that Z = Z1 + . . . + Zd has distribution Γ(α1 + . . . +
αd , 1).
CHAPTER 1. DIRICHLET DISTRIBUTION
8
Let a, b > 0, a beta distribution with parameter (a, b), denoted β(a, b), is defined by
the probability density function:
Γ(a + b) a−1
x (1 − x)b−1 11{0<x<1} .
Γ(a)Γ(b)
From these densities it is easily seen that the following function is a density function:
Definition 1.2.1. For any α = (α1 , . . . , αd ) where αi > 0 for any i = 1, . . . , d, the
density function d(y1 , y2 , . . . , yd−1 | α) defined as
d−1
X
Γ(α1 + . . . + αk ) α1 −1
αd−1 −1
y
. . . yd−1 (1 −
yh )αd −1 11∧d−1 (y)
Γ(α1 ) . . . Γ(αk ) 1
h=1
(1.2)
is called the Dirichlet density with parameter (α1 , . . . , αd ).
Proposition 1.2.2. Let (Z1 , Z2 , . . . , Zd ) be uniformly distributed on △d−1 . Then the
random vector (Z1 , Z2 , . . . , Zd−1 ) has the the Dirichlet density (1.2)
Proof
Let λi ∈ N for any i ∈ {1, . . . , d}.
Let (Y1 , Y2 , . . . , Yd−1 ) be a random vector with Dirichlet density defined in (1.2).
Pd−1
Let Yd = 1 − i=1
yi . Then
λ
E(Y1λ1 . . . Ydλd ) = E(Y1λ1 . . . Yd d−1 [1 −
=
Γ(α1 +...+αk )
Γ(α1 )...Γ(αd )
[1 −
=
Pd−1
i=1
R
Pd−1
i=1
Yi ]λd )
α
∧d−1
d−1
y1α1 +λ1 −1 . . . yd−1
yi ]αd +λd −1 dy1 . . . dyd−1
Γ(α1 +...+αd )Γ(α1 +λ1 )...Γ(αd +λd )
P
.
Γ(α1 )...Γ(αd )Γ((α1 +...+αd )+ di=1 λi )
+λd−1 −1
1.3. DIRICHLET DISTRIBUTION
9
Consequently, if λi , i ∈ {1, . . . , d} are non-negative integers and α1 = . . . = αd = 1,
then
E(Y1λ1
. . . Ydλd )
Q
(d − 1)! di=1 λi !
.
=
((d − 1) + S(λ))!
Now the proof of the preceding proposition 1.2.1 shows that (Z1 , Z2 , . . . , Zd ) and
(Y1 , . . . , Yd ) have the same moments, and thus the same distribution.
Consequently (Z1 , Z2 , . . . , Zd−1 ) has the same distribution as (Y1 , . . . , Yd−1 ) which is
by construction d(y1 , y2 , . . . , yd−1 | α).
1.3
Dirichlet distribution
The Dirichlet density is not easy to be handled and the following theorem gives an
interesting construction where appears this density.
Theorem 1.3.1. Let Z1 , . . . , Zd be d independent real random variables with gamma
distributions Γ(α1 , 1), . . . , Γ(αd , 1) respectively and let Z = Z1 + . . . + Zd . Then the
random vector ( ZZ1 , . . . , Zd−1
) has a Dirichlet density with parameters (α1 , . . . , αd ).
Z
Proof
The mapping
(y1 , . . . , yd ) 7→ (
y1
yd−1
,...,
, y1 + . . . + yd )
y1 + . . . + yd
y1 + . . . + yd
is a diffeomorphism from [0, ∞)d , to ∧d−1 ×]0, ∞) with Jacobian ydd−1 and reciprocal
function:
(y1 , . . . , yd ) 7→ (y1 yd , . . . , yd−1 yd , yd [1 −
d−1
X
i=1
yi ]).
CHAPTER 1. DIRICHLET DISTRIBUTION
10
The density of (Z1 , . . . , Zd−1 , Z) at point (y1 , . . . , yd ) is therefore equal to:
α
−1
d−1
(1 −
e−yd y1α1 −1 . . . yd−1
d−1
X
yi )αd −1
i=1
Integrating w.r.t. yd and using the equality
R∞
0
ydα1 +...+αd −d d−1
y .
Γ(α1 ) . . . Γ(αd ) d
e−yd ydα−1 dyd = Γ(α1 + . . . + αd ), we see
that the density of ( ZZ1 , . . . , Zd−1
) is a Dirichlet density with parameters (α1 , . . . , αd ). Z
Definition 1.3.1. Let Z1 , . . . , Zd be d independent real random variables with gamma
distributions Γ(α1 , 1), . . . , Γ(αd , 1), respectively, and let Z = Z1 + . . . + Zd . The
Dirichlet distribution with parameters (α1 , . . . , αd ) is the distribution of the random
vector ( ZZ1 , . . . , ZZd ).
Not that the Dirichlet distribution is singular w.r.t Lebesgue measure in Rd since it
is supported by ∆d−1 which has Lebesgue measure 0.
The following can be easily proved
Proposition 1.3.1. With the same notation as in Theorem 1.3.1 let Yi =
Zi
,
Z
i = 1, . . . , d
then Yi has a beta distribution β(αi , α1 + . . . + αi−1 + αi+1 + . . . + αd ) and
E(yi ) =
1.4
αi αj
αi
, E(yi yj ) =
.
α1 + . . . + αd
(α1 + . . . + αk )(α1 + . . . + αd + 1)
Posterior distribution and Bayesian estimation
Consider the Dirichlet distribution D(α1 , . . . , αd ) as a prior on p = (p1 , p2 , . . . , pd ) ∈
∆d−1 .
Let X be a random variable assuming values in {1, . . . , d}, such that P (X = i | p) = pi .
Then the posterior distribution p | X = i is Dirichlet D(α1 , . . . , αi−1 , αi + 1, . . . , αd ).
P
Indeed let Ni = nj=1 11Xj =i , 1 ≤ i ≤ d. The likelihood of the sample is
d−1
Y
i=1
Ni
p (1 −
d−1
X
i=1
pi )Nd .
1.4. POSTERIOR DISTRIBUTION AND BAYESIAN ESTIMATION
11
If the prior distribution of p is D(α1 , . . . , αd ), the posterior density will be proportional
to
d−1
Y
piαi +Ni (1 −
i=1
d−1
X
pi )Nd +αd .
i=1
Thus the posterior distribution of p is D(α1 + N1 , α2 + N2 , . . . , αk + Nd ).
If (X1 , . . . , Xn ) is a sample of law p = (p1 , . . . , pd ) on {1, . . . , d} then the average
Bayesian estimation of p is:
α1 + N1
α2 + N2
αk + Nd
p′ = ( Pd
, Pd
, . . . , Pd
).
i=1 αi + 1
i=1 αi + 1
i=1 αi + 1
Proposition 1.4.1. ([19]) Let r1 , . . . , rl be l integers such that 0 < r1 < . . . < rl = d.
1. If (Y1 , . . . , Yd ) ∼ D(α1 , . . . , αd ), then
rl
rl
r2
r2
r1
r1
X
X
X
X
X
X
αi , . . . ,
Yi , . . . ,
αi .
Yi ∼ D
αi ,
Yi ,
1
r1 +1
rl−1
1
r1 +1
rl−1
2. If the prior distribution of (Y1 , . . . , Yd ) is D(α1 , . . . , αd ) and if
P (X = j | Y1 , . . . , Yd ) = Yj
a.s for j = 1, . . . , d, then the posterior distribution of (Y1 , . . . , Yd ) given X = j
(j)
(j)
is D(α1 , . . . , αk ) where
(j)
αi =


αi if i 6= j
 αj + 1 if i = j
3. Let D(y1 , . . . , yd | α1 , . . . , αd ) denote the distribution function of the Dirichlet
distribution D(α1 , . . . , αd ), that is
Z
0
z1
...
Z
0
D(y1 , . . . , yd | α1 , . . . , αd ) = P (Y1 ≤ y1 , . . . , Yd ≤ yd ).
zd
yj dD(y1 , . . . , yd | α1 , . . . , αd ) =
αj
(j)
(j)
D(z1 , . . . , zd | α1 , . . . , αd ).
α
CHAPTER 1. DIRICHLET DISTRIBUTION
12
Proof
1. Recall that: if Z1 ∼ Γ(α1 ), Z2 ∼ Γ(α2 ), and if Z1 and Z2 are independent then
Z1 + Z2 ∼ Γ(α1 + α2 ). We conclude by recurrence.
2. Is obtained then by induction.
3. Using 2
P (X = j, Y1 ≤ z1 , . . . , Yd ≤ zd ) = P (X = j)P (Y1 ≤ z1 , . . . , Yd ≤ zd | X = j)
= E(E(11X=j | Y1 , . . . , Yd ))
(j)
j
× D(z1 , . . . , zd | α1 , . . . , α(d)
)
(j)
(j)
= E(Yj )D(z1 , . . . , zd | α1 , . . . , αd )
=
αj
D(z1 , . . . ,
α
(j)
(j)
zd | α1 , . . . , αd ).
1.4. POSTERIOR DISTRIBUTION AND BAYESIAN ESTIMATION
13
On the other hand
P (X = j, Y1 ≤ z1 , . . . , Yd ≤ zd ) = E(11{X=j, Y1 ≤z1 ,..., Yd ≤zd } )
= E(E(11{X=j, Y1 ≤z1 ,..., Yd ≤zd } | Y1 , . . . , YK )
= E(11{Y1 ≤z1 ,..., Yd ≤zd } E(11{X=j} | Y1 , . . . , Yd ))
= E(11{Y1 ≤Z1 ,..., Yd ≤zd } Yj ))
=
R z1
0
...
R zd
0
Yj dD(Y1 , . . . , Yd | α(1) , . . . , α(d) ).
14
CHAPTER 1. DIRICHLET DISTRIBUTION
Chapter 2
Poisson-Dirichlet distribution
The Poisson-Dirichlet distribution is a probability measure introduced by J.F.C Kingman
[31] on the set
▽∞ = {(p1 , p2 , . . .); p1 ≥ p2 ≥ . . . , pi ≥ 0,
∞
X
pj = 1}.
j=1
It can be considered as a limit of some specific Dirichlet distributions and is also, as
shown below, the distribution of the sequence of the jumps of a Gamma process arranged
by decreasing order and normalized .
We will also see how Poisson-Dirichlet distribution is related to Poisson processes.
2.1
Poisson- Dirichlet distribution (J.F.C Kingman)
2.1.1 Gamma process and Dirichlet distribution
Definition 2.1.1. We say that X = (Xt )t∈R+ is a Levy process if for every s, t ≥ 0, the
increment Xt+s − Xt is independent of the process (Xv , 0 ≤ v ≤ t) and has the same
law as Xs , in particular, P(X0 = 0) = 1.
15
CHAPTER 2. POISSON-DIRICHLET DISTRIBUTION
16
Definition 2.1.2. A subordinator is a Levy process taking values in [0, ∞), which implies that its sample paths are increasing.
Definition 2.1.3. A Gamma process is a subordinator such that its Levy measure is
γ(dx) = x−1 e−x dx.
Remark 2.1.1. Let ξ be a gamma process. Let α1 , . . . , αn > 0, t0 = 0, tj = α1 + . . . +
αj , for 1 ≤ j ≤ n and Yj = ξ(tj ) − ξ(tj−1 ) then
Yj ∼ Γ(αj ).
Moreover, Y1 , Y1 . . . , Yn are independent.
Let Y = Y1 + . . . + Yn = ξ(tn ) and p = (p1 , . . . , pn ) with pj =
Yj
Y
then p is a random
vector on ∧n−1 having D(α1 , . . . , αn ) distribution. Therefore we get a random vector
having Dirichlet distribution.
2.1.2 The limiting order statistic
Let D(α1 , . . . , αn ) be a Dirichlet distribution defined as in chapter 1 and let:
fα1 , ..., αd (p1 , p2 , . . . , pd ) =
Γ(α1 + . . . + αd ) α1 −1
p
. . . pαd d −1 11△d−1 .
Γ(α1 ) . . . Γ(αd ) 1
(2.1)
Assume that the αi are equal, then fα1 , ..., αd (p1 , p2 , . . . , pd ) reduces to
d(p1 , p2 , . . . , pd | α) =
Γ(N α)
(p1 . . . pd )α−1 .
Γ(α)d
(2.2)
In this section we prove the following theorem which exhibits the limiting joint distribution of the order statistics p(1) ≥ p(2) ≥ . . . an element of the subset ▽∞ of the set △∞ .
Consider the following mapping
ψ : △∞ −→ ▽∞
2.1. POISSON- DIRICHLET DISTRIBUTION (J.F.C KINGMAN)
17
(p1 , p2 , . . .) 7−→ (p(1) , p(2) , . . .).
If P is any probability measure on ▽∞ , and n is any positive integer, then the random nvector (p(1) , p(2) , . . . , p(n) ) has a distribution depending on P ,which might be called the
nth marginal distribution of P . The measure P is uniquely determined by its marginal
distributions.
Theorem 2.1.1. For each λ ∈]0, ∞[, there exists a probability measure Pλ on ▽∞
with the following property. If for each N the random vector p is distributed over △N
according to the distribution (2.1) with α = αN , and if N αN → λ as N → ∞, then for
any n the distribution of the random vector p = (p(1) , p(2) , . . . , p(n) ) converges to the
nth marginal distribution of Pλ as N → ∞.
Proof
Let y1 , y2 , . . . , yN be independent random variables, each having a gamma distribution
Γ(λ, 1). We know that if S = y1 + y2 + . . . + yN , then (y1 /S, y2 /S, . . . , yn /S) has a
Dirichlet distribution D(λ, . . . , λ).
To exploit this fact, consider as above a gamma process ξ, that is a stationary random
process (ξ(t), t ≥ 0) with ξ(0) = 0. The process ξ increases only in jumps. The positions of these jump forms a random countable dense subset J(ξ) of (0, ∞), with
P {t ∈ J(ξ)} = 0
(2.3)
for all t > 0. For each value of N , write
qj (N ) =
ξ(jαN ) − ξ (j − 1)αN
(2.4)
ξ(N αN )
by the result cited above, the vector q = (q(1) , q(2) , . . . , q(N ) ) has the same distribution
as p and it therefore suffices to prove the theorem with p replaced by q. We shall in fact
prove that
lim q(j) (N ) = δξ(j) /ξ(λ)
N →∞
(2.5)
CHAPTER 2. POISSON-DIRICHLET DISTRIBUTION
18
where the δξ(j) ’s are the magnitudes of the jumps in (0, λ) arranged in descending order.
This will suffice to prove the theorem, with Pλ the distribution of the sequence
(δξ(j) /ξ(λ); j = 1, 2 . . .)
(2.6)
since this sequence lies in ▽∞ as a consequence of the equality
ξ(λ) =
∞
X
δξ(j) .
(2.7)
j=1
For any integer n, choose N0 so large that, for any N ≥ N0 , the discontinuities of height
δξ(j) (j = 1, 2, . . . , n) are contained in distinct intervals ((i − 1)αN , iαN ). Then
ξ(N αN )q(j) ≥ δξ(j)
(1 ≤ j ≤ n,
N ≥ N0 ),
so that
lim q(j) ≥ δξ(j) /ξ(λ).
(2.8)
For j = 1, 2, . . . , n. Since n is arbitrary, (2.8) holds for all j, and moreover, Fatou’s
lemma and (2.7) give
lim q(j) = lim{1 −
X
q(i) } ≤ 1 −
i6=j
X
lim q(j) ≤ 1 −
i6=j
X
{δξ(i) /ξ(λ)} = δξ(j) /ξ(λ).
i6=j
Hence,
δξ(j) /ξ(λ) ≤ lim q(j) ≤ lim q(j) ≤ δξ(j) /ξ(λ).
Thus,
lim q(j) = δξ(j) /ξ(λ).
By definition of δξ(j) /ξ(λ), we have
δξ(1) /ξ(λ) ≥ δξ(2) /ξ(λ) ≥ . . . ,
2.1. POISSON- DIRICHLET DISTRIBUTION (J.F.C KINGMAN)
and
∞
X
19
δξ(k) /ξ(λ) = 1.
k=0
We will write
(δξ(1) /ξ(λ), δξ(2) /ξ(λ), . . .) ∼ PD(0, λ)
where PD(0, λ) is the Poisson-Dirichlet distribution define as follows:
Definition 2.1.4. Let 0 < λ < ∞. Let (ξ(t), t ∈ [0, λ]) be a gamma subordinator and
let J1 ≥ J2 ≥ . . . ≥ 0 be the ordered sequence of its jumps. The distribution on ∧∞
J1
of the random variable ( ξ(λ)
,
J2
,
ξ(λ)
. . .) is called the Poisson-Dirichlet distribution with
parameter λ and is denoted by PD(0, λ).
Theorem 2.1.1 shows that if
(p1 , . . . , pN ) ∼ D(αN , . . . , αN )
then the distribution of (p(1) , . . . , p(N ) ) approximates PD(0, λ), if N is fairly large, the
αN being uniformly small and N αN closed to λ.
20
CHAPTER 2. POISSON-DIRICHLET DISTRIBUTION
Chapter 3
Dirichlet processes
In a celebrated paper [19], Thomas S. Ferguson introduced a random distribution, called
a Dirichlet process, such that its marginal w.r.t. any finite partition has a Dirichlet Distribution as defined in Chapter 1.
A Dirichlet process is a random discrete distribution which is a very useful tool in nonparametric Bayesian statistics.
The remarkable properties of a Dirichlet process are described below.
3.1
The Dirichlet process
We want to generalize the Dirichlet distribution.
Let H be a set and let A be a σ−field on H. We define below a random probability, on
(H, A) by defining the joint distribution of the random variables (P (A1 ), . . . , P (Am ))
for every m and every finite sequence of measurable sets (Ai ∈ A for all i). We then
verify the Kolmogorov consistency conditions to show there exists a probability, P, on
([0, 1]A, BFA) yielding these distributions. Here [0, 1]A represents the space of all functions from A into [0, 1], and BFA represents the σ-field generated by the field of cylinder
21
CHAPTER 3. DIRICHLET PROCESSES
22
sets .
It is more convenient to define the random probability P , by defining the joint distribution of (P (B1 ), . . . , P (Bm )) for all k and all finite measurable partitions (B1 , . . . , Bm )
of H.
If Bi ∈ A for all i,
Bi ∩ Bj = ∅ for i 6= j, and ∪kj=1 Bj = H. From these distributions,
the joint distribution of (P (A1 ), . . . , P (Am )) for arbitrary measurable sets A1 , . . . , Am
may be defined as follows.
Given arbitrary measurable sets A1 , . . . , Am , we define Bx1 , ..., xm where xj = 0 or 1, as
x
j
Bx1 , ..., xm = ∩m
j=1 Aj
where A1j = Aj , and A0j = Acj . Thus {Bx1 , ..., xm } form a partition of H . If we are given
the joint distribution of
{P (Bx1 , ..., xm ); xj = 0,
or 1 j = 1, . . . , m}
(3.1)
then we may define the joint distribution of (P (A1 ), . . . , P (Am )) as
X
P (Ai ) =
P (Bx1 , ..., xi =1, ..., xm ).
(3.2)
{(x1 , ..., xm ); xi =1}
We note that if A1 , . . . , Am is a measurable partition to start with, then this does not
lead to contradictory definitions provided P (∅) is degenerate at 0.
If we are given a system of distribution of (P (B1 ), . . . , P (Bk )) for all k and all measurable partitions B1 , . . . , Bk , is one consistency criterion that is needed; namely,
CONDITION
C:
If (B1′ , . . . , Bk′ ), and (B1 , . . . , Bk ) are measurable partitions, and if (B1′ , . . . , Bk′ ) is a
refinement of (B1 , . . . , Bk ) with
B1 = ∪r11 Bi′ ,
′
B2 = ∪rr21 +1 Bi′ , . . . , Bk = ∪krk−1 +1 Bi′ ,
3.1. THE DIRICHLET PROCESS
23
then the distribution of
′
r1
r2
k
X
X
X
′
′
(
P (Bi ),
P (Bi ), . . . ,
P (Bi′ )),
1
r1 +1
rk−1
as determined from the joint distribution of (P (B1′ ), . . . , P (Bk′ ′ )), is identical to the
distribution of (P (B1 ), . . . , P (Bm ))
Lemma 3.1.1. If a system of joint distributions of (P (B1 ), . . . , P (Bm )) for all k and
measurable partition (B1 , . . . , Bk ) satisfies condition C, and if for arbitrary measurable sets A1 , . . . , Am , the distribution of (P (A1 ), . . . , P (Am )) is defined as in (3.2),
then there exists a probability P on, ([0, 1]A, BFA) yielding these distribution.
Proof
See [21] page 214.
Definition 3.1.1. Let α be a non-null finite measure on (H, A) .
We say P is a Dirichlet process on (H, A) with parameter α if for every k = 1, 2, . . . ,
and measurable partition (B1 , . . . , Bk ) of H, the distribution of (P (B1 ), . . . , P (Bk ))
is Dirichlet D(α(B1 ), . . . , α(Bk )).
Proposition 3.1.1. Let P be a Dirichlet process on (H, A) with parameter α and let
A ∈ A. If α(A) = 0, then P (A) = 0 with probability one. If α(A) > 0, then P (A) > 0
with probability one. Furthermore, E(P (A)) =
α(A)
.
α(H)
Proof
By considering the partition (A, Ac ), it is seen that P (A) has a beta distribution, β(α(A), α(Ac )).
Therefore
E(P (A)) =
α(A)
.
α(H)
Proposition 3.1.2. Let P be a Dirichlet process on (H, A) with parameter α and let Q
be a fixed probability measure on (H, A) with Q ≪ α. Then, for any positive integers
CHAPTER 3. DIRICHLET PROCESSES
24
m and measurable sets A1 , . . . , Am and ε > 0
P{| P (Ai ) − Q(Ai ) |< ε, for i = 1, . . . , m} > 0.
Proof
X
P (Ai ) =
P (Bx1 , ..., xi =1, ..., xm ),
{(x1 , ..., xm ); xi =1}
and note that
P{| P (Ai ) − Q(Ai ) |< ε, for i = 1, . . . , m} ≥
P{
X
| P (Bx1 , ..., xm ) − Q(Bx1 , ..., xm ) |< ε for i = 1, . . . , m}.
{(x1 , ..., xm ); xi =1}
Therefore, it is sufficient to show that
P{| P (Bx1 , ..., xm ) − Q(Bx1 , ..., xm ) |< 2m ε, for all, (x1 , . . . , xm )} > 0.
If α(Bx1 ,..., xm ) = 0, then Q(Bx1 ,..., xm ) = 0 with probability one, so that
| P (Bx1 ,..., xm ) − Q(Bx1 ,..., xm ) |= 0 with probability one. For those (x1 , . . . , xm ) for
which α(Bx1 , ..., xm ) > 0, the distribution of the corresponding P (Bx1 , ..., xm ) gives positive weight to all open sets in the set
X
P (Bx1 , ..., xm ) = 1.
(α(Bx1 , ..., xm )>0)∈(x1 , ..., xm )
This proposition states that the support of Dirichlet process on (H, A) with parameter α contains the set of all probability measures which are absolutely continuous with
respect to α.
Definition 3.1.2. Let P be a random probability measure on (H, A) .We say that X1 , . . . , Xn
is a sample of size n from P if for any m = 1, 2, . . . and measurable A1 , . . . , Am , C1 , . . . , Cn ,
3.1. THE DIRICHLET PROCESS
25
P{X1 ∈ C1 , . . . , Xn ∈ Cn | P (A1 ), . . . , P (Am ), P (C1 ), . . . , P (Cn )} =
n
Y
P (Cj ), a.s
j=1
(3.3)
Roughly, X1 , . . . , Xn is a sample of size n from P , if, given P (C1 ), . . . , P (Cn ), the
events {X1 ∈ C1 }, . . . , {Xn ∈ Cn } are independent of the rest of the process, and are
independent among themselves, with P{Xj ∈ Cj | P (C1 ), . . . , P (Cn )} = P (Cj ) a.s.
for j = 1, . . . , n.
This definition determines the joint distribution of
X1 , . . . , Xn , P (A1 ), . . . , P (Am ),
once the distribution of the process is given, since
P{X1 ∈ C1 , . . . , Xn ∈ Cn , P (A1 ) ≤ y1 , . . . , P (Am ) ≤ ym }
(3.4)
may be found by integrating (3.3) with respect to the joint distribution of
P (A1 ), . . . , P (Am ), P (C1 ), . . . , P (Cn ) over the set [0, y1 ] × . . . × [0, ym ] × [0, 1] ×
. . . × [0, 1]. The Kolmogorov consistency conditions may be checked to show that (3.4)
determines a probability P over (Hn × [0, 1]A, An × FA).
Proposition 3.1.3. Let P be a Dirichlet process on (H, A) with parameter α let X be
sample of size 1 from P . Then for A ∈ A,
P(X ∈ A) =
α(A)
.
α(H)
Proof
Since P{X ∈ A | P (A))} = P (A),
a.s.,
P(X ∈ A) = E(P(X ∈ A | P (A))) = E(P (A)) =
α(A)
.
α(H)
CHAPTER 3. DIRICHLET PROCESSES
26
Proposition 3.1.4. Let P be a Dirichlet process on (H, A) with parameter α, and let
X be a sample of size 1 from P . Let (B1 , . . . , Bk ) be a measurable partition of H, let
A ∈ A. Then,
k
X
α(Bj ∩ A)
(j)
(j)
P{X ∈ A, P (B1 ) ≤ y1 , . . . , P (Bk ) ≤ yk } =
D(Y1 , . . . , Yk | α1 , . . . , αk )
α(H)
j=1
(3.5)
where
(j)
αi =


α(Bi ) if i 6= j
 α(Bj ) + 1 if i = j
.
Proof
Define Bj, 1 = Bj ∩ A, and Bj, 0 = Bj ∩ Ac for j = 1 , . . . , k.
Let Yj, x = P (Bj, x ) for j = 1, . . . , k and x = 0 or 1. Then, from (3.3)
P{X ∈ A | Yj, x , j = 1, . . . , k and x = 0 or 1} = P (A) =
k
X
j=1
Hence for arbitrary Yj, x ∈ [0, 1], for j = 1, . . . , k and x = 0 or 1
P{X ∈ A | Yj, x , j = 1, . . . , k, and x = 0, or1}
can be found by integrating
Yj, 1
a.s.
3.1. THE DIRICHLET PROCESS
27
P{X ∈ A, P (Bi ) ≤ yi , i = 1, . . . , k} = E(11{X∈A, P (B1 )≤y1 , ..., P (Bk )≤yk )} )
= E E(11{X∈A, P (Bi )≤yi ,i=1,..., k} | (P (Bi ))1≤i≤k )
= E 11{P (B1 )≤y1 , ..., P (Bk )≤yk } ×
E(11{X∈A} | P (B1 ), . . . , P (Bk ))
= E(11{P (B1 )≤y1 , ..., P (Bk )≤yk )}
=
=
=
Pk
j=1
Pk
j=1
Pk
j=1
Pk
j=1
Yj, 1 )
E(11{P (Bi )≤yi , i=1,..., k} P (Bj ∩ A))
R y1
0
...
R yk
0
α(Bj, 1 )
D(y
α(H)
Yj, 1 dD(Y | α(j) )
| α(j) )
(j)
(j)
(j)
(j)
where y = (y1, 0 , . . . , y0, k , y1, 1 , . . . , yk, 1 )) and α(j) = (α1, 0 , . . . , αk,0 , α1, 1 , . . . , αk, 1 ),
and where
(j)
αi, x =
Monotone Class Theorem


α(Bi, x ) if i 6= j
 α(Bj, x ) + 1 if i = j
A monotone vector space W on a space Ω is defined to be a collection of bounded, realvalued functions on Ω satisfying
• W is a vector space over R;
CHAPTER 3. DIRICHLET PROCESSES
28
• constant functions are in W;
• if (fn )n ⊂ W and 0 ≤ f1 ≤ . . . fn ≤ . . . , fn ↑ f and f is bounded, then f is in
W
A collection of M of real functions defined on W is said to be multiplicative if f g ∈ M
for all f, g ∈ M.
The following theorem is quoted from page 7 of P. Protter.
Theorem 3.1.1. Let M be a multiplicative class of bounded real-valued functions defined on Ω and let A = σ(M) be a σ-field generated by M. If W is a monotone vector
space containing M, then W contains all bounded A-measurable functions.
The following very useful theorem states that posterior of a Dirichlet process is still
a Dirichlet process.
Theorem 3.1.2. ([19]) Let P be a Dirichlet process on (H, A) with parameter α, and
let X1 , . . . , Xn be a sample of size n from P . Then the conditional distribution of P
P
given X1 , . . . , Xn , is as Dirichlet process with parameter α + n1 δXi .
Proof
It is sufficient to prove the theorem for n = 1, since the theorem would then follow by
induction upon repeated application of the case n = 1. Let (B1 , . . . , Bk ) be a measurable partition of H. To show that D(α + δX ) is the conditional distribution of P | X,
we need to show
• For fixed ω, D(α + δX(ω) ) is a probability measure.
• For any fixed measurable C ⊂ P(H), D(α + δX ) is a version of P r(P ∈ C | X),
i.e., for all B ∈ A
E(D(α + δX )(C)11X∈B ) = P r(P ∈ C, X ∈ B)
3.1. THE DIRICHLET PROCESS
29
This is equivalent to, for all C ∈ P(H) and B ∈ A,
R
D(α + δX )(C)α(dx) =
B
=
α(.)
.
α(H)
where α =
And also to
Z Z
R
R
P r(X ∈ B | P (B), P ∈ C)D(α)(dP )
P ∈C
P (B)D(α)(dP )D(α)(dP )
11(P ∈C) D(α + δX )(dP )α(dx) =
B
Z
P (B)11(P ∈C) D(α)(dP )
(3.6)
Let us prove that, for any f : P(H) −→ R, bounded and B(P(H))-measurable,
Z
Z Z
(3.7)
f (P )D(α + δX )(dP )α(dx) = f (P )P (B)D(α)(dP )
B
Let
S = {f : P(H) −→ R,
bounded B(P(H))-measurable and satisfying (3.7)}
and let
M = {P (B1 )r1 P (B2 )r2 . . . P (Bk )rk : k ∈ N, Bi ∈ A, ri ∈ N, i = 1, . . . , k}.
Since S is a monotone vector space containing M and M is a multiplicative class
of functions and σ(M) = B(P(H)), by the monotone class theorem, it suffices to
show (3.6) for f ∈ M.
Note that further that f ∈ M is a linear combination of functions in
M∗ = {P (B1 )r1 , . . . , P (Bk )rk : k ∈ N, (B1 , . . . , Bk ) a measurable partition of H }
Finally, it suffices to show (3.6) for f ∈ M∗ . thus, we need to show, for a measurable partition (B1 , B2 , . . . , Bk ) of H and ri , i = 1, . . . , k,
R R Qk
R Qk
ri
ri
i=1 P (Bi ) D(α + δX )(dP )α(dx) =
i=1 P (Bi ) P (B)D(α)(dP )
B
First, when some of α(Bi ) = 0, both sides are 0 and the equality holds. For the
CHAPTER 3. DIRICHLET PROCESSES
30
rest of the proof, assume α(Bi ) > 0 for i = 1, . . . , k.
Let α′ = (α(B1 ), α(B2 ), . . . , α(Bk )) = (α1 , α2 , . . . , αk ).
R R Qk
B
i=1
P (Bi )ri D(α + δX )(dP )α(dx)
=
=
=
=
Pk
j=1
Pk
j=1
Pk
j=1
Pk
j=1
Pk
j=1
B
T
α(B
α(B
Bj
Bj
T
T
R Qk
i=1
R Qk
i=1
Bj )
P (Bi )ri D(α + δX )(dP )α(dx)
yiri D(α′ + δj )(dy)α(dx)
R Qk
i=1
α(B
T
Bj ) α(H)
αj
T
α(B Bj )
j=1
α(Bj )
Pk
On the other hand
yiri D(α′ + δj )(dy)
R
r
Bj ) y1r1 . . . yj j . . . ykrk
R
R
α +1−1
. . . y1 j
r +1
y1r1 . . . yj j
Γ(α(H))
y α1 −1
Γ(α1 )...Γ(αj +1)...Γ(αk ) 1
×
=
R
B
T
Γ(α(H)+1)
y α1 −1
Γ(α1 )...Γ(αj +1)...Γ(αk ) 1
×
=
R
α −1
. . . y1 j
r +1
y1r1 . . . yj j
. . . y1αk −1 dy
. . . ykrk
. . . y1αk −1 dy
. . . ykrk D(α′ )(dy).
3.2. AN ALTERNATIVE DEFINITION OF DIRICHLET PROCESS
=
=
=
=
=
3.2
R Qk
i=1
Pk
j=1
Pk
j=1
P (Bi )ri P (B)D(α)(dP )
R Qk
i=1
R
P (Bi )ri P (B
T
α(B Bj )
j=1
α(Bj )
Pk
T
Bj )D(α)(dP )
P (B1 )r1 . . . P (Bj )rj +1 . . . P (Bk )rk
T
α(B Bj )
j=1
α(Bj )
Pk
31
R
R
T
P (B Bj )
D(α)(dP )
P (Bj )
P (B1 )r1 . . . P (Bj )rj +1 . . . P (Bk )rk D(α)(dP )
r +1
y1r1 . . . yj j
. . . ykrk D(α′ )(dy).
An alternative definition of Dirichlet process
In this section, we will see how the Dirichlet process can be derived from the
Gamma process on (H, A) with parameter α.
The basic idea is that the Dirichlet distribution is defined as the joint distribution of
independent Gamma variables divided by their sum. Hence the Dirichlet process
should be defined from a Gamma process with independent "increments" divided
by their sum. Using a representation of a process with independent increments
as a sum of countable number of jumps of random height at countable number
of random points, we may divided by the total height of the jumps and obtain a
discrete probability measure, which should be distributed as a Dirichlet process.
More precisely,
Let Γ(α, 1), α > 0, denote the Gamma distribution with characteristic function
−α
ϕ(u) = (1 − iu)
= exp
Z
0
∞
(eiux − 1)dN (x),
(3.8)
CHAPTER 3. DIRICHLET PROCESSES
32
where
N (x) = −α
Z
∞
e−y y −1 dy, for, 0 < x < ∞.
(3.9)
x
Let define the distribution of random variables J1 , J2 , . . . as follows.
P(J1 ≤ x1 ) = eN (x1 ) , for, x1 > 0,
(3.10)
and for j = 2, 3, . . .
P(Jj ≤ xj | Jj−1 = xj−1 , . . . , J1 = x1 ) = exp[N (xj )−N (xj−1 )], for 0 < xj < xj−1 .
(3.11)
In other words, the distribution function of J1 is exp N (x1 ) and for j = 2, 3, . . .,
the distribution of Jj−1 given Jj−1 , . . . , J1 , is the same as the distribution of J1
truncated above at Jj−1 .
Theorem 3.2.1. Let G(t) be a distribution function on [0, 1]. Let
Zt =
∞
X
Jj 11[0, G(t)) (Uj )
(3.12)
j=1
where
i) The distribution of J1 , J2 , . . . is given in (3.10) and (3.11), and
ii) U1 , U2 , . . . are independent identically distributed variables, uniformly distributed on [0, 1], and independent of J1 , J2 , . . . . Then, with probability one, ξt
converges for all t ∈ [0, 1] and is a gamma process with independent increments,
with ξt ∼ Γ(α(G(t)), 1).
In particular, ξ1 =
If we define
P∞
1
Jj converges with probability one and ξ1 ∼ Γ(α, 1)
Pj = Jj /ξ1 ,
then Pj ≥ 0 and
P∞
1
Pj = 1 with probability one.
We now define the Dirichlet process.
(3.13)
3.2. AN ALTERNATIVE DEFINITION OF DIRICHLET PROCESS
33
As before, let (H, A) be a measurable space, and let α(.) be a finite non-null
measure on A. Let V1 , V2 , . . . be a sequence of independent identically random variables with values in H, and with probability measure Q, where Q(A) =
α(A)/α(H).
We identify the α in formulas (3.8) and (3.9) with α(H) define the random probability measure, P , on (H, A), as
P (A) =
∞
X
Pj δVj (A).
(3.14)
j=1
Theorem 3.2.2. The random probability measure defined by (3.10) is a Dirichlet
process on (H, A) with parameter α.
Proof
Let (B1 , . . . , BK ) be a partition of H. Then
∞
1 X
(P (B1 ), . . . , P (BK )) =
Jj (δVj (B1 ), . . . , δVj (Bk ))
ξ1 j=1
From the assumption on the distribution of V1 , V2 , . . . ,
Mj = (δVj (B1 ), . . . , δVj (Bk )),
are i.i.d random vectors having a multinomial distribution with probability vector
P
(Q(B1 ), . . . , Q(BK )). Hence the distribution of ∞
1 Jj Mj must be the same as
the distribution of
(ξ1/k , ξ2/k − ξ1/k , . . . , ξ1 − ξ(k−1)/k ) =
∞
X
i=1
and
Ji (11[0,G(1/k)) (Uj ), . . . , 11[G(i−1/k), G(i/k)) (Uj ), . . . , 11[G(k−1/k), G(1)) (Uj ))
CHAPTER 3. DIRICHLET PROCESSES
34
where ξt is the gamma process defined by the above theorem with G(t) chosen so
that
G(j/k) − G(j − 1/k) = Q(Bj ), j = 1, . . . , k.
Hence,
P∞
j=1
Jj δVj (Bi ) are, for i = 1, . . . , k, independent random variables, with
∞
X
Jj δVj (Bi ) ∼ Γ(α(Bi ), 1).
j=1
(because α(G(j/k) − G(j − 1/k)) = α(Q(Bj )) = α(Bj )).
Since ξ1 is the sum of these independent gamma variables,
(P (B1 ), . . . , P (BK )) ∼ D(α(B1 ), . . . , α(Bk ))
form the definition of the Dirichlet distribution. Thus, P satisfies the definition of
the Dirichlet process.
Theorem 3.2.3. Let P be the Dirichlet process defined by (3.10) ,and let Z be a
R
measurable real valued function defined on (H, A). If | Z | dα < ∞, then
R
| Z | dP < ∞ with probability one, and
Z
Z
Z
1
Zdα.
E( ZdP ) = ZdE(P ) =
α(H)
Proof
From P (A) =
P∞
j=1
Pj δVj (A),
Z
∞
X
| Z | dP =
| Z(Vj ) | Pj ,
(3.15)
j=1
so that the monotone convergence theorem gives
R
P∞
E( | Z | dP ) =
j=1 E(| Z(Vj ) |)E(Pj )
=
1
α(H)
=
1
α(H)
R
R
| Z | dα
P∞
| Z | dα,
j=1
E(Pj )
3.2. AN ALTERNATIVE DEFINITION OF DIRICHLET PROCESS
35
where we have used the independence of the Vj and Pj . Therefore,
Z
ZdP =
∞
X
Z(Vj )Pj
j=1
is absolutely convergent with probability one. Since this series is bounded by
(3.11), which is integrable, the bounded convergence theorem implies
R
P∞
E( ZdP ) =
j=1 E(Z(Vj ))E(Pj )
1
α(H)
=
R
Zdα.
Theorem 3.2.4. Let P be the Dirichlet process defined by (3.10), and let Z1 and
R
Z2 be measurable real valued functions defined on (H, A). If | Z1 | dα < ∞,
R
R
| Z2 | dα < ∞ and | Z1 Z2 | dα < ∞, then
Z
Z
σ12
+ µ1 µ2
E( ξ1 dP ξ2 dP ) =
α(H) + 1
where
1
µi =
α(H)
Z
ξi dα, i = 1, 2
σ12
1
=
α(H)
Z
ξ1 ξ2 dα − µ1 µ2 .
Z
ξ2 dP =
and
Proof
As in theorem (3.2.3)
Z
Z1 dP
=
∞
X
j=1
∞ X
∞
X
i=1 j=1
Z1 (Vj )Pj
∞
X
i=1
Z1 (Vj )Z2 (Vi )Pj Pi ,
Z2 (Vi )Pi
(3.16)
CHAPTER 3. DIRICHLET PROCESSES
36
since both series are absolutely convergent with probability one. This is bounded
in absolute value by
∞ X
∞
X
| Zj (Vj )Zi (Vi ) | Pj Pi .
(3.17)
i=1 j=1
If this is an integrable random variable, we may take an expectation of (3.12)
inside the summation sign and obtain
R
R
P∞ P∞
E( Z1 dP Z2 dP ) =
j=1 E(Zj (Vj )Zi (Vi ))E(Pj Pi )
i=1
=
+
PP
i6=j
P
i
E(Z1 (Vj ))E(Z2 (Vi ))E(Pj Pi )
E(Z1 (Vi )Z2 (Vi ))E(Pi2 ).
Using the independence of the Pi and the Vi , and the independence of the Vi among
themselves. The equation continues
= µ1 µ2
XX
E(Pj Pi ) + (σ12 + µ1 µ2 )
X
E(Pi2 ).
i
i6=j
An analogous equation shows that (3.12) is integrable. The proof will be complete
when we show
∞
X
E(
Pi2 ) =
j=1
1
.
α(H) + 1
This seems difficult to shows directly from the definition of the Pi , so we proceed
as follows. The distribution of the P depends on α only through the value of
α(H). So choose H to be the real line, α to give mass α(H)/2 to −1 and mass
α(H)/2 to +1, and Z1 (x) = Z2 (x) to be identically x. Then µ1 = µ2 = 0 and
3.3. SETHURAMAN’S REPRESENTATION
37
σ12 = 1. Hence
R
P
2
2
E( ∞
j=1 Pi ) = E(( xdP (x)) )
= E(−1.P {−1} + P {1})= E(2P ({1}) − 1)2
=
1
α(H)+1
since P ({1}) ∼ β(α(H)/2, α(H)/2). 3.3
Sethuraman’s Representation
Let α be a finite measure on H and let
V1 , V2 , . . . ∼ β(1, α(H))
α(.)
α(H)
and they are independent of each other. Define
Y1 , Y2 , . . . ∼
p1 = V1
..
.
pn = (1 − V1 )... (1 − Vk−1 )Vk , k = 2, · · · , n
..
.
Then
P =
∞
X
pi δVi ∼ D(α)
i=1
where the convergence holds a.e.
Note that, since for any A ∈ A, P (A) =
measurable.
P∞
i=1
pi 11Yi ∈A is measurable, P is
In practice, the above representation is applied as follows:
CHAPTER 3. DIRICHLET PROCESSES
38
Corollary 3.3.1. (Stick − breaking
construction) Let α be any finite mea-
sure on H. Let c = α(H) and H = α/c. For any integer N , let V1 , · · · , VN −1
be iid Beta(1, c) and VN = 1. Let p1 = V1 , pk = (1 − V1 )...(1 − Vk−1 )Vk , k =
PN
2, · · · , N . Let Vk be iid ∼ H. Then, PN =
i=1 pi δZi converges a.e. to a
Dirichlet process D(α).
Proof
It suffices to show that, for given measurable partition (B1 , . . . , Bk ) of H,
P (B1 ), . . . , P (Bk ) ∼ D(α(B1 ), . . . , α(Bk )).
Let Ui = (11(Yi ∈B1 ) , . . . , 11(Yi ∈Bk ) ), i = 1, 2, . . .. We need to show that
∞
X
pi Ui ∼ D(α(B1 ), . . . , α(Bk )).
i=1
Let P ∼ D and independent of Vi ’s and Yi ’s. By a property of Dirichlet distribution,
Vn Un + (1 − Vn )P ∼ D(α(B1 ), . . . , α(Bk )).
Again
Vn−1 Un−1 + (1 − Vn−1 )(Vn Un + (1 − Vn )P ) ∼ D(α(B1 ), . . . , α(Bk ))
and it is
Vn−1 Un−1 + (1 − Vn−1 )Vn Un + (1 − Vn−1 )(1 − Vn )P ∼ D(α(B1 ), . . . , α(Bk )).
Applying the same property, we get
Vn−2 Un−2 + (1 − Vn−2 )[Vn−1 Un−1 + (1 − Vn−1 )Vn Un
+ (1 − Vn−1 )(1 − Vn )P ] ∼ D(α(B1 ), . . . , α(Bk )),
3.4. SOME APPLICATIONS
39
which is
Vn−2 Un−2 + (1 − Vn−2 )Vn−1 Un−1 + (1 − Vn−2 )(1 − Vn−1 )Vn Un
+ (1 − Vn−2 )(1 − Vn−1 )(1 − Vn )P ∼ D(α(B1 ), . . . , α(Bk )).
Doing this operation until Vn and noting 1 −
Pn =
n
X
pi Ui + (1 −
i=1
Since (1 −
Pn
i=1
n
X
Pn
1=1
Pn =
Qn
i=1 (1
− Vi ), we get
pi )P ∼ D(α(B1 ), . . . , α(Bk )).
i=1
D
D
pi ) −→ 0, and Pn −→ D,
∞
X
pi Ui ∼ D(α(B1 ), . . . , α(Bk )).
i=1
3.4
Some applications
In this paragraph, α is taken to denote a σ-additive non-null finite measure on
(H, A). We write P ∈ D(α) as a notation for the phrase "P ∈ D(α) is a Dirichlet process on (H, A) with parameter α". In most of the applications we take
(H, A) = (R, β). The nonparameter statistical decision problems we consider
are typically described as follows. The parameter space is the set of all probability
measures P on (H, A). The statician is to choose an action a in some space, there
by incurring a loss, L(P, a). There is a sample X1 , . . . , Xn from P available to
the statician, upon which he may base his choice of action. He seeks a Bayes rule
with respect to the prior distribution, P ∈ D(β).
With such a prior distribution, the posterior distribution of P given the observaP
tions is D(α + ∞
1 δXi ). Thus if we can find a Bayes rule for the no-sample probP
lem (with n = 0), in problem may be found by replacing α with α+ n1 δXi (Theorem
CHAPTER 3. DIRICHLET PROCESSES
40
3.1.2). In problems considered below, we first find the Bayes rule for the nosample problem, and then state the Bayes rule for the general problem.
Mixtures of Dirichlet process as introduced by Antoniak are very intensively used
in various applied fields.
3.5
Estimation of a distribution function
Let (H, A) = (R, β), and let the space of actions of the statician be the space of
all distribution functions on R. Let the loss function be
Z
L(P, Fb) = (F (t) − Fb(t))2 dW (t)
where W is a given finite measure on (R, β), and where
F (t) = P ((−∞, t]).
If P ∼ D(β), then F (t) ∼ β(α(−∞, t]), α((t, ∞))) for each t. The Bayes risk
for the no-sample problem,
E(L(P, Fb)) =
Z
(E(F (t) − Fb(t))2 )dW (t),
is minimized by choosing Fb(t) for each t to minimize E((F (t) − Fb(t))2 ). This is
achieved by choosing Fb(t) to be E(F (t)). Thus, the Bayes rule for the no-sample
problem is
where
Fb(t) = E(F (t)) = F0 (t)
F0 (t) = α((−∞, t])/α(R)
( we used F (t) ∼ β(α(−∞, t]), α((t, ∞)))
represents our prior guess at the shape of the unknown F (t).
3.6. ESTIMATION OF THE MEAN
41
For a sample of size n, the Bayes rule is therefore
Fbn (t | X1 , . . . , Xn ) =
P
α((−∞, t])+ n
1 δXi ((−∞, t])
α(R)+n
(3.18)
= pn F0 (t) + (1 − pn )Fn (t | X1 , . . . , Xn ).
where
pn =
and
α(R)
(α(R) + n)
n
1X
δXi ((−∞, t]))
Fn (t | X1 , . . . , Xn ) =
n 1
is the empirical distribution function of the sample.
The Bayes rule (3.18) is a mixture of our prior guess at F and of the empirical
distribution function, with respective pn and (1 − pn ). If α(R) is large compared
to n, little weight is given to the prior guess at F . One might interpret α(R)
as a measure of faith in the prior guess at F measured in units of numbers of
observations. As α(R) tends to zero, the Bayes estimate
Fbn (t | X1 , . . . , Xn )
converges to it uniformly almost surely. This follows from the Glivenko-Cantelli
theorem and the observation that pn → 0 as n → ∞.
The results for estimating a k-dimensional distribution function are completely
analogous.
3.6
Estimation of the mean
Again let (H, A) = (R, β), and suppose the statistic is to estimate the mean with
squared error loss
CHAPTER 3. DIRICHLET PROCESSES
42
where
L(P, µ
b) = (µ − µ
b)2 ,
µ=
Z
xdP (x).
We assume P ∼ D(α), where α has finite first moment. The mean of the corresponding probability measure α(.)/α(H) is denoted by µ0 :
µ0 =
Z
xdα(x)/α(R).
(3.19)
By Theorem (3.2.4), the random variable µ exists. The Bayes rule for the nosample problem is the mean of µ, which, again by Theorem (3.2.4), is µ
b = µ0 .
For a sample of size n, the Bayes rule is therefore
−1
µ
b(X1 , . . . , Xn ) = (α(R) + n)
Z
xd(α(x) +
n
X
δXi (x) )
(3.20)
1
= pn µ0 + (1 − pn )X̄n ,
where X̄n is the sample mean,
n
1X
Xi .
X̄n =
n i=1
The Bayes estimate is thus between the prior guess at µ, namely µ0 , and the sample
mean. As α(R) → 0, µ
bn , converges to X̄n . Also, as n → ∞, pn → 0 so that,
in particular, the Bayes estimate (3.20) is strongly consistent within the class of
distribution with finite first moment. More generally, for arbitrary (H, A), if Z is
real-valued measurable defined on (H, A), and if we are to estimate
Z
θ = ZdP,
3.6. ESTIMATION OF THE MEAN
43
with squared error loss and prior P ∼ D(α), where α is such that
Z
θ0 = Zdα/α(H) < ∞,
then the estimate θb = θ0 is Bayes for the no-sample problem. For a sample of size
n,
θbn (X1 , . . . , Xn ) = pn θ0 + (1 − pn )1/n
n
X
Z(Xi )
1
is Bayes, where pn = α(H)/(α(H) + n). Results for estimating a mean vector in
k-dimensions are completely analogous.
44
CHAPTER 3. DIRICHLET PROCESSES
Chapter 4
Mixtures of continuous-time
Dirichlet processes
In this chapter, we first define, in section 1, continuous-time Dirichlet processes.
In section 2 we examine the case of the Brownian-Dirichlet process (BDP) whose
parameter is proportional to a standard Wiener measure.
Next we show that some stochastic calculus formulas (Ito’s formula, local time
occupation formula) hold for BDP’s.
Next, in section 3, we define mixtures of continuous-time Dirichlet processes and
we extend some, rather nontrivial computations of Antoniak (1974) [1].
4.1
Continuous-time Dirichlet processes
From now, we take for H any standard Polish space of real functions defined on an
interval I ⊂ [0, ∞), for example the space C(I) (resp. D(I)) of continuous (resp.
cadlag) functions. For any t ∈ I, let πt : x −→ x(t) denote the usual projection
45
46 CHAPTER 4. MIXTURES OF CONTINUOUS-TIME DIRICHLET PROCESSES
at time t from the space H to R. Recall that πt maps any measure µ on H into a
measure πt µ on R defined by πt µ(A) = µ(πt−1 (A)) for any Borel subset A of R.
The following proposition defines a continuous-time process (Xt ) such that each
Xt is a Ferguson-Dirichlet random distribution.
Proposition 4.1.1. (Emilion, 2005) Let α be any finite measure on H, let X be
a Ferguson-Dirichlet random distribution D(α) on H and let Xt = πt X. Then
the time continuous process (Xt )t∈I is such that for each t ∈ I, Xt is a FergusonDirichlet random distribution on R D(αt ) where αt = πt α. Moreover if V (i) is
α
α(H)
any iid sequence on H such that V (i) ∼
X(ω) =
∞
X
and
pi (ω)δV (i) (ω)
i=1
where the sequence (pi ) is independent of the V (i) ’s and has a Poisson-Dirichlet
distribution PD(α(H)), then
Xt (ω) =
∞
X
pi (ω)δV (i) (ω)(t) .
i=1
For sake of simplicity we deal with just one parameter α, but it can be noticed that
two-parameter Xt,α,β continuous-time Dirichlet process can be defined similarly
by using two-parameter Poisson-Dirichlet distributions introduced in Pitman Yor
(1997) [44].
Proof
Let k ∈ {1, 2, 3, ...} and A1 , ..., Ak a measurable partition of R.
Then for any t ∈ R, πt−1 (A1 ), ..., πt−1 (Ak ) is a measurable partition of H so that,
by definition of X, the joint distribution of the random vector
(X(πt−1 (A1 )), ..., X(πt−1 (Ak )))
4.1. CONTINUOUS-TIME DIRICHLET PROCESSES
47
is Dirichlet with parameters (α(πt−1 (A1 )), ..., α(πt−1 (Ak )). In other words
(Xt (A1 )), ..., Xt (Ak )) is Dirichlet with parameters (αt (A1 ), ..., αt (Ak )) and Xt ∼
D(αt ).
A consequence of the definition of πt is that
∞
∞
X
X
πt (
µi ) =
πt µi
i=1
i=1
for any sequence of positive measures on H and πt (λµ) = λπt (µ) for any positive
real number λ. Hence if V (i) is any i.i.d. sequence on H such that V (i) ∼
and
X(ω) =
∞
X
α
α(H)
pi (ω)δV (i) (ω)
i=1
where (pi ) has a Poisson-Dirichlet distribution PD(α(H)), then
Xt (ω) = πt (X(ω)) =
∞
X
i=1
pi (ω)πt (δV (i) (ω) ) =
∞
X
pi (ω)δV (i) (ω)(t)
i=1
the last equality being due to the fact that πt (δf ) = δf (t) for any f ∈ H, as easα
)=
ily seen. In addition the V (i) (t)’s are iid with V (i) (t) ∼ πt ( α(H)
1
α.
αt (R) t
1
π (α)
α(H) t
=
Moreover (pi ) has a Poisson-Dirichlet distribution PD(α(H)) = PD(αt (R))
so that the preceding expression of Xt (ω) is exactly the expression of a FergusonDirichlet random distribution D(αt ) as a random mixture of random Dirac masses.
As a corollary of the above proof and of Sethuraman stick-breaking construction
(1994) (see chapter 3 section 3.3), we observe the following result which is of
interest for simulating continuous-time Dirichlet processes. It shows that such
processes of random distributions can be used to generate stochastic paths and to
classify random curves.
Corollary 4.1.1. (Continuous-time stick-breaking construction) Let α be any finite
measure on H and αt = πt α. Let c = α(H) and H = α/c. For any integer N , let
48 CHAPTER 4. MIXTURES OF CONTINUOUS-TIME DIRICHLET PROCESSES
V1 , · · · , VN −1 be iid Beta(1, c) and VN = 1. Let p1 = V1 , pk = (1 − V1 ) . . . (1 −
P
Vk−1 )Vk , k = 2, · · · , N . Let Zk be iid H. Then, PN,t = N
k=1 pk δZk,t converges
a.e. to a continuous-time Dirichlet process D(αt ).
Corollary 4.1.2. Let Xt be as in the preceding proposition, then for any Borel
subset A of R, (Xt (A))t≥0 is a Beta process, ie for any t ≥ 0
Xt (A) ∼ Beta(αt (A), αt (Ac )).
4.2
Brownian-Dirichlet process
We suppose here that the parameter α is proportional to a standard Wiener measure
W so that the V (i) ’s above are i.i.d. standard Brownian motions that we denote
by B i . The sequence (pi ) is assumed to be Poisson-Dirichlet(c) independent of
(B i )i=0,1,...
Definition 4.2.1. Let X be a Dirichlet process such that X ∼ D(cW ), then the
continous-time process (Xt ) defined by Xt = πt X, for any t > 0, is called a
Brownian-Dirichlet process (BDP).
As observed in the previous proposition, Xt is a random probability measure such
that Xt ∼ D(cN(0, t)) and if we have a representation
X(ω) =
∞
X
pi (ω)δB i (ω) ,
∞
X
pi (ω)δBti (ω) .
i=1
then we also have
Xt (ω) =
i=1
We show that stochastic calculus can be extended to such processes (Xt ). Consider the filtration defined by
F0 = σ(pi , i ∈ N∗ ),
4.2. BROWNIAN-DIRICHLET PROCESS
49
and for any s > 0,
Fs = F0 ∪ (∪i σ(Bui , u < s)).
4.2.1
Ito’s formula
Proposition 4.2.1. Let f ∈ C 2 be such that there exist a constant c ∈ R such that
Rs ′ i 2
(f (Bu ) du < c for any i and any s > 0. Then,
0
1. Mt =
2. Vt =
P∞
i=1
1
2
pi (ω)
P+∞
i=1
variation, and
Rt
f ′ (Bui )dBui is a well-defined (Fs ) − martingale,
0
pi (ω)
Rt
0
f ”(Bui )du is a well-defined process with bounded
3. < Xt − X0 , f >= Mt + Vt .
Proof. Let
Mtn (ω)
=
n
X
pi (ω)
t
0
i=1
(k)
Z
(k)
f ′ (Bui )dBui ,
(k)
and let s < t. Let 0 = t1 < t2 < . . . < trk = t be a sequence of subdivisions
of [0, t] such that
Z
0
t
f
′
(Bui )dBui
= lim
k−→+∞
rk
X
l=1
f ′ (Bti(k) )(Bti(k) − Bti(k) ),
l
l+1
l
the limit being taken in L2 -norm. We now show that Mtn is a martingale. Note
that we don’t use below the fact that the sequence pi has a Poisson-Dirichlet distribution. For sake of simplicity, in what follows, we omit the superscript (k) in
50 CHAPTER 4. MIXTURES OF CONTINUOUS-TIME DIRICHLET PROCESSES
(k)
tl . We have
E(Mtn | Fs ) =
R
t ′
i
i
f
(B
)dB
|
F
E
p
s
i
i
u
i=1
0
Pn
P
Pn
′
i
i
i
= limk→∞ { i=1 E pi {l : tl <s} f (Btl )(Btl+1 − Btl ) | Fs
+
P
′
i
i
i
E
p
f
(B
)(B
−
B
)
|
F
i
s }.
tl
tl+1
tl
i=1
{l:tl >s}
Pn
In the case tl < s, if we have in addition tl+1 < s then
E f ′ (Btil )(Btil+1 − Btil ) | Fs = f ′ (Btil )(Btil+1 − Btil )
while if tl+1 > s, writing Btil+1 − Btil = Btil+1 − Bsi + Bsi − Btil , we see that
E f ′ (Btil )(Btil+1 − Btil ) | Fs = f ′ (Btil )(Bsi − Btil ).
On the other hand in the case tl > s we have
E f ′ (Btil )(Btil+1 − Btil ) | Fs = E E(f ′ (Btil )(Btil+1 − Btil ) | Ftl ) | Fs
′
i
i
i
= E f (Btl )E(Btl+1 − Btl | Ftl ) | Fs
= E f ′ (Btil )E(Btil+1 − Btil ) | Fs = 0.
Hence,
E(Mtn | Fs ) =
Pn
i=1 pi limk−→∞
P
′
i
i
i
f
(B
)(B
−
B
)
tl
tl+1
tl
{l:tl+1 <s}
+ f ′ (Btis )(Bsi − Btis )
(k)
(k)
(k)
such that tl < s and tl+1 > s. Therefore
Z s
n
X
n
E(Mt | Fs ) =
f ′ (Bui )dBui = Msn
pi (ω)
where ts denotes the unique tl
i=1
0
4.2. BROWNIAN-DIRICHLET PROCESS
51
proving that Mtn is a martingale. Moreover, since
Rs
Rs
P
(n)
E (Ms )2 = 2 {1≤i<j≤n} E pi pj 0 f ′ (Bui )dBui 0 f ′ (Buj )dBuj
+
=
=
i
h R
s ′
i
i 2
2
i=1 E pi ( 0 f (Bu )dBu )
Pn
Pn
i=1
Rs
E(p2i )E( 0 f ′ (Bui )dBui )2 )
R
P∞
s
′
i 2
2
(f
(B
))
du
≤
c
E(p
)E
u
i
i=1 E(pi ) = c
i=1
0
Pn
the martingale convergence theorem implies that Mtn converges to a martingale
Z t
∞
X
Mt =
f ′ (Bui )dBui .
pi (ω)
0
i=1
Finally, applying Ito’s formula to each B i , we get
P∞
i
i
< Xt (ω) − X0 (ω), f > =
i=1 pi (ω)(f (Bt ) − f (B0 ))
=
+
P∞
i=1
1
2
pi (ω)
P∞
Rt
0
i=1 pi (ω)
f ′ (Bui )dBui
Rt
0
f ′′ (Bui )du
= Mt + Vt
where Vt is obviously a bounded variation process.
Corollary 4.2.1. (Stochastic integral) Let Xt be a BDP given by
∞
X
Xt (ω) =
pi (ω)δBti (ω) .
i=1
Let (Yt ) be a real valued stochastic process and φ a bounded function defined on
R
R. Then the stochastic integral φ(Yt )dXt is defined as the measure such that
Z
∞ Z
∞ Z
X
X
′
i
i 1
< φ(Yt )dXt , f >=
φ(Yt )pi (ω)f (Bt )dBt +
φ(Yt )pi (ω)f ′′ (Bti )dt,
2
i=1
i=1
52 CHAPTER 4. MIXTURES OF CONTINUOUS-TIME DIRICHLET PROCESSES
for any function f verifying the conditions of the preceding proposition.
4.2.2 Local time
The following result exhibits the local time of a Brownian-Dirichlet process as a
density of occupation time.
Proposition 4.2.2. Let (Xt ) be a BDP such that
Xt (ω) =
∞
X
pi (ω)δBti (ω) .
i=1
Then for each (T, x) ∈ R+ × R, there exist a random distribution L(T, x) such
that
Z
L(T, x)f (x)dx =
Z
T
< Xs , f > ds,
0
R
for any f Borel measurable and locally integrable on R.
Proof. Let Li (T, x) be the local time w.r.t. to B (i) so that for any i ∈ N we have
Z
Li (T, x)f (x)dx =
R
and
Z X
n
pi Li (T, x)f (x)dx =
Then, if f ∈
T
f (Bsi )ds
0
Z
T
0
R i=1
L+
∞,
Z
n
X
pi f (Bsi )ds.
i=1
the monotone convergence theorem yields
Z X
∞
R i=1
pi Li (T, x)f (x)dx =
Z
0
T
∞
X
pi f (Bsi )ds
i=1
and the same holds if f ∈ L∞ by using f = f+ − f− . Letting L(T, x) =
P∞
i=1 pi Li (T, x) we get the desired result.
4.3. MIXTURES OF DIRICHLET PROCESSES
53
4.2.3 Diffusions
Definition 4.2.2. A stochastic process (ψt ) is called a diffusion w.r.t. to the BDP
(Xt ) if it has a.s. continuous paths and can be represented as
ψt = ψ0 +
Z
t
a(s)ds +
0
∞
X
pi (ω)
Z
t
0
i=0
bi, s dBsi
where a ∈ L1 (R+ ) and bi ∈ L2 (R+ ) for any integer i.
The following result can be proved using the Banach fixed point theorem, similar
to the classical case of a single Brownian motion.
Proposition 4.2.3. Suppose that f and gi , i = 0, 1, . . . are Lipshcitz functions
from R to R. Let u0 be an F0 -measurable square integrable r.v. Then there exist a
diffusion (ψt ) w.r.t. to the BDP (Xt ) such that
dψt = f (ψt )dt +
ψ0 = u0 .
4.3
P∞
i=0
pi gi (ψt )dBti ,
(4.1)
Mixtures of Dirichlet processes
The following definitions are due to C. Antoniak [1].
4.3.1 Antoniak mixtures
Let (U, B, H) be a probability space called the index space. Let (Θ, A) be a
measurable space of parameters.
Definition 4.3.1. A transition measure on U × A is a mapping α from U × A into
[0, ∞) such that
54 CHAPTER 4. MIXTURES OF CONTINUOUS-TIME DIRICHLET PROCESSES
1. for any u ∈ U, α(u, .) is a finite, nonnegative non-null measure on (Θ, A)
2. for every A ∈ A, α(., A) is measurable on (U, B).
Note that this differs from the definition of a transition probability in that α(u, Θ)
need not be identically one as we want α(u, .) to be a parameter for a Dirichlet
process.
Definition 4.3.2. A random distribution P is a mixture of Dirichlet processes on
(Θ, A) with mixing distribution H and transition measure α, if for all k = 1, 2, ...
and any measurable partition A1 , A2 , . . . , Ak of Θ we have
P{P (A1 ) ≤ y1 , . . . , P (Ak ) ≤ yk } =
Z
D(y1 , . . . , yk |α(u, A1 ), . . . , α(u, Ak ))dH(u),
U
where D(y1 , . . . , yk |α1 , . . . , αk ) denotes the distribution function of Dirichlet distribution with parameters (α1 , . . . , αk ).
In concise symbols we will use the heuristic notation:
P ∼
Z
D(α(u, .))dH(u).
U
Roughly, we may consider the index u as a random variable with distribution
H and given u, P is a Dirichlet process with parameter α(u, .). In fact U can
be defined as the identity mapping random variable and we will use the notation
|u f or ”U = u”. In alternative notation
where αu = α(u, .).

 u∼H
 P |u ∼ D(αu )
(4.2)
4.3. MIXTURES OF DIRICHLET PROCESSES
55
4.3.2 Mixtures of continuous-time Dirichlet processes
We now consider the case where αu is a finite measure on a function space like
C(I) and D(I) (spaces defined in section 1).
The following proposition defines a continuous-time process (Pt )t such that each
Pt is a mixture of Dirichlet processes.
Proposition 4.3.1. Let P be a mixture of Dirichlet distributions
P ∼
Z
D(αu )dH(u).
U
Let Pt = πt P . Then, for each t ≥ 0, Pt is a mixture of Dirichlet processes:
Pt ∼
Z
D(αu, t )dH(u)
U
where αu, t = αu (πt−1 (.)).
Proof
Let A1 , A2 , . . . , Ak be a partition of R.
P[Pt (A1 ) ≤ y1 , . . . , Pt (Ak ) ≤ yk ] = P[P πt−1 (A1 ) ≤ y1 , . . . , P πt−1 (Ak ) ≤ yk ]
=
R
U
D(y1 , y2 , . . . , yk | (αu (πt−1 Ai ))1≤i≤k )dH(u),
since πt−1 (A1 ), πt−1 (A2 ), . . . , πt−1 (Ak ) is a partition of Θ.
Therefore
Pt ∼
Z
U
D(αu, t )dH(u).
56 CHAPTER 4. MIXTURES OF CONTINUOUS-TIME DIRICHLET PROCESSES
4.3.3 Posterior distributions
We suppose now that the sample space of observations is X = C(R+ ), where
C(R+ ) denote the space of continuous functions from R+ to R.
Let F be a transition probability from Θ × ζ into [0, 1].
Let θt be a sample from Pt , i.e. θt |Pt , u ∼ Pt and X(t) |Pt , θt , u ∼ F (θt , .).
Let Hx denote the conditional distribution of (θt , u) given X(t) = x.
Let Hθt denote the conditional distribution of u given θt .
The following proposition shows that if (Pt ) is a mixture of Dirichlet processes
then for each t ∈ R+ the posterior probability of Pt is also a mixture of Dirichlet
processes.
Proposition 4.3.2. If for any t ∈ R+


Pt |u ∼ D(αu, t )







 u ∼ HR
P ∼ U D(αu, t )dH(u)
 t



θt |Pt , u ∼ Pt




 X(t) |
∼ F (θ , .)
Pt , θt , u
then
Pt |X(t)=x ∼
Z
(4.3)
t
D(αu, t + δθt )dHx (θt , u).
Θ×U
Proof
Let A1 , A2 , . . . , Ak be a partition of R
P[Pt (Ai ) ≤ yi , 1 ≤ i ≤ k |X(t)=x ] = E[P[Pt (Ai ) ≤ yi , i = 1, . . . , k |X(t)=x, θt , u ] |X(t)=x ]
= E[D(y1 , y2 , . . . , yk |βu, t (A1 ),..., βu, t (Ak ) ) |X(t)=x ]
=
R
U ×Θ
D(y1 , . . . , yk |βu, t (A1 ),..., βu, t (Ak ) )dHx (u, θ).
4.3. MIXTURES OF DIRICHLET PROCESSES
57
where βu, t (Ai ) = αt, u (Ai ) + δθt (Ai ), for any i = 1, . . . , k.
Therefore
Pt |X(t)=x ∼
Z
D(αu, t + δθt )dHx (θt , u).
U
As a corollary, let us show that the same result holds if (P_t) is simply a continuous-time Dirichlet process: the posterior distribution of P_t given X(t) = x is still a mixture of continuous-time Dirichlet processes.
Corollary 4.3.1. If
P_t ∼ D(α_t),
θ_t ∼ P_t,
X(t) | P_t, θ_t ∼ F(θ_t, ·),   (4.4)
then
P_t |_{X(t)=x} ∼ ∫_Θ D(α_t + δ_{θ_t}) dH_x(θ_t).
Proof
Let A_1, A_2, . . . , A_k be a partition of R.
P[P_t(A_i) ≤ y_i, 1 ≤ i ≤ k |_{X(t)=x}] = E[ P[P_t(A_i) ≤ y_i, 1 ≤ i ≤ k |_{X(t)=x, θ_t}] |_{X(t)=x} ]
= E[ D(y_1, y_2, . . . , y_k | β_{A_1,t}, β_{A_2,t}, . . . , β_{A_k,t}) |_{X(t)=x} ]
= ∫_Θ D(y_1, y_2, . . . , y_k | β_{A_1,t}, β_{A_2,t}, . . . , β_{A_k,t}) dH_x(θ_t),
where β_{A_i,t} = α_t(A_i) + δ_{θ_t}(A_i), i ∈ {1, 2, . . . , k}. Therefore
P_t |_{X(t)=x} ∼ ∫_Θ D(α_t + δ_{θ_t}) dH_x(θ_t).
Corollary 4.3.2. If for any t ∈ R_+
P_t ∼ ∫_U D(α_{u,t}) dH(u)
and
θ_t ∼ P_t,
then for any t ∈ R_+
P_t | θ_t ∼ ∫_U D(α_{u,t} + δ_{θ_t}) dH_{θ_t}(u).
Proof
Let A_1, A_2, . . . , A_k be a partition of R.
P[P_t(A_i) ≤ y_i, i = 1, . . . , k | θ_t] = E[ P[P_t(A_i) ≤ y_i, i = 1, . . . , k | θ_t, u] | θ_t ]
= E[ D(y_1, y_2, . . . , y_k | β_{u,t}(A_1), . . . , β_{u,t}(A_k)) | θ_t ]
= ∫_U D(y_1, y_2, . . . , y_k | β_{u,t}(A_1), . . . , β_{u,t}(A_k)) dH_{θ_t}(u),
where β_{u,t}(A_i) = α_{u,t}(A_i) + δ_{θ_t}(A_i). Therefore
P_t | θ_t ∼ ∫_U D(α_{u,t} + δ_{θ_t}) dH_{θ_t}(u).
Chapter 5
Continuous-time Dirichlet
hierarchical models
In some recent and interesting papers, hierarchical models with a Dirichlet prior, Dirichlet hierarchical models for short, were used in probabilistic classification applied to various fields such as biology ([1]), astronomy ([24]) or text mining ([4]).
Actually, these models can be seen as complex mixtures of real Gaussian distributions fitted to non-temporal data.
The aim of this chapter is to extend these models and estimate their parameters
in order to deal with temporal data following a stochastic differential equation
(SDE).
The chapter is organized as follows. In section 2 we briefly recall Dirichlet hierarchical models. In section 3 we consider the case of a Brownian motion with
a Dirichlet prior on its variance which is shown to be a limit of a random walk
in Dirichlet random environment. As an application, we estimate, in section 4,
regime switching models with stochastic drift and volatility.
In section 5, we consider the case of functional data such as signals or solutions
of SDE’s. Computing some posterior distributions in the multivariate case, the
preceding method is extended in order to classify such functional data.
5.1 Dirichlet hierarchical models
Let P ∼ D(cH) denote a Dirichlet process with precision parameter c > 0 and
mean parameter H, where H is a probability measure on a Polish space X. It is
well-known that P can be approximated by
P = ∑_{k=1}^N p_k δ_{X_k}(·)
where
X_i iid∼ H,
(p_i) ∼ SB(c, N),   (5.1)
(p_i) ⊥ (X_i),
SB(c, N ) denoting the stick-breaking scheme of Sethuraman. We will say that
(X_i)_{i=1,2,...} follows a Dirichlet hierarchical model if
X_i | P iid∼ P,
P ∼ D(cH).   (5.2)
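For concreteness, here is a minimal numerical sketch of the truncated stick-breaking construction SB(c, N) and of the corresponding approximate draw of P; the function names, the truncation level and the choice of H (a standard normal) are only illustrative.

    import numpy as np

    def stick_breaking(c, N, rng):
        # Truncated Sethuraman stick-breaking weights (p_1, ..., p_N)
        v = rng.beta(1.0, c, size=N)
        v[-1] = 1.0                      # forces the weights to sum to one
        return v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

    def approximate_dirichlet_draw(c, base_sampler, N, rng):
        # Approximate draw P = sum_k p_k delta_{X_k}, with X_k iid from H and (p_k) ~ SB(c, N)
        p = stick_breaking(c, N, rng)
        atoms = base_sampler(N, rng)
        return p, atoms

    rng = np.random.default_rng(0)
    p, atoms = approximate_dirichlet_draw(
        c=1.0, base_sampler=lambda n, r: r.normal(0.0, 1.0, size=n), N=50, rng=rng)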
5.2 Brownian motion in Dirichlet random environment
5.2.1 Random walks in random Dirichlet environment
Let D(cα) denote a Dirichlet process with parameters c > 0 and α, a finite measure on a Polish space X.
Consider a random variable V and a sequence (U_i) of random variables defined by the following hierarchical model:
U_i | V = σ iid∼ N(0, σ²),
V^{-1} | P ∼ P,
P | c ∼ D(c Γ(ν_1, ν_2)),
c ∼ Γ(η_1, η_2).   (5.3)
Since V is sampled from a Dirichlet process, we have σ < ∞ a.s. because
P(V < ∞) = E(P(V ∈ R | P)) = E(P(R)) = 1.
Hence, we are allowed to consider the following random walk (Sn )n∈N in Dirichlet random environment, starting from 0:
Sn = U1 + U2 + . . . + Un .
For any real number t ≥ 0 let
S_t^n = n^{-1/2} S_{[nt]},   (5.4)
where [x] denotes the integer part of x.
Let B σ = σB denote a zero mean Brownian motion with variance σ 2 , B denoting
a standard Brownian motion independent from V.
Proposition 5.2.1. (S_t^n)_{t≥0} converges in distribution to VB.
Proof
Let E = C(R+ ) be the space of real-valued continuous functions defined on R+ .
For any bounded continuous function f defined on E we have
∫_E f((S_t^n)) dP = ∫_R ( ∫_E f(x) dP_{S_t^n | V=σ} ) dP_V(σ).
But a standard result on the convergence of Gaussian random walks is that
∫_E f(x) dP_{S_t^n | V=σ} → ∫_E f(x) dP_{B^σ},
and this integral is dominated by ‖f‖.
Hence, by the dominated convergence theorem, we have
∫ f((S_t^n)_{t≥0}) dP → ∫_R ( ∫_E f(x) dP_{B^σ}(x) ) dP_V(σ) = ∫_R ( ∫_E f(σx) dP_B ) dP_V(σ) = ∫ f(σB) dP,
the last equality being due to the fact that B and V are independent.
Definition 5.2.1. A Brownian motion in Dirichlet random environment (BMDE)
is a process Z such that
Z | V = σ ∼ L(B^σ),
V^{-1} | P ∼ P,
P | c ∼ D(c Γ(ν_1, ν_2)),
c ∼ Γ(η_1, η_2).
So, the above random walks in Dirichlet environment converge to a BMDE.
Proposition 5.2.2. If Z is a BMDE then its conditional increments are independent Gaussians:
Z_{t_i} − Z_{t_{i−1}} | V = σ ∼ N(0, (t_i − t_{i−1}) σ²).
The increments Z_{t_i} − Z_{t_{i−1}} are orthogonal mixtures of Gaussians but need not be independent.
5.2.2 Simulation algorithm
In order to simulate M paths Z^1, . . . , Z^M of a BMDE, each path (Z_0 = 0, Z_{t_1}, . . . , Z_{t_n}) can be simulated as follows.
Let dt = t_{i+1} − t_i > 0 be small enough and let K be the stick-breaking truncation level.
Draw c from Γ(η_1, η_2) and draw q = (q_1, q_2, . . . , q_K) from SB(c, K).
Draw x = (x_1, x_2, . . . , x_K) with the x_i's iid ∼ Γ(ν_1, ν_2).
Repeat M times:
Draw σ^{-1} from ∑_{i=1}^K q_i δ_{x_i}, set Z_0 = 0 and draw n points Z_{t_i} such that Z_{t_{i+1}} − Z_{t_i} ∼ N(0, σ² dt).
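For concreteness, here is a minimal sketch of this simulation scheme, in which all Gamma laws are read in a shape/rate parametrization; the function names and hyperparameter values are purely illustrative.

    import numpy as np

    def stick_breaking(c, K, rng):
        v = rng.beta(1.0, c, size=K)
        v[-1] = 1.0
        return v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

    def simulate_bmde_paths(M, n, dt, K, eta1, eta2, nu1, nu2, rng):
        # One draw of the random environment, shared by the M paths
        c = rng.gamma(eta1, 1.0 / eta2)                 # c ~ Gamma(eta1, eta2)
        q = stick_breaking(c, K, rng)
        x = rng.gamma(nu1, 1.0 / nu2, size=K)           # atoms x_i ~ Gamma(nu1, nu2)
        paths = np.zeros((M, n + 1))
        for m in range(M):
            sigma = 1.0 / x[rng.choice(K, p=q)]         # sigma^{-1} drawn from sum_i q_i delta_{x_i}
            increments = rng.normal(0.0, sigma * np.sqrt(dt), size=n)
            paths[m, 1:] = np.cumsum(increments)        # Z_0 = 0, increments ~ N(0, sigma^2 dt)
        return paths

    rng = np.random.default_rng(1)
    Z = simulate_bmde_paths(M=5, n=200, dt=0.01, K=50,
                            eta1=2.0, eta2=4.0, nu1=2.0, nu2=2.0, rng=rng)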
5.2.3 Estimation
Due to Proposition 5.2.2, given an observed path (z_{t_i}) of a BMDE, an estimation of its parameters can be obtained by performing the Ishwaran and James blocked Gibbs algorithm with zero means and equal variances on the data z_{t_{i+1}} − z_{t_i} (see the Ishwaran–James paper, Section 3).

Figure 5.1: M paths of a BMDE and the non-Gaussian density of (Z_{t_i}^1, . . . , Z_{t_i}^M).
5.3 Description of the model
Let (Ω, F, F_t, P) be a stochastic basis and (W_t) a one-dimensional Wiener process adapted to (Ω, F, F_t, P). We consider a stochastic process satisfying the following SDE:
dX_t = b(t, X_t) dt + θ(t) h(X_t) dW_t,
where the function h(·) is assumed to be known, the volatility coefficient θ(·) is an unknown function of time that has to be estimated, and the drift coefficient b(t, x) may be unknown. We observe one sampling path of the process (X_t, t ∈ [0, T]) at the discrete times t_i = i△ for i = 1, . . . , N. The sampling interval △ is small in comparison with T. Let us assume that N := T△^{-1} is an integer.
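For intuition, here is a minimal Euler-type discretization of this SDE on the sampling grid; the particular choices of b, h and of a piecewise-constant θ below are hypothetical and only serve as an illustration.

    import numpy as np

    def euler_path(b, theta, h, T, delta, x0, rng):
        # Euler scheme for dX_t = b(t, X_t) dt + theta(t) h(X_t) dW_t observed at t_i = i * delta
        N = int(T / delta)
        X = np.empty(N + 1)
        X[0] = x0
        for i in range(N):
            t = i * delta
            dW = rng.normal(0.0, np.sqrt(delta))
            X[i + 1] = X[i] + b(t, X[i]) * delta + theta(t) * h(X[i]) * dW
        return X

    rng = np.random.default_rng(0)
    X = euler_path(b=lambda t, x: -0.5 * x,                    # illustrative mean-reverting drift
                   theta=lambda t: 0.2 if t < 0.5 else 0.6,    # illustrative volatility with one jump
                   h=lambda x: 1.0, T=1.0, delta=2.0 ** -10, x0=0.0, rng=rng)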
We will use the following assumptions:
• (A0): θ(t) is adapted to the filtration F_t, b(t, ·) is a non-anticipative map, b ∈ C^{-1}(R_+, R) and there exists L_T > 0 such that ∀t ∈ [0, T], E(θ⁴(t)) ≤ L_T and E(θ⁸(t)) ≤ L_T.
• (A1): θ(·) = ∑_{ρ=0}^{f} θ_ρ 1_{[t_ρ, t_{ρ+1})}(·), where the t_ρ are the volatility jump times.
• (A2): ∃ m > 0 such that θ²(·) is almost surely Hölder continuous of order m with a constant K(ω) and E(K(ω)²) < +∞.
If we assume that the volatility jump times correspond to the sampling times t_i = i△, we have
• (A1'): θ(·) = ∑_{i=0}^{N} θ_i 1_{[t_i, t_{i+1})}(·); we denote δθ² = θ_{i+1}² − θ_i².
And if moreover there is at most one change time in each window we get (A3):
• (A3): (A1) and (A1') are satisfied and inf_{ρ=0,...,f} |t_{ρ+1} − t_ρ| ≥ A△.
Remark 5.3.1. If θ(t) satisfies an S.D.E. then (A2) is fulfilled, see e.g. [A. Revuz and M. Yor, (1991)].
We need to control ∫_{t_i}^{t_{i+1}} b⁴(s, X_s) ds, so we will use:
• (B1): ∃ K_T > 0, ∀t ∈ [0, T], E(b(t, X_t)⁴) ≤ K_T.
In all the sequel we work on the simplified model:
dX_t = b_1(t, X_t) dt + θ(t) dW_t.
Under some natural assumptions, the original model becomes the simplified one after the following change of variable:
Proposition 5.3.1. (Pierre Bertrand) Assume that there exists a domain D ⊆ R such that h ∈ C(D, R_+ − {0}), the space of continuous functions from D to R_+ − {0}, h^{-1} ∈ L¹_loc(D), and the solution (X_t) of the original SDE satisfies P(X_t ∈ D, ∀t ∈ [0, T]) = 1.
Let H(x) = ∫^x h^{-1}(ξ) dξ. Then Y_t = H(X_t) satisfies the simplified S.D.E. with
b_1(t, x) = h^{-1}(x) b(t, x) − ½ h'(x) θ²(t).
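For the reader's convenience, the change of variable can be checked by a direct Itô computation (a sketch, assuming H is an antiderivative of h^{-1} on D, so that H' = h^{-1} and H'' = −h'/h²):
dY_t = H'(X_t) dX_t + ½ H''(X_t) d⟨X⟩_t
= h^{-1}(X_t) ( b(t, X_t) dt + θ(t) h(X_t) dW_t ) − ½ (h'(X_t)/h²(X_t)) θ²(t) h²(X_t) dt
= ( h^{-1}(X_t) b(t, X_t) − ½ h'(X_t) θ²(t) ) dt + θ(t) dW_t.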
5.4 Estimation of the volatility using the Haar wavelet basis
Since the size of the window appears in numerical applications as a free parameter to be chosen arbitrarily, we give a description of the estimator introduced by Pierre Bertrand:
H_{A,△}(t) = ∑_{k=1}^{N/A−1} ( A^{-1} ∑_{i=1}^{A−1} (X_{t_{kA+i+1}} − X_{t_{kA+i}})² ) 1_{[t_{kA}; t_{(k+1)A})}(t).   (5.5)
5.5 SDE in Dirichlet random environment
More generally, consider the following model. During the observation time interval [0, T] the process (X_t) evolves according to various regimes. Regime R_j holds during a random time interval [T_{j−1}, T_j) where
0 = T_0 < T_1 < T_2 < . . . < T_L = T.
The drift and the variance are randomly chosen in each regime but they do not change during this regime, so
dX_t = ∑_{j=1}^L µ_{R_j} 1_{[T_{j−1}, T_j)}(t) dt + ∑_{j=1}^L σ_{R_j} 1_{[T_{j−1}, T_j)}(t) dB_t,
where the R_j's ∈ {1, . . . , N} are random positive integers such that
R_j | p iid∼ ∑_{k=1}^N p_k δ_k(·),
(µ_k, σ_k) | θ ∼ N(θ, σ_µ) ⊗ Γ(η_1, η_2),   k = 1, . . . , L,
p | α ∼ SB(α, N),
α ∼ Γ(ν_1, ν_2),
θ ∼ N(0, A).
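A minimal simulation sketch of this regime-switching model; for simplicity the L regime intervals are taken of equal length, σ_k is used directly as the diffusion coefficient, and all Gamma laws are read in a shape/rate parametrization (these are assumptions, as are the function names and hyperparameter values).

    import numpy as np

    def simulate_regime_switching_path(T, dt, N, L, A, sig_mu, eta1, eta2, nu1, nu2, rng):
        alpha = rng.gamma(nu1, 1.0 / nu2)                             # alpha ~ Gamma(nu1, nu2)
        v = rng.beta(1.0, alpha, size=N); v[-1] = 1.0
        p = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))     # p | alpha ~ SB(alpha, N)
        theta = rng.normal(0.0, np.sqrt(A))
        mu = rng.normal(theta, np.sqrt(sig_mu), size=N)               # mu_k | theta
        sigma = rng.gamma(eta1, 1.0 / eta2, size=N)                   # sigma_k
        R = rng.choice(N, size=L, p=p)                                # regimes R_1, ..., R_L
        n = int(T / dt)
        step_regime = R[np.minimum((np.arange(n) * L) // n, L - 1)]   # regime in force at each step
        dX = mu[step_regime] * dt + sigma[step_regime] * np.sqrt(dt) * rng.normal(size=n)
        return np.concatenate(([0.0], np.cumsum(dX))), R

    rng = np.random.default_rng(2)
    X, R = simulate_regime_switching_path(T=1.0, dt=1 / 480, N=10, L=4, A=1.0, sig_mu=0.25,
                                          eta1=2.0, eta2=4.0, nu1=2.0, nu2=4.0, rng=rng)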
5.5.1 Estimation
The above process (X_t) is observed at discrete times, say i dt, i = 0, 1, 2, . . . , n.
It is also assumed that the regime changes occur at these times. The estimation of the above parameters can be done through the Ishwaran and James blocked Gibbs algorithm, where their class label variable K is our regime R.
∆X_i | R, µ, σ ind∼ N(µ_{R_i}, σ_{R_i}),
R_i | p iid∼ ∑_{k=1}^N p_k δ_k(·),
µ_i | θ ∼ N(θ, σ_µ),
σ_i ∼ Γ(η_1, η_2),
p | α ∼ SB(α, N),
α ∼ Γ(ν_1, ν_2),
θ ∼ N(0, A).
5.5.2 Option pricing in a regime switching market
The above setting can be used in the option pricing problem with Xt = log(St )
where (S_t)_{t≥0} is the stock price process governed by a geometric Brownian motion, and σ_{R_i} is a stochastic volatility during regime R_i. Observe that the estimations are done here without using any sliding windows technique and without assuming that T_j − T_{j−1} is exponentially distributed, as it is done with Markov chains in regime switching markets.
Definition 5.5.1. Suppose X is an n × p matrix, each row of which is independently drawn from a p-variate normal distribution with zero mean:
X_{(i)} = (x_{1i}, . . . , x_{pi})^T ∼ N_p(0, V).
Then the Wishart distribution is the probability distribution of the p × p random matrix
W = X^T X = ∑_{i=1}^n X_{(i)} X_{(i)}^T.
One indicates that W has that probability distribution by writing
W ∼ W(n, V).
The positive integer n is the number of degrees of freedom.
5.6 Classification of trajectories
We consider the problem of classifying a set of n functions representing signals,
stock prices and so on. Each function is known through a finite-dimensional vector of observed points. In order to classify these functions, we now extend the blocked Gibbs algorithm to vector data. First let us make our model precise.
5.6.1 Hierarchical Dirichlet Model for vector data
In the finite d-dimensional normal mixture problem, we observe data f = (f_1, f_2, . . . , f_n), where the f_i are iid random curves with a finite Wiener mixture density. The curves f_i can be represented and approximated by the vectors f̃_i = (△_1 f_i, △_2 f_i, . . . , △_L f_i):
ψ_P(f) = ∫_{R×R_+} φ(f | σ(y)) dP(y) = ∑_{k=1}^d p_{k,0} φ(f | σ_k),   (5.6)
where φ(f | σ) represents a d-dimensional normal distribution with mean 0 and variance matrix σ.
Based on the data, we would like to estimate the unknown mixture distribution P. We can devise a Gibbs sampling scheme for exploring the posterior P_N | f. Notice that the model derived from (5.6) also contains hidden variables K = {K_1, . . . , K_m}, since it can also be expressed as
f̃_i | K, W, µ iid∼ N_L(µ_{K_i}, △t_i W_{K_i}),
K_i | p ∼ ∑_{k=1}^N p_k δ_k(·),
µ_k | θ ∼ N_L(θ, σ_µ),   (5.7)
W_k ∼ W(s, V),
θ ∼ N(0, A),
where W(s, V) and N_L(µ, σ) denote a Wishart and a multivariate Gaussian distribution respectively, and p ∼ SB(c, N).
Note that a similar model for vector data appears in Caron F. et al. (2006), but in our case the parameters of the Wishart prior are updated at each iteration. In addition, we have a problem of clustering, which justifies the use of the hidden variables K_i's. In particular we will need to compute the posterior distribution of the class variable K and of the weight variable p. To implement the blocked Gibbs sampler we iteratively draw values from the following conditional distributions:
µ | K, W, θ, f
W | K, µ, f
K | p, µ, W, f
p | K, α
α | p
θ | µ.
5.6.2 Posterior computations
Blocked Gibbs Algorithm for vector data.
Let {K_1^⋆, . . . , K_m^⋆} denote the current m unique values of K. In each iteration of the Gibbs sampler we simulate:
• (a) Conditional for µ: For each j ∈ {K_1^⋆, . . . , K_m^⋆}, draw
µ_j | W, K, θ, f ind∼ N_L(µ_j^⋆, W_j^⋆),
where µ_j^⋆ = ∑_{{i: K_i = j}} f̃_i + θ and W_j^⋆ = σ_µ; also, for each j ∉ {K_1^⋆, . . . , K_m^⋆}, independently simulate µ_j ∼ N_L(θ, σ_µ).
• (b) Conditional for W: For each j ∈ {K_1^⋆, . . . , K_m^⋆}, draw
W_j | µ, K, f ind∼ W( s, ∑_{{i: K_i = j}} (f̃_i − µ_j)(f̃_i − µ_j)^T + V ),
where W(s, V) denotes the Wishart distribution with parameters s and V.
• (c) Conditional for K:
K_i | p, µ, W, f ind∼ ∑_{h=1}^N p_{h,i} δ_h(·),   i = 1, . . . , n,
where, for each h = 1, 2, . . . , N,
p_{h,i} ∝ p_h [ (2π)^{l/2} det(W_h)^{1/2} ]^{-n_h} exp⟨ ∑_{{d: K_d^⋆ = h}} (f̃_d − µ_h)(f̃_d − µ_h)^T, W_h ⟩,
and ⟨A, B⟩ is the trace of AB.
• (d) Conditional for p:
For any integer N, let V_1, . . . , V_{N−1} be iid β(1, c) and V_N = 1. Let p_1 = V_1^⋆, p_k = (1 − V_1^⋆) · · · (1 − V_{k−1}^⋆) V_k^⋆, k = 2, . . . , N, where
V_k^⋆ ∼ β( 1 + r_k, α + ∑_{l=k+1}^N r_l ),   for k = 1, . . . , N − 1,
and (as before) r_k records the number of K_i values which equal k.
• (e) Conditional for α:
α | p ∼ Γ( N + η_1 − 1, η_2 − ∑_{k=1}^{N−1} log(1 − V_k^⋆) ),
for the same values of V_k^⋆ used in the simulation for p.
• (f) Conditional for θ:
θ | µ ∼ N_L(θ^⋆, σ^⋆),
where θ^⋆ = ∑_{k=1}^N µ_k and σ^⋆ = A.
Proof
Let φ denote the distribution function. For every j ∈ {K_1^⋆, . . . , K_m^⋆}:
• (a) Conditional for µ:
φ_{µ_j | W, K, θ, f}(y) = φ_{f | µ_j=y, W, K, θ}(y) φ_{µ_j | W, K, θ}(y) φ_{W, K, θ}
= ∏_{{d: K_d^⋆=j}} φ_{f̃_d | µ_j=y, W, K, θ}(y) φ_{µ_j | W, K, θ}(y) φ_{W, K, θ}
= ∏_{{d: K_d^⋆=j}} e^{iy^T f̃_d} e^{−½ f̃_d^T W_j f̃_d} · e^{iy^T θ − ½ y^T σ_µ y}
= e^{iy^T ∑_{{d: K_d^⋆=j}} f̃_d} e^{−½ ∑_{{d: K_d^⋆=j}} f̃_d^T W_j f̃_d} · e^{iy^T θ − ½ y^T σ_µ y}
= e^{iy^T (θ + ∑_{{d: K_d^⋆=j}} f̃_d) − ½ y^T σ_µ y} · e^{−½ ∑_{{d: K_d^⋆=j}} f̃_d^T W_j f̃_d},
hence
µ_j | W, K, θ, f ind∝ N_L( θ + ∑_{{d: K_d^⋆=j}} f̃_d, σ_µ ).
• (b) Conditional for W: For each j ∈ {K_1^⋆, . . . , K_m^⋆},
φ_{W_j^{-1} | µ, K, f}(M) = φ_{X | W_j=M, K}(M) φ_{W_j^{-1} | K, µ}(M) φ_{µ, K}(z, t)
= ∏_{{d: K_d^⋆=j}} e^{−½ (f̃_d − µ_j)^T M (f̃_d − µ_j)} × [ det(M)^{(n−l−1)/2} / (2^{nl/2} det(V)^{n/2} Γ_p(n/2)) ] e^{−½ Tr(V^{-1} M)} φ_{µ, K}(z, t)
= e^{−½ Tr( ∑_{{d: K_d^⋆=j}} (f̃_d − µ_j)(f̃_d − µ_j)^T M )} × [ det(M)^{(n−l−1)/2} / (2^{nl/2} det(V)^{n/2} Γ_p(n/2)) ] e^{−½ Tr(V^{-1} M)} φ_{µ, K}(z, t)
= [ det(M)^{(n−l−1)/2} / (2^{nl/2} det(V)^{n/2} Γ_p(n/2)) ] e^{−½ Tr( ( ∑_{{d: K_d^⋆=j}} (f̃_d − µ_j)(f̃_d − µ_j)^T + V^{-1} ) M )} × φ_{µ, K}(z, t),
therefore
W_j | µ, K, f ind∝ W( n, ( ∑_{{i: K_i=j}} (f̃_i − µ_j)(f̃_i − µ_j)^T + V )^{-1} ).
• (c) Conditional for K:
P{K_i = s | p, µ, W, f} = P{f | p, W, K_i = s, µ} P{K_i = s | W, µ} P{µ} P{W}
∝ P{f | p, W, K_i = s, µ} P{K_i = s | W, µ}
= ∏_{{d: K_d^⋆=s}} [ p_s / ((2π)^{l/2} det(W_s)^{1/2}) ] e^{−½ (f̃_d − µ_s)^T W_s (f̃_d − µ_s)}.
Hence,
p_{s,i} ∝ p_s [ (2π)^{l/2} det(W_s)^{1/2} ]^{-n_s} exp⟨ ∑_{{d: K_d^⋆=s}} (f̃_d − µ_s)(f̃_d − µ_s)^T, W_s ⟩,
where n_s is the number of times K_s^⋆ occurs in K.
(d) Conditional for θ:
φ_{θ | µ=µ′}(θ) ∝ φ_{µ | θ}(µ′) φ_θ(θ)
= ∏_{j=1}^N φ_{µ_j | θ}(µ′_j) φ_θ(θ)
= ∏_{j=1}^N e^{iθ^T µ′_j − ½ µ′_j^T σ_µ µ′_j} · e^{−½ θ^T A θ}
= e^{∑_{j=1}^N iθ^T µ′_j − ½ θ^T A θ} · e^{−½ ∑_{j=1}^N µ′_j^T σ_µ µ′_j}.
Hence the distribution of θ | µ ∝ N_L( ∑_{j=1}^N µ_j, A ).
5.6.3 Classes of volatility
Let (S_t) be the stock price process and suppose that X_t = log(S_t) satisfies
dX_t = b(t, X_t) dt + θ(t) h(X_t) dB_t,   (5.8)
where the function h(·) is assumed to be known, the volatility coefficient θ(·) is a random function of time that has to be estimated, and the drift coefficient b(t, x) is unknown. We observe a path of the process (X_t, t ∈ [0, T]) sampled at discrete times t_i = i△, for i = 1, . . . , N.
Under some conditions and after a change of variable (see e.g. [5]), equation (5.8) reduces to
dX_t = b_1(t, X_t) dt + θ(t) dB_t.
A refined method to estimate θ(t) consists in using wavelets. Consider (Vj , j ∈ Z)
an r-regular Multi Resolution Analysis of L2 (R) such that the associated scale
function Φ and the wavelet function ψ are compactly supported. For all j, the
family {Φj, k (t) = 2j/2 Φ(2j t − k), k ∈ Z} is an orthogonal basis of Vj . Time
being sampled with △ = 2^{−n}, the estimator is then:
θ²(t) = ∑_k µ_{j(n), k} Φ_{j(n), k}(t)   (5.9)
for j(n) < n, where
µ_{j(n), k} = ∑_{i=1}^{N−1} Φ_{j(n), k}(t_i) (X_{t_{i+1}} − X_{t_i})².   (5.10)
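As an illustration, here is a minimal sketch of (5.9)-(5.10) with the Haar scale function Φ = 1_{[0,1)}, so that Φ_{j,k}(t) = 2^{j/2} 1_{[k2^{-j}, (k+1)2^{-j})}(t); the function names and the choice of the level j are only illustrative.

    import numpy as np

    def haar_phi(j, k, t):
        # Phi_{j,k}(t) = 2^{j/2} Phi(2^j t - k), with Phi the indicator of [0, 1)
        u = (2.0 ** j) * t - k
        return (2.0 ** (j / 2.0)) * ((u >= 0.0) & (u < 1.0))

    def volatility_estimator(X, t, j):
        # Estimate theta^2(t) on the sampling grid via (5.9)-(5.10)
        dX2 = np.diff(X) ** 2                                # (X_{t_{i+1}} - X_{t_i})^2
        ks = np.arange(int(np.floor(t[-1] * 2 ** j)) + 1)
        mu = np.array([np.sum(haar_phi(j, k, t[:-1]) * dX2) for k in ks])   # (5.10)
        theta2 = np.zeros_like(t)
        for k, m in zip(ks, mu):
            theta2 += m * haar_phi(j, k, t)                  # (5.9)
        return theta2

    # toy check: constant volatility theta = 0.5 on [0, 1], sampling step 2^{-10}
    rng = np.random.default_rng(0)
    n = 10
    t = np.arange(2 ** n + 1) * 2.0 ** (-n)
    X = np.concatenate(([0.0], np.cumsum(0.5 * np.sqrt(2.0 ** (-n)) * rng.normal(size=2 ** n))))
    print(volatility_estimator(X, t, j=4)[::256])            # values close to 0.25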
Suppose that we have observed n trajectories X1 , . . . , Xl , . . . , Xn sampled as
above, and that we want to classify them according to their volatility component,
that is, we want to classify the θl ’s estimated by (5.9).
We then see that we just have to apply the preceding algorithm to the vectors µ^l_{j(n), k}, which are finite-dimensional representations of the θ_l's.
5.7 Conclusion
We have extended Dirichlet hierarchical models in order to deal with temporal data such as solutions of SDEs with stochastic drift and volatility. It can be thought that the process on which these parameters are based belongs to a certain well-known class of processes, such as continuous-time Markov chains. Then, we think that a Dirichlet prior can be put on the path space, that is, a functional space. The estimation procedure in such a context is the topic of the next chapter.
Chapter 6
Markov regime switching with
Dirichlet Prior. Application to
Modelling Stock Prices
We have seen in a preceding chapter some examples of continuous-time Dirichlet processes with parameters proportional to the distribution of continuous-time processes, such as the Wiener measure.
In the present Chapter, motivated by some mathematical models in finance dealing
with ’Regime switching markets’, we consider the case where the continuous-time
process is a continuous-time Markov chain whose state at time t models the
state of the market at time t.
Indeed, while in the preceding Chapter 5 volatility was constant during some time interval of random length without any hypothesis on the switching process, here the switching depends on a Markov chain whose states represent the different regimes.
Also, the various values of the trend and the volatility depend on the state of this
chain which 'chooses' these values among some i.i.d. ones. Clearly, we deal with stochastic volatility.
In our approach, the regimes play the same role as the classes play in classification: each temporal observation therefore belongs to a class, that is, to a regime.
Our contribution consists in placing a Dirichlet process prior on the path space of
the Markov chain, which is a cadlag function space. This idea is new as it has
never been used in the literature.
In the first Section, we present our model. Section 2 deals with the estimation procedure; the computations of the posteriors follow from those done in Chapter
5. In the last Section 3, we give some indications on the implementation of the
algorithm in C language and some numerical results are presented.
6.1 Markov regime switching with Dirichlet prior
In this section, we take ᾱ = H, the distribution of a continuous-time Markov chain on a finite set of states, and we propose a new hierarchical model that is
specified, as an example, in the setting of mathematical finance. Of course, this
can be similarly used in many other cases. We consider the Black-Scholes SDE
in random environment with a Dirichlet prior on the path space of the chain, the
states of the chain representing the environment due to the market. We model the
stock price using a geometric Brownian motion with drift and variance depending
on the state of the market. The state of the market is modeled as a continuous time
Markov chain with a Dirichlet prior. In what follows, the notation σ will be used
to denote the variance rather than the standard deviation.
The following notations will be adopted:
1. n will denote the number of observed data and also the length of an observed
path.
2. M will denote the number of states of the Markov chain.
3. The state space of the chain will be denoted by S = {i : 1 ≤ i ≤ M }.
4. N will denote the number of simulated paths.
5. m will denote the number of distinct states of a path.
– The stock price follows the SDE
dS_t / S_t = β(X_t) dt + √σ(X_t) dB_t,   t ≥ 0,
where B_t is a standard Brownian motion. By Itô's formula, the process Z_t = log(S_t) satisfies the SDE
dZ_t = µ(X_t) dt + √σ(X_t) dB_t,   t ≥ 0,
where µ(X_t) = β(X_t) − ½σ(X_t). The observed data is of the form Z_0, Z_1, . . . , Z_n.
– The process (X_t) is assumed to be a continuous-time Markov process taking values in the set S = {i : 1 ≤ i ≤ M}. The transition probabilities of this chain are denoted by p_{ij}, i, j ∈ S, and the transition rate matrix is Q_0 = (q_{ij})_{i,j∈S} with
q_{ij} = λ_i p_{ij} if i ≠ j, λ_i > 0, and q_{ii} = −∑_{j≠i} q_{ij}, i, j ∈ S.
Then, conditional on the path {X_s, 0 ≤ s ≤ n}, the Y_t = Z_t − Z_{t−1} = log(S_t / S_{t−1}) are independent N(µ_{X_t}, σ_{X_t}), t = 1, 2, . . . , n.
– For each i = 1, 2, . . . , M, the priors on µ_i = µ(i) and σ_i = σ(i) are specified by
µ_i ind∼ N(θ, τ^µ),   (6.1)
σ_i ind∼ Γ(ν_1, ν_2),   (6.2)
with θ ∼ N(0, A), A > 0.
– The Markov chain {Xt , t ≥ 0} has prior D(α H), where H is a probability
measure on the path space of cadlag functions D([0, ∞), S). The initial distribution according to H is the uniform distribution π0 = (1/M, . . . , 1/M ),
and the transition rate matrix is Q with pij = 1/(M − 1) and λi = λ > 0.
Thus the Markov chain under Q will spend an exponential time with mean
1/λ in any state i and then jump to state j ≠ i with probability 1/(M − 1).
A realization of the Markov chain from the above prior is generated as follows: Generate a large number of paths Xi = {xis : 0 ≤ s ≤ n}, i =
1, 2, . . . , N, from H. Generate the vector of probabilities (pi , i = 1, . . . , N )
from a Poisson Dirichlet distribution with parameter α, using stick breaking.
Then draw a realization of the Markov chain from
p = ∑_{i=1}^N p_i δ_{X_i},   (6.3)
which is a probability measure on the path space D([0, n), S). The parameter λ is chosen to be small so that the variance is large and hence we obtain a
large variety of paths to sample from at a later stage. The prior for α is given
by
α ∼ Γ(η_1, η_2).   (6.4)
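A minimal sketch of this path-generation step, assuming the chain is recorded at the integer times 0, 1, . . . , n and that Gamma laws are read in a shape/rate parametrization (function names and hyperparameters are illustrative).

    import numpy as np

    def simulate_chain_path(n, M, lam, rng):
        # One path under H: uniform initial state, exponential holding times with mean 1/lam,
        # jumps to a uniformly chosen different state; the path is read off at integer times.
        jump_times = [0.0]
        states = [int(rng.integers(1, M + 1))]
        while jump_times[-1] <= n:
            jump_times.append(jump_times[-1] + rng.exponential(1.0 / lam))
            others = [j for j in range(1, M + 1) if j != states[-1]]
            states.append(others[rng.integers(len(others))])
        idx = np.searchsorted(np.array(jump_times), np.arange(n + 1), side="right") - 1
        return np.array(states)[idx]

    def draw_prior_realization(N, n, M, lam, eta1, eta2, rng):
        # N candidate paths from H, stick-breaking weights p, and one path drawn from sum_i p_i delta_{X_i}
        alpha = rng.gamma(eta1, 1.0 / eta2)
        v = rng.beta(1.0, alpha, size=N); v[-1] = 1.0
        p = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
        paths = np.stack([simulate_chain_path(n, M, lam, rng) for _ in range(N)])
        return paths, p, paths[rng.choice(N, p=p)]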
6.2 Estimation
Estimation is done using the simulation of a large number of paths of the Markov
chain, which will be selected according to a probability vector (generated by stick-breaking), and then using the blocked Gibbs sampling technique. This technique
uses the posterior distribution of the various parameters.
To carry out this procedure we need to compute the following conditional distributions. We denote by µ and σ the current values of the vectors (µ_1, µ_2, . . . , µ_n) and (σ_1, σ_2, . . . , σ_n), respectively. Let Y be the vector of observed data (Y_1, . . . , Y_n).
Let X = (x1 , x2 , . . . , xn ) be the vector of current values of the states of the
Markov chain at times t = 1, 2, . . . , n, respectively. Let X ∗ = (x∗1 , . . . , x∗m ) be
the distinct values in X.
– Conditional for µ. For each j ∈ X^∗ draw
(µ_j | σ, X, θ, Y) ind∼ N(µ_j^∗, σ_j^∗),   (6.5)
where
µ_j^∗ = σ_j^∗ ( ∑_{t: X_t=j} Y_t/σ_j + θ/τ^µ ),   σ_j^∗ = ( n_j/σ_j + 1/τ^µ )^{-1},
n_j being the number of times j occurs in X. For each j ∈ X \ X^∗, independently simulate µ_j ∼ N(θ, τ^µ).
– Conditional for σ. For each j ∈ X^∗ draw
(σ_j | µ, X, Y) ind∼ Γ( ν_1 + n_j/2, ν_{2,j}^∗ ),   (6.6)
where
ν_{2,j}^∗ = ν_2 + ∑_{t: X_t=j} (Y_t − µ_j)² / 2.
Also, for each j ∈ X \ X^∗, independently simulate σ_j ∼ Γ(ν_1, ν_2).
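A sketch of these two updates, implementing (6.5) and (6.6) exactly as stated; it treats σ_j as the variance, reads Γ(a, b) in a shape/rate parametrization and assumes the states are stored as 0-based integer labels indexing the arrays mu and sigma (all of this is an assumption about conventions).

    import numpy as np

    def update_mu_sigma(Y, X, mu, sigma, theta, tau_mu, nu1, nu2, rng):
        # One sweep over the occupied states X*: draw mu_j from (6.5) and sigma_j from (6.6)
        mu, sigma = mu.copy(), sigma.copy()
        for j in np.unique(X):
            idx = (X == j)
            n_j = idx.sum()
            sig_star = 1.0 / (n_j / sigma[j] + 1.0 / tau_mu)
            mu_star = sig_star * (Y[idx].sum() / sigma[j] + theta / tau_mu)
            mu[j] = rng.normal(mu_star, np.sqrt(sig_star))            # (6.5)
            nu2_star = nu2 + 0.5 * np.sum((Y[idx] - mu[j]) ** 2)
            sigma[j] = rng.gamma(nu1 + 0.5 * n_j, 1.0 / nu2_star)     # (6.6)
        # states not visited by X would be re-drawn from their priors N(theta, tau_mu) and Gamma(nu1, nu2)
        return mu, sigma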
– Conditional for X.
(X | p) ∼ ∑_{i=1}^N p_i^∗ δ_{X_i},   (6.7)
where
p_i^∗ ∝ p_i ∏_{g=1}^{m} ∏_{{d: x_d^i = g}} (2πσ_g)^{-1/2} e^{−(Y_d − µ_g)² / (2σ_g)},   (6.8)
and (x_1^{i,∗}, . . . , x_m^{i,∗}) denote the current m = m(i) unique values of X_i, i = 1, . . . , N.
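A sketch of this path-selection step, computing the weights (6.8) in log scale for numerical stability (σ_g is again treated as a variance and the candidate paths are assumed to carry 0-based labels; names are illustrative).

    import numpy as np

    def select_path(Y, paths, p, mu, sigma, rng):
        # Draw the current path X from (6.7), with weights p_i^* given by (6.8)
        logw = np.log(p).copy()
        for i, X in enumerate(paths):
            m, s = mu[X], sigma[X]
            logw[i] += np.sum(-0.5 * np.log(2.0 * np.pi * s) - (Y - m) ** 2 / (2.0 * s))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        return paths[rng.choice(len(paths), p=w)], w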
– Conditional for p.
p_1 = V_1^∗, and p_k = (1 − V_1^∗) · · · (1 − V_{k−1}^∗) V_k^∗, k = 2, 3, . . . , N − 1,   (6.9)
where
V_k^∗ ind∼ β( 1 + r_k, α + ∑_{ℓ=k+1}^N r_ℓ ),
r_k being the number of x_l^i's which equal k.
– Conditional for α.
(α | p) ∼ Γ( N + η_1 − 1, η_2 − ∑_{i=1}^{N−1} log(1 − V_i^∗) ),
where the V ∗ values are those obtained in the simulation of p in the above
step.
– Conditional for θ.
(θ | µ) ∼ N(θ^∗, τ^∗),   (6.10)
where
θ^∗ = (τ^∗ / τ^µ) ∑_{j=1}^M µ_j   and   τ^∗ = ( M/τ^µ + 1/A )^{-1}.

Proof.
(a) The computation of the posterior distributions for µ, σ and θ follows in the
same manner as in Ishwaran and James (2002) and Ishwaran and Zarepour
(2000). Here, Xt = s means that the class variable is equal to s.
(b) Conditional for X:
P{X = X_i | p, µ, σ, Y} = P{Y | p, σ, X = X_i, µ} P{X = X_i | σ, µ, p} P{µ, σ}
∝ ∏_{g=1}^{m} ∏_{{d: x_d^i = g}} (2πσ_g)^{-1/2} e^{−(Y_d − µ_g)² / (2σ_g)} · p_i.
Hence
p_i^∗ ∝ ∏_{g=1}^{m} ∏_{{d: x_d^i = g}} (2πσ_g)^{-1/2} e^{−(Y_d − µ_g)² / (2σ_g)} · p_i,
where X_i = (x_1^i, . . . , x_n^i) and (x_1^{i,∗}, . . . , x_m^{i,∗}) denote the current m unique values in the path X_i.
(c) Conditional for p: The Sethuraman stick-breaking scheme can be extended to two-parameter Beta distributions, see Walker and Muliere (1997, 1998):
Let V_k ind∼ β(a_k, b_k), for each k = 1, . . . , N. Let
p_1 = V_1, and p_k = (1 − V_1) · · · (1 − V_{k−1}) V_k, k = 2, 3, . . . , N − 1.
We will write the above random vector, in short, as
p ∼ SB(a_1, b_1, . . . , a_{N−1}, b_{N−1}).
By Connor and Mosimann (1969), the density of p is
∏_{k=1}^{N−1} [ Γ(a_k + b_k) / (Γ(a_k) Γ(b_k)) ] · p_1^{a_1 − 1} · · · p_{N−1}^{a_{N−1} − 1} p_N^{b_{N−1} − 1} × (1 − P_1)^{b_1 − (a_2 + b_2)} · · · (1 − P_{N−2})^{b_{N−2} − (a_{N−1} + b_{N−1})},
where P_k = p_1 + . . . + p_k.
From this, it easily follows that the distribution is conjugate for multinomial sampling, and consequently the posterior distribution of p given X, when a_k = 1 and b_k = α for each k, is
SB(a_1^∗, b_1^∗, . . . , a_{N−1}^∗, b_{N−1}^∗),
where
a_k^∗ = 1 + r_k,   b_k^∗ = α + ∑_{ℓ=k+1}^N r_ℓ,
and r_k is the number of x_l^i's which equal k, k = 1, . . . , N − 1.
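A sketch of this conjugate update, assuming 0-based component labels for the entries of the current path (the helper name and the Beta sampling call are the only ingredients added here).

    import numpy as np

    def posterior_stick_breaking(labels, alpha, N, rng):
        # Draw p from SB(1 + r_1, alpha + sum_{l>1} r_l, ..., 1 + r_{N-1}, alpha + r_N)
        r = np.bincount(labels, minlength=N)                              # r_k
        tail = np.concatenate((np.cumsum(r[::-1])[::-1][1:], [0.0]))      # sum_{l > k} r_l
        v = rng.beta(1.0 + r[:-1], alpha + tail[:-1])
        v = np.append(v, 1.0)
        return v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))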
6.3 Implementation
The algorithm presented in the previous section was implemented in C language.
The implementation includes:
- functions that simulate standard probability distributions: Uniform, Normal,
Gamma, Beta, Exponential.
- a function that returns an index ∈ {1, . . . , n} according to a vector of probabilities p_1, . . . , p_n.
- a function that simulates a probability vector according to the stick-breaking scheme.
- a function that simulates n paths of a Markov chain.
- a function that records the number of times a state appears in a path.
- a function that chooses one of the paths according to a vector of probabilities.
- a function that modifies the parameters of the prior distributions according to the formulas of the posterior distributions.
After having simulated a number of paths, we perform the iterations. At each
iteration a path is randomly selected and the parameters are updated according
to the posterior formulas. At the end of each iteration of the Gibbs sampling, we
obtain a path X of the Markov chain. From this, the parameters π and Q0 can be
re-estimated. From Q0 the parameters λi and pij can be derived.
6.3.1 Simulated data
We fit the model, using the algorithm developed above, to a simulated series of
length n = 480, with a number of states (regimes) M = 4, mean and variance in
each state being chosen as follows:
(µ1 , σ1 ) = (−1.15, 0.450)
(µ2 , σ2 ) = (−0.93, 0.450)
(µ3 , σ3 ) = (−0.60, 0.440)
(µ4 , σ4 ) = (1.40, 0.500).
We have performed our algorithm on that series with number of states M = 10,
number of paths N = 100 and number of iterations = 25,000. Then, we have observed that the algorithm is able to put most of the mass (in terms of the stationary
distribution of the MC) on 4 regimes, which are close to the ones chosen above.
At the end of the iterations we compute a confidence interval for the mean and
for the variance w.r.t. each regime. We can conclude that the algorithm is able to
identify the parameters of the simulated data set.
The confidence intervals for the mean and the variance are given below.
Regime 1:
Im = [−1.208, −1.12423] and Iv = [0.431, 0.4738].
Regime 2:
Im = [−0.9351, −0.9296] and Iv = [0.442, 0.4538].
Regime 3:
Im = [−0.63446, −0.5140] and Iv = [0.4319, 0.4491].
Regime 4:
Im = [1.30114, 1.43446] and Iv = [0.4949, 0.5081].
6.3.2 Real data
We have also applied our algorithm to the Bsemidcap index data of the Indian National Stock Exchange (NSE) from 21/12/2006 to 15/11/2007 (www.nseindia.com).
For this dataset we have n = 250, ∆t = 1, and we deal with N = 100 paths, while Gamma(2, 4) is the prior for α.
With the above choice, we obtain 6 regimes for which the estimates for the mean,
variance and stationary probabilities are as follows:
          µ            σ             π
R1    0.001124    2.9132e-05    20 %
R2   -0.009479    7.2166e-05     3 %
R3    0.000629    2.3023e-05    29 %
R4   -0.004579    7.3800e-05     5 %
R5    0.000829    1.186e-05     10 %
R6    0.001109    3.3372e-05    33 %
The most frequent Markov chain path, its parameters λ_i and the matrix of transition probabilities (p_{i,j})_{1≤i≠j≤6} are respectively equal to:
35363636165136353366563611416133666313336333
45666646111666661333161335633165413646335636
23613361665511535336165616631631162366633266
61336631366166116153513534133531366613565336
36135666516331166636136366666363664636116461
3 4 3 6.
λ_1 = 0.8,  λ_2 = 1,  λ_3 = 0.7,  λ_4 = 1,  λ_5 = 0.95,  λ_6 = 0.75,

and, reading row i as the transition probabilities from state i to the five states j ≠ i:

Row 1:  0      0.48   0.03   0.06   0.42
Row 2:  0      0.66   0      0      0.33
Row 3:  0.16   0.02   0.062  0.2    0.54
Row 4:  0.375  0      0      0.125  0.5
Row 5:  0.157  0      0.42   0.052  0.36
Row 6:  0.36   0.038  0.384  0.077  0.134
It is interesting to note that in the high volatility states, the index has a negative
drift as is usually observed in analysis of empirical data. A by-product of our
algorithm is the distribution of the current state of the volatility, which is required
to compute the price of an option (see [12] and references therein).
Chapter 7
Conclusion and Perspectives
Our main subject of interest was to investigate Dirichlet processes when the parameter is proportional to the distribution of a stochastic process (Brownian motion, jump processes, ...) and to propose continuous-time hierarchical models
involving continuous-time Dirichlet processes.
Although this area requires some rather nontrivial techniques, we have shown that
such a setting can be of interest in modelling SDEs in random environment and
that the proposed estimation procedure works.
Let us finally mention some perspectives.
It is clear that it would be interesting to extend the method to other SDEs and
to other kinds of processes; we think of replacing, in the last chapter, the Markov chain by a diffusion, a spatio-temporal process or a multivariate process.
It would also be of interest to use the estimated model for prediction and to compare this prediction with other models.
Concerning the algorithm in the last chapter it can be observed that for each iteration, an option price w.r.t. the selected path can be computed by using for example
the formula in Ghosh and Deshpande. After performing all the iterations, we will
have a distribution of option prices that can be used for decision-making on the
final option price. This should be compared to other decision procedures.
Bibliography
[1] Antoniak, C.E. (1974). Mixtures of Dirichlet processes. Ann. Statist. 2, 6, 1152-1174.
[2] Bertoin, J. (2006). Random fragmentation and coagulation processes.
[3] Blackwell, D. and MacQueen, J.B. (1973). Ferguson distributions via Polya urn schemes. Ann. Statist. 2, 1, 353-355.
[4] Blei, D. and Jordan, I.J. (2005). Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1, 121-144.
[5] Blei, D., Ng, A. and Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
[6] Bertrand, P. (1996). Estimation of the Stochastic Volatility of a Diffusion Process I. Comparison of Haar basis Estimator and Kernel Estimators. INRIA.
[7] Brunner, L.J. and Lo, A.Y. (2002). Bayesian classification. To appear.
[8] Cifarelli, D.M. and Melilli, E. (2000). Some new results for Dirichlet priors. Ann. Statist. 28, 1390-1413.
[9] Cifarelli, D. and Regazzini, E. (1978). Problemi Statistici Non Parametrici in Condizioni di Scambiabilità Parziale e Impiego di Medie Associative. Tech. rep., Quaderni Istituto Matematica Finanziaria dell'Università di Torino.
[10] Cifarelli, D.M. and Regazzini, E. (1990). Distribution functions of means of a Dirichlet process. Ann. Statist. 18, 429-442.
[11] Dahl, D.B. (2003). Modeling differential gene expression using a Dirichlet process mixture model. In Proceedings of the American Statistical Association, Bayesian Statistical Sciences Section. American Statistical Association, Alexandria, VA.
[12] Deshpande, A. and Ghosh, M.K. (2007). Risk Minimizing Option Pricing in a Regime Switching Market. Stochastic Analysis and Applications, Vol. 28, 2008. To appear.
[13] Donnet, S. and Samson, A. (2005). Parametric Estimation for Diffusion Processes from Discrete-time and Noisy Observations. Journal of the American Statistical Association, 92, 894-902.
[14] Doss, H. and Sellke, T. (1982). The tails of probabilities chosen from a Dirichlet prior. Ann. Statist. 10, 1302-1305.
[15] Emilion, R. (2001). Classification and mixtures of processes. SFC 2001 and C.R. Acad. Sci. Paris, série I 335, 189-193.
[16] Emilion, R. and Pasquignon, D. (2005). Random distributions in image analysis. Preprint.
[17] Emilion, R. (2005). Process of random distributions. Afrika Stat, vol. 1, 1, pp. 27-46, http://www.ufrsat.org/jas (contenus).
[18] Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209-230.
[19] Ferguson, T.S. (1974). Prior distributions on spaces of probability measures. Ann. Statist. 2, 615-629.
[20] Ferguson, T.S. (1983). Bayesian density estimation by mixtures of normal distributions. Recent Advances in Statistics (H. Rizvi and J. Rustagi, eds.), New York: Academic Press.
[21] Daume III, H. and Marcu, D. (2005). A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior. Journal of Machine Learning Research 6, 1551-1577.
[22] Huillet, T. and Paroissin, C. (2005). A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior. Preprint.
[23] In Dey, D., Muller, P. and Sinha, D. (eds.) (1969). Computational methods for mixture of Dirichlet process models. Practical Nonparametric and Semiparametric Bayesian Statistics, 23-44. Springer.
[24] Ishwaran, H. and James, L.F. (2002). Approximate Dirichlet processes computing in finite normal mixtures: smoothing and prior information. J. Comp. Graph. Stat. 11, 209-230.
[25] Ishwaran, H., James, L.F. and Sun, J. (2000). Bayesian Model Selection in Finite Mixtures by Marginal Density Decompositions. Conditionally accepted by Journal of the American Statistical Association.
[26] Ishwaran, H. and Zarepour, M. (2000). Markov Chain Monte Carlo in Approximate Dirichlet and Beta Two-Parameter Process Hierarchical Models. EJP 1, 1-28.
[27] Duan, J.A., Guindani, M. and Gelfand, A.E. (2007). Generalized spatial Dirichlet process models. Biometrika, 94, 4, pp. 809-825.
[28] Kottas, A., Duan, J.A. and Gelfand, A.E. (2007). Modeling Disease Incidence Data with Spatial and Spatio-Temporal Dirichlet Process Mixtures. Biometrical Journal, 5, 114, DOI: 10.1002/bimj.200610375.
[29] Kingman, J.F.C. and James, L.F. (2002). Approximate Dirichlet processes computing in finite normal.
[30] Kingman, J.F.C. (1993). Poisson Processes. Clarendon, Oxford University Press.
[31] Kingman, J.F.C. (1975). Random discrete distributions. J. Roy. Statist. Soc. B, 37, 1-22.
[32] Neal, R.M. (1996). Density Modeling and Clustering Using Dirichlet Diffusion Trees. Bayesian Statistics 7, pp. 619-629.
[33] Roeder, K. and Wasserman, L. (1997). Practical Bayesian Density Estimation Using Mixtures of Normals. Journal of the American Statistical Association, 92, 894-902.
[34] Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica 4, 639-650.
[35] Sethuraman, J. and West, M. (1994). Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association, 90, pp. 577-588.
[36] Sethuraman, J. and Tiwari, R.C. (1982). Convergence of Dirichlet Measures and the Interpretation of Their Parameters. Statistical Decision Theory and Related Topics III, 2, 305-315.
[37] Soule, A., Salamatian, K. and Emilion, R. (2004). Classification of histograms. Sigmetrics 2004, New York. http://www-rp.lip6.fr/site_npa/site_rp/publications.php
[38] Soule, A., Salamatian, K., Taft, N., Emilion, R. and Papagiannaki, K. (2004). Classification of Internet flows. Sigmetrics'04, New York.
[39] Roeder, K. and Wasserman, L. (1997). Practical Bayesian Density Estimation Using Mixtures of Normals. Journal of the American Statistical Association, 92, 894-902.
[40] MacLoskey, J.W. (1965). Model for the Distribution of Individuals by Species in an Environment. Unpublished Ph.D. thesis, Michigan State University.
[41] Muliere, P. and Tardella, L. (1998). Approximating Distributions of Random Functionals of Ferguson-Dirichlet Priors. Canadian Journal of Statistics, 26, 283-297.
[42] Johnson, N. and Kotz, S. (1978). Urn Models and Their Applications: an approach to Modern Discrete Probability Theory. Technometrics, Vol. 20, No. 4, Part 1 (Nov., 1978), p. 501.
[43] Damien, P., Wakefield, J.C. and Walker, S.G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models using auxiliary variables. Journal of the Royal Statistical Society Series B, 61, 331-344.
[44] Pitman, J. and Yor, M. (1996). Random discrete distributions derived from self-similar random sets. EJP 1, 1-28.
[45] Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from stable subordinators. Ann. Proba. 25, 2, 855-900.
[46] Pitman, J. (2003). Poisson-Kingman partitions. In: Science and Statistics: A Festschrift for Terry Speed. D.R. Goldstein, editor. Lecture Notes - Monograph Series 30, 1-34. Institute of Mathematical Statistics, Hayward, California.