The maximum entropy framework
The maximum entropy principle — an example
Suppose we have a random variable $X$ with known states (the values of the observations, $x_1, \dots, x_n$) but unknown probabilities $p_1, \dots, p_n$, plus some extra constraints, e.g. $\langle X \rangle$ is known. Our task is to make a good guess for the probabilities.
Example: $X$ can take the values 1, 2 or 3 with unknown probabilities, and $\langle X \rangle = x$ is known. What is the “best guess” for the probabilities?
We need to find the maximum of $H(p_1, p_2, p_3)$ as a function of $p_1, p_2, p_3$, under two constraints:
$\langle X \rangle = 1p_1 + 2p_2 + 3p_3 = x$, and $p_1 + p_2 + p_3 = 1$.
\[
\begin{aligned}
0 &= d\left[ H(p_1, p_2, p_3) - \lambda \left( \sum_{i=1}^{3} i\,p_i - x \right) - \mu \left( \sum_{i=1}^{3} p_i - 1 \right) \right] \\
  &= d\left[ -\sum_{i=1}^{3} p_i \log p_i - \lambda \sum_{i=1}^{3} i\,p_i - \mu \sum_{i=1}^{3} p_i \right] \\
  &= \sum_{i=1}^{3} \left\{ -\log p_i - 1 - \lambda i - \mu \right\} dp_i .
\end{aligned}
\]
Each of the curly brackets has to be zero:
\[
-\log(p_i) - 1 - \lambda i - \mu = 0 , \qquad i = 1, 2, 3 ,
\]
which with the notation $\lambda_0 = \mu + 1$ gives
\[
p_i = e^{-\lambda_0 - \lambda i} .
\]
The constraint on the sum of probabilities:
\[
1 = \sum_{i=1}^{3} p_i = e^{-\lambda_0} \sum_{i=1}^{3} e^{-\lambda i}
\quad \Rightarrow \quad
e^{-\lambda_0} = \frac{1}{e^{-\lambda} + e^{-2\lambda} + e^{-3\lambda}} ,
\]
so
\[
p_i = \frac{e^{-\lambda i}}{e^{-\lambda} + e^{-2\lambda} + e^{-3\lambda}}
    = \frac{e^{\lambda(1-i)}}{1 + e^{-\lambda} + e^{-2\lambda}} .
\]
The other constraint, $\langle X \rangle = x$:
\[
x = \sum_{i=1}^{3} i\,p_i = \frac{1 + 2e^{-\lambda} + 3e^{-2\lambda}}{1 + e^{-\lambda} + e^{-2\lambda}} .
\tag{1}
\]
Multiplying equation (1) by the denominator gives a second-degree equation for $e^{-\lambda}$:
\[
(x-3)\,e^{-2\lambda} + (x-2)\,e^{-\lambda} + x - 1 = 0 ,
\]
with which $p_2$ becomes
\[
p_2 = \frac{-1 + \sqrt{4 - 3(x-2)^2}}{3} ,
\qquad
p_1 = \frac{3 - x - p_2}{2} ,
\qquad
p_3 = \frac{x - 1 - p_2}{2} .
\]
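As a quick numerical sanity check of these closed forms (not part of the original derivation), a short Python sketch can evaluate $p_1, p_2, p_3$ for a chosen mean, say $x = 2.4$, and verify the two constraints:

```python
import math

def maxent_three_state(x):
    """Maximum-entropy probabilities for X in {1, 2, 3} with <X> = x (1 < x < 3)."""
    p2 = (-1.0 + math.sqrt(4.0 - 3.0 * (x - 2.0) ** 2)) / 3.0
    p1 = (3.0 - x - p2) / 2.0
    p3 = (x - 1.0 - p2) / 2.0
    return p1, p2, p3

p1, p2, p3 = maxent_three_state(2.4)
print(p1, p2, p3)                    # the "best guess" distribution
print(p1 + p2 + p3)                  # should be 1
print(1 * p1 + 2 * p2 + 3 * p3)      # should reproduce x = 2.4
```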
Maximum entropy principle — general form
Suppose we have a random variable $X$ taking (known) values $x_1, \dots, x_n$ with unknown probabilities $p_1, \dots, p_n$. In addition, we have $m$ constraint functions $f_k(x)$ with $1 \le k \le m < n$, where
\[
\langle f_k(X) \rangle = F_k ,
\]
and the $F_k$ are fixed. Then the maximum entropy principle assigns the probabilities so as to maximise the information entropy of $X$ under the above constraints.
0 = d H(p1 , . . . , pn ) −
=
n
X
i=1
(
m
X
n
X
λk
i=1
k=1
− log(pi ) − 1 −
fk (xi )pi − Fk
m
X
!
)
− µ
|{z}
λ0 −1
n
X
i=1
!


pi − 1 
λk fk (xi ) − (λ0 − 1) dpi
k=1
Since this is zero for any $dp_i$, all $n$ braces have to be zero:
\[
p_i = \exp\left( -\lambda_0 - \sum_{k=1}^{m} \lambda_k f_k(x_i) \right) .
\tag{2}
\]
The sum of the probabilities gives
\[
1 = \sum_{i=1}^{n} p_i = e^{-\lambda_0} \sum_{i=1}^{n} \exp\left( -\sum_{k=1}^{m} \lambda_k f_k(x_i) \right) .
\]
Introducing the partition function
\[
Z(\lambda_1, \dots, \lambda_m) := \sum_{i=1}^{n} \exp\left( -\sum_{k=1}^{m} \lambda_k f_k(x_i) \right) ,
\tag{3}
\]
with this notation
\[
e^{-\lambda_0} = \frac{1}{Z(\lambda_1, \dots, \lambda_m)} ,
\qquad
\lambda_0 = \log Z(\lambda_1, \dots, \lambda_m) .
\tag{4}
\]
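To connect (3) and (4) back to the three-state example, where there is a single constraint $f_1(x) = x$ with multiplier $\lambda$, a minimal Python sketch (an illustration only; the trial value of $\lambda$ is arbitrary) is:

```python
import math

def partition_function(lam, states=(1, 2, 3)):
    """Z(lambda) = sum_i exp(-lambda * x_i) for the single constraint f(x) = x."""
    return sum(math.exp(-lam * x) for x in states)

lam = 0.7                                  # arbitrary trial multiplier
Z = partition_function(lam)
lam0 = math.log(Z)                         # equation (4): lambda_0 = log Z
probs = [math.exp(-lam0 - lam * x) for x in (1, 2, 3)]   # p_i = e^{-lambda_0 - lambda x_i}
print(Z, lam0, sum(probs))                 # sum(probs) should be 1
```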
The other constraints are
\[
F_k = \sum_{i=1}^{n} f_k(x_i)\, p_i
    = e^{-\lambda_0} \sum_{i=1}^{n} f_k(x_i) \exp\left( -\sum_{\ell=1}^{m} \lambda_\ell f_\ell(x_i) \right)
    = -\frac{1}{Z} \frac{\partial Z(\lambda_1, \dots, \lambda_m)}{\partial \lambda_k}
    = -\frac{\partial \log Z(\lambda_1, \dots, \lambda_m)}{\partial \lambda_k} .
\tag{5}
\]
With (4), the probabilities (2) can be written as
\[
p_i = \frac{1}{Z(\lambda_1, \dots, \lambda_m)} \exp\left( -\sum_{k=1}^{m} \lambda_k f_k(x_i) \right) .
\tag{6}
\]
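As an illustration of (3), (5), and (6) (not from the original notes), here is a small Python sketch for a generic discrete state space with two hypothetical constraint functions and arbitrary trial multipliers; it builds $Z$, the probabilities $p_i$, and checks numerically that $F_k = -\partial \log Z / \partial \lambda_k$:

```python
import math

# Hypothetical setup: states x_i and two constraint functions f_1, f_2.
states = [1.0, 2.0, 3.0, 4.0]
f = [lambda x: x, lambda x: x * x]       # f_1(x) = x, f_2(x) = x^2
lams = [0.3, 0.1]                        # trial multipliers lambda_1, lambda_2

def log_weights(lams):
    """-sum_k lambda_k f_k(x_i) for each state x_i."""
    return [-sum(l * fk(x) for l, fk in zip(lams, f)) for x in states]

def partition_function(lams):
    """Z(lambda_1, ..., lambda_m), equation (3)."""
    return sum(math.exp(w) for w in log_weights(lams))

Z = partition_function(lams)
probs = [math.exp(w) / Z for w in log_weights(lams)]              # equation (6)
F = [sum(p * fk(x) for p, x in zip(probs, states)) for fk in f]   # <f_k(X)>

# Check equation (5): F_k = -d log Z / d lambda_k (central finite differences).
eps = 1e-6
for k in range(len(lams)):
    lp = list(lams); lp[k] += eps
    lm = list(lams); lm[k] -= eps
    dlogZ = (math.log(partition_function(lp)) - math.log(partition_function(lm))) / (2 * eps)
    print(F[k], -dlogZ)    # the two numbers should agree closely

print(sum(probs))          # normalisation check: should be 1
```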
The value of the maximised information entropy:
\[
\begin{aligned}
S(F_1, \dots, F_m) = H(p_1, \dots, p_n)
  &= -\sum_{i=1}^{n} p_i \underbrace{\log(p_i)}_{\text{from (6)}}
   = -\sum_{i=1}^{n} p_i \left( -\lambda_0 - \sum_{k=1}^{m} \lambda_k f_k(x_i) \right) \\
  &= \lambda_0 + \sum_{k=1}^{m} \lambda_k \sum_{i=1}^{n} f_k(x_i)\, p_i
   = \log Z(\lambda_1, \dots, \lambda_m) + \sum_{k=1}^{m} \lambda_k F_k .
\end{aligned}
\tag{7}
\]
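Sticking with the same kind of toy setup (states, constraint functions, and multipliers chosen arbitrarily for illustration), a brief sketch confirming the identity (7) numerically:

```python
import math

states = [1.0, 2.0, 3.0, 4.0]
f = [lambda x: x, lambda x: x * x]       # two hypothetical constraint functions
lams = [0.3, 0.1]                        # arbitrary trial multipliers

weights = [math.exp(-sum(l * fk(x) for l, fk in zip(lams, f))) for x in states]
Z = sum(weights)
probs = [w / Z for w in weights]                                  # equation (6)
F = [sum(p * fk(x) for p, x in zip(probs, states)) for fk in f]   # <f_k(X)>

entropy = -sum(p * math.log(p) for p in probs)                # -sum_i p_i log p_i
rhs = math.log(Z) + sum(l * Fk for l, Fk in zip(lams, F))     # log Z + sum_k lambda_k F_k
print(entropy, rhs)   # the two values should coincide, illustrating (7)
```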
Now calculate the partial derivatives of $S$ with respect to the $F_k$, being careful about what is kept constant in the partial derivatives:
\[
\frac{\partial S}{\partial F_k}\bigg|_{\{F\}}
  = \sum_{\ell=1}^{m} \underbrace{\frac{\partial \log Z}{\partial \lambda_\ell}\bigg|_{\{\lambda\}}}_{-F_\ell}\,
    \frac{\partial \lambda_\ell}{\partial F_k}\bigg|_{\{F\}}
  + \sum_{\ell=1}^{m} \frac{\partial \lambda_\ell}{\partial F_k}\bigg|_{\{F\}}\, F_\ell
  + \lambda_k
  = \lambda_k .
\tag{8}
\]
By (5) the first sum equals $-\sum_\ell F_\ell\, \partial\lambda_\ell/\partial F_k$, which cancels the second sum and leaves $\partial S/\partial F_k = \lambda_k$.
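For the three-state example, (8) says that $dS/dx = \lambda$ (a single constraint with $F_1 = x$). A small sketch checking this by finite differences, using the closed forms derived earlier (illustrative only; the value of $x$ is arbitrary):

```python
import math

def probs(x):
    """Closed-form maximum-entropy probabilities for X in {1, 2, 3} with <X> = x."""
    p2 = (-1.0 + math.sqrt(4.0 - 3.0 * (x - 2.0) ** 2)) / 3.0
    return (3.0 - x - p2) / 2.0, p2, (x - 1.0 - p2) / 2.0

def entropy(x):
    return -sum(p * math.log(p) for p in probs(x))

def lam(x):
    """lambda recovered from e^{-lambda} = p_2 / p_1 (ratio of consecutive probabilities)."""
    p1, p2, _ = probs(x)
    return -math.log(p2 / p1)

x, eps = 2.4, 1e-6
dS_dx = (entropy(x + eps) - entropy(x - eps)) / (2 * eps)   # finite-difference dS/dx
print(dS_dx, lam(x))   # should agree, illustrating equation (8)
```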