(1)

advertisement
PHYSICS 210A : STATISTICAL PHYSICS
HW ASSIGNMENT #1 SOLUTIONS
(1) Let p(x) = Pr[X = x] where X is a discrete random variable and both X and x are
taken from an ‘alphabet’ X . Let p(x, y) be a normalized joint probability distribution on
two
Prandom variables X ∈ X and Y ∈ Y. The entropy of the joint distribution is S(X, Y ) =
− x,y p(x, y) log p(x, y). The conditional
probability p(y|x) for y given x is defined as
P
p(y|x) = p(x, y)/p(x), where p(x) = y p(x, y).
(a) Show that the conditional entropy S(Y |X) = −
P
x,y
p(x, y) log p(y|x) satisfies
S(Y |X) = S(X, Y ) − S(X) .
Thus, conditioning reduces the entropy, and the entropy of a pair of random variables is the sum of the entropy of one plus the conditional entropy of the other.
(b) The mutual information I(X, Y ) is
I(X, Y ) =
X
x,y
p(x, y)
p(x, y) log
p(x) p(y)
.
Show that
I(X, Y ) = S(X) + S(Y ) − S(X, Y ) .
(c) Show that S(Y |X) ≤ S(Y ) and that the equality holds only when X and Y are independently distributed.
Solution :
(a) We have
S(Y |X) = −
=−
since
P
y
X
x,y
X
p(x, y)
p(x, y) log
p(x)
p(x, y) log p(x, y) +
x,y
X
p(x, y) log p(x)
x,y
= S(X, Y ) − S(X) ,
p(x, y) = p(x).
(b) Clearly
I(X, Y ) =
X
x,y
=
X
x,y
p(x, y) log p(x, y) − log p(x) − log p(y)
p(x, y) log p(x, y) −
X
x
= S(X) + S(Y ) − S(X, Y ) .
1
p(x) log p(x) −
X
y
p(y) log p(y)
(c) Let’s introduce some standard notation1 . Denote the average of a function of a random
variable as
X
Ef (X) =
p(x)f (x) .
x∈X
Equivalently,
we could use the familiar angular bracket notation for averages and write
Ef (X) as f (X) . For averages over several random variables, we have
Ef (X, Y ) =
XX
p(x, y) f (x, y) ,
x∈X y∈Y
et cetera. Now here’s a useful fact. If f (x) is a convex function, then
Ef (X) ≥ f (EX) .
For continuous functions, f (x) is convex if f ′′ (x) ≥ 0 everywhere2 . If f (x) is convex on
some interval [a, b], then for x1,2 ∈ [a, b] we must have
f λ x1 + (1 − λ) x2 ≤ λ f (x1 ) + (1 − λ) f (x2 ) .
This is easily generalized to
f
X
n
where
P
n pn
X
pn f (xn ) ,
pn xn ≤
n
= 1, which proves our claim - a result known as Jensen’s theorem.
Now log x is concave, hence − log x is convex, and therefore
X
p(X) p(Y )
p(x) p(y)
= −E log
p(x, y) log
I(X, Y ) = −
p(x, y)
p(X, Y )
x,y
X
p(X) p(Y )
p(x) p(y)
≥ − log E
p(x, y) ·
= − log
p(X, Y )
p(x, y)
x,y
X
X
p(x) ·
p(y) = − log(1) = 0 .
= − log
x
y
So I(X, Y ) ≥ 0. Clearly I(X, Y ) = 0 when X and Y are independently distributed, i.e.
when p(x, y) = p(x) p(y). Using the results from part (b), we then have
I(X, Y ) = S(X) + S(Y ) − S(X, Y )
= S(Y ) − S(Y |X) ≥ 0 .
1
2
See e.g. T. M. Cover and J. A. Thomas, Elements of Information Theory , 2nd edition (Wiley, 2006).
A concave function g(x) is one for which −g(x) is convex.
2
(2) Compute the information entropy in the Fall 2012 Physics 140A grade distribution. See
http://www-physics.ucsd.edu/students/courses/fall2012/physics140a/index.html.
Solution :
P
n Nn
= 49
Nn
−pn log2 pn
A+
2
0.188
A
10
0.468
A3
0.247
B+
8
0.427
B
11
0.484
B2
0.188
C+
7
0.401
C
1
0.115
C0
0
D
4
0.295
Table 1: F12 Physics 140A final grade distribution.
Assuming the only possible grades are A+, A, A-, B+, B, B-, C+, C, C-, D, F (11 possibilities),
then from the chart we produce the entries in Tab. 1. We then find
S=−
For maximum information, set pn =
11
X
pn log2 pn = 2.93 bits
n=1
1
11
for all n, whence Smax = log2 11 = 3.46 bits.
(3) Study carefully Fall 2011 Physics 140A homework problem 2.2. Suppose I have three
bags. Initially, bag #1 contains a quarter, bag #2 contains a dime, and bag #3 contains
two nickels. At each time step, I choose two bags randomly and
Prandomly exchange one
coin from each bag. The time evolution satisfies Pi (t + 1) =
j Yij Pj (t), where Yij =
P (i , t + 1 | j , t) is the conditional probability that the system is in state i at time t + 1 given
that it was in state j at time t.
(a) How many configurations are there for this system?
P
(b) Construct the transition matrix Yij and verify that i Yij = 1.
(c) Find the eigenvalues of Y (you may want to use something like Mathematica).
(d) Find the equilibrium distribution Pieq .
Solution :
(a) There are seven possible configurations for this system, shown in Table 2 below.
3
F
1
0.115
bag 1
bag 2
bag 3
g
1
Q
D
NN
1
2
Q
N
DN
2
3
D
Q
NN
1
4
D
N
QN
2
5
N
Q
DN
2
6
N
D
QN
2
7
N
N
DQ
2
Table 2: Configurations and their degeneracies for problem 3.
(b) The transition matrix is

0


1
3


1
3



Y = 0



0


1
3


0
1
6
1
3
0
0
1
6
0
1
6
0
0
1
6

1
6
0
1
3
0
1
6
1
6
0
1
6
1
3
1
6
0
1
3
1
3
1
3
0
1
6
1
6
0
0
1
3
1
6
1
6
1
6
0
1
6
1
6
1
6




0



1
6


1

6

1

6

1
3
(c) Interrogating Mathematica, I find the eigenvalues are
λ1 = 1 ,
λ2 = − 23
,
λ3 =
1
3
,
λ4 =
1
3
,
λ5 = λ6 = λ7 = 0 .
(d) We may decompose Y into its left and right eigenvectors, writing
Y =
7
X
λa | Ra ih La |
Yij =
7
X
λa Ria Laj
a=1
a=1
The full matrix of left (row) eigenvectors is


1
1
1
1
1 1 1
−2 1
2 −1 −1 1 0


−1 0 −1 0
0 0 1



L=
0
−1
0
1
−1
1
0


 1 −1 1 −1 0 0 1


1
0 −1 −1 0 1 0
−1 −1 1
0
1 0 0
4
The corresponding matrix of right (column) eigenvectors is

2 −3 −6 0
4
1
4 3
0
−6
−4
−1

2 3 −6 0
4 −5
1 

R=
4 −3 0
6 −4 −7
24 
4 −3 0 −6 −4 5

4 3
0
6 −4 11
4 0 12 0
8 −4

−5
−7

1

−1

11 

5
−4
Thus, we have RL = LR = I, i.e. R = L−1 , and
Y = RΛL ,
with Λ = diag 1 , − 32 ,
1
3
,
1
3
, 0, 0, 0 .
The right eigenvector corresponding to the λ = 1 eigenvalue is the equilibrium distribution. We therefore read off the first column of the R matrix:
1
1
1
1
1
1
1
(P eq )t = 12
6
12
6
6
6
6 .
Note that
g
Pieq = P i
j gj
,
where gj is the degeneracy of state j (see Tab. 2). Why is this so? It is because our random
choices guarantee that Yij gj = Yji
no sum on repeated indices). Now
Pgi for each i and j (i.e.P
sum this equation on j, and use j Yji = 1. We obtain j Yij gj = gi , which says that the
| g i is a right eigenvector of Y with eigenvalue 1. To P
obtain the equilibrium probability
distribution, we just have to normalize by dividing by j gj .
(4) Study carefully Spring 2010 Physics 210A homework problem 1.4. Consider a modification of this problem, where the Hamiltonian describing two point particles is
Ĥ =
P2
p2
+ 12 KX 2 +
+ 1 kx2 .
2M
2m 2
The particles undergo elastic collisions with a wall at x = 0, and with each other as well.
The particle of mass m is constrained always to lie between the wall and the other particle,
i.e. 0 ≤ x ≤ X. When the particles coincide, they undergo an elastic
pcollision. To analyze
this system, choose an energy scale E0 , a momentum scale P0 = M E0 , a length scale
p
X0 = E0 /K, and define the dimensionless variables r = m/M and s = k/K. Then in
rescaled variables
p̄2
+ 1 sx̄2 .
Ĥ = 21 P̄ 2 + 21 X̄ 2 +
2r 2
It is notationally convenient to drop the bars.
5
(a) The microcanonical distribution is
̺(P, p, X, x) =
δ(E − Ĥ) Θ(x) Θ(X − x)
Tr δ(E − Ĥ) Θ(x) Θ(X − x)
Find the microcanonical average hXi. Compute the corresponding average if the
particles are allowed to pass through one another.
(b) Compute the microcanonical average rate γ at which the mass m particle undergoes
collisions with the wall.
(c) Write a computer program to simulate this system. Compare microcanonical and
numerical averages for several sets of (r, s) values. Plot the Poincaré section of P
versus X for those times that x = 0.
Solution :
(a) We’ll need some Laplace transforms here. The Laplace transform of a function K(E) is
Z∞
b
K(β)
= dE K(E) e−βE .
0
The inverse Laplace transform is given by
K(E) =
c+i∞
Z
dβ b
K(β) eβE ,
2πi
c−i∞
where the integration contour, which is a line extending from β = c − i∞ to β = c + i∞,
b
lies to the right of any singularities of K(β)
in the complex β-plane. For this problem, all
we shall need is the following:
K(E) =
E t−1
Γ(t)
b
K(β)
= β −t .
⇐⇒
Consider first the density of states D(E) = Tr ̺(P, p, X, x). The Laplace transform is
Z∞
Z∞
ZX
Z∞
2 /2r
2 /2
2 /2
2
−βp
−βP
−βX
b
dp e
dX e
D(β)
= dP e
dx e−βsx /2
−∞
= 2π
r 1/2
s
−∞
tan−1
0
√ −2
s β .
Here we appeal to 6.285.1 from Gradshteyn and Ryzhik:
Z∞
2 2
tan−1 µ
dx erfc(x) e−µ x = √
,
πµ
0
6
0
where erfc(x) = 1 − erf(x), with
erf(x) =
√2
π
Zx
2
dt e−t .
0
One calls erf(x) the error function (or ’probability integral’), and erfc(x) the complementary
error function. The next integral we need is
b (β) =
N
Z∞
ZX
Z∞
Z∞
2
−βp2 /2r
−βX 2 /2
−βP 2 /2
dp e
dX X e
dx e−βsx /2
dP e
−∞
−∞
r 1/2
√
= 2 π 3/2
β −5/2 .
s+1
0
0
Here we have used Gradshteyn and Ryzhik 6.287.2,
Z∞
α
1
−µ2 x2
.
= 2 1− p
dx x erfc(αx) e
2µ
µ2 + α2
0
Taking the inverse transforms, we find
D(E) = 2π
r 1/2
s
Taking the ratio,
√ s E
tan−1
,
√ r 1/2 3/2
N (E) = 2 2 π
E
.
s+1
s 1/2 √2 E 1/2
·
hXi =
√ .
s+1
tan−1 s
Now suppose we ignore the constraint x ≤ X and let both X and x range freely over
[0, ∞). We then obtain
b 0 (β) =
D
Z∞
Z∞
Z∞
Z∞
2
−βP 2 /2
−βp2 /2r
−βX 2 /2
dP e
dp e
dX e
dx e−βsx /2
−∞
−∞
= 2π 2 β −2
r 1/2
s
0
0
β −2
and
b0 (β) =
N
Z∞
Z∞
Z∞
Z∞
2
−βX 2 /2
−βp2 /2r
−βP 2 /2
dx e−βsx /2
dX X e
dp e
dP e
−∞
= (2π)3/2 π
−∞
r 1/2
We then have
D0 (E) = 2π 2
s
β −5/2 .
r 1/2
s
0
0
E
,
N0 (E) = 25/2 π
7
r 1/2
s
E 3/2 ,
and therefore
hXi0 =
23/2 E 1/2
.
π
Note that the results for hXi and hXi0 agree in the limit s → ∞, where the potential
binding the x position to the origin is infinitely large.
(b) We compute the average Θ(v) v δ(x − 0+ ) = N (E)/D(E). The numerator is now
h
i
p
N (E) = Tr Θ(p) δ(x − 0+ ) δ(E − H) .
r
The Laplace transform of N (E) is
b (β) = 1
N
r
Z∞
Z∞
Z∞
2
2π
−βP 2 /2
−βp2 /2r
dP e
dp p e
dX e−βX /2 = 2 .
β
−∞
0
0
Thus, N (E) = 2πE and the average rate at which the lower particle bounces off x = 0 is
√
1
s
N (E)
√ .
=√ ·
γ(E) =
−1
D(E)
r tan ( s)
(c) Starting at a time t0 with X(t0 ) = X0 , P (t0 ) = P0 , x(t0 ) = x0 , and p(t0 ) = p0 , we
integrate the equations of motion, Ẋ = P , Ṗ = −X, ẋ = p/r, ṗ = −sx to obtain
p0
sin(νt)
µ
p(t) = p0 cos(νt) − µx0 sin(νt) ,
X(t) = X0 cos t + P0 sin t
x(t) = x0 cos(νt) +
P (t) = P0 cos t − X0 sin t
p
√
where µ ≡ sr and ν ≡ s/r. Two things can interrupt this continuous motion. First, we
can have x(t∗ ) = 0, where the lower particle collides with the floor. Solving for t∗ , we have
tan(νt) = −
µx0
.
p0
The smallest positive solution to this equation we call tb . The second thing that can happen
is that the particles collide, meaning x(t) = X(t):
X0 cos t + P0 sin t = x0 cos(νt) +
p0
sin(νt) .
µ
We call the smallest positive solution to this equation tc .
If tb < tc , then the collision with the floor (x = 0) happens first. We then set t0 to tb , and
we take X0′ = X(tb ), P0′ = P (tb ), x′0 = 0, and p′0 = −p(tb ).
8
If tc < tb , then the two particles collide first. We then set t0 to tc , and we apply the elastic
collision formulae:
P0′ =
2
1−r
P (tc ) +
p(tc )
1+r
1+r
p′0 =
2r
1−r
P (tc ) −
p(tc ) ,
1+r
1+r
with X0′ = x′0 = X(tc ) = x(tc ).
TO BE CONTINUED...
9
Download