PHYSICS 210A : STATISTICAL PHYSICS HW ASSIGNMENT #1 SOLUTIONS (1) Let p(x) = Pr[X = x] where X is a discrete random variable and both X and x are taken from an ‘alphabet’ X . Let p(x, y) be a normalized joint probability distribution on two Prandom variables X ∈ X and Y ∈ Y. The entropy of the joint distribution is S(X, Y ) = − x,y p(x, y) log p(x, y). The conditional probability p(y|x) for y given x is defined as P p(y|x) = p(x, y)/p(x), where p(x) = y p(x, y). (a) Show that the conditional entropy S(Y |X) = − P x,y p(x, y) log p(y|x) satisfies S(Y |X) = S(X, Y ) − S(X) . Thus, conditioning reduces the entropy, and the entropy of a pair of random variables is the sum of the entropy of one plus the conditional entropy of the other. (b) The mutual information I(X, Y ) is I(X, Y ) = X x,y p(x, y) p(x, y) log p(x) p(y) . Show that I(X, Y ) = S(X) + S(Y ) − S(X, Y ) . (c) Show that S(Y |X) ≤ S(Y ) and that the equality holds only when X and Y are independently distributed. Solution : (a) We have S(Y |X) = − =− since P y X x,y X p(x, y) p(x, y) log p(x) p(x, y) log p(x, y) + x,y X p(x, y) log p(x) x,y = S(X, Y ) − S(X) , p(x, y) = p(x). (b) Clearly I(X, Y ) = X x,y = X x,y p(x, y) log p(x, y) − log p(x) − log p(y) p(x, y) log p(x, y) − X x = S(X) + S(Y ) − S(X, Y ) . 1 p(x) log p(x) − X y p(y) log p(y) (c) Let’s introduce some standard notation1 . Denote the average of a function of a random variable as X Ef (X) = p(x)f (x) . x∈X Equivalently, we could use the familiar angular bracket notation for averages and write Ef (X) as f (X) . For averages over several random variables, we have Ef (X, Y ) = XX p(x, y) f (x, y) , x∈X y∈Y et cetera. Now here’s a useful fact. If f (x) is a convex function, then Ef (X) ≥ f (EX) . For continuous functions, f (x) is convex if f ′′ (x) ≥ 0 everywhere2 . If f (x) is convex on some interval [a, b], then for x1,2 ∈ [a, b] we must have f λ x1 + (1 − λ) x2 ≤ λ f (x1 ) + (1 − λ) f (x2 ) . This is easily generalized to f X n where P n pn X pn f (xn ) , pn xn ≤ n = 1, which proves our claim - a result known as Jensen’s theorem. Now log x is concave, hence − log x is convex, and therefore X p(X) p(Y ) p(x) p(y) = −E log p(x, y) log I(X, Y ) = − p(x, y) p(X, Y ) x,y X p(X) p(Y ) p(x) p(y) ≥ − log E p(x, y) · = − log p(X, Y ) p(x, y) x,y X X p(x) · p(y) = − log(1) = 0 . = − log x y So I(X, Y ) ≥ 0. Clearly I(X, Y ) = 0 when X and Y are independently distributed, i.e. when p(x, y) = p(x) p(y). Using the results from part (b), we then have I(X, Y ) = S(X) + S(Y ) − S(X, Y ) = S(Y ) − S(Y |X) ≥ 0 . 1 2 See e.g. T. M. Cover and J. A. Thomas, Elements of Information Theory , 2nd edition (Wiley, 2006). A concave function g(x) is one for which −g(x) is convex. 2 (2) Compute the information entropy in the Fall 2012 Physics 140A grade distribution. See http://www-physics.ucsd.edu/students/courses/fall2012/physics140a/index.html. Solution : P n Nn = 49 Nn −pn log2 pn A+ 2 0.188 A 10 0.468 A3 0.247 B+ 8 0.427 B 11 0.484 B2 0.188 C+ 7 0.401 C 1 0.115 C0 0 D 4 0.295 Table 1: F12 Physics 140A final grade distribution. Assuming the only possible grades are A+, A, A-, B+, B, B-, C+, C, C-, D, F (11 possibilities), then from the chart we produce the entries in Tab. 1. We then find S=− For maximum information, set pn = 11 X pn log2 pn = 2.93 bits n=1 1 11 for all n, whence Smax = log2 11 = 3.46 bits. (3) Study carefully Fall 2011 Physics 140A homework problem 2.2. Suppose I have three bags. Initially, bag #1 contains a quarter, bag #2 contains a dime, and bag #3 contains two nickels. At each time step, I choose two bags randomly and Prandomly exchange one coin from each bag. The time evolution satisfies Pi (t + 1) = j Yij Pj (t), where Yij = P (i , t + 1 | j , t) is the conditional probability that the system is in state i at time t + 1 given that it was in state j at time t. (a) How many configurations are there for this system? P (b) Construct the transition matrix Yij and verify that i Yij = 1. (c) Find the eigenvalues of Y (you may want to use something like Mathematica). (d) Find the equilibrium distribution Pieq . Solution : (a) There are seven possible configurations for this system, shown in Table 2 below. 3 F 1 0.115 bag 1 bag 2 bag 3 g 1 Q D NN 1 2 Q N DN 2 3 D Q NN 1 4 D N QN 2 5 N Q DN 2 6 N D QN 2 7 N N DQ 2 Table 2: Configurations and their degeneracies for problem 3. (b) The transition matrix is 0 1 3 1 3 Y = 0 0 1 3 0 1 6 1 3 0 0 1 6 0 1 6 0 0 1 6 1 6 0 1 3 0 1 6 1 6 0 1 6 1 3 1 6 0 1 3 1 3 1 3 0 1 6 1 6 0 0 1 3 1 6 1 6 1 6 0 1 6 1 6 1 6 0 1 6 1 6 1 6 1 3 (c) Interrogating Mathematica, I find the eigenvalues are λ1 = 1 , λ2 = − 23 , λ3 = 1 3 , λ4 = 1 3 , λ5 = λ6 = λ7 = 0 . (d) We may decompose Y into its left and right eigenvectors, writing Y = 7 X λa | Ra ih La | Yij = 7 X λa Ria Laj a=1 a=1 The full matrix of left (row) eigenvectors is 1 1 1 1 1 1 1 −2 1 2 −1 −1 1 0 −1 0 −1 0 0 0 1 L= 0 −1 0 1 −1 1 0 1 −1 1 −1 0 0 1 1 0 −1 −1 0 1 0 −1 −1 1 0 1 0 0 4 The corresponding matrix of right (column) eigenvectors is 2 −3 −6 0 4 1 4 3 0 −6 −4 −1 2 3 −6 0 4 −5 1 R= 4 −3 0 6 −4 −7 24 4 −3 0 −6 −4 5 4 3 0 6 −4 11 4 0 12 0 8 −4 −5 −7 1 −1 11 5 −4 Thus, we have RL = LR = I, i.e. R = L−1 , and Y = RΛL , with Λ = diag 1 , − 32 , 1 3 , 1 3 , 0, 0, 0 . The right eigenvector corresponding to the λ = 1 eigenvalue is the equilibrium distribution. We therefore read off the first column of the R matrix: 1 1 1 1 1 1 1 (P eq )t = 12 6 12 6 6 6 6 . Note that g Pieq = P i j gj , where gj is the degeneracy of state j (see Tab. 2). Why is this so? It is because our random choices guarantee that Yij gj = Yji no sum on repeated indices). Now Pgi for each i and j (i.e.P sum this equation on j, and use j Yji = 1. We obtain j Yij gj = gi , which says that the | g i is a right eigenvector of Y with eigenvalue 1. To P obtain the equilibrium probability distribution, we just have to normalize by dividing by j gj . (4) Study carefully Spring 2010 Physics 210A homework problem 1.4. Consider a modification of this problem, where the Hamiltonian describing two point particles is Ĥ = P2 p2 + 12 KX 2 + + 1 kx2 . 2M 2m 2 The particles undergo elastic collisions with a wall at x = 0, and with each other as well. The particle of mass m is constrained always to lie between the wall and the other particle, i.e. 0 ≤ x ≤ X. When the particles coincide, they undergo an elastic pcollision. To analyze this system, choose an energy scale E0 , a momentum scale P0 = M E0 , a length scale p X0 = E0 /K, and define the dimensionless variables r = m/M and s = k/K. Then in rescaled variables p̄2 + 1 sx̄2 . Ĥ = 21 P̄ 2 + 21 X̄ 2 + 2r 2 It is notationally convenient to drop the bars. 5 (a) The microcanonical distribution is ̺(P, p, X, x) = δ(E − Ĥ) Θ(x) Θ(X − x) Tr δ(E − Ĥ) Θ(x) Θ(X − x) Find the microcanonical average hXi. Compute the corresponding average if the particles are allowed to pass through one another. (b) Compute the microcanonical average rate γ at which the mass m particle undergoes collisions with the wall. (c) Write a computer program to simulate this system. Compare microcanonical and numerical averages for several sets of (r, s) values. Plot the Poincaré section of P versus X for those times that x = 0. Solution : (a) We’ll need some Laplace transforms here. The Laplace transform of a function K(E) is Z∞ b K(β) = dE K(E) e−βE . 0 The inverse Laplace transform is given by K(E) = c+i∞ Z dβ b K(β) eβE , 2πi c−i∞ where the integration contour, which is a line extending from β = c − i∞ to β = c + i∞, b lies to the right of any singularities of K(β) in the complex β-plane. For this problem, all we shall need is the following: K(E) = E t−1 Γ(t) b K(β) = β −t . ⇐⇒ Consider first the density of states D(E) = Tr ̺(P, p, X, x). The Laplace transform is Z∞ Z∞ ZX Z∞ 2 /2r 2 /2 2 /2 2 −βp −βP −βX b dp e dX e D(β) = dP e dx e−βsx /2 −∞ = 2π r 1/2 s −∞ tan−1 0 √ −2 s β . Here we appeal to 6.285.1 from Gradshteyn and Ryzhik: Z∞ 2 2 tan−1 µ dx erfc(x) e−µ x = √ , πµ 0 6 0 where erfc(x) = 1 − erf(x), with erf(x) = √2 π Zx 2 dt e−t . 0 One calls erf(x) the error function (or ’probability integral’), and erfc(x) the complementary error function. The next integral we need is b (β) = N Z∞ ZX Z∞ Z∞ 2 −βp2 /2r −βX 2 /2 −βP 2 /2 dp e dX X e dx e−βsx /2 dP e −∞ −∞ r 1/2 √ = 2 π 3/2 β −5/2 . s+1 0 0 Here we have used Gradshteyn and Ryzhik 6.287.2, Z∞ α 1 −µ2 x2 . = 2 1− p dx x erfc(αx) e 2µ µ2 + α2 0 Taking the inverse transforms, we find D(E) = 2π r 1/2 s Taking the ratio, √ s E tan−1 , √ r 1/2 3/2 N (E) = 2 2 π E . s+1 s 1/2 √2 E 1/2 · hXi = √ . s+1 tan−1 s Now suppose we ignore the constraint x ≤ X and let both X and x range freely over [0, ∞). We then obtain b 0 (β) = D Z∞ Z∞ Z∞ Z∞ 2 −βP 2 /2 −βp2 /2r −βX 2 /2 dP e dp e dX e dx e−βsx /2 −∞ −∞ = 2π 2 β −2 r 1/2 s 0 0 β −2 and b0 (β) = N Z∞ Z∞ Z∞ Z∞ 2 −βX 2 /2 −βp2 /2r −βP 2 /2 dx e−βsx /2 dX X e dp e dP e −∞ = (2π)3/2 π −∞ r 1/2 We then have D0 (E) = 2π 2 s β −5/2 . r 1/2 s 0 0 E , N0 (E) = 25/2 π 7 r 1/2 s E 3/2 , and therefore hXi0 = 23/2 E 1/2 . π Note that the results for hXi and hXi0 agree in the limit s → ∞, where the potential binding the x position to the origin is infinitely large. (b) We compute the average Θ(v) v δ(x − 0+ ) = N (E)/D(E). The numerator is now h i p N (E) = Tr Θ(p) δ(x − 0+ ) δ(E − H) . r The Laplace transform of N (E) is b (β) = 1 N r Z∞ Z∞ Z∞ 2 2π −βP 2 /2 −βp2 /2r dP e dp p e dX e−βX /2 = 2 . β −∞ 0 0 Thus, N (E) = 2πE and the average rate at which the lower particle bounces off x = 0 is √ 1 s N (E) √ . =√ · γ(E) = −1 D(E) r tan ( s) (c) Starting at a time t0 with X(t0 ) = X0 , P (t0 ) = P0 , x(t0 ) = x0 , and p(t0 ) = p0 , we integrate the equations of motion, Ẋ = P , Ṗ = −X, ẋ = p/r, ṗ = −sx to obtain p0 sin(νt) µ p(t) = p0 cos(νt) − µx0 sin(νt) , X(t) = X0 cos t + P0 sin t x(t) = x0 cos(νt) + P (t) = P0 cos t − X0 sin t p √ where µ ≡ sr and ν ≡ s/r. Two things can interrupt this continuous motion. First, we can have x(t∗ ) = 0, where the lower particle collides with the floor. Solving for t∗ , we have tan(νt) = − µx0 . p0 The smallest positive solution to this equation we call tb . The second thing that can happen is that the particles collide, meaning x(t) = X(t): X0 cos t + P0 sin t = x0 cos(νt) + p0 sin(νt) . µ We call the smallest positive solution to this equation tc . If tb < tc , then the collision with the floor (x = 0) happens first. We then set t0 to tb , and we take X0′ = X(tb ), P0′ = P (tb ), x′0 = 0, and p′0 = −p(tb ). 8 If tc < tb , then the two particles collide first. We then set t0 to tc , and we apply the elastic collision formulae: P0′ = 2 1−r P (tc ) + p(tc ) 1+r 1+r p′0 = 2r 1−r P (tc ) − p(tc ) , 1+r 1+r with X0′ = x′0 = X(tc ) = x(tc ). TO BE CONTINUED... 9