Probability Distributions and Statistics
The Maxent Principle

All macroscopic systems are far too complex to be fully specified,* but usually we can expect the system to have a few well defined average properties. Statistical mechanics can be characterized as the art of averaging microscopic properties to obtain quantities observed in the macroscopic world. Probability distributions $\{p_i\}$ are used to effect these averages. Here we show how to find the probability $p_i$ that a system is in state $i$ from the information entropy expression, $S = -k \sum_i p_i \ln p_i$.

* It is impossible even theoretically to fully specify positions and momenta due to the Heisenberg uncertainty principle.

Averages

Consider a system with possible states $i = 1, 2, \ldots, N$ known to have a quantity $E_i$ associated with each state that contributes to a system average $\langle E \rangle$. We want to show that this average is given by the expression

    $\langle E \rangle = \sum_i p_i E_i$    (1)

Suppose there are $G_1$ occurrences of $E_1$, $G_2$ occurrences of $E_2$, and so on. Then the average is

    $\langle E \rangle = \dfrac{G_1 E_1 + G_2 E_2 + \cdots + G_N E_N}{G}$

where $G = G_1 + G_2 + \cdots + G_N$. However, we assign

    $p_i = \dfrac{G_i}{G}$

and thereby obtain Eq.(1).

1. A two-state system has energy levels of 1.0 eV and 2.0 eV. The probability that the system is in the lower energy state is 3/4. Find the average energy of the system.

The Maxent Principle

The best "guess" or statistical inference that we can make about a system is to require that we (i) adhere to the known facts and (ii) include no spurious information. (One author calls this "the truth, and nothing but the truth.")

The known facts are usually averages expressed by constraint equations like Eq.(1). We ensure that no spurious information is included by maximizing the missing information $S$. This is the Maxent or maximum entropy principle. Symbolically, the best statistical inference follows from

    $dS(p_1, p_2, \ldots, p_N) = 0$, constrained by the known averages and $\sum_{i=1}^{N} p_i = 1$    (2)

2. Find the best statistical inference for a system where we know only that the system must be in one of two states (that is, $p_1 + p_2 = 1$). [ans. 1/2, 1/2]

3. (a) Find the best statistical inference for a system that we know has a well defined average energy but at any moment it must be in one of two energy states (that is, $p_1 + p_2 = 1$ and $p_1 E_1 + p_2 E_2 = \langle E \rangle$). (b) Consider a case where $E_1 = -1$ and $E_2 = +1$ and find expressions for $p_1$ and $p_2$. Use a computer to find the appropriate undetermined multiplier given that $\langle E \rangle = -0.7616$ and use it to evaluate $p_1$. [ans. 0.88]

4. Find the best statistical inference for a system that we know has a well defined standard deviation in energy but at any moment must be in one of two energy states (that is, $p_1 + p_2 = 1$ and $p_1 (E_1 - \langle E \rangle)^2 + p_2 (E_2 - \langle E \rangle)^2 = \sigma^2$).

The last three exercises suggest three widely used probability distributions: the equiprobable distribution, the canonical distribution, and the normal distribution.

Equiprobable Distribution

Consider the case where there are $N$ alternatives with respective probabilities $p_1, p_2, \ldots, p_N$. The only thing we know is that the system must be in one of these states, so

    $\sum_i p_i = 1$    (3)

Now insist that the missing information is maximized under this condition. We have

    $S' = S + k(\lambda + 1)\left(\sum_i p_i - 1\right)$

with

    $S = -k \sum_i p_i \ln p_i$

The peculiar choice of multiplier $k(\lambda + 1)$ is the result of hindsight; it just turns out neater this way.
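Before carrying out this maximization analytically, the Maxent prescription of Eq.(2) can be checked numerically. The following is a minimal sketch, assuming Python with NumPy and SciPy is available (the number of states N is an arbitrary choice); it maximizes $S$ subject only to the normalization constraint (3) and should recover the equiprobable result derived next. A second equality constraint such as $\sum_i p_i E_i = \langle E \rangle$ can be added in the same way to treat exercises 3 and 4 numerically.

```python
import numpy as np
from scipy.optimize import minimize

# Maximize S = -sum(p ln p) (taking k = 1) by minimizing its negative,
# subject to the normalization constraint of Eq.(3).
def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)          # guard against log(0)
    return np.sum(p * np.log(p))

N = 4                                    # number of states (arbitrary, for illustration)
constraints = [{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}]
p_start = np.random.dirichlet(np.ones(N))   # any normalized starting guess

result = minimize(neg_entropy, p_start, bounds=[(0.0, 1.0)] * N,
                  constraints=constraints)
print(result.x)                          # -> approximately [0.25, 0.25, 0.25, 0.25]
```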
Maximizing $S'$ with respect to an arbitrary $p_j$ gives

    $\dfrac{\partial S'}{\partial p_j} = -k \ln p_j - k + k(\lambda + 1) = 0 \quad\Rightarrow\quad p_j = e^{\lambda}$

The result is the same for all $p_j$, so substituting into the constraint equation (3) gives the equiprobable distribution,

    $p_j = \dfrac{1}{N}$ for $N$ states    (4)

5. Derive the equiprobable distribution using information theory.

The result justifies the claim that, in the absence of any information, equiprobable states are the best inference. The equiprobable distribution applies when a system does not exchange energy or particles with the environment. This is referred to as the microcanonical distribution.

Distribution with a Known Mean

By far the most important probability distribution in statistical mechanics is the canonical distribution, in which the system possesses a well defined average energy while it continually exchanges some energy with its surroundings. The canonical distribution is a special case of a distribution with one known mean. The basic problem is to find the distribution of $p$'s that maximizes the missing information $S$ subject to the constraints

    $\sum_i p_i = 1$    (5a)

    $\sum_i p_i E_i = \langle E \rangle$    (5b)

For convenience, we use undetermined multipliers $k(\lambda + 1)$ and $-k\beta$ and write

    $S' = S + k(\lambda + 1)\left(\sum_i p_i - 1\right) - k\beta\left(\sum_i p_i E_i - \langle E \rangle\right)$

Maximizing $S'$ gives

    $p_j = \exp(\lambda - \beta E_j)$    (6)

In principle, the Lagrange multipliers can be determined from the two constraint equations (5). From (5a) we find

    $1 = e^{\lambda} \sum_i e^{-\beta E_i}$

and Eq.(6) becomes

    $p_j = \dfrac{e^{-\beta E_j}}{Z}$    (7)

with

    $Z = \sum_i e^{-\beta E_i}$    (8)

The quantity $Z$ plays a very prominent role in statistical mechanics, where it is called the partition function. (Note that we are leaving $\beta$ unspecified.)

6. Use the information approach to derive the probability distribution for one known mean quantity $\langle E \rangle$.

Identify β for Thermodynamics

The canonical distribution, Eq.(7), connects with thermodynamics only when we identify $k$ as Boltzmann's constant and the Lagrange multiplier $\beta$ as $1/kT$. We saw that $k$ had to be Boltzmann's constant to agree with thermodynamics. The identification of $\beta$ can be seen in two steps: (i) evaluate the entropy $S$ with the canonical distribution and (ii) demand that the result for $dS$ is equivalent to the thermodynamic relation

    $dU = T\,dS + dW$    (9)

where $U$ is the more usual notation for $\langle E \rangle$ in thermodynamics and $dW$ is the work done on the system. The algebra is made simple by defining a quantity $F$ such that $\exp(-\beta F) = Z$, so that

    $S = -k \sum_i p_i \ln p_i = k\beta \sum_i p_i (E_i - F) = k\beta\,(U - F)$

Assume constant temperature and write the differential $dS$:

    $dS = k\beta\,(dU - dF)$

Comparing the latter with Eq.(9) determines that $\beta = 1/kT$, as required, and, incidentally, $dF$ is seen as a form of work.

7. Repeat the development given above to identify $\beta$ for a system with only two levels, $E_1$ and $E_2$.

On Continuous Distributions

Until now we have considered discrete sets of probabilities. Here we discuss how to accommodate a continuous range of possibilities. Suppose that a system state has an associated continuous value $x$. Since there is an infinite number of possible values or outcomes, the probability of any one exactly specified outcome is zero. It makes more sense to speak of the system being in a neighborhood $dx$ around $x$. In particular, we define a probability density $\rho(x)$ such that

    $\rho(x)\,dx$ = probability of being in the neighborhood $dx$ around $x$

The continuous version of the normalization condition (5a) becomes

    $\int \rho(x)\,dx = 1$    (10)

and an average over $x$ analogous to Eq.(5b) is written

    $\int x\,\rho(x)\,dx = \langle x \rangle$    (11)

The last two equations are the constraints for a continuum version of the canonical distribution. Entropy becomes

    $S = -k \int \rho(x) \ln\!\left[x_0\,\rho(x)\right] dx$    (12)

where $x_0$ is the hypothetical smallest measurable increment of $x$. It is included for dimensional consistency, but does not enter any results.
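These continuum expressions can be checked numerically. Here is a minimal sketch, assuming Python with NumPy and SciPy (the values of the mean and of $x_0$ are arbitrary choices). It uses the continuum analogue of the canonical distribution for a variable confined to $x \ge 0$ with a known mean, $\rho(x) = e^{-x/\langle x \rangle}/\langle x \rangle$, and evaluates Eqs.(10), (11), and (12) by quadrature.

```python
import numpy as np
from scipy.integrate import quad

x_mean, x0 = 2.0, 1.0                         # hypothetical mean and smallest increment

def rho(x):
    # Continuum canonical (maxent) density on [0, inf) with fixed mean x_mean
    return np.exp(-x / x_mean) / x_mean

def entropy_integrand(x):
    r = rho(x)
    return -r * np.log(x0 * r) if r > 0.0 else 0.0   # integrand of Eq.(12), with k = 1

norm, _ = quad(rho, 0, np.inf)                       # Eq.(10): should equal 1
mean, _ = quad(lambda x: x * rho(x), 0, np.inf)      # Eq.(11): should equal x_mean
S, _ = quad(entropy_integrand, 0, np.inf)            # Eq.(12): analytically 1 + ln(x_mean/x0)

print(norm, mean, S)                                 # -> about 1.0, 2.0, 1.693
```

The printed entropy also illustrates the remark above: changing $x_0$ only shifts $S$ by the constant $-\ln x_0$, which drops out of any entropy difference.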
In the following section we use a continuum analysis to derive the ubiquitous normal distribution.

Normal Distribution

Information theory produces the normal distribution for a continuous system with a well defined standard deviation $\sigma$ about an average value $\langle x \rangle$. The standard deviation is given by

    $\sigma^2 = \int (x - \langle x \rangle)^2\,\rho(x)\,dx$    (13)

This is a measure of the average spread from the mean. We construct $S'$ from Eq.(12) and the constraint equations (10) and (13):

    $S' = S + k(\lambda + 1)\left(\int \rho(x)\,dx - 1\right) - k\gamma\left(\int (x - \langle x \rangle)^2\,\rho(x)\,dx - \sigma^2\right)$

Maximizing with respect to $\rho$ gives

    $\rho(x) = \dfrac{e^{\lambda}}{x_0}\,e^{-\gamma (x - \langle x \rangle)^2} = \mathrm{const} \times e^{-\gamma (x - \langle x \rangle)^2}$

The constants are determined from Eqs.(10) and (13). These give the normal or Gaussian distribution:

    $\rho(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - \langle x \rangle)^2 / 2\sigma^2}$    (14)

8. Use the information approach to derive the normal probability distribution.

9. A cohort of U.S. males has an average height of 5'10" with a standard deviation of 3". Find the percentage of these men with heights between 5'7" and 5'9". (Use a table of the normal distribution.)

10. A sample of the population has an average pulse rate of 70 beats/min with a standard deviation of 10 beats/min. Find the percent of the population with pulse rate (a) above 80 beats/min, (b) between 75 and 80 beats/min. (Use a table of the normal distribution.)
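In place of a printed table, the areas under Eq.(14) needed in exercises 9 and 10 can be computed directly. A minimal sketch, assuming Python with SciPy is available (the numbers correspond to exercise 9 and exercise 10(a)):

```python
from scipy.stats import norm

# Exercise 9: heights with mean 70 in (5'10") and standard deviation 3 in.
# Fraction between 5'7" (67 in) and 5'9" (69 in):
frac_heights = norm.cdf(69, loc=70, scale=3) - norm.cdf(67, loc=70, scale=3)
print(f"heights between 67 and 69 in: {100 * frac_heights:.1f}%")   # about 21%

# Exercise 10(a): pulse rates with mean 70 and standard deviation 10 beats/min.
# Fraction above 80 beats/min (one standard deviation above the mean):
frac_pulse = norm.sf(80, loc=70, scale=10)      # sf(x) = 1 - cdf(x)
print(f"pulse above 80 beats/min: {100 * frac_pulse:.1f}%")          # about 16%
```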