Module 3 – Statistics I

I. Basic Statistics

A. Statistics and Physics

1. Why Statistics

Up to this point, your courses in physics and engineering have considered systems from a macroscopic point of view. For instance, we have described baseballs, blocks, airplanes, etc. as rigid bodies. In our discussion of the kinetic theory of gases, we demonstrated that the relationship among the macroscopic properties of a gas (temperature, pressure, and volume) follows from the microscopic motion of the molecules that compose the gas. Since all matter is composed of atoms, we should expect this approach to be universal and to provide a deeper understanding than studying macroscopic properties (averages of microscopic properties) alone! This field of physics is called Statistical Mechanics because it combines mechanics (classical or quantum) with statistics.

In statistical mechanics and quantum mechanics, we calculate the probability that a particle has some physical attribute. The attribute might be energy, linear momentum, position, etc. In the discussion below, we take the attribute to be position, but any attribute could be substituted. We choose position to simplify the text and because position is easy to visualize: humans are equipped with position detectors called eyes.

2. Probability Distribution Function - Φ(x)

The probability distribution function Φ(x) gives the probability that an object is located between x and x + dx: that probability is Φ(x) dx. It is defined as the change in probability divided by the change in x (a probability per unit x), and it is most useful in problems involving continuous physical quantities.

$$\Phi(x) = \frac{\Delta P(x)}{\Delta x} \qquad \text{(discrete variable } x\text{)}$$

$$\Phi(x) = \frac{dP(x)}{dx} \qquad \text{(continuous variable } x\text{)}$$

3. Calculating Probabilities Using the Probability Distribution Function

Given the probability distribution function, we can calculate the probability that a particle is located in the region between $x_i$ and $x_f$ by

$$P(x_i \le x \le x_f) = \sum_{k=i}^{f} \Phi(x_k)\,\Delta x_k = \sum_{k=i}^{f} \Delta P(x_k) \qquad \text{(discrete variable } x\text{)}$$

$$P(x_i \le x \le x_f) = \int_{x_i}^{x_f} \Phi(x)\,dx \qquad \text{(continuous variable } x\text{)}$$

4. Normalization

If we search all possible locations, we will have a 100% probability of finding the particle. Therefore, the sum (or integral) of the probability distribution function over all possible values of x must equal one!!

$$P(-\infty < x < \infty) = \sum_{k} \Delta P(x_k) = 1 \qquad \text{(discrete variable } x\text{)}$$

$$P(-\infty < x < \infty) = \int_{-\infty}^{\infty} \Phi(x)\,dx = 1 \qquad \text{(continuous variable } x\text{)}$$

If a function Φ(x) doesn't have this property, it is said to be unnormalized and can NOT be a probability distribution function! To create a probability distribution function from it, we divide the function by the value of its sum (or integral) over all x:

$$\Phi_{\text{norm}}(x) = \frac{\Phi(x)}{\int_{-\infty}^{\infty} \Phi(x')\,dx'}$$

This process is called normalization!

5. Calculating the Average (Expectation) Value - $\langle x \rangle$ or $\bar{x}$

The average location of the object can be calculated using

$$\langle x \rangle = \sum_{k} x_k\,\Delta P(x_k) \qquad \text{(discrete)}$$

$$\langle x \rangle = \int_{-\infty}^{\infty} x\,\Phi(x)\,dx \qquad \text{(continuous)}$$

In physics, the average value of a physical quantity is usually called its "expectation value." This is because it is the value expected on average from many measurements of the quantity, even though no single measurement may give this value. Consider a class in which half the students score 100 and the other half score 0 on a test. The class average (expectation value) is 50% even though no individual student had this result! Mathematicians call this type of average value the "mean."

6. Standard Deviation (σ) and Variance (σ²)

Although the expectation value is important, it doesn't completely specify how a system behaves. For example, the average voltage out of a wall plug is zero (there is no DC voltage). However, its standard deviation, which for this zero-average signal equals the rms voltage, is about 110 to 120 volts!! Obviously, the standard deviation is important, since this is what makes your TV, radio, and other appliances work!
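The wall-plug example can be checked numerically with the discrete formulas of Sections 3 through 5. The Python sketch below is only an illustration under stated assumptions: it models the outlet as an ideal sinusoid whose peak is 110√2 V (chosen so that its rms value is 110 V), samples one period at equally spaced times, and gives every sample the same ΔP = 1/N, which makes the distribution normalized. The spread is computed as the square root of the average squared distance from the mean, which is how the standard deviation is defined in the variance discussion that follows. The helper names `sample_ac_voltage`, `discrete_stats`, and `prob_between` are hypothetical.

```python
import math


def sample_ac_voltage(v_peak: float, samples: int = 1000) -> list[float]:
    """Sample v(t) = v_peak * sin(2*pi*t/T) at equally spaced times over one period T."""
    return [v_peak * math.sin(2 * math.pi * k / samples) for k in range(samples)]


def discrete_stats(values: list[float]) -> dict[str, float]:
    """Normalization, mean, and standard deviation when every sample has Delta P = 1/N."""
    dp = 1.0 / len(values)
    total = sum(dp for _ in values)                          # sum_k Delta P(x_k), should be 1
    mean = sum(v * dp for v in values)                       # <x> = sum_k x_k Delta P(x_k)
    mean_sq_dev = sum((v - mean) ** 2 * dp for v in values)  # average squared distance from <x>
    return {"total": total, "mean": mean, "sigma": math.sqrt(mean_sq_dev)}


def prob_between(values: list[float], lo: float, hi: float) -> float:
    """P(lo <= x <= hi): add up Delta P(x_k) for the samples inside the region."""
    dp = 1.0 / len(values)
    return sum(dp for v in values if lo <= v <= hi)


# Assumption: an ideal sinusoid whose rms value is 110 V (peak = 110 * sqrt(2) V).
v_peak = 110 * math.sqrt(2)
volts = sample_ac_voltage(v_peak)
stats = discrete_stats(volts)
p_high = prob_between(volts, 0.5 * v_peak, v_peak)
print(stats, p_high)
```

Under these assumptions the total probability is 1, the mean is essentially zero, the standard deviation is about 110 V, and the voltage spends about one third of each period above half of its peak value.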
The variance and standard deviation tell us how much the location of the particle varies (i.e., the spread) on average as we make several measurements. Consider the following graph. The red system and the blue system have the same average.

[Figure: measured x versus measurement number for a red and a blue data set; a horizontal line marks their common average.]

From the definition of the average, we know that if we sum the signed distances between each red data point and the average line, they add up to zero. Obviously, the same is true for the blue data points! This is expressed mathematically by the equation

$$\langle x - \langle x \rangle \rangle = 0 \qquad \text{(definition of an average)}$$

so the average distance from the line cannot distinguish the two systems. We can obtain a measure of the spread of the data by summing the squares of the distances between each data point and the average line. Since taking additional data points will increase this sum even if those points are closer to the average (less spread), we must divide by the number of data points. Thus, we are finding the average of the square of the distance between the data points and the average line. This is called the variance!

$$\sigma^2 = \langle (x - \langle x \rangle)^2 \rangle$$

Expanding the square and using the fact that $\langle x \rangle$ is a constant gives a form that is more useful for computations, because it only requires the averages $\langle x \rangle$ and $\langle x^2 \rangle$ rather than the distance of every data point from the average:

$$\sigma^2 = \langle x^2 \rangle - 2\langle x \rangle\langle x \rangle + \langle x \rangle^2 = \langle x^2 \rangle - \langle x \rangle^2$$

From dimensional analysis, you should realize that the variance doesn't have the same units as x (it has the units of x²). Thus, we take the square root of the variance to obtain a quantity, with the units of x, that represents the spread of x. This quantity is called the standard deviation!

$$\sigma = \sqrt{\sigma^2} = \sqrt{\langle x^2 \rangle - \langle x \rangle^2}$$

II. Gaussian Distribution

A. The Gaussian distribution is one of the most important probability distributions, and it finds application in a wide range of fields. It is sometimes called the normal distribution or the bell curve, and it is also sometimes referred to as the "drunken walk." The Gaussian distribution is the limiting form of the discrete binomial distribution for a large number of trials, n, as long as the probability of success, p, is not too close to 0 or 1 (see Appendix D of Rohlf).
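The binomial-to-Gaussian limit can be checked numerically. The Python sketch below is an illustration, not Rohlf's derivation: it assumes n = 100 trials with p = 1/2, computes ⟨k⟩ and σ from σ² = ⟨k²⟩ − ⟨k⟩², and compares each binomial probability with the Gaussian formula of part B below, evaluated at a = ⟨k⟩ and the same σ. Because adjacent values of k differ by 1, each binomial probability plays the role of Φ(k)Δk with Δk = 1. The function names `binomial_pmf` and `gaussian` are hypothetical.

```python
import math


def binomial_pmf(n: int, p: float) -> list[float]:
    """P(k) = C(n, k) p^k (1 - p)^(n - k) for k = 0..n successes."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def gaussian(x: float, a: float, sigma: float) -> float:
    """Gaussian probability distribution function with average a and standard deviation sigma."""
    return math.exp(-((x - a) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


n, p = 100, 0.5                                      # illustrative choice of n and p
pk = binomial_pmf(n, p)
mean = sum(k * w for k, w in enumerate(pk))          # <k>
mean_sq = sum(k * k * w for k, w in enumerate(pk))   # <k^2>
sigma = math.sqrt(mean_sq - mean**2)                 # sigma = sqrt(<k^2> - <k>^2)

# Largest gap between the binomial probabilities and the Gaussian curve (Delta k = 1).
max_diff = max(abs(w - gaussian(k, mean, sigma)) for k, w in enumerate(pk))
print(mean, sigma, max_diff)
```

For these values ⟨k⟩ = np = 50 and σ = √(np(1 − p)) = 5, and the largest difference between the two sets of probabilities is on the order of 10⁻⁴, compared with a peak probability of about 0.08.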
A common example of this limit occurs when the physical quantity being measured is the sum of a large number of random contributions.

[Figure: a drunk's random walk along x, beginning at "Start."]

The displacement of a drunk undergoing a random walk is the sum of many random steps. The drunk should have a much greater probability of small displacements, where the individual steps cancel each other, than of large displacements, where most of the steps must be in the same direction!

B. Formula

$$\Phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-a)^2/(2\sigma^2)}$$

C. Average

$$\langle x \rangle = a$$

D. Standard Deviation

$$\text{Standard deviation} = \sigma$$

E. Graph

[Figure: "Gaussian Distribution," probability versus x for 0 ≤ x ≤ 50.]

The solid green line shows that:
1) the "most probable" value (peak) is at x = 25,
2) the median (50% level) is at x = 25, and
3) the mean is at x = 25.

The standard deviation for this distribution is 6. The dashed red lines (inner pair) mark the region where $\langle x \rangle - \sigma < x < \langle x \rangle + \sigma$. The probability that a particle is located in this region is 0.683 (68.3%). The dashed purple lines (outer pair) mark the region where $\langle x \rangle - 2\sigma < x < \langle x \rangle + 2\sigma$. The probability that a particle is located in this region is about 0.954 (about 95%).
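The 68.3% and 95% figures can be checked in two ways. The Python sketch below uses the standard identity that the Gaussian probability within k standard deviations of the average equals erf(k/√2), confirms it by numerically integrating Φ(x) for the plotted case a = 25, σ = 6, and then simulates drunken walks. The walk model is an assumption made for illustration: each of 100 steps has a random length drawn uniformly between −1 and +1, so one step has variance 1/3 and the final displacement has σ = √(100/3). The helper names are hypothetical.

```python
import math
import random


def gaussian_prob_within(k: float) -> float:
    """Probability of landing within k standard deviations of the average: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


def gaussian_prob_numeric(a: float, sigma: float, lo: float, hi: float, steps: int = 10_000) -> float:
    """Midpoint-rule estimate of the integral of Phi(x) from lo to hi."""
    dx = (hi - lo) / steps
    norm = 1.0 / math.sqrt(2 * math.pi * sigma**2)
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += norm * math.exp(-((x - a) ** 2) / (2 * sigma**2)) * dx
    return total


def random_walk_fractions(n_steps: int = 100, walkers: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Fractions of simulated walks ending within 1 and 2 standard deviations of the start.

    Assumed step model: each step length is uniform on [-1, 1], so one step has
    variance 1/3 and the sum of n_steps independent steps has variance n_steps / 3.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(n_steps / 3)
    ends = [sum(rng.uniform(-1.0, 1.0) for _ in range(n_steps)) for _ in range(walkers)]
    within1 = sum(abs(d) < sigma for d in ends) / walkers
    within2 = sum(abs(d) < 2 * sigma for d in ends) / walkers
    return within1, within2


# The plotted case: a = 25, sigma = 6.
print(gaussian_prob_within(1), gaussian_prob_numeric(25, 6, 19, 31))
print(gaussian_prob_within(2), gaussian_prob_numeric(25, 6, 13, 37))
walk1, walk2 = random_walk_fractions()
print(walk1, walk2)
```

The simulated fractions fluctuate with the random seed and the number of walkers, but they should land close to 0.683 and 0.954. This is the central-limit behavior that makes the drunk's displacement Gaussian even though each individual step is not.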