Module 3 – Statistics I

I. Basic Statistics
A. Statistics and Physics
1. Why Statistics
Up to this point, your courses in physics and engineering have considered systems
from a macroscopic point of view. For instance, we have described baseballs,
blocks, airplanes, etc. as rigid bodies.
In our discussion of the kinetic theory of gases, we demonstrated that the
macroscopic properties of a gas including temperature, pressure, and volume are
derived from the microscopic motion of the molecules that compose the gas.
Since all matter is composed of atoms, we should expect that this approach is
universal and would provide a greater understanding than just studying
macroscopic properties (the average of microscopic properties)! This particular
field of physics is called Statistical Mechanics since it combines mechanics
(classical or quantum) and statistics.
In statistical mechanics and quantum mechanics, we talk about calculating the
probability that a particle has some physical attribute. The attribute might be
energy, linear momentum, position, etc. In the discussion below, we will consider
the attribute to be position, but any attribute could be inserted. The reason for
choosing position is to simplify the text and because position is easier to visualize.
Humans are equipped with position detectors called eyes.
2. Probability Distribution Function - Φ(x)
The probability distribution function is a function that determines the probability
that an object is located between x and x + dx. It is defined as the change in the
probability P(x) per change in x and is most useful in dealing with problems
involving continuous physical quantities.
$$\Phi(x) = \frac{\Delta P(x)}{\Delta x} \qquad \text{"Discrete Variable x"}$$

$$\Phi(x) = \frac{dP(x)}{dx} \qquad \text{"Continuous Variable x"}$$
3. Calculating Probabilities Using the Probability Distribution Function
Given the probability distribution function, we can calculate the probability that a
particle is located in the region between xi and xf by
$$P(x_i \le x \le x_f) = \sum_{k=i}^{f} \Phi(x_k)\,\Delta x = \sum_{k=i}^{f} \Delta P(x_k) \qquad \text{"Discrete Variable x"}$$

$$P(x_i \le x \le x_f) = \int_{x_i}^{x_f} \Phi(x)\,dx \qquad \text{"Continuous Variable x"}$$
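As a concrete (invented) illustration of the continuous case, the short Python sketch below integrates a made-up distribution Φ(x) = 2x on 0 ≤ x ≤ 1 over the interval 0.25 ≤ x ≤ 0.75; the distribution and the interval are illustrative choices, not part of the notes.

```python
import numpy as np

# Sketch: P(x_i <= x <= x_f) as the integral of Phi(x) over [x_i, x_f].
# The distribution Phi(x) = 2x on 0 <= x <= 1 is an invented example
# (it is already normalized, since the integral of 2x from 0 to 1 is 1).
def phi(x):
    return 2.0 * x

def probability(x_i, x_f, n=100_000):
    """Approximate the integral of Phi(x) from x_i to x_f (trapezoidal rule)."""
    x = np.linspace(x_i, x_f, n)
    y = phi(x)
    dx = x[1] - x[0]
    return np.sum((y[:-1] + y[1:]) / 2.0) * dx

print(probability(0.25, 0.75))   # exact answer: 0.75**2 - 0.25**2 = 0.5
```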
4. Normalization
If we search all possible locations, we will have a 100% probability of finding the
particle. Therefore, the sum/integral of the probability distribution function over
all possible values of x must equal one!!

$$P(-\infty \le x \le \infty) = \sum_{k=-\infty}^{\infty} \Delta P(x_k) = 1 \qquad \text{"Discrete Variable x"}$$

$$P(-\infty \le x \le \infty) = \int_{-\infty}^{\infty} \Phi(x)\,dx = 1 \qquad \text{"Continuous Variable x"}$$

If the function Φ(x) doesn't have this property, then it is said to be un-normalized and can NOT be a probability distribution function! In order to
create a probability distribution function, we divide the function by the result
of the previous equation (its sum or integral over all x). This process is called normalization!
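A minimal Python sketch of this procedure, using an invented un-normalized function f(x) = x² on 0 ≤ x ≤ 1 (the function and interval are illustrative assumptions):

```python
import numpy as np

# Sketch of normalization: turn an un-normalized function f(x) into a
# probability distribution Phi(x) = f(x) / (integral of f over all allowed x).
# The function f(x) = x**2 on 0 <= x <= 1 is an invented example.
def f(x):
    return x ** 2

x = np.linspace(0.0, 1.0, 100_000)
dx = x[1] - x[0]
y = f(x)
norm = np.sum((y[:-1] + y[1:]) / 2.0) * dx     # integral of f(x), ~1/3

phi = y / norm                                 # normalized values of Phi(x)
check = np.sum((phi[:-1] + phi[1:]) / 2.0) * dx
print(norm, check)                             # ~0.333..., ~1.0 as required
```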
5. Calculating the Average (Expectation) Value - x̄ or ⟨x⟩
The average location of the object can be calculated using

$$\langle x \rangle = \sum_{k=-\infty}^{\infty} x_k\,\Delta P(x_k) \qquad \text{"Discrete"}$$

$$\langle x \rangle = \int_{-\infty}^{\infty} x\,\Phi(x)\,dx \qquad \text{"Continuous"}$$
In physics, the average value of a physical quantity is usually called its
"expectation value." This is because it is the value that is expected on average
from multiple measurements of the quantity even though no single measurement
may give this value. Consider a class in which half the students score 100 and the
other half score 0 on a test. The class average (expectation value) is 50% even
though no individual student had this result! Mathematicians call this type of
average value the "mean."
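Here is a tiny Python sketch of the discrete expectation value, using the class-test example above:

```python
# Discrete expectation value <x> = sum of x * Delta P(x),
# using the class-test example: half the class scores 100, half scores 0.
scores = [100, 0]
probs = [0.5, 0.5]          # Delta P for each possible score

expectation = sum(x * p for x, p in zip(scores, probs))
print(expectation)          # 50, even though no single student scored 50
```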
6. Standard Deviation (σ) and Variance (σ²)
Although the expectation value is important, it doesn't completely specify how a
system is behaving. For example, the average voltage out of a wall plug is zero
(no DC voltage). However, the standard deviation is 110 volts!! Obviously, the
standard deviation is important since this is what makes your TV, radio, and other
appliances work!
The variance and standard deviation tell us how much the location of the particle
will vary (i.e., the spread) on average as we make several measurements. Consider
the following graph. Both the red system and the blue system have the same average.
[Graph: two sets of measurements of x (red and blue) plotted versus measurement number, both scattered about the same average line.]
From the definition of the average, we know that if we sum the distance between
each red data point and the average line it will add up to zero. Obviously, the
same is true for the blue data points! This is expressed mathematically by the
equation
"Definition of an Average"
 x  x 0
We can obtain a measure of the spread of the data by summing the square of the
distance between each data point and the average line. Since taking additional
data points will increase the sum even though the data points might be closer to
the average (less spread), we must divide by the number of data points. Thus, we
are finding the average of the square of the distance between the data points and
the average line. This is called the variance!
$$\sigma^2 = \left\langle (x - \langle x \rangle)^2 \right\rangle$$
Since we have to calculate the expectation value to compute the variance, we find
the following formula more useful for computations:
$$\sigma^2 = \langle x^2 \rangle - \langle x \rangle^2$$
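The second form follows from the first by expanding the square and remembering that ⟨x⟩ is just a constant, so it can be pulled out of the average:

$$\sigma^2 = \left\langle x^2 - 2x\langle x \rangle + \langle x \rangle^2 \right\rangle = \langle x^2 \rangle - 2\langle x \rangle\langle x \rangle + \langle x \rangle^2 = \langle x^2 \rangle - \langle x \rangle^2$$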
From dimensional analysis, you should realize that the variance doesn't have the
same units as x. Thus, we need to take the square root of the variance in order to
obtain a quantity that can represent the spread of x. This quantity is called the
standard deviation!
$$\sigma = \sqrt{\sigma^2} = \sqrt{\langle x^2 \rangle - \langle x \rangle^2}$$
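To make the wall-plug remark concrete, here is a minimal Python sketch. The 60 Hz sine waveform and its ~156 V amplitude (chosen so the spread comes out near 110 V) are illustrative assumptions, not values taken from these notes:

```python
import numpy as np

# Sketch: mean, variance, and standard deviation of a sampled AC wall voltage.
# The assumed waveform (amplitude ~156 V, 60 Hz) illustrates the point that
# the average is zero even though the spread (sigma) is large.
t = np.linspace(0.0, 1.0, 100_000)            # one second of samples
v = 156.0 * np.sin(2 * np.pi * 60.0 * t)      # instantaneous voltage

mean_v = np.mean(v)                           # <v>, approximately 0
var_v = np.mean(v**2) - mean_v**2             # sigma^2 = <v^2> - <v>^2
std_v = np.sqrt(var_v)                        # sigma, approximately 110 V

print(f"<v>   = {mean_v:.3f} V")
print(f"sigma = {std_v:.1f} V")
```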
II. Gaussian Distribution
A.
The Gaussian distribution is one of the more important probability distributions. It
finds application in a wide range of fields. It is sometimes called the normal or
standard distribution, or the bell curve. It is also sometimes referred to as the
"drunken walk" distribution, after the random-walk problem discussed below.
The Gaussian distribution is a limiting case of the discrete binomial distribution for
large numbers of trials, n, as long as the probability of success, p, is not too small
(see Appendix D of Rohlf). A common example of this condition occurs when the
physical quantity being measured depends on the "sum" of a large set of
random numbers.
[Diagram: a drunkard's random walk along the x-axis, beginning at the point marked "Start".]
The displacement of a drunk undergoing a random walk is the sum of several
random steps. The drunk should have a much greater probability for small
displacements where his/her individual steps cancel each other than for large
displacements where more of the steps must be in the same direction!
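A small simulation sketch of this idea (the number of walkers and the number of steps below are arbitrary illustrative choices, not values from the notes):

```python
import numpy as np

# Sketch: the net displacement of many random walks piles up near zero with a
# spread of about sqrt(N) steps; a histogram of `displacement` would show the
# familiar bell shape.  10,000 walkers and 100 steps each are arbitrary choices.
rng = np.random.default_rng(0)
steps = rng.choice([-1, 1], size=(10_000, 100))   # each step is +1 or -1
displacement = steps.sum(axis=1)                  # net displacement of each walker

print("mean displacement :", displacement.mean())   # close to 0
print("standard deviation:", displacement.std())    # close to sqrt(100) = 10
```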
B. Formula

$$\Phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-(x-a)^2/(2\sigma^2)}$$
C. Average

$$\langle x \rangle = a$$
D. Standard Deviation

Standard Deviation = σ
E. Graph

[Graph: "Gaussian Distribution" – Probability (0 to 0.07) on the vertical axis versus x (0 to 50) on the horizontal axis; the curve is peaked at x = 25.]
The solid green line shows that:
1) the "most probable" value (peak) is at x = 25
2) the median (50% level) is at x = 25
3) the mean is at x = 25
The standard deviation for this distribution is 6.
The dashed red lines (inner pair) show the region where ⟨x⟩ - σ < x < ⟨x⟩ + σ. The
probability that a particle is located in this region is 0.683 (68.3%).
The dashed purple lines (outer pair) show the region where ⟨x⟩ - 2σ < x < ⟨x⟩ + 2σ.
The probability that a particle is located in this region is 0.95 (95%).
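As a quick check of these numbers, here is a short Python sketch that numerically integrates the Gaussian from the graph (a = 25, σ = 6); the trapezoidal-rule helper is just an illustrative implementation choice:

```python
import numpy as np

# Sketch: integrate the Gaussian from the graph (a = 25, sigma = 6) over the
# +/- 1 sigma and +/- 2 sigma regions to check the quoted probabilities.
a, sigma = 25.0, 6.0

def phi(x):
    """Gaussian probability distribution function."""
    return np.exp(-(x - a) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def probability(lo, hi, n=100_000):
    """P(lo <= x <= hi) by the trapezoidal rule."""
    x = np.linspace(lo, hi, n)
    y = phi(x)
    dx = x[1] - x[0]
    return np.sum((y[:-1] + y[1:]) / 2.0) * dx

print(f"P within 1 sigma: {probability(a - sigma, a + sigma):.3f}")      # ~0.683
print(f"P within 2 sigma: {probability(a - 2*sigma, a + 2*sigma):.3f}")  # ~0.954
```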