Continuous Probability
Distributions:
The Normal Distribution
Normal Dist1
Towards the Meaning of Continuous Probability
Distribution Functions:
When we introduced probabilities, we spoke of
discrete events:
$S$ = collection of all possible sample points $e_i$
$0 \le P(e_i) \le 1$: the probability of any event is between zero and one
$\sum_i P(e_i) = 1$: the probabilities of all elementary events sum to 1 (something happens)
Normal Dist2
In particular, for the binomial distribution:
For the random variable X:
• $x$ stands for a particular value
• $0 \le P[X = x] \le 1$: the probability that the random variable $X$ takes the value $x$ is between 0 and 1, inclusive.
• $\sum_{\text{all } x} P[X = x] = 1$: the sum of the probabilities over all possible values of $x$ is 1.
Normal Dist3
A continuous variable has infinitely many possible
values:
With infinitely many possible values, the probability of
observing any one particular value is zero:
$\Pr(X = x) = 0$
e.g., for x = 1.0 vs 1.02 vs 1.0195 vs 1.01947, …
$\Pr(X = x)$ is therefore uninformative for a continuous random
variable. Instead, we consider a range of values for $X$:
$\Pr(a \le X \le b)$
We can make this range quite broad or very narrow
Normal Dist4
Comparing Probability Distributions for Discrete vs
Continuous Random Variables
We need new notation to describe probability
distributions for continuous variables.
Discrete: list all possible sample points,
e.g., $S = \{e_i\}$, $i = 1$ to $k$.
Continuous: state the range of possible values of $X$,
e.g., $-\infty$ to $\infty$, $0$ to $\infty$, or $-\infty$ to $0$.
Note: $\infty$ is the symbol for ‘infinity’.
Normal Dist5
For a continuous Random Variable, X,
• P(X=x) = 0
• Instead, we compute the probability of $X$
within some interval:
$P[a \le X \le b] = \int_a^b f_X(x)\,dx$
The function $f_X(x)$ is the probability density
function of $X$.
Don’t worry – if you don’t know or have forgotten calculus, I
won’t be asking you to work with this notation.
Normal Dist6
Much of statistical inference is based upon a
particular choice of a probability density function,
$f_X(x)$: the Normal distribution.
• This function is a mathematical model
describing one particular pattern of variation
of values.
• It is appropriate for continuous variables
only.
Normal Dist7
Practically speaking, the normal distribution
function is appropriate for:
• Many phenomena that occur naturally.
• Special cases of other phenomena,
e.g., averages of phenomena that
individually are not normally distributed.
For example, the sampling distribution of means
may follow a normal distribution even when the
underlying data do not.
Normal Dist8
The Normal Probability Density Function
$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
Features to note:
• The range of $X$ is $-\infty$ to $\infty$
• $\pi$ is the mathematical constant 3.14159…
• $e$ is the mathematical constant 2.71828…
Normal Dist9
The Normal Probability Density Function
$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
Features to note:
• $\mu$ is the mean of the distribution
• $\sigma$ is the standard deviation of the distribution
• $\sigma^2$ is the variance
• $(x - \mu)^2$, the squared deviation from the mean,
appears in the function
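As a quick check on the density formula, here is a minimal Python sketch that evaluates it term by term and compares the result with `statistics.NormalDist.pdf` from the standard library. The function name `normal_pdf` and the example values (mean 100, standard deviation 15, x = 110) are illustrative assumptions, not part of the original slides.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Illustrative values (assumed): mean 100, standard deviation 15.
ours = normal_pdf(110.0, 100.0, 15.0)
library = NormalDist(mu=100.0, sigma=15.0).pdf(110.0)
print(ours, library)  # the two values should agree closely
```

Note that the value returned is a density (the height of the curve), not a probability; probabilities come from areas under the curve.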
Normal Dist10
Notation:
$X \sim N(\mu, \sigma^2)$
We say
“X follows a Normal Distribution with mean $\mu$ and
variance $\sigma^2$”
or
“X is Normally distributed with mean $\mu$ and
variance $\sigma^2$”
Normal Dist11
A Picture of the Normal Distribution
[Figure: the normal density $f_X(x)$ plotted against $x$]
The infamous “Bell-shaped Curve”
Normal Dist12
There are infinitely many normal distributions,
each determined by different values of $\mu$ and $\sigma^2$.
The shape of the Normal distribution is
characteristically
• Smooth
• Defined everywhere on the real axis
• Bell-shaped
• Symmetric about the mean, $\mu$
(it is defined in terms of deviations about
the mean)
Normal Dist13
[Figure: the normal density $f_X(x)$ plotted against $x$]
The area under the curve represents probability, and
the total area under the curve = 1:
$\Pr[-\infty < X < \infty] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\,dx = 1$
Normal Dist14
[Figure: normal curve marking the area Pr[X < x] to the left of $x$]
The area under the curve up to the value $x$ is often
represented by the notation:
$\Phi(x) = \Pr[-\infty < X \le x] = \Pr[X \le x]$
Normal Dist15
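The idea that probability is area under the density curve can be checked numerically. The sketch below is an illustration only: it uses a simple trapezoid rule (the helper `area_under_pdf` is a hypothetical name), takes a standard normal as an assumed example, and lets the range -10 to 10 stand in for minus infinity to infinity.

```python
from statistics import NormalDist

def area_under_pdf(dist, lo, hi, steps=20000):
    """Trapezoid-rule approximation of the area under dist.pdf on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (dist.pdf(lo) + dist.pdf(hi))
    for i in range(1, steps):
        total += dist.pdf(lo + i * width)
    return total * width

# Illustrative distribution (assumed): mean 0, standard deviation 1.
z = NormalDist(0.0, 1.0)
total_area = area_under_pdf(z, -10.0, 10.0)  # stands in for -inf to +inf
area_to_1 = area_under_pdf(z, -10.0, 1.0)    # area up to x = 1
print(total_area, area_to_1, z.cdf(1.0))
```

The total area should come out very close to 1, and the area up to a value should match the cumulative probability that `NormalDist.cdf` reports for that value.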
A Feeling for the Shape of the Normal distribution:
$\mu$ locates the center, and
$\sigma$ measures the spread
Normal Dist16
If $\mu$ alone is changed, by adding a constant $c$,
• the entire curve is shifted in location
• but the shape remains the same.
[Figure: normal curves centered at $\mu$ and at $\mu + c$]
Normal Dist17
If $\sigma$ alone is changed, by multiplying by a constant $c$,
• the shape of the bell is changed
• a larger variance implies a wider spread (or
flatter curve); the area under the curve is
always 1
[Figure: normal curves with standard deviations $\sigma$ and $c\sigma$]
Normal Dist18
Picturing the Normal Probability Density
As the variance, $\sigma^2$, increases:
• Bell flattens (gets wide)
• Values close to the mean are less likely
• Values farther from the mean are more likely
As the variance decreases:
• Bell narrows
• Most values are close to the mean
• Values close to the mean are more likely
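A small numerical illustration of the effect of spread, using Python's `statistics.NormalDist`. The mean of 0, the three standard deviations, and the choice of "within 1 unit of the mean" are all assumptions made for this sketch.

```python
from statistics import NormalDist

# Probability of landing within 1 unit of the mean, for several spreads.
# The mean (0) and the sigma values are illustrative assumptions.
within_one = {}
for sigma in (0.5, 1.0, 2.0):
    dist = NormalDist(mu=0.0, sigma=sigma)
    within_one[sigma] = dist.cdf(1.0) - dist.cdf(-1.0)
    print(f"sigma={sigma}: Pr(-1 < X < 1) = {within_one[sigma]:.4f}")
```

The probability of a value near the mean shrinks as the standard deviation grows, which is the flattening of the bell described above.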
Normal Dist19
A Very Handy Rough Rule of Thumb:
If X follows a Normal Distribution
Then: ~68% of the values of $X$ are in the
interval $\mu \pm \sigma$
[Figure: normal curve with 68% of the area between $\mu - \sigma$ and $\mu + \sigma$]
Normal Dist20
If X follows a Normal Distribution
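Then: ~95% of the values of $X$ are in the interval
$\mu \pm 1.96\sigma$
[Figure: normal curve marked at $\mu - 1.96\sigma$, $\mu$, and $\mu + 1.96\sigma$]
~99% of the values of $X$ are in the interval
$\mu \pm 2.576\sigma$
[Figure: normal curve marked at $\mu - 2.58\sigma$, $\mu$, and $\mu + 2.58\sigma$]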
Normal Dist21
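The 68%, 95%, and 99% rules of thumb can be reproduced with a few lines of Python. This is a sketch using the standard library's `statistics.NormalDist`; the helper name `prob_within` and its default mean and standard deviation are assumptions made for illustration.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """Pr(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1.0, 1.96, 2.576):
    print(f"within {k} standard deviations: {prob_within(k):.4f}")
```

Because the rule is stated in multiples of the standard deviation, the same probabilities result for any choice of mean and standard deviation, for example `prob_within(1.96, mu=100.0, sigma=15.0)`.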
Why is the Normal Distribution So Important?
There are two types of data that follow a normal
distribution:
1. A number of naturally occurring phenomena:
For example:
• heights of men (or women)
• total blood cholesterol of adults
2. Special functions of some non-normally
distributed phenomena, in particular sums and
averages:
The sampling distribution of sample means tends
to be ~ Normal.
Normal Dist22
Research often focuses on sample means
Example: Blood pressure can vary with time of day,
stress, food, illness, etc. One reading may not be
a good representation of the “typical” value.
Distribution of a single reading of blood pressure
for an individual
– tends to be skewed, with a few high values
Normal Dist23
To have a better gauge of an individual’s BP, we
might use the average of 5 readings:
Sampling Distribution of mean of 5 readings for an
individual
– tends to be ~ Normal, even when the original
distribution is not
Normal Dist24
A Feeling for the Central Limit Theorem.
• Roll a pair of dice.
• On each roll, note the total of the two die
faces.
• This total can range from 2 to 12.
• The most likely total is 7. (Why?)
• How often do the other totals arise?
[Figure: histogram of dice totals (2 to 12) for n=100 trials of rolling a pair of dice]
Normal Dist25
[Figure: histogram of dice totals (2 to 12) for n=1000 trials of rolling a pair of dice]
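As the number of trials n increases, the histogram of the sum of
the 2 dice settles toward its true shape: symmetric about 7 and
peaked in the middle. Summing more dice per roll would make that
shape look more and more normal.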
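The dice experiment is easy to reproduce. The sketch below first counts the 36 equally likely outcomes of two fair dice (6 of them total 7, more than for any other total, which answers the "Why?" above), then simulates rolls for comparison. The function name `simulate_totals`, the seed, and the number of rolls are assumptions chosen for illustration.

```python
import random
from collections import Counter

# Exact distribution: count the 36 equally likely (die1, die2) outcomes.
exact = Counter(a + b for a in range(1, 7) for b in range(1, 7))

def simulate_totals(n_rolls, seed=0):
    """Tally the totals from n_rolls simulated rolls of two fair dice."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))

sim = simulate_totals(1000)
for total in range(2, 13):
    # total, exact probability, simulated relative frequency
    print(total, round(exact[total] / 36, 3), sim[total] / 1000)
```

Rerunning with a larger `n_rolls` should bring the simulated frequencies closer to the exact probabilities.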
Normal Dist26
A Statement of the Central Limit Theorem:
For any population with
• mean $\mu$ and finite variance $\sigma^2$,
• the sampling distribution of means, $\bar{x}$,
• from samples of size $n$ from this population,
• will be approximately normally distributed
• with mean $\mu$,
• and variance $\sigma^2/n$,
• for $n$ large.
That is, for $n$ large, and $X \sim\ ??(\mu, \sigma^2)$ (any distribution),
then, approximately,
$\bar{X}_n \sim N(\mu, \sigma^2/n)$
Normal Dist27
This is the main reason for our interest in the
normal distribution:
• regardless of the underlying distribution
• if we take a large enough sample
• we can make probability statements
about means from such samples
• based upon the normal distribution.
This is true, even when the underlying
distribution is discrete.
Normal Dist28
Example: The Central Limit Theorem Works
even for VERY non-normal data:
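A population has only 3 outcomes in it: 1, 2, and 9,
each with $P(X = x) = 1/3$.
[Figure: bar chart of $P(X = x)$ against $x$, with bars of height 1/3 at 1, 2, and 9]
mean of {1,2,9}: $\mu = 4$
sum of {1,2,9} = 12
standard deviation of {1,2,9}: $\sigma = 3.6$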
Normal Dist29
Experiment: Take sample of size n with
replacement. Compute sum of all n. Repeat…
Look at the sampling distribution of sums:
[Figure: histograms of the sample sums for n=25, n=50, and n=100]
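The experiment can be sketched in Python as follows. This is a simulation under stated assumptions, not the procedure used to draw the original figures: the number of repetitions, the seed, and the function name `sample_sums` are all illustrative. If the Central Limit Theorem is at work, the sums for samples of size n should center near n times the population mean, with a standard deviation near the square root of n times the population standard deviation.

```python
import random
import statistics

population = [1, 2, 9]                   # each value equally likely
pop_mean = statistics.mean(population)   # 4
pop_sd = statistics.pstdev(population)   # population sd, about 3.56

def sample_sums(n, reps=20000, seed=1):
    """Draw reps samples of size n with replacement; return each sample's sum."""
    rng = random.Random(seed)
    return [sum(rng.choices(population, k=n)) for _ in range(reps)]

sums_25 = sample_sums(25)
print(statistics.mean(sums_25), 25 * pop_mean)           # simulated vs theoretical mean
print(statistics.stdev(sums_25), (25 ** 0.5) * pop_sd)   # simulated vs theoretical sd
```

Plotting a histogram of `sums_25` (and of the sums for n=50 and n=100) would show the bell shape emerging.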
Normal Dist30
To compute probabilities for a normal distribution.
• Recall that we are looking at intervals of
values of the random variable, X.
• The probability that X has a value in the
interval between a and b is the area under the
curve corresponding to that interval:
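[Figure: normal curve with the area between $a$ and $b$ marked]
$\Pr(a \le X \le b) = \int_a^b f_X(x)\,dx$
Note: since $\Pr(X = a)$, or the probability of any
exact value, is zero, this can be written as
$\Pr(a \le X \le b)$ or $\Pr(a < X < b)$.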
Normal Dist31
The symmetry of the normal distribution can
also help in computing probabilities.
• The normal distribution is symmetric about
the mean µ.
• This tells us that the probability of a value
less than the mean is .5 or 50%,
• and the probability of a value greater than
the mean is also .5 or 50%
$\Pr(X \le \mu) = \int_{-\infty}^{\mu} f_X(x)\,dx = 0.5$
[Figure: normal curve split at $\mu$, with area 0.5 on each side]
Normal Dist32
The Standard Normal Distribution
The standard normal distribution is just one of
infinitely many possible normal distributions.
It has
mean: $\mu = 0$
variance: $\sigma^2 = 1$
[Figure: standard normal curve, labeled $\mu = 0$ and $\sigma = 1$]
By convention we let the letter $Z$ represent a
random variable that is distributed Normally with
$\mu = 0$ and $\sigma^2 = 1$:
Z ~ N(0,1)
Normal Dist33
The standard normal distribution is important for
several reasons:
• Probabilities of Z within any interval have
been computed and tabulated.
• It is possible to look up $\Pr(a \le Z \le b)$ for any
values of $a$ and $b$ in such tables.
• Any other normal distribution can be
transformed to a standard normal for
computing probabilities.
• Distances from the mean are equivalent to
number of standard deviations from the mean.
This last is perhaps of greatest interest to us, now that
software does much of the transformation and
computation for us.
Normal Dist34
Table 3 in the Appendix of Rosner gives areas under
the normal curve, in 4 different ways:
• Column A gives areas between $-\infty$ and $z$,
where $z$ is a particular value of the standard
normal distribution.
(Note: Rosner uses X rather than Z)
That is, column A gives values for
$\Pr(-\infty < Z \le z) = \Pr(Z \le z)$
$z$ is also known as a standard normal deviate.
[Figure: standard normal curve marking the area Pr[Z < z], with 0 and $z$ on the axis]
Normal Dist35
Table 3 in the Appendix of Rosner:
• Column B gives areas between $z$ and $\infty$:
$\Pr(z \le Z < \infty) = \Pr(z \le Z) = \Pr(Z \ge z)$
• Column C gives areas between 0 and $z$:
$\Pr(0 \le Z \le z)$
• Column D gives areas between $-z$ and $z$:
$\Pr(-z \le Z \le z)$
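For readers working without the printed table, the four columns can be computed from the standard normal cumulative probability. The sketch below uses Python's `statistics.NormalDist`; the function name `table_columns` is a hypothetical helper, and the column definitions follow the descriptions given above for a non-negative z.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)

def table_columns(z):
    """Return the four areas (A, B, C, D) for a non-negative z."""
    col_a = Z.cdf(z)            # Pr(Z <= z)
    col_b = 1.0 - col_a         # Pr(Z >= z)
    col_c = col_a - 0.5         # Pr(0 <= Z <= z)
    col_d = col_a - Z.cdf(-z)   # Pr(-z <= Z <= z)
    return col_a, col_b, col_c, col_d

a, b, c, d = table_columns(0.31)
print(f"A={a:.4f} B={b:.4f} C={c:.4f} D={d:.4f}")
```

Columns A and B always sum to 1, and column D is twice column C, which gives a quick consistency check when reading any version of the table.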
Normal Dist36
A probability calculation for any random variable
$X \sim \text{Normal}(\mu, \sigma^2)$ can be re-expressed as an
equivalent probability calculation for a standard
Normal (0,1).
This is nice because
• we have tables for probabilities of the Normal
(0,1) distribution.
• We can interpret probabilities in terms of # of
std deviations from the mean
Of course, we can also use computer programs to
compute probabilities for any Normal Distribution –
the program does the translation for us.
Normal Dist37
The Normal (0,1) or Standard Normal Table.
Positive values of z are read from the first column
(under x in Rosner)
z      A      B      C      D
0.0    .5000  .5000  .0     .0
0.01   .5040  .4960  .0040  .0080
…
0.30   .6179  .3821  .1179  .2358
0.31   .6217  .3783  .1217  .2434
[Figure: standard normal curve marking the area Pr[Z < 0.31], with 0 and z = 0.31 on the axis]
The shaded area, which is the probability of $Z \le z$,
is shown under Col A of the table:
Pr(Z < 0.31) = .6217
A check that this makes sense: any positive value of z is
above the mean, and should have a probability > .5
Normal Dist38
Note that only positive values of z are tabulated.
We can take advantage of a few important features of
the standard normal, to compute probabilities for
values of z less than zero:
• Symmetry: $\Pr(Z \le -z) = \Pr(Z \ge z)$
• Zero is the median: $\Pr(Z \le 0) = \Pr(Z \ge 0) = .50$
• Total area is 1: $\Pr(Z \le z) + \Pr(Z \ge z) = 1$
Normal Dist39
For example, we cannot read Pr(Z < -0.31) directly
from the tables.
We can, however, use the property of symmetry:
Pr(Z < -0.31) = Pr(Z > 0.31) = .3783
We can read Pr(Z > 0.31) from Col B of the table.
[Figure: standard normal curve marking the tail below z = -0.31 and the tail above z = 0.31]
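The symmetry argument can be confirmed numerically. In this sketch, which assumes Python's `statistics.NormalDist` as a stand-in for the printed table, the lower tail is computed directly and compared with the upper tail that Column B would supply.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)
lower_tail = Z.cdf(-0.31)        # Pr(Z < -0.31), not in a positive-z table
upper_tail = 1.0 - Z.cdf(0.31)   # Pr(Z > 0.31), the Column B quantity
print(round(lower_tail, 4), round(upper_tail, 4))
```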
Normal Dist40
Normal Dist41
Example Word Problem
What is the probability of a value of Z more than 1
standard deviation below the mean?
Solution: Since $\mu = 0$ and $\sigma = 1$,
1 standard deviation below the mean is
$z = \mu - 1 \times \sigma = 0 - (1)(1) = -1$
Pr(Z < -1) = 0.1587
[Figure: standard normal curve marking the area below z = -1]
The probability of observing a value more than 1 standard
deviation below the mean is .1587, or just under 16%.
Normal Dist42
Example: What is the probability Z is between –1.5
and 1.5?
We can read this from Column D of the Table in
Rosner:
$\Pr[-1.50 \le Z \le 1.50]$ from the table: 0.8664
Example: What is the probability of Z more than 1.5
standard deviations from the mean in either
direction?
Since probabilities sum to 1:
$\Pr[Z \le -1.50 \text{ or } 1.50 \le Z] = 1 - 0.8664 = 0.1336$
By symmetry, half of this, or 0.0668, lies at either end.
[Figure: standard normal curve with tails of area .0668 below -1.50 and above 1.50]
Normal Dist43
Exercise
Find the area under the standard normal curve
between Z = +1 and Z = +2
Solution.
It helps to draw pictures!
[Figure: the area between 1 and 2 equals the area below 2 minus the area below 1]
Pr(1 < Z < 2) = Pr(Z < 2) - Pr(Z < 1)
= 0.9772 - 0.8413
= 0.1359
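The standard normal examples on these slides can be reproduced in a few lines. This is a sketch using Python's `statistics.NormalDist` in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)

p_below_minus_1 = Z.cdf(-1.0)             # Pr(Z < -1)
p_within_1_5 = Z.cdf(1.5) - Z.cdf(-1.5)   # Pr(-1.5 <= Z <= 1.5)
p_outside_1_5 = 1.0 - p_within_1_5        # both tails combined
p_between_1_2 = Z.cdf(2.0) - Z.cdf(1.0)   # Pr(1 < Z < 2)

for name, p in [("Pr(Z < -1)", p_below_minus_1),
                ("Pr(-1.5 <= Z <= 1.5)", p_within_1_5),
                ("Pr(|Z| >= 1.5)", p_outside_1_5),
                ("Pr(1 < Z < 2)", p_between_1_2)]:
    print(f"{name} = {p:.4f}")
```

Each interval probability is a difference of two cumulative probabilities, which is the same subtraction the pictures illustrate.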
Normal Dist44
Notes on using Standard Normal Tables:
• These come in a variety of formats. The examples
given here are for the version seen in Rosner,
Table 3 in the Appendix.
• Look at the accompanying picture of the
distribution to be clear what probability is listed in
the body of the table.
• Draw a sketch (paper and pencil) when computing
probabilities – it always helps you keep track of
what you are doing.
• Minitab provides the same probabilities as Column
A: Pr(X<x), when Cumulative Probability is
selected
Normal Dist45
Using Minitab:
Calc → Probability Distributions → Normal
• Select Cumulative Probability for Pr(Z<z) or Pr(X<x)
• Enter the value of z (or x)
Normal Dist46
Finding Percentiles of the Normal Distribution
Example: What is the 75th percentile of N(0,1) ?
Solution: Again, it helps to draw a picture!
[Figure: standard normal curve with area 0.75 to the left of $z_{.75}$]
We want the area under the curve to be 75%. The value of z
we want is the value below which 75% of values are found.
That is, find $z_{.75}$ so that $\Pr(Z < z_{.75}) = .75$
Normal Dist47
Use the Inverse Cumulative option in Minitab and input the desired percentile:
Inverse Cumulative Distribution Function
Normal with mean = 0 and standard deviation = 1.00000
P( X <= x)       x
    0.7500  0.6745
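The same percentile can be found outside Minitab. As one possible alternative, Python's standard library provides `statistics.NormalDist.inv_cdf`, which plays the role of the inverse cumulative option; the variable name `z_75` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)
z_75 = Z.inv_cdf(0.75)       # value below which 75% of the area lies
print(round(z_75, 4))
print(round(Z.cdf(z_75), 4))  # feeding it back through the cdf recovers 0.75
```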
Normal Dist48
Standardizing a Normal Random Variate:
From $N(\mu, \sigma^2)$ to $N(0, 1)$
We can transform any Normal distribution to a
standard normal by means of a simple
transformation:
$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$
Normal Dist49
Standardizing a Normal Random Variate:
From $N(\mu, \sigma^2)$ to $N(0, 1)$
Adding a constant:
For $X \sim N(\mu, \sigma^2)$, $(X + b) \sim N(?, ?)$
[Figure: normal curves centered at $\mu$ and at $\mu + b$]
The mean is shifted over ‘b’ units, but the variance
or spread of the data is unchanged by adding a
constant:
$(X + b) \sim N(\mu + b, \sigma^2)$
Normal Dist50
Multiplying by a constant:
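For $X \sim N(\mu, \sigma^2)$, $(aX) \sim N(?, ?)$
[Figure: normal curves with mean $\mu$ and standard deviation $\sigma$, and with mean $a\mu$ and standard deviation $a\sigma$]
The mean is adjusted to ‘a’ times the original
mean, and the variance to $a^2$ times the original
variance; this is a shift in scale:
$(aX) \sim N(a\mu, a^2\sigma^2)$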
Normal Dist51
Adding a constant, multiplying by a constant:
For $X \sim N(\mu, \sigma^2)$, $(aX + b) \sim N(?, ?)$
Both adjustments are made:
The mean is adjusted to ‘a’ times the original
mean plus ‘b’, and the variance to $a^2$ times the
original variance:
$(aX + b) \sim N(a\mu + b, a^2\sigma^2)$
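The rule for shifting and scaling can be checked with Python's `statistics.NormalDist`, whose instances support multiplication and addition by a constant. The particular numbers below (mean 10, standard deviation 2.5, a = 3, b = -4) are assumptions chosen only to illustrate the rule.

```python
from statistics import NormalDist

# Illustrative values (assumed): X ~ N(10, 2.5^2), a = 3, b = -4.
X = NormalDist(mu=10.0, sigma=2.5)
a, b = 3.0, -4.0
Y = X * a + b   # NormalDist supports scaling and shifting by constants

print(Y.mean, a * X.mean + b)            # mean becomes a*mu + b
print(Y.variance, a ** 2 * X.variance)   # variance becomes a^2 * sigma^2
```

Adding b moves only the mean; multiplying by a changes both the mean and the variance.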
Normal Dist52
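Now, let $a = 1/\sigma$ and $b = -\mu/\sigma$.
Then
$Z = aX + b = \left(\frac{1}{\sigma}\right) X - \frac{\mu}{\sigma} = \frac{X - \mu}{\sigma}$
For $X \sim N(\mu, \sigma^2)$, $Z \sim N(?, ?)$:
$\mu_Z = a\mu + b = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0$
$\sigma_Z^2 = a^2\sigma^2 = \left(\frac{1}{\sigma}\right)^2 \sigma^2 = 1$
Or
$Z \sim N(0, 1)$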
Normal Dist53
$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$
We have transformed the original scale
• to units measured in multiples of standard
deviations
• centered around zero
• A value of z=-1 means the value of x is 1
standard deviation below the mean
• A value of z=2.5 means the value of x is 2.5
standard deviations above the mean
Normal Dist54
This transformation is also important, because if
we want to know
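$\Pr(a \le X \le b)$
Then we can convert it to an equivalent
calculation:
$\Pr(a \le X \le b) = \Pr\left( \frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{b - \mu}{\sigma} \right)$
$= \Pr\left( \frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma} \right)$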
Normal Dist55
Word Problem
The profit from the Massachusetts state lottery on
any given week is distributed Normally with
mean = 10.0 million dollars and variance = 6.25
(in units of million dollars squared).
What is the probability that this week’s profit is
between 8 and 10.5 million?
Let X = weekly profit in millions
Then $X \sim N(\mu, \sigma^2)$
where $\mu = 10$ and $\sigma^2 = 6.25$ ($\sigma = 2.5$)
What is $\Pr(8 \le X \le 10.5)$?
Normal Dist56
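What is $\Pr(8 \le X \le 10.5)$?
Translate to Standard Normal:
$\Pr(8 \le X \le 10.5) = \Pr\left( \frac{8 - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{10.5 - \mu}{\sigma} \right)$
$= \Pr\left( \frac{8 - 10}{2.5} \le Z \le \frac{10.5 - 10}{2.5} \right)$
$= \Pr(-0.8 \le Z \le 0.2)$
[Figure: standard normal curve marking the area between -.8 and .2]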
Normal Dist57
Pr(-0.8 < Z < 0.2) = Pr(Z < 0.2) - Pr(Z < -0.8)
Read from Table 3 or use Minitab or other program:
= 0.5793 - 0.2119
= 0.3674
The probability of a weekly profit between 8 and
10.5 million dollars is 36.74%.
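The lottery calculation can be checked two ways in Python: by standardizing first, as in the worked solution, or by asking for the probability on the original scale. This is a sketch with `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

profit = NormalDist(mu=10.0, sigma=6.25 ** 0.5)   # sigma = 2.5

# Route 1: standardize the endpoints, then use the standard normal.
z_low = (8.0 - profit.mean) / profit.stdev
z_high = (10.5 - profit.mean) / profit.stdev
Z = NormalDist(0.0, 1.0)
p_standardized = Z.cdf(z_high) - Z.cdf(z_low)

# Route 2: let the software work on the original scale.
p_direct = profit.cdf(10.5) - profit.cdf(8.0)

print(z_low, z_high)
print(round(p_standardized, 4), round(p_direct, 4))
```

Both routes give the same probability, which is the point of the standardizing transformation: the software simply does the translation internally.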
Normal Dist58
Application of the Central Limit Theorem
• Means of samples of size n
• from a population with
• mean $\mu$ and variance $\sigma^2$
• follow an approximately normal distribution
• with mean $\mu$ and variance $\sigma^2/n$, for $n$ large.
That is, for $X \sim\ ?(\mu, \sigma^2)$,
for $n$ large,
$\bar{X} \sim N(\mu, \sigma^2/n)$
Normal Dist59
Example: Consider a population of families with
$\mu = 3.4$ children per family and $\sigma^2 = 4.37$.
What percentage of samples of size n=4 families will
have means greater than 5 children per family?
Sample means from samples with n=4 follow an
approximately normal distribution with
$\mu_{\bar{x}} = 3.4$ and $\sigma_{\bar{x}}^2 = \sigma^2/n = 4.37/4 = 1.09$.
Then $\sigma_{\bar{x}} = 1.045$
We want: $\Pr(\bar{X} > 5)$, where $\bar{X} \sim N(3.4, 1.09)$
Normal Dist60
$\Pr(\bar{X} > 5) = \Pr\left( \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{5 - 3.4}{1.045} \right) = \Pr(Z > 1.53)$
Pr(Z > 1.53) = 0.06
[Figure: standard normal curve marking the area above z = 1.53]
The probability of observing a sample with a
mean of 5 children per family or larger, when
n=4 is about 6%.
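A short Python sketch of the same calculation, assuming (as the example does) that the sampling distribution of the mean is normal with variance equal to the population variance divided by n. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, var, n = 3.4, 4.37, 4
se = math.sqrt(var / n)     # standard deviation of the sample mean
xbar = NormalDist(mu, se)   # assumed sampling distribution of the mean

z = (5.0 - mu) / se
p_over_5 = 1.0 - xbar.cdf(5.0)
print(round(se, 3), round(z, 2), round(p_over_5, 3))
```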
Normal Dist61
So far we have gone from
• $X \sim N(\mu, \sigma^2)$ → $Z \sim N(0, 1)$: $Z = \frac{X - \mu}{\sigma}$
We may be interested in the reverse:
• $Z \sim N(0, 1)$ → $X \sim N(\mu, \sigma^2)$: $X = \sigma Z + \mu$
Normal Dist62
Example:
The distribution of IQ scores is normal with a mean of
100 and a standard deviation of 15.
What is the 95th percentile of this distribution?
Step 1:
Find the 95th percentile of the standard normal –
use Minitab, or another program to compute:
Inverse Cumulative Distribution Function
Normal with mean = 0 and standard deviation = 1.00000
P( X <= x)       x
    0.9500  1.6449
or $z_{.95} = 1.645$
Normal Dist63
Step 2:
$X = \sigma Z + \mu$
We know $X \sim N(100, 15^2)$, and $z_{.95} = 1.645$
$x_{.95} = \sigma z_{.95} + \mu$
= (15)(1.645) + 100
= 124.7
The 95th percentile of the IQ distribution is 124.7
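The two-step percentile calculation can be mirrored in Python, again using `statistics.NormalDist` as an assumed substitute for Minitab. The sketch computes the percentile both by hand from the standard normal value and directly on the IQ scale.

```python
from statistics import NormalDist

# Step 1: 95th percentile of the standard normal.
z_95 = NormalDist(0.0, 1.0).inv_cdf(0.95)

# Step 2: convert back to the IQ scale, x = sigma * z + mu.
x_95_by_hand = 15.0 * z_95 + 100.0

# Or ask for the percentile on the original scale in one step.
x_95_direct = NormalDist(100.0, 15.0).inv_cdf(0.95)

print(round(z_95, 4), round(x_95_by_hand, 1), round(x_95_direct, 1))
```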
Normal Dist64
Another Example:
Taking samples of size n=4 from the population of
families with $\mu = 3.4$ children per family and $\sigma^2 = 4.37$:
What is the middle 50% of the sampling distribution?
[Figure: normal curve with 50% of the area between $a$ and $b$, and 25% in each tail]
That is, find $a$ and $b$ so that $\Pr(a \le \bar{X} \le b) = .50$
$a$ is the 25th percentile of the sampling distribution of $\bar{X}$
$b$ is the 75th percentile of the sampling distribution of $\bar{X}$
Normal Dist65
Use Minitab to find 25th and 75th percentiles of
standard normal:
Inverse Cumulative Distribution Function
P( X <= x)        x
    0.2500  -0.6745
    0.7500   0.6745
For $\bar{X} \sim N(\mu, \sigma^2/n)$ where $\mu = 3.4$ and $\sigma^2/n = 1.09$,
convert $z$ back to $\bar{x}$:
$\bar{x} = z\,\sigma_{\bar{x}} + \mu$
$\bar{x}_{.75} = .675(1.045) + 3.4 = 4.11$
$\bar{x}_{.25} = -.675(1.045) + 3.4 = 2.69$
So $\Pr(2.69 < \bar{X} < 4.11) = .50$
50% of samples of size 4 from this population will have
mean family size between 2.69 and 4.11 children per
family.
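The middle 50% of the sampling distribution can also be found in one step with an inverse cumulative function. The sketch below assumes the same normal sampling distribution as the example and uses Python's `statistics.NormalDist`; small differences from the hand calculation come from rounding z and the standard deviation.

```python
import math
from statistics import NormalDist

# Sampling distribution of the mean for n = 4 (assumed normal, as in the example).
xbar = NormalDist(mu=3.4, sigma=math.sqrt(4.37 / 4))

lower = xbar.inv_cdf(0.25)   # 25th percentile
upper = xbar.inv_cdf(0.75)   # 75th percentile
coverage = xbar.cdf(upper) - xbar.cdf(lower)

print(round(lower, 2), round(upper, 2), round(coverage, 2))
```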
Normal Dist66
Recap. . . Introduction to the Normal Distribution
For continuous variables, we speak of a
• probability density function
• We calculate the probabilities of intervals of
values, not individual values
The normal distribution is a good description of
• many naturally occurring phenomena
• the average of non-normal phenomena
This last is particularly important since much
statistical inference is based on the behavior of
averages.
Normal Dist67
While there are infinitely many normal distributions,
each determined by $\mu$ and $\sigma^2$,
• they can all be standardized by using the
transformation
$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$
• We use the standardized form to compute
probabilities for any normal distribution.
• In the standardized form, distance from the
mean is in units of standard deviation
Normal Dist68