Chapter 7 - Emunix Emich

Continuous Random Variables & The Normal Probability Distribution
Learning Objectives
1. Understand characteristics about continuous
random variables and probability distributions
2. Understand the uniform probability distribution
3. Graph a normal curve
4. State the properties of a normal curve
5. Understand the role of area in the normal density
function
6. Understand the relation between a normal random
variable and a standard normal random variable
Continuous Random Variable & Continuous Probability Distribution
Continuous Random Variable
• The outcomes of a continuous random variable
consist of all the values in an interval of the
real number line.
• In other words, a continuous random variable
has infinitely many possible outcomes.
Continuous Random Variable
• For instance, consider the birth weight of a randomly selected baby.
Suppose the outcomes are between 1000 and 5000 grams, with all
1-gram intervals of weight between 1000 and 5000 grams
equally likely.
• The probability that an observed baby’s weight is exactly
3250.326144 grams is zero. There is only one way to observe
3250.326144, but there are infinitely many possible values
between 1000 and 5000. Under the classical probability approach,
a probability is the number of ways an event can occur divided
by the total number of possibilities, so one outcome out of
infinitely many gives a probability of zero.
Continuous Random Variable
• To resolve this problem, we compute
probabilities of continuous random variables
over an interval of values. For instance, instead
of the probability of a weight of exactly 3250.326144
grams, we may compute the probability that a selected
baby’s weight is between 3250 and 3251 grams.
• To find probabilities of continuous random
variables, we use a probability distribution function
(also called a probability density function).
Uniform Random Variable & Uniform Probability Distribution
Uniform Random Variable
• Sometimes we want to model a continuous
random variable that is equally likely between
two limits
• Examples
– Choose a random time … the number of seconds
past the minute is a random number in the interval from
0 to 60
– Observe a tire rolling at a high rate of speed …
choose a random time … the angle of the tire valve to
the vertical is a random number in the interval from 0
to 360 degrees
Uniform Probability Distribution
• When “every number” is equally likely in
an interval, this is a uniform probability
distribution
– Any specific number has a zero probability of
occurring
– The mathematically correct way to phrase this
is that any two intervals of equal length have
the same probability
Example
• For the seconds after the minute example
• Every interval of length 3 has probability
3/60
– The chance that it will be between 14.3 and
17.3 seconds after the minute is 3/60
– The chance that it will be between 31.2 and
34.2 seconds after the minute is 3/60
– The chance that it will be between 47.9 and
50.9 seconds after the minute is 3/60
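The interval rule above can be checked with a few lines of Python. This is only a sketch: the function name uniform_interval_probability and its parameters are made up for illustration and are not part of the slides.

```python
def uniform_interval_probability(a, b, low=0.0, high=60.0):
    """P(a < X < b) for X uniform on [low, high] (hypothetical helper)."""
    # Clip the interval to [low, high]; the density is zero outside it.
    left = max(a, low)
    right = min(b, high)
    if right <= left:
        return 0.0
    # Probability = interval length / total length of the range.
    return (right - left) / (high - low)


p1 = uniform_interval_probability(31.2, 34.2)
p2 = uniform_interval_probability(47.9, 50.9)
print(p1, p2)  # both are 3/60 = 0.05, up to floating-point rounding
```

Any two intervals of length 3 inside 0 to 60 give the same answer, which is the defining property of the uniform distribution.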
Probability Density Function
• A probability density function is an equation
used to specify and compute probabilities of a
continuous random variable
• This equation must have two properties
– The total area under the graph of the
equation is equal to 1 (the total probability
is 1)
– The equation is always greater than or equal
to zero (probabilities are always greater than
or equal to zero)
Probability Density Function
• This function method is used to represent
the probabilities for a continuous random
variable
• For the probability of X between two
numbers
– Compute the area under the curve between
the two numbers
– That is the probability
Area is the Probability
• The probability of being between 4 and 8 is the area under the curve from 4 to 8
[Figure: density curve with the area between 4 and 8 shaded]
Probability Density Function
• An interpretation of the probability density
function is
– The random variable is more likely to be in
those regions where the function is larger
– The random variable is less likely to be in
those regions where the function is smaller
– The random variable is never in those regions
where the function is zero
Probability Density Function
• A graph showing where the random variable has
more likely and less likely values
[Figure: density curve; the high part of the curve is labeled "More likely values" and the low part "Less likely values"]
Uniform Probability Density
Function
• The time example … uniform between 0 and 60
– All values between 0 and 60 are equally likely, thus
the equation must have the same value between 0
and 60
Uniform Probability Density
Function
• The time example … uniform between 0 and 60
– Values outside 0 and 60 are impossible, thus the
equation must be zero outside 0 to 60
Uniform Probability Density
Function
• The time example … uniform between 0 and 60
– Because the total area must be one, and the width of the
rectangle is 60, the height must be 1/60. Therefore the uniform
probability density is a constant (the equation is
$y = f(x) = \frac{1}{60}$ for $0 \le x \le 60$)
Uniform Probability Density
Function
• The time example … uniform between 0 and 60
– The probability that the variable is between two
numbers is the area under the curve between them
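To see "area equals probability" numerically, here is a rough sketch that approximates the area under the uniform density with a midpoint sum. The helper names area_under_density and uniform_density are assumptions made for this illustration; a rectangle's area can of course be computed exactly as width times height.

```python
def area_under_density(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with a midpoint sum."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def uniform_density(x):
    """Density of the seconds-past-the-minute example: 1/60 on [0, 60]."""
    return 1 / 60 if 0 <= x <= 60 else 0.0


total_area = area_under_density(uniform_density, 0, 60)    # should be close to 1
p_10_to_25 = area_under_density(uniform_density, 10, 25)   # should be close to 15/60
print(total_area, p_10_to_25)
```

The same area_under_density helper works for any density function, which is why the area view extends to the normal curve later in the chapter.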
Normal Random Variable & Normal Probability Distribution
Overview
• The normal distribution models bell
shaped variables
• The normal distribution is the fundamental
distribution underlying most of inferential
statistics
Chapter 7 – Section 1
• The normal curve has a very specific bell
shaped distribution
• The normal curve looks like
Normal Random Variable
• A normally distributed random variable, or a variable with
a normal probability distribution, is a random variable
that has a relative frequency histogram in the shape of a
normal curve
• This curve is also called the normal density
curve/function or normal curve (a particular probability
density function)
Normal Density Curve
• In drawing the normal curve, the mean μ and the
standard deviation σ have specific roles
– The mean μ is the center of the curve
– The values (μ – σ) and (μ + σ) are the inflection points
of the curve, where the concavity of the curve
changes.
Normal Density Curve
• There is a normal curve for each
combination of μ and σ
• The curves look different, but they all share the same bell shape
• Different values of μ shift the curve left and
right
• Different values of σ change the spread: a larger σ makes the
curve wider and lower, a smaller σ makes it narrower and taller
Normal Curve
• Two normal curves with different means (but the same
standard deviation)
– The curves are shifted left and right
Normal Density Curve
• Two normal curves with different standard
deviations (but the same mean)
– The curve with the larger standard deviation is wider and lower;
the curve with the smaller standard deviation is narrower and taller
Properties of Normal Curve
• Properties of the normal density curve
– The curve is symmetric about the mean
– The mean = median = mode, and this is the highest point of the
curve
– The curve has inflection points at (μ – σ) and (μ + σ)
– The total area under the curve is equal to 1. (It is complicated
to show this. But it is true.)
– The area under the curve to the left of the mean is equal to the
area under the curve to the right of the mean
Properties of Normal Curve
• Properties of the normal density curve
– As x increases, the curve gets closer and closer to zero (but never
reaches zero) … as x decreases, the curve also gets closer and closer
to zero (but never reaches zero)
• In addition,
– The area within 1 standard deviation of the mean is
approximately 0.68
– The area within 2 standard deviations of the mean is
approximately 0.95
– The area within 3 standard deviations of the mean is
approximately 0.997 (almost 100%)
This is the so-called empirical rule.
Therefore, a normal curve will be close to zero at about 3 standard
deviations below and above the mean.
Empirical Rule
• The empirical rule or 68-95-99.7 rule is true
– Approximately 68% of the values lie between
(μ – σ) and (μ + σ)
– Approximately 95% of the values lie between
(μ – 2σ) and (μ + 2σ)
– Approximately 99.7% of the values lie between
(μ – 3σ) and (μ + 3σ)
• These are difficult calculations, but they are true
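The three percentages can be reproduced in Python. This sketch uses statistics.NormalDist from the standard library in place of a table; the helper name area_within is made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal; any mean and sigma give the same areas


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973
```

The rule's 68%, 95% and 99.7% are these values rounded.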
Empirical Rule ( 68-95-99.7 Rule)
• An illustration of the Empirical Rule
Histogram & Density Curve
• When we collect data, we can draw a
histogram to summarize the results
• However, using histograms has several
drawbacks
• Histograms are grouped, so
– There are always grouping errors
– It is difficult to make detailed calculations
Histogram & Density Curve
• Instead of using a histogram, we can use
a probability density function that is an
approximation of the histogram
• Probability density functions are not
grouped, so
– There are not grouping errors
– They can be used to make detailed
calculations
Normal Histogram
• Frequently, histograms are bell shaped
• We can approximate these with normal curves
Normal Curve Approximation
• Overlay a normal curve on top of the histogram
• In this case, the normal curve is close to the histogram,
so the approximation should be accurate
Normal Density Probability
Function
• The equation of the normal curve with mean μ
and standard deviation σ is
$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
• This is a complicated formula, but we will never
need to use it for the calculation of probabilities.
(thankfully)
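For readers who want to see the formula in action anyway, here is a minimal sketch that evaluates the normal density directly and compares it with Python's built-in statistics.NormalDist. The function name normal_density and the sample values are assumptions made for this illustration.

```python
import math
from statistics import NormalDist


def normal_density(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Peak of the standard normal curve: 1 / sqrt(2 * pi), about 0.3989
peak = normal_density(0, mu=0, sigma=1)

# Illustrative values only: compare with the library at an arbitrary point
by_formula = normal_density(4.0, mu=3, sigma=2)
by_library = NormalDist(mu=3, sigma=2).pdf(4.0)
print(peak, by_formula, by_library)
```

The density gives the height of the curve, not a probability; probabilities come from areas under it.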
Modeling with Normal Curve
• When we model a distribution with a normal probability
distribution, we use the area under the normal curve to
– Approximate the areas of the histogram being
modeled
– Approximate probabilities that are too detailed to be
computed from just the histogram
Example
• Assume that the distribution of giraffe weights
has μ = 2200 pounds and σ = 200 pounds
Example Continued
• What is an interpretation of the area under the
curve to the left of 2100?
Example Continued
• It is the proportion of giraffes that weigh 2100 pounds
and less
Note: Area = Probability = Proportion
Standardize Normal Random Variable
• How do we calculate the areas under a normal curve?
– If we need a table for every combination of μ and σ,
this would rapidly become unmanageable
– We would like to be able to compute these
probabilities using just one table
– The solution is to use the standard normal random
variable
Standard Normal Random Variable
• The standard normal random variable is the specific
normal random variable that has
μ = 0 and σ = 1
• We can relate general normal random variables to the
standard normal random variable using a so-called Z-score calculation
Standard Normal Random Variable
• If X is a general normal random variable with
mean μ and standard deviation σ then
$$Z = \frac{X - \mu}{\sigma}$$
is a standard normal random variable (Z-score)
• This equation connects general normal random
variables with the standard normal random
variable
• We only need a standard normal table
Example
• The area to the left of 2100 for a normal curve with mean
2200 and standard deviation 200
Example Continued
• To compute the corresponding value of Z, we use the Z-score
$$Z = \frac{X - \mu}{\sigma} = \frac{2100 - 2200}{200} = -\frac{1}{2}$$
• Thus the value of X = 2100 corresponds to a value of Z =
– 0.5
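The giraffe calculation can be sketched in Python as follows. The helper name z_score is made up for illustration, and statistics.NormalDist stands in for the standard normal table.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Convert a value x of a normal variable into its Z-score."""
    return (x - mu) / sigma


z = z_score(2100, mu=2200, sigma=200)
area_left = NormalDist().cdf(z)  # area to the left of z under the standard normal curve

print(z)                    # -0.5
print(round(area_left, 4))  # 0.3085
```

So, under this model, roughly 31% of giraffes weigh 2100 pounds or less.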
Summary
• Normal probability distributions can be used to
model data that have bell shaped distributions
• Normal probability distributions are specified by
their means and standard deviations
• Areas under the curve of general normal
probability distributions can be related to areas
under the curve of the standard normal
probability distribution
The Standard Normal Distribution
Objectives
• Find the area under the standard normal
curve
• Find Z-scores for a given area
• Interpret the area under the standard normal
curve as a probability
How to Compute Area under Standard
Normal Curve
• There are several ways to calculate the
area under the standard normal curve
– We can use a table (such as Table IV on the
inside back cover)
– We can use technology (a calculator or
software)
• Using technology is preferred
Compute Area under Standard Normal
Curve
• Three different area calculations
– Find the area to the left of
– Find the area to the right of
– Find the area between
• Two different methods shown here
– From a table
– Using TI Graphing Calculator (recommended method)
Finding Area under Standard Normal Curve
using Z-table
• “Area to the left of" – using Z-table ( Standard Normal Table)
• Calculate the area to the left of Z = 1.68
– Break up 1.68 as 1.6 + .08
– Find the row 1.6
– Find the column .08
• The probability is 0.9535
Note: The table always covers the area to the left of the Z score.
Finding Area under Standard Normal
Curve using Z- Table
• “Area to the right of" – using a Z- table
• The area to the left of Z = 1.68 is 0.9535 from
reading the table.
• The right of … that’s the remaining amount
• The two add up to 1, so the right of is
1 – 0.9535 = 0.0465 which is the solution.
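The two table lookups above can be reproduced with Python's statistics.NormalDist. This is a sketch of the same arithmetic, not the table method itself.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

area_left = Z.cdf(1.68)     # area to the left of Z = 1.68
area_right = 1 - area_left  # the two areas add up to 1

print(round(area_left, 4))   # 0.9535
print(round(area_right, 4))  # 0.0465
```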
Finding Area under Standard Normal Curve
using Z-table
• “Area Between”
• Between Z = – 0.51 and Z = 1.87
• This is not a one step calculation
Finding Area under Standard Normal Curve
using Z-table
• The left hand picture … area to the left of 1.87 (which is
0.9693) … includes too much
• It is too much by the right hand picture … area to the left
of -0.51 (which is 0.3050)
Finding Area under Standard Normal Curve
using Z-table
• Area between Z = – 0.51 and Z = 1.87…. 0.9693 – 0.3050 = 0.6643
[Figure: three shaded curves. The area we want; the area we start out with, which is too much (Area = 0.9693); the area we correct by (Area = 0.3050)]
Finding Area under Standard Normal Curve
using Z- Table
• The area between -0.51 and 1.87
– The area to the left of 1.87, or 0.9693 … minus
– The area to the left of -0.51, or 0.3050 … which
equals
– The difference of 0.6643
• Thus the area under the standard normal curve between
-0.51 and 1.87 is 0.6643
Finding Area under Standard Normal Curve
using Z-table
• A different way for “between” …. 1 – (0.3050+0.0307) = 0.6643
[Figure: shaded curves. The area we want; the extra on the left that we delete (Area = 0.3050); the extra on the right that we delete (Area = 0.0307)]
Finding Area under Standard Normal Curve
using Z-table
• The area between -0.51 and 1.87
– The area to the left of -0.51, or 0.3050 … plus
– The area to the right of 1.87, or 0.0307 … which
equals
– The total area to get rid of which equals 0.3357
• Thus the area under the standard normal curve between
-0.51 and 1.87 is 1 – 0.3357 = 0.6643
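Both "between" methods can be sketched in Python with statistics.NormalDist. The variable names are made up for illustration. Because the library does not round the intermediate areas to four places, its answer is about 0.6642, while the table arithmetic gives 0.6643.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

left_of_upper = Z.cdf(1.87)         # about 0.9693
left_of_lower = Z.cdf(-0.51)        # about 0.3050
right_of_upper = 1 - left_of_upper  # about 0.0307

# Method 1: start with too much, then subtract the extra on the left
between_by_subtraction = left_of_upper - left_of_lower

# Method 2: delete both tails from the total area of 1
between_by_complement = 1 - (left_of_lower + right_of_upper)

print(round(between_by_subtraction, 4), round(between_by_complement, 4))
```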
Finding Area under Standard Normal Curve
using TI Graphing Calculator
• Area to the left of 1.68 – using TI graphing calculator
• The function is normalcdf( ). Follow the key sequence below:
1. DISTR [2ND VARS] → DISTR → 2:normalcdf → ENTER
2. Then enter -E99,1.68,0,1) → ENTER
The probability is 0.9535
Note:
1. -E99 = $-10^{99}$, which is a very large negative number that stands in for
–infinity. We use it as the left bound to obtain “less than or equal to” some
value, that is, $x \le a$. The E symbol can be entered by pressing EE on the
calculator, using the key sequence [2ND ,].
2. normalcdf() (cdf means cumulative distribution function) accumulates the
probability over an interval. It differs from 1:normalpdf() on the calculator,
which calculates the normal density.
3. There are four entries/parameters needed for the function normalcdf(). For
instance, to find the probability that a normal variable falls in the interval
from a to b, i.e. $a \le x \le b$: the 1st number entered for normalcdf() is the left
bound of the interval, a; the 2nd number is the right bound of the interval, b; the
3rd number is the mean of the normal variable (0 for a standard normal
variable); the 4th number is the standard deviation of the normal variable
(1 for a standard normal variable).
Finding Area under Normal Curve using TI
Graphing Calculator
• “Area to the right of" – using TI graphing calculator
• The area to the right of Z = 1.68
1. DISTR [2ND VARS] → DISTR → 2:normalcdf → ENTER
2. Then enter 1.68, E99, 0,1) → ENTER
The probability is 0.0465
Note:
1. E99 = $10^{99}$, which is a very large number that stands in for infinity. We
use it as the right bound to obtain “greater than or equal to” some value, that
is, $x \ge a$. The E symbol can be entered by pressing the EE key on the
calculator, using the key sequence [2ND ,].
Finding Area under Normal Curve using
TI Graphing Calculator
• “Area Between” – using TI graphing calculator
• Between Z = – 0.51 and Z = 1.87
1. DISTR [2ND VARS] → DISTR → 2:normalcdf → ENTER
2. Then enter -0.51, 1.87, 0,1) → ENTER
The probability is 0.6642 (the table method gives 0.6643 because it subtracts two rounded table entries)
Finding Z score from Probability
• We did the problem:
Z-Score → Area
• Now we will do the reverse of that
Area → Z-Score
• This is finding the Z-score (value) that corresponds to a
specified area (percentile)
• And … no surprise … we can do this with a table or with a TI
graphing calculator.
Locate Z Score from Table
• “To the left of” – using a table
• Find the Z-score for which the area to the left of it is 0.32
– Look in the middle of the table … find 0.32
– The nearest to 0.32 is 0.3192 … a Z-Score of -0.47
Locate Z Score from Table
• "To the right of" – using a table
• Find the Z-score for which the area to the right of it is 0.4332
• Right of it is .4332 … So, left of it would be .5668
• Look in the middle of the table … find 0.5668. The nearest one is
0.5675.
• A value of .17
Note: The table always covers the area to the left of a z score. So, we need
the area to the left.
Locate Z Score from TI Graphing
Calculator
• “To the left of” – using TI graphing calculator
• Find the Z-score for which the area to the left of it is 0.32
1. DISTR [2nd VARS] → 3:invNorm( → ENTER
2. Enter 0.32,0,1), hit ENTER
Solution: The Z-Score is -0.47
• “To the right of” – using TI graphing calculator
• Find the Z-score for which the area to the right of it is 0.4332
Right of it is .4332 … So, left of it would be .5668
1. DISTR [2nd VARS] → 3:invNorm( → ENTER
2. Enter 0.5668,0,1), hit ENTER
Solution: The Z-Score is 0.17
Note: invNorm( ) takes 3 parameters: the 1st is the area to the left of
a Z score; the 2nd is the mean; the 3rd is the standard deviation.
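The same area-to-Z-score lookups can be sketched in Python, where NormalDist.inv_cdf plays a role similar to invNorm( ): it takes the area to the left and returns the value. Using it here is a stand-in chosen for illustration, not part of the calculator procedure.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Area to the left is 0.32
z_left = Z.inv_cdf(0.32)

# Area to the right is 0.4332, so the area to the left is 1 - 0.4332 = 0.5668
area_right = 0.4332
z_right = Z.inv_cdf(1 - area_right)

print(round(z_left, 2))   # -0.47
print(round(z_right, 2))  # 0.17
```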
Finding a Middle Range
• We will often want to find a middle range of Z scores,
from z0 to z1. For instance, find the middle 90% or
the middle 95% or the middle 99% of a standard
normal distribution
• The middle 90% leaves 5% in each tail
How to find a Middle 90% Range
• The two possible ways
– The number for which 5% is to the left, or
– The number for which 5% is to the right
[Figure: standard normal curve with 5% shaded in the left tail and 5% shaded in the right tail]
How To Find a Middle 90% Range
• 90% in the middle is 10% outside the middle, i.e. 5% off
each end
• These problems can be solved in either of two equivalent
ways
• We could find
– The number for which 5% is to the left, or
– The number for which 5% is to the right
• Using the TI calculator: from invNorm(.05, 0, 1), we get a
lower z score of -1.64. From invNorm(0.95, 0, 1), we get
an upper z score of 1.64. So the range that covers
the middle 90% of the values for a standard normal
distribution is from -1.64 to 1.64.
What is zα ?
• The number zα denotes a Z-score such that the
area to the right of zα is α (Greek letter alpha)
• Some commonly used zα values are
– z.10 = 1.28, the area between -1.28 and 1.28 is 0.80
– z.05 = 1.64, the area between -1.64 and 1.64 is 0.90
– z.025 = 1.96, the area between -1.96 and 1.96 is 0.95
– z.01 = 2.33, the area between -2.33 and 2.33 is 0.98
– z.005 = 2.58, the area between -2.58 and 2.58 is 0.99
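A short sketch shows where these values come from. The function name z_alpha is made up for illustration. An area of α to the right means an area of 1 – α to the left, and the middle area between –zα and zα is 1 – 2α.

```python
from statistics import NormalDist


def z_alpha(alpha):
    """Z-score with area alpha to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    z = z_alpha(alpha)
    middle_area = 1 - 2 * alpha  # area between -z and z
    print(f"alpha = {alpha}: z = {z:.3f}, middle area = {middle_area:.2f}")
```

The printed values carry three decimals, so they can differ from the two-decimal table values in the last digit.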
Area as the Probability
• The area under a normal curve can be interpreted as a probability
• The standard normal curve can be interpreted as a probability
density function
• We will use Z to represent a standard normal random variable, so it
has probabilities such as
– P(a < Z < b)
– P(Z < a)
– P(Z > a)
Note: A normal random variable is a continuous random variable. The
probability that a continuous random variable equals a single
value is zero, as explained previously. So the probability is
the same whether the inequalities are inclusive (include the
endpoints) or exclusive (do not include the endpoints). For
instance, $P(Z \le a) = P(Z < a)$.
Summary
• Calculations for the standard normal curve can
be done using tables or using technology
• One can calculate the area under the standard
normal curve, to the left of or to the right of each
Z-score
• One can calculate the Z-score so that the area
to the left of it or to the right of it is a certain
value
• Areas and probabilities are two different
representations of the same concept
Applications of the
Normal Distribution
Learning Objectives
1. Find and interpret the area under a normal
curve
2. Find the value of a normal random variable
General Normal Probability
Distribution
• So far, we have learned to find the area
under a standard normal curve. Now, we
want to calculate areas and values for
general normal probability distributions
• We can relate these problems to the
calculations for the standard normal
distribution done previously.
Standardize a General Normal
Variable
• For a general normal random variable X with
mean μ and standard deviation σ, the variable
$$Z = \frac{X - \mu}{\sigma}$$
has a standard normal probability distribution
• We can use this relationship to perform
calculations for X from Z
Convert X to Z
• Values of X ↔ Values of Z
• If x is a value for X, then
$$z = \frac{x - \mu}{\sigma}$$
is a value for Z
• This is a very useful relationship
Example
• For example, if a normal variable X has
μ = 3 and σ = 2,
then a value of x = 4 for X corresponds to
$$z = \frac{4 - 3}{2} = 0.5$$
a value of z = 0.5 for Z
Find P(X < x) from P(Z < z)
• Because of this relationship
Values of X ↔ Values of Z
$$z = \frac{x - \mu}{\sigma}$$
then
P(X < x) = P(Z < z)
• To find P(X < x) for a general normal random variable,
we could calculate P(Z < z) for a corresponding standard
normal random variable
Find P(X < x) from P(Z < z)
• This relationship lets us compute all the
different types of probabilities
• Probabilities for X are directly related to
probabilities for Z using the (X – μ) / σ
relationship
Find P(X < x) from P(Z < z)
• A different way to illustrate this relationship
[Figure: an X axis marked a, μ, b drawn above a Z axis marked (a – μ)/σ and (b – μ)/σ]
Find P(X < x) from P(Z < z)
• With this relationship, the following method can
be used to compute areas for a general normal
random variable X
– Shade the desired area to be computed for X
– Convert all values of X to Z-scores using $z = \frac{x - \mu}{\sigma}$
– Solve the problem for the standard normal Z
– The answer will be the same for the general normal X
Example
• For a general normal random variable X with
μ = 3 and σ = 2
calculate P(X < 6)
• This corresponds to
$$z = \frac{6 - 3}{2} = 1.5$$
so P(X < 6) = P(Z < 1.5) = 0.9332 [Use a Z-table
or TI calculator from normalcdf(-E99,1.5, 0, 1)]
Example
• For a general normal random variable X with
μ = –2 and σ = 4
calculate P(X > –3)
• This corresponds to
$$z = \frac{-3 - (-2)}{4} = -0.25$$
so P(X > –3) = P(Z > –0.25) = 0.5987 [Use a Z-table or TI
calculator from normalcdf(-0.25, E99, 0, 1)]
Example
• For a general normal random variable X with
μ = 6 and σ = 4
calculate P(4 < X < 11)
• This corresponds to
$$z = \frac{4 - 6}{4} = -0.5 \qquad z = \frac{11 - 6}{4} = 1.25$$
so P(4 < X < 11) = P(– 0.5 < Z < 1.25) = 0.5858 [Use a
Z-table or TI calculator from normalcdf(-0.5,1.25,0,1)]
Calculate P(X < x) Directly
• Technology often has direct calculations for the general normal
probability distribution
• For instance, for a general normal random variable X with μ = 6 and
σ = 4, calculate P(4 < X < 11).
Using a TI graphing calculator, we can obtain the answer directly from
normalcdf(4, 11, 6, 4) without converting X to Z.
Note: In general, to find the area under any normal curve over the
interval from a to b, the sequence of parameters for the function
normalcdf( ) is (a, b, mean, standard deviation). For a standard
normal curve, you can just enter (a, b) instead of (a, b, 0, 1), because
Z is the default normal variable on the TI calculator.
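The same direct calculation can be sketched in Python by giving statistics.NormalDist the mean and standard deviation. The helper name prob_between is made up for illustration; the numbers are the examples from this section.

```python
from statistics import NormalDist


def prob_between(a, b, mu, sigma):
    """P(a < X < b) for a normal X with the given mean and standard deviation."""
    X = NormalDist(mu, sigma)
    return X.cdf(b) - X.cdf(a)


# P(4 < X < 11) with mu = 6, sigma = 4, computed directly and via Z-scores
p_direct = prob_between(4, 11, mu=6, sigma=4)
p_via_z = NormalDist().cdf(1.25) - NormalDist().cdf(-0.5)

# The two one-sided examples: P(X < 6) and P(X > -3)
p_less = NormalDist(3, 2).cdf(6)
p_greater = 1 - NormalDist(-2, 4).cdf(-3)

print(round(p_direct, 4), round(p_via_z, 4))  # both 0.5858
print(round(p_less, 4), round(p_greater, 4))  # 0.9332 0.5987
```

The direct and Z-score routes agree because standardizing does not change the area.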
Compute X values from probabilities
• The inverse of the relationship
$$Z = \frac{X - \mu}{\sigma}$$
is the relationship
$$X = \mu + Z\sigma$$
• With this, we can compute value problems (
convert Z score to its original score) for the
general normal probability distribution
Compute X values from probabilities
• The following method can be used to compute
values for a general normal random variable X
– Shade the desired area to be computed for X
– Find the Z-scores for the same probability problem
– Convert all the Z-scores to X using $X = \mu + Z\sigma$
Example
• For a general normal random variable X with
μ = 3 and σ = 2,
find the value x such that P(X < x) = 0.3
• Since P(Z < –0.5244) = 0.3 (Note: From a Z-table or
calculator: invNorm(0.3,0,1) = -0.5244), we then
convert Z to X:
$$X = \mu + Z\sigma$$
$$x = 3 + (-0.5244) \cdot 2 = 1.95$$
so P(X < 1.95) = P(Z < –0.5244) = 0.3
Example
• For a general normal random variable X with
μ = –2 and σ = 4
find the value x such that P(X > x) = 0.2
• Since P(Z > 0.8416) = 0.2 (Note: the area to the left is 0.8, so
from a Z-table or calculator: invNorm(0.8, 0,1) =
0.8416), we then convert the Z score back to X using:
$$X = \mu + Z\sigma$$
$$x = -2 + 0.8416 \cdot 4 = 1.37$$
so P(X > 1.37) = P(Z > 0.8416) = 0.2
Example
• We know that z.05 = 1.645 (1.64 to two decimal places), so
P(–1.645 < Z < 1.645) = 0.90
• Thus for a general normal random variable X with
μ = 6 and σ = 4, the middle 90% range is from
-0.58 to 12.58.
$$x_1 = 6 - 1.645 \cdot 4 = -0.58$$
$$x_2 = 6 + 1.645 \cdot 4 = 12.58$$
Compute X values directly
• Technology often has direct calculations for the general normal
probability distribution
• For instance, for a general normal random variable X with μ = 3 and σ = 2,
find the value x such that P(X < x) = 0.3. We can solve it with a TI
graphing calculator: invNorm(0.3, 3, 2), which gives the answer
1.95.
Note: In general, to find the x value that has a given area, say p, to
the left of x under any normal curve, the sequence of parameters
for the function invNorm( ) is (p, mean, standard deviation). For a
standard normal curve, you can just enter (p) instead of (p, 0, 1),
because Z is the default normal variable on the TI calculator.
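In Python, NormalDist.inv_cdf also works from the area to the left, so the value examples from this section can be sketched directly without converting through Z. The variable names are made up for illustration.

```python
from statistics import NormalDist

# P(X < x) = 0.3 with mu = 3, sigma = 2
x_lower_tail = NormalDist(3, 2).inv_cdf(0.3)

# P(X > x) = 0.2 with mu = -2, sigma = 4: the area to the left is 1 - 0.2 = 0.8
x_upper_tail = NormalDist(-2, 4).inv_cdf(1 - 0.2)

# Middle 90% with mu = 6, sigma = 4: 5% in each tail
X = NormalDist(6, 4)
low, high = X.inv_cdf(0.05), X.inv_cdf(0.95)

print(round(x_lower_tail, 2))            # 1.95
print(round(x_upper_tail, 2))            # 1.37
print(round(low, 2), round(high, 2))     # -0.58 12.58
```

When the problem gives an area to the right, subtract it from 1 first, exactly as with invNorm( ).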
Summary
• We can perform calculations for general normal
probability distributions based on calculations for
the standard normal probability distribution
• For tables, and for interpretation, converting
values to Z-scores can be used
• For technology, often the parameters of the
general normal probability distribution can be
entered directly into a routine
Summary
• The normal distribution is
– The most important bell shaped distribution
– Will be used to model many random variables
• The standard normal probability distribution
– Has a mean of 0 and a standard deviation of 1
– Is the basis for normal distribution calculations
• The general normal probability distribution
– Has a general mean and general standard deviation
– Can be used in general modeling situations