Section 9.5 The Normal Distribution

The normal distribution is a very common continuous probability distribution model in which most
of the data points naturally fall near the mean, with fewer data points farther away from the mean. For
example, if a test was given in a large class of students, most students would be expected to score at
or near the mean, while fewer and fewer students would be expected to lie far from the mean. When
graphed, a normal distribution takes on a recognizable “bell curve” shape:
Figure: Typical normal distribution (“bell”) curve.
Most of the data, represented as area, is clumped into the middle of the graph, while little of the data
is found on the fringes. Since the curve is a probability density function, the area underneath is
by definition equal to 1.
The Standard Normal Distribution
The standard normal distribution is a normal probability distribution function in which the mean is
µ = 0 , the variance is var = 1 and the standard deviation (the square root of the variance) is σ = 1 .
The notation for a normal distribution is N ( µ , var) , so the standard normal distribution is known as
the N (0,1) distribution. The actual probability density function that defines the standard normal
distribution is
$$N(0,1) = f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}, \qquad -\infty < x < \infty$$
This function is somewhat difficult to handle in an integral, so instead tables or calculators are used to
determine the various areas underneath the probability density function.
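As a rough numerical check of this density, the following Python sketch evaluates the standard normal p.d.f. directly and approximates areas with a midpoint Riemann sum. The helper names (`std_normal_pdf`, `area_under_pdf`) and the integration range of −8 to 8 are illustrative assumptions, not part of the text; `statistics.NormalDist` is from the Python standard library (version 3.8 or later).

```python
import math
from statistics import NormalDist

def std_normal_pdf(x):
    """Density of N(0,1): (1 / sqrt(2*pi)) * exp(-x**2 / 2)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area_under_pdf(a, b, steps=100_000):
    """Midpoint Riemann sum of the density over [a, b] (illustrative helper)."""
    width = (b - a) / steps
    return sum(std_normal_pdf(a + (i + 0.5) * width) for i in range(steps)) * width

total_area = area_under_pdf(-8, 8)   # the tails beyond +/-8 are negligible
area_0_to_1 = area_under_pdf(0, 1)   # area between the mean and z = 1

print(round(total_area, 6))
# Compare the numerical area with the library's cumulative distribution function
print(round(area_0_to_1, 4), round(NormalDist().cdf(1) - NormalDist().cdf(0), 4))
```

The numerical sum is only an approximation; in practice a table, a calculator, or a library routine plays the role of the integral.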
Since the standard normal distribution is a p.d.f., it has an area of 1 below its graph. The standard
normal distribution N (0,1) is built in such a way that approximately 68% of the data is within one
standard deviation of the mean, about 95% of the data is within two standard deviations of the mean,
and over 99% of the data is within three standard deviations of the mean. Half of the area lies below
(to the left of) the mean, and half lies above (to the right of) the mean. The middle “hump” of the
distribution is sometimes called the body of the curve while the narrow fringes on both sides of the
hump are called tails.
The standard normal distribution has symmetry across the y-axis which can be used to help calculate
areas over various bounds.
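The 68%, 95% and 99% figures can be checked with a short Python sketch. It assumes the standard-library `statistics.NormalDist` class; the helper name `within` is an illustrative choice.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Area within k standard deviations of the mean, P(-k <= Z <= k)."""
    return Z.cdf(k) - Z.cdf(-k)

areas = {k: round(within(k), 3) for k in (1, 2, 3)}
print(areas)        # {1: 0.683, 2: 0.954, 3: 0.997}
print(Z.cdf(0))     # 0.5, half of the area lies to the left of the mean
```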
Figure: The shaded regions represent one, two and three standard deviations
from the mean on the standard normal distribution.
Example 1: Let r.v. X be normally distributed according to the N (0,1) distribution. Use the above
graphs and symmetry to determine the following probabilities (areas): (a) P(−1 ≤ X ≤ 0) , (b)
P(0 ≤ X ≤ 2) , (c) P( X ≤ 0) and (d) P( X ≤ 1) .
Discussion: For (a), the area between − 1 ≤ x ≤ 0 is half of the 0.68 that normally spans within one
standard deviation of the mean. Hence, P(−1 ≤ X ≤ 0) = 0.34 . For (b), the area over 0 ≤ x ≤ 2 is
half of the 0.95 that normally spans within two standard deviations of the mean. Therefore,
P(0 ≤ X ≤ 2) = 0.475 . Parts (c) and (d) are open-ended intervals. For part (c), P( X ≤ 0) = 0.5 since
this encloses half of the full area under the graph. For (d), the area over 0 ≤ x ≤ 1 is 0.34, and added
to the 0.5 over the left half of the graph, gives a total area of 0.84. Therefore, P( X ≤ 1) = 0.84 . The
graphs of each probability from Example 1 are shown in the figure below.
Figure: Shaded areas for example 1.
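The answers in Example 1 rely on the rounded figures 0.68 and 0.95. As a hedged cross-check, the sketch below computes the same four areas to four decimal places with Python's `statistics.NormalDist`; the dictionary name `exact` is an illustrative choice.

```python
from statistics import NormalDist

Z = NormalDist()  # N(0,1)

exact = {
    "a": Z.cdf(0) - Z.cdf(-1),   # P(-1 <= X <= 0)
    "b": Z.cdf(2) - Z.cdf(0),    # P(0 <= X <= 2)
    "c": Z.cdf(0),               # P(X <= 0)
    "d": Z.cdf(1),               # P(X <= 1)
}
for part, area in exact.items():
    print(part, round(area, 4))
```

The results agree with 0.34, 0.475, 0.5 and 0.84 to about two decimal places, which is the accuracy the rounded percentages allow.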
For standard deviation values that are not integers, a table or calculator is used to determine the areas
over a given set of bounds, as the next example illustrates:
Example 2: Use a table or calculator to determine these probabilities on the standard normal
distribution N (0,1) : (a) P(−0.45 ≤ X ≤ 1.26) , (b) P(−1 ≤ X ≤ 2.38) and (c) P( X ≥ 0.35) .
Discussion: The TI models of calculator have a built-in feature under DISTR for calculating areas
under a standard normal distribution: normalcdf(a,b). To calculate these probabilities as areas,
simply call up the normalcdf command and type in the bounds, separated by a comma:
(a) P(−0.45 ≤ X ≤ 1.26) = normalcdf(−0.45, 1.26) = 0.5698
(b) P(−1 ≤ X ≤ 2.38) = normalcdf(−1, 2.38) = 0.8327
(c) P( X ≥ 0.35) = 0.5 − P(0 ≤ X ≤ 0.35) = 0.5 − normalcdf(0, 0.35) = 0.3632
The calculator has the obvious advantage that the left and right bounds can be entered in together.
Most tables are built to give readings from 0 to a positive x-value. Thus, to answer part (a) using a
table, look up the area from 0 to 0.45, which by symmetry is the same as the area from –0.45 to 0.
Then look up the area from 0 to 1.26, and finally sum the two values together. A similar approach
will be needed if using a table for part (b).
For (c), the interval is open-ended to the right, so we use the fact that the area to the right of the mean
is 0.5. Look up the area from 0 to 0.35, and subtract from 0.5.
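For readers without a TI calculator, the probabilities in Example 2 can be reproduced with a small Python sketch. The function `normalcdf` below is a hypothetical helper written to mirror the calculator command; it is not a built-in Python function and is built on the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

def normalcdf(a, b, mu=0.0, sigma=1.0):
    """Area under the normal curve between a and b.

    Hypothetical helper mirroring the calculator command; the mean and
    standard deviation default to the standard normal distribution.
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

part_a = normalcdf(-0.45, 1.26)
part_b = normalcdf(-1, 2.38)
part_c = 0.5 - normalcdf(0, 0.35)   # open-ended to the right, as in the text

print(round(part_a, 4), round(part_b, 4), round(part_c, 4))
```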
Conversion to the Standard Normal Distribution: Z-Scores
Most scenarios that follow a normal distribution do not have a convenient mean of µ = 0 and
standard deviation of σ = 1 . In order to use the table or calculators to solve general normal
distribution scenarios, the input values x must be converted to “fit” onto the standard normal
distribution curve. The results are called z-scores: a z-score states how many standard deviations the
value x lies above (positive) or below (negative) the mean. The normalization formula is:
$$z = \frac{x - \mu}{\sigma}$$
In practice this formula is very easy to use. Just be sure to convert any variance figures into standard
deviation first.
Example 3: Suppose r.v. X is normally distributed as a N (10,4) distribution. Find these
probabilities: (a) P(9 ≤ X ≤ 11) , (b) P(6.5 ≤ X ≤ 10.75) and (c) P( X ≤ 9.8) .
Discussion: The mean of this distribution is µ = 10 and the standard deviation is σ = √4 = 2 . To
answer part (a), normalize the 9 and the 11 into their corresponding z-scores, and calculate:
$$P(9 \le X \le 11) \;\Rightarrow\; P\!\left(\frac{9 - 10}{2} \le Z \le \frac{11 - 10}{2}\right) = P(-0.5 \le Z \le 0.5) = 0.3829$$
For part (b), a similar conversion is made:
$$P(6.5 \le X \le 10.75) \;\Rightarrow\; P\!\left(\frac{6.5 - 10}{2} \le Z \le \frac{10.75 - 10}{2}\right) = P(-1.75 \le Z \le 0.375) = 0.6061$$
For part (c), the 9.8 is normalized into its z-score; note that the interval is open-ended on the left:
$$P(X \le 9.8) \;\Rightarrow\; P\!\left(Z \le \frac{9.8 - 10}{2}\right) = P(Z \le -0.1) = 0.4602$$
Note that Z becomes a new random variable in place of X.
The TI-83 calculator allows the mean and standard deviation to be entered in as parameters directly.
For example, in the case P(9 ≤ X ≤ 11) above with mean µ = 10 and standard deviation σ = 2 , the
value can be determined by typing normalcdf(9,11,10,2), which gives the desired result of 0.3829.
If the latter two parameters are not included, the calculator assumes the standard normal distribution
of N (0,1) .
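The two routes described for Example 3 (normalizing to z-scores first, or supplying the mean and standard deviation directly) can be compared in a short Python sketch. It assumes the standard-library `statistics.NormalDist`; the helper `z_score` and the variable names are illustrative.

```python
from statistics import NormalDist

mu, variance = 10, 4
sigma = variance ** 0.5            # convert variance to standard deviation: 2.0

Z = NormalDist()                   # standard normal N(0,1)
X = NormalDist(mu, sigma)          # the N(10,4) distribution itself

def z_score(x):
    """Normalize x to a z-score: (x - mu) / sigma."""
    return (x - mu) / sigma

# Part (a) two ways: via z-scores, and directly with mean and sigma supplied
part_a_via_z = Z.cdf(z_score(11)) - Z.cdf(z_score(9))
part_a_direct = X.cdf(11) - X.cdf(9)

part_b = X.cdf(10.75) - X.cdf(6.5)
part_c = X.cdf(9.8)

print(round(part_a_via_z, 4), round(part_a_direct, 4))
print(round(part_b, 4), round(part_c, 4))
```

Both routes give the same area for part (a), which is the point of the normalization formula.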
Applications of the Normal Distribution
The following examples illustrate the normal distribution as an application:
Example 4: Suppose the average American male stands 70 inches tall (5 ft, 10 in) with a standard
deviation of 2.5 inches. Assume the heights are normally distributed. A man is chosen at random.
What is the probability that he (a) stands between 5 ft, 6 in and 6 ft tall, (b) stands at least 6 ft tall, and
(c) stands at most 5 ft, 9 in?
Discussion: Let r.v. X represent the height of men in inches. For consistency, all heights are
converted into inches, which in turn must be normalized into their corresponding z-scores.
For (a), the probability is written P(66 ≤ X ≤ 72) . With mean µ = 70 and σ = 2.5 , the heights are
normalized into z-scores and the probability determined:
$$P(66 \le X \le 72) \;\Rightarrow\; P\!\left(\frac{66 - 70}{2.5} \le Z \le \frac{72 - 70}{2.5}\right) = P(-1.6 \le Z \le 0.8) = 0.7333$$
Therefore, about 73.3% of all American men stand between 5 ft 6 in and 6 ft tall.
For (b), the probability is written P( X ≥ 72) , normalized into z-scores and the probability
determined:
$$P(X \ge 72) \;\Rightarrow\; P\!\left(Z \ge \frac{72 - 70}{2.5}\right) = P(Z \ge 0.8) = 0.2119$$
Therefore, about 21.2% of American men stand above 6 ft tall.
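The height probabilities in this example, including part (c) treated next, can be cross-checked without normalizing by hand. The sketch below assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=2.5)    # heights in inches

p_a = heights.cdf(72) - heights.cdf(66)   # between 5 ft 6 in and 6 ft
p_b = 1 - heights.cdf(72)                 # at least 6 ft
p_c = heights.cdf(69)                     # at most 5 ft 9 in

for label, p in (("a", p_a), ("b", p_b), ("c", p_c)):
    print(label, round(p, 4))
```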
For (c), the probability is written P( X ≤ 69) , normalized into z-scores and the probability
determined:
$$P(X \le 69) \;\Rightarrow\; P\!\left(Z \le \frac{69 - 70}{2.5}\right) = P(Z \le -0.4) = 0.3446$$
About 34.5% of men stand at or below 5 ft, 9 in.
In many applications, we may work the process in reverse using tables or the invNorm feature on the
calculator. In cases like this, a percentage (area) underneath the normal distribution is known and we
wish to work backwards to find the corresponding z-score that agrees with this area. Specifically,
invNorm(a) returns a z-score on the standard normal curve N (0,1) such that the area from the left
up to the z-score is the desired area a, which must be entered as a decimal, 0 < a < 1 .
$$\text{invNorm}(a) = z \iff \int_{-\infty}^{z} N(0,1)\,dx = a$$
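A minimal Python sketch of the same idea, assuming the standard-library `statistics.NormalDist`: its `inv_cdf` method takes a left-hand area and returns the matching value. The wrapper name `inv_norm` and the sample area 0.975 are illustrative choices.

```python
from statistics import NormalDist

def inv_norm(area, mu=0.0, sigma=1.0):
    """Value whose left-hand area under the normal curve equals `area`.

    Hypothetical wrapper mirroring the calculator command; `area` must
    satisfy 0 < area < 1, otherwise inv_cdf raises StatisticsError.
    """
    return NormalDist(mu, sigma).inv_cdf(area)

z = inv_norm(0.975)
round_trip = NormalDist().cdf(z)   # feeding z back into the cdf recovers the area

print(round(z, 2), round(round_trip, 3))   # 1.96 0.975
```

The round trip illustrates the equivalence: the cumulative area and the inverse lookup undo each other.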
Example 5: On the standard normal distribution N (0,1) , what z-score corresponds to an area of 0.85,
as read from the left?
Discussion: This problem can be rephrased as P( X ≤ z ) = 0.85 . The invNorm feature is designed to
return z-scores such that the area from the left up to the z-score is the desired percentage. Call up
invNorm and enter in the area: invNorm(0.85) = 1.036. Therefore, z = 1.036 . This z-score is
graphed with its corresponding area:
Remember, invNorm always returns z-scores for areas from the left up to the z-score.
Example 6: On the standard normal distribution N (0,1) , what z-score determines an area of 0.25 to
the right of the z? In other words, what is the minimum z-score needed to be in the “top 25%”?
Discussion: Graphically, the top 25% is the shaded area extending from the (unknown) z to the right:
However, the invNorm feature is designed to return the z-score for an area measured from the left up to z as a right endpoint. Since an area of 0.25 to the right of z leaves 1 − 0.25 = 0.75 to its left, the problem needs to be restated as invNorm(0.75), which is 0.674.
Therefore, one needs to be 0.674 standard deviations above the mean to be at the minimum
threshold for the “top 25%”.
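The right-tail restatement used in Example 6 can be sketched in Python as follows, assuming the standard-library `statistics.NormalDist`. The helper name `z_for_right_tail` is an illustrative assumption.

```python
from statistics import NormalDist

def z_for_right_tail(area_right):
    """z-score that leaves `area_right` of the N(0,1) area to its right.

    The inverse lookup works with left-hand areas, so the right-hand area
    is first converted to its complement.
    """
    return NormalDist().inv_cdf(1 - area_right)

z_top_25 = z_for_right_tail(0.25)
z_bottom_25 = NormalDist().inv_cdf(0.25)   # by symmetry, the mirror image

print(round(z_top_25, 3), round(z_bottom_25, 3))
```

By symmetry the cutoff for the bottom 25% is the negative of the cutoff for the top 25%.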
Example 7: In a large college class, the mean score on the midterm was µ = 66 with a standard
deviation of 8 points. The professor will give anyone who scored in the top 40% a B grade, and
anyone who scored in the top 12% an A grade. What are the minimum scores needed for a B grade?
an A grade?
Discussion: This distribution is N (66,64) . Be sure to note that var = 64, so that the standard
deviation is σ = √var = √64 = 8 . In using the invNorm feature, we are assuming the N (0,1)
standard normal distribution. Therefore, we will need to find the z-scores in N (0,1) then convert
back to the N (66,64) setting.
To be in the top 40% means you scored better than 60% of the rest of the students. On the standard
normal distribution, invNorm(0.6) = 0.253. We need to convert this figure from the N (0,1) setting
back to the N (66,64) by working the z-score formula in reverse:
$$z = \frac{x - \mu}{\sigma} \;\Rightarrow\; 0.253 = \frac{x - 66}{8}$$
Solving for x, we get: x = 66 + 0.253(8) = 68.024 . Therefore, the minimum score needed for a B on
this test is a 68 (rounding as needed).
For an A grade, the same approach is used: to be in the top 12% means the student is “better” than
88% of the students: invNorm(0.88) = 1.175. Converting back to x, we get:
$$z = \frac{x - \mu}{\sigma} \;\Rightarrow\; 1.175 = \frac{x - 66}{8}$$
Solving for x, we get x = 75.4 . Therefore, a minimum score of 75 (rounded) gives the student an A
grade.
Sometimes, “grading on the curve” means to adapt the scores onto a normal distribution such as the
above example.
The parameters for mean and standard deviation can also be included with the calculator command.
For example, the above calculation could be entered as invNorm(0.88,66,8), which gives
75.3998…, or 75.4 as was determined.
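The grade cutoffs in Example 7 can be reproduced with a brief Python sketch, assuming the standard-library `statistics.NormalDist`. The helper `cutoff_for_top` is a hypothetical name introduced for illustration.

```python
from statistics import NormalDist

scores = NormalDist(mu=66, sigma=8)   # N(66,64): variance 64, standard deviation 8

def cutoff_for_top(fraction):
    """Minimum score needed to be in the top `fraction` of the class."""
    return scores.inv_cdf(1 - fraction)

b_cutoff = cutoff_for_top(0.40)       # better than 60% of the class
a_cutoff = cutoff_for_top(0.12)       # better than 88% of the class

# Same A cutoff by hand: find z on N(0,1), then convert back with x = mu + z * sigma
a_by_hand = 66 + NormalDist().inv_cdf(0.88) * 8

print(round(b_cutoff, 2), round(a_cutoff, 2), round(a_by_hand, 2))
```

Supplying the mean and standard deviation directly and converting a z-score by hand give the same cutoff, up to rounding of the z-score.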
Example 8: The average speed for vehicles along one notorious stretch of freeway is µ = 75 miles
per hour with a standard deviation of σ = 6 miles per hour. The police will ticket anyone caught in
the top 0.5% of speeders. What is the minimum speed one needs to go to get a ticket?
Discussion: Being in the top 0.5% of speeders is equivalent to being faster than 99.5% of all drivers.
Hence, invNorm(0.995) = 2.58. This z-score is converted back to its corresponding x:
$$z = \frac{x - \mu}{\sigma} \;\Rightarrow\; 2.58 = \frac{x - 75}{6} \;\Rightarrow\; x = 90.48$$
Therefore, a minimum speed of 90 miles per hour (rounded) earns you an automatic ticket.
Example 9: You took a standardized exam and scored 95 points. The mean was µ = 71 . Your friend
scored 80 points and was found to be in the 85th percentile (meaning he scored better than 85% of all
other test takers). What is your percentile?
Discussion: The only missing piece of information here is the standard deviation. Your friend’s
result can be used to determine the standard deviation. Since he was in the 85th percentile, your
friend’s z-score is invNorm(0.85) = 1.036. Working this into the z-score conversion formula with
σ unknown, we can solve for σ :
$$z = \frac{x - \mu}{\sigma} \;\Rightarrow\; 1.036 = \frac{80 - 71}{\sigma} \;\Rightarrow\; 1.036\,\sigma = 9 \;\Rightarrow\; \sigma = \frac{9}{1.036} = 8.69$$
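The step of recovering the standard deviation from the friend's percentile can be sketched in Python, assuming the standard-library `statistics.NormalDist`; the variable names are illustrative. Because the code keeps the unrounded z-score, its σ differs slightly from the hand calculation, which uses z rounded to 1.036.

```python
from statistics import NormalDist

mean = 71
friend_score, friend_percentile = 80, 0.85

# z-score that has 85% of the standard normal area to its left
friend_z = NormalDist().inv_cdf(friend_percentile)

# Solve z = (x - mean) / sigma for sigma
sigma = (friend_score - mean) / friend_z

# Sanity check: with this sigma, a score of 80 sits at the 85th percentile
check = NormalDist(mean, sigma).cdf(friend_score)

print(round(friend_z, 3), round(sigma, 2), round(check, 2))
```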
Therefore we can determine your z-score:
$$z = \frac{95 - 71}{8.69} = 2.762$$
Thus, your percentile is equivalent to the area below z = 2.762 :
$$P(Z \le 2.762) = 0.997$$
Therefore, you scored in the 99.7th percentile.
Two final comments regarding the normal distribution:
• P( X = k ) = 0 for all k, for the reasons outlined in section 9.4.
• The inequalities ≤ and <, and ≥ and >, are used interchangeably: P( X < k ) = P( X ≤ k ) ,
and so forth. Inclusion or exclusion of the endpoint has no effect on the probability. By
default, the inequalities ≤ and ≥ are used. Thus, a statement such as “X is at most 20” can
be interpreted as P( X ≤ 20) or P( X < 20) ; both are the same.
Summary
• The standard normal distribution is a bell-shaped curve with mean µ = 0 , variance
var = 1 and standard deviation σ = 1 .
• Any normal distribution is written N ( µ , var) . The standard normal distribution is called
the N (0,1) distribution.
• The standard normal distribution is a probability density function defined for − ∞ < x < ∞
and an area of 1 below the graph.
• Probabilities are read as areas under the curve.
• About 68% of the data is within one standard deviation of the mean, 95% of the data is
within two standard deviations of the mean, and over 99% of the data is within three
standard deviations of the mean. To 3-decimal place accuracy:
P(−1 ≤ X ≤ 1) = 0.683
P(−2 ≤ X ≤ 2) = 0.954
P(−3 ≤ X ≤ 3) = 0.997
• The standard normal distribution is symmetric: the area to the left of the mean µ = 0 is 0.5
and to the right is 0.5.
P( X ≤ 0) = 0.5
P( X ≥ 0) = 0.5
• To convert from a general normal distribution with mean µ and standard deviation σ to
corresponding z-scores on the N (0,1) distribution, use the normalization formula
$$z = \frac{x - \mu}{\sigma}$$
• invNorm(a) returns a z-score on the standard normal curve N (0,1) such that the area
from the left up to the z-score is the desired area a, which must be entered as a decimal,
0 < a < 1.
$$\text{invNorm}(a) = z \iff \int_{-\infty}^{z} N(0,1)\,dx = a$$