CHAPTER 3
CONTINUOUS RANDOM VARIABLES AND THE NORMAL DISTRIBUTION
1. Difference Between Discrete and Continuous Random Variables
   1.1. Discrete Random Variables
   1.2. Continuous Random Variables
2. The Normal Distribution
   2.1. Finding Probabilities in a Normal Distribution
      2.1.1. Using the Standard Normal Distribution to Find Probability of x
      2.1.2. The Standard Normal Table (z-Table)
   2.2. Finding the x Value for a Given Probability
   2.3. z scores Frequently Used in Inferential Statistics
Random variables are divided into two general categories: discrete random variables and continuous random variables.
1. Difference Between Discrete and Continuous Random Variables
The difference between the probability distribution of discrete versus continuous random variables can be
explained by the way they are presented graphically.
1.1. Discrete Random Variables
In the case of guessing the answers to the 5-question multiple choice exam and letting x denote the number of
correct answers guessed, the probability distribution of x is as follows:
x      0       1       2       3       4       5       Total
f(x)   0.2373  0.3955  0.2637  0.0879  0.0146  0.0010  1.0000
Since x is discrete, it takes on discrete values and the probability of each value is specified. If you can
enumerate or list every value of the random variable, and each value has a non-zero probability, then you
have a discrete random variable.
Note: Some textbooks use the notation P(x) in place of f(x). The term f(x) is called a probability density
function. Probability density refers to the point on the graph corresponding to a given value of the random
variable x. In other words, probability density is the height of the graph at the given value of the random
variable. In the chart below, for example, f(2) = 0.2637.
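The table values can be reproduced with the binomial formula. The sketch below assumes n = 5 questions and a guessing probability of p = 0.25 per question (four answer choices); the value of p is not stated in this section and is inferred from the table, so treat it as an assumption.

```python
from math import comb

# Assumptions (not stated in the notes): n = 5 questions, and each guess is
# correct with probability p = 0.25 (four answer choices per question).
n, p = 5, 0.25


def f(x: int) -> float:
    """Probability of exactly x correct guesses (binomial formula)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Probability of each possible value of x, rounded to four decimals.
table = {x: round(f(x), 4) for x in range(n + 1)}
for x, prob in table.items():
    print(x, prob)

# The probabilities of all possible values must add up to 1.
total = sum(f(x) for x in range(n + 1))
print("total:", round(total, 4))
```

Under these assumptions the rounded values agree with the table, including f(2) = 0.2637.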
CH 3—The Normal Distribution
Page 1 of 16
[Figure: Probability Density Function of a Discrete Random Variable. Bar chart of f(x) against x = 0, 1, 2, 3, 4, 5, with bar heights 0.2373, 0.3955, 0.2637, 0.0879, 0.0146, and 0.0010.]
1.2. Continuous Random Variables
Unlike a discrete random variable, you cannot enumerate or list all the values of a continuous random variable. A continuous random variable takes on an infinite number of values within an interval. Graphically, the distinguishing features of a continuous random variable are:
• The density function of a continuous random variable, f(x), is not represented by a bar graph. It is, rather, shown as a continuous graph, a smooth curve.
• Because probability cannot be defined for a single value of x, the height of f(x) at a given value of x does not represent the probability of that value. The continuous random variable can take on any of the infinite number of values in a given range. Thus the probability that x will be equal to a single value is, loosely speaking, 1⁄∞, which is 0.
• Probability is defined, instead, for an interval of x and is represented by the area under f(x) bounded by that interval.
2. The Normal Distribution
The normal distribution is the most familiar continuous distribution. It is a family of continuous bell-shaped distributions. Each particular normal distribution is defined by two summary characteristics: μ, the mean, and σ, the standard deviation. μ and σ are called the parameters of the distribution. The shorthand expression of the normal distribution is as follows:
𝑥~𝑁(μ, σ)
The above expression is read: "𝑥 is normally distributed with mean of μ and standard deviation σ."
The mathematical formula for the probability density function of 𝑥~𝑁(μ, σ) is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
Plotting this function for fixed values of μ and σ and different values of x generates a bell-shaped curve that is symmetrical about the mean μ. The following diagram shows the graphs of four normal curves, four members of the infinite family of normal curves. Two of the curves have different means (μ₁ = 10 and μ₂ = 20) but the same standard deviation. The other two share the same mean (μ₃ = 40) but have different standard deviations. Note that the smaller the standard deviation σ is, the narrower the curve. This shows that with a small σ, the values of the random variable x are more clustered around the mean.
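The density formula and the two properties just described (symmetry about μ, and a narrower, taller curve for a smaller σ) can be checked numerically. This is a minimal sketch: the function name `normal_pdf` and the σ values 2 and 4 are hypothetical choices for illustration, and Python's `statistics.NormalDist` is used only as a cross-check.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the N(mu, sigma) density curve at x."""
    return (1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - mu) / sigma) ** 2)


# Hypothetical parameters: same mean, two different standard deviations.
mu = 40
narrow_peak = normal_pdf(mu, mu, sigma=2)  # height at the mean, small sigma
wide_peak = normal_pdf(mu, mu, sigma=4)    # height at the mean, large sigma

# Symmetry about the mean: equal heights at mu - 3 and mu + 3.
left = normal_pdf(mu - 3, mu, 4)
right = normal_pdf(mu + 3, mu, 4)

# Cross-check the hand-written formula against the standard library.
library_value = NormalDist(mu, 4).pdf(mu - 3)

print(narrow_peak, wide_peak)
print(left, right, library_value)
```

Because the total area under every normal curve is 1, the curve with the smaller σ must be taller at the mean, which is what the first comparison shows.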
[Figure: Four Members of the Family of Normal Curves. Curves A, B, C, and D; the horizontal axis is marked at 10, 20, and 40.]
The total area under each normal curve is equal to 1. The probability of x taking on values within a given interval, say x₁ ≤ x ≤ x₂, is the area under the curve bounded by x₁ and x₂, as shown in the following diagram. Note that P(x = x₁) = P(x = x₂) = 0, that is, the probability of x being equal to a given value is zero. Therefore, P(x₁ < x < x₂) = P(x₁ ≤ x ≤ x₂).
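To make "probability is an area" concrete, the sketch below approximates the area under a normal density with thin rectangles (the midpoint rule) and compares it with the difference of two cumulative probabilities. The parameters μ = 10, σ = 2 and the interval from 8 to 12 are illustrative assumptions, and `area_under_pdf` is a hypothetical helper name.

```python
from statistics import NormalDist

dist = NormalDist(mu=10, sigma=2)  # illustrative parameters
x1, x2 = 8.0, 12.0                 # illustrative interval


def area_under_pdf(lo: float, hi: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the area under dist.pdf on [lo, hi]."""
    width = (hi - lo) / steps
    return sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width


# Area between x1 and x2, computed two ways.
area = area_under_pdf(x1, x2)
by_cdf = dist.cdf(x2) - dist.cdf(x1)

# Area over a vanishingly narrow interval starting at x1.
tiny = area_under_pdf(x1, x1 + 1e-9, steps=1)

print(area, by_cdf, tiny)
```

As the interval shrinks toward a single point the area shrinks toward zero, which is why including or excluding the endpoints does not change the probability.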
[Figure: Probability is measured as the area under the curve for an interval of values of x. The shaded area between x₁ and x₂ is P(x₁ ≤ x ≤ x₂).]
2.1. Finding Probabilities in a Normal Distribution
Consider a normally distributed random variable 𝑥 with μ = 10 and σ = 2. What is the probability that 𝑥 is
less than or equal to 8? Alternatively stated, what proportion of the 𝑥 values are less than or equal to 8? Find
𝑃(𝑥 ≤ 8). That is, find the area to the left of 𝑥 = 8 under the normal curve with a mean of 10 and standard
deviation of 2.
[Figure: Probability of x less than or equal to 8 is the area under the curve to the left of x = 8. Normal curve with µ = 10 and σ = 2; the shaded left-tail area is P(x ≤ 8).]
To find this probability, first you have to convert 𝑥 into the standard normal random variable 𝑧.
2.1.1. Using the Standard Normal Distribution to Find Probability of x
Unlike the binomial distribution, there is no simple formula used to find the probability involving the normal
distribution. In the absence of a computer you must rely on a table. To use this table you should first
transform 𝑥 into another random variable called the standard normal random variable, denoted by 𝑧. The
familiar formula to transform 𝑥 into 𝑧 is
$$z = \frac{x - \mu}{\sigma}$$
In Chapter 1 it was shown that z measures the deviation of x values from the mean in units of standard deviation. Thus, when z = 2, x is two standard deviations above the mean: x = μ + 2σ. And, when z = −2, x is two standard deviations below the mean: x = μ − 2σ. This leads us to two important properties of z:
μ_z = 0 and σ_z = 1
(See footnote 1 below.)
1 Show that μ_z = 0 and σ_z = 1.
First, rewrite the z equation as
$$z = \frac{x - \mu}{\sigma} = -\frac{\mu}{\sigma} + \frac{1}{\sigma}x$$
Note that z is a linear transformation of the random variable x, z = a + bx, where a = −μ⁄σ and b = 1⁄σ. Using the arithmetic properties of the expected value and standard deviation of the random variable x,
$$E(z) = -\frac{\mu}{\sigma} + \frac{1}{\sigma}E(x) = -\frac{\mu}{\sigma} + \frac{1}{\sigma}\mu = 0$$
$$\mathrm{sd}(z) = \frac{1}{\sigma}\,\mathrm{sd}(x) = \frac{1}{\sigma}\,\sigma = 1$$
In short, if 𝑥 is normally distributed with a mean of μ and standard deviation of σ, then 𝑧 is also normally
distributed but with a mean of 0 and standard deviation of 1. These properties of 𝑧 allow for the development
of a single probability table that can be used to find all normal probabilities.
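The two properties of z can be illustrated with Python's `statistics.NormalDist`, which supports shifting and scaling a normal distribution by constants. This is a sketch only; the parameters μ = 10 and σ = 2 are illustrative.

```python
from statistics import NormalDist

x = NormalDist(mu=10, sigma=2)  # illustrative parameters

# z = (x - mu) / sigma: subtract the mean, then divide by the standard deviation.
z = (x - x.mean) / x.stdev
print(z.mean, z.stdev)

# The same left-tail area, measured on the x scale and on the z scale.
p_on_x_scale = x.cdf(8)
p_on_z_scale = z.cdf((8 - x.mean) / x.stdev)
print(p_on_x_scale, p_on_z_scale)
```

Whatever μ and σ are chosen for x, the transformed variable has mean 0 and standard deviation 1, which is why a single table is enough.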
Example 1
Let 𝑥 be a normally distributed random variable with a mean of μ = 10 and standard deviation of σ = 2. Find
P(𝑥 ≤ 8).
First transform x to z. Given μ = 10 and σ = 2:
$$z = \frac{x - \mu}{\sigma} = \frac{8 - 10}{2} = -1.00$$
Thus,
P(x ≤ 8) = P(z ≤ −1.00)
Now you need to find the area under the z curve to the left of 𝑧 = −1.00. Using the z table you can see that
this area is equal to 0.1587. Finding this value is explained below.
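When a computer is available, the table lookup in Example 1 can be replaced by a cumulative distribution function. The sketch below uses Python's `statistics.NormalDist` as a stand-in for the z table; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 10, 2

# Step 1: transform x = 8 into a z score.
z = (8 - mu) / sigma

# Step 2: area under the standard normal curve to the left of z.
p = NormalDist().cdf(z)

# Same probability computed directly on the x scale, skipping the transformation.
p_direct = NormalDist(mu, sigma).cdf(8)

print(z, round(p, 4), round(p_direct, 4))
```

Both routes give the same left-tail area, which rounds to the table value 0.1587.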
[Figure: standard normal curve with µ = 0 and σ = 1. The mean of the standard normal distribution z is always μ = 0 and the standard deviation is always σ = 1. The probability of z less than or equal to −1, P(z ≤ −1), is the area under the z curve to the left of z = −1.]
2.1.2. The Standard Normal Table (z Table)
The z table (see footnote 2) provides the area under the z curve to the left of the z score. For example, if z = −1.00, then the table provides the area under the curve to the left of −1.00. A portion of the z table is reproduced below. The complete table shows the areas for z values ranging from −4.00 to 3.99. As the table shows,
P(z < −1.00) = 0.1587.
2 The z table is available on my website in E270 LECTURE NOTES/2 EXCEL FILES/3 Normal Distribution. Click on the tab “z table”.
z       P(Z < z)     z       P(Z < z)     z       P(Z < z)     z      P(Z < z)
-1.50   0.0668       -1.00   0.1587       -0.50   0.3085       0.00   0.5000
-1.49   0.0681       -0.99   0.1611       -0.49   0.3121       0.01   0.5040
-1.48   0.0694       -0.98   0.1635       -0.48   0.3156       0.02   0.5080
-1.47   0.0708       -0.97   0.1660       -0.47   0.3192       0.03   0.5120
-1.46   0.0721       -0.96   0.1685       -0.46   0.3228       0.04   0.5160
-1.45   0.0735       -0.95   0.1711       -0.45   0.3264       0.05   0.5199
-1.44   0.0749       -0.94   0.1736       -0.44   0.3300       0.06   0.5239
-1.43   0.0764       -0.93   0.1762       -0.43   0.3336       0.07   0.5279
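Rows of the z table can be regenerated from a cumulative distribution function. The sketch below is not the Excel workbook mentioned in the footnote; it uses Python's `statistics.NormalDist`, and `z_table_rows` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_table_rows(start: float, count: int, step: float = 0.01):
    """Return (z, P(Z < z)) pairs, with the area rounded to four decimals."""
    rows = []
    for i in range(count):
        z = round(start + i * step, 2)
        rows.append((z, round(std_normal.cdf(z), 4)))
    return rows


# Reproduce the four column blocks of the excerpt, eight rows each.
for start in (-1.50, -1.00, -0.50, 0.00):
    for z, area in z_table_rows(start, 8):
        print(f"{z:6.2f}  {area:.4f}")
    print()
```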
Example 2
Find P(𝑧 < 1.00).
From the table,
P(𝑧 < 1.00) = 0.8413.
[Figure: standard normal curve; the shaded area to the left of z = 1.00 is P(z ≤ 1.00) = 0.8413.]
Example 3
Find P(z > 1.25)
From the complete z table,
P(z < 1.25) = 0.8944
Therefore,
P(z > 1.25) = 1 – 0.8944 = 0.1056.
[Figure: standard normal curve; the area to the left of z = 1.25 is 0.8944 and the right-tail area is P(z ≥ 1.25) = 1 − 0.8944 = 0.1056.]
Note: Since the standard normal distribution is symmetric about the mean 0, when asked to find the area to the right of a given z score, rather than finding the area to the left and subtracting the result from 1, you can find the area directly by using the negative of that z score. Thus, P(z > 1.25) = P(z < −1.25) = 0.1056.
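The two ways of finding a right-tail area in Example 3 can be compared side by side. A minimal sketch, again using Python's `statistics.NormalDist` in place of the z table:

```python
from statistics import NormalDist

std_normal = NormalDist()

# Route 1: area to the left of 1.25, subtracted from the total area of 1.
right_tail = 1 - std_normal.cdf(1.25)

# Route 2: by symmetry, the area to the left of -1.25.
by_symmetry = std_normal.cdf(-1.25)

print(round(right_tail, 4), round(by_symmetry, 4))
```

The two results agree up to floating-point error, mirroring P(z > 1.25) = P(z < −1.25).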
[Figure: standard normal curve; the left-tail area below z = −1.25 is 0.1056, so P(z ≥ 1.25) = P(z ≤ −1.25) = 0.1056.]
Example 4
Find P(−1.40 < 𝑧 < 1.40)
From the complete z table,
P(𝑧 < 1.4) = 0.9192
and
P(z < −1.4) = 0.0808.
Therefore,
P(−1.40 < z < 1.40) = P(z < 1.40) – P(z < −1.40) = 0.9192 – 0.0808 = 0.8384.
[Figure: standard normal curve; the area to the left of z = −1.4 is 0.0808 and the area to the left of z = 1.4 is 0.9192.]
Example 5
Let x be a normally distributed random variable with a mean of µ = 10 and standard deviation of σ = 2. Find
P(8 < x < 12).
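Given μ = 10 and σ = 2, convert both boundaries to z scores:
$$z = \frac{x - \mu}{\sigma} = \frac{8 - 10}{2} = -1.00$$
and
$$z = \frac{12 - 10}{2} = 1.00$$
Thus, for the given mean and standard deviation of x,
P(8 < x < 12) = P(−1.00 < z < 1.00)
P(8 < x < 12) = P(z < 1.00) − P(z < −1.00)
P(8 < x < 12) = 0.8413 − 0.1587 = 0.6826
(Unrounded areas give 0.6827; the four-decimal table values differ from it only by rounding.)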
Note: When asked to find the area between two z scores that are symmetric about the mean 0, it is easier to simply double the area of the left tail and subtract the result from 1.
P(−1.00 < z < 1.00) = 1 − 2 × P(z < −1.00)
P(−1.00 < z < 1.00) = 1 − 2 × 0.1587 = 0.6826 (0.6827 with unrounded areas)
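The probability between two values, and the shortcut for intervals that are symmetric about the mean, can be checked numerically. This sketch uses Python's `statistics.NormalDist`; `prob_between` is a hypothetical helper name. Because the computation is not rounded to four decimals at each step, the last digit can differ from the table-based answers in Examples 4 and 5.

```python
from statistics import NormalDist


def prob_between(lo: float, hi: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(lo < x < hi) for x ~ N(mu, sigma): difference of two left-tail areas."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi) - dist.cdf(lo)


# Example 5: P(8 < x < 12) with mu = 10 and sigma = 2.
p_x = prob_between(8, 12, mu=10, sigma=2)

# Shortcut for a symmetric interval: 1 minus twice the left-tail area.
p_shortcut = 1 - 2 * NormalDist().cdf(-1.00)

# Example 4: P(-1.40 < z < 1.40) on the standard normal scale.
p_ex4 = prob_between(-1.40, 1.40)

print(p_x, p_shortcut, p_ex4)
```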
[Figure: standard normal curve; the middle area between z = −1.00 and z = 1.00 is 0.6827, with 0.1587 in each tail.]
Example 6
Given μ = 10 and σ = 2, find P(6 < 𝑥 < 14).
$$z_1 = \frac{6 - 10}{2} = -2.00$$
$$z_2 = \frac{14 - 10}{2} = 2.00$$
P(−2.00 < z < 2.00) = P(z < 2.00) − P(z < −2.00)
P(−2.00 < z < 2.00) = 0.9772 − 0.0228 = 0.9544
Alternatively, you can find the same probability by,
P(−2.00 < z < 2.00) = 1 − 2 × P(z < −2.00)
P(−2.00 < z < 2.00) = 1 − 2 × 0.0228 = 0.9544
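[Figure: standard normal curve; the middle area between z = −2.00 and z = 2.00 is 0.9545 (0.9544 when computed from the rounded table values), with 0.0228 in each tail.]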
Example 7
Suppose the vehicle speed on I-65 between Lebanon and Gary is normally distributed with a mean of 74 mph
and a standard deviation of 5 mph.
a) What is the probability that a vehicle clocked at random is traveling slower than 65 mph? Or, what
proportion of vehicles are traveling below 65 mph? Given 𝜇 = 74 and 𝜎 = 5, find
P(𝑥 < 65)
Solution
$$z = \frac{65 - 74}{5} = -1.80$$
P(𝑧 < −1.80) = 0.0359
b) What is the probability that the vehicle is traveling faster than 80 mph? Or, what proportion of vehicles
are traveling faster than 80 mph? Given 𝜇 = 74 and 𝜎 = 5, find
P(𝑥 > 80)
Solution
$$z = \frac{80 - 74}{5} = 1.20$$
P(𝑧 > 1.20) = 1 − P(𝑧 < 1.20)
P(𝑧 > 1.20) = 1 − 0.8849 = 0.1151
Note: To find the area or probability to the right of a 𝑧-score, look up the area to the left of the negative of
that 𝑧-score:
P(𝑧 > 1.20) = P(𝑧 < −1.20) = 0.1151
c) What proportion of drivers drive between 75 and 85 mph? Given 𝜇 = 74 and 𝜎 = 5, find
P(75 < 𝑥 < 85)
Solution
$$z_1 = \frac{75 - 74}{5} = 0.20$$
$$z_2 = \frac{85 - 74}{5} = 2.20$$
P(0.20 < z < 2.20) = P(z < 2.20) − P(z < 0.20)
P(0.20 < z < 2.20) = 0.9861 − 0.5793 = 0.4068
d) What fraction or proportion of vehicles drive within ±6 mph from the mean? Given 𝜇 = 74 and 𝜎 = 5,
find
P(𝜇 − 6 < 𝑥 < 𝜇 + 6) = P(68 < 𝑥 < 80)
Solution
$$z_1 = \frac{68 - 74}{5} = -1.20$$
$$z_2 = \frac{80 - 74}{5} = 1.20$$
P(−1.20 < z < 1.20) = 1 − 2 × P(z < −1.20)
P(−1.20 < z < 1.20) = 1 − 2 × 0.1151 = 0.7698 (0.7699 with unrounded areas)
e) What fraction or proportion of vehicles drive within ±2 standard deviations from the mean? Given μ =
74 and σ = 5, find
P(𝜇 − 2𝜎 < 𝑥 < 𝜇 + 2𝜎) = P(74 − 2 × 5 < 𝑥 < 74 + 2 × 5)
P(𝜇 − 2𝜎 < 𝑥 < 𝜇 + 2𝜎) = P(64 < 𝑥 < 84)
$$z_1 = \frac{64 - 74}{5} = -2.00$$
$$z_2 = \frac{84 - 74}{5} = 2.00$$
Note: The term “2 standard deviations from the mean” is an expression of the distance of 𝑥 from the mean in
units of standard deviation, that is, 𝑧 = 2.00. Thus,
P(−2.00 < 𝑧 < 2.00) = 0.9544
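All five parts of Example 7 can be checked in a few lines by working directly on the x scale. A sketch using Python's `statistics.NormalDist`; the variable names are illustrative, and results may differ from the table-based answers in the fourth decimal because no intermediate rounding takes place.

```python
from statistics import NormalDist

speed = NormalDist(mu=74, sigma=5)  # vehicle speed in mph

a = speed.cdf(65)                   # a) slower than 65 mph
b = 1 - speed.cdf(80)               # b) faster than 80 mph
c = speed.cdf(85) - speed.cdf(75)   # c) between 75 and 85 mph
d = speed.cdf(80) - speed.cdf(68)   # d) within 6 mph of the mean
e = speed.cdf(84) - speed.cdf(64)   # e) within 2 standard deviations of the mean

for label, p in zip("abcde", (a, b, c, d, e)):
    print(label, round(p, 4))
```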
2.2. Finding the x Value for a Given Probability
In many statistical problems involving the normal distribution you will be asked to find the value or an
interval of values of 𝑥 that bounds a given area under the normal curve. In other words, you will be given a
probability and asked to find the 𝑥 value (or values) corresponding to that probability.
Example 8
Let x be a normally distributed random variable with μ = 20 and σ = 5. The area under the normal curve to the left of an unknown value of x is 0.2005. Find the x value.
[Figure: Find the x value that bounds a left-tail area of 0.2005.]
To find 𝑥, you have to work your way back starting with the 𝑧 table. Look up the z score that corresponds to
the area or probability 0.2005. The z score corresponding to this area or probability is 𝑧1 = −0.84.
[Figure: Find the z value that bounds a left-tail area of 0.2005. Standard normal curve with μ = 0.]
Now, using
$$z = \frac{x - \mu}{\sigma}$$
solve for 𝑥:
𝑥 = μ + 𝑧𝜎
Thus,
𝑥 = 20 + (−0.84)(5) = 15.8
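Working backward from a probability to an x value is an inverse lookup. The sketch below mirrors the steps of Example 8 with Python's `statistics.NormalDist`, whose `inv_cdf` method plays the role of reading the z table in reverse; the variable names are illustrative.

```python
from statistics import NormalDist

dist = NormalDist(mu=20, sigma=5)
left_tail_area = 0.2005

# Step 1: z score that bounds the given left-tail area.
z = NormalDist().inv_cdf(left_tail_area)

# Step 2: convert z back to the x scale with x = mu + z * sigma.
x = dist.mean + z * dist.stdev

# The same answer in one step, directly on the x scale.
x_direct = dist.inv_cdf(left_tail_area)

print(round(z, 2), round(x, 2), round(x_direct, 2))
```

The computed z is not rounded to two decimals first, so x can differ slightly from the table-based 15.8.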
Example 9
Let x be a normally distributed random variable with μ = 20 and σ = 5. The area under the normal curve to the right of an unknown value of x is 0.2546. Find the x value.
Solution
[Figure: Find the x value that bounds a right-tail area of 0.2546.]
In the z table, the z score corresponding to the area 0.2546 is −0.66. This is the z score that bounds a left-tail area of 0.2546. Since we are interested in the right-tail area, by the symmetry of the standard normal distribution the z score we need is the positive value, z = 0.66.
Thus:
𝑥 = μ + 𝑧σ = 20 + (0.66)(5) = 23.3
Example 10
Suppose the vehicle speed on the rural stretch of I-65 is normally distributed with a mean of 74 mph and a
standard deviation of 8 mph. If the State Police planned to ticket the top 30 percent of the speeders, above
what speed should the State Police issue tickets?
[Figure: Find the x value that bounds a right-tail area of 0.3000. Normal curve with μ = 74.]
The closest area in the z table to 0.3000 is 0.3015, the corresponding z score for which is −0.52. But since we
are looking for the z score for the top 30 percent we ignore the negative sign.
𝑥 = μ + 𝑧σ = 74 + (0.52)(8) = 78.16
The State Police should issue tickets for vehicles going over 78 mph.
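The ticketing cutoff in Example 10 is the speed with 70 percent of vehicles below it and 30 percent above it. A sketch using Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

speed = NormalDist(mu=74, sigma=8)  # vehicle speed in mph
top_share = 0.30                    # share of vehicles to be ticketed

# A right-tail area of 0.30 means a left-tail area of 1 - 0.30 = 0.70.
cutoff = speed.inv_cdf(1 - top_share)

# Check: the share of vehicles above the cutoff should be 0.30.
share_above = 1 - speed.cdf(cutoff)

print(round(cutoff, 2), round(share_above, 4))
```

The result is slightly above the table-based 78.16 because the table forces z to be rounded to 0.52; both round to 78 mph.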
Example 11
Suppose the vehicle speed on the rural stretch of I-65 is normally distributed with a mean of 74 mph and a
standard deviation of 8 mph. Find the middle interval of speeds within which one-half (0.5 or 50%) of the
vehicles drive.
Solution
Denote the lower end or boundary of the interval as 𝑥1 and the upper end as 𝑥2 . Then we must find the values
for 𝑥1 and 𝑥2 such that
P(𝑥1 < 𝑥 < 𝑥2 ) = 0.50
[Figure: Find the x values that bound a middle area of 0.5000. Normal curve with μ = 74; the interval runs from x₁ to x₂.]
We can find the lower and upper boundaries, respectively, from x₁ = μ − zσ and x₂ = μ + zσ. Here for z we must insert the z score that bounds a tail area of 0.25. This z score is 0.67.
𝑥1 = μ − 𝑧σ = 74 − (0.67)(8) = 74 − 5.36 = 68.64
𝑥2 = μ + 𝑧σ = 74 + (0.67)(8) = 74 + 5.36 = 79.36
The above computation indicates that 50% of vehicles drive within ±5.36 mph from the mean speed.
Example 12
In the previous problem, within what speed interval do 90% of vehicles drive? Given 𝜇 = 74 and 𝜎 = 8, find
𝑥1 and 𝑥2 such that,
P(𝑥1 < 𝑥 < 𝑥2 ) = 0.90
Solution
Again, 𝑥1 = μ − 𝑧σ and 𝑥2 = μ + 𝑧σ . Here for z we must insert the z score that bounds a tail area of 0.05. This
z score is 1.64.
x₁ = μ − z0.05 σ = 74 − (1.64)(8) = 74 − 13.12 = 60.88 ≈ 60.9
x₂ = μ + z0.05 σ = 74 + (1.64)(8) = 74 + 13.12 = 87.12 ≈ 87.1
The above computation indicates that 90% of vehicles drive within ±13.12 mph from the average speed.
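Examples 11 and 12 follow the same recipe: split the leftover probability equally between the two tails, find the z score that bounds one tail, then step that many standard deviations to each side of the mean. A sketch with Python's `statistics.NormalDist`; `middle_interval` is a hypothetical helper name.

```python
from statistics import NormalDist


def middle_interval(mu: float, sigma: float, coverage: float):
    """Return (x1, x2) so that the middle `coverage` share lies between them."""
    tail = (1 - coverage) / 2           # area left in each tail
    z = NormalDist().inv_cdf(1 - tail)  # positive z bounding the right tail
    return mu - z * sigma, mu + z * sigma


lo50, hi50 = middle_interval(74, 8, 0.50)  # Example 11
lo90, hi90 = middle_interval(74, 8, 0.90)  # Example 12

print(round(lo50, 2), round(hi50, 2))
print(round(lo90, 2), round(hi90, 2))
```

The boundaries differ from the table-based ones by a few hundredths of a mph because z is not rounded to two decimals here.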
Example 13
Find the z scores corresponding to the following right tail areas under the standard normal curve.
a) 0.10
b) 0.05
c) 0.025
a) Denote the z score corresponding to the 0.10 right tail area as 𝑧0.10 . Then, using the symmetric property
of the standard normal distribution,
P(𝑧 > 𝑧0.10 ) = P(𝑧 < −𝑧0.10 ) = 0.10
In the z table the area closest to 0.10 is 0.1003. The z score corresponding to this area is −1.28. Therefore, on
the right tail z0.10 = 1.28
[Figure: The z score that bounds a right tail area of 0.1003 is z = 1.28.]
b) P(𝑧 > 𝑧0.05 ) = 0.05
P(𝑧 > 𝑧0.05 ) = P(𝑧 < −𝑧0.05 ) = 0.05
In the z table there are two areas that are equally close to 0.05. These are 0.0495 and 0.0505. Generally, the
higher number, 0.0505, is chosen because if the area were rounded to five decimals, 0.05050 is closer to 0.05
than 0.04947. The z score corresponding to 0.0505 is −1.64. Therefore,
𝑧0.05 = 1.64.
c) P(z > z0.025) = P(z < −z0.025) = 0.025
The z score corresponding to 0.025 is −1.96. Therefore
𝑧0.025 = 1.96.
Generally, let α denote the right tail area under the z curve bounded by zα. Then
P(z > zα) = P(z < −zα) = α
2.3. z scores Frequently Used in Inferential Statistics
The three z scores calculated above will be frequently used in the subsequent chapters covering inferential statistics. You should simply memorize them.
• The z score that bounds a right tail area of 0.10: z0.10 = 1.28
• The z score that bounds a right tail area of 0.05: z0.05 = 1.64
• The z score that bounds a right tail area of 0.025: z0.025 = 1.96
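These three values can be recovered from the inverse of the standard normal cumulative distribution. A sketch with Python's `statistics.NormalDist`; `z_alpha` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_alpha(alpha: float) -> float:
    """z score that bounds a right-tail area of alpha under the z curve."""
    return NormalDist().inv_cdf(1 - alpha)


# Round to two decimals, the precision of the z table.
scores = {alpha: round(z_alpha(alpha), 2) for alpha in (0.10, 0.05, 0.025)}
for alpha, z in scores.items():
    print(alpha, z)
```

Rounded to two decimals, the computed values match the three z scores listed above.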
[Figure: three standard normal curves showing right-tail areas of 0.10, 0.05, and 0.025, bounded by z = 1.28, z = 1.64, and z = 1.96, respectively.]