Normal Distribution

advertisement
Normal Distribution
Definition
A continuous rv X is said to have a normal distribution with parameter µ
and σ (µ and σ 2 ), where −∞ < µ < ∞ and σ > 0, if the pdf of X is
f (x; µ, σ) = √
1
2
2
e −(x−µ) /(2σ )
2πσ
We use the notation X ∼ N(µ, σ 2 ) to denote that X is rormally
distributed with parameters µ and σ 2 .
Remark:
1. Obviously, f (x) ≥ 0 for
R ∞all x;1 −(y −µ)2 /(2σ2 )
2. It is guaranteed that −∞ √2πσ
e
dy = 1.
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
1 / 20
Normal Distribution
Proposition
For X ∼ N(µ, σ 2 ), we have
E (X ) = µ and V (X ) = σ 2
σ=1
Liang Zhang (UofU)
σ=2
Applied Statistics I
σ = 0.5
June 30, 2008
2 / 20
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
3 / 20
Normal Distribution
The cdf of a normal random variable X is
Z x
F (x) = P(X ≤ x) =
f (y ; µ, σ)dy
−∞
Z x
1
2
2
√
=
e −(y −µ) /(2σ ) dy
2πσ
−∞ Z
x−µ
1
2
2
=√
e −(z) /(2σ ) dz
change of variable:z = y − µ
2πσ −∞
Z x−µ
σ
z
1
2
e −(w ) /2 · σdw
change of variable:w =
=√
σ
2πσ −∞
Z x−µ
σ
1
2
√ e −(w ) /2 dw
=
2π
−∞
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
4 / 20
Normal Distribution
Definition
The normal distribution with parameter values µ = 0 and σ = 1 is called
the standard normal distribution. A random variable having a standard
normal distribution is called a standard normal random variable and will
be denoted by Z . The pdf of Z is
1
2
f (z; 0, 1) = √ e −z /2
2π
−∞<z <∞
The graph of f (z; 0, R1) is called the standard normal (or z) curve. The cdf
z
of Z is P(Z ≤ z) = −∞ f (y ; 0, 1)dy , which we will denote by Φ(z).
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
5 / 20
Normal Distribution
Shaded area = Φ(0.5)
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
6 / 20
Normal Distribution
Table A.3
z
···
-1.2
-1.1
···
1.6
1.7
···
Standard Normal Curve Areas
.00
···
0.1151
0.1357
···
0.9452
0.9554
···
Liang Zhang (UofU)
.01
···
0.1131
0.1335
···
0.9463
0.9564
···
.02
···
0.1112
0.1314
···
0.9474
0.9573
···
.03
···
0.1094
0.1292
···
0.9484
0.9582
···
Applied Statistics I
.04
···
0.1075
0.1271
···
0.9495
0.9591
···
···
···
···
···
···
···
···
···
.09
···
0.0985
0.1170
···
0.9545
0.9633
···
June 30, 2008
7 / 20
Normal Distribution
Z ∼ N(0, 1), calculate (a)P(Z ≤ 1.61); (b)P(Z > −1.12); and
(c)P(−1.12 < Z ≤ 1.61).
z
···
-1.2
-1.1
···
1.6
1.7
···
.00
···
0.1151
0.1357
···
0.9452
0.9554
···
.01
···
0.1131
0.1335
···
0.9463
0.9564
···
.02
···
0.1112
0.1314
···
0.9474
0.9573
···
.03
···
0.1094
0.1292
···
0.9484
0.9582
···
.04
···
0.1075
0.1271
···
0.9495
0.9591
···
···
···
···
···
···
···
···
···
.09
···
0.0985
0.1170
···
0.9545
0.9633
···
P(Z ≤ 1.61) = 0.9463;
P(Z > −1.12) = 1 − P(Z ≤ −1.12) = 1 − 0.1314 = 0.8686;
P(−1.12 < Z ≤ 1.61) = P(Z ≤ 1.61) − P(Z ≤ −1.12) =
0.9463 − 0.1314 = 0.8149.
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
8 / 20
Normal Distribution
Many tables for the normal distribution contain only the nonnegative part.
z
.00
.01
.02
.03
.04
···
.09
···
···
···
···
···
···
···
···
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 · · · 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 · · · 0.9633
···
···
···
···
···
···
···
···
What is P(Z < −1.63)?
By symmetry of the pdf of Z , we know that
P(Z < −1.63) = P(Z > 1.63) = 1 − P(Z ≤ 1.63) = 1 − 0.9484 = 0.0516
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
9 / 20
Normal Distribution
Recall: The (100p)th percentile of the distribution of a continuous rv X ,
η(p), is defined by
Z
η(p)
p = F (η(p)) =
f (y )dy
−∞
Similarly, the (100p)th percentile of the standard normal rv Z is defined by
Z
η(p)
p = F (η(p)) =
−∞
1
2
√ e −y /2 dy
2π
We need to use the table for normal distribution to find (100p)th
percentile.
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
10 / 20
Normal Distribution
e.g. Find the 95th percentile for the standard normal rv Z
z
.00
.01
.02
.03
.04
0.5
···
···
···
···
···
···
···
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599
···
···
···
···
···
···
···
η(95) = 1.645, a linear interpolation of 1.64 and 1.65.
···
···
···
···
···
.09
···
0.9545
0.9633
···
Remark: If p does not appear in the table, we can either use the number
closest to it, or use the linear interpolation of the closest two.
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
11 / 20
Normal Distribution
In statistical inference, the percentiles corresponding to right small tails
are heavily used.
Notation
zα will denote the value on the z axis for which α of the area under the z
curve lies to the right of zα .
zα
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
12 / 20
Normal Distribution
Remark:
1. zα is the 100(1 − α)th percentile of the standard normal distribution.
2. By symmetry the area under the standard normal curve to the left of
−zα is also α.
3. The zα s are usually referred to as z critical values.
Percentile
α (tail area)
zα
90
0.1
1.28
Liang Zhang (UofU)
95
0.05
1.645
97.5
0.025
1.96
···
···
···
Applied Statistics I
99.95
0.0005
3.27
June 30, 2008
13 / 20
Normal Distribution
Proposition
If X has a normal distribution with mean µ and stadard deviation σ, then
Z=
X −µ
σ
has a standard normal distribution. Thus
a−µ
b−µ
≤Z ≤
)
σ
σ
b−µ
a−µ
= Φ(
) − Φ(
)
σ
σ
P(a ≤ X ≤ b) = P(
P(X ≤ a) = Φ(
Liang Zhang (UofU)
a−µ
)
σ
P(X ≥ b) = 1 − Φ(
Applied Statistics I
b−µ
)
σ
June 30, 2008
14 / 20
Normal Distribution
Example (Problem 38):
There are two machines available for cutting corks intended for use in wine
bottles. The first produces corks with diameters that are normally
distributed with mean 3cm and standard deviation 0.1cm. The second
produces corks with diameters that have a normal distribution with mean
3.04cm and standard deviation 0.02cm. Acceptable corks have diameters
between 2.9cm and 3.1cm. Which machine is more likely to produce an
acceptable cork?
3.1 − 3
2.9 − 3
≤Z ≤
)
0.1
0.1
= P(−1 ≤ Z ≤ 1) = 0.8413 − 0.1587 = 0.6826
2.9 − 3.04
3.1 − 3.04
P(2.9 ≤ X2 ≤ 3.1) = P(
≤Z ≤
)
0.02
0.02
= P(−7 ≤ Z ≤ 3) = 0.9987 − 0 = 0.9987
P(2.9 ≤ X1 ≤ 3.1) = P(
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
15 / 20
Normal Distribution
Example (Problem 44):
If bolt thread length is normally distributed, what is the probability that
the thread length of a randomly selected bolt is (a)within 1.5 SDs of its
mean value? (b)between 1 and 2 SDs from its mean value?
µ + 1.5σ − µ
µ − 1.5σ − µ
≤Z ≤
)
σ
σ
= P(−1.5 ≤ Z ≤ 1.5)
P(µ − 1.5σ ≤ X1 ≤ µ + 1.5σ) = P(
= 0.9332 − 0.0668 = 0.8664
µ+σ−µ
µ + 2σ − µ
≤Z ≤
)
σ
σ
= 2P(1 ≤ Z ≤ 2)
2 · P(µ + σ ≤ X1 ≤ µ + 2σ) = 2P(
= 2(0.9772 − 0.8413) = 0.0.2718
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
16 / 20
Normal Distribution
Proposition
{(100p)th percentile for N(µ, σ 2 )} =
µ + {(100p)th percentile for N(0, 1)} · σ
Example (Problem 39)
The width of a line etched on an integrated circuit chip is normally
distributed with mean 3.000 µm and standard deviation 0.140. What
width value separates the widest 10% of all such lines from the other 90%?
ηN(3,0.1402 ) (90) = 3.0 + 0.140 · ηN(0,1) (90) = 3.0 + 0.140 · 1.28 = 3.1792
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
17 / 20
Normal Distribution
Proposition
Let X be a binomial rv based on n trials with success probability p. Then
if the binomial probability histogram is not too skewed, X has
√
approximately a normal distribution with µ = np and σ = npq, where
q = 1 − p. In particular, for x = a posible value of X ,
area under the normal curve
P(X ≤ x) = B(x; n, p) ≈
to the left of x+0.5
x+0.5 − np
= Φ( √
)
npq
In practice, the approximation is adequate provided that both np ≥ 10 and
nq ≥ 10, since there is then enough symmetry in the underlying binomial
distribution.
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
18 / 20
Normal Distribution
A graphical explanation for
P(X ≤ x) = B(x; n, p) ≈
= Φ(
Liang Zhang (UofU)
area under the normal curve
to the left of x+0.5
x+0.5 − np
)
√
npq
Applied Statistics I
June 30, 2008
19 / 20
Normal Distribution
Example (Problem 54)
Suppose that 10% of all steel shafts produced by a certain process are
nonconforming but can be reworked (rather than having to be scrapped).
Consider a random sample of 200 shafts, and let X denote the number
among these that are nonconforming and can be reworked. What is the
(approximate) probability that X is between 15 and 25 (inclusive)?
In this problem n = 200, p = 0.1 and q = 1 − p = 0.9. Thus
np = 20 > 10 and nq = 180 > 10
P(15 ≤ X ≤ 25) = Bin(25; 200, 0.1) − Bin(14; 200, 0.1)
25 + 0.5 − 20
15 + 0.5 − 20
) − Φ( √
)
≈ Φ( √
200 · 0.1 · 0.9
200 · 0.1 · 0.9
= Φ(0.3056) − Φ(−0.2500)
= 0.6217 − 0.4013
= 0.2204
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
20 / 20
Download