Applied Statistics I Liang Zhang June 30, 2008

Applied Statistics I
Liang Zhang
Department of Mathematics, University of Utah
June 30, 2008
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
1 / 41
Normal Distribution
Definition
A continuous rv X is said to have a normal distribution with parameters µ
and σ (or µ and σ²), where −∞ < µ < ∞ and σ > 0, if the pdf of X is
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
We use the notation X ∼ N(µ, σ²) to denote that X is normally
distributed with parameters µ and σ².
Remark:
1. Clearly, f(x; µ, σ) ≥ 0 for all x;
2. The pdf integrates to one:
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-\mu)^2/(2\sigma^2)}\, dy = 1.$$
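The two remarks about the pdf can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper names `normal_pdf` and `integrate` and the parameter values are illustrative assumptions, and the integral over the whole real line is approximated on the finite interval µ ± 10σ.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x, following the definition above."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate(f, a: float, b: float, n: int = 20_000) -> float:
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Illustrative parameters (assumed, not taken from the slides).
mu, sigma = 3.0, 0.5

# The tails beyond mu +/- 10 sigma are negligible, so this approximates
# the integral over the whole real line.
area = integrate(lambda y: normal_pdf(y, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
```

Under these assumptions `area` should come out very close to 1, which is what Remark 2 asserts.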
Proposition
For X ∼ N(µ, σ²), we have
E(X) = µ and V(X) = σ²
[Figure: normal density curves for σ = 0.5, σ = 1, and σ = 2]
The cdf of a normal random variable X is
$$
\begin{aligned}
F(x) = P(X \le x) &= \int_{-\infty}^{x} f(y; \mu, \sigma)\, dy \\
&= \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-\mu)^2/(2\sigma^2)}\, dy \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x-\mu} e^{-z^2/(2\sigma^2)}\, dz && \text{change of variable: } z = y - \mu \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{(x-\mu)/\sigma} e^{-w^2/2} \cdot \sigma\, dw && \text{change of variable: } w = z/\sigma \\
&= \int_{-\infty}^{(x-\mu)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw
\end{aligned}
$$
Definition
The normal distribution with parameter values µ = 0 and σ = 1 is called
the standard normal distribution. A random variable having a standard
normal distribution is called a standard normal random variable and will
be denoted by Z. The pdf of Z is
$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty$$
The graph of f(z; 0, 1) is called the standard normal (or z) curve. The cdf
of Z is $P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\, dy$, which we will denote by Φ(z).
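The cdf derivation shows that any normal cdf reduces to the standard normal cdf evaluated at (x − µ)/σ. The sketch below illustrates this; it is not from the slides. It assumes the well-known identity Φ(z) = ½(1 + erf(z/√2)), and the names `phi`, `normal_cdf` and the parameter values are illustrative.

```python
import math
from statistics import NormalDist

def phi(z: float) -> float:
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """cdf of N(mu, sigma^2) by standardizing: F(x) = Phi((x - mu) / sigma)."""
    return phi((x - mu) / sigma)

# Illustrative parameters (assumed, not taken from the slides).
mu, sigma, x = 10.0, 2.0, 11.5

via_standardizing = normal_cdf(x, mu, sigma)
via_library = NormalDist(mu, sigma).cdf(x)  # library cdf of N(mu, sigma^2)
```

Both values should agree to floating-point precision, since the library result and the standardized one describe the same area.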
[Figure: standard normal curve; shaded area = Φ(0.5)]
Table A.3  Standard Normal Curve Areas
   z      .00      .01      .02      .03      .04     ···     .09
  ···     ···      ···      ···      ···      ···     ···     ···
 -1.2   0.1151   0.1131   0.1112   0.1094   0.1075    ···   0.0985
 -1.1   0.1357   0.1335   0.1314   0.1292   0.1271    ···   0.1170
  ···     ···      ···      ···      ···      ···     ···     ···
  1.6   0.9452   0.9463   0.9474   0.9484   0.9495    ···   0.9545
  1.7   0.9554   0.9564   0.9573   0.9582   0.9591    ···   0.9633
  ···     ···      ···      ···      ···      ···     ···     ···
Z ∼ N(0, 1), calculate (a) P(Z ≤ 1.61); (b) P(Z > −1.12); and
(c) P(−1.12 < Z ≤ 1.61).
From Table A.3, Φ(1.61) = 0.9463 (row 1.6, column .01) and
Φ(−1.12) = 0.1314 (row −1.1, column .02). Hence
(a) P(Z ≤ 1.61) = 0.9463;
(b) P(Z > −1.12) = 1 − P(Z ≤ −1.12) = 1 − 0.1314 = 0.8686;
(c) P(−1.12 < Z ≤ 1.61) = P(Z ≤ 1.61) − P(Z ≤ −1.12) =
0.9463 − 0.1314 = 0.8149.
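The three table lookups can be reproduced in software. This is a small sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not from the slides.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

p_a = Z.cdf(1.61)                 # (a) P(Z <= 1.61)
p_b = 1.0 - Z.cdf(-1.12)          # (b) P(Z > -1.12)
p_c = Z.cdf(1.61) - Z.cdf(-1.12)  # (c) P(-1.12 < Z <= 1.61)
```

The results agree with the table-based answers to four decimal places; small differences in later digits come from the rounding in the printed table.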
Many tables for the normal distribution contain only the nonnegative part.
   z      .00      .01      .02      .03      .04     ···     .09
  ···     ···      ···      ···      ···      ···     ···     ···
  1.6   0.9452   0.9463   0.9474   0.9484   0.9495    ···   0.9545
  1.7   0.9554   0.9564   0.9573   0.9582   0.9591    ···   0.9633
  ···     ···      ···      ···      ···      ···     ···     ···
What is P(Z < −1.63)?
By symmetry of the pdf of Z, we know that
P(Z < −1.63) = P(Z > 1.63) = 1 − P(Z ≤ 1.63) = 1 − 0.9484 = 0.0516
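The symmetry argument can be turned into a lookup rule for a table that lists only nonnegative z. The sketch below is illustrative: the dictionary `TABLE` holds just the five entries of row 1.6 shown on the slide, and `phi_from_table` is a hypothetical helper name.

```python
from statistics import NormalDist

# Excerpt of a nonnegative-z table: row 1.6, columns .00 to .04, as on the slide.
TABLE = {1.60: 0.9452, 1.61: 0.9463, 1.62: 0.9474, 1.63: 0.9484, 1.64: 0.9495}

def phi_from_table(z: float) -> float:
    """Phi(z) from a nonnegative-only table, using Phi(-z) = 1 - Phi(z)."""
    key = round(abs(z), 2)
    area = TABLE[key]  # raises KeyError for values outside this small excerpt
    return area if z >= 0 else 1.0 - area

p_table = phi_from_table(-1.63)   # 1 - 0.9484
p_exact = NormalDist().cdf(-1.63) # library value for comparison
```

The table-based value matches the library value up to the four-decimal rounding of the table.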
Recall: The (100p)th percentile of the distribution of a continuous rv X,
η(p), is defined by
$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\, dy$$
Similarly, the (100p)th percentile of the standard normal rv Z is defined by
$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy$$
We need to use the table for the normal distribution to find the (100p)th
percentile.
e.g. Find the 95th percentile for the standard normal rv Z
   z      .00      .01      .02      .03      .04      .05     ···     .09
  ···     ···      ···      ···      ···      ···      ···     ···     ···
  1.6   0.9452   0.9463   0.9474   0.9484   0.9495   0.9505    ···   0.9545
  1.7   0.9554   0.9564   0.9573   0.9582   0.9591   0.9599    ···   0.9633
  ···     ···      ···      ···      ···      ···      ···     ···     ···
η(0.95) = 1.645, a linear interpolation of 1.64 and 1.65, since
Φ(1.64) = 0.9495 < 0.95 < 0.9505 = Φ(1.65).
Remark: If p does not appear in the table, we can either use the number
closest to it, or use the linear interpolation of the closest two.
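The interpolation step described in the remark can be sketched as follows. The function name `interpolate_percentile` is an illustrative assumption; the two bracketing table entries are the ones shown on the slide, and `NormalDist.inv_cdf` from the standard library is used only as a cross-check.

```python
from statistics import NormalDist

def interpolate_percentile(p: float, z_lo: float, area_lo: float,
                           z_hi: float, area_hi: float) -> float:
    """Linearly interpolate z between two table entries whose areas bracket p."""
    if not (area_lo <= p <= area_hi):
        raise ValueError("p must lie between the two table areas")
    frac = (p - area_lo) / (area_hi - area_lo)
    return z_lo + frac * (z_hi - z_lo)

# Table entries bracketing 0.95: Phi(1.64) = 0.9495 and Phi(1.65) = 0.9505.
eta_table = interpolate_percentile(0.95, 1.64, 0.9495, 1.65, 0.9505)

# Cross-check against the inverse cdf computed by the library.
eta_library = NormalDist().inv_cdf(0.95)
```

Here 0.95 sits exactly halfway between the two table areas, so the interpolated value is the midpoint of 1.64 and 1.65.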
In statistical inference, the percentiles corresponding to small right-tail
areas are heavily used.
Notation
zα will denote the value on the z axis for which α of the area under the z
curve lies to the right of zα.
[Figure: z curve with zα marked on the z axis]
Remark:
1. zα is the 100(1 − α)th percentile of the standard normal distribution.
2. By symmetry the area under the standard normal curve to the left of
−zα is also α.
3. The zα's are usually referred to as z critical values.
Percentile        90      95       97.5     ···    99.95
α (tail area)     0.1     0.05     0.025    ···    0.0005
zα                1.28    1.645    1.96     ···    3.27
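Remark 1 gives a direct recipe for computing z critical values: zα is the inverse cdf evaluated at 1 − α. A minimal sketch, with `z_critical` as an illustrative helper name:

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """z_alpha: the point with right-tail area alpha under the z curve."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - alpha)

z_10 = z_critical(0.10)    # 90th percentile
z_05 = z_critical(0.05)    # 95th percentile
z_025 = z_critical(0.025)  # 97.5th percentile

# Remark 2: the area to the left of -z_alpha is also alpha.
left_area = NormalDist().cdf(-z_05)
```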
Proposition
If X has a normal distribution with mean µ and standard deviation σ, then
$$Z = \frac{X - \mu}{\sigma}$$
has a standard normal distribution. Thus
$$P(a \le X \le b) = P\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$$
$$P(X \le a) = \Phi\left(\frac{a-\mu}{\sigma}\right) \qquad P(X \ge b) = 1 - \Phi\left(\frac{b-\mu}{\sigma}\right)$$
Example (Problem 38):
There are two machines available for cutting corks intended for use in wine
bottles. The first produces corks with diameters that are normally
distributed with mean 3 cm and standard deviation 0.1 cm. The second
produces corks with diameters that have a normal distribution with mean
3.04 cm and standard deviation 0.02 cm. Acceptable corks have diameters
between 2.9 cm and 3.1 cm. Which machine is more likely to produce an
acceptable cork?
Let X1 and X2 denote the cork diameters from the first and second machine.
$$P(2.9 \le X_1 \le 3.1) = P\left(\frac{2.9 - 3}{0.1} \le Z \le \frac{3.1 - 3}{0.1}\right) = P(-1 \le Z \le 1) = 0.8413 - 0.1587 = 0.6826$$
$$P(2.9 \le X_2 \le 3.1) = P\left(\frac{2.9 - 3.04}{0.02} \le Z \le \frac{3.1 - 3.04}{0.02}\right) = P(-7 \le Z \le 3) = 0.9987 - 0 = 0.9987$$
So the second machine is more likely to produce an acceptable cork.
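The cork comparison is a direct application of the standardization proposition. The sketch below recomputes both probabilities with the standard library; `prob_between` is an illustrative helper name, and the numeric inputs are the ones stated in the problem.

```python
from statistics import NormalDist

def prob_between(mu: float, sigma: float, a: float, b: float) -> float:
    """P(a <= X <= b) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

# Acceptable diameters lie between 2.9 cm and 3.1 cm.
p_machine_1 = prob_between(3.00, 0.10, 2.9, 3.1)
p_machine_2 = prob_between(3.04, 0.02, 2.9, 3.1)

more_likely = "second" if p_machine_2 > p_machine_1 else "first"
```

The second machine has a mean that is off-center, but its much smaller standard deviation keeps almost all corks inside the acceptable range.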
Example (Problem 44):
If bolt thread length is normally distributed, what is the probability that
the thread length of a randomly selected bolt is (a) within 1.5 SDs of its
mean value? (b) between 1 and 2 SDs from its mean value?
(a)
$$P(\mu - 1.5\sigma \le X \le \mu + 1.5\sigma) = P\left(\frac{\mu - 1.5\sigma - \mu}{\sigma} \le Z \le \frac{\mu + 1.5\sigma - \mu}{\sigma}\right) = P(-1.5 \le Z \le 1.5) = 0.9332 - 0.0668 = 0.8664$$
(b) By symmetry, the regions on the two sides of the mean have equal probability, so
$$2 \cdot P(\mu + \sigma \le X \le \mu + 2\sigma) = 2P\left(\frac{\mu + \sigma - \mu}{\sigma} \le Z \le \frac{\mu + 2\sigma - \mu}{\sigma}\right) = 2P(1 \le Z \le 2) = 2(0.9772 - 0.8413) = 0.2718$$
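Because standardizing removes µ and σ, both answers depend only on the standard normal cdf. A short sketch, with illustrative helper names `within_k_sd` and `between_sd`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def within_k_sd(k: float) -> float:
    """P(|X - mu| <= k * sigma) for any normal X."""
    return Z.cdf(k) - Z.cdf(-k)

def between_sd(k1: float, k2: float) -> float:
    """P(k1 * sigma <= |X - mu| <= k2 * sigma): two symmetric bands."""
    return 2.0 * (Z.cdf(k2) - Z.cdf(k1))

p_a = within_k_sd(1.5)      # part (a)
p_b = between_sd(1.0, 2.0)  # part (b)
```

As a consistency check, part (b) equals the probability of being within 2 SDs minus the probability of being within 1 SD.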
Proposition
{(100p)th percentile for N(µ, σ²)} =
µ + {(100p)th percentile for N(0, 1)} · σ
Example (Problem 39)
The width of a line etched on an integrated circuit chip is normally
distributed with mean 3.000 µm and standard deviation 0.140. What
width value separates the widest 10% of all such lines from the other 90%?
$$\eta_{N(3,\,0.140^2)}(0.90) = 3.0 + 0.140 \cdot \eta_{N(0,1)}(0.90) = 3.0 + 0.140 \cdot 1.28 = 3.1792$$
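The percentile proposition translates into one line of code. The sketch below applies it to the etched-line example; `normal_percentile` is an illustrative name, and the direct call to `NormalDist(mu, sigma).inv_cdf` serves only as a cross-check.

```python
from statistics import NormalDist

def normal_percentile(p: float, mu: float, sigma: float) -> float:
    """(100p)th percentile of N(mu, sigma^2) = mu + sigma * (standard normal percentile)."""
    return mu + NormalDist().inv_cdf(p) * sigma

# Width separating the widest 10% of lines from the other 90%.
width_90 = normal_percentile(0.90, 3.000, 0.140)

# The same percentile computed directly on the non-standard distribution.
width_90_direct = NormalDist(3.000, 0.140).inv_cdf(0.90)
```

The computed value differs from the slide's 3.1792 only in the fourth decimal, because the slide rounds the standard normal percentile to 1.28.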
Proposition
Let X be a binomial rv based on n trials with success probability p. Then
if the binomial probability histogram is not too skewed, X has
approximately a normal distribution with µ = np and σ = √(npq), where
q = 1 − p. In particular, for x = a possible value of X,
$$P(X \le x) = B(x; n, p) \approx \left(\text{area under the normal curve to the left of } x + 0.5\right) = \Phi\left(\frac{x + 0.5 - np}{\sqrt{npq}}\right)$$
In practice, the approximation is adequate provided that both np ≥ 10 and
nq ≥ 10, since there is then enough symmetry in the underlying binomial
distribution.
A graphical explanation for
$$P(X \le x) = B(x; n, p) \approx \left(\text{area under the normal curve to the left of } x + 0.5\right) = \Phi\left(\frac{x + 0.5 - np}{\sqrt{npq}}\right)$$
[Figure: binomial probability histogram with the approximating normal curve]
Example (Problem 54)
Suppose that 10% of all steel shafts produced by a certain process are
nonconforming but can be reworked (rather than having to be scrapped).
Consider a random sample of 200 shafts, and let X denote the number
among these that are nonconforming and can be reworked. What is the
(approximate) probability that X is between 15 and 25 (inclusive)?
In this problem n = 200, p = 0.1 and q = 1 − p = 0.9. Thus
np = 20 > 10 and nq = 180 > 10, so the normal approximation is adequate.
$$
\begin{aligned}
P(15 \le X \le 25) &= B(25; 200, 0.1) - B(14; 200, 0.1) \\
&\approx \Phi\left(\frac{25 + 0.5 - 20}{\sqrt{200 \cdot 0.1 \cdot 0.9}}\right) - \Phi\left(\frac{14 + 0.5 - 20}{\sqrt{200 \cdot 0.1 \cdot 0.9}}\right) \\
&= \Phi(1.30) - \Phi(-1.30) \\
&= 0.9032 - 0.0968 \\
&= 0.8064
\end{aligned}
$$
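The quality of the normal approximation in this example can be examined by comparing it with the exact binomial sum. This is a sketch under stated assumptions: the helper names are illustrative, and the exact cdf is computed by brute-force summation, which is fine for n = 200 but not meant as a general-purpose routine.

```python
import math
from statistics import NormalDist

def binom_cdf(x: int, n: int, p: float) -> float:
    """Exact B(x; n, p) = P(X <= x), by summing the binomial pmf."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def binom_cdf_normal_approx(x: int, n: int, p: float) -> float:
    """Normal approximation with continuity correction: Phi((x + 0.5 - np) / sqrt(npq))."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist().cdf((x + 0.5 - mu) / sigma)

n, p = 200, 0.1

# P(15 <= X <= 25) = B(25; n, p) - B(14; n, p)
exact = binom_cdf(25, n, p) - binom_cdf(14, n, p)
approx = binom_cdf_normal_approx(25, n, p) - binom_cdf_normal_approx(14, n, p)
```

Note that the lower limit enters as B(14; n, p), so the continuity-corrected boundary on the left is 14.5 rather than 15.5.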
Exponential Distribution
Definition
X is said to have an exponential distribution with parameter λ (λ > 0) if
the pdf of X is
$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
Remark:
1. Usually we use X ∼ EXP(λ) to denote that the random variable X has
an exponential distribution with parameter λ.
2. In some sources, the pdf of the exponential distribution is given by
$$f(x; \theta) = \begin{cases} \frac{1}{\theta}\, e^{-x/\theta} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
The only difference is that λ is replaced by 1/θ.
Proposition
If X ∼ EXP(λ), then
$$E(X) = \frac{1}{\lambda} \quad \text{and} \quad V(X) = \frac{1}{\lambda^2}$$
And the cdf for X is
$$F(x; \lambda) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
Proof:
$$
\begin{aligned}
E(X) &= \int_0^\infty x \lambda e^{-\lambda x}\, dx \\
&= \frac{1}{\lambda} \int_0^\infty (\lambda x) e^{-\lambda x}\, d(\lambda x) \\
&= \frac{1}{\lambda} \int_0^\infty y e^{-y}\, dy && y = \lambda x \\
&= \frac{1}{\lambda} \left[ -y e^{-y} \Big|_0^\infty + \int_0^\infty e^{-y}\, dy \right] && \text{integration by parts: } u = y,\ v = -e^{-y} \\
&= \frac{1}{\lambda} \left[ 0 + \left( -e^{-y} \Big|_0^\infty \right) \right] \\
&= \frac{1}{\lambda}
\end{aligned}
$$
Proof (continued):
$$
\begin{aligned}
E(X^2) &= \int_0^\infty x^2 \lambda e^{-\lambda x}\, dx \\
&= \frac{1}{\lambda^2} \int_0^\infty (\lambda x)^2 e^{-\lambda x}\, d(\lambda x) \\
&= \frac{1}{\lambda^2} \int_0^\infty y^2 e^{-y}\, dy && y = \lambda x \\
&= \frac{1}{\lambda^2} \left[ -y^2 e^{-y} \Big|_0^\infty + \int_0^\infty 2y e^{-y}\, dy \right] && \text{integration by parts} \\
&= \frac{1}{\lambda^2} \left[ 0 + 2\left( -y e^{-y} \Big|_0^\infty + \int_0^\infty e^{-y}\, dy \right) \right] && \text{integration by parts} \\
&= \frac{1}{\lambda^2} \cdot 2 \left[ 0 + \left( -e^{-y} \Big|_0^\infty \right) \right] \\
&= \frac{2}{\lambda^2}
\end{aligned}
$$
Proof (continued):
$$V(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$
For x ≥ 0,
$$
\begin{aligned}
F(x) &= \int_0^x \lambda e^{-\lambda y}\, dy \\
&= \int_{y=0}^{y=x} e^{-\lambda y}\, d(\lambda y) \\
&= \int_0^{\lambda x} e^{-z}\, dz && z = \lambda y \\
&= -e^{-z} \Big|_0^{\lambda x} \\
&= 1 - e^{-\lambda x}
\end{aligned}
$$
Example (Problem 108)
The article “Determination of the MTF of Positive Photoresists Using the
Monte Carlo method” (Photographic Sci. and Engr., 1983:
254-260) proposes the exponential distribution with parameter λ = 0.93
as a model for the distribution of a photon’s free path length (µm) under
certain circumstances. Suppose this is the correct model.
a. What is the expected path length, and what is the standard deviation
of path length?
b. What is the probability that path length exceeds 3.0?
c. What value is exceeded by only 10% of all path lengths?
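The slides pose these three questions without working them. One way to evaluate them, using only the mean, variance and cdf formulas from the exponential proposition, is sketched below. The helper names `exp_cdf` and `exp_percentile` are illustrative assumptions; the percentile formula comes from solving 1 − e^{−λx} = p for x.

```python
import math

LAMBDA = 0.93  # rate parameter given in the problem statement

def exp_cdf(x: float, lam: float) -> float:
    """F(x; lam) = 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def exp_percentile(p: float, lam: float) -> float:
    """Solve F(x; lam) = p for x, i.e. x = -ln(1 - p) / lam."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p) / lam

# a. E(X) = 1/lambda and SD(X) = sqrt(1/lambda^2) = 1/lambda
mean_path = 1.0 / LAMBDA
sd_path = math.sqrt(1.0 / LAMBDA**2)

# b. P(X > 3.0) = 1 - F(3.0; lambda)
p_exceeds_3 = 1.0 - exp_cdf(3.0, LAMBDA)

# c. The value exceeded by only 10% of path lengths is the 90th percentile.
value_90 = exp_percentile(0.90, LAMBDA)
```

For the exponential distribution the mean and the standard deviation coincide, so part (a) yields the same number twice.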
Proposition
Suppose that the number of events occurring in any time interval of length
t has a Poisson distribution with parameter αt (where α, the rate of the
event process, is the expected number of events occurring in 1 unit of
time) and that numbers of occurrences in nonoverlapping intervals are
independent of one another. Then the distribution of elapsed time
between the occurrence of two successive events is exponential with
parameter λ = α.
e.g.
the number of customers visiting Costco in each hour =⇒ Poisson
distribution;
the time between every two successive customers visiting Costco =⇒
Exponential distribution.
Example (Example 4.22)
Suppose that calls are received at a 24-hour hotline according to a Poisson
process with rate α = 0.5 call per day.
Then the number of days X between successive calls has an exponential
distribution with parameter value 0.5.
The probability that more than 3 days elapse between calls is
$$P(X > 3) = 1 - P(X \le 3) = 1 - F(3; 0.5) = e^{-0.5 \cdot 3} = 0.223.$$
The expected time between successive calls is 1/0.5 = 2 days.
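The link between Poisson counts and exponential gaps can be illustrated by simulation. The sketch below goes in the reverse direction of the proposition: it generates exponential gaps with rate 0.5 per day and checks that the resulting daily counts behave like Poisson counts with the same rate. The function name, the seed, and the number of simulated days are all illustrative assumptions.

```python
import math
import random

def simulate_daily_counts(rate: float, n_days: int, seed: int = 0) -> list:
    """Place events with exponential(rate) gaps and count the events in each day."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = [0] * n_days
    t = rng.expovariate(rate)  # time of the first event
    while t < n_days:
        counts[int(t)] += 1    # the event falls in day number int(t)
        t += rng.expovariate(rate)
    return counts

rate = 0.5  # calls per day, as in the example
counts = simulate_daily_counts(rate, n_days=50_000)

mean_count = sum(counts) / len(counts)     # should be near the rate
frac_zero = counts.count(0) / len(counts)  # should be near the Poisson P(0) = exp(-rate)

# The closed-form probability from the example: P(X > 3) = exp(-0.5 * 3).
p_gap_exceeds_3 = math.exp(-rate * 3)
```

A simulation like this only suggests the relationship; the proposition itself is a statement about exact distributions.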
“Memoryless” Property
Let X = the time a certain component lasts (in hours), and assume
the component lifetime is exponentially distributed with parameter
λ. Then what is the probability that the component can last at least an
additional t hours after working for t0 hours, i.e. what is
P(X ≥ t + t0 | X ≥ t0)?
$$
\begin{aligned}
P(X \ge t + t_0 \mid X \ge t_0) &= \frac{P(\{X \ge t + t_0\} \cap \{X \ge t_0\})}{P(X \ge t_0)} \\
&= \frac{P(X \ge t + t_0)}{P(X \ge t_0)} \\
&= \frac{1 - F(t + t_0; \lambda)}{1 - F(t_0; \lambda)} \\
&= \frac{e^{-\lambda (t + t_0)}}{e^{-\lambda t_0}} \\
&= e^{-\lambda t}
\end{aligned}
$$
On the other hand, we have
$$P(X \ge t) = 1 - F(t; \lambda) = e^{-\lambda t}$$
Therefore, we have
$$P(X \ge t) = P(X \ge t + t_0 \mid X \ge t_0)$$
for any positive t and t0.
In words, the distribution of additional lifetime is exactly the same as the
original distribution of lifetime, so at each point in time the component
shows no effect of wear. In other words, the distribution of remaining
lifetime is independent of current age.
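The memoryless identity can be checked numerically for a few parameter choices. In this sketch the function names and the test values of λ, t and t0 are illustrative assumptions, not values from the slides.

```python
import math

def exp_survival(t: float, lam: float) -> float:
    """P(X >= t) = exp(-lam * t) for t >= 0 (and 1 for negative t)."""
    return math.exp(-lam * t) if t >= 0 else 1.0

def conditional_survival(t: float, t0: float, lam: float) -> float:
    """P(X >= t + t0 | X >= t0) = P(X >= t + t0) / P(X >= t0)."""
    return exp_survival(t + t0, lam) / exp_survival(t0, lam)

# Illustrative values (assumed): rate 0.01 per hour, various ages t0.
lam, t = 0.01, 50.0
checks = [
    math.isclose(conditional_survival(t, t0, lam), exp_survival(t, lam), rel_tol=1e-12)
    for t0 in (1.0, 10.0, 100.0, 1000.0)
]
```

Every entry of `checks` should be true: the chance of surviving another t hours does not depend on the current age t0.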
Gamma Distribution
Definition
For α > 0, the gamma function Γ(α) is defined by
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\, dx$$
Properties for gamma function:
1. For any α > 1, Γ(α) = (α − 1) · Γ(α − 1) [via integration by parts];
2. For any positive integer n, Γ(n) = (n − 1)!;
3. Γ(1/2) = √π.
e.g. Γ(4) = (4 − 1)! = 6 and
$$\Gamma\left(\tfrac{5}{2}\right) = \tfrac{3}{2} \cdot \Gamma\left(\tfrac{3}{2}\right) = \tfrac{3}{2}\left[\tfrac{1}{2} \cdot \Gamma\left(\tfrac{1}{2}\right)\right] = \tfrac{3}{4}\sqrt{\pi}$$
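The three properties and the worked values can be verified with `math.gamma` from the Python standard library. The helper `satisfies_recursion` and the non-integer test point 3.7 are illustrative choices.

```python
import math

gamma_4 = math.gamma(4)         # property 2: Gamma(4) = 3! = 6
gamma_half = math.gamma(0.5)    # property 3: Gamma(1/2) = sqrt(pi)
gamma_5_half = math.gamma(2.5)  # worked example: Gamma(5/2) = (3/4) * sqrt(pi)

def satisfies_recursion(alpha: float) -> bool:
    """Property 1: Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1) for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the recursion is stated for alpha > 1")
    return math.isclose(math.gamma(alpha),
                        (alpha - 1) * math.gamma(alpha - 1),
                        rel_tol=1e-12)

recursion_ok = satisfies_recursion(3.7)  # illustrative non-integer argument
```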
Definition
A continuous random variable X is said to have a gamma distribution if
the pdf of X is
$$f(x; \alpha, \beta) = \begin{cases} \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
where the parameters α and β satisfy α > 0, β > 0. The standard
gamma distribution has β = 1, so the pdf of a standard gamma rv is
$$f(x; \alpha) = \begin{cases} \frac{1}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
Gamma Distribution
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
35 / 41
Gamma Distribution
Remark:
1. We use X ∼ GAM(α, β) to denote that the rv X has a gamma
distribution with parameter α and β.
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
35 / 41
Gamma Distribution
Remark:
1. We use X ∼ GAM(α, β) to denote that the rv X has a gamma
distribution with parameter α and β.
2. If we let α = 1 and β = 1/λ, then we get the exponential distribution:
1
f (x; 1, ) =
λ
Liang Zhang (UofU)
(
1
1
Γ(1)
λ
1
x 1−1 e −x/ λ = λe −λx
0
x ≥0
otherwise
Applied Statistics I
June 30, 2008
35 / 41
Gamma Distribution
Remark:
1. We use X ∼ GAM(α, β) to denote that the rv X has a gamma
distribution with parameter α and β.
2. If we let α = 1 and β = 1/λ, then we get the exponential distribution:
1
f (x; 1, ) =
λ
(
1
1
Γ(1)
λ
1
x 1−1 e −x/ λ = λe −λx
0
x ≥0
otherwise
3. When X is a standard gamma rv (β = 1), the cdf of X ,
Z
F (x; α) =
0
x
y α−1 e −y
dy
Γ(α)
is called the incomplete gamma function.
Liang Zhang (UofU)
Applied Statistics I
June 30, 2008
35 / 41
Gamma Distribution
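Remark:
1. We use $X \sim \mathrm{GAM}(\alpha, \beta)$ to denote that the rv $X$ has a gamma distribution with parameters $\alpha$ and $\beta$.
2. If we let $\alpha = 1$ and $\beta = 1/\lambda$, then we get the exponential distribution:
\[ f\!\left(x; 1, \tfrac{1}{\lambda}\right) = \begin{cases} \dfrac{1}{(1/\lambda)\,\Gamma(1)}\, x^{1-1} e^{-x/(1/\lambda)} = \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \]
3. When $X$ is a standard gamma rv ($\beta = 1$), the cdf of $X$,
\[ F(x; \alpha) = \int_0^x \frac{y^{\alpha-1} e^{-y}}{\Gamma(\alpha)}\,dy, \]
is called the incomplete gamma function.
There are extensive tables of $F(x; \alpha)$ available (Appendix Table A.4).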
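When the table is not at hand, the pdf and the incomplete gamma function can be approximated directly from their definitions. The sketch below is an illustration rather than part of the slides: the function names `gamma_pdf` and `incomplete_gamma` are hypothetical, the integral is approximated with composite Simpson's rule, and α ≥ 1 is assumed so that the integrand stays finite at 0.

```python
import math

def gamma_pdf(x, alpha, beta=1.0):
    """Gamma density f(x; alpha, beta); beta = 1 gives the standard gamma pdf."""
    if x < 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def incomplete_gamma(x, alpha, steps=2000):
    """Approximate F(x; alpha) by composite Simpson's rule (steps must be even).

    Assumes alpha >= 1 so that the integrand is finite at y = 0.
    """
    if x <= 0:
        return 0.0
    h = x / steps
    total = gamma_pdf(0.0, alpha) + gamma_pdf(x, alpha)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 on odd nodes, 2 on even nodes
        total += weight * gamma_pdf(i * h, alpha)
    return total * h / 3

# Remark 2: alpha = 1, beta = 1/lambda reproduces the exponential density
lam = 0.5
exp_density = lam * math.exp(-lam * 3.0)
print(math.isclose(gamma_pdf(3.0, 1, 1 / lam), exp_density))

# Remark 3: with alpha = 1 the incomplete gamma function is 1 - exp(-x)
print(abs(incomplete_gamma(2.0, 1) - (1 - math.exp(-2.0))) < 1e-9)
```

Both checks should print `True`. For non-integer α < 1 the density is unbounded near 0, so a different quadrature would be needed.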
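Proposition
If $X \sim \mathrm{GAM}(\alpha, \beta)$, then
\[ E(X) = \alpha\beta \quad \text{and} \quad V(X) = \alpha\beta^{2} \]
Furthermore, for any $x > 0$, the cdf of $X$ is given by
\[ P(X \le x) = F(x; \alpha, \beta) = F\!\left(\frac{x}{\beta}; \alpha\right) \]
where $F(\,\cdot\,; \alpha)$ is the incomplete gamma function.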
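A sketch of why these formulas hold is given below. The derivation is not on the original slide; it uses only the gamma pdf, the substitution $w = y/\beta$, and the recurrence $\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$.

```latex
\begin{align*}
E(X) &= \int_0^\infty x \cdot \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\Gamma(\alpha)}\,dx
      = \frac{\beta^{\alpha+1}\Gamma(\alpha+1)}{\beta^{\alpha}\Gamma(\alpha)}
      = \alpha\beta, \\
E(X^2) &= \int_0^\infty x^2 \cdot \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\Gamma(\alpha)}\,dx
      = \frac{\beta^{\alpha+2}\Gamma(\alpha+2)}{\beta^{\alpha}\Gamma(\alpha)}
      = \alpha(\alpha+1)\beta^{2}, \\
V(X) &= E(X^2) - [E(X)]^2 = \alpha(\alpha+1)\beta^{2} - \alpha^{2}\beta^{2} = \alpha\beta^{2}, \\
F(x;\alpha,\beta) &= \int_0^{x} \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\Gamma(\alpha)}\,dy
      = \int_0^{x/\beta} \frac{(\beta w)^{\alpha-1} e^{-w}}{\beta^{\alpha}\Gamma(\alpha)}\,\beta\,dw
      = \int_0^{x/\beta} \frac{w^{\alpha-1} e^{-w}}{\Gamma(\alpha)}\,dw
      = F\!\left(\frac{x}{\beta};\alpha\right).
\end{align*}
```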
Example:
The survival time (in days) of a white rat that was subjected to a certain level of X-ray radiation is a random variable $X \sim \mathrm{GAM}(5, 4)$. What is
a. the probability that the survival time is at most 16 days;
b. the probability that the survival time is strictly between 16 and 20 days;
c. the expected survival time?
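One way to check answers to this example without Table A.4 is sketched below. It is not the method on the slides: it relies on the closed form of the incomplete gamma function for a positive integer α, $F(w; \alpha) = 1 - e^{-w}\sum_{k=0}^{\alpha-1} w^k/k!$, together with the scaling $F(x; \alpha, \beta) = F(x/\beta; \alpha)$. The function name `erlang_cdf` is hypothetical. Because $X$ is continuous, the strict inequalities in part b do not change the probability.

```python
import math

def erlang_cdf(x, alpha, beta):
    """cdf of GAM(alpha, beta) for a positive INTEGER alpha.

    Uses F(x/beta; alpha) = 1 - exp(-w) * sum_{k=0}^{alpha-1} w**k / k!, with w = x / beta.
    """
    if x <= 0:
        return 0.0
    w = x / beta
    partial = sum(w ** k / math.factorial(k) for k in range(alpha))
    return 1.0 - math.exp(-w) * partial

alpha, beta = 5, 4

p_a = erlang_cdf(16, alpha, beta)        # a. P(X <= 16) = F(4; 5)
p_b = erlang_cdf(20, alpha, beta) - p_a  # b. P(16 < X < 20) = F(5; 5) - F(4; 5)
mean = alpha * beta                      # c. E(X) = alpha * beta

print(round(p_a, 3), round(p_b, 3), mean)  # → 0.371 0.188 20
```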
Chi-Squared Distribution
Definition
Let $\nu$ be a positive integer. Then a random variable $X$ is said to have a chi-squared distribution with parameter $\nu$ if the pdf of $X$ is the gamma density with $\alpha = \nu/2$ and $\beta = 2$. The pdf of a chi-squared rv is thus
\[ f(x; \nu) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{(\nu/2)-1} e^{-x/2} & x \ge 0 \\ 0 & x < 0 \end{cases} \]
The parameter $\nu$ is called the number of degrees of freedom (df) of $X$.
The symbol $\chi^2$ is often used in place of “chi-squared”.
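The definition says the chi-squared pdf is just a gamma pdf with specific parameters, which the following sketch makes concrete. The helper names `gamma_pdf` and `chi2_pdf` are hypothetical. As a simplification, the code returns 0 at x = 0 as well as for x < 0, which avoids a division by zero when ν = 1 and does not affect any probability.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density for x > 0; returns 0.0 for x <= 0 (simplification at the single point x = 0)."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def chi2_pdf(x, nu):
    """Chi-squared density with nu degrees of freedom: gamma with alpha = nu / 2, beta = 2."""
    return gamma_pdf(x, nu / 2, 2.0)

# nu = 2 reduces to (1/2) * exp(-x/2); nu = 4 reduces to (x/4) * exp(-x/2)
x = 3.0
print(math.isclose(chi2_pdf(x, 2), 0.5 * math.exp(-x / 2)))
print(math.isclose(chi2_pdf(x, 4), (x / 4) * math.exp(-x / 2)))
```

Both lines should print `True`, since substituting ν = 2 and ν = 4 into the pdf above gives exactly those expressions.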
Remark:
1. Usually, we use $X \sim \chi^2(\nu)$ to denote that $X$ is a chi-squared rv with parameter $\nu$;
2. If $X_1, X_2, \dots, X_n$ are $n$ independent standard normal rv's, then $X_1^2 + X_2^2 + \cdots + X_n^2$ has the $\chi^2(n)$ distribution.
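Remark 2 can be illustrated by simulation. The sketch below is not from the slides: it draws sums of $n = 5$ squared standard normals and compares the sample mean and variance with the gamma formulas $E(X) = \alpha\beta = n$ and $V(X) = \alpha\beta^2 = 2n$ for $\alpha = n/2$, $\beta = 2$. The seed, the sample size, and the function name `sum_of_squared_normals` are arbitrary choices for illustration, and agreement of two moments is a sanity check rather than a proof.

```python
import random
import statistics

def sum_of_squared_normals(n, rng):
    """Draw X_1^2 + ... + X_n^2 for independent standard normal X_i."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(2008)  # fixed seed so the run is reproducible
n = 5
samples = [sum_of_squared_normals(n, rng) for _ in range(100_000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)

# Expect values close to n = 5 and 2n = 10 respectively
print(sample_mean, sample_var)
```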