Probability: Continuous Distributions Lecture Notes

SDS, CUHKSZ
STA2003 Probability
Chapter 5: Continuous Distributions
5.1 Distribution of the Continuous Type
Definition 5.1.
A random variable X is said to be of the (absolutely) continuous type if its distribution function F(x) = Pr(X ≤ x) has the form

    F(x) = ∫_{−∞}^{x} f(t) dt,   −∞ < x < ∞,

for some function f : R → [0, ∞).
If a random variable X is of continuous type, then its probabilistic behaviour is no longer described by a pmf defined by p(x) = Pr(X = x). Instead, the function f, called the probability density function (pdf), will be used. ◀
5.1.1 Probability Density Function of a Continuous Random Variable
A continuous variable of a group of individuals can be represented by a histogram as shown below.
[Histogram: Monthly salary of 5000 workers under 25. x-axis: Salary (in thousands of HK dollars); y-axis: Relative Frequency Density.]
In a histogram, the area of each rectangular block should be directly proportional to the frequency
of the corresponding class. Usually, the height of each block is set to the following relative frequency
density so that the total area of all the blocks will be equal to 1.
    relative frequency density = relative frequency / class width.
For a large data set, one can use a smaller class width to produce a finer histogram.
[Two histograms: Monthly salary of 5000 workers under 25, drawn with successively smaller class widths. x-axis: Salary (in thousands of HK dollars); y-axis: Relative Frequency Density.]
Therefore, for an infinite population, it is reasonable to model the distribution of a continuous variable
by a smooth curve, that is, the probability density function.
[Histogram of the same data with a smooth density curve superimposed: Monthly salary of 5000 workers under 25.]
Properties of a pdf
1. If F is differentiable, then

       f(x) = lim_{t→0} [Pr(X ≤ x + t) − Pr(X ≤ x)] / t = (d/dx) F(x).

2. f(x) ≥ 0 for all x.
3. ∫_{−∞}^{∞} f(x) dx = F(∞) = 1.
4. Pr(X ∈ A) = ∫_A f(x) dx where A is any subset of R.
5. If X is of continuous type, then
   (a) Pr(X = a) = ∫_a^a f(x) dx = 0 for all a; and hence
   (b) Pr(a ≤ X ≤ b) = Pr(a < X ≤ b) = Pr(a ≤ X < b) = Pr(a < X < b) = ∫_a^b f(x) dx = F(b) − F(a).
   (c) The distribution function F is continuous everywhere.
Example 5.1.
Consider the following probability density function.

    f(x) =
        c(2x − x²),  for 0 < x < 2;
        0,           otherwise.

The constant c can be solved by

    ∫_0^2 c(2x − x²) dx = 1   =⇒   c [x² − x³/3]_0^2 = 1   =⇒   4c/3 = 1   =⇒   c = 3/4.
[Plot of f(x) = (3/4)(2x − x²) for 0 < x < 2, peaking at f(1) = 0.75.]
For any a, b ∈ (0, 2) where a < b,

    Pr(a ≤ X ≤ b) = ∫_a^b (3/4)(2x − x²) dx = (1/4)[3(b² − a²) − (b³ − a³)].
Distribution function:

    F(x) = 0,   for x ≤ 0;
    F(x) = ∫_0^x (3/4)(2t − t²) dt = (1/4)(3x² − x³),   for 0 < x < 2;
    F(x) = 1,   for x ≥ 2.

Pr(0.5 ≤ X ≤ 1) = F(1) − F(0.5) = 0.5 − 0.15625 = 0.34375.
⋆
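The normalization constant and the interval probability in Example 5.1 can be checked numerically. The sketch below uses a simple stdlib midpoint rule; the `integrate` helper is an illustrative name, not part of the notes:

```python
# Numerical check of Example 5.1: f(x) = c(2x - x^2) on (0, 2) with c = 3/4.
def integrate(f, a, b, n=10_000):
    # Midpoint-rule integration (illustrative helper).
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

c = 3 / 4
f = lambda x: c * (2 * x - x ** 2)

total = integrate(f, 0, 2)    # total probability, should be ~1
prob = integrate(f, 0.5, 1)   # Pr(0.5 <= X <= 1), should be ~0.34375
print(round(total, 4), round(prob, 5))
```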
Remarks
Note that f(x) is not a probability. For a very small ε > 0,

    Pr(a − ε/2 ≤ X ≤ a + ε/2) = ∫_{a−ε/2}^{a+ε/2} f(x) dx ≈ ε f(a).

Hence, f(a) is a measure of how likely it is that the random variable will be near a.
5.1.2 Mean, Variance and Moment Generating Function
Definition 5.2.
If f(x) is the pdf of a continuous random variable X, then

    E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

is the mathematical expectation (expected value) of g(X), if it exists. ◀
The properties of the mathematical expectation of a discrete random variable also apply to the
mathematical expectation of a continuous random variable.
Properties
1. If c is a constant, then E(c) = c.
2. If c is a constant, then E[cg(X)] = cE[g(X)].
" n
#
n
X
X
3. If c1 , c2 , . . . , cn are constants, then E
ci gi (X) =
ci E [gi (X)].
i=1
i=1
4. X(ω) ≥ Y (ω) for all ω ∈ Ω =⇒ E(X) ≥ E(Y ).
5. E(|X|) ≥ |E(X)|.
Mean:

    µ = E(X) = ∫_{−∞}^{∞} x f(x) dx.

Variance:

    σ² = Var(X) = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f(x) dx = E(X²) − µ².

Moment generating function:

    M_X(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx,   E(X^r) = M_X^{(r)}(0).

Cumulant generating function:

    R_X(t) = ln M_X(t),   µ = R′_X(0),   σ² = R″_X(0).
Example 5.2.
Consider the following probability density function.

    f(x) =
        xe^{−x},  for 0 < x < ∞;
        0,        otherwise.

The mgf is

    M_X(t) = ∫_0^∞ e^{tx} · xe^{−x} dx = ∫_0^∞ xe^{−(1−t)x} dx = 1/(1 − t)²,   t < 1.

The cgf is

    R_X(t) = ln M_X(t) = −2 ln(1 − t).

And the mean and the variance are

    µ = R′(0) = 2   and   σ² = R″(0) = 2.

⋆
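The cgf-derived moments in Example 5.2 can be confirmed by direct numerical integration. This is a sketch; truncating the upper limit at 50 is an arbitrary but harmless choice, since the tail mass beyond it is negligible:

```python
import math

# Numerical check of Example 5.2: for f(x) = x e^{-x} on (0, inf),
# the mean and the variance are both 2.
def integrate(g, a, b, n=20_000):
    # Midpoint-rule integration (illustrative helper).
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: x * math.exp(-x)
mean = integrate(lambda x: x * f(x), 0, 50)
second = integrate(lambda x: x * x * f(x), 0, 50)
var = second - mean ** 2
print(round(mean, 3), round(var, 3))
```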
5.2 Common Continuous Distributions

5.2.1 Uniform Distribution
Definition 5.3.
For an interval (a, b), let X be a point randomly drawn from this interval. If the pdf of X is a constant function on (a, b), i.e.,

    f(x) =
        1/(b − a),  for a < x < b;
        0,          otherwise,

then X is said to have a uniform distribution and is denoted as X ∼ U(a, b). ◀
[Sketches of the U(a, b) pdf (constant height 1/(b − a) on (a, b)) and cdf (rising linearly from 0 at a to 1 at b).]
Roughly speaking, it is a point randomly selected in such a way that it has no preference to be
located near any particular region.
Distribution function:

    F(x) =
        0,                for x ≤ a;
        (x − a)/(b − a),  for a < x < b;
        1,                for x ≥ b.
Mean:

    µ = E(X) = (a + b)/2   (midpoint of the interval).

Variance:

    σ² = Var(X) = E(X²) − µ² = (b − a)²/12.
Example 5.3.
A straight rod drops freely onto a horizontal plane. Let X be the angle between the rod and the North direction: 0 ≤ X ≤ 2π. Then X ∼ U(0, 2π).

    µ = (0 + 2π)/2 = π,   σ² = (2π − 0)²/12 = π²/3.

    F(x) = (x − 0)/(2π − 0) = x/(2π),   for 0 ≤ x < 2π.

    Pr(pointing towards a direction between NE and E) = Pr(π/4 ≤ X ≤ π/2)
        = F(π/2) − F(π/4)
        = (1/(2π))(π/2 − π/4)
        = 1/8.

⋆
Property
Let X ∼ U(0, 1) and Y = cX + d. Then

    Y ∼ U(d, c + d),  if c is positive;
    Y ∼ U(c + d, d),  if c is negative.

Proof. Consider

    F_X(x) = Pr(X ≤ x) = (x − 0)/(1 − 0) = x,   for 0 ≤ x ≤ 1.

If c > 0, then the cdf of Y is given by

    F_Y(y) = Pr(Y ≤ y) = Pr(cX + d ≤ y) = Pr(X ≤ (y − d)/c) = (y − d)/c = (y − d)/((c + d) − d),   for d ≤ y ≤ c + d.

Comparing with the cdf of a uniform random variable, we have Y ∼ U(d, c + d). Similarly, for c < 0,

    Y ∼ U(c + d, d).
Example 5.4.
If X ∼ U(0, 1), then 2X ∼ U(0, 2) and 2X − 1 ∼ U(−1, 1).
If X ∼ U(−2, 6), then (X + 2)/8 ∼ U(0, 1). Also, 6 − X ∼ U(0, 8) and (6 − X)/8 ∼ U(0, 1).

⋆
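The linear-transform property can be illustrated with a seeded simulation. This is a sketch; the choice c = 8, d = −2 (so Y ∼ U(−2, 6)), the sample size, and the tolerances are all arbitrary:

```python
import random
import statistics

# If X ~ U(0, 1) and Y = cX + d with c > 0, then Y ~ U(d, c + d).
# Here c = 8, d = -2, so Y ~ U(-2, 6): mean 2, variance 8^2/12 ~ 5.33.
random.seed(0)
c, d = 8, -2
ys = [c * random.random() + d for _ in range(100_000)]

mean_y = statistics.mean(ys)
var_y = statistics.pvariance(ys)
print(round(mean_y, 2), round(var_y, 2))
```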
5.2.2 Exponential Distribution
Definition 5.4.
Let X be a positive random variable with pdf

    f(x) =
        λe^{−λx},  for x > 0;
        0,         for x ≤ 0,

then X is said to have an exponential distribution and is denoted as X ∼ Exp(λ). ◀
[Probability density functions of Exp(λ) for λ = 2, 1, 0.5.]
Recall that in a Poisson process,

    N(t) = number of occurrences in the time interval [0, t],
    p(y) = Pr(N(t) = y) = e^{−λt}(λt)^y / y!,   y = 0, 1, 2, . . . .

Define the random variable X to be the waiting time until the first occurrence in a Poisson process with rate λ. Then the distribution function of X can be derived as follows.
For x ≤ 0,

    F(x) = Pr(X ≤ x) = 0.

For x > 0,

    F(x) = Pr(X ≤ x) = 1 − Pr(X > x)
         = 1 − Pr(no occurrence in the time interval [0, x])
         = 1 − Pr(N(x) = 0) = 1 − e^{−λx}.   (Set y = 0 in p(y).)
Therefore, the cdf of X is

    F(x) =
        1 − e^{−λx},  for x > 0;
        0,            for x ≤ 0.

Hence, the pdf of X is given by

    f(x) = F′(x) = λe^{−λx},   for x > 0.

Therefore X is distributed as exponential with parameter λ. An exponential random variable can therefore describe the random time elapsing between unpredictable events (e.g., telephone calls, earthquakes, arrivals of buses or customers, etc.).
The moment generating function of Exp(λ) is given by

    M_X(t) = ∫_0^∞ e^{tx} λe^{−λx} dx = ∫_0^∞ λe^{−(λ−t)x} dx = [−λe^{−(λ−t)x}/(λ − t)]_0^∞ = λ/(λ − t),   t < λ.
The cumulant generating function is then

    R_X(t) = ln M_X(t) = ln λ − ln(λ − t),
    R′(t) = 1/(λ − t),    µ = R′(0) = 1/λ,
    R″(t) = 1/(λ − t)²,   σ² = R″(0) = 1/λ².
In summary, we have for X ∼ Exp(λ),

Distribution function:

    F(x) =
        1 − e^{−λx},  for x > 0;
        0,            for x ≤ 0.

Moment generating function:

    M_X(t) = λ/(λ − t),   t < λ.

Mean and variance:

    µ = 1/λ   and   σ² = 1/λ².
Example 5.5.
Let X be the failure time of a machine. Assume that X follows an exponential distribution with rate λ = 2 failures per month. Then the average failure time is E(X) = 1/2 month. The probability that the machine can run over 4 months without any failure is given by

    Pr(X > 4) = 1 − F(4) = 1 − (1 − e^{−2(4)}) = e^{−8} ≈ 0.000335.

⋆
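Example 5.5 is a two-line stdlib computation. A quick sketch (the `exp_sf` helper name is illustrative):

```python
import math

# X ~ Exp(lam) with lam = 2 failures per month; Pr(X > 4) = e^{-8}.
def exp_sf(x: float, lam: float) -> float:
    """Survival function Pr(X > x) of the Exp(lam) distribution."""
    return math.exp(-lam * x) if x > 0 else 1.0

p = exp_sf(4, 2.0)
print(round(p, 6))   # e^{-8} ~ 0.000335
```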
Remarks
For any a > 0 and b > 0,

    Pr(X > a + b) = 1 − F(a + b)
                  = e^{−λ(a+b)}
                  = e^{−λa} e^{−λb}
                  = (1 − F(a))(1 − F(b))
                  = Pr(X > a) Pr(X > b).

This implies

    Pr(X > a + b | X > a) = Pr(X > b).

That is, knowing that the event has not occurred in the past a units of time does not alter the distribution of the arrival time in the future; i.e., we may assume the process starts afresh at any point of observation.
Among all continuous random variables with support (0, ∞), the exponential distribution is the only one that has this memoryless property.
Example 5.6.
In the previous example (Example 5.5.), what is the probability that the machine can run 4 months
more given that it has run for 1 year already?
Solution:
Pr(X > 16|X > 12) = Pr(X > 4) = e−8 = 0.000335.
⋆
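The memoryless identity used in Example 5.6 can be verified numerically. A sketch, with `sf` as an illustrative name for the survival function:

```python
import math

# For X ~ Exp(lam): Pr(X > a + b | X > a) = Pr(X > b).
lam = 2.0
sf = lambda x: math.exp(-lam * x)   # survival function Pr(X > x)

a, b = 12.0, 4.0
conditional = sf(a + b) / sf(a)     # Pr(X > a + b | X > a)
unconditional = sf(b)               # Pr(X > b)
print(conditional, unconditional)
```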
Remarks
Another commonly used parametrization of the exponential distribution defines the pdf as

    f(x) =
        (1/β) e^{−x/β},  for x > 0;
        0,               for x ≤ 0,

where β > 0 is the mean, standard deviation, and scale parameter of the distribution, the reciprocal of the rate parameter λ defined earlier.
In this parametrization, β is a survival parameter: if a random variable X is the survival time of some system or life, the expected duration of survival is β units of time. The previous parametrization involving the rate parameter λ arises in the context of events arriving at rate λ, where the time between events has mean β = λ^{−1}. The specification with β is sometimes more convenient than the one with λ, and some authors use it as the standard definition.
5.2.3 Gamma and Chi-Squared Distributions
Definition 5.5.
The gamma function is defined by

    Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx,   for α > 0.

It is also called the Euler integral of the second kind. ◀
Properties of the gamma function
1. Γ(1) = 1.
2. For α > 1, Γ(α) = (α − 1)Γ(α − 1).
3. For any integer n ≥ 1, Γ(n) = (n − 1)!.
4. Γ(1/2) = √π.
Definition 5.6.
Let X be a positive random variable with pdf

    f(x) =
        (λ^α / Γ(α)) x^{α−1} e^{−λx},  for x > 0;
        0,                             for x ≤ 0,

where α > 0 and λ > 0, then X is said to have a gamma distribution and is denoted as X ∼ Γ(α, λ), or X ∼ Gam(α, λ). ◀
[Probability density functions of Γ(α = 4, λ) for λ = 1, 0.5, 0.25.]
Gamma random variable as a waiting time
Let T_n be the waiting time until the n-th occurrence of an event in a Poisson process with rate λ. Then the distribution function of T_n can be derived as

    F(t) = Pr(T_n ≤ t) = 1 − Pr(T_n > t)
         = 1 − Pr(N(t) < n)
         = 1 − ∑_{k=0}^{n−1} (λt)^k e^{−λt} / k!,   t > 0.

Hence, the pdf of T_n is

    f(t) = F′(t) = λ(λt)^{n−1} e^{−λt} / (n − 1)! = (λ^n / Γ(n)) t^{n−1} e^{−λt},   t > 0.

Therefore T_n is distributed as gamma with shape α = n and rate λ. A gamma random variable can therefore describe the random time elapsing until the accumulation of a specific number of unpredictable events (e.g., telephone calls, earthquakes, arrivals of buses or customers, etc.).
The moment generating function of X ∼ Γ(α, λ) is given by

    M_X(t) = ∫_0^∞ e^{tx} (λ^α / Γ(α)) x^{α−1} e^{−λx} dx
           = (λ^α / Γ(α)) ∫_0^∞ x^{α−1} e^{−(λ−t)x} dx
           = (λ^α / (λ − t)^α) ∫_0^∞ ((λ − t)^α / Γ(α)) x^{α−1} e^{−(λ−t)x} dx
           = (λ/(λ − t))^α,   t < λ.

From this mgf, one can easily derive the mean and variance of X.
In summary, we have for X ∼ Γ(α, λ),

Distribution function:

    F(x) = ∫_0^x (λ^α / Γ(α)) t^{α−1} e^{−λt} dt,   for x > 0.

Moment generating function:

    M_X(t) = (λ/(λ − t))^α,   t < λ.

Mean and variance:

    µ = α/λ   and   σ² = α/λ².

For a Poisson process, the mean waiting time until the n-th occurrence is E(T_n) = n/λ.
Example 5.7.
Assume that the number of phone calls received by a customer service representative follows a Poisson process with rate 20 calls per hour.
Let T_8 be the waiting time (in hours) until the 8th phone call. Then T_8 ∼ Γ(8, 20).
The mean waiting time for 8 phone calls is E(T_8) = 8/20 = 2/5 hour.
Suppose that a customer service representative only needs to serve 8 calls before taking a break. Then the probability that he/she has to work for more than 48 minutes (i.e., 0.8 hour) before taking a rest is

    Pr(T_8 > 0.8) = 1 − Pr(T_8 ≤ 0.8)
                  = 1 − ∫_0^{0.8} (20⁸ / Γ(8)) x⁷ e^{−20x} dx
                  = 1 − (1 − ∑_{k=0}^{7} 16^k e^{−16} / k!)
                  = 0.01.

⋆
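Because the event {T_8 > 0.8} is the same as "at most 7 calls arrive in 0.8 hour", the gamma tail above reduces to a Poisson(16) sum, which a few stdlib lines can check:

```python
import math

# Pr(T8 > 0.8) for T8 ~ Gamma(8, rate = 20) equals Pr(N <= 7) with N ~ Po(16).
lam, n, t = 20.0, 8, 0.8
mu = lam * t   # Poisson mean: 16
tail = sum(mu ** k * math.exp(-mu) / math.factorial(k) for k in range(n))
print(round(tail, 4))   # ~0.01
```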
Properties of gamma distribution
1. The exponential distribution is a special case of the gamma distribution with α = 1, i.e., Γ(1, λ) ≡ Exp(λ).
2. Suppose X ∼ Γ(α, λ). Let Y = bX where b is a positive number. Then

       Y ∼ Γ(α, λ/b).

   Proof. The moment generating function of Y is given by

       M_Y(t) = E(e^{tY}) = E(e^{tbX}) = M_X(bt) = (λ/(λ − bt))^α = ((λ/b)/((λ/b) − t))^α,   t < λ/b,

   which is the moment generating function of Γ(α, λ/b).
3. The parameter α is called the shape parameter while λ is called the rate parameter. According to property 2, we may tabulate the distribution function for the gamma distribution with a standardized scale parameter.
4. Similar to the exponential distribution, there is another parametrization of the gamma distribution. With a shape parameter k > 0 and a scale parameter θ > 0, the corresponding pdf is

       f(x) =
           (1/(Γ(k) θ^k)) x^{k−1} e^{−x/θ},  for x > 0;
           0,                                for x ≤ 0.

   Essentially, k = α and θ = λ^{−1}.
Definition 5.7.
Let r be a positive integer. If X ∼ Γ(α, λ) with α = r/2 and λ = 1/2, then we say that X has a chi-squared distribution with r degrees of freedom, denoted by X ∼ χ²(r) or X ∼ χ²_r.

Probability density function:

    f(x) = (1 / (Γ(r/2) 2^{r/2})) x^{r/2 − 1} e^{−x/2},   x > 0.

Moment generating function:

    M_X(t) = 1 / (1 − 2t)^{r/2},   t < 1/2.

Mean and variance:

    µ = r   and   σ² = 2r.

◀
[Probability density functions of χ²(r) for r = 1, 2, 3, 4, 6, 9.]
The values of the chi-squared distribution function are tabulated. (on Page 107)
Example 5.8.
For Example 5.7., T_8 ∼ Γ(8, 20). Consider Y = 40 T_8 ∼ Γ(8, 1/2) ≡ χ²(16).
From the chi-squared distribution table,

    Pr(T_8 > 0.8) = 1 − Pr(T_8 ≤ 0.8) = 1 − Pr(Y ≤ 32) = 1 − 0.99 = 0.01.

⋆
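The scaling step Y = 40 T_8 ∼ χ²(16) can be sanity-checked by a seeded simulation, representing T_8 as a sum of eight Exp(20) inter-arrival times. This is a sketch; the sample size and tolerances are arbitrary choices:

```python
import random
import statistics

random.seed(1)

def gamma_rv(n: int, lam: float) -> float:
    # Waiting time until the n-th event of a Poisson process:
    # the sum of n independent Exp(lam) inter-arrival times.
    return sum(random.expovariate(lam) for _ in range(n))

# Y = 40 * T8 should behave like chi-squared(16): mean 16, variance 32.
ys = [40 * gamma_rv(8, 20.0) for _ in range(50_000)]
mean_y = statistics.mean(ys)
var_y = statistics.pvariance(ys)
print(round(mean_y, 1), round(var_y, 1))
```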
5.2.4 Normal Distribution
Many naturally occurring variables have distributions that are well-approximated by a “bell-shaped
curve”, or a normal distribution. These variables have histograms which are approximately symmetric,
have a single mode at the center, and tail away to both sides. Two parameters, the mean µ and the standard deviation σ, describe a normal distribution completely and allow one to approximate
the proportions of observations in any interval by finding corresponding areas under the appropriate
normal curve.
Definition 5.8.
A random variable X is said to have a normal distribution (Gaussian distribution) if its pdf is defined by

    f(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)},   −∞ < x < ∞,

where −∞ < µ < ∞ and σ² > 0 are the location parameter and scale parameter respectively. It is denoted as X ∼ N(µ, σ²).

Distribution function:

    F(x) = ∫_{−∞}^{x} (1/√(2πσ²)) e^{−(t−µ)²/(2σ²)} dt,   −∞ < x < ∞.

Moment generating function:

    M_X(t) = exp(µt + σ²t²/2),   t ∈ R.

Mean and variance:

    E(X) = µ   and   Var(X) = σ².

◀
Characteristics of all normal curves
1. Each bell-shaped normal curve is symmetric and centered at its mean µ.
2. About 68% of the area is within one standard deviation of the mean, about 95% of the area is
within two standard deviations of the mean, and almost all (99.7%) of the area is within three
standard deviations of the mean.
3. The places where the normal curve is the steepest are one standard deviation below and above
the mean (µ − σ and µ + σ).
[Probability density functions of N(µ, σ²) for (µ, σ²) = (0, 1), (0, 2), (0, 3), (4, 2).]
The standard normal distribution
Areas under all normal curves are related. For example, the area to the right of 1.76 standard
deviations above the mean is identical for all normal curves. Because of this, we can find an area
over an interval for any normal curve by finding the corresponding area under a standard normal
curve which has mean µ = 0 and standard deviation σ = 1.
If µ = 0 and σ 2 = 1, then Z ∼ N(0, 1) is said to have the standard normal distribution.
[Standard normal density: the pdf ϕ(x) of N(0, 1).]
Usually the probability density function and distribution function of the standard normal distribution are denoted as

    ϕ(z) = (1/√(2π)) e^{−z²/2},   −∞ < z < ∞,
    Φ(z) = Pr(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−t²/2} dt,   −∞ < z < ∞.
The values of the standard normal distribution function are tabulated.
The standard normal distribution table (on Page 108)
The standard normal table tells the area under a standard normal curve to the left of a number z, i.e.,
the value of Φ(z). The number z is rounded to two decimal places in the table. The ones place and
the tenths place are listed at the left side of the table. The hundredths place is listed across the top.
Areas are listed in the table entries. The standard normal table here does not have information for negative z values. Because of symmetry, this is not necessary: one may use Φ(−z) = 1 − Φ(z).
Remarks
There are two other versions of the standard normal distribution table: one is "Area under the standard normal curve" (between zero and z), another is "The standard normal right-tail probabilities". Google them!
Properties
1. The normal distribution is symmetric about its mean. That is, if X ∼ N(µ, σ²), then

       Pr(X ≤ µ − x) = Pr(X ≥ µ + x).

   In particular, Φ(x) = 1 − Φ(−x).
2. If X ∼ N(µ, σ²), then aX + b ∼ N(aµ + b, a²σ²). In particular, the normal score Z = (X − µ)/σ is distributed as standard normal.
3. If X ∼ N(0, 1), then X² ∼ χ²(1).
Standardization
In working with normal curves, the first step in a calculation is invariably to standardize:

    Z = (X − µ)/σ ∼ N(0, 1).

This normal score (standard score, z-score) tells how many standard deviations an observation X is from its mean. A positive z-score indicates that X is greater than the mean; a negative z-score indicates that X is below the mean.
If the z-score is known and the value of X is needed, solving the previous equation for X gives

    X = µ + Zσ.

In words, this states that X is Z standard deviations above the mean.
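In code, Φ can be written with the standard library's error function; the helper names `phi_cdf` and `normal_cdf` below are illustrative:

```python
import math

# Standard normal cdf via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2.
def phi_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Standardization: Pr(X <= x) for X ~ N(mu, sigma^2).
def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return phi_cdf((x - mu) / sigma)

print(round(phi_cdf(1.96), 4))   # ~0.975
```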
Example 5.9.
Studies show that gasoline use for compact cars sold in the United States is normally distributed, with a mean use of 30.5 miles per gallon (mpg) and a standard deviation of 4.5 mpg. What percentage of compact cars use between 25 and 40 mpg?
Solution:
Let X mpg be the gasoline use of a compact car. Then X ∼ N(30.5, 4.5²).

    Pr(25 ≤ X ≤ 40) = Pr((25 − 30.5)/4.5 ≤ (X − 30.5)/4.5 ≤ (40 − 30.5)/4.5)
                    ≈ Pr(−1.22 ≤ Z ≤ 2.11),   Z ∼ N(0, 1)
                    = Φ(2.11) − [1 − Φ(1.22)]
                    = 0.9826 − (1 − 0.8888)
                    = 0.8714 = 87.14%.

⋆
In general, if X ∼ N(µ, σ²), then Pr(a ≤ X ≤ b) = Φ((b − µ)/σ) − Φ((a − µ)/σ).
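The general formula can be applied to Example 5.9 in a couple of lines (a sketch, computing Φ via `math.erf` with the exact z-values rather than the table's two-decimal rounding):

```python
import math

# Pr(25 <= X <= 40) for X ~ N(30.5, 4.5^2), without rounding z to two decimals.
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 30.5, 4.5
p = Phi((40 - mu) / sigma) - Phi((25 - mu) / sigma)
print(round(p, 4))   # close to the table-based 0.8714
```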
Example 5.10.
In general, for a normal random variable X ∼ N(µ, σ²), the probability that its value is within one standard deviation of the mean is

    Pr(µ − σ ≤ X ≤ µ + σ) = Φ(1) − Φ(−1) = 0.682 = 68.2%.

Similarly, the probability that its value is within two standard deviations of the mean is

    Pr(µ − 2σ ≤ X ≤ µ + 2σ) = Φ(2) − Φ(−2) = 0.954 = 95.4%,

and the probability that its value is within three standard deviations of the mean is

    Pr(µ − 3σ ≤ X ≤ µ + 3σ) = Φ(3) − Φ(−3) = 0.997 = 99.7%.

⋆
Example 5.11.
A manufacturer needs washers between 0.1180 and 0.1220 inches thick; any thickness outside this range is unusable. One machine shop sells washers at $30 per 100. Their thickness is normally distributed with a mean of 0.1200 inch and a standard deviation of 0.0010 inch.
A second machine shop sells washers at $26 per 100. Their thickness is normally distributed with a mean of 0.1200 inch and a standard deviation of 0.0015 inch.
Which shop offers the better deal?
Solution:
Washers purchased from shop 1: X ∼ N(0.12, 0.001²),

    Pr(0.118 ≤ X ≤ 0.122) = Φ((0.122 − 0.12)/0.001) − Φ((0.118 − 0.12)/0.001)
                          = Φ(2) − Φ(−2)
                          = 0.9772 − (1 − 0.9772)
                          = 0.9544.

Hence on average, 95.44 of 100 washers from shop 1 are usable. The average cost of each usable washer is $30/95.44 = $0.3143.
Washers purchased from shop 2: Y ∼ N(0.12, 0.0015²),

    Pr(0.118 ≤ Y ≤ 0.122) = Φ((0.122 − 0.12)/0.0015) − Φ((0.118 − 0.12)/0.0015)
                          ≈ Φ(1.33) − Φ(−1.33)
                          = 0.9082 − (1 − 0.9082)
                          = 0.8164.

Hence on average, 81.64 of 100 washers from shop 2 are usable. The average cost of each usable washer is $26/81.64 = $0.3185.
We may conclude that washers from shop 1 are the slightly better deal.

⋆
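Example 5.11 can be reproduced end to end. This is a sketch; `usable_fraction` is an illustrative helper name, and exact z-values are used instead of the table's rounding, so the second shop's cost differs slightly from the hand computation:

```python
import math

Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def usable_fraction(mu, sigma, lo=0.118, hi=0.122):
    # Fraction of washers whose thickness falls in the usable range.
    return Phi((hi - mu) / sigma) - Phi((lo - mu) / sigma)

cost1 = 30 / (100 * usable_fraction(0.12, 0.0010))   # shop 1, $ per usable washer
cost2 = 26 / (100 * usable_fraction(0.12, 0.0015))   # shop 2, $ per usable washer
print(round(cost1, 4), round(cost2, 4))
```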
Example 5.12.
A brass polish manufacturer wishes to set his filling equipment so that in the long run only five cans in 1,000 will contain less than a desired minimum net fill of 800 gm. It is known from experience that the filled weights are approximately normally distributed with a standard deviation of 6 gm. At what level will the mean fill have to be set in order to meet this requirement?
Solution:
Let X gm be the net fill of a particular can. Then X ∼ N(µ, 36). The requirement can be expressed as the following probability statement:

    Pr(X < 800) = 0.005
        =⇒ Pr((X − µ)/σ < (800 − µ)/6) = 0.005
        =⇒ Φ((800 − µ)/6) = 0.005
        =⇒ Φ(−(800 − µ)/6) = 1 − 0.005 = 0.995
        =⇒ −(800 − µ)/6 = 2.575
        =⇒ µ = 815.45.

⋆
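The same fill level comes out of the inverse standard normal cdf, available in the stdlib as `statistics.NormalDist.inv_cdf` (using the exact quantile rather than the table value 2.575):

```python
from statistics import NormalDist

# Choose mu so that Pr(X < 800) = 0.005 when sigma = 6:
# mu = 800 + z * 6, where Phi(z) = 0.995.
z = NormalDist().inv_cdf(0.995)   # ~2.576
mu = 800 + z * 6
print(round(mu, 2))   # ~815.45
```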
Example 5.13.
For the normal probability calculation in Example 5.9., to use the above table to find the area between −1.22 and 2.11, we can add up the two areas A₁ (between 0 and 2.11) and A₂ (between −1.22 and 0) shown in the following graph:

[Standard normal density of N(0, 1), with the area A₂ between −1.22 and 0 and the area A₁ between 0 and 2.11 shaded.]

From the table, the area A₁ is obtained as 0.4826. Based on the symmetry of the normal distribution, the area A₂ is equal to the area between zero and 1.22, which can be obtained directly from the table as 0.3888. Therefore, the total area between −1.22 and 2.11 is 0.4826 + 0.3888 = 0.8714 = 87.14%.

⋆
Example 5.14.
Refer to Example 5.9.. In times of scarce energy resources, a competitive advantage is given to an
automobile manufacturer who can produce a car obtaining substantially better fuel economy than
the competitors’ cars. If a manufacturer wishes to develop a compact car that outperforms 95% of
the current compacts in fuel economy, what must the gasoline use rate for the new car be?
The gasoline use X (mpg) is normally distributed: X ∼ N(30.5, 4.52 ). According to the manufacturer’s requirement, we want to find the value c such that Pr(X ≤ c) = 0.95. (In fact, c is said to be
the 95th percentile of the distribution of X).
Using the standard normal distribution table (of the version with areas between zero and z), we need
to find the table entry as close as possible to 0.95 − 0.5 = 0.45. We observe that the table entry is
0.4495 for z = 1.64 and is 0.4505 for z = 1.65. Therefore, the required value of the z-score is halfway
between 1.64 and 1.65, i.e., z = 1.645.
The required value of c is thus given by c = 30.5 + 1.645 × 4.5 = 37.9 mpg.
The manufacturer’s new compact car must therefore obtain a fuel economy of 37.9 mpg to outperform
95% of the compact cars currently available on the U.S. market.
⋆
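The percentile calculation in Example 5.14 is a one-liner with `statistics.NormalDist`, using the exact quantile instead of the interpolated z = 1.645:

```python
from statistics import NormalDist

# 95th percentile of X ~ N(30.5, 4.5^2).
c = NormalDist(mu=30.5, sigma=4.5).inv_cdf(0.95)
print(round(c, 1))   # ~37.9
```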
5.2.4.1 Normal Approximation to Binomial Probability
One useful application of the normal distribution is to approximate the probabilities of other distributions, in particular the binomial distribution. The theory behind it is an application of the central limit theorem (CLT), which is introduced in Chapter 11 and is a topic in STAT2602 Probability and Statistics II. Here we only focus on the application of the normal approximation to binomial probability, which is common in practice. (Recall from Chapter 4 that the binomial distribution can also be approximated by the Poisson distribution.)
For a random variable X following a binomial distribution B(n, p), it is known that

    E(X) = np   and   Var(X) = np(1 − p).
If n is sufficiently large (a classical general guideline is to require np ≥ 5 and n(1 − p) ≥ 5 while some
stricter sources request 9 rather than 5), then X can be approximated by the normal distribution
N(np, np(1 − p)).
The approximation works especially well if n is large and p is close to 0.5, because in this case the
binomial distribution is more symmetric like a bell-shaped normal distribution.
Continuity Correction
Since the binomial distribution is discrete while the normal distribution is continuous, the approximation can be made more accurate by adjusting the values at which probabilities are evaluated. This is called continuity correction.
For example, let X be a discrete binomial random variable and Y a continuous normal random variable used to approximate X. Suppose Pr(X = a) is to be calculated, where a is an integer. Clearly, using Pr(Y = a) as the approximation does not work, because that probability must be zero. Therefore, it is more appropriate to consider

    Pr(X = a) = Pr(a − 0.5 < X < a + 0.5) ≈ Pr(a − 0.5 < Y < a + 0.5).
Example 5.15.
Recall Example 4.22.: X ∼ B(8000, 1/1000) can be approximated by Po(8) in calculating Pr(X < 8). Here we consider a normal approximation by first checking that

    8000 × (1/1000) = 8 ≥ 5   and   8000 × (999/1000) = 7992 ≥ 5.

Note that

    E(X) = 8000 × (1/1000) = 8,
    Var(X) = 8000 × (1/1000) × (999/1000) = 7.992.

The binomial random variable X can be approximated by Y ∼ N(8, 7.992). Therefore,

    Pr(X < 8) = Pr(X ≤ 7)
              ≈ Pr(Y ≤ 7.5)
              = Pr(Z ≤ (7.5 − 8)/√7.992),   where Z ∼ N(0, 1)
              = Pr(Z ≤ −0.17686515)
              ≈ 0.429807157.

Note that the previous Poisson approximation gives 0.452960809, while the exact binomial probability is 0.452890977. We see that in this case the normal approximation does not perform as accurately as the Poisson approximation, because p = 1/1000 is quite far from 0.5.

⋆
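The three numbers quoted above (exact binomial, Poisson approximation, and normal approximation with continuity correction) can be recomputed with the stdlib:

```python
import math
from statistics import NormalDist

# Pr(X < 8) for X ~ B(8000, 1/1000), three ways.
n, p = 8000, 1 / 1000

exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(8))
poisson = sum(8 ** k * math.exp(-8) / math.factorial(k) for k in range(8))
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p))).cdf(7.5)

print(round(exact, 4), round(poisson, 4), round(normal, 4))
```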
5.2.5 Beta Distribution
Definition 5.9.
Let X be a positive random variable with pdf

    f(x) =
        (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1},  for 0 < x < 1;
        0,                                           otherwise,

where α and β are positive numbers, then X is said to have a beta distribution and is denoted as X ∼ Beta(α, β).

Mean and variance:

    µ = α/(α + β)   and   σ² = αβ / [(α + β)²(α + β + 1)].

◀
It is commonly used for modelling a random quantity distributed within a finite interval.
[Probability density functions of Beta(α, β) for (α, β) = (0.5, 0.5), (5, 1), (1, 3), (2, 2), (2, 5).]
Remarks
1. The function

       B(α, β) = ∫_0^1 x^{α−1}(1 − x)^{β−1} dx = Γ(α)Γ(β)/Γ(α + β)

   is called the beta function; it is also the Euler integral of the first kind. Therefore the pdf of a beta distribution is sometimes expressed as

       f(x) =
           (1/B(α, β)) x^{α−1}(1 − x)^{β−1},  for 0 < x < 1;
           0,                                 otherwise.

2. If α = β = 1, the beta distribution becomes the uniform distribution U(0, 1).
Example 5.16.
After assessing the current political, social, economic and financial factors, a financial analyst believes that the proportion of the stocks that will increase in value tomorrow is a beta random variable with α = 5 and β = 3. What is the expected value of this proportion? How likely is it that the values of at least 70% of the stocks will move up?
Solution:
Let P be the proportion. Then P ∼ Beta(5, 3) and E(P) = 5/(5 + 3) = 5/8 = 0.625.

    Pr(P ≥ 0.7) = ∫_{0.7}^{1} (Γ(8)/(Γ(5)Γ(3))) x^{5−1}(1 − x)^{3−1} dx
                = (7!/(4! 2!)) ∫_{0.7}^{1} (x⁴ − 2x⁵ + x⁶) dx
                = 0.3529.

⋆
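The tail probability in Example 5.16 has a closed form, since the Beta(5, 3) density is the polynomial 105 x⁴(1 − x)². A short check (the `beta53_tail` helper name is illustrative):

```python
# Pr(P >= c) for P ~ Beta(5, 3), whose density is 105 * (x^4 - 2x^5 + x^6).
def beta53_tail(c: float) -> float:
    antideriv = lambda x: x ** 5 / 5 - x ** 6 / 3 + x ** 7 / 7
    return 105 * (antideriv(1.0) - antideriv(c))

p = beta53_tail(0.7)
print(round(p, 4))   # ~0.3529
```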
5.3 Reading of textbook
Chapter 5: sections 5.1, 5.2, 5.3, 5.5
Chapter 8: section 8.4
Chapter 5: section 5.4
Acknowledgment:
We thank Dr. Kam Pui WAT for his tremendous contribution to these lecture notes.
∼ End of Chapter 5 ∼
Table Va  The Standard Normal Distribution Function

[Standard normal density, with the shaded area Φ(z₀) to the left of z₀.]

    Pr(Z ≤ z) = Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−w²/2} dw,   Φ(−z) = 1 − Φ(z).

 z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7703  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990

 α        0.400  0.300  0.200  0.100  0.050  0.025  0.020  0.010  0.005  0.001
 z_α      0.253  0.524  0.842  1.282  1.645  1.960  2.054  2.326  2.576  3.090
 z_{α/2}  0.842  1.036  1.282  1.645  1.960  2.240  2.326  2.576  2.807  3.291