Applied Statistics I Liang Zhang July 1, 2008

Applied Statistics I
Liang Zhang
Department of Mathematics, University of Utah
July 1, 2008
Weibull Distribution
Definition
A random variable X is said to have a Weibull distribution with
parameters α and β (α > 0, β > 0) if the pdf of X is
\[
f(x; \alpha, \beta) =
\begin{cases}
\dfrac{\alpha}{\beta^{\alpha}}\, x^{\alpha-1} e^{-(x/\beta)^{\alpha}} & x \ge 0 \\
0 & x < 0
\end{cases}
\]
Remark:
1. The family of Weibull distributions was introduced by the Swedish
physicist Waloddi Weibull in 1939.
2. We use X ∼ WEB(α, β) to denote that the rv X has a Weibull
distribution with parameters α and β.
Remark:
3. When α = 1, the pdf becomes
\[
f(x; \beta) =
\begin{cases}
\dfrac{1}{\beta}\, e^{-x/\beta} & x \ge 0 \\
0 & x < 0
\end{cases}
\]
which is the pdf of an exponential distribution with parameter λ = 1/β.
Thus we see that the exponential distribution is a special case of both the
gamma and Weibull distributions.
4. There are gamma distributions that are not Weibull distributions and
vice versa, so neither family is a subset of the other.
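Proposition
Let X be a random variable such that X ∼ WEB(α, β). Then
\[
E(X) = \beta\,\Gamma\!\left(1 + \frac{1}{\alpha}\right)
\quad\text{and}\quad
V(X) = \beta^{2}\left\{\Gamma\!\left(1 + \frac{2}{\alpha}\right) - \left[\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right]^{2}\right\}
\]
The cdf of X is
\[
F(x; \alpha, \beta) =
\begin{cases}
1 - e^{-(x/\beta)^{\alpha}} & x \ge 0 \\
0 & x < 0
\end{cases}
\]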
Example:
The shear strength (in pounds) of a spot weld is a Weibull distributed
random variable, X ∼ WEB(400, 2/3).
a. Find P(X > 410).
b. Find P(X > 410 | X > 390).
c. Find E (X ) and V (X ).
d. Find the 95th percentile.
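The four parts of this example follow directly from the cdf, mean and variance formulas for the Weibull distribution. The sketch below is only an illustration under a loud assumption: it reads WEB(400, 2/3) as scale β = 400 and shape α = 2/3, because with the definition's order (α = 400, β = 2/3) the probabilities asked for would be essentially zero. The function names are hypothetical, chosen for this sketch.

```python
import math

# ASSUMPTION (not stated in the slides): WEB(400, 2/3) is read as
# scale beta = 400 and shape alpha = 2/3. Swap the two constants to use
# the other reading.
ALPHA = 2.0 / 3.0   # shape (assumed)
BETA = 400.0        # scale (assumed)


def weibull_sf(x, alpha, beta):
    """P(X > x) = exp(-(x/beta)^alpha) for x >= 0, and 1 for x < 0."""
    if x < 0:
        return 1.0
    return math.exp(-((x / beta) ** alpha))


def weibull_mean(alpha, beta):
    """E(X) = beta * Gamma(1 + 1/alpha)."""
    return beta * math.gamma(1 + 1 / alpha)


def weibull_var(alpha, beta):
    """V(X) = beta^2 * {Gamma(1 + 2/alpha) - [Gamma(1 + 1/alpha)]^2}."""
    g1 = math.gamma(1 + 1 / alpha)
    g2 = math.gamma(1 + 2 / alpha)
    return beta ** 2 * (g2 - g1 ** 2)


def weibull_percentile(p, alpha, beta):
    """Solve 1 - exp(-(x/beta)^alpha) = p for x."""
    return beta * (-math.log(1 - p)) ** (1 / alpha)


p_a = weibull_sf(410, ALPHA, BETA)              # a. P(X > 410)
# b. {X > 410} is contained in {X > 390}, so the conditional
#    probability is a ratio of two survival probabilities.
p_b = p_a / weibull_sf(390, ALPHA, BETA)
mean_x = weibull_mean(ALPHA, BETA)              # c. E(X)
var_x = weibull_var(ALPHA, BETA)                # c. V(X)
x_95 = weibull_percentile(0.95, ALPHA, BETA)    # d. 95th percentile

print(p_a, p_b, mean_x, var_x, x_95)
```

Part d inverts the cdf in closed form, which is one reason the Weibull family is convenient: no numerical root finding is needed for percentiles.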
In practical situations, X often has a minimum possible value γ > 0, and it is
the excess X − γ over that minimum that has a Weibull distribution.
Example (Problem 74):
Let X = the time (in 10^{-1} weeks) from shipment of a
defective product until the customer returns the product.
Suppose that the minimum return time is γ = 3.5 and that the excess
X − 3.5 over the minimum has a Weibull distribution with parameters
α = 2 and β = 1.5.
a. What is the cdf of X ?
b. What are the expected return time and variance of return time?
c. Compute P(X > 5).
d. Compute P(5 ≤ X ≤ 8).
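Because X − 3.5 is Weibull, the cdf of X is the Weibull cdf evaluated at x − 3.5, the mean shifts by 3.5, and the variance is unchanged by the shift. A minimal sketch of the four parts, using only the formulas stated earlier (the names `shifted_weibull_cdf`, `GAMMA_MIN` and the like are made up for this illustration):

```python
import math

GAMMA_MIN = 3.5   # minimum return time (gamma in the problem statement)
ALPHA = 2.0       # Weibull parameter alpha of the excess X - 3.5
BETA = 1.5        # Weibull parameter beta of the excess X - 3.5


def shifted_weibull_cdf(x, gamma_min=GAMMA_MIN, alpha=ALPHA, beta=BETA):
    """a. F(x) = 1 - exp(-((x - gamma)/beta)^alpha) for x >= gamma, else 0."""
    if x < gamma_min:
        return 0.0
    return 1.0 - math.exp(-(((x - gamma_min) / beta) ** alpha))


# b. Shifting by a constant adds to the mean and leaves the variance alone.
mean_x = GAMMA_MIN + BETA * math.gamma(1 + 1 / ALPHA)
var_x = BETA ** 2 * (math.gamma(1 + 2 / ALPHA) - math.gamma(1 + 1 / ALPHA) ** 2)

p_gt_5 = 1.0 - shifted_weibull_cdf(5.0)                          # c. P(X > 5)
p_5_to_8 = shifted_weibull_cdf(8.0) - shifted_weibull_cdf(5.0)   # d. P(5 <= X <= 8)

print(mean_x, var_x, p_gt_5, p_5_to_8)
```

Note that (5 − 3.5)/1.5 = 1, so part c reduces to e^{−1}.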
Lognormal Distribution
Definition
A nonnegative rv X is said to have a lognormal distribution if the rv
Y = ln(X) has a normal distribution. The resulting pdf of a lognormal rv
when ln(X) is normally distributed with parameters µ and σ is
\[
f(x; \mu, \sigma) =
\begin{cases}
\dfrac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-[\ln(x)-\mu]^{2}/(2\sigma^{2})} & x > 0 \\
0 & x \le 0
\end{cases}
\]
Remark:
1. We use X ∼ LOGN(µ, σ²) to denote that the rv X has a lognormal
distribution with parameters µ and σ.
2. Notice here that the parameter µ is not the mean and σ² is not the
variance, i.e.
µ ≠ E(X) and σ² ≠ V(X)
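Proposition
If X ∼ LOGN(µ, σ²), then
\[
E(X) = e^{\mu + \sigma^{2}/2}
\quad\text{and}\quad
V(X) = e^{2\mu + \sigma^{2}} \cdot \left(e^{\sigma^{2}} - 1\right)
\]
The cdf of X is, for x > 0,
\[
F(x; \mu, \sigma) = P(X \le x) = P[\ln(X) \le \ln(x)]
= P\!\left(Z \le \frac{\ln(x) - \mu}{\sigma}\right)
= \Phi\!\left(\frac{\ln(x) - \mu}{\sigma}\right)
\]
where Φ(z) is the cdf of the standard normal rv Z.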
Example (Problem 115)
Let Ii be the input current to a transistor and I0 be the output current.
Then the current gain is proportional to ln(I0 /Ii ). Suppose the constant of
proportionality is 1 (which amounts to choosing a particular unit of
measurement), so that current gain = X = ln(I0 /Ii ). Assume X is
normally distributed with µ = 1 and σ = 0.05.
a. What is the probability that the output current is more than twice the
input current?
b. What are the expected value and variance of the ratio of output to
input current?
c. What value r is such that there is only a 5% chance that the ratio of
output to input current exceeds r?
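Since X = ln(I0/Ii) is normal, the ratio I0/Ii = e^X is lognormal with the same µ and σ, so all three parts reduce to the lognormal formulas. A minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative only:

```python
import math
from statistics import NormalDist

MU, SIGMA = 1.0, 0.05
gain = NormalDist(MU, SIGMA)   # distribution of X = ln(I0/Ii)

# a. P(I0/Ii > 2) = P(X > ln 2), evaluated with the normal cdf.
p_ratio_gt_2 = 1.0 - gain.cdf(math.log(2.0))

# b. Mean and variance of the lognormal ratio e^X.
mean_ratio = math.exp(MU + SIGMA ** 2 / 2)
var_ratio = math.exp(2 * MU + SIGMA ** 2) * (math.exp(SIGMA ** 2) - 1)

# c. r is the 95th percentile of the ratio: exponentiate the
#    95th percentile of X.
r_95 = math.exp(gain.inv_cdf(0.95))

print(p_ratio_gt_2, mean_ratio, var_ratio, r_95)
```

Part c works because the exponential function is increasing, so percentiles of X map directly to percentiles of e^X.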
Beta Distribution
Definition
A random variable X is said to have a beta distribution with parameters
α, β (both positive), A, and B if the pdf of X is
\[
f(x; \alpha, \beta, A, B) =
\begin{cases}
\dfrac{1}{B-A} \cdot \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)} \cdot \left(\dfrac{x-A}{B-A}\right)^{\alpha-1} \cdot \left(\dfrac{B-x}{B-A}\right)^{\beta-1} & A \le x \le B \\
0 & \text{otherwise}
\end{cases}
\]
The case A = 0, B = 1 gives the standard beta distribution.
Remark: We use X ∼ BETA(α, β, A, B) to denote that the rv X has a beta
distribution with parameters α, β, A, and B.
Proposition
If X ∼ BETA(α, β, A, B), then
\[
E(X) = A + (B - A) \cdot \frac{\alpha}{\alpha+\beta}
\quad\text{and}\quad
V(X) = \frac{(B-A)^{2}\,\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}
\]
Example (Problem 127)
An individual’s credit score is a number calculated based on that person’s
credit history which helps a lender determine how much he/she should be
loaned or what credit limit should be established for a credit card. An
article in the Los Angeles Times gave data which suggested that a beta
distribution with parameters A = 150, B = 850, α = 8, β = 2 would
provide a reasonable approximation to the distribution of American credit
scores. [Note: credit scores are integer-valued].
a. Let X represent a randomly selected American credit score. What are
the mean value and standard deviation of this random variable? What
is the probability that X is within 1 standard deviation of its mean
value?
b. What is the approximate probability that a randomly selected score
will exceed 750 (which lenders consider a very good score)?
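The mean and standard deviation come straight from the proposition, but the beta cdf has no simple closed form in general, so the probabilities need numerical integration of the pdf. The sketch below uses composite Simpson's rule, which is a choice made for this illustration and not something the slides prescribe. It treats X as continuous and ignores the integer-valued note; all function names are hypothetical.

```python
import math

ALPHA, BETA_PARAM = 8.0, 2.0
A, B = 150.0, 850.0


def beta_pdf(x, alpha=ALPHA, beta=BETA_PARAM, a=A, b=B):
    """pdf of BETA(alpha, beta, A, B); zero outside [A, B]."""
    if not a <= x <= b:
        return 0.0
    u = (x - a) / (b - a)
    const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return const * u ** (alpha - 1) * (1 - u) ** (beta - 1) / (b - a)


def prob_between(lo, hi, n=2000):
    """P(lo <= X <= hi) by composite Simpson's rule (n must be even)."""
    lo, hi = max(lo, A), min(hi, B)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = beta_pdf(lo) + beta_pdf(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * beta_pdf(lo + k * h)
    return total * h / 3


# a. Mean, standard deviation, and P(|X - mean| <= sd).
mean_x = A + (B - A) * ALPHA / (ALPHA + BETA_PARAM)
var_x = ((B - A) ** 2 * ALPHA * BETA_PARAM
         / ((ALPHA + BETA_PARAM) ** 2 * (ALPHA + BETA_PARAM + 1)))
sd_x = math.sqrt(var_x)
p_within_1sd = prob_between(mean_x - sd_x, mean_x + sd_x)

# b. P(X > 750), continuous approximation.
p_exceed_750 = prob_between(750.0, B)

print(mean_x, sd_x, p_within_1sd, p_exceed_750)
```

For these integer parameters the standard beta cdf is the polynomial 9y^8 − 8y^9 with y = (x − A)/(B − A), which gives an independent check on the numerical integration.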
Probability Plot
Example:
There is a machine available for cutting corks intended for use in wine
bottles. We want to find out the distribution of the diameters of the corks
produced by that machine. Assume we have a sample of 10 corks produced by that
machine and the diameters are recorded as follows:
3.0879 3.2546 2.8970 2.7377 2.7740
2.6030 3.5931 3.1253 2.4756 2.5133
Sample Percentile
Recall: The (100p)th percentile of the distribution of a continuous rv X,
denoted by η(p), is defined by
\[
p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy
\]
In words, the (100p)th percentile η(p) is the value such that 100p% of the
X values lie below η(p).
Similarly, we can define a sample percentile in the same manner, i.e. the
(100p)th sample percentile x_p is the value such that 100p% of the sample
values lie below x_p.
Unfortunately, x_p may not be a sample value for some p.
e.g. for the cork-diameter example, what is the 35th percentile for the ten
sample values?
Definition
Assume we have a sample with size n. Order the n sample observations
from smallest to largest. Then the ith smallest observation in the list is
taken to be the [100(i − 0.5)/n]th sample percentile.
Remark:
1. Why “i − 0.5”? We regard the sample observation as being half in the
lower group and half in the upper group.
e.g. if n = 9, then the sample median is the 5th largest observation and
this observation is regarded as two parts: one in the lower half and one in
the upper half.
2. Once the percentage values 100(i − 0.5)/n (i = 1, 2, ..., n) have been
calculated, sample percentiles corresponding to intermediate percentages
can be obtained by linear interpolation.
Example: for the cork-diameter example, the [100(i − 0.5)/n]th sample
percentiles are tabulated as follows:

   i   Observation   Percentage
   1   2.4756        100(1 − 0.5)/10  =  5%
   2   2.5133        100(2 − 0.5)/10  = 15%
   3   2.6030        100(3 − 0.5)/10  = 25%
   4   2.7377        100(4 − 0.5)/10  = 35%
   5   2.7740        100(5 − 0.5)/10  = 45%
   6   2.8970        100(6 − 0.5)/10  = 55%
   7   3.0879        100(7 − 0.5)/10  = 65%
   8   3.1253        100(8 − 0.5)/10  = 75%
   9   3.2546        100(9 − 0.5)/10  = 85%
  10   3.5931        100(10 − 0.5)/10 = 95%

The 10th percentile would be (2.4756 + 2.5133)/2 = 2.49445, since 10% lies
midway between 5% and 15%.
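The tabulated percentages and the interpolated 10th percentile can be reproduced with a few lines of code. This is a minimal sketch of the definition with linear interpolation between neighbouring tabulated points; the function names `percentile_table` and `sample_percentile` are made up for this illustration.

```python
DIAMETERS = [3.0879, 3.2546, 2.8970, 2.7377, 2.7740,
             2.6030, 3.5931, 3.1253, 2.4756, 2.5133]


def percentile_table(data):
    """Pair the i-th smallest observation with percentage 100(i - 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    return [(100 * (i - 0.5) / n, x) for i, x in enumerate(xs, start=1)]


def sample_percentile(data, pct):
    """Sample percentile at `pct`, interpolating linearly between table rows.

    Only percentages between the smallest and largest tabulated values
    (5% and 95% when n = 10) are supported; no extrapolation is attempted.
    """
    table = percentile_table(data)
    if not table[0][0] <= pct <= table[-1][0]:
        raise ValueError("percentage outside the tabulated range")
    for (p_lo, x_lo), (p_hi, x_hi) in zip(table, table[1:]):
        if p_lo <= pct <= p_hi:
            weight = (pct - p_lo) / (p_hi - p_lo)
            return x_lo + weight * (x_hi - x_lo)
    return table[0][1]   # single-observation sample


for pct, x in percentile_table(DIAMETERS):
    print(f"{pct:5.1f}%  {x:.4f}")

tenth = sample_percentile(DIAMETERS, 10)
thirty_fifth = sample_percentile(DIAMETERS, 35)
print(tenth, thirty_fifth)
```

The 35th percentile asked about earlier falls exactly on a tabulated row, so it is simply the 4th smallest observation.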
Idea for Quantile-Quantile Plot:
1. Determine the “[100(i − 0.5)/n]th sample percentile” for a given
sample.
2. Find the corresponding [100(i − 0.5)/n]th percentile from the
population with the assumed distribution; for example, if the assumed
distribution is standard normal, then find the corresponding
[100(i − 0.5)/n]th percentile from the standard normal distribution.
3. Consider the (population percentile, sample percentile) pairs, i.e.
([100(i − 0.5)/n]th percentile of the distribution, ith smallest sample observation)
4. If the assumed distribution is correct, each pair plotted as a point on a
two-dimensional coordinate system should fall close to a 45° line.
Substantial deviations of the plotted points from a 45° line cast doubt on
the assumption that the distribution under consideration is the correct one.
Example 4.29:
The value of a certain physical constant is known to an experimenter. The
experimenter makes n = 10 independent measurements of this value using
a particular measurement device and records the resulting measurement
errors (error = observed value − true value). These observations appear in
the following table.

Percentage             5       15       25       35       45
Sample Observation    -1.91   -1.25    -0.75    -0.53     0.20

Percentage             55      65       75       85       95
Sample Observation     0.35    0.72     0.87     1.40     1.56

Is it plausible that the random variable measurement error has a standard
normal distribution?
We first find the corresponding population distribution percentiles, in this
case, the z percentiles:

Percentage             5        15       25       35       45
Sample Observation    -1.91    -1.25    -0.75    -0.53     0.20
z percentile          -1.645   -1.037   -0.675   -0.385   -0.126

Percentage             55       65       75       85       95
Sample Observation     0.35     0.72     0.87     1.40     1.56
z percentile           0.126    0.385    0.675    1.037    1.645
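The z percentiles in the table are values of the inverse standard normal cdf at (i − 0.5)/n. A minimal sketch using the standard-library `statistics.NormalDist`; the helper name `z_percentiles` is an assumption of this illustration, and the computed values agree with the table only up to the table's rounding.

```python
from statistics import NormalDist

# Measurement errors from Example 4.29, already in increasing order.
ERRORS = [-1.91, -1.25, -0.75, -0.53, 0.20, 0.35, 0.72, 0.87, 1.40, 1.56]


def z_percentiles(n):
    """[100(i - 0.5)/n]th percentiles of N(0, 1) for i = 1, ..., n."""
    std_normal = NormalDist()   # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


z = z_percentiles(len(ERRORS))
pairs = list(zip(z, sorted(ERRORS)))   # (z percentile, i-th smallest observation)

for z_i, x_i in pairs:
    # A point on the 45-degree line would have x_i equal to z_i.
    print(f"z = {z_i:7.3f}   observation = {x_i:6.2f}   difference = {x_i - z_i:6.3f}")
```

Plotting `pairs` with any plotting tool and comparing against the line y = x gives the probability plot described in the steps above.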
What about the first example? We are only interested in whether the ten
sample observations come from a normal distribution.
Recall:
{(100p)th percentile for N(µ, σ²)} =
µ + {(100p)th percentile for N(0, 1)} · σ
If µ = 0, then the pairs (σ · [z percentile], observation) fall on a 45° line,
which has slope 1.
Therefore the pairs ([z percentile], observation) fall on a line passing
through (0, 0) (i.e., one with y-intercept 0) but having slope σ rather than
1.
Now for µ ≠ 0, the y-intercept is µ instead of 0.
Normal Probability Plot
A plot of the n pairs
([100(i − 0.5)/n]th z percentile, ith smallest observation)
on a two-dimensional coordinate system is called a normal probability
plot. If the sample observations are in fact drawn from a normal
distribution with mean value µ and standard deviation σ, the points should
fall close to a straight line with slope σ and y-intercept µ. Thus a plot for
which the points fall close to some straight line suggests that the
assumption of a normal population distribution is plausible.
First Example:

Percentage             5        15       25       35       45
Sample Observation     2.4756   2.5133   2.6030   2.7377   2.7740
z percentile          -1.645   -1.037   -0.675   -0.385   -0.126

Percentage             55       65       75       85       95
Sample Observation     2.8970   3.0879   3.1253   3.2546   3.5931
z percentile           0.126    0.385    0.675    1.037    1.645
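For a normal probability plot the points need only fall near some straight line, whose slope and y-intercept play the roles of σ and µ. One way to make that concrete, sketched below, is to fit a least-squares line through the (z percentile, observation) pairs. The least-squares fit is an assumption of this illustration and is not part of the slides, which judge linearity by eye; `fit_line` and `normal_plot_pairs` are hypothetical names.

```python
from statistics import NormalDist, mean

DIAMETERS = [3.0879, 3.2546, 2.8970, 2.7377, 2.7740,
             2.6030, 3.5931, 3.1253, 2.4756, 2.5133]


def normal_plot_pairs(data):
    """([100(i - 0.5)/n]th z percentile, i-th smallest observation) pairs."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]


def fit_line(pairs):
    """Ordinary least-squares slope and intercept of observation on z."""
    zs = [z for z, _ in pairs]
    xs = [x for _, x in pairs]
    z_bar, x_bar = mean(zs), mean(xs)
    s_zx = sum((z - z_bar) * (x - x_bar) for z, x in pairs)
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    slope = s_zx / s_zz
    return slope, x_bar - slope * z_bar


pairs = normal_plot_pairs(DIAMETERS)
slope, intercept = fit_line(pairs)

# Under normality, slope is a rough stand-in for sigma and intercept for mu.
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Because the z percentiles are symmetric about 0, the fitted intercept coincides with the sample mean of the diameters.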
A nonnormal population distribution can often be placed in one of the
following three categories:
1. It is symmetric and has “lighter tails” than does a normal
distribution; that is, the density curve declines more rapidly out in the
tails than does a normal curve.
2. It is symmetric and heavy-tailed compared to a normal distribution.
3. It is skewed.
Symmetric and “light-tailed”: e.g. Uniform distribution
Symmetric and heavy-tailed: e.g. Cauchy distribution with pdf
f(x) = 1/[π(1 + x²)] for −∞ < x < ∞
Skewed: e.g. lognormal distribution
Some guidance on probability plots for normal distributions
(from the book Fitting Equations to Data (2nd ed.), Cuthbert Daniel
and Fred Wood, Wiley, New York, 1980):
1. For sample sizes smaller than 30, there is typically greater variation in
the appearance of the probability plot.
2. Only for much larger sample sizes does a linear pattern generally
predominate.
Therefore, when a plot is based on a small sample size, only a very
substantial departure from linearity should be taken as conclusive evidence
of nonnormality.
Definition
Consider a family of probability distributions involving two parameters, θ1
and θ2, and let F(x; θ1, θ2) denote the corresponding cdf’s.
The parameters θ1 and θ2 are said to be location and scale parameters,
respectively, if F(x; θ1, θ2) is a function of (x − θ1)/θ2.
e.g.
1. Normal distributions N(µ, σ): F(x; µ, σ) = Φ((x − µ)/σ).
2. The extreme value distribution with cdf
\[
F(x; \theta_1, \theta_2) = 1 - e^{-e^{(x-\theta_1)/\theta_2}}
\]
For the Weibull distribution:
\[
F(x; \alpha, \beta) = 1 - e^{-(x/\beta)^{\alpha}},
\]
the parameter β is a scale parameter but α is NOT a location parameter.
α is usually referred to as a shape parameter.
Fortunately, if X has a Weibull distribution with shape parameter α and
scale parameter β, then the transformed variable ln(X) has an extreme
value distribution with location parameter θ1 = ln(β) and scale parameter
θ2 = 1/α.
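This transformation claim can be checked from the Weibull cdf alone. The short derivation below is a sketch; the symbol y for a value of ln(X) is introduced here only for the calculation.

```latex
\[
\begin{aligned}
P(\ln X \le y) &= P(X \le e^{y}) = 1 - e^{-(e^{y}/\beta)^{\alpha}} \\
               &= 1 - e^{-e^{\alpha (y - \ln\beta)}}
                = 1 - e^{-e^{(y - \theta_1)/\theta_2}},
\qquad \theta_1 = \ln\beta,\quad \theta_2 = 1/\alpha .
\end{aligned}
\]
```

The last expression has the form of the extreme value cdf, so a probability plot for Weibull data can be built from the ln of the observations against extreme value percentiles.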
The gamma distribution also has a shape parameter α. However, there is
no transformation h(•) such that h(X ) has a distribution that depends
only on location and scale parameters.
Thus, before we construct a probability plot, we have to estimate the
shape parameter from the sample data.