Probability Plot
Example:
There is a machine available for cutting corks intended for use in
wine bottles. We want to find out the distribution of the diameters
of the corks produced by that machine. Assume we have a sample of 10
corks produced by that machine, and their diameters are recorded
as follows:
3.0879
2.6030
3.2546
3.5931
2.8970
3.1253
2.7377
2.4756
2.7740
2.5133
Sample Percentile
Recall: The (100p)th percentile of the distribution of a
continuous rv X, denoted by η(p), is defined by
\[ p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy. \]
In words, the (100p)th percentile η(p) is the value such that
100p% of the probability of X lies below η(p).
Similarly, we can define a sample percentile in the same manner,
i.e. the (100p)th sample percentile x_p is the value such that
100p% of the sample values lie below x_p.
Unfortunately, x_p may not be a sample value for some p.
e.g. for the previous example, what is the 35th percentile of the
ten sample values?
Definition
Assume we have a sample with size n. Order the n sample
observations from smallest to largest. Then the ith smallest
observation in the list is taken to be the [100(i − 0.5)/n]th sample
percentile.
Remark:
1. Why “i − 0.5”? We regard the sample observation as being half
in the lower group and half in the upper group.
e.g. if n = 9, then the sample median is the 5th smallest
observation, i.e. the [100(5 − 0.5)/9]th = 50th sample percentile,
and this observation is regarded as two parts: one in
the lower half and one in the upper half.
2. Once the percentage values 100(i − 0.5)/n (i = 1, 2, . . . , n) have
been calculated, sample percentiles corresponding to intermediate
percentages can be obtained by linear interpolation.
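Example: for the previous example, the [100(i − 0.5)/n]th sample
percentiles are tabulated as follows:

i     Observation   Percentage
1     2.4756        100(1 − 0.5)/10 = 5%
2     2.5133        100(2 − 0.5)/10 = 15%
3     2.6030        100(3 − 0.5)/10 = 25%
4     2.7377        100(4 − 0.5)/10 = 35%
5     2.7740        100(5 − 0.5)/10 = 45%
6     2.8970        100(6 − 0.5)/10 = 55%
7     3.0879        100(7 − 0.5)/10 = 65%
8     3.1253        100(8 − 0.5)/10 = 75%
9     3.2546        100(9 − 0.5)/10 = 85%
10    3.5931        100(10 − 0.5)/10 = 95%

The 35th percentile is therefore the 4th smallest observation, 2.7377.
The 10th percentile, obtained by linear interpolation halfway between
the 5th and 15th percentiles, would be (2.4756 + 2.5133)/2 = 2.49445.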
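The tabulation and the interpolation rule can be sketched in a few lines of Python. This is only an illustration of the definition above; the function names percentile_table and interpolated_percentile are hypothetical, and refusing percentages outside the tabulated range (below 5% or above 95% here) is an assumption, since the definition does not say how to extrapolate.

```python
# Cork diameters from the example (unsorted, as recorded).
diameters = [3.0879, 2.6030, 3.2546, 3.5931, 2.8970,
             3.1253, 2.7377, 2.4756, 2.7740, 2.5133]

def percentile_table(sample):
    """Pair each ordered observation with its percentage 100(i - 0.5)/n."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(100 * (i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]

def interpolated_percentile(sample, pct):
    """Linear interpolation between neighbouring tabulated percentiles."""
    table = percentile_table(sample)
    if not table[0][0] <= pct <= table[-1][0]:
        # Assumption: no extrapolation beyond the smallest/largest percentage.
        raise ValueError("percentage outside the tabulated range")
    for (p0, x0), (p1, x1) in zip(table, table[1:]):
        if p0 <= pct <= p1:
            return x0 + (x1 - x0) * (pct - p0) / (p1 - p0)
    return table[0][1]  # only reached for a sample of size 1

for pct, x in percentile_table(diameters):
    print(f"{pct:5.1f}%  {x:.4f}")
print("10th percentile:", interpolated_percentile(diameters, 10))
```

Asking for the 35th percentile returns the 4th smallest observation, and asking for the 10th returns the midpoint of the two smallest observations, matching the hand calculation.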
Idea for Quantile-Quantile Plot:
1. Determine the “[100(i − 0.5)/n]th sample percentile” for a
given sample.
2. Find the corresponding [100(i − 0.5)/n]th percentile from the
population with the assumed distribution; for example, if the
assumed distribution is standard normal, then find
corresponding [100(i − 0.5)/n]th percentile from the standard
normal distribution.
3. Consider the (population percentile, sample percentile) pairs,
i.e.
([100(i − 0.5)/n]th percentile of the distribution, ith smallest sample observation)
4. Each pair plotted as a point on a two-dimensional coordinate
system should fall close to a 45° line. Substantial deviations
of the plotted points from a 45° line cast doubt on the
assumption that the distribution under consideration is the
correct one.
Example 4.29:
The value of a certain physical constant is known to an
experimenter. The experimenter makes n = 10 independent
measurements of this value using a particular measurement device
and records the resulting measurement errors (error = observed
value - true value). These observations appear in the following
table.

Percentage   Sample Observation   Percentage   Sample Observation
5            -1.91                55           0.35
15           -1.25                65           0.72
25           -0.75                75           0.87
35           -0.53                85           1.40
45           0.20                 95           1.56

Is it plausible that the random variable measurement error has a
standard normal distribution?
We first find the corresponding population distribution percentiles,
in this case, the z percentiles:

Percentage   Sample Observation   z percentile   Percentage   Sample Observation   z percentile
5            -1.91                -1.645         55           0.35                 0.126
15           -1.25                -1.037         65           0.72                 0.385
25           -0.75                -0.675         75           0.87                 0.675
35           -0.53                -0.385         85           1.40                 1.037
45           0.20                 -0.126         95           1.56                 1.645

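The z percentiles in the table can be reproduced, at least approximately, with the inverse cdf of the standard normal distribution. The sketch below uses Python's standard-library statistics.NormalDist; the function name normal_qq_pairs is hypothetical, and the computed values may differ from the tabulated ones in the third decimal place because of rounding in the printed z table.

```python
from statistics import NormalDist

# Measurement errors from Example 4.29.
errors = [-1.91, -1.25, -0.75, -0.53, 0.20, 0.35, 0.72, 0.87, 1.40, 1.56]

def normal_qq_pairs(sample):
    """Return (z percentile, ith smallest observation) pairs.

    The z percentile is the [100(i - 0.5)/n]th percentile of N(0, 1).
    """
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

for z, x in normal_qq_pairs(errors):
    print(f"z = {z:7.3f}   observation = {x:5.2f}")
```

Plotting these pairs and comparing them with the 45° line is then the check described in the Quantile-Quantile idea above.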
What about the first example? We are only interested in whether
the ten sample observations come from a normal distribution.
Recall:
{(100p)th percentile for N(µ, σ²)} =
µ + {(100p)th percentile for N(0, 1)} · σ
If µ = 0, then the pairs (σ · [z percentile], observation) fall on a
45° line, which has slope 1.
Therefore the pairs ([z percentile], observation) fall on a line
passing through (0, 0) (i.e., one with y-intercept 0) but having
slope σ rather than 1.
Now for µ ≠ 0, the y-intercept is µ instead of 0.
Normal Probability Plot
A plot of the n pairs
([100(i − 0.5)/n]th z percentile, ith smallest observation)
on a two-dimensional coordinate system is called a normal
probability plot. If the sample observations are in fact drawn from
a normal distribution with mean value µ and standard deviation σ,
the points should fall close to a straight line with slope σ and
y-intercept µ. Thus a plot for which the points fall close to some
straight line suggests that the assumption of a normal population
distribution is plausible.

Percentage   Sample Observation   z percentile   Percentage   Sample Observation   z percentile
5            2.4756               -1.645         55           2.8970               0.126
15           2.5133               -1.037         65           3.0879               0.385
25           2.6030               -0.675         75           3.1253               0.675
35           2.7377               -0.385         85           3.2546               1.037
45           2.7740               -0.126         95           3.5931               1.645

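Since the points of a normal probability plot should lie near a line with slope σ and y-intercept µ, one rough way to read the plot numerically is to fit a straight line through the (z percentile, observation) pairs. The least-squares fit below is an assumption made for illustration only; the slides judge linearity by eye and do not prescribe a fitting method. The function name fit_probability_plot_line is hypothetical.

```python
from statistics import NormalDist, mean

# Cork diameters from the first example.
diameters = [3.0879, 2.6030, 3.2546, 3.5931, 2.8970,
             3.1253, 2.7377, 2.4756, 2.7740, 2.5133]

def fit_probability_plot_line(sample):
    """Least-squares line through (z percentile, ith smallest observation).

    Returns (slope, intercept); under normality these are rough
    stand-ins for sigma and mu respectively.
    """
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    z_bar, x_bar = mean(z), mean(ordered)
    sxz = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, ordered))
    szz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sxz / szz
    intercept = x_bar - slope * z_bar
    return slope, intercept

slope, intercept = fit_probability_plot_line(diameters)
print(f"slope (rough sigma) = {slope:.3f}")
print(f"intercept (rough mu) = {intercept:.3f}")
```

Because the z percentiles are symmetric about 0, the fitted intercept coincides with the sample mean of the diameters.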
A nonnormal population distribution can often be placed in one of
the following three categories:
1. It is symmetric and has “lighter tails” than does a normal
distribution; that is, the density curve declines more rapidly
out in the tails than does a normal curve.
2. It is symmetric and heavy-tailed compared to a normal
distribution.
3. It is skewed.
Symmetric and “light-tailed”: e.g. Uniform distribution
Symmetric and heavy-tailed: e.g. Cauchy distribution with pdf
f(x) = 1/[π(1 + x²)] for −∞ < x < ∞
Skewed: e.g. lognormal distribution
Some guidance on probability plots for normal distributions
(from the book Fitting Equations to Data (2nd ed.), Cuthbert Daniel
and Fred Wood, Wiley, New York, 1980):
1. For sample sizes smaller than 30, there is typically greater
variation in the appearance of the probability plot.
2. Only for much larger sample sizes does a linear pattern generally
predominate.
Therefore, when a plot is based on a small sample size, only a very
substantial departure from linearity should be taken as conclusive
evidence of nonnormality.
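The effect of sample size can be explored with a small simulation: draw many samples that really are normal and measure how straight each normal probability plot is. Using the correlation between the z percentiles and the ordered observations as the measure of straightness is an assumption made here for illustration, as are the function names, the sample sizes, the number of repetitions, and the seed.

```python
import random
from statistics import NormalDist, mean

def plot_correlation(sample):
    """Correlation between z percentiles and the ordered observations."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    z_bar, x_bar = mean(z), mean(ordered)
    sxz = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, ordered))
    szz = sum((zi - z_bar) ** 2 for zi in z)
    sxx = sum((xi - x_bar) ** 2 for xi in ordered)
    return sxz / (szz * sxx) ** 0.5

def simulate(n, reps=200, seed=0):
    """Correlations for `reps` standard normal samples of size n."""
    rng = random.Random(seed)
    return [plot_correlation([rng.gauss(0, 1) for _ in range(n)])
            for _ in range(reps)]

for n in (10, 30, 200):
    r = simulate(n)
    print(f"n = {n:3d}: smallest r = {min(r):.3f}, average r = {mean(r):.3f}")
```

Even though every simulated sample is normal, the small samples should show noticeably lower and more variable correlations, which is why a small-sample plot needs a very substantial departure from linearity before normality is rejected.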
Definition
Consider a family of probability distributions involving two
parameters, θ1 and θ2 , and let F (x; θ1 , θ2 ) denote the
corresponding cdf’s.
The parameters θ1 and θ2 are said to be location and scale
parameters, respectively, if F (x; θ1 , θ2 ) is a function of
(x − θ1 )/θ2 .
e.g.
1. Normal distributions N(µ, σ): F(x; µ, σ) = Φ((x − µ)/σ).
2. The extreme value distribution with cdf
F(x; θ1, θ2) = 1 − exp(−e^{(x − θ1)/θ2})
For the Weibull distribution:
F(x; α, β) = 1 − e^{−(x/β)^α},
the parameter β is a scale parameter but α is NOT a location
parameter. α is usually referred to as a shape parameter.
Fortunately, if X has a Weibull distribution with shape parameter
α and scale parameter β, then the transformed variable ln(X ) has
an extreme value distribution with location parameter θ1 = ln(β)
and scale parameter θ2 = 1/α.
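The claim about ln(X) can be checked directly from the two cdf's given above. The short derivation below is a sketch that assumes x > 0 and writes Y = ln(X); the symbol Y is introduced here only for the calculation.

```latex
\begin{align*}
P(Y \le y) &= P(\ln X \le y) = P(X \le e^{y}) \\
           &= 1 - \exp\!\left[-\left(\frac{e^{y}}{\beta}\right)^{\alpha}\right] \\
           &= 1 - \exp\!\left[-e^{\alpha (y - \ln \beta)}\right] \\
           &= 1 - \exp\!\left[-e^{(y - \theta_1)/\theta_2}\right],
           \qquad \theta_1 = \ln \beta,\quad \theta_2 = 1/\alpha .
\end{align*}
```

The last line is the extreme value cdf, so a probability plot for Weibull data can be built from the logarithms of the observations using location and scale parameters only.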
The gamma distribution also has a shape parameter α. However,
there is no transformation h(•) such that h(X ) has a distribution
that depends only on location and scale parameters.
Thus, before we construct a probability plot, we have to estimate
the shape parameter from the sample data.