4
Continuous Random
Variables and
Probability Distributions
Copyright © Cengage Learning. All rights reserved.
4.6
Probability Plots
An investigator will often have obtained a numerical sample
x1, x2,…, xn and wish to know whether it is plausible that it
came from a population distribution of some particular type
(e.g., from a normal distribution).
For one thing, many formal procedures from statistical
inference are based on the assumption that the population
distribution is of a specified type. The use of such a
procedure is inappropriate if the actual underlying
probability distribution differs greatly from the assumed
type.
For example, the article “Toothpaste Detergents: A
Potential Source of Oral Soft Tissue Damage” (Intl. J. of
Dental Hygiene, 2008: 193–198) contains the following
statement:
“Because the sample number for each experiment
(replication) was limited to three wells per treatment type,
the data were assumed to be normally distributed.”
As justification for this leap of faith, the authors wrote that
“Descriptive statistics showed standard deviations that
suggested a normal distribution to be highly likely.” Note:
This argument is not very persuasive.
Additionally, understanding the underlying distribution can
sometimes give insight into the physical mechanisms
involved in generating the data. An effective way to check a
distributional assumption is to construct what is called a
probability plot.
The essence of such a plot is that if the distribution on
which the plot is based is correct, the points in the plot
should fall close to a straight line.
If the actual distribution is quite different from the one used
to construct the plot, the points will likely depart
substantially from a linear pattern.
Sample Percentiles
The details involved in constructing probability plots differ a
bit from source to source. The basis for our construction is
a comparison between percentiles of the sample data and
the corresponding percentiles of the distribution under
consideration.
We know that the (100p)th percentile of a continuous
distribution with cdf F(·) is the number η(p) that satisfies
F(η(p)) = p. That is, η(p) is the number on the
measurement scale such that the area under the density
curve to the left of η(p) is p.
Thus the 50th percentile η(.5) satisfies F(η(.5)) = .5, and
the 90th percentile η(.9) satisfies F(η(.9)) = .9. Consider
as an example the standard normal distribution, for which
we have denoted the cdf by Φ(·).
From Appendix Table A.3, we find the 20th percentile by
locating the row and column in which .2000 (or a number
as close to it as possible) appears inside the table.
Since .2005 appears at the intersection of the –.8 row and
the .04 column, the 20th percentile is approximately –.84.
Similarly, the 25th percentile of the standard normal
distribution is (using linear interpolation) approximately
–.675.
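The table lookups above can be cross-checked numerically. The following is a minimal sketch, assuming Python's standard-library statistics.NormalDist in place of Appendix Table A.3; the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist()

# inv_cdf(p) returns the number eta(p) for which Phi(eta(p)) = p,
# i.e. the (100p)th percentile of the standard normal distribution.
for p in (0.20, 0.25, 0.50, 0.90):
    eta = std_normal.inv_cdf(p)
    check = std_normal.cdf(eta)  # should reproduce p
    print(f"{100 * p:.0f}th percentile: {eta:+.4f} (Phi = {check:.4f})")
```

The 20th and 25th percentiles computed this way should agree with the tabled values –.84 and –.675 to the accuracy of the table.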
Roughly speaking, sample percentiles are defined in the
same way that percentiles of a population distribution are
defined.
The 50th sample percentile should separate the smallest
50% of the sample from the largest 50%, the 90th
percentile should be such that 90% of the sample lies
below that value and 10% lies above, and so on.
Unfortunately, we run into problems when we actually try to
compute the sample percentiles for a particular sample of n
observations. If, for example, n = 10 we can split off 20% of
these values or 30% of the data, but there is no value that
will split off exactly 23% of these ten observations.
To proceed further, we need an operational definition of
sample percentiles (this is one place where different people
do slightly different things). Recall that when n is odd, the
sample median or 50th sample percentile is the middle
value in the ordered list, for example, the sixth-largest
value when n = 11.
This amounts to regarding the middle observation as being
half in the lower half of the data and half in the upper half.
Similarly, suppose n = 10.
Then if we call the third-smallest value the 25th percentile,
we are regarding that value as being half in the lower group
(consisting of the two smallest observations) and half in the
upper group (the seven largest observations).
This leads to the following general definition of sample
percentiles.
Definition
Order the n sample observations from smallest to largest.
Then the ith smallest observation in the list is taken to be
the [100(i – .5)/n]th sample percentile.
Once the percentage values 100(i – .5)/n (i = 1, 2,…, n)
have been calculated, sample percentiles corresponding to
intermediate percentages can be obtained by linear
interpolation.
For example, if n = 10, the percentages corresponding to
the ordered sample observations are 100(1 – .5)/10 = 5%,
100(2 – .5)/10 = 15%, 25%,…, and 100(10 – .5)/10 = 95%.
The 10th percentile is then halfway between the 5th
percentile (smallest sample observation) and the 15th
percentile (second-smallest observation).
For our purposes, such interpolation is not necessary
because a probability plot will be based only on the
percentages 100(i – .5)/n corresponding to the n sample
observations.
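The definition of sample percentiles can be sketched in a few lines. This is an illustration only: the function names and the ten data values are hypothetical and do not come from the text.

```python
def plotting_percentages(n):
    """Percentages 100(i - .5)/n attached to the ith smallest observation."""
    return [100 * (i - 0.5) / n for i in range(1, n + 1)]


def sample_percentile(sample, pct):
    """Sample percentile by linear interpolation between the percentages
    100(i - .5)/n; defined only between the smallest and largest of them."""
    xs = sorted(sample)
    pcts = plotting_percentages(len(xs))
    if not pcts[0] <= pct <= pcts[-1]:
        raise ValueError("percentage outside the range covered by the sample")
    for j in range(len(xs) - 1):
        lo, hi = pcts[j], pcts[j + 1]
        if lo <= pct <= hi:
            weight = (pct - lo) / (hi - lo)
            return xs[j] + weight * (xs[j + 1] - xs[j])
    return xs[-1]  # reached only when the sample has a single observation


# Hypothetical sample of n = 10 observations (not data from the text).
data = [12, 15, 11, 20, 18, 14, 17, 13, 19, 16]
print(plotting_percentages(len(data)))  # 5%, 15%, ..., 95%
print(sample_percentile(data, 10))      # halfway between the two smallest
print(sample_percentile(data, 25))      # the third-smallest observation
```

With n = 10 the 10th percentile falls halfway between the smallest and second-smallest observations, as described above.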
A Probability Plot
Suppose now that for percentages 100(i – .5)/n (i = 1,…, n)
the percentiles are determined for a specified population
distribution whose plausibility is being investigated.
If the sample was actually selected from the specified
distribution, the sample percentiles (ordered sample
observations) should be reasonably close to the
corresponding population distribution percentiles.
That is, for i = 1, 2,…, n there should be reasonable
agreement between the ith smallest sample observation
and the [100(i – .5)/n]th percentile for the specified
distribution. Let’s consider the (population percentile,
sample percentile) pairs, that is, the pairs
([100(i – .5)/n]th percentile of the distribution, ith smallest sample observation)
for i = 1,…, n. Each such pair can be plotted as a point on a
two-dimensional coordinate system.
If the sample percentiles are close to the corresponding
population distribution percentiles, the first number in each
pair will be roughly equal to the second number. The
plotted points will then fall close to a 45° line.
Substantial deviations of the plotted points from a 45° line
cast doubt on the assumption that the distribution under
consideration is the correct one.
Example 29
The value of a certain physical constant is known to an
experimenter. The experimenter makes n = 10 independent
measurements of this value using a particular
measurement device and records the resulting
measurement errors (error = observed value – true value).
These observations appear in the accompanying table.
Is it plausible that the random variable measurement error
has a standard normal distribution? The needed standard
normal (z) percentiles are also displayed in the table.
Thus the points in the probability plot are (–1.645, –1.91),
(–1.037, –1.25),…, and (1.645, 1.56).
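The standard normal percentiles needed for n = 10 can be generated rather than read from a table. The sketch below assumes the standard-library statistics.NormalDist; the helper name z_percentiles is hypothetical.

```python
from statistics import NormalDist


def z_percentiles(n):
    """Standard normal percentiles for the percentages 100(i - .5)/n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


zs = z_percentiles(10)
print([round(z, 3) for z in zs])
```

The first and last values should match the –1.645 and 1.645 quoted above. The second value may differ from –1.037 in the third decimal place, because the tabled value comes from interpolation.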
Figure 4.33 shows the resulting plot. Although the points
deviate a bit from the 45° line, the predominant impression
is that this line fits the points very well. The plot suggests
that the standard normal distribution is a reasonable
probability model for measurement error.
Figure 4.33. Plots of pairs (z percentile, observed value) for the data of Example 29
Figure 4.34 shows a plot of pairs (z percentile, observation)
for a second sample of ten observations. The 45° line gives
a good fit to the middle part of the sample but not to the
extremes.
The plot has a well-defined S-shaped appearance. The two
smallest sample observations are considerably larger than
the corresponding z percentiles (the points on the far left of
the plot are well above the 45° line).
Figure 4.34. Plots of pairs (z percentile, observed value) for the data of Example 29
Similarly, the two largest sample observations are much
smaller than the associated z percentiles.
This plot indicates that the standard normal distribution
would not be a plausible choice for the probability model
that gave rise to these observed measurement errors.
A Probability Plot
An investigator is typically not interested in knowing just
whether a specified probability distribution, such as the
standard normal distribution (normal with μ = 0 and σ = 1)
or the exponential distribution with λ = .1, is a plausible
model for the population distribution from which the sample
was selected.
Instead, the issue is whether some member of a family of
probability distributions specifies a plausible model—the
family of normal distributions, the family of exponential
distributions, the family of Weibull distributions, and so on.
The values of the parameters of a distribution are usually
not specified at the outset. If the family of Weibull
distributions is under consideration as a model for lifetime
data, are there any values of the parameters α and β for
which the corresponding Weibull distribution gives a good
fit to the data?
Fortunately, it is almost always the case that just one
probability plot will suffice for assessing the plausibility of
an entire family.
If the plot deviates substantially from a straight line, no
member of the family is plausible. When the plot is quite
straight, further work is necessary to estimate values of the
parameters that yield the most reasonable distribution of
the specified type.
Let’s focus on a plot for checking normality. Such a plot is
useful in applied work because many formal statistical
procedures give accurate inferences only when the
population distribution is at least approximately normal.
These procedures should generally not be used if the
normal probability plot shows a very pronounced departure
from linearity. The key to constructing an omnibus normal
probability plot is the relationship between standard normal
(z) percentiles and those for any other normal distribution:
percentile for a normal (μ, σ) distribution = μ + σ ·
(corresponding z percentile)
Consider first the case μ = 0.
If each observation is exactly equal to the corresponding
normal percentile for some value of σ, the pairs
(σ · [z percentile], observation) fall on a 45° line, which has
slope 1.
This then implies that the (z percentile, observation) pairs
fall on a line passing through (0, 0) (i.e., one with y-intercept
0) but having slope σ rather than 1. The effect of a nonzero
value of μ is simply to change the y-intercept from 0 to μ.
A plot of the n pairs
([100(i – .5)/n]th z percentile, ith smallest observation)
on a two-dimensional coordinate system is called a normal
probability plot. If the sample observations are in fact
drawn from a normal distribution with mean value μ and
standard deviation σ, the points should fall close to a straight
line with slope σ and intercept μ.
Thus a plot for which the points fall close to some straight
line suggests that the assumption of a normal population
distribution is plausible.
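The slope and intercept claim can be checked on an idealized sample in which every observation equals the corresponding normal percentile exactly. The following is a sketch under stated assumptions: the values μ = 50 and σ = 4 are hypothetical, and fitting the line by least squares is a choice made for this illustration, not a step prescribed by the text.

```python
from statistics import NormalDist


def normal_plot_pairs(sample):
    """Pairs ([100(i - .5)/n]th z percentile, ith smallest observation)."""
    n = len(sample)
    std_normal = NormalDist()
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, sorted(sample)))


def least_squares_line(pairs):
    """Slope and intercept of the least squares line through the pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Idealized "sample": each value is exactly the corresponding percentile of a
# normal distribution with hypothetical mu = 50 and sigma = 4.
mu, sigma, n = 50.0, 4.0, 20
ideal = [NormalDist(mu, sigma).inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

slope, intercept = least_squares_line(normal_plot_pairs(ideal))
print(round(slope, 6), round(intercept, 6))  # slope near sigma, intercept near mu
```

For real data the points scatter about the line, so the fitted slope and intercept give only rough indications of σ and μ.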
Example 30
The accompanying sample consisting of n = 20
observations on dielectric breakdown voltage of a piece of
epoxy resin appeared in the article “Maximum Likelihood
Estimation in the 3-Parameter Weibull Distribution” (IEEE
Trans. on Dielectrics and Elec. Insul., 1996: 43–55).
The values of (i – .5)/n for which z percentiles are needed
are (1 – .5)/20 = .025, (2 – .5)/20 = .075,…, and .975.
Figure 4.35 shows the resulting normal probability plot. The
pattern in the plot is quite straight, indicating it is plausible
that the population distribution of dielectric breakdown
voltage is normal.
Figure 4.35. Normal probability plot for the dielectric breakdown voltage sample
A Probability Plot
There is an alternative version of a normal probability plot
in which the z percentile axis is replaced by a nonlinear
probability axis. The scaling on this axis is constructed so
that plotted points should again fall close to a line when the
sampled distribution is normal. Figure 4.36 shows such a
plot from Minitab for the breakdown voltage data of
Example 30.
Figure 4.36. Normal probability plot of the breakdown voltage data from Minitab
A nonnormal population distribution can often be placed in
one of the following three categories:
1. It is symmetric and has “lighter tails” than does a normal
distribution; that is, the density curve declines more
rapidly out in the tails than does a normal curve.
2. It is symmetric and heavy-tailed compared to a normal
distribution.
3. It is skewed.
A uniform distribution is light-tailed, since its density
function drops to zero outside a finite interval.
The density function $f(x) = 1/[\pi(1 + x^2)]$ for
$-\infty < x < \infty$ is heavy-tailed, since $1/(1 + x^2)$
declines much less rapidly than does $e^{-x^2/2}$.
Lognormal and Weibull distributions are among those that
are skewed. When the points in a normal probability plot do
not adhere to a straight line, the pattern will frequently
suggest that the population distribution is in a particular one
of these three categories.
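The difference between a heavy tail and a normal tail can be made concrete by comparing upper-tail probabilities. This sketch uses the fact that the density 1/[π(1 + x²)] has cdf 1/2 + arctan(x)/π; the function names are illustrative.

```python
import math
from statistics import NormalDist


def heavy_upper_tail(x):
    """P(X > x) for the density 1/[pi(1 + x^2)], whose cdf is 1/2 + arctan(x)/pi."""
    return 0.5 - math.atan(x) / math.pi


def normal_upper_tail(x):
    """P(Z > x) for a standard normal random variable Z."""
    return 1.0 - NormalDist().cdf(x)


for x in (2, 3, 4):
    print(x, round(heavy_upper_tail(x), 5), round(normal_upper_tail(x), 5))
```

At x = 3 the heavy-tailed probability is roughly .10, compared with about .001 for the standard normal distribution.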
When the distribution from which the sample is selected is
light-tailed, the largest and smallest observations are
usually not as extreme as would be expected from a
normal random sample.
Visualize a straight line drawn through the middle part of
the plot; points on the far right tend to be below the line
(observed value < z percentile), whereas points on the left
end of the plot tend to fall above the straight line (observed
value > z percentile).
The result is an S-shaped pattern of the type pictured in
Figure 4.34.
A sample from a heavy-tailed distribution also tends to
produce an S-shaped plot. However, in contrast to the
light-tailed case, the left end of the plot curves downward
(observed value < z percentile), as shown in Figure 4.37(a).
Figure 4.37(a). Probability plots that suggest a nonnormal distribution: (a) a plot consistent with a heavy-tailed distribution
If the underlying distribution is positively skewed (a short
left tail and a long right tail), the smallest sample
observations will be larger than expected from a normal
sample and so will the largest observations.
In this case, points on both ends of the plot will fall above a
straight line through the middle part, yielding a curved
pattern, as illustrated in Figure 4.37(b).
Figure 4.37(b). A plot consistent with a positively skewed distribution
A sample from a lognormal distribution will usually produce
such a pattern. A plot of (z percentile, ln(x)) pairs should
then resemble a straight line.
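The effect of the log transformation can be illustrated with an idealized lognormal sample, one whose observations equal the lognormal percentiles exactly. In this sketch the parameter values (0 and 1 for the mean and standard deviation of ln X) are hypothetical, and the sample correlation coefficient is used as a rough numerical measure of straightness, which is an assumption of this illustration rather than part of the discussion above.

```python
import math
from statistics import NormalDist


def z_percentiles(n):
    """Standard normal percentiles for the percentages 100(i - .5)/n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def correlation(xs, ys):
    """Sample correlation coefficient of the (x, y) pairs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


n = 20
zs = z_percentiles(n)
# Idealized lognormal "sample": ln(x) is exactly the z percentile.
xs = [math.exp(z) for z in zs]

r_raw = correlation(zs, xs)                         # (z percentile, x) pairs
r_log = correlation(zs, [math.log(x) for x in xs])  # (z percentile, ln(x)) pairs
print(round(r_raw, 3), round(r_log, 3))
```

The (z percentile, x) pairs are visibly curved, while the (z percentile, ln(x)) pairs lie on a straight line, so the second correlation is essentially 1.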
Beyond Normality
Consider a family of probability distributions involving two
parameters, θ1 and θ2, and let F(x; θ1, θ2) denote the
corresponding cdf’s. The family of normal distributions is
one such family, with θ1 = μ, θ2 = σ, and
F(x; μ, σ) = Φ[(x – μ)/σ]. Another example is the Weibull
family, with θ1 = α, θ2 = β, and
$$F(x; \alpha, \beta) = 1 - e^{-(x/\beta)^{\alpha}}$$
Still another family of this type is the gamma family, for
which the cdf is an integral involving the incomplete gamma
function that cannot be expressed in any simpler form.
The parameters θ1 and θ2 are said to be location and
scale parameters, respectively, if F(x; θ1, θ2) is a
function of (x – θ1)/θ2. The parameters μ and σ of the
normal family are location and scale parameters,
respectively.
Changing μ shifts the location of the bell-shaped density
curve to the right or left, and changing σ amounts to
stretching or compressing the measurement scale (the
scale on the horizontal axis when the density function is
graphed). Another example is given by the cdf
$$F(x; \theta_1, \theta_2) = 1 - e^{-e^{(x - \theta_1)/\theta_2}}, \qquad -\infty < x < \infty$$
A random variable with this cdf is said to have an extreme
value distribution. It is used in applications involving
component lifetime and material strength.
Although the form of the extreme value cdf might at first
glance suggest that θ1 is the point of symmetry for the
density function, and therefore the mean and median, this
is not the case.
Instead, P(X ≤ θ1) = F(θ1; θ1, θ2) = 1 – e⁻¹ = .632, and the
density function f(x; θ1, θ2) = F′(x; θ1, θ2) is negatively
skewed (a long lower tail).
Similarly, the scale parameter θ2 is not the standard
deviation (μ = θ1 – .5772θ2 and σ = 1.283θ2). However,
changing the value of θ1 does change the location of the
density curve, whereas a change in θ2 rescales the
measurement axis.
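These facts about the extreme value distribution can be checked numerically. The sketch below assumes the cdf F(x; θ1, θ2) = 1 – exp(–exp((x – θ1)/θ2)); the parameter values θ1 = 5 and θ2 = 2 are hypothetical. The mean and standard deviation are approximated by a midpoint rule applied to the percentile function, a numerical shortcut chosen for this illustration.

```python
import math


def ev_cdf(x, theta1, theta2):
    """Extreme value cdf with location theta1 and scale theta2."""
    return 1.0 - math.exp(-math.exp((x - theta1) / theta2))


def ev_percentile(p, theta1, theta2):
    """Solve ev_cdf(x) = p for x."""
    return theta1 + theta2 * math.log(-math.log1p(-p))


def ev_mean_sd(theta1, theta2, steps=200_000):
    """Approximate mean and sd by integrating the percentile function
    over (0, 1) with the midpoint rule (a numerical approximation only)."""
    h = 1.0 / steps
    qs = [ev_percentile((k + 0.5) * h, theta1, theta2) for k in range(steps)]
    mean = sum(qs) * h
    variance = sum((q - mean) ** 2 for q in qs) * h
    return mean, math.sqrt(variance)


theta1, theta2 = 5.0, 2.0  # hypothetical parameter values
mean, sd = ev_mean_sd(theta1, theta2)
print(round(ev_cdf(theta1, theta1, theta2), 3))         # P(X <= theta1)
print(round(mean, 3), round(theta1 - 0.5772 * theta2, 3))
print(round(sd, 3), round(1.283 * theta2, 3))
```

Each pair of printed values should agree closely, which is consistent with θ1 being neither the mean nor the median and θ2 not being the standard deviation.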
The parameter β of the Weibull distribution is a scale
parameter, but α is not a location parameter. A similar
comment applies to the parameters α and β of the gamma
distribution.
In the usual form, the density function for any member of
either the gamma or Weibull family is positive for x > 0
and zero otherwise. A location parameter can be introduced
as a third parameter γ (we did this for the Weibull
distribution) to shift the density function so that it is positive
if x > γ and zero otherwise.
When the family under consideration has only location and
scale parameters, the issue of whether any member of the
family is a plausible population distribution can be
addressed via a single, easily constructed probability plot.
One first obtains the percentiles of the standard
distribution, the one with θ1 = 0 and θ2 = 1, for percentages
100(i – .5)/n (i = 1,…, n).
The n (standardized percentile, observation) pairs give the
points in the plot. This is exactly what we did to obtain an
omnibus normal probability plot. Somewhat surprisingly,
this methodology can be applied to yield an omnibus
Weibull probability plot.
The key result is that if X has a Weibull distribution with
shape parameter α and scale parameter β, then the
transformed variable ln(X) has an extreme value
distribution with location parameter θ1 = ln(β) and scale
parameter θ2 = 1/α.
Thus a plot of the (extreme value standardized percentile,
ln(x)) pairs showing a strong linear pattern provides support
for choosing the Weibull distribution as a population model.
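The linear relationship behind the Weibull plot can be verified directly from percentiles. This is a sketch assuming the Weibull cdf 1 – exp(–(x/β)^α); the values α = 2 and β = 1000 are hypothetical and the function names are illustrative.

```python
import math


def weibull_percentile(p, alpha, beta):
    """Solve 1 - exp(-(x / beta) ** alpha) = p for x."""
    return beta * (-math.log(1.0 - p)) ** (1.0 / alpha)


def ev_std_percentile(p):
    """Percentile of the standard extreme value distribution
    (location 0, scale 1): ln[-ln(1 - p)]."""
    return math.log(-math.log(1.0 - p))


alpha, beta, n = 2.0, 1000.0, 10  # hypothetical parameter values
ps = [(i - 0.5) / n for i in range(1, n + 1)]

# Pairs (extreme value standardized percentile, ln of Weibull percentile).
pairs = [(ev_std_percentile(p), math.log(weibull_percentile(p, alpha, beta)))
         for p in ps]

# Slopes between consecutive points and the implied intercept at each point.
slopes = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])]
intercepts = [y - x / alpha for x, y in pairs]
print(slopes[0], intercepts[0], math.log(beta))
```

Every slope should equal 1/α and every intercept ln(β), so the points lie on a single straight line whatever values α and β take.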
Example 31
The accompanying observations are on lifetime (in hours)
of power apparatus insulation when thermal and electrical
stress acceleration were fixed at particular values (“On the
Estimation of Life of Power Apparatus Insulation Under
Combined Electrical and Thermal Stress,” IEEE Trans. on
Electrical Insulation, 1985: 70–78).
A Weibull probability plot necessitates first computing the
5th, 15th, . . . , and 95th percentiles of the standard
extreme value distribution. The (100p)th percentile η(p)
satisfies
$$p = F(\eta(p)) = 1 - e^{-e^{\eta(p)}}$$
from which η(p) = ln[–ln(1 – p)].
The pairs (–2.97, 5.64), (–1.82, 6.22),…, (1.10, 7.67) are
plotted as points in Figure 4.38. The straightness of the plot
argues strongly for using the Weibull distribution as a
model for insulation life, a conclusion also reached by the
author of the cited article.
Figure 4.38. A Weibull probability plot of the insulation lifetime data
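The standard extreme value percentiles used on the horizontal axis follow from the formula η(p) = ln[–ln(1 – p)]. The sketch below evaluates it for the 5th, 15th, . . . , 95th percentiles; the function name is hypothetical, and the lifetime observations themselves are not reproduced here.

```python
import math


def ev_std_percentile(p):
    """(100p)th percentile of the standard extreme value distribution."""
    return math.log(-math.log(1.0 - p))


n = 10
etas = [ev_std_percentile((i - 0.5) / n) for i in range(1, n + 1)]
print([round(eta, 2) for eta in etas])
```

The first, second, and last values should round to the –2.97, –1.82, and 1.10 that appear as first coordinates of the plotted pairs.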