Rethinking control chart design and evaluation (3/17/19)
(To appear in Quality Engineering)
William H. Woodall and Frederick W. Faltin
Department of Statistics, Virginia Tech, Blacksburg, VA 24061-0439, USA
ABSTRACT
We discuss some practical issues involving the control of the number of false alarms in
process monitoring. This topic is of growing importance as the number of variables being
monitored and the frequency of measurement increase. An alternative formulation for
evaluating and comparing the performance of control charts is given, based on defining in-control, indifference and out-of-control regions of the parameter space. Methods are designed so that only changes of practical importance are detected quickly. This
generalization of the existing framework makes control charting much more useful in
practice, especially when many variables are being monitored. It also justifies to a greater
extent the use of cumulative sum (CUSUM) methods.
KEYWORDS average run length; cumulative sum chart; false alarm rate; hypothesis
testing; practical significance; statistical significance; statistical process monitoring
Introduction
We have two goals. First, we would like to offer a rationale for practitioners to justify
modifying the standard control charting methods, using engineering and subject matter
knowledge, so that signals have both practical importance and statistical significance.
Second, we want to encourage researchers to work on methods that can be desensitized, if
necessary, to avoid the quick detection of small process shifts which may not be of practical
importance in applications. Many of the ideas we present are not entirely new, as our
citations to past work demonstrate. They are, however, of increasing relevance.
Many practitioners and researchers tend to think solely in terms of false alarm rates, or
related performance metrics, of individual charts. Understandable as that may be in light
of the historical focus of training materials and the literature, it can be dangerously
misleading in environments where systems of many control charts are being used
simultaneously. In such instances, a multiple testing effect comes into play which can make
the probability of obtaining false alarms quite high. The estimation of overall false alarm
rates for many-chart systems is difficult, as the covariance structure among the variables
being monitored may be complex and is almost certainly highly application-specific. Nonetheless, as discussed by Faltin (1986), it is straightforward to show that for
even modestly sized systems of independent charts, the probability of at least one false
alarm can be close to one. We can ill afford to ignore this effect, as such systems of charts
are increasingly common. The easiest way to reduce this adverse effect may be to redesign
as many charts in the system as possible to have sharply lower false alarm rates. One
important step in this direction is to ensure that only genuinely important changes are
detected. We offer an approach to accomplish this.
It is assumed in our paper that the reader is familiar with the basics of control charting, e.g., the construction of X-bar and other basic Shewhart charts, the use of runs
rules and the use of cumulative sum (CUSUM) and exponentially weighted moving
average (EWMA) charts. Background on these methods can be found in Montgomery
(2013).
Process monitoring is typically broken up into two phases. In Phase I a historical set of
data that has been collected over time is analyzed to understand the process variation. Often
a great deal of process understanding and process improvement results from the insights
obtained during Phase I. Jones-Farmer et al. (2014) discussed many of the issues that arise
in Phase I, one end result of which is the fitting of some model of the variation and the
estimation of the in-control parameters of the fitted model. Data are collected over time,
sample-by-sample, in Phase II and one checks to see if there has been some deviation from
the estimated in-control model.
As stated by Montgomery (2013, p. 191), there is a close connection between the use of
control charts and hypothesis testing. A false alarm with a control chart is frequently
referred to as a Type I error, in hypothesis testing terminology. The connection between
control charting and hypothesis testing was discussed in detail by Woodall and Faltin
(1996) and Woodall (2000). In much of the research on control charts it is assumed that
the values of the in-control parameters are known. Under this assumption and the
assumption of independence of the data over time, the basic Shewhart chart in Phase II can
be viewed as being equivalent to a sequence of hypothesis tests. As pointed out by Page
(1954), the one-sided CUSUM chart can be viewed as a sequence of sequential
probability ratio tests (SPRTs) developed by Wald (1947). Because of the relationship
between control charting and hypothesis testing, control charting shares some of the
shortcomings of hypothesis testing. How to address some of these shortcomings is a focus
of this paper.
Some of the metrics used to characterize control chart statistical performance are
discussed in the next section. In the following section, we discuss some of the alternative
methods that can be used when some slack is allowed in the process. The goal is to have
control chart signals associated with practical significance, not just statistical significance.
A more general framework is described for comparing control chart performance.
Conclusions are given in our last section.
Characterizing Performance
Commonly used metrics
A fundamental principle in statistical process monitoring is that there should be some
control of the rate at which false alarms occur. A false alarm is defined as a control chart
signal when the process is in the in-control state. In the literature, any deviation of a model
parameter from its assumed in-control value is treated as a shift to the out-of-control state, indicating the presence of an assignable cause of variation.
Various metrics have been used for the evaluation of the statistical properties of control
charts. A number of these metrics were reviewed by Frisén (1992), Fraker et al. (2008),
and Kenett and Pollack (2012).
For a basic Shewhart control chart one can specify the probability of a false alarm at
each sampling point. One must be careful, however, because this metric does not apply to
charts that accumulate information over time such as the CUSUM and EWMA charts or
when runs rules are used with a Shewhart chart, as discussed by Adams et al. (1992) and
Woodall and Adams (1993). Sometimes the probability of a false alarm within a given time
period is used as the metric of interest.
Recently, the conditional false alarm rate has proved useful, where this metric is defined
as the probability of a false alarm at a particular time given no previous false alarm. The
conditional false alarm rate is analogous to the hazard function in reliability theory. This
metric can be used to determine the control limits for any type of chart. The conditional
false alarm rate metric is particularly useful when the in-control parameter value varies
over time due to a varying covariate such as the sample size. For more information on this
metric, see Margavio et al. (1995), Shen et al. (2013), Zhang and Woodall (2015) and
Driscoll and Woodall (2019).
Most frequently, however, the average run length (ARL) metric is used, where the run
length is defined as the number of samples required before a control chart signal is given.
If the time interval between samples varies, as in observing a sequence of assumed
exponential random variables, then the average time until a signal (ATS) is more
appropriate. From a practical point of view, it is the time between false alarms that is of
more relevance. As the sampling frequency increases, a given ARL value will correspond
to a smaller ATS value. If sampling occurs regularly every h time periods, then we have
ATS = h × ARL.
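This relationship can be illustrated with a few hypothetical sampling intervals (a sketch; the interval values are ours, not from the paper):

```python
# ATS = h * ARL: the same chart allows less time between false alarms
# as sampling becomes more frequent.
in_control_arl = 370.4            # e.g., a 3-sigma Shewhart chart

for h_hours in (8.0, 1.0, 0.1):   # sampling interval in hours (hypothetical)
    ats = h_hours * in_control_arl
    print(f"sample every {h_hours:>4} h -> mean time between false alarms "
          f"= {ats:.1f} h")
```

The same in-control ARL thus implies very different real-time false alarm burdens depending on the sampling frequency.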
Values of in-control metrics can be specified with exactness under the assumption of
known in-control parameters. There has been quite a bit of research recently, however,
showing that estimation of these in-control parameter values causes a surprisingly large
amount of variation in in-control false alarm rates and in-control ARLs. These metrics
become random variables when in-control parameters are estimated. See, for example,
Gandy and Kvaløy (2013), Saleh et al. (2015) and Loureiro et al. (2018). An alternative
approach is to control instead the probability that the false alarm probability exceeds a
specified value or to control the probability that the in-control ARL is less than a given
value.
Precise determinations of false alarm rates and other metrics rely on the adequacy of the
model used. Practitioners may not require precise values of the false alarm rate, but need
at least some idea of how frequently a false alarm would be expected to occur.
Selection of in-control metric values
It seems that very little guidance has been given for selecting a particular value of the
in-control metric of interest such as the in-control ARL value. There must be some
determination of the acceptable rate of false alarms. The standard X-bar control chart is
widely used in practice. The in-control ARL of a basic 3-sigma Shewhart X-bar chart under
the assumptions of normality and known in-control mean and variance is 370. If one uses
the Western Electric runs rules, then Champ and Woodall (1987) showed that the in-control
ARL decreases to 92. This performance may be acceptable in some cases, but unacceptable
in others.
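The 370 figure follows directly from the normal tail probability, since for a Shewhart chart the run length is geometric with success probability equal to the per-sample false alarm probability. A quick check using only the standard library (a sketch, not part of the paper):

```python
import math

def std_normal_cdf(z: float) -> float:
    """Phi(z), computed from the error function (standard library only)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Per-sample false alarm probability for a two-sided 3-sigma chart
alpha = 2.0 * (1.0 - std_normal_cdf(3.0))

# Geometric run length, so the in-control ARL is 1/alpha
arl_3sigma = 1.0 / alpha
print(f"alpha = {alpha:.5f}, in-control ARL = {arl_3sigma:.1f}")  # ~370.4
```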
As a practical matter, the selection of control limits, with their associated in- and out-of-control ARLs, reflects a judgment regarding the relative consequences of false alarms,
versus those of allowing a process change to go undetected for a time. In our experience,
both of these types of ill effects are often underappreciated in practice. The damage done
by false alarms in terms of both cost and loss of confidence tends to be brushed aside.
Relatively few practitioners appreciate that even a correctly functioning chart may take
quite a while to detect a process change, for example a sustained shift of modest size.
Rather, there is a propensity to apply whatever limits have been most commonly studied,
and to assume them to be reasonable, rather than carefully thinking through the alternatives
and making an informed choice.
There is always a tradeoff in false alarm rate reduction and quick detection of process
changes. As Freund (1960) wrote, “We can always avoid looking for non-existent trouble
by never looking for anything at all. That is, we must balance the risks of calling wolf
unnecessarily against the risks of not calling it until the sheep are being digested.”
As more and more variables are measured in practical applications, and as the frequency
of measurement increases, it becomes increasingly important to keep the number of false
alarms at a manageable number. If there are too many false alarms, then alarms tend to be
ignored and the monitoring becomes ineffective. If alarms result in process adjustments,
then false alarms can be harmful since adjusting a process that does not require it can
increase the process variation. In applications where hundreds or thousands of variables
are being monitored, one can be overwhelmed by alarms, false or not, and some method
for the prioritization of alarms is required.
There is a growing need for more of a focus on false alarms and their effects. Crichton
and Faltin (2018) wrote in the context of pharmaceutical manufacturing, “In our
observation, the potential for false alarms (and their consequences) is generally not well
understood.” The rate of false alarms should be tied to their consequences. Because the
consequences can vary significantly from application to application, it seems reasonable
that standard methods be adapted to reflect this fact.
As examples of how consequences drive the acceptable false alarm rate, Blackstone (2004) reported that
the Society of Cardiothoracic Surgeons of Great Britain and Ireland required 9999:1 odds
against a sequence of adverse events being a false alarm before declaring that there was an
indication of poor medical performance. In the automobile on-board monitoring system
described by Box et al. (2000), even a false alarm probability of 0.0005 per trip was
considered too large to adequately control the false alarm probability over the
lifetime of a vehicle.
Clearly, the acceptable rate of false alarms depends on the application and the
consequences. A false alarm, sometimes referred to as a nuisance alarm, with a home
smoke alarm system can be irritating, but not destructive. One could tolerate only a very
low false alarm rate, however, with a heat-sensitive sprinkler system. The odds of a
sprinkler turning on accidentally have been reported by Surrey Fire and Rescue (2019) to be
about 1 in 500,000 per year of service, which they state is comparable to the odds of being
struck by lightning. There are so many nuisance alarms with car alarm systems that most
are ignored to the extent that it is possible to ignore an irritating, loud noise.
A much more serious issue is the rate of false alarms in hospital intensive care units.
Variables such as oxygen saturation in the blood, heart rate, heart electrical tracing, blood
pressure, temperature and fluid status are monitored continuously. Monitoring devices
notify members of the care team when a measurement is outside of an acceptable range or
when a sensor becomes inoperative, perhaps due to patient movement. Cvach (2012)
reported that patients, families, and staff may be exposed to as many as 700 monitor alarms
per patient per day. Görges et al. (2009) estimated that up to 94% of the alarms in the
intensive care environment are false alarms. In their review paper on this subject Imhoff
and Kuhls (2006) stated that these excessively frequent false alarms are not only a nuisance
for both patients and caregivers; they can also compromise patient safety and the
effectiveness of patient care.
False alarms are also an issue in public health biosurveillance. Shmueli and Burkom
(2010) wrote:
The current reality is that users of some large systems see alarms nearly
every day, because of the large number of data streams and regions
monitored and also because the methods are not appropriately adjusted
for the variations in time series characteristics among these methods.
This frequent alerting leads users to ignore alarms and instead use the
system to examine the data heuristically. As one user commented, “We
hope that when there is a true alarm, we will not ignore it.” One
statistical reason for this phenomenon is inadequate handling of the
nonstationary raw data, for example, not accounting for the day-of-the-week effect or ignoring autocorrelation. Another reason is multiple
testing.
These types of issues are becoming more relevant in process monitoring applications
generally. Multiple testing refers to the monitoring of many variables. Even though the
probability of a false alarm at a particular time point with any single variable might be low,
the collective probability of false alarms system-wide can be quite high.
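The multiple testing effect is easy to quantify for independent charts: if each chart has per-sample false alarm probability p, the probability of at least one false alarm system-wide at a sampling point is 1 − (1 − p)^m for m charts. A small sketch (the chart counts are hypothetical):

```python
p = 0.0027                       # per-chart false alarm probability (3-sigma chart)

for m in (1, 10, 100, 1000):     # number of independent charts (hypothetical)
    # Complement of "no chart signals falsely" at a single sampling point
    p_any = 1.0 - (1.0 - p) ** m
    print(f"{m:>5} charts -> P(at least one false alarm) = {p_any:.3f}")
```

With 1000 independent 3-sigma charts the probability of at least one false alarm per sampling point already exceeds 0.9, which is why system-level redesign of the individual false alarm rates matters.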
Montgomery (2013, p. 194) stated that an out-of-control action plan (OCAP) should
accompany the use of control charts. The OCAP indicates what actions should be taken
after an activating event, usually a control chart signal. The action taken could vary from
simply increasing the attention paid to the variable being monitored to a process shutdown
and recalibration of equipment. False alarms that trigger investigations can be costly. As
the cost of a false alarm increases, it makes sense to try to reduce their number. As
Shewhart (1939, p. 36) wrote, “Even in trying to keep the probability of looking for
assignable causes when they are not present below some limiting value, it is necessary to
make some considered choice depending largely upon the costliness of thus looking
unnecessarily for trouble.”
Rethinking Control Chart Design
A three-region approach
In control chart theory, competing methods are often compared on the basis of ARL
performance. It is most common to set the in-control ARLs of competing methods equal
and the method with the lower out-of-control ARL for a given sustained shift in the
parameter is considered to be the better method for that shift. In practice, however, one
would only want to react when a process change is sufficiently large for there to be
justification for a reaction. It is not realistic to assume that any deviation, however small,
should be detected as quickly as possible. As Box et al. (2003-2004) stated, “The idea that
a monitored system (plant) is either “good” or “bad” is too simplistic.” Freund (1960)
wrote, “Many assignable causes produce effects that are real, but small and unimportant.
In fact, it is often completely uneconomical to detect these effects, much less worry about
how to correct them.”
This situation is closely related to the difference between statistical significance and
practical significance in the context of hypothesis testing. There has been much criticism
of the use of hypothesis testing and p-values in science. This resulting debate motivated
the American Statistical Association to issue a clarifying statement on the use and misuse
of hypothesis testing and p-values. See Wasserstein and Lazar (2016). Doganaksoy et al.
(2017) and Snee and Hoerl (2018) also addressed this issue. A key point in the process
monitoring context is that observing a sample that is inconsistent with the assumed in-control model does not necessarily imply that there is a process change of practical
importance. This issue becomes more relevant as the sample size increases at each
sampling point because then very small process changes can be detected with high
probability.
In applications there will often be a range of parameter values for which a signal would
not be desired, the in-control region, and a range of values for which one would want a
quick signal, the out-of-control region. It is reasonable to have a region of parameter values
in between these two regions where one would be indifferent. Box et al. (2003-2004)
referred to the intermediate region as “no man’s land” whereas Freund (1960) and Woodall
(1985) referred to it as the “indifference zone.”
This three-region approach was taken by Ewan and Kemp (1960), Freund (1957, 1960),
Woodall (1985, 1986), Box et al. (2000, 2003-2004), and Yashchin (1985, 2018). Under
this formulation one would want suitably large ARLs for parameter values in the in-control
region and suitably low values in the out-of-control region. This three-region approach was
a component in the IBM implementation of process monitoring discussed by Yashchin
(2018), where engineering input was used to determine the three regions of parameter
values for each variable and a method for the prioritization of alarms was implemented.
We note in some applications it may be more important to detect process shifts in one
direction with greater sensitivity than the other, or only shifts in one direction. The in-control, indifference and out-of-control regions can be adjusted for these cases. The regions
do not have to be symmetric about a target value.
This approach also admits as a special case a two-zone approach which combines the
in-control and indifference zones into a single “deadband”, and strives for the quickest
possible detection beyond this range. This approach was used with great success in an
automated plant monitoring system comprised of tens of thousands of charts, as described
by Faltin and Tucker (1991). Bissell (1990) also favored a two-zone approach with a target
zone specified for high capability or precision processes.
In the standard approach to comparing methods the in-control region consists of a single
point and there is effectively no indifference zone. The theoretical, but unattainable,
“optimal” method would then be a chart that doesn’t signal when the process is exactly
centered on target, but signals immediately for any other value. Such a method would be
inapplicable in practice because even the slightest deviation from target would be detected
immediately. There is a problem with theory when the resulting optimal method has no
practical value. On the other hand, a chart that doesn’t signal for parameter values in a
specified in-control region and signals immediately for values in the specified out-of-control region would be highly desirable in practice.
Figure 1 shows an illustration of the regions used by Box et al. (2000, 2003-2004) when
developing methods for the monitoring of a system that is expected to deteriorate over time.
Ewan and Kemp (1960), Woodall (1985), Hawkins and Olwell (1998, pp. 62-65), Box et
al. (2003-2004) and Yashchin (1993, 2018) all recommended that in the design of a one-sided CUSUM chart the reference value should be halfway between the parameter value
indicating the worst acceptable performance and the parameter value representing the best
unacceptable value.
[Insert Figure 1 here.]
Page (1961) anticipated this issue with respect to the increased ARL sensitivity of the
CUSUM chart relative to the Shewhart chart for small process shifts, writing:
This sensitivity is an advantage for processes needing precise
control, but is a definite drawback where some slack in the process
is permissible. In these cases, cumulative sum charts should not be
used or should be used with greater thought. It is easy to see that
continued production at a value away from the nominal target will
cause much more frequent interruption of the process by the sum
chart, although the quality of the process is acceptable; to some
extent the nuisance can be avoided by choosing a scheme with a
very large A.R.L. at a nominal target value in the center of the
“slack” region.
Examples and implementation
Consider the situation in which observations are collected under the assumptions of
normality and independence. The in-control values of the mean and variance are assumed
to be known here, but would be estimated in practice. Let δ represent the size of the
sustained shift in the mean. We let θ represent the size of the sustained shift in the mean in
terms of the standard error of the sample mean, so θ = √n δ/σ, where n is the sample size.
We base our CUSUM methods on the standardized sample means, i.e., the target is
subtracted from the sample means and then one divides by the standard error. We refer to
these variables as Z1, Z2, Z3, … . Note that the standardized sample means have a mean of
zero and a standard deviation of 1 when the process is centered on target. The standardized
CUSUM chart was described by Montgomery (2013, p. 424).
The CUSUM statistics are defined as
U_i = max(0, U_{i-1} + Z_i − k)
L_i = min(0, L_{i-1} + Z_i + k),  i = 1, 2, 3, … ,
where U_0 = L_0 = 0 and the reference value k > 0. A signal indicating a process
shift is given when U_i > h or L_i < −h, where h > 0 is referred to as the decision interval.
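The recursion above can be implemented in a few lines. The following sketch applies the two-sided tabular CUSUM to a stream of standardized sample means; the function name and the choice to restart the statistics after a signal are ours, not the paper's:

```python
def cusum_signals(z, k, h):
    """Two-sided tabular CUSUM on standardized sample means z.

    Returns the 1-based sample indices at which a signal occurs;
    both statistics are reset to zero after each signal (one common
    convention, assumed here).
    """
    u = lo = 0.0
    signals = []
    for i, zi in enumerate(z, start=1):
        u = max(0.0, u + zi - k)      # upper statistic U_i
        lo = min(0.0, lo + zi + k)    # lower statistic L_i
        if u > h or lo < -h:
            signals.append(i)
            u = lo = 0.0              # restart after a signal
    return signals

# A sustained upward shift of 1.2 standard errors starting at sample 11
stream = [0.0] * 10 + [1.2] * 30
print(cusum_signals(stream, k=0.9, h=4.65))  # -> [26]
```

With this deterministic stream, U grows by 1.2 − 0.9 = 0.3 per sample after the shift, so 16 post-shift samples are needed to exceed h = 4.65.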
As an example, suppose our target value is 100, σ = 8 and n = 4. We do not want to detect
shifts in the mean within the interval (97.6, 102.4) quickly, but would like to detect shifts
to beyond 104.8 or below 95.2 quickly. Then the in-control region boundary value based
on standardized shifts is θ0 = (102.4 − 100)/(8/√4) = 0.6 and the out-of-control region
boundary is θ1 = (104.8 − 100)/(8/√4) = 1.2.
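The boundary calculations for this example are a direct transcription of the arithmetic above (variable names are ours):

```python
import math

target, sigma, n = 100.0, 8.0, 4
se = sigma / math.sqrt(n)        # standard error of the sample mean = 4.0

theta0 = (102.4 - target) / se   # in-control boundary, ~0.6
theta1 = (104.8 - target) / se   # out-of-control boundary, ~1.2
k = (theta0 + theta1) / 2        # CUSUM reference value, ~0.9

print(f"theta0 = {theta0:.2f}, theta1 = {theta1:.2f}, k = {k:.2f}")
```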
In Table 1, reproduced from Woodall (1985), M0 (θ < θ0 = 0.6) represents the in-control
region, the symbol I represents the indifference region and M1 (θ > θ1 = 1.2) represents the
out-of-control region where quick detection is desired. The zero-state ARL performance of
the two-sided CUSUM chart based on standardized sample means with reference value k
= (θ0 + θ1)/2 = 0.9 and decision limit h = 4.65 is compared to the two-sided Shewhart chart
with control limit multiplier 3.09. This value of h was chosen so that the two methods have
roughly the same ARL value at the in-control boundary θ0.
[Insert Table 1 here.]
We want large values of the ARL in M0 and small values in M1. From Table 1 we can
see that the performance of the CUSUM chart dominates that of the Shewhart chart except
for large shifts in the process mean where performance is quite similar. In particular, note
that there will be far fewer false alarms for the CUSUM chart. The CUSUM chart is easily
desensitized to handle this more general definition of in-control and out-of-control. Other
methods, such as the EWMA chart, are not so readily adaptable.
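ARL values like those in Table 1 can also be approximated by simulation. The sketch below gives a rough Monte Carlo check of the zero-state ARLs of the two charts at a standardized shift θ; it is our illustration, not the method used to compute Table 1, and the estimates carry simulation error:

```python
import random

class Cusum:
    """Two-sided tabular CUSUM on standardized sample means."""
    def __init__(self, k=0.9, h=4.65):
        self.k, self.h, self.u, self.lo = k, h, 0.0, 0.0
    def step(self, z):
        self.u = max(0.0, self.u + z - self.k)
        self.lo = min(0.0, self.lo + z + self.k)
        return self.u > self.h or self.lo < -self.h

class Shewhart:
    """Two-sided Shewhart chart with control limit multiplier 3.09."""
    def __init__(self, limit=3.09):
        self.limit = limit
    def step(self, z):
        return abs(z) > self.limit

def simulate_arl(make_chart, theta, reps=2000, cap=100_000, seed=1):
    """Crude Monte Carlo estimate of the zero-state ARL at shift theta."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        chart = make_chart()           # fresh chart state per run
        for t in range(1, cap + 1):
            if chart.step(rng.gauss(theta, 1.0)):
                break
        total += t
    return total / reps

# At the out-of-control boundary theta1 = 1.2 the CUSUM chart
# signals considerably faster than the Shewhart chart.
print(f"CUSUM    ARL estimate: {simulate_arl(Cusum, 1.2):.1f}")
print(f"Shewhart ARL estimate: {simulate_arl(Shewhart, 1.2):.1f}")
```

Estimating the very large in-control ARL of the CUSUM chart this way would require far more replications; numerical methods such as those in the software of Knoth (2018) are better suited to that task.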
Note that use of the CUSUM reference value k = (θ0 + θ1)/2 is equivalent to using a
nominal target value of θ0 and designing the CUSUM chart with k = (θ1 – θ0)/2 to optimally
detect a shift to θ1. If needed, software to determine the value of the control limit h to
achieve specified in-control ARL performance or to determine the ARL for a given shift
in the mean and values of k and h has been provided by Knoth (2018). In addition, software
provided by Hawkins (2019) can be used to design and evaluate CUSUM charts.
With the standard CUSUM chart the in-control region consists of a single point and the
best unacceptable parameter value θ1 is specified, leading to the reference value of θ1/2.
The most common choice is k = 0.5. This is the default value in Minitab and JMP software,
but it can be easily changed to any value of k = (θ0 + θ1)/2. The parameter values between
zero and θ1 are implicitly assumed in the standard approach to comprise an indifference
region although some treat it as an out-of-control region. Duncan (1986, p. 529) wrote,
“For much smaller shifts (than θ1) the cusum chart has considerably smaller ARLs, but this
is a disadvantage since it means that there will be more searches for relatively unimportant
assignable causes than is presumably desired.”
The CUSUM chart with k = θ1/2 = 0.6 and h = 6.858 yields an in-control ARL of
10,405.15 to match that of the modified CUSUM chart in Table 1. Its ARL of 12.0 when θ
= 1.2 would be slightly lower than that of the modified CUSUM chart, but the number of
false alarms would be more than doubled when θ = 0.6 with an ARL of 64.37.
In many applications θ0 and θ1 will be much larger than in this example. The change
from standard practice would be more pronounced in these cases. Suppose, for example,
the target value was 100, σ = 0.2 and n = 1. Suppose we don’t want to detect a shift as long
as the mean is in the range 99.4 to 100.6, but would like to detect shifts in the mean below
99.0 or above 101 as quickly as possible. We would have θ0 = 3 and θ1 = 5. The usual
Shewhart X-bar chart would have an in-control ARL value at θ = 0 of 370 and an ARL
value of only 2 at θ0 = 3, but with an almost immediate detection at θ1 = 5. The standardized
CUSUM chart with k = 4 and h = 2 would have ARL = 258.7 at θ0 = 3 and ARL = 2.74 at
θ1 = 5. Virtually no signals would be given if the process were on target at θ = 0. In our
view the performance of the CUSUM chart would be much preferred in order to prevent
an excessive number of alarms when the process mean remained in the in-control region.
Modified and acceptance control charts
A situation in which some slack is often allowed in the process occurs when the process
is highly capable. This scenario is illustrated in Figure 2, which was reproduced from
Crichton and Faltin (2018). The control chart shows a number of signals of instability, but
the specification limits are so wide relative to the process variation that attention should be
directed to other, more pressing, problems. In these types of situations, the modified control
chart of Hill (1956) or the acceptance control chart of Freund (1957, 1960) is often
recommended.
[Insert Figure 2 here.]
The control limits of the modified and acceptance Shewhart-type control charts are
widened so that the mean does not drift so far as to cause the proportion of nonconforming
product to exceed a specified value. Thus, worst acceptable values are given for the process
mean. The acceptance control chart has an additional stipulation on the power of the chart
to detect a specified adverse shift, the best unacceptable values, and is based on a three-region approach. The proportion of non-conforming product is p0 or less in the in-control
region and p1 or higher in the out-of-control region, where p0 < p1. Freund (1960) referred
to the resulting regions as an acceptable zone, indifference zones and rejectable zones in
the two-sided specification case.
In the present quality environment, however, the modified and acceptance charts can
allow the process mean to move too far toward the specification limits. In the example of
the design of an acceptance control chart in Montgomery (2013, p. 457), the acceptable
fraction non-conforming was set to 1%. This would be much too large a value in many
applications. One could, of course, use much lower values for the acceptable and
unacceptable proportions non-conforming, but then the adequacy of the normal distribution
would come into play. It is better to use engineering knowledge, including specification
limit information, to determine the three regions. Freund (1962) stated that the rejectable
zone could be determined by locating the level at which only a given fraction of individual
items would exceed the specification limits, by experience, or by edict.
Note that the international standard ISO 7870-3:2012 provided guidance on the uses of
acceptance control charts and established general procedures for determining sample sizes
and the control limits. It is pointed out that this type of chart should be used only when the
within subgroup variation is in-control, variation is estimated efficiently, and a high level
of process capability has been achieved. Examples were included to illustrate when this
technique has advantages and to provide details of implementation.
The use of an in-control region so that only shifts of practical importance are detected
quickly is analogous to some extent to the use of hypothesis testing when the null
hypothesis contains an interval of values that are not considered of practical importance.
Blume et al. (2018) proposed such a hypothesis testing approach in an effort to alleviate
some of the shortcomings of standard hypothesis testing in which the null hypothesis
contains only a single point. This approach leads to what is referred to as the minimal-effects test described by Lakens et al. (2018).
The use of control limits wider than Shewhart’s three-sigma limits is not without
detractors. It should be noted that Deming (1986, p. 369) referred to modified control limits
as “bear traps”. In his index Deming (1986, p. 501) wrote “Modified limits, never.” Deming
(1986, p. 49) wrote, “With continual improvement, the distributions of parts, materials, and
service become so narrow that specifications are lost beyond the horizon.” This is the ideal
situation, but serves only as a goal in many applications.
Wheeler and Chambers (1992, p. 15) referred to a capable, but out-of-control process,
as on “the brink of chaos.” In their section “The Problem of Modified Control Limits” they
opposed the use of modified limits stating that the use of modified limits is contrary to
continual improvement, encourages alternating periods of benign neglect and intense
panic, allows assignable causes to come and go without detection and makes extrapolation
to future process performance unsafe. These can be valid concerns, but in a situation such
as that illustrated in Figure 2, it is likely that other processes merit much higher priority.
Shewhart chose three-sigma limits because he considered them to be reasonable based
on economic grounds. Shewhart (1931, p. 277) stated that this multiple of sigma “seems to
be an acceptable economic value.” Because of the changes in data collection and the
manufacturing environment, what was of economic value in 1931 is not necessarily of
economic value now.
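To make the tradeoff concrete: under normal theory, the per-point false-alarm probability for k-sigma limits is 2(1 − Φ(k)), and its reciprocal is the in-control average run length. The following is our own minimal sketch, not a calculation from Shewhart (1931):

```python
from statistics import NormalDist

def false_alarm_rate(k):
    """In-control false-alarm probability per plotted point for k-sigma limits."""
    return 2 * (1 - NormalDist().cdf(k))

# Wider limits trade detection speed for far fewer false alarms:
for k in (3.0, 3.5, 4.0):
    p = false_alarm_rate(k)
    print(f"{k}-sigma limits: alpha = {p:.6f}, in-control ARL = {1 / p:.0f}")
```

Widening the limits from 3 to 3.5 sigma reduces the false-alarm rate by roughly a factor of six, a difference that matters greatly when hundreds of charts are monitored simultaneously.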
Graphical approaches to prioritization of effort
When monitoring many variables, it becomes necessary to prioritize alarms. This topic
is beyond the scope of our paper, but we note that Sall (2018) and Jensen et al. (2019) have
proposed graphical methods for prioritizing improvement efforts when many variables are
being monitored. They plotted capability indices against stability indices, which reflect
the ratio of long-term to short-term variation. Larger values of the stability
index indicate greater process instability. The plot given in the case study of Jensen et al.
(2019) is shown in Figure 3, where the size of a dot indicates the volume of product
produced over the time period of interest. Improvement efforts are directed toward the
processes that are not capable.
[Insert Figure 3 here.]
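A stability index of the general kind used by Sall (2018) and Jensen et al. (2019) can be sketched as follows; this is our illustrative version, which assumes the short-term standard deviation is estimated from the average moving range (with the usual bias-correction constant d2 = 1.128) and may differ in detail from their definitions:

```python
import statistics

D2 = 1.128  # bias-correction constant for moving ranges of span 2

def stability_index(x):
    """Ratio of long-term to short-term standard deviation.

    Short-term variation is estimated from the average moving range, as on
    an individuals chart; values well above 1 suggest process instability.
    """
    long_term = statistics.stdev(x)
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    short_term = statistics.mean(moving_ranges) / D2
    return long_term / short_term

# A sustained mean shift inflates long-term variation but not the
# point-to-point moving ranges, so the index grows:
stable = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9]
shifted = stable + [12.0, 12.1, 11.9, 12.2, 11.8, 12.0]
print(stability_index(stable) < stability_index(shifted))  # True
```

Plotting such an index against a capability index, as in Figure 3, then directs improvement effort toward processes that are both unstable and incapable.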
Final Comments
In our view, much more attention should be given to the rate and consequences of false
alarms resulting from process monitoring, not only at the level of individual charts, but at
the system level, where appropriate. Appropriate modeling of autocorrelation and other
process behavior, as needed, can reduce misleading signals from control charts. Modifying
the standard methods to lower the expected number of false alarms can have a minimal
effect on the speed of detection of process changes, and in many applications this tradeoff
will pay substantial dividends.
We encourage practitioners to use engineering knowledge with the three-region
approach to design their process monitoring methods. We encourage researchers to put
greater focus on methods that can be desensitized, if necessary, in practical applications.
The current approach to evaluating and comparing methods leads to methods with greater
and greater sensitivity in detecting very small shifts in a process. This sensitivity can
become a liability. The goal is to have control chart signals associated with practical
significance, not just statistical significance. Many of the ideas presented in our paper have
been suggested in some form before, but are now of increasing urgency as more and more
variables are being monitored, often at high frequencies.
There are many research questions resulting from the use of a three-region approach,
since it can be applied in many types of applications and with many types of charts. For
example, it remains to be seen how multivariate methods could be desensitized when
using a three-region approach based on values of the non-centrality parameter.
Acknowledgements
We appreciate the helpful comments of Jennifer Van Mullekom of Virginia Tech.
References
Adams, B. M., C. A. Lowry, and W. H. Woodall. 1992. The use (and misuse) of false alarm
probabilities in control chart design. Frontiers in Statistical Quality Control 4, eds.
Lenz, H.-J., Wetherill, G. B., and Wilrich, P.-Th., Physica-Verlag, Heidelberg, pp. 155-168.
Bissell, A. F. 1990. Control charts and cusums for high precision processes. Total Quality
Management 1(2):221-228.
Blackstone, E. H. 2004. Monitoring surgical performance. Journal of Thoracic and
Cardiovascular Surgery 128:807-825.
Blume, J. D., R. A. Greevy, V.F. Welty, J. R. Smith, W. D. Dupont. 2018. An introduction
to second-generation p-values. To appear in The American Statistician.
Box, G., S. Bisgaard, S. Graves, M. Kulahci, K. Marko, J. James, J. Van Gilder, T. Ting,
H. Zatorski, and C. Wu. 2003-2004. Performance evaluation of dynamic monitoring
systems: the waterfall chart. Quality Engineering 16 (2):183-191.
Box, G., S. Graves, S. Bisgaard, J. Van Gilder, K. Marko, J. James, M. Seifer, M. Poublon,
and F. Fodale. 2000. Detecting malfunctions in dynamic systems. In Transactions of
the Society of Automotive Engineers: Electronic Engine Controls 2000, Modeling,
Neural Networks, OBD, and Sensors (SP-1501).
Champ, C. W. and W. H. Woodall. 1987. Exact results for Shewhart control charts with
supplementary runs rules. Technometrics 29:393-399.
Crichton, J. and F. W. Faltin. 2018. Making CPV a proactive component of process and
product improvement. Pharmaceutical Engineering 38 (4):49-55.
Cvach, M. 2012. Monitor alarm fatigue – an integrative review. Biomedical
Instrumentation & Technology 46(4):268-277.
Deming, W. E. 1986. Out of the Crisis. Massachusetts Institute of Technology, Center for
Advanced Engineering Study, Cambridge, Mass.
Doganaksoy, N., G. J. Hahn, and W. Q. Meeker. 2017. Fallacies of statistical significance.
Quality Progress, November, 56-62.
Driscoll, A. and W. H. Woodall. 2019. Use of the conditional false alarm rate metric in
statistical process monitoring. Paper to be presented at the XIIIth International
Workshop on Intelligent Statistical Quality Control, Hong Kong, August 13-15, 2019.
Duncan, A. J. 1986. Quality Control and Industrial Statistics, Fifth Edition, Irwin:
Homewood, IL.
Ewan, W. D. and K. W. Kemp. 1960. Sampling inspection of continuous processes with
no autocorrelation between successive results. Biometrika 47 (3 and 4):363-380.
Faltin, F. W. 1986. Run length properties of multi-chart control systems, invited address
at the Second National Symposium on Statistics in Automated Manufacturing, Arizona
State University, Tempe, AZ.
Faltin, F. W. and W. T. Tucker. 1991. On-line quality control for the factory of the ‘90s
and beyond, Chapter in Statistical Process Control in Manufacturing, edited by J. B.
Keats and D. C. Montgomery, Marcel Dekker, New York.
Fraker, S. E., W. H. Woodall, and S. Mousavi. 2008. Performance metrics for surveillance
schemes. Quality Engineering 20:451-464.
Freund, R. A. 1957. Acceptance control charts. Industrial Quality Control 14(4):13-23.
Freund, R.A. 1960. A reconsideration of the variables control chart with special reference
to the chemical industries. Industrial Quality Control 16(11):35-41. (May)
Freund, R. A. 1962. Graphical process control. Industrial Quality Control 18(7):1-8.
Frisén, M. 1992. Evaluations of methods for statistical surveillance. Statistics in Medicine
11:1489-1502.
Gandy, A. and J. T. Kvaløy. 2013. Guaranteed conditional performance of control charts
via bootstrap methods. Scandinavian Journal of Statistics 40:647–668.
Görges, M., B. A. Markewitz, and D. R. Westenskow. 2009. Improving alarm performance
in the medical intensive care unit using delays and clinical context. Technology,
Computing, and Simulation 108(5):1546-1555.
Hawkins, D. M. (2019). ftp://ftp.stat.umn.edu/pub/cusum. (Accessed 1/30/19).
Hawkins, D. M. and D. H. Olwell. 1998. Cumulative Sum Charts and Charting for Quality
Improvement, New York, NY: Springer.
Hill, D. 1956. Modified control limits. Applied Statistics 5(1):12-19.
Imhoff, M. and S. Kuhls. 2006. Alarm algorithms in critical care monitoring. Anesthesia
& Analgesia 102:1525–1537.
International Organization for Standardization. 2012. ISO 7870-3, Control Charts — Part 3:
Acceptance Control Charts.
Jensen, W., J. Szarka, and K. White. 2019. Stability assessment with the stability index. To
appear in Quality Engineering.
Jones-Farmer, L. A., W. H. Woodall, S. H. Steiner, and C. W. Champ. 2014. An overview
of phase I analysis for process improvement and monitoring. Journal of Quality
Technology 46(3):265-280.
Kenett, R. S. and M. Pollak. 2012. On assessing the performance of sequential procedures
for detecting a change. Quality and Reliability Engineering International 28:500-507.
Knoth, S. 2018. R software package ‘spc’: Statistical process control – calculation of ARL
and other control chart performance measures.
https://www.rdocumentation.org/packages/spc/versions/0.6.0.
Lakens, D., A. M. Scheel, and P. M. Isager. 2018. Equivalence testing for psychological
research: a tutorial. Advances in Methods and Practices in Psychological Science
1(2): 259-269.
Loureiro, L. D., E. K. Epprecht, S. Chakraborti, and F. S. Jardim. 2018. In-control
performance of the joint phase II X-bar and S control charts when parameters are
estimated. Quality Engineering 30(2):253-267.
Margavio, T. M., M. D. Conerly, W. H. Woodall, and L. G. Drake. 1995. Alarm rates for
quality control charts. Statistics and Probability Letters 24(3):219-224.
Montgomery, D. C. 2013. Introduction to Statistical Process Control, Seventh Edition,
John Wiley & Sons, Inc., Hoboken NJ.
Page, E. S. 1954. Continuous inspection schemes. Biometrika 41:100-115.
Page, E. S. 1961. Cumulative sum charts. Technometrics 3(1):1-9.
Saleh, N. A., M. A. Mahmoud, M. J. Keefe, and W. H. Woodall. 2015. The difficulty in
designing X-bar and X control charts with estimated parameters. Journal of Quality
Technology 47(2):127-138.
Sall, J. 2018. Scaling-up process characterization. Quality Engineering 30(1):62-78.
Shen, X., F. Tsung, C. Zou, and W. Jiang. 2013. Monitoring Poisson count data with
probability control limits when sample sizes are time-varying. Naval Research
Logistics 60(8):625–636.
Shewhart, W. A. 1931. Economic Control of Quality of Manufactured Product. D. Van
Nostrand, New York, NY. (Republished in 1980 by the American Society for Quality
Control, Milwaukee, WI).
Shewhart, W. A. 1939. Statistical Method from the Viewpoint of Quality Control. Graduate
School of the Department of Agriculture, Washington, D. C. (Republished in 1986 by
Dover Publications, Inc., Mineola, NY.)
Shmueli, G. and H. Burkom. 2010. Statistical challenges facing early outbreak detection in
biosurveillance. Technometrics 52(1):39-51.
Snee, R. and R. W. Hoerl. 2018. Action that matters. Quality Progress, May, 56-60.
Surrey Fire and Rescue. 2019. https://www.surreycc.gov.uk/people-and-community/fire-and-rescue/fire-safety-for-businesses-and-organisations/Frequently-asked-questions-regarding-fire-sprinkers#a2. (Accessed 1/21/19).
Wald, A. 1947. Sequential Analysis, New York: John Wiley & Sons.
Wasserstein, R. L. and N. A. Lazar. 2016. Editorial: The ASA's statement on p-values:
context, process, and purpose. The American Statistician 70 (2):129-133.
Wheeler, D. J. and D. S. Chambers. 1992. Understanding Statistical Process Control, 2nd
edition. SPC Press, Knoxville, TN.
Woodall, W. H. 1985. The statistical design of quality control charts. The Statistician
34(2):155-160.
Woodall, W. H. 1986. The design of CUSUM quality control charts. Journal of Quality
Technology, 18(2):99-102.
Woodall, W. H. 2000. Controversies and contradictions in statistical process control. (with
discussion), Journal of Quality Technology, 32(4):341-378.
Woodall, W. H. and B. M. Adams. 1993. The statistical design of CUSUM charts. Quality
Engineering, 5 (4):559-570.
Woodall, W. H. and F. W. Faltin. 1996. An overview and perspective on control charting,
Chapter 2 of Statistical Applications in Process Control, edited by J. B. Keats and
D. C. Montgomery, Arizona State University, Marcel-Dekker, 7-20.
Yashchin, E. 1985. On the analysis and design of CUSUM-Shewhart control schemes. IBM
Journal of Research and Development 29(4):377-391.
Yashchin, E. 1993. Statistical control schemes: methods, applications and generalizations.
International Statistical Review 61(1):41-66.
Yashchin, E. 2018. Statistical monitoring of multi-stage processes. To appear in Frontiers
in Statistical Quality Control 12, S. Knoth and W. Schmid (Eds.), Berlin: Springer-Verlag.
Zhang, X. and W. H. Woodall. 2015. Dynamic probability control limits for risk-adjusted
Bernoulli CUSUM charts. Statistics in Medicine 34:3336-3348.
Figure 1: One-Sided Regions of Process Performance (Box et al., 2003-2004)
Table 1: ARL Comparison of Shewhart Chart to Modified CUSUM Chart,
Reproduced from Woodall (1985)
Figure 2: An I-Chart shown without (left) and with (right) plotted specification limits.
(Reproduced from Crichton and Faltin, 2018)
Figure 3: Stability and Capability Plot (Reproduced from Jensen et al., 2019)
William H. Woodall is a Professor of Statistics at Virginia Tech. He is a former editor of the
Journal of Quality Technology (2001–2003) and associate editor of Technometrics (1987–1995).
He has published 150 papers, most on various aspects of process monitoring. He is the recipient
of the ASQ Shewhart Medal (2002), ENBIS Box Medal (2012), Jack Youden Prize (1995, 2003),
ASQ Brumbaugh Award (2000, 2006), Ellis Ott Foundation Award (1987), Soren Bisgaard Award
(2012), Lloyd S. Nelson Award (2014), and a best paper award for IIE Transactions on Quality
and Reliability Engineering (1997). He is a Fellow of the American Statistical Association, a
Fellow of the American Society for Quality, and an elected member of the International Statistical
Institute.
Frederick W. Faltin is Associate Professor of Practice in the Department of Statistics, and
Director of Corporate Partnerships for the College of Science, at Virginia Tech. He is also
Cofounder and Managing Director of The Faltin Group, a consulting and training firm providing
services in statistics, Six Sigma, economics, and operations research to companies throughout the
Americas, Europe, and Asia. Previously, he founded and managed the Strategic Enterprise
Technologies laboratory at GE Global Research. Faltin has published books, research papers, and
articles on statistics and related topics, including the Encyclopedia of Statistics in Quality and
Reliability (2007), Statistical Methods in Healthcare (2012), and Analytic Methods in Systems and
Software Testing (2018). He is a Fellow of the American Statistical Association and a recipient of
the American Society for Quality’s Shewell Prize.