Rethinking control chart design and evaluation (3/17/19) (To appear in Quality Engineering)

William H. Woodall and Frederick W. Faltin
Department of Statistics, Virginia Tech, Blacksburg, VA 24061-0439, USA

ABSTRACT
We discuss some practical issues involving the control of the number of false alarms in process monitoring. This topic is of growing importance as the number of variables being monitored and the frequency of measurement increase. An alternative formulation for evaluating and comparing the performance of control charts is given based on defining in-control, indifference and out-of-control regions of the parameter space. Methods are designed so that only changes of practical importance are detected quickly. This generalization of the existing framework makes control charting much more useful in practice, especially when many variables are being monitored. It also justifies to a greater extent the use of cumulative sum (CUSUM) methods.

KEYWORDS average run length; cumulative sum chart; false alarm rate; hypothesis testing; practical significance; statistical significance; statistical process monitoring

Introduction

We have two goals. First, we would like to offer a rationale for practitioners to justify modifying the standard control charting methods, using engineering and subject matter knowledge, so that signals have both practical importance and statistical significance. Second, we want to encourage researchers to work on methods that can be desensitized, if necessary, to avoid the quick detection of small process shifts which may not be of practical importance in applications. Many of the ideas we present are not entirely new, as our citations to past work demonstrate. They are, however, of increasing relevance. Many practitioners and researchers tend to think solely in terms of false alarm rates, or related performance metrics, of individual charts. 
Understandable as that may be in light of the historical focus of training materials and the literature, it can be dangerously misleading in environments where systems of many control charts are being used simultaneously. In such instances, a multiple testing effect comes into play which can make the probability of obtaining false alarms quite high. The estimation of overall false alarm rates for many-chart systems is difficult, as the covariance structure among the variables being monitored may be complex and is almost certainly highly application-specific. Nonetheless, as discussed by Faltin (1986), it is straightforward to show that for even modestly sized systems of independent charts, the probability of at least one false alarm can be close to one. We can ill afford to ignore this effect, as such systems of charts are increasingly common. The easiest way to reduce this adverse effect may be to redesign as many charts in the system as possible to have sharply lower false alarm rates. One important step in this direction is to ensure that only genuinely important changes are detected. We offer an approach to accomplish this.

It is assumed in this paper that the reader is familiar with the basics of control charting, e.g., the construction of X-bar and other types of basic Shewhart charts, the use of runs rules and the use of cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) charts. Background on these methods can be found in Montgomery (2013).

Process monitoring is typically broken up into two phases. In Phase I a historical set of data that has been collected over time is analyzed to understand the process variation. Often a great deal of process understanding and process improvement results from the insights obtained during Phase I. Jones-Farmer et al. 
(2014) discussed many of the issues that arise in Phase I, one end result of which is the fitting of some model of the variation and the estimation of the in-control parameters of the fitted model. Data are collected over time, sample-by-sample, in Phase II and one checks to see if there has been some deviation from the estimated in-control model.

As stated by Montgomery (2013, p. 191), there is a close connection between the use of control charts and hypothesis testing. A false alarm with a control chart is frequently referred to as a Type I error, in hypothesis testing terminology. The connection between control charting and hypothesis testing was discussed in detail by Woodall and Faltin (1996) and Woodall (2000). In much of the research on control charts it is assumed that the values of the in-control parameters are known. Under this assumption and the assumption of independence of the data over time, the basic Shewhart chart in Phase II can be viewed as being equivalent to a sequence of hypothesis tests. As pointed out by Page (1954), the one-sided CUSUM chart can be viewed as a sequence of the sequential probability ratio tests (SPRTs) developed by Wald (1947). Because of the relationship between control charting and hypothesis testing, control charting shares some of the shortcomings of hypothesis testing. How to address some of these shortcomings is a focus of this paper.

Some of the metrics used to characterize control chart statistical performance are discussed in the next section. In the following section, we discuss some of the alternative methods that can be used when some slack is allowed in the process. The goal is to have control chart signals associated with practical significance, not just statistical significance. A more general framework is described for comparing control chart performance. Conclusions are given in our last section. 
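To make concrete the multiple testing effect mentioned above, consider the following minimal sketch. It is our illustration, not taken from the paper: the function name and the specific chart counts are hypothetical, and the independence assumption is the simplifying one noted in the text.

```python
# Illustration of the multiple testing effect: with many independent charts
# running simultaneously, the chance of at least one false alarm somewhere
# becomes large. The numbers below are illustrative, not from the paper.
def prob_any_false_alarm(alpha, n_charts, n_samples):
    """P(at least one false alarm) across n_charts independent charts, each
    observed for n_samples samples, with per-sample false alarm probability alpha."""
    return 1 - (1 - alpha) ** (n_charts * n_samples)

alpha = 0.0027  # per-sample false alarm probability of a 3-sigma Shewhart chart

print(prob_any_false_alarm(alpha, n_charts=100, n_samples=1))   # about 0.24 at a single sampling time
print(prob_any_false_alarm(alpha, n_charts=100, n_samples=25))  # about 0.999 over 25 samples
```

Even a modest system of 100 well-behaved charts is thus nearly certain to produce a false alarm within a short monitoring window, which is the point made by Faltin (1986).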
Characterizing Performance

Commonly used metrics

A fundamental principle in statistical process monitoring is that there should be some control of the rate at which false alarms occur. A false alarm is defined as a control chart signal when the process is in the in-control state. In the literature, any deviation of a model parameter from its assumed in-control value is usually assumed to be a shift to the out-of-control state, indicating the presence of an assignable cause of variation. Various metrics have been used for the evaluation of the statistical properties of control charts. A number of these metrics were reviewed by Frisén (1992), Fraker et al. (2008), and Kenett and Pollack (2012). For a basic Shewhart control chart one can specify the probability of a false alarm at each sampling point. One must be careful, however, because this metric does not apply to charts that accumulate information over time such as the CUSUM and EWMA charts or when runs rules are used with a Shewhart chart, as discussed by Adams et al. (1992) and Woodall and Adams (1993). Sometimes the probability of a false alarm within a given time period is used as the metric of interest. Recently, the conditional false alarm rate has proved useful, where this metric is defined as the probability of a false alarm at a particular time given no previous false alarm. The conditional false alarm rate is analogous to the hazard function in reliability theory. This metric can be used to determine the control limits for any type of chart. The conditional false alarm rate metric is particularly useful when the in-control parameter value varies over time due to a varying covariate such as the sample size. For more information on this metric, see Margavio et al. (1995), Shen et al. (2013), Zhang and Woodall (2015) and Driscoll and Woodall (2019). 
Most frequently, however, the average run length (ARL) metric is used, where the run length is defined as the number of samples required before a control chart signal is given. If the time interval between samples varies, as in observing a sequence of assumed exponential random variables, then the average time until a signal (ATS) is more appropriate. From a practical point of view, it is the time between false alarms that is of more relevance. As the sampling frequency increases, a given ARL value will correspond to a smaller ATS value. If sampling occurs regularly every h time periods, then we have ATS = h × ARL. Values of in-control metrics can be specified with exactness under the assumption of known in-control parameters. There has been quite a bit of research recently, however, showing that estimation of these in-control parameter values causes a surprisingly large amount of variation in in-control false alarm rates and in-control ARLs. These metrics become random variables when in-control parameters are estimated. See, for example, Gandy and Kvaløy (2013), Saleh et al. (2015) and Loureiro et al. (2018). An alternative approach is to control instead the probability that the false alarm probability exceeds a specified value or to control the probability that the in-control ARL is less than a given value. Precise determinations of false alarm rates and other metrics rely on the adequacy of the model used. Practitioners may not require precise values of the false alarm rate, but need at least some idea of how frequently a false alarm would be expected to occur.

Selection of in-control metric values

It seems that very little guidance has been given for selecting a particular value of the in-control metric of interest, such as the in-control ARL value. There must be some determination of the acceptable rate of false alarms. The standard X-bar control chart is widely used in practice. 
The in-control ARL of a basic 3-sigma Shewhart X-bar chart under the assumptions of normality and known in-control mean and variance is 370. If one uses the Western Electric runs rules, then Champ and Woodall (1987) showed that the in-control ARL decreases to 92. This performance may be acceptable in some cases, but unacceptable in others. As a practical matter, the selection of control limits, with their associated in- and out-of-control ARLs, reflects a judgment regarding the relative consequences of false alarms, versus those of allowing a process change to go undetected for a time. In our experience, both of these types of ill effects are often underappreciated in practice. The damage done by false alarms in terms of both cost and loss of confidence tends to be brushed aside. Relatively few practitioners appreciate that even a correctly functioning chart may take quite a while to detect a process change, for example a sustained shift of modest size. Rather, there is a propensity to apply whatever limits have been most commonly studied, and to assume them to be reasonable, rather than carefully thinking through the alternatives and making an informed choice. There is always a tradeoff between false alarm rate reduction and quick detection of process changes. As Freund (1960) wrote, “We can always avoid looking for non-existent trouble by never looking for anything at all. That is, we must balance the risks of calling wolf unnecessarily against the risks of not calling it until the sheep are being digested.” As more and more variables are measured in practical applications, and as the frequency of measurement increases, it becomes increasingly important to keep the number of false alarms at a manageable level. If there are too many false alarms, then alarms tend to be ignored and the monitoring becomes ineffective. 
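Under the stated assumptions, the run length of a Shewhart chart with known parameters is geometric, so the in-control ARL values quoted above follow directly as reciprocals of the per-sample false alarm probability. A minimal sketch of this calculation (our illustration, using only the Python standard library; the function name is ours):

```python
from math import erfc, sqrt

def shewhart_in_control_arl(L):
    """In-control ARL of a two-sided Shewhart chart with L-sigma limits,
    assuming normality, independence, and known in-control parameters.
    The run length is geometric, so ARL = 1 / P(false alarm per sample)."""
    # P(|Z| > L) for standard normal Z; erfc(L / sqrt(2)) equals 2 * (1 - Phi(L))
    p = erfc(L / sqrt(2))
    return 1.0 / p

print(shewhart_in_control_arl(3.0))   # ~370 for the classical 3-sigma limits
print(shewhart_in_control_arl(3.09))  # ~500 for the 3.09 multiplier used later in the paper
```

No such closed form exists once the Western Electric rules are added; the reduction to 92 cited above was obtained by Champ and Woodall (1987) using a Markov chain representation of the runs rules.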
If alarms result in process adjustments, then false alarms can be harmful since adjusting a process that does not require it can increase the process variation. In applications where hundreds or thousands of variables are being monitored, one can be overwhelmed by alarms, false or not, and some method for the prioritization of alarms is required. There is a growing need for more of a focus on false alarms and their effects. Crichton and Faltin (2018) wrote in the context of pharmaceutical manufacturing, “In our observation, the potential for false alarms (and their consequences) is generally not well understood.” The rate of false alarms should be tied to their consequences. Because the consequences can vary significantly from application to application, it seems reasonable that standard methods be adapted to reflect this fact. As examples, due to the consequences of a false alarm, Blackstone (2004) reported that the Society of Cardiothoracic Surgeons of Great Britain and Ireland required 9999:1 odds against a sequence of adverse events being a false alarm before declaring that there was an indication of poor medical performance. In the automobile on-board monitoring system described by Box et al. (2000), even a false alarm probability of 0.0005 per trip was considered excessively large for adequate control of the false alarm probability over the lifetime of a vehicle. Clearly, the acceptable rate of false alarms depends on the application and the consequences. A false alarm, sometimes referred to as a nuisance alarm, with a home smoke alarm system can be irritating, but not destructive. One could tolerate only a very low false alarm rate, however, with a heat-sensitive sprinkler system. The odds of a sprinkler turning on accidentally have been reported by Surrey Fire and Rescue (2019) to be about 1 in 500,000 per year of service, which they state is comparable to the odds of being struck by lightning. 
There are so many nuisance alarms with car alarm systems that most are ignored to the extent that it is possible to ignore an irritating, loud noise. A much more serious issue is the rate of false alarms in hospital intensive care units. Variables such as oxygen saturation in the blood, heart rate, heart electrical tracing, blood pressure, temperature and fluid status are monitored continuously. Monitoring devices notify members of the care team when a measurement is outside of an acceptable range or when a sensor becomes inoperative, perhaps due to patient movement. Cvach (2012) reported that patients, families, and staff may be exposed to as many as 700 monitor alarms per patient per day. Görges et al. (2009) estimated that up to 94% of the alarms in the intensive care environment are false alarms. In their review paper on this subject Imhoff and Kuhls (2006) stated that these excessively frequent false alarms are not only a nuisance for both patients and caregivers; they can also compromise patient safety and the effectiveness of patient care. False alarms are also an issue in public health biosurveillance. Shmueli and Burkom (2010) wrote:

The current reality is that users of some large systems see alarms nearly every day, because of the large number of data streams and regions monitored and also because the methods are not appropriately adjusted for the variations in time series characteristics among these methods. This frequent alerting leads users to ignore alarms and instead use the system to examine the data heuristically. As one user commented, “We hope that when there is a true alarm, we will not ignore it.” One statistical reason for this phenomenon is inadequate handling of the nonstationary raw data, for example, not accounting for the day-of-the-week effect or ignoring autocorrelation. Another reason is multiple testing.

These types of issues are becoming more relevant in process monitoring applications generally. 
Multiple testing refers to the monitoring of many variables. Even though the probability of a false alarm at a particular time point with any single variable might be low, the collective probability of false alarms system-wide can be quite high. Montgomery (2013, p. 194) stated that an out-of-control action plan (OCAP) should accompany the use of control charts. The OCAP indicates what actions should be taken after an activating event, usually a control chart signal. The action taken could vary from simply increasing the attention paid to the variable being monitored to a process shutdown and recalibration of equipment. False alarms that trigger investigations can be costly. As the cost of false alarms increases, it makes sense to try to reduce their number. As Shewhart (1939, p. 36) wrote, “Even in trying to keep the probability of looking for assignable causes when they are not present below some limiting value, it is necessary to make some considered choice depending largely upon the costliness of thus looking unnecessarily for trouble.”

Rethinking Control Chart Design

A three-region approach

In control chart theory, competing methods are often compared on the basis of ARL performance. It is most common to set the in-control ARLs of competing methods equal; the method with the lower out-of-control ARL for a given sustained shift in the parameter is then considered the better method for that shift. In practice, however, one would only want to react when a process change is sufficiently large for there to be justification for a reaction. It is not realistic to assume that any deviation, however small, should be detected as quickly as possible. As Box et al. (2003-2004) stated, “The idea that a monitored system (plant) is either ‘good’ or ‘bad’ is too simplistic.” Freund (1960) wrote, “Many assignable causes produce effects that are real, but small and unimportant. 
In fact, it is often completely uneconomical to detect these effects, much less worry about how to correct them.” This situation is closely related to the difference between statistical significance and practical significance in the context of hypothesis testing. There has been much criticism of the use of hypothesis testing and p-values in science. The resulting debate motivated the American Statistical Association to issue a clarifying statement on the use and misuse of hypothesis testing and p-values. See Wasserstein and Lazar (2016). Doganaksoy et al. (2017) and Snee and Hoerl (2018) also addressed this issue. A key point in the process monitoring context is that observing a sample that is inconsistent with the assumed in-control model does not necessarily imply that there is a process change of practical importance. This issue becomes more relevant as the sample size increases at each sampling point because then very small process changes can be detected with high probability. In applications there will often be a range of parameter values for which a signal would not be desired, the in-control region, and a range of values for which one would want a quick signal, the out-of-control region. It is reasonable to have a region of parameter values in between these two regions where one would be indifferent. Box et al. (2003-2004) referred to the intermediate region as “no man’s land” whereas Freund (1960) and Woodall (1985) referred to it as the “indifference zone.” This three-region approach was taken by Ewan and Kemp (1960), Freund (1957, 1960), Woodall (1985, 1986), Box et al. (2000, 2003-2004), and Yashchin (1985, 2018). Under this formulation one would want suitably large ARLs for parameter values in the in-control region and suitably low values in the out-of-control region. 
This three-region approach was a component in the IBM implementation of process monitoring discussed by Yashchin (2018), where engineering input was used to determine the three regions of parameter values for each variable and a method for the prioritization of alarms was implemented. We note that in some applications it may be more important to detect process shifts in one direction with greater sensitivity than in the other, or only shifts in one direction. The in-control, indifference and out-of-control regions can be adjusted for these cases. The regions do not have to be symmetric about a target value. This approach also admits as a special case a two-zone approach which combines the in-control and indifference zones into a single “deadband”, and strives for the quickest possible detection beyond this range. This approach was used with great success in an automated plant monitoring system comprising tens of thousands of charts, as described by Faltin and Tucker (1991). Bissell (1990) also favored a two-zone approach with a target zone specified for high capability or precision processes.

In the standard approach to comparing methods the in-control region consists of a single point and there is effectively no indifference zone. The theoretical, but unattainable, “optimal” method would then be a chart that doesn’t signal when the process is exactly centered on target, but signals immediately for any other value. Such a method would be inapplicable in practice because even the slightest deviation from target would be detected immediately. There is a problem with theory when the resulting optimal method has no practical value. On the other hand, a chart that doesn’t signal for parameter values in a specified in-control region and signals immediately for values in the specified out-of-control region would be highly desirable in practice. Figure 1 shows an illustration of the regions used by Box et al. 
(2000, 2003-2004) when developing methods for the monitoring of a system that is expected to deteriorate over time. Ewan and Kemp (1960), Woodall (1985), Hawkins and Olwell (1998, pp. 62-65), Box et al. (2003-2004) and Yashchin (1993, 2018) all recommended that in the design of a one-sided CUSUM chart the reference value should be halfway between the parameter value indicating the worst acceptable performance and the parameter value representing the best unacceptable value.

[Insert Figure 1 here.]

Page (1961) recognized the issue with respect to the increased ARL sensitivity of the CUSUM chart relative to the Shewhart chart for small process shifts, writing:

This sensitivity is an advantage for processes needing precise control, but is a definite drawback where some slack in the process is permissible. In these cases, cumulative sum charts should not be used or should be used with greater thought. It is easy to see that continued production at a value away from the nominal target will cause much more frequent interruption of the process by the sum chart, although the quality of the process is acceptable; to some extent the nuisance can be avoided by choosing a scheme with a very large A.R.L. at a nominal target value in the center of the “slack” region.

Examples and implementation

Consider the situation where observations are collected under the assumptions of normality and independence. The in-control values of the mean and variance are assumed to be known here, but would be estimated in practice. Let δ represent the size of the sustained shift in the mean. We let θ represent the size of the sustained shift in the mean in terms of the standard error of the sample mean, so θ = n^(1/2)δ/σ, where n is the sample size. We base our CUSUM methods on the standardized sample means, i.e., the target is subtracted from the sample means and then one divides by the standard error. We refer to these variables as Z1, Z2, Z3, … . 
Note that the standardized sample means have a mean of zero and a standard deviation of 1 when the process is centered on target. The standardized CUSUM chart was described by Montgomery (2013, p. 424). The CUSUM statistics are defined as

Ui = max(0, Ui-1 + Zi - k)
Li = min(0, Li-1 + Zi + k), i = 1, 2, 3, … ,

where U0 = L0 = 0, and the reference value k > 0. A signal is given indicating a process shift when Ui > h or Li < -h, where h > 0 is referred to as the decision interval.

As an example, suppose our target value is 100, σ = 8 and n = 4. We do not want to detect shifts in the mean within the interval (97.6, 102.4) quickly, but would like to detect shifts beyond 104.8 or below 95.2 quickly. Then the in-control region boundary value based on standardized shifts is θ0 = (102.4 - 100)/(8/4^(1/2)) = 0.6 and the out-of-control region boundary is θ1 = (104.8 - 100)/(8/4^(1/2)) = 1.2. In Table 1, reproduced from Woodall (1985), M0 (θ < θ0 = 0.6) represents the in-control region, the symbol I represents the indifference region and M1 (θ > θ1 = 1.2) represents the out-of-control region where quick detection is desired. The zero-state ARL performance of the two-sided CUSUM chart based on standardized sample means with reference value k = (θ0 + θ1)/2 = 0.9 and decision limit h = 4.65 is compared to the two-sided Shewhart chart with control limit multiplier 3.09. This value of h was chosen so that the two methods have roughly the same ARL value at the in-control boundary θ0.

[Insert Table 1 here.]

We want large values of the ARL in M0 and small values in M1. From Table 1 we can see that the performance of the CUSUM chart dominates that of the Shewhart chart except for large shifts in the process mean, where performance is quite similar. In particular, note that there will be far fewer false alarms for the CUSUM chart. The CUSUM chart is easily desensitized to handle this more general definition of in-control and out-of-control. 
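The standardization and the CUSUM recursions above are straightforward to compute directly. The following sketch (our illustration; the function names are ours) reproduces the boundary values θ0 = 0.6 and θ1 = 1.2 from the example and runs the two-sided chart with k = 0.9 and h = 4.65:

```python
from math import sqrt

def standardized_shift(mean_value, target, sigma, n):
    """Shift in units of the standard error of the sample mean:
    theta = n^(1/2) * delta / sigma."""
    return (mean_value - target) * sqrt(n) / sigma

def two_sided_cusum(z, k, h):
    """Run the standardized two-sided CUSUM on the sequence z.
    Returns the index of the first signal (1-based), or None if no signal."""
    U = L = 0.0
    for i, zi in enumerate(z, start=1):
        U = max(0.0, U + zi - k)  # upper statistic U_i
        L = min(0.0, L + zi + k)  # lower statistic L_i
        if U > h or L < -h:
            return i
    return None

# Boundary values from the example: target 100, sigma = 8, n = 4
theta0 = standardized_shift(102.4, 100, 8, 4)  # 0.6
theta1 = standardized_shift(104.8, 100, 8, 4)  # 1.2
k = (theta0 + theta1) / 2                      # reference value 0.9

# A sustained standardized shift of 2 accumulates at rate 2 - 0.9 = 1.1 per
# sample, so the chart signals at the fifth sample (U_5 = 5.5 > h = 4.65).
print(two_sided_cusum([2.0] * 10, k, h=4.65))  # prints 5
```

With a sequence of on-target standardized means (all zeros), both statistics stay at zero and no signal is given, as intended for the in-control region.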
Other methods, such as the EWMA chart, are not so readily adaptable. Note that use of the CUSUM reference value k = (θ0 + θ1)/2 is equivalent to using a nominal target value of θ0 and designing the CUSUM chart with k = (θ1 - θ0)/2 to optimally detect a shift to θ1. If needed, software to determine the value of the control limit h to achieve specified in-control ARL performance, or to determine the ARL for a given shift in the mean and values of k and h, has been provided by Knoth (2018). In addition, software provided by Hawkins (2019) can be used to design and evaluate CUSUM charts.

With the standard CUSUM chart the in-control region consists of a single point and the best unacceptable parameter value θ1 is specified, leading to the reference value of θ1/2. The most common choice is k = 0.5. This is the default value in Minitab and JMP software, but it can be easily changed to any value of k = (θ0 + θ1)/2. The parameter values between zero and θ1 are implicitly assumed in the standard approach to comprise an indifference region, although some treat it as an out-of-control region. Duncan (1986, p. 529) wrote, “For much smaller shifts (than θ1) the cusum chart has considerably smaller ARLs, but this is a disadvantage since it means that there will be more searches for relatively unimportant assignable causes than is presumably desired.” The CUSUM chart with k = θ1/2 = 0.6 and h = 6.858 yields an in-control ARL of 10,405.15 to match that of the modified CUSUM chart in Table 1. Its ARL of 12.0 when θ = 1.2 would be slightly lower than that of the modified CUSUM chart, but the number of false alarms would be more than doubled when θ = 0.6 with an ARL of 64.37. In many applications θ0 and θ1 will be much larger than in this example. The change from standard practice would be more pronounced in these cases. Suppose, for example, the target value was 100, σ = 0.2 and n = 1. 
Suppose we don’t want to detect a shift as long as the mean is in the range 99.4 to 100.6, but would like to detect shifts in the mean below 99.0 or above 101.0 as quickly as possible. We would have θ0 = 3 and θ1 = 5. The usual Shewhart X-bar chart would have an in-control ARL value at θ = 0 of 370 and an ARL value of only 2 at θ0 = 3, but with an almost immediate detection at θ1 = 5. The standardized CUSUM chart with k = 4 and h = 2 would have ARL = 258.7 at θ0 = 3 and ARL = 2.74 at θ1 = 5. Virtually no signals would be given if the process were on target at θ = 0. In our view the performance of the CUSUM chart would be much preferred in order to prevent an excessive number of alarms when the process mean remained in the in-control region.

Modified and acceptance control charts

A situation in which some slack is often allowed in the process occurs when the process is highly capable. This scenario is illustrated in Figure 2, which was reproduced from Crichton and Faltin (2018). The control chart shows a number of signals of instability, but the specification limits are so wide relative to the process variation that attention should be directed to other, more pressing, problems. In these types of situations, the modified control chart of Hill (1956) or the acceptance control chart of Freund (1957, 1960) is often recommended.

[Insert Figure 2 here.]

The control limits of the modified and acceptance Shewhart-type control charts are widened so that the mean does not drift so far as to cause the proportion of nonconforming product to exceed a specified value. Thus, worst acceptable values are given for the process mean. The acceptance control chart has an additional stipulation on the power of the chart to detect a specified adverse shift, the best unacceptable values, and is based on a three-region approach. The proportion of nonconforming product is p0 or less in the in-control region and p1 or higher in the out-of-control region, where p0 < p1. 
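The ARL values quoted for the second example above are easy to corroborate by simulation. The sketch below is ours, not the authors'; since the shifts of interest are at or above θ0, the lower CUSUM arm essentially never signals, so only the upper recursion is simulated. The estimates should land near the quoted 258.7 at θ0 = 3 and 2.74 at θ1 = 5:

```python
import random

def upper_cusum_run_length(theta, k, h, rng):
    """Number of samples until U_i = max(0, U_{i-1} + Z_i - k) exceeds h,
    with standardized sample means Z_i ~ N(theta, 1)."""
    U, i = 0.0, 0
    while U <= h:
        i += 1
        U = max(0.0, U + rng.gauss(theta, 1.0) - k)
    return i

def estimate_arl(theta, k, h, reps, seed=1):
    """Zero-state ARL estimated as the average of simulated run lengths."""
    rng = random.Random(seed)
    return sum(upper_cusum_run_length(theta, k, h, rng) for _ in range(reps)) / reps

# Second example from the text: theta0 = 3, theta1 = 5, so k = 4 and h = 2
print(estimate_arl(theta=5, k=4, h=2, reps=5000))  # near the quoted ARL of 2.74
print(estimate_arl(theta=3, k=4, h=2, reps=2000))  # near the quoted ARL of 258.7
```

Exact values, rather than Monte Carlo estimates, can be obtained with the software of Knoth (2018) or Hawkins (2019) cited above.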
Freund (1960) referred to the resulting regions as an acceptable zone, indifference zones and rejectable zones in the two-sided specification case. In the present quality environment, however, the modified and acceptance charts can allow the process mean to move too far toward the specification limits. In the example of the design of an acceptance control chart in Montgomery (2013, p. 457), the acceptable fraction nonconforming was set to 1%. This would be much too large a value in many applications. One could, of course, use much lower values for the acceptable and unacceptable proportions nonconforming, but then the adequacy of the assumed normal distribution would come into play. It is better to use engineering knowledge, including specification limit information, to determine the three regions. Freund (1962) stated that the rejectable zone could be determined by locating the level at which only a given fraction of individual items would exceed the specification limits, by experience, or by edict. Note that the international standard ISO 7870-3:2012 provided guidance on the uses of acceptance control charts and established general procedures for determining sample sizes and the control limits. It is pointed out there that this type of chart should be used only when the within-subgroup variation is in control, variation is estimated efficiently, and a high level of process capability has been achieved. Examples were included to illustrate when this technique has advantages and to provide details of implementation. The use of an in-control region so that only shifts of practical importance are detected quickly is to some extent analogous to the use of hypothesis testing when the null hypothesis contains an interval of values that are not considered of practical importance. Blume et al. (2018) proposed such a hypothesis testing approach in an effort to alleviate some of the shortcomings of standard hypothesis testing in which the null hypothesis contains only a single point. 
This approach leads to what is referred to as the minimal-effects test described by Lakens et al. (2018). The use of control limits wider than Shewhart’s three-sigma limits is not without detractors. It should be noted that Deming (1986, p. 369) referred to modified control limits as “bear traps”. In his index Deming (1986, p. 501) wrote “Modified limits, never.” Deming (1986, p. 49) wrote, “With continual improvement, the distributions of parts, materials, and service become so narrow that specifications are lost beyond the horizon.” This is the ideal situation, but serves only as a goal in many applications. Wheeler and Chambers (1992, p. 15) referred to a capable but out-of-control process as being on “the brink of chaos.” In their section “The Problem of Modified Control Limits” they opposed the use of modified limits, stating that the use of modified limits is contrary to continual improvement, encourages alternating periods of benign neglect and intense panic, allows assignable causes to come and go without detection and makes extrapolation to future process performance unsafe. These can be valid concerns, but in a situation such as that illustrated in Figure 2, it is likely that other processes merit much higher priority. Shewhart chose three-sigma limits because he considered them to be reasonable based on economic grounds. Shewhart (1931, p. 277) stated that this multiple of sigma “seems to be an acceptable economic value.” Because of the changes in data collection and the manufacturing environment, what was of economic value in 1931 is not necessarily of economic value now.

Graphical approaches to prioritization of effort

When monitoring many variables, it becomes necessary to prioritize alarms. This topic is beyond the scope of our paper, but we note that Sall (2018) and Jensen et al. (2019) have proposed graphical methods for prioritizing improvement efforts when many variables are being monitored. 
Sall (2018) and Jensen et al. (2019) plotted capability indices against stability indices, which reflect the ratio of long-term variation to short-term variation. Larger values of the stability index indicate greater process instability. The plot given in the case study of Jensen et al. (2019) is shown in Figure 3, where the size of a dot indicates the volume of product produced over the time period of interest. Improvement efforts are directed toward the processes that are not capable. [Insert Figure 3 here.]

Final Comments

In our view, much more attention should be given to the rate and consequences of false alarms resulting from process monitoring, not only at the level of individual charts, but at the system level, where appropriate. Appropriate modeling of autocorrelation and other process behavior, as needed, can reduce misleading signals from control charts. Modifying the standard methods to lower the expected number of false alarms can have a minimal effect on the speed of detection of process changes, and in many applications this tradeoff will pay substantial dividends. We encourage practitioners to use engineering knowledge with the three-region approach to design their process monitoring methods. We encourage researchers to put greater focus on methods that can be desensitized, if necessary, in practical applications. The current approach to evaluating and comparing methods leads to methods with ever greater sensitivity in detecting very small shifts in a process. This sensitivity can become a liability. The goal is to have control chart signals associated with practical significance, not just statistical significance. Many of the ideas presented in our paper have been suggested in some form before, but they are now of increasing urgency as more and more variables are being monitored, often at high frequencies.
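As a concrete illustration, the stability index discussed in the previous section can be sketched in code. The construction below, the overall standard deviation divided by a moving-range-based short-term estimate, is a plausible reading on our part; Jensen et al. (2019) give the precise definition.

```python
from statistics import mean, stdev

D2 = 1.128  # unbiasing constant for moving ranges of two observations

def stability_index(x):
    """Ratio of long-term to short-term variation (a sketch).

    Values near 1 suggest a stable process; larger values suggest
    greater process instability.
    """
    long_term = stdev(x)  # overall (long-term) standard deviation
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    short_term = mean(moving_ranges) / D2  # short-term estimate
    return long_term / short_term
```

A process whose mean drifts over time yields a large index even though consecutive observations differ little, which is exactly the kind of instability such a plot is intended to flag.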
There are many research questions resulting from the use of a three-region approach, since it can be applied in many types of applications and with many types of charts. In particular, it remains to be seen how multivariate methods could be desensitized when using a three-region approach based on values of the non-centrality parameter.

Acknowledgements

We appreciate the helpful comments of Jennifer Van Mullekom of Virginia Tech.

References

Adams, B. M., C. A. Lowry, and W. H. Woodall. 1992. The use (and misuse) of false alarm probabilities in control chart design. In Frontiers in Statistical Quality Control 4, eds. H.-J. Lenz, G. B. Wetherill, and P.-Th. Wilrich, Physica-Verlag, Heidelberg, pp. 155-168.

Bissell, A. F. 1990. Control charts and cusums for high precision processes. Total Quality Management 1(2):221-228.

Blackstone, E. H. 2004. Monitoring surgical performance. Journal of Thoracic and Cardiovascular Surgery 128:807-825.

Blume, J. D., R. A. Greevy, V. F. Welty, J. R. Smith, and W. D. Dupont. 2018. An introduction to second-generation p-values. To appear in The American Statistician.

Box, G., S. Bisgaard, S. Graves, M. Kulahci, K. Marko, J. James, J. Van Gilder, T. Ting, H. Zatorski, and C. Wu. 2003-2004. Performance evaluation of dynamic monitoring systems: the waterfall chart. Quality Engineering 16(2):183-191.

Box, G., S. Graves, S. Bisgaard, J. Van Gilder, K. Marko, J. James, M. Seifer, M. Poublon, and F. Fodale. 2000. Detecting malfunctions in dynamic systems. In Transactions of the Society of Automotive Engineers: Electronic Engine Controls 2000, Modeling, Neural Networks, OBD, and Sensors (SP-1501).

Champ, C. W. and W. H. Woodall. 1987. Exact results for Shewhart control charts with supplementary runs rules. Technometrics 29:393-399.

Crichton, J. and F. W. Faltin. 2018. Making CPV a proactive component of process and product improvement. Pharmaceutical Engineering 38(4):49-55.

Cvach, M. 2012.
Monitor alarm fatigue: an integrative review. Biomedical Instrumentation & Technology 46(4):268-277.

Deming, W. E. 1986. Out of the Crisis. Massachusetts Institute of Technology, Center for Advanced Engineering Study, Cambridge, MA.

Doganaksoy, N., G. J. Hahn, and W. Q. Meeker. 2017. Fallacies of statistical significance. Quality Progress, November, 56-62.

Driscoll, A. and W. H. Woodall. 2019. Use of the conditional false alarm rate metric in statistical process monitoring. Paper to be presented at the XIIIth International Workshop on Intelligent Statistical Quality Control, Hong Kong, August 13-15, 2019.

Duncan, A. J. 1986. Quality Control and Industrial Statistics, Fifth Edition. Irwin, Homewood, IL.

Ewan, W. D. and K. W. Kemp. 1960. Sampling inspection of continuous processes with no autocorrelation between successive results. Biometrika 47(3 and 4):363-380.

Faltin, F. W. 1986. Run length properties of multi-chart control systems. Invited address at the Second National Symposium on Statistics in Automated Manufacturing, Arizona State University, Tempe, AZ.

Faltin, F. W. and W. T. Tucker. 1991. On-line quality control for the factory of the ’90s and beyond. Chapter in Statistical Process Control in Manufacturing, eds. J. B. Keats and D. C. Montgomery, Marcel Dekker, New York.

Fraker, S. E., W. H. Woodall, and S. Mousavi. 2008. Performance metrics for surveillance schemes. Quality Engineering 20:451-464.

Freund, R. A. 1957. Acceptance control charts. Industrial Quality Control 14(4):13-23.

Freund, R. A. 1960. A reconsideration of the variables control chart with special reference to the chemical industries. Industrial Quality Control 16(11):35-41. (May)

Freund, R. A. 1962. Graphical process control. Industrial Quality Control 18(7):1-8.

Frisén, M. 1992. Evaluations of methods for statistical surveillance. Statistics in Medicine 11:1489-1502.

Gandy, A. and J. T. Kvaløy. 2013. Guaranteed conditional performance of control charts via bootstrap methods.
Scandinavian Journal of Statistics 40:647-668.

Görges, M., B. A. Markewitz, and D. R. Westenskow. 2009. Improving alarm performance in the medical intensive care unit using delays and clinical context. Technology, Computing, and Simulation 108(5):1546-1555.

Hawkins, D. M. 2019. ftp://ftp.stat.umn.edu/pub/cusum (accessed 1/30/19).

Hawkins, D. M. and D. H. Olwell. 1998. Cumulative Sum Charts and Charting for Quality Improvement. Springer, New York, NY.

Hill, D. 1956. Modified control limits. Applied Statistics 5(1):12-19.

Imhoff, M. and S. Kuhls. 2006. Alarm algorithms in critical care monitoring. Anesthesia & Analgesia 102:1525-1537.

International Standards Organization. 2012. ISO 7870-3, Control Charts — Part 3: Acceptance Control Charts.

Jensen, W., J. Szarka, and K. White. 2019. Stability assessment with the stability index. To appear in Quality Engineering.

Jones-Farmer, L. A., W. H. Woodall, S. H. Steiner, and C. W. Champ. 2014. An overview of phase I analysis for process improvement and monitoring. Journal of Quality Technology 46(3):265-280.

Kenett, R. S. and M. Pollak. 2012. On assessing the performance of sequential procedures for detecting a change. Quality and Reliability Engineering International 28:500-507.

Knoth, S. 2018. R software package ‘spc’: Statistical process control – calculation of ARL and other control chart performance measures. https://www.rdocumentation.org/packages/spc/versions/0.6.0.

Lakens, D., A. M. Scheel, and P. M. Isager. 2018. Equivalence testing for psychological research: a tutorial. Advances in Methods and Practices in Psychological Science 1(2):259-269.

Loureiro, L. D., E. K. Epprecht, S. Chakraborti, and F. S. Jardim. 2018. In-control performance of the joint phase II X-bar-S control charts when parameters are estimated. Quality Engineering 30(2):253-267.

Margavio, T. M., M. D. Conerly, W. H. Woodall, and L. G. Drake. 1995. Alarm rates for quality control charts. Statistics and Probability Letters 24(3):219-224.
Montgomery, D. C. 2013. Introduction to Statistical Process Control, Seventh Edition. John Wiley & Sons, Inc., Hoboken, NJ.

Page, E. S. 1954. Continuous inspection schemes. Biometrika 41:100-115.

Page, E. S. 1961. Cumulative sum charts. Technometrics 3(1):1-9.

Saleh, N. A., M. A. Mahmoud, M. J. Keefe, and W. H. Woodall. 2015. The difficulty in designing X-bar and X-control charts with estimated parameters. Journal of Quality Technology 47(2):127-138.

Sall, J. 2018. Scaling-up process characterization. Quality Engineering 30(1):62-78.

Shen, X., F. Tsung, C. Zou, and W. Jiang. 2013. Monitoring Poisson count data with probability control limits when sample sizes are time-varying. Naval Research Logistics 60(8):625-636.

Shewhart, W. A. 1931. Economic Control of Quality of Manufactured Product. D. Van Nostrand, New York, NY. (Republished in 1980 by the American Society for Quality Control, Milwaukee, WI.)

Shewhart, W. A. 1939. Statistical Method from the Viewpoint of Quality Control. Graduate School of the Department of Agriculture, Washington, D.C. (Republished in 1986 by Dover Publications, Inc., Mineola, NY.)

Shmueli, G. and H. Burkom. 2010. Statistical challenges facing early outbreak detection in biosurveillance. Technometrics 52(1):39-51.

Snee, R. and R. W. Hoerl. 2018. Action that matters. Quality Progress, May, 56-60.

Surrey Fire and Rescue. 2019. https://www.surreycc.gov.uk/people-and-community/fire-and-rescue/fire-safety-for-businesses-and-organisations/Frequently-asked-questions-regarding-fire-sprinkers#a2 (accessed 1/21/19).

Wald, A. 1947. Sequential Analysis. John Wiley & Sons, New York.

Wasserstein, R. L. and N. A. Lazar. 2016. Editorial: The ASA's statement on p-values: context, process, and purpose. The American Statistician 70(2):129-133.

Wheeler, D. J. and D. S. Chambers. 1992. Understanding Statistical Process Control, 2nd edition. SPC Press, Knoxville, TN.

Woodall, W. H. 1985. The statistical design of quality control charts.
The Statistician 34(2):155-160.

Woodall, W. H. 1986. The design of CUSUM quality control charts. Journal of Quality Technology 18(2):99-102.

Woodall, W. H. 2000. Controversies and contradictions in statistical process control (with discussion). Journal of Quality Technology 32(4):341-378.

Woodall, W. H. and B. M. Adams. 1993. The statistical design of CUSUM charts. Quality Engineering 5(4):559-570.

Woodall, W. H. and F. W. Faltin. 1996. An overview and perspective on control charting. Chapter 2 of Statistical Applications in Process Control, eds. J. B. Keats and D. C. Montgomery, Marcel Dekker, New York, 7-20.

Yashchin, E. 1985. On the analysis and design of CUSUM-Shewhart control schemes. IBM Journal of Research and Development 29(4):377-391.

Yashchin, E. 1993. Statistical control schemes: methods, applications and generalizations. International Statistical Review 61(1):41-66.

Yashchin, E. 2018. Statistical monitoring of multi-stage processes. To appear in Frontiers in Statistical Quality Control 12, eds. S. Knoth and W. Schmid, Springer-Verlag, Berlin.

Zhang, X. and W. H. Woodall. 2015. Dynamic probability control limits for risk-adjusted Bernoulli CUSUM charts. Statistics in Medicine 34:3336-3348.

Figure 1: One-Sided Regions of Process Performance (Box et al., 2003-2004)

Table 1: ARL Comparison of Shewhart Chart to Modified CUSUM Chart, Reproduced from Woodall (1985)

Figure 2: An I-Chart shown without (left) and with (right) plotted specification limits. (Reproduced from Crichton and Faltin, 2018)

Figure 3: Stability and Capability Plot (Reproduced from Jensen et al., 2019)

William H. Woodall is a Professor of Statistics at Virginia Tech. He is a former editor of the Journal of Quality Technology (2001–2003) and associate editor of Technometrics (1987–1995). He has published 150 papers, most on various aspects of process monitoring.
He is the recipient of the ASQ Shewhart Medal (2002), ENBIS Box Medal (2012), Jack Youden Prize (1995, 2003), ASQ Brumbaugh Award (2000, 2006), Ellis Ott Foundation Award (1987), Soren Bisgaard Award (2012), Lloyd S. Nelson Award (2014), and a best paper award for IIE Transactions on Quality and Reliability Engineering (1997). He is a Fellow of the American Statistical Association, a Fellow of the American Society for Quality, and an elected member of the International Statistical Institute.

Frederick W. Faltin is Associate Professor of Practice in the Department of Statistics, and Director of Corporate Partnerships for the College of Science, at Virginia Tech. He is also Cofounder and Managing Director of The Faltin Group, a consulting and training firm providing services in statistics, Six Sigma, economics, and operations research to companies throughout the Americas, Europe, and Asia. Previously, he founded and managed the Strategic Enterprise Technologies laboratory at GE Global Research. Faltin has published books, research papers, and articles on statistics and related topics, including the Encyclopedia of Statistics in Quality and Reliability (2007), Statistical Methods in Healthcare (2012), and Analytic Methods in Systems and Software Testing (2018). He is a Fellow of the American Statistical Association and a recipient of the American Society for Quality’s Shewell Prize.