

Author’s Response

Marianne Frisén

Statistical Research Unit, Department of Economics, University of Gothenburg, Gothenburg, Sweden

Abstract: The discussants made very interesting contributions to deep theory as well as to interesting applications. The many additional references broadened the subject. Evaluation measures drew much attention, and different views were expressed. The discussants’ contributions on multivariate surveillance, robustness, and other specific issues are also commented upon.

Keywords: Multivariate surveillance; Optimality; Passive surveillance; Robustness.

Subject Classifications: 62C10; 62C20; 62L15; 62P05; 62P10; 62P12; 62P20.

1. INTRODUCTION

First of all, I would like to thank the Editor, Professor Mukhopadhyay, for inviting the paper “Optimal sequential surveillance for finance, public health and other areas” and for organizing the discussion. I am very grateful to Professors Andersson, Bodnar, Efromovich, Khan, Knoth, Lai, Reynolds, Schmid, Tartakovsky, Woodall, Xing, and Zacks for their insightful contributions to the discussion of the paper. They have connected deep theory and interesting applications in a valuable way.

A number of interesting additional applications were described. Professor Efromovich noted the paramount need for surveillance in actuarial science. Professor Knoth described his experience of SPC with a focus on industrial applications, his views on what is actually being done, and practical problems. He contributed an interesting discussion of the long historical process which has hampered new developments from coming into practice. He saw a great advantage in the new applications which have recently come into focus. Professors Reynolds and Woodall described similar experiences of traditional SPC. Professor Woodall also gave many references on health-related monitoring. Professors Lai and Xing described applications in multichannel problems and finance and recommended that the financial sector learn from the public health sector about surveillance. The development of theory is greatly influenced by new applications. Professors Bodnar and Schmid analyzed the difficulties of spreading sequential methods but hoped for the positive influence of good applications.

Many valuable references on theoretical issues were added by Professors Lai, Reynolds, Tartakovsky, Woodall, Xing, and others. Unfortunately, my reference to Moustakides (1986) in Section 3.1.4 of the paper was missing from the reference list. I apologize for the error and thank Professor Khan for noticing it.

I use the notation of my paper in this response to the discussion. Some discussants express their arguments in other notations, but I hope that I have done justice to their arguments in spite of the different notations.

Address correspondence to Marianne Frisén, Statistical Research Unit, Department of Economics, University of Gothenburg, P. O. Box 640, SE-40530, Gothenburg, Sweden; Fax: 46-31-7861274; E-mail: marianne.frisen@statistics.gu.se


2. EVALUATION MEASURES

2.1. False Alarms

Professor Andersson discussed the difference between hypothesis testing and surveillance. She gave examples of the disadvantages of recent attempts to make surveillance similar to hypothesis testing by a fixed, limited probability of any false alarm. Professor Woodall agreed that the FDR might not be as useful in surveillance as it is in hypothesis testing.

Professor Knoth discussed the relation between the probability of false alarm, PFA, and the average run length to a false alarm, ARL0, and gave graphs of the relation. These graphs complement those by Frisén and Sonesson (2006). The relation between these two measures is different for different methods. Thus, the choice of measure matters when methods are compared.
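For reference, with the alarm time denoted by $t_A$ and the change time by $\tau$ as in the paper, the two measures being compared are

```latex
\mathrm{ARL}^{0} = \mathrm{E}\left[\, t_A \mid \tau = \infty \,\right],
\qquad
\mathrm{PFA} = \mathrm{P}\left( t_A < \tau \right),
```

so the ARL0 conditions on no change ever occurring, while the PFA weights the alarm time against the distribution of τ. This difference explains why the relation between them varies between methods.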

Professors Lai and Xing suggested a steady state false alarm probability per unit time. This is suitable for passive surveillance. It also avoids the ARL0, which was criticized by Mei (2008) on the basis of its performance for a case of a composite target process. The median run length has the advantage over the ARL0 of being less sensitive to the skewness of the run length distribution and of being considerably quicker to determine by simulation.
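The last point can be illustrated by a minimal simulation sketch. It assumes, purely for illustration, a two-sided Shewhart chart with a 3-sigma limit applied to independent N(0,1) in-control data; the run length of such a memoryless chart is geometric and hence strongly right-skewed, so the median lies well below the mean and is less influenced by the long upper tail.

```python
import numpy as np

rng = np.random.default_rng(1)

def false_alarm_time(limit, rng):
    # Time of the first (false) alarm when the process stays in control.
    t = 0
    while True:
        t += 1
        if abs(rng.standard_normal()) > limit:
            return t

limit = 3.0    # conventional 3-sigma Shewhart limit (illustrative choice)
runs = np.array([false_alarm_time(limit, rng) for _ in range(2000)])

arl0 = runs.mean()      # average run length to a false alarm
mrl0 = np.median(runs)  # median run length to a false alarm

# The skewed geometric run length distribution pulls the mean far above
# the median; the median estimate also stabilizes with fewer replicates.
print(arl0, mrl0)
```

For a 3-sigma limit the theoretical ARL0 is about 370 while the median run length is only about 256, and the simulated values reproduce this gap.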

2.2. Delay as a Function of the Time of Change

The importance of considering different values of the time, τ, of the change was discussed by Professor Andersson and others. Professor Tartakovsky illustrated the situation with curves of the conditional average delay, CED(τ). He stated that a uniformly best CED would be desirable, but that no such method exists for a fixed ARL0. Several measures are needed to give the full information. Professor Tartakovsky compared three common methods with respect to three delay measures. He concluded that it would not be fair to use ARL1 for the comparison of methods with different CED shapes. This agrees with my view that several measures, for example the CED for all relevant values of τ, should be reported to give full information.
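In the notation of the paper, the curve reported for each method is

```latex
\mathrm{CED}(\tau) = \mathrm{E}\left[\, t_A - \tau \mid t_A \geq \tau \,\right],
```

evaluated as a function of the change time τ.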

Sometimes the CED curve is flat and the influence of τ is not so strong. Examples were given by Professor Zacks. This also applies to the Shewhart method, which does not accumulate information and thus has a constant CED function. Procedures which are randomized to remove the influence of τ also have a constant CED. In these cases, CED(τ) is needed for only one value of τ.
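The constancy of the Shewhart CED can be checked numerically. The sketch below is illustrative only: the 2-sigma limit, the unit mean shift, and the two change times are assumptions chosen for the example, not values from the paper. Because the Shewhart statistic uses only the latest observation, the estimated conditional delay should be essentially the same for an early and a late change.

```python
import numpy as np

rng = np.random.default_rng(2)

def shewhart_delay(tau, limit, shift, rng, max_t=10_000):
    # Returns t_A - tau if t_A >= tau, else None (alarm before the change).
    for t in range(1, max_t + 1):
        mean = shift if t >= tau else 0.0   # the shift sets in at time tau
        if abs(mean + rng.standard_normal()) > limit:
            return t - tau if t >= tau else None
    return None

def estimate_ced(tau, n=10_000, limit=2.0, shift=1.0):
    delays = [shewhart_delay(tau, limit, shift, rng) for _ in range(n)]
    delays = [d for d in delays if d is not None]   # condition on t_A >= tau
    return float(np.mean(delays))

ced_early = estimate_ced(tau=1)
ced_late = estimate_ced(tau=10)

# A memoryless chart "forgets" everything before the change, so the
# conditional expected delay does not depend on when the change occurred.
print(ced_early, ced_late)
```

A method that accumulates information, such as CUSUM, would instead show a CED that varies with τ.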

One summarizing criterion is useful for making a rough comparison and for stating formal optimality. A number of such criteria are discussed below.

2.3. ARL1

ARL1 (= CED(1) + 1) is widely used and may work well for giving a rough impression.

Professor Woodall gave positive comments on ARL1. Professor Zacks criticized the ARL since the skewness of the stopping time distribution depends on τ. Professors Lai and Xing pointed out that while the ARL may be a reasonable performance measure for simple problems, it is conceptually unsatisfactory for more complex ones.

Professor Knoth reported results for EWMA. He remarked that the misleading results for one-sided EWMA do not appear for the two-sided version. However, the fact that ARL1 sometimes gives reasonable results does not mean that it is acceptable as a formal optimality criterion. It should not give misleading results for any method or situation. The reason for the contradictory results when ARL1 is used is the violation of a commonly accepted inference principle, as stated by Frisén (2003).

2.4. Steady State Delay

Professor Knoth demonstrated by an example that the obviously misleading results given by ARL1 do not appear when using the SADT. The CED for large values of τ is useful when only large values of τ are of interest. These applications are not the same as those where ARL1 is relevant. Recently, methods which were earlier compared by ARL1 have been recompared by SADT (see, for example, Sego et al. (2008)). The conclusion on which method is best is often reversed, as some methods work well for detecting early changes and others work well for detecting late ones. Instead of choosing between SADT and ARL1, it may be better to give both, or rather the whole CED curve, so that users can choose the relevant measure for the specific application.

Professor Andersson pointed out the importance of specifying the relations between the change times when using asymptotic measures in multivariate surveillance. An asymptotic criterion for passive surveillance was given by Professor Tartakovsky.

2.5. Minimax

One summarizing value of the CED curve is its least favorable value. Professor Tartakovsky compared three methods by the minimax variant of Pollak (1985). In contrast to the more common variant of Lorden (1971), it does not require the least favorable history.

2.6. Expected Delay

If there is information on the most probable or interesting values of τ, then a weighted average of the expected delay for different values of τ will be of great interest. The expected delay, ED, criterion of Section 3.1.3 of the paper is a possibility here. The ED criterion results in important conclusions about the general structure of the surveillance statistic for a specific problem and is thus useful as a formal optimality criterion. Professors Lai and Xing gave several arguments for the average expected delay as a delay measure and recommended that it be combined with a steady state false alarm measure.

Professor Tartakovsky also discussed evaluation measures which involve the distribution of τ. Professor Woodall contrasted such measures with his own preference for ARL1, SADT, and minimax. He writes: “Professor Frisén prefers some of the metrics that require one to specify the distribution of the time at which the process change occurs.” I would like to modify this. My intent is to let the application determine which metrics are appropriate. In most cases I report the CED curve. It should be noted that the CED does not require a known distribution of τ. If a formal optimality criterion were needed, I would not use the ARL criterion but rather a minimax or average criterion, depending on the situation.

2.7. Balance Between False and Motivated Alarms

Professor Tartakovsky stressed the important tradeoff between the delay and the frequency of false alarms. Sometimes it is convenient to concentrate on the false alarm properties first and then determine the delay. In hypothesis testing the false alarm is most important, which makes this order natural. In surveillance, however, the two kinds of errors are more on an equal footing. Professors Lai and Xing as well as Professor Tartakovsky discussed how different false alarm and delay measures should be combined into an optimality criterion. Professors Bodnar and Schmid gave a good discussion of the dilemma of determining the alarm limit in a way which is suitable for the application. They mentioned that economists are not satisfied with basing the choice of limit only on the in-control behavior but also want to involve the out-of-control behavior.

One way to handle the balance is to specify a utility function with explicit values for the disadvantages of a false alarm and of a delay. This can be relevant for some financial problems, as discussed by Bock et al. (2008). The relation between the ED criterion and the utility function of Shiryaev (1963) is important.
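In one standard formulation (stated here in the notation of the paper; the constant $c > 0$, the relative cost of delay, is an assumption of the formulation rather than a quantity from the discussion), Shiryaev’s utility-based criterion minimizes the Bayes risk

```latex
\mathrm{P}\left( t_A < \tau \right) \;+\; c \, \mathrm{E}\left[\, (t_A - \tau)^{+} \,\right],
```

where the expectation is taken also over the distribution of τ, which connects the criterion directly to the ED criterion discussed above.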

As described in Section 2.2.4 of the paper, the predicted value of an alarm is one way to measure the balance between false alarms and delay. An objection might be that the incidence of changes is not known. However, the incidence is important. The predicted value for several different probable values of the incidence may be reported, as by Frisén and Andersson (2009). A discussion about the relevant incidence can reveal important characteristics of the application and thus be useful. Professor Andersson related the PV in surveillance to that of diagnostic testing in medicine, where it is acknowledged that the frequency of a disease is important for the interpretation of the test result.

Professor Knoth gave curves for the predicted value, PV, for several methods. It was concluded that for the full likelihood ratio and Shiryaev-Roberts methods, the PV does not depend much on the time of the alarm. This is in agreement with the results of Frisén and Wessman (1999). A constant value facilitates the interpretation of an alarm. In contrast, the CUSUM method, and still more the Shewhart method or the EWMA with a high value of λ, can have a very low predicted value for early alarms.
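In the same notation, the predicted value of an alarm at a given time $t$ is the probability that the alarm is motivated by a change,

```latex
\mathrm{PV}(t) = \mathrm{P}\left( \tau \leq t \mid t_A = t \right),
```

so a method with a nearly constant PV(t) allows the same interpretation of an alarm whenever it occurs.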

2.8. Special Evaluation Measures for Some Application Areas

It is important that the evaluation measures are relevant for the application. Professors Bodnar and Schmid pointed out this need in finance, and Bock et al. (2008) related measures in finance to those in surveillance. Professors Lai and Xing pointed out the need for further evaluation measures for spatial problems. Expressing some spatial problems in a framework closer to that of general surveillance, as in Sonesson (2007), makes the evaluations more clear-cut. The close relation to multivariate surveillance, as discussed in Section 5.4.4 of the paper, may also be useful.

3. MULTIVARIATE SURVEILLANCE

Professor Reynolds reported on the common practice of parallel surveillance in multiparameter SPC as well as the less common practice of reduction to one univariate statistic and also discussed other approaches.

In the paper, I exemplified multivariate surveillance by the detection of a change in a multiparameter distribution, such as a normal distribution with both parameters prone to change. Professor Khan objected to this. However, the same problems can arise in this situation as in the situation where the change in a multiparameter distribution originates from the observation of different variables. Thus, the same techniques can be of interest. Professor Khan gave additional references to multivariate CUSUM methods.

Professors Lai and Xing suggested the GLR, which is based on the joint likelihood for multivariate problems. For univariate surveillance there are optimality theorems based on the likelihood. However, Professor Andersson pointed out that these theorems are not valid for multivariate surveillance, where the distributions of different variables may change at different times.

The likelihood ratio of the joint distributions reveals some interesting facts. The relation between the change points is important, as described by Andersson (2008). It makes a great difference whether the parameters change simultaneously or not. If all parameters change at the same time (if they do change), then the comment in Section 5.4.3 of the paper is relevant. In that case, there exists a sufficient reduction to a univariate surveillance problem. However, the case where the parameters may change at different times is different, as demonstrated in Frisén (2009). As an example, one may consider a change in the volatility of a stock that precedes a change in the mean. If the lag is known, then there exists a sufficient reduction to a univariate statistic. The possibility of a sufficient reduction was pointed out by Professor Andersson and also demonstrated by Frisén et al. (2009d).

In the case of simultaneous changes, it is possible to use ordinary evaluation measures. If the changes do not appear simultaneously, then the univariate evaluation measures must be generalized. In such cases, measures like those presented in Section 5.4.2 of the paper and in Frisén et al. (2009b) are useful.

Professor Andersson pointed out that the relation between the change times is important also for asymptotic measures; this was demonstrated by Frisén et al. (2009b).

Professors Andersson, Lai, and Xing gave references on the isolation problem of identifying which of the variables caused the alarm. Professor Reynolds described the common practice in industry of letting the diagnostic aim hamper the detection procedures and suggested decoupling detection and diagnostics. Professors Lai and Xing gave references to methods for optimal joint detection and isolation.

4. ROBUST SURVEILLANCE

Professor Efromovich mentioned several connections between non-parametric curve estimation and surveillance. New techniques in one area may be of use in the other. A close connection was pointed out between non-parametric estimation with unknown distributions and surveillance problems with unknown parameters. The detection of a change in the probability distribution of the residuals of a regression is one example. In surveillance, the detection of a change in a distribution is often accomplished via likelihood ratios. The semiparametric estimation of curves of special interest in the surveillance of disease outbreaks is treated by Frisén et al. (2009a).

Professor Reynolds described interest in a wide range of shifts. Methods constructed for unknown shift sizes are mentioned in Section 5.3.1 of the paper. The GLR approach was recommended by Professors Lai and Xing.

Professors Bodnar and Schmid discussed the problems in finance with determining the in-control process and choosing the target value of a parameter of the distribution. A large amount of historical data is required to estimate the in-control value. Robust methods, which use only the important characteristic and avoid unnecessary assumptions, are of interest here. The GLR technique is very useful in these situations. It was used for the surveillance of the incidence of a disease, where a solution was to reformulate the problem as the detection of an increase. This was described in Section 5.3.2 of the paper, in Frisén and Andersson (2009), and in Frisén et al. (2009c). Professor Andersson gave a thorough discussion of the use of surveillance which concentrates on monotonicity changes and avoids the uncertain estimation of baselines for business cycles and for outbreaks of epidemic diseases.


5. SPECIAL ISSUES

5.1. Sampling Design

Professor Reynolds described well the important problem of optimizing the time interval between observations. For methods which efficiently accumulate the information (in contrast to the Shewhart method), frequent sampling has the advantages of giving more information about the process and minimizing the unavoidable delay from the last sampled unit. The drawbacks are the possible cost increase and the impact of seasonal effects, such as weekdays, which can be avoided by less frequent sampling.

5.2. Passive Surveillance

Most theory on surveillance concerns active surveillance or, if the surveillance is passive, the first alarm. The differences between error rates for active and passive surveillance problems were described by Frisén and de Maré (1991). Professor Tartakovsky advocated an asymptotic delay measure which is suitable for repeated surveillance and for cases when it is important to quickly detect a real change even at the expense of raising many false alarms. Professors Bodnar and Schmid discussed repeated surveillance in finance and the problem of determining new control limits after an alarm.

5.3. Motives for a Fixed Conditional False Alarm Probability

In Section 2.2.1 of the paper I suggested that a fixed conditional false alarm probability, and an interpretation as a repeated significance test with a fixed significance level, might be the motives for using the “exact” EWMAe. Professor Woodall questioned this, but I have not seen any other argument for the common recommendation of EWMAe. Nevertheless, we agree that the unconditional probability of a false alarm increases with time (as stated in Section 2.2.1 of the paper). It was demonstrated by Sonesson (2003) that EWMAe has drawbacks compared with the approximate version EWMAa. Thus, the efforts to make the conditional alarm probabilities exactly constant do not result in an efficient method.

5.4. Time Series

Professors Bodnar, Lai, Schmid, Tartakovsky, and Xing discussed results for data which are not independent in time. An additional reference is the recent proof by Han and Tsung (2009) of finite minimax optimality of the CUSUM method for an important class of models.

6. CONCLUDING REMARKS

Like Professors Bodnar and Schmid, I have high hopes that closer collaboration with practitioners will lead to procedures which are more suitable for surveillance applications. I believe that this will lead to new theoretical challenges and that statistical surveillance methods will become more widely used. Society’s need for surveillance is huge. The discussion contributions provided important steps towards satisfying this need.

ACKNOWLEDGEMENT

This work was partially supported by the Swedish Emergency Management Agency.


7. REFERENCES

Andersson, E. (2008). Effect of Dependency in Systems for Multivariate Surveillance, Communications in Statistics. Simulation and Computation 38: 454-472.

Bock, D., Andersson, E., and Frisén, M. (2008). The Relation between Statistical Surveillance and Technical Analysis in Finance, in Financial Surveillance, M. Frisén, ed., pp. 69-92, Chichester: Wiley.

Frisén, M. (2003). Statistical Surveillance. Optimality and Methods, International Statistical Review 71: 403-434.

Frisén, M. (2009). Principles for Multivariate Surveillance, to appear in Frontiers in Statistical Quality Control, H.-J. Lenz and P.-T. Wilrich, eds.

Frisén, M. and Andersson, E. (2009). Semiparametric Surveillance of Outbreaks, Sequential Analysis, in press.

Frisén, M., Andersson, E., and Pettersson, K. (2009a). Semiparametric Estimation of Outbreak Regression, Statistics, in press.

Frisén, M., Andersson, E., and Schiöler, L. (2009b). Evaluation of Multivariate Surveillance, Report 2009:1, Statistical Research Unit, Department of Economics, University of Gothenburg, Sweden.

Frisén, M., Andersson, E., and Schiöler, L. (2009c). Robust Outbreak Surveillance of Epidemics in Sweden, Statistics in Medicine 28: 476-493.

Frisén, M., Andersson, E., and Schiöler, L. (2009d). Sufficient Reduction in Multivariate Surveillance, Report 2009:2, Statistical Research Unit, Department of Economics, University of Gothenburg, Sweden.

Frisén, M. and de Maré, J. (1991). Optimal Surveillance, Biometrika 78: 271-280.

Frisén, M. and Sonesson, C. (2006). Optimal Surveillance Based on Exponentially Weighted Moving Averages, Sequential Analysis 25: 379-403.

Frisén, M. and Wessman, P. (1999). Evaluations of Likelihood Ratio Methods for Surveillance. Differences and Robustness, Communications in Statistics. Simulation and Computation 28: 597-622.

Han, D. and Tsung, F. (2009). The Optimal Stopping Time for Detecting Changes in Discrete Time Markov Processes, Sequential Analysis 28: 115-135.

Lorden, G. (1971). Procedures for Reacting to a Change in Distribution, Annals of Mathematical Statistics 42: 1897-1908.

Mei, Y. (2008). Is Average Run Length to False Alarm Always an Informative Criterion?, Sequential Analysis 27: 354-376.

Moustakides, G. V. (1986). Optimal Stopping Times for Detecting Changes in Distributions, Annals of Statistics 14: 1379-1387.

Pollak, M. (1985). Optimal Detection of a Change in Distribution, Annals of Statistics 13: 206-227.

Sego, L. H., Woodall, W. H., and Reynolds, M. R., Jr. (2008). A Comparison of Surveillance Methods for Small Incidence Rates, Statistics in Medicine 27: 1225-1247.

Shiryaev, A. N. (1963). On Optimum Methods in Quickest Detection Problems, Theory of Probability and its Applications 8: 22-46.

Sonesson, C. (2003). Evaluations of Some Exponentially Weighted Moving Average Methods, Journal of Applied Statistics 30: 1115-1133.

Sonesson, C. (2007). A CUSUM Framework for Detection of Space-Time Disease Clusters Using Scan Statistics, Statistics in Medicine 26: 4770-4789.
