Semiparametric estimation of the extremal index using block maxima

Paul J. Northrop
Department of Statistical Science
University College London
Gower Street
London WC1E 6BT, UK

November 16, 2005

Research Report No. 259, Department of Statistical Science, University College London. Date: November 2005.

Abstract

The extremal index $\theta$, a measure of the degree of local dependence in the extremes of a stationary process, plays an important role in extreme value analyses. We propose a class of rank estimators of $\theta$ based on viewing a characterisation of $\theta$ as defining a semiparametric model. These estimators are easy to compute and have good robustness properties. We study the properties of these estimators using simulation. An application to temperature data is presented.

Keywords: Extremal index; semiparametric estimation; rank estimator; extreme values.

1 Introduction

Let $\{Z_i, i = 1, 2, \ldots\}$ be a stationary sequence of random variables with marginal distribution function $H$. The presence of short-range dependence in the $\{Z_i\}$ series influences the extremal behaviour of the process. For processes satisfying the $D(u_n)$ condition of Leadbetter et al. (1983), a weak mixing condition that limits the degree of long-range dependence at extreme levels,

$$ P\{\max(Z_1, \ldots, Z_n) \le u_n\} \approx H^{n\theta}(u_n) \tag{1} $$

for large $n$ and $u_n$, where $\theta$ ($0 \le \theta \le 1$) is the extremal index of the process. The extremal index is the primary measure of extremal dependence in such processes, with $\theta = 1$ indicating independence at asymptotically high levels. Estimation of $\theta$ is of fundamental importance when making inferences about the extremal properties of dependent sequences.

Let $\{Z_i^*, i = 1, 2, \ldots\}$ denote a sequence of independent random variables with the same marginal distribution function as $\{Z_i\}$. Equation (1) suggests that, for large $n$ and $u_n$, the distribution functions $G$ of $\max(Z_1, \ldots, Z_n)$ and $F$ of $\max(Z_1^*, \ldots, Z_n^*)$ are related approximately by

$$ G(u_n) = F(u_n)^{\theta}. \tag{2} $$

The classical theory of extreme values for i.i.d. sequences (Galambos, 1987; Leadbetter et al., 1983) gives conditions for the existence of normalising sequences $a_n > 0$ and $b_n$ such that

$$ P\{\max(Z_1^*, \ldots, Z_n^*) \le a_n x + b_n\} = H^n(a_n x + b_n) \to F_{\mathrm{GEV}}(x) \quad \text{as } n \to \infty, $$

where $F_{\mathrm{GEV}}$ is a non-degenerate distribution function. If such a limit exists, $F_{\mathrm{GEV}}$ is the distribution function

$$ F_{\mathrm{GEV}}(x) = \exp\left\{ -\left[ 1 + \xi \left( \frac{x - \mu}{\sigma} \right) \right]^{-1/\xi} \right\}, \qquad 1 + \xi (x - \mu)/\sigma > 0, $$

of a generalised extreme value distribution, GEV$(\mu, \sigma, \xi)$. As (2) suggests, if the $D(u_n)$ condition is satisfied, the corresponding limiting distribution function for $\max(Z_1, \ldots, Z_n)$ is $F_{\mathrm{GEV}}^{\theta}(x)$, the distribution function of a GEV$(\mu_\theta, \sigma_\theta, \xi)$ distribution, where

$$ \mu_\theta = \mu - \sigma (1 - \theta^{\xi})/\xi \quad \text{and} \quad \sigma_\theta = \sigma \theta^{\xi}. \tag{3} $$

Another characterisation of the extremal index is that $1/\theta$ is the limiting mean cluster size in the point process of exceedance times of asymptotically high thresholds (Leadbetter, 1983). This motivates methods of estimating $\theta$ based on the observed exceedance times of a suitably chosen high threshold $u$. The resulting estimators can be sensitive to the value of $u$. The blocks and runs estimators (Smith and Weissman, 1994) also require the specification of an extra tuning parameter. Recently Ferro and Segers (2003) have developed an estimator of $\theta$ based on a penultimate approximation to the asymptotic distribution of interexceedance times. Gomes (1993) and Ancona-Navarrete and Tawn (2000) have carried out comparisons of various estimators of $\theta$.

Subject to the conditions stated above, $\theta$ quantifies, via (2), the extent to which the maximum $X$ of $n$ independent observations $Z_1^*, \ldots, Z_n^*$ from $H$ is stochastically larger than the maximum $Y$ of $n$ dependent observations $Z_1, \ldots, Z_n$ from $H$.
This is illustrated in figure 1, which shows the empirical distribution functions of maxima of dependent observations and of maxima of approximately independent observations resampled from the dependent series using the regular resampling method detailed in section 2. The plot on the left is based on 100 maxima of sequences of length 90 simulated from a max-autoregressive process with an extremal index of 0.5 (see (5) in section 4). The plot on the right is based on 93 maxima of sequences of length 90 taken from the temperature series analysed in section 5.

In section 3.1 we propose a class of semiparametric estimators of $\theta$ based on the ranks of observations sampled from $F$ and $G$ and consider their robustness properties. In section 4 we carry out a simulation study to investigate the performance of the estimators, and in section 5 we present an application to temperature data from Wooster, Ohio.

[Figure 1: two panels; left, empirical distribution function against log(simulated data); right, empirical distribution function against temperature.]

Figure 1: Empirical distribution functions of the maxima of 90 dependent observations (solid line) and of 90 approximately independent observations (dotted line) from the same marginal distribution. Left: 100 maxima based on (the logarithm of) data simulated from a max-autoregressive process with extremal index 0.5. Right: 93 maxima based on the temperature series of section 5.

2 Block maxima

Suppose that we have a sample $Z_1, \ldots, Z_{mn}$ from a stationary sequence. We divide the sample into $m$ blocks of length $n$ and define $Y_i = \max\{Z_{(i-1)n+1}, \ldots, Z_{in}\}$, $i = 1, \ldots, m$. It follows from section 1 that, for sufficiently large $n$, $Y_1, \ldots, Y_m$ are approximately independent observations from a GEV distribution with distribution function $G = F^\theta$. To estimate $\theta$ based on the characterisation $G = F^\theta$ we require an i.i.d. sample with the same marginal distribution as $Z_1, \ldots, Z_{mn}$.
We achieve this by resampling from $Z_1, \ldots, Z_{mn}$ in such a manner that the resampled sequence of observations is approximately an i.i.d. sample. One possibility (Ancona-Navarrete and Tawn, 2000) is to permute the observations randomly. We call this random resampling. An alternative approach is regular resampling, in which $Z_1, \ldots, Z_{mn}$ are placed in the order $\{(Z_{(j-1)m+i},\ j = 1, \ldots, n),\ i = 1, \ldots, m\}$. In either case we create an approximate sample $X_1, \ldots, X_m$ from $F$ by taking the maxima over disjoint blocks of length $n$ of the resampled sequence. Under regular resampling the values used to produce each $X_i$ are exactly $m$ time points apart in the original series; under random resampling the separation of these values varies. For sufficiently large $m$ we expect the dependence within the resampled blocks to be stronger under random resampling than under regular resampling, and therefore estimators based on random resampling to exhibit some positive bias. However, we would expect dependence between the $X_i$ to be greater under regular resampling.

3 Estimation of the extremal index

Gomes (1993) and Ancona-Navarrete and Tawn (2000) have examined parametric approaches to estimating $\theta$ based on (2). Let $X = (X_1, \ldots, X_m)$ denote a sample of block maxima from $F$ and $Y = (Y_1, \ldots, Y_m)$ a sample of block maxima from $G$. Gomes (1993) fits GEV distributions separately to these samples and constructs an estimator of $\theta$ using the relationships in (3). Ancona-Navarrete and Tawn (2000) estimate $(\mu, \sigma, \xi, \theta)$ simultaneously by maximising a likelihood constructed under the assumption that $X$ and $Y$ are mutually independent. In both cases extra parameters are estimated even though interest focuses on $\theta$, and estimates of $\theta$ may be sensitive to outlying observations and to departures from the assumed form of $F$.
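The relationships in (3) can be inverted to express $\theta$ through the two sets of GEV parameters: since $\sigma_\theta/\sigma = \theta^\xi$, we have $\theta = (\sigma_\theta/\sigma)^{1/\xi}$ for $\xi \ne 0$, while in the Gumbel limit $\xi \to 0$, (3) reduces to $\sigma_\theta = \sigma$ and $\mu_\theta = \mu + \sigma \log\theta$. The sketch below, which assumes NumPy and uses hypothetical function names, checks numerically that $F_{\mathrm{GEV}}^\theta$ is the GEV$(\mu_\theta, \sigma_\theta, \xi)$ distribution function and recovers $\theta$ from the scale parameters. It illustrates the algebra only and is not necessarily the exact estimator constructed by Gomes (1993).

```python
import numpy as np


def gev_cdf(x, mu, sigma, xi):
    """Distribution function of GEV(mu, sigma, xi); xi == 0 gives the Gumbel case."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    if xi == 0.0:
        return np.exp(-np.exp(-z))
    t = 1.0 + xi * z
    # Below the lower end point (xi > 0) the cdf is 0; above the upper end point (xi < 0) it is 1.
    outside = 0.0 if xi > 0 else 1.0
    safe_t = np.where(t > 0, t, 1.0)          # avoid raising non-positive values to a power
    return np.where(t > 0, np.exp(-safe_t ** (-1.0 / xi)), outside)


def dependent_params(mu, sigma, xi, theta):
    """Parameters (mu_theta, sigma_theta) of G = F^theta from those of F, as in (3)."""
    if xi == 0.0:
        return mu + sigma * np.log(theta), sigma   # limit of (3) as xi -> 0
    return mu - sigma * (1.0 - theta ** xi) / xi, sigma * theta ** xi


def theta_from_params(mu, sigma, mu_theta, sigma_theta, xi):
    """Invert (3): theta = (sigma_theta/sigma)^(1/xi), or exp((mu_theta - mu)/sigma) if xi == 0."""
    if xi == 0.0:
        return float(np.exp((mu_theta - mu) / sigma))
    return float((sigma_theta / sigma) ** (1.0 / xi))


if __name__ == "__main__":
    mu, sigma, xi, theta = 10.0, 2.0, 0.2, 0.6
    mu_t, sigma_t = dependent_params(mu, sigma, xi, theta)
    x = np.linspace(5.0, 40.0, 8)
    # F^theta and GEV(mu_theta, sigma_theta, xi) agree up to rounding error
    print(np.max(np.abs(gev_cdf(x, mu, sigma, xi) ** theta - gev_cdf(x, mu_t, sigma_t, xi))))
    print(theta_from_params(mu, sigma, mu_t, sigma_t, xi))   # recovers theta up to rounding
```

With fitted rather than true parameters, the routes to $\theta$ through $\sigma$ and through $\mu$ need not agree exactly. Note also that some software, for example SciPy's `genextreme`, parameterises the shape as $c = -\xi$.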
3.1 Semiparametric estimation of the extremal index

We view the limiting relationship (2), $G(x) = F(x)^\theta$, as defining a semiparametric model in which $\theta$ is the parameter of interest and $F$ and $G$ are treated as unknown nuisance functions. The structure of this model is very similar to that of a two-sample proportional hazards model $G = F^\phi$ (Cox, 1972) for survival data, where $G$ and $F$ are unknown survivor functions and $\phi$ is the relative risk. We therefore consider estimators of $\theta$ analogous to those proposed for $\phi$, adapting a class of rank estimators of $\phi$ investigated by Begun (1982) and Begun and Reid (1983). Taking logarithms of $G = F^\theta$ and differentiating gives $dG/G = \theta\, dF/F$, so the constraint can be expressed as

$$ J(F, G)\,F\,dG = \theta\, J(F, G)\,G\,dF, $$

where $J(\cdot, \cdot)$ is a score function included with a view to optimising the properties of the resulting estimator. Provided $J$ satisfies the integrability condition $\int |J(F, G)|\,G\,dF < \infty$, $\theta$ can be expressed as

$$ \theta = \int J(F, G)\,F\,dG \Big/ \int J(F, G)\,G\,dF. $$

A class of estimators of $\theta$,

$$ \hat\theta_J = \sum_{j=1}^{m} J\{\hat F(Y_j), \hat G(Y_j)\}\,\hat F(Y_j) \Big/ \sum_{j=1}^{m} J\{\hat F(X_j), \hat G(X_j)\}\,\hat G(X_j), $$

is obtained by replacing $F$ and $G$ by the empirical distribution functions $\hat F(x) = (1/m)\sum_{i=1}^{m} I(X_i \le x)$ and $\hat G(y) = (1/m)\sum_{i=1}^{m} I(Y_i \le y)$. Under suitable conditions on $J$ (given by Begun and Reid, 1983) the estimator $\hat\theta_J$ is strongly consistent and asymptotically normal. Because $\hat\theta_J$ depends on the data only through the ranks of the combined sample, its distribution does not depend on $F$, and it is robust to departures from assumptions about the underlying form of $F$. These estimators are not constrained to be less than or equal to 1, so in practice we use $\min(1, \hat\theta_J)$ to estimate $\theta$.

Begun and Reid (1983) consider three score functions: $J_1 = 1$, $J_2 = 2/(F + G)$ and $J_3 = 2/(F + \theta G)$. For $J_1 = 1$ the resulting estimator $\hat\theta_1$ estimates $P(X < Y)/P(Y < X)$, and $\hat\theta_2$ is an analogue of the Mantel-Haenszel estimator (Crowley, 1975). Clearly $\hat\theta_3$ depends on the unknown $\theta$.
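The rank estimators can be computed directly from the two samples. The following is a minimal sketch, assuming NumPy; the function names are hypothetical. $\hat F$ and $\hat G$ are evaluated with `numpy.searchsorted`, $\hat\theta_1$ and $\hat\theta_2$ use $J_1$ and $J_2$, and the two-step $\hat\theta_3$ described in the next paragraph replaces the unknown $\theta$ in $J_3$ by $\hat\theta_1$. The demonstration uses a synthetic case in which $G = F^\theta$ holds exactly: if $E$ is standard exponential then $1/E$ is unit Fréchet, and $\theta/E$ has distribution function $\exp(-\theta/y) = F(y)^\theta$.

```python
import numpy as np


def ecdf(sample, points):
    """Empirical distribution function of `sample` at `points`: (1/m) #{sample values <= point}."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, np.asarray(points, dtype=float), side="right") / s.size


def theta_hat_J(x, y, J):
    """Rank estimator: sum_j J{F(Y_j), G(Y_j)} F(Y_j) / sum_j J{F(X_j), G(X_j)} G(X_j)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    F_y, G_y = ecdf(x, y), ecdf(y, y)   # F-hat and G-hat at the Y sample
    F_x, G_x = ecdf(x, x), ecdf(y, x)   # F-hat and G-hat at the X sample
    # The denominator is 0 only in the degenerate case where every X_j lies below every Y_j.
    return float(np.sum(J(F_y, G_y) * F_y) / np.sum(J(F_x, G_x) * G_x))


def J1(F, G):
    return np.ones_like(F)


def J2(F, G):
    return 2.0 / (F + G)   # F + G >= 1/m at every sample point, so this is finite


def theta_hat_1(x, y):
    return theta_hat_J(x, y, J1)


def theta_hat_2(x, y):
    return theta_hat_J(x, y, J2)


def theta_hat_3(x, y):
    """Two-step estimator: J3 = 2/(F + theta G) with theta replaced by theta_hat_1."""
    t1 = theta_hat_1(x, y)   # assumed > 0; t1 = 0 (every Y below every X) makes J3 undefined
    return theta_hat_J(x, y, lambda F, G: 2.0 / (F + t1 * G))


def extremal_index(x, y, estimator=theta_hat_2):
    """Truncate at 1, since the rank estimators are not constrained to be <= 1."""
    return min(1.0, estimator(x, y))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m, theta = 20_000, 0.5
    x = 1.0 / rng.standard_exponential(m)       # unit Frechet: F(x) = exp(-1/x)
    y = theta / rng.standard_exponential(m)     # G(y) = exp(-theta/y) = F(y)^theta
    print(theta_hat_1(x, y), theta_hat_2(x, y), theta_hat_3(x, y))   # each should be near 0.5
```

In the block-maxima setting, `x` and `y` would instead be the resampled maxima $X_1, \ldots, X_m$ and the block maxima $Y_1, \ldots, Y_m$ of section 2.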
A two-step estimator

$$ \hat\theta_3 = \sum_{j=1}^{m} \frac{\hat F(Y_j)}{\hat F(Y_j) + \hat\theta_1 \hat G(Y_j)} \Big/ \sum_{i=1}^{m} \frac{\hat G(X_i)}{\hat F(X_i) + \hat\theta_1 \hat G(X_i)} $$

is constructed, based on the initial estimator $\hat\theta_1$. Provided that the initial estimator is consistent, which $\hat\theta_1$ is, $\hat\theta_3$ is optimal among regular rank estimators of $\theta$, in the sense of attaining minimum asymptotic variance. The two-step estimator $\hat\theta_3$ is asymptotically fully efficient relative to the rank-based maximum partial likelihood estimator of Cox (1975) and can be interpreted as a single iteration of an algorithm to solve the partial likelihood equations. Begun and Reid (1983) note that the gain in asymptotic efficiency of $\hat\theta_3$ over $\hat\theta_2$ is small unless $\theta < 1/4$, and Bernstein et al. (1981) find that $\hat\theta_2$ performs well in small samples. Under the assumption that $X$ and $Y$ are mutually independent, the approximate large-sample variances of these estimators, for samples $X$ and $Y$ each of size $m$, are

$$
\begin{aligned}
\operatorname{var}(\hat\theta_1) &= \frac{1}{m}\,\frac{\theta(1+\theta)^2(1+4\theta+\theta^2)}{(2+\theta)(1+2\theta)},\\
\operatorname{var}(\hat\theta_2) &= \frac{\theta^2}{m}\,\frac{I_2(\theta)+I_2(1/\theta)}{I_1(\theta)\,I_1(1/\theta)},\\
\operatorname{var}(\hat\theta_3) &= \frac{\theta^2}{m}\,I_3^{-1}(\theta),
\end{aligned}
\tag{4}
$$

where

$$ I_1(t) = \int_0^1 (1+u^{1-t})^{-1}\,du, \qquad I_2(t) = \int_0^1 u^{t-1}(1+u^{t-1})^{-2}\,du, \qquad I_3(t) = \int_0^1 (1+t\,u^{1-1/t})^{-1}\,du. $$

The influence functions (Hampel, 1974)

$$
\begin{aligned}
IC_1(x) &= \Big[-\theta J\{F(x), G(x)\}\,G(x) + \int_x^\infty J(F, G)\,dG\Big] \Big/ \int J(F, G)\,G\,dF,\\
IC_2(y) &= \Big[J\{F(y), G(y)\}\,F(y) - \theta \int_y^\infty J(F, G)\,dF\Big] \Big/ \int J(F, G)\,G\,dF,
\end{aligned}
$$

which quantify the sensitivity of $\hat\theta_J$ to individual observations $x$ and $y$ from $X$ and $Y$ respectively, are bounded, indicating that $\hat\theta_J$ is robust to the presence of outlying observations.

Since $X$ and $Y$ are based on the same underlying data, there is some dependence between them. In particular, since each value is resampled exactly once to produce $X$, the maximum values $X_{(m)}$ and $Y_{(m)}$ of $X$ and $Y$ are equal. Consider

$$ \hat\theta_1 = \Big\{\hat F(Y_{(m)}) + \sum_{i:\,Y_i < Y_{(m)}} \hat F(Y_i)\Big\} \Big/ \Big\{\hat G(X_{(m)}) + \sum_{i:\,X_i < X_{(m)}} \hat G(X_i)\Big\}. $$

The constraint $X_{(m)} = Y_{(m)}$ means that $\hat F(Y_{(m)}) = \hat G(X_{(m)}) = 1$. Under the model $G = F^\theta$ with $X$ and $Y$ independent, $P(X_{(m)} > Y_{(m)}) = 1/(1+\theta)$, so that with probability $1/(1+\theta)$ we would have $\hat F(Y_{(m)}) < 1$ and $\hat G(X_{(m)}) = 1$, and with probability $\theta/(1+\theta)$ we would have $\hat F(Y_{(m)}) = 1$ and $\hat G(X_{(m)}) < 1$. For $\theta < 1$ the constraint therefore inflates the numerator of $\hat\theta_1$ more often than the denominator, so $\hat\theta_1$ incurs positive bias, especially for small $m$. Similar arguments apply to $\hat\theta_2$ and $\hat\theta_3$, but the effects on these estimators are smaller due to the weighting provided by $J_2$ and $J_3$.

The sample size $m$ is determined by the choice of block size $n$. Here the main requirement is that $n$ is sufficiently large that the limiting relationship (2) holds approximately. Gomes (1993) and Ancona-Navarrete and Tawn (2000) set $m = n$, a pragmatic choice motivated partly by the need for $m$ to be large enough to estimate all unknown parameters reliably. In the current approach this requirement is less crucial. We examine the sensitivity of the semiparametric estimates of $\theta$ to $m$ and $n$ in sections 4 and 5.

4 Simulation study

We examine the performance of the estimators $\hat\theta_1$, $\hat\theta_2$ and $\hat\theta_3$, for a range of sample sizes, in terms of bias, root mean square error (RMSE), and coverage probabilities of confidence intervals constructed from the asymptotic variances (4). We construct confidence intervals based on normal approximations to the distributions of $\hat\theta$ and of $\log\hat\theta$, an approximate variance-stabilising transformation. We simulate data from a max-autoregressive process $\{Z_i, i = 1, 2, \ldots\}$ with extremal index $\theta$, namely

$$ Z_1 = W_1/\theta, \qquad Z_i = \max\{(1-\theta)Z_{i-1}, W_i\}, \quad i = 2, 3, \ldots, \tag{5} $$

where $W_i$, $i = 1, 2, \ldots$, are independent unit Fréchet random variables. This is the process used by Ferro and Segers (2003) to study the properties of the intervals estimator of $\theta$. We simulate 1000 sequences of length $90m$, for $m = 10, 25, 50$ and 100, for each of three values, 0.25, 0.5 and 0.75, of $\theta$. For comparability with the example in section 5 we use a block size of 90, equivalent to data from one winter period.
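The simulation design can be sketched as follows, assuming NumPy and hypothetical function names. `simulate_max_ar` generates process (5), using the fact that $1/E$ is unit Fréchet when $E$ is standard exponential; `block_maxima` forms $Y_1, \ldots, Y_m$; and `regular_resample` and `random_resample` implement the two resampling schemes of section 2. Indices are 0-based, so under regular resampling block $i$ collects $Z_i, Z_{m+i}, \ldots, Z_{(n-1)m+i}$. Because every value is used exactly once in both samples, the largest $X_i$ equals the largest $Y_i$, which is the constraint discussed in section 3.1.

```python
import numpy as np


def simulate_max_ar(length, theta, rng):
    """Max-autoregressive process (5) with independent unit Frechet innovations W_i."""
    w = 1.0 / rng.standard_exponential(length)   # P(W <= w) = exp(-1/w)
    z = np.empty(length)
    z[0] = w[0] / theta
    for i in range(1, length):
        z[i] = max((1.0 - theta) * z[i - 1], w[i])
    return z


def block_maxima(z, n):
    """Maxima over the m = len(z) // n disjoint blocks of length n."""
    m = len(z) // n
    return np.asarray(z[: m * n]).reshape(m, n).max(axis=1)


def regular_resample(z, n):
    """Reorder z so that block i holds z[i], z[m + i], ..., z[(n - 1) m + i]."""
    m = len(z) // n
    return np.asarray(z[: m * n]).reshape(n, m).T.reshape(-1)


def random_resample(z, rng):
    """Random permutation of the observations."""
    return rng.permutation(z)


if __name__ == "__main__":
    rng = np.random.default_rng(2005)
    n, m, theta = 90, 50, 0.5                         # block size 90, as for one winter
    z = simulate_max_ar(n * m, theta, rng)
    y = block_maxima(z, n)                            # sample from G
    x_reg = block_maxima(regular_resample(z, n), n)   # approximate sample from F
    x_ran = block_maxima(random_resample(z, rng), n)
    print(y.max() == x_reg.max() == x_ran.max())      # True: X_(m) = Y_(m)
```

Each simulated replicate would then pass `x_reg` (or `x_ran`) and `y` to a rank estimator such as those of section 3.1.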
Table 1 shows the estimated bias and RMSE of the estimators based on regular resampling of the original data series. As anticipated, $\hat\theta_1$ exhibits positive bias, the bias decreasing with $m$ and with $\theta$, but $\hat\theta_2$ and $\hat\theta_3$ have little bias. For the sample sizes studied, $\hat\theta_2$ tends to have a slightly lower RMSE than the other estimators. The estimated coverage probabilities in Table 2 are generally close to the nominal value, although confidence intervals based on $\hat\theta_1$ tend to be conservative. Estimator $\hat\theta_2$ performs slightly better than $\hat\theta_3$ in this respect. Confidence intervals constructed on the log $\theta$-scale appear preferable for $\theta = 0.25$, and for $\theta = 0.5$ and $\theta = 0.75$ they have coverages similar to those constructed on the $\theta$-scale. These observations are broadly consistent with the findings of Bernstein et al. (1981). These results suggest that any dependence between and within $X$ and $Y$ is sufficiently limited that the asymptotic variances in (4) are useful.

Noticeable positive bias results if random resampling is used, especially for small $m$. For example, the bias of $\hat\theta_2$ for $m = 10$ is estimated to be 0.07 when $\theta = 0.25$. RMSEs are comparable to those under regular resampling, but confidence intervals have poorer coverage properties.

To compare with the intervals estimator, we note that the results of Ferro and Segers (2003) are based on a sample size of approximately 56. For $m = 50$ the magnitudes of the estimated RMSEs of the semiparametric estimators are similar to those obtained by Ferro and Segers (2003) for a threshold corresponding to the 98th percentile of the marginal Fréchet distribution. For lower thresholds the intervals estimator has smaller RMSE. The coverage probabilities of the confidence intervals associated with $\hat\theta_2$ compare favourably with those of the bootstrapped confidence intervals based on the intervals estimator.
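Confidence intervals of the two kinds compared above can be sketched by plugging an estimate into the asymptotic variances (4), taking the sample size in (4) to be the number $m$ of block maxima, and using the delta method $\operatorname{var}(\log\hat\theta) \approx \operatorname{var}(\hat\theta)/\theta^2$ for the log scale. The integrals $I_1$, $I_2$ and $I_3$ are approximated here by a midpoint rule on $(0, 1)$, an assumed numerical choice (any quadrature would do), and the function names are hypothetical. This is an illustrative sketch assuming NumPy, not the code used to produce the tables.

```python
import numpy as np

# Midpoint grid on (0, 1); the integrands of I1, I2, I3 are bounded there.
_N = 200_000
_U = (np.arange(_N) + 0.5) / _N


def I1(t):
    return float(np.mean(1.0 / (1.0 + _U ** (1.0 - t))))


def I2(t):
    v = _U ** (t - 1.0)
    return float(np.mean(v / (1.0 + v) ** 2))


def I3(t):
    return float(np.mean(1.0 / (1.0 + t * _U ** (1.0 - 1.0 / t))))


def asymptotic_var(theta, m, estimator):
    """Asymptotic variances (4) of theta-hat_1, theta-hat_2, theta-hat_3 for samples of size m."""
    if estimator == 1:
        return (theta * (1 + theta) ** 2 * (1 + 4 * theta + theta ** 2)
                / ((2 + theta) * (1 + 2 * theta) * m))
    if estimator == 2:
        return theta ** 2 * (I2(theta) + I2(1 / theta)) / (I1(theta) * I1(1 / theta) * m)
    if estimator == 3:
        return theta ** 2 / (I3(theta) * m)
    raise ValueError("estimator must be 1, 2 or 3")


def confidence_interval(theta_hat, m, estimator=2, log_scale=True, z=1.959964):
    """Approximate 95% interval with the plug-in variance evaluated at theta_hat (> 0)."""
    se = np.sqrt(asymptotic_var(theta_hat, m, estimator))
    if log_scale:
        half = z * se / theta_hat          # delta method: sd(log theta-hat) ~ sd(theta-hat)/theta
        return theta_hat * np.exp(-half), theta_hat * np.exp(half)
    return theta_hat - z * se, theta_hat + z * se


if __name__ == "__main__":
    print(confidence_interval(0.6, 93, estimator=2))                   # log theta-scale
    print(confidence_interval(0.6, 93, estimator=2, log_scale=False))  # theta-scale
```

The log-scale interval is asymmetric and always positive, which is one reason it can behave better than the $\theta$-scale interval when $\theta$ is small.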
                       bias                         RMSE
θ      m       θ̂1      θ̂2      θ̂3          θ̂1     θ̂2     θ̂3
0.25   10     0.049  -0.004  -0.019        0.150  0.130  0.140
       25     0.026   0.002  -0.006        0.100  0.093  0.098
       50     0.020   0.006   0.001        0.075  0.070  0.073
       100    0.012   0.006   0.004        0.051  0.048  0.050
0.5    10     0.038  -0.009  -0.010        0.200  0.200  0.210
       25     0.030   0.009   0.010        0.140  0.140  0.140
       50     0.016   0.003   0.003        0.098  0.098  0.100
       100    0.011   0.004   0.004        0.067  0.067  0.068
0.75   10     0.007  -0.025  -0.023        0.180  0.200  0.200
       25     0.019   0.005   0.006        0.140  0.160  0.160
       50     0.013   0.003   0.004        0.110  0.110  0.120
       100    0.009   0.004   0.004        0.076  0.084  0.084

Table 1: Estimated bias and root mean square error (RMSE) of the three semiparametric estimators of the extremal index of a max-autoregressive process.

                    θ-scale                 log θ-scale
θ      m       θ̂1    θ̂2    θ̂3          θ̂1    θ̂2    θ̂3
0.25   10      1     0.90  0.84        0.98  0.98  0.98
       25      0.95  0.90  0.85        0.97  0.95  0.91
       50      0.96  0.91  0.89        0.95  0.92  0.89
       100     0.95  0.93  0.90        0.95  0.94  0.91
0.5    10      0.98  0.93  0.91        1     0.98  0.97
       25      0.98  0.96  0.95        1     0.97  0.97
       50      0.98  0.95  0.95        0.99  0.97  0.96
       100     0.99  0.97  0.97        0.98  0.97  0.96
0.75   10      0.99  0.97  0.96        1     0.99  0.98
       25      0.99  0.97  0.97        1     0.98  0.98
       50      1     0.98  0.98        1     0.98  0.98
       100     1     0.99  0.99        1     0.99  0.99

Table 2: Estimated coverage probabilities of 95% confidence intervals for the extremal index θ of a max-autoregressive process. Left: calculated on the θ-scale using a normal approximation to the distribution of θ̂; right: calculated on the log θ-scale.

5 Application to temperature data

We use the semiparametric estimators to estimate the extremal index of a series of negated daily minimum temperatures, recorded to the nearest degree Fahrenheit, at Wooster, Ohio, from June 1893 to December 1987. Smith et al. (1997) and Coles et al. (1994) give further details and analyses of these data. In common with Smith et al. (1997) and Ferro and Segers (2003), we study the winter months of December to February, a period over which the series is approximately stationary. The data contain 93 winter periods, each of length 90 days.
There are very few missing values in these data. For data with appreciable numbers of missing observations, $\theta$ can be estimated by viewing blocks that contain missing values as providing right-censored block maxima and using the estimators proposed by Begun and Reid (1983) for censored survival data.

We estimate $\theta$ for block sizes ranging from 30 days to 360 days using regular resampling from the original series. Quantile plots (not shown) suggest that for these data the GEV limit applies approximately for block sizes as small as 30 days. Results are presented for $\hat\theta_2$ only. As expected, very similar results are obtained for $\hat\theta_3$, with $\hat\theta_1$ producing slightly larger estimates of $\theta$ than the other two estimators. The estimates of $\theta$ are plotted against block size in figure 2 with 95% confidence limits. The estimates are stable over a wide range of block sizes and suggest an extremal index of approximately 0.6, in agreement with the findings of Ferro and Segers (2003) and Smith et al. (1997). The relationship between $\hat\theta_2$ and block size in figure 2 mirrors that between $\hat\theta$ and threshold in these papers. This makes sense, since we would expect some approximate correspondence between threshold and block size. Also shown in figure 2, for each block size, are the averages of 100 estimates of $\theta$ based on 100 different random resamples from the original series. For small block sizes these averages are very similar to the estimates based on regular resampling, but for large block sizes (corresponding to smaller sample sizes) these averages vary more smoothly with block size.

[Figure 2: extremal index (0 to 1) against block size / days (30 to 360 in steps of 30); the upper axis gives the corresponding sample sizes 279, 139, 93, 69, 55, 46, 39, 34, 31, 27, 25, 23.]

Figure 2: Estimates $\hat\theta_2$ (circles) and 95% confidence limits (dashed lines) of the extremal index for the Wooster temperature series for block sizes ranging from 30 days to 360 days, using regular resampling from the original series. The confidence limits are calculated using a normal approximation on the log $\theta$ scale. Also shown (dotted line) are the averages of 100 estimates using random resampling. The sample size corresponding to each block size is given on the upper axis.

6 Acknowledgements

I am very grateful to Chris Ferro for supplying the Wooster temperature data.

References

Ancona-Navarrete, M. A. and J. A. Tawn (2000). A comparison of methods for estimating the extremal index. Extremes 3 (1), 5–38.
Begun, J. M. (1982). Estimates of relative risk. Institute of Statistics Mimeo Series No. 1382, University of North Carolina at Chapel Hill.
Begun, J. M. and N. Reid (1983). Estimating the relative risk with censored data. Journal of the American Statistical Association 78 (382), 337–341.
Bernstein, L., J. Anderson, and M. C. Pike (1981). Estimation of the proportional hazard in two treatment group clinical trials. Biometrics 37, 513–520.
Coles, S. G., J. A. Tawn, and R. L. Smith (1994). A seasonal Markov model for extremely low temperatures. Environmetrics 5, 221–239.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society, Series B 34, 187–220.
Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269–276.
Crowley, J. (1975). Estimation of relative risk in survival studies. Technical Report No. 423, Department of Statistics, University of Wisconsin.
Ferro, C. A. T. and J. Segers (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society, Series B 65 (2), 545–556.
Galambos, J. (1987). The Asymptotic Theory of Extreme Order Statistics (2nd edn). Melbourne: Krieger.
Gomes, M. I. (1993). On the estimation of parameters of rare events in environmental time series. In V. Barnett and K. F. Turkman (Eds.), Statistics for the Environment 2: Water Related Issues, pp. 225–241. Chichester: Wiley.
Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association 69 (346), 383–393.
Leadbetter, M. R. (1983). Extremes and local dependence in stationary sequences. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 65, 291–306.
Leadbetter, M. R., G. Lindgren, and H. Rootzén (1983). Extremes and Related Properties of Random Sequences and Processes. New York: Springer.
Smith, R. L. and I. Weissman (1994). Estimating the extremal index. Journal of the Royal Statistical Society, Series B 56 (3), 515–528.
Smith, R. L., J. A. Tawn, and S. G. Coles (1997). Markov chain models for threshold exceedances. Biometrika 84, 249–268.