Please cite: Northrop, P. J. (2015) An efficient semiparametric maxima estimator of the extremal index. Extremes, 18(4), 585-603, doi: 10.1007/s10687-015-0221-5.. Semiparametric estimation of the extremal index using block maxima∗ Paul J. Northrop Department of Statistical Science University College London, Gower Street, London WC1E 6BT, UK p.northrop@ucl.ac.uk September 16, 2012 Abstract The extremal index θ, a measure of the degree of local dependence in the extremes of a stationary process, plays an important role in extreme value analyses. We estimate θ semiparametrically, using the relationship between the distribution of block maxima and the marginal distribution of a process to define a semiparametric model. We show that these semiparametric estimators are simpler and substantially more efficient than their parametric counterparts. We seek to improve efficiency further using maxima over sliding blocks. An application to sea-surge heights combines inferences about θ with a standard extreme value analysis of block maxima to estimate marginal quantiles. Keywords: Extremal index; semiparametric estimation; extreme value theory; block maxima; sea-surge heights. 1 Introduction The modelling of rare events in stationary processes is important in many application areas. The extremal behaviour of such a process is governed by its marginal distribution and by its extremal dependence structure. Chavez-Demoulin and Davison (2012) provide a review of this area, concentrating on the latter aspect. For processes satisfying the D(un ) condition of Leadbetter et al. (1983), which limits long-range dependence at extreme levels, the extreme value index θ (0 6 θ 6 1) is the primary measure of shortrange extremal dependence. Following Leadbetter et al. (1983), let X1 , X2 , . . . be a strictly stationary sequence of random variables that satisfies the D(un ) condition and has marginal distribution function F . Let Mb = max(X1 , . . . , Xb ). For large b and ub the distribution function G of the block maximum Mb is approximately related to F via G(ub ) = P (Mb 6 ub ) ≈ F bθ (ub ). ∗ (1) Research Report number 318, Department of Statistical Science, University College London. Date: September 2012 1 Further, if there exist normalizing constants cb and db such that F b (cb x + db ) → G(x), as b → ∞, then G(x) is the distribution function of a GEV distribution. The corresponding result for Mb∗ = max(X1∗ , . . . , Xn∗ ), where X1∗ , X2∗ , . . . are independent variables with distribution function F , gives the limiting distribution function H(x) = G(x)1/θ . Thus, the limiting distributions of Mb and Mb∗ are GEV, with respective location, scale and shape parameters (µθ , σθ , ξ) and (µ, σ, ξ), say, related by µθ = µ + σ θξ − 1 /ξ, σθ = σθξ . (2) For finite b a block size dependent index θb applies, with θb → θ as b → ∞. Smith (1992) defines θb as θb = − log G(ub )/ log 2, where F b (ub ) = 1/2. We will only make explicit the dependence of θ on b when necessary. Two types of analysis where the estimation of θ is vital are: block maxima are modelled and estimates of high marginal quantiles of X are required (type 1); exceedances of a high threshold are modelled and estimates of return levels (high quantiles of the distribution of annual maxima) are required (type 2). Ignoring θ may lead to underestimation of quantiles in analyses of type 1 and overestimation of return levels (and misinterpretation of their meaning) in analyses of type 2 (Beirlant et al., 2004; Chavez-Demoulin and Davison, 2012). Recent advances in the estimation of θ (Ferro and Segers, 2003; Süveges, 2007; Süveges and Davison, 2010; Laurini and Tawn, 2003) have concentrated on threshold methods, based on exceedences a threshold. Maxima methods (Gomes, 1993; Ancona-Navarrete and Tawn, 2000), described in section 1.1, are also used and Fawcett and Walshaw (2012) advocate the use of either the intervals estimator of Ferro and Segers (2003) or the maxima estimator of Gomes (1993) for analyses of type 2. In analyses of type 1 it seems natural to use a maxima method to estimate θ, but it seems that less attention has been devoted to improving existing maxima methods. We propose a new maxima method that is simpler and has much greater statistical efficiency that existing maxima methods. 1.1 Maxima methods Parametric maxima methods are based on fitting GEV distributions to two sets of maxima of b consecutive observations. The first sample Mi , i = 1, . . . , n is block maxima of the original series. The second sample Mi∗ , i = 1, . . . , n is block maxima of the series obtained by randomising the index of the original series, to obtain approximately an independent series with the same marginal distribution as the original sequence. Based on (2), Gomes (1993) fit a GEV(µθ , σθ , ξ) distribution to {M } and a GEV(µ, σ, ξ) distribution to {M ∗ } and construct the estimator θbG = (b σ /b σθ )−1/ξ̃ , where ξ˜ = (b σ−σ bθ )/(b µ−µ bθ ). √ A block size of b = m is suggested. Ancona-Navarrete and Tawn (2000) combine the two GEV fits into one by maximising a likelihood (with respect to (µ, σ, ξ, θ)), assuming that ({Mθ }, {M }) are independent. We call the resulting estimator θbAT . In one sense parametric maxima methods are anomalous: other methods of estimating θ do so directly, without embedding θ in a larger model with nuisance parameters. Northrop (2005) proposes a semiparametric maxima estimator. The relationship G = H θ is used but no particular parametric form is assumed for G or H. 2 An undesirable feature existing maxima methods is the need to resample the original data to produce a sample of block maxima with approximate c.d.f. H. In section 2 we show this is unnecessary: more efficient estimators of θ can be constructed by comparing G directly to F , without generating pseudo-samples from H. The theoretical gain in efficiency is quantified in section 2.3 and and in section 2.1 arguments in favour of the semiparametric estimators are given. In section 2.2 we show that one of these estimators is approximately an extended version of the blocks estimator of Robert (2009). In sections 3 and 4 we carry out a simulation study and an extreme value analysis of type 1 on sea-surge data. 2 Semiparametric estimators of θ Let X1 , . . . , Xm denote strictly stationary sequence of random variables with marginal distribution function F and extremal index θ. Let M [s, t] = maxs<k6t Xk . Consider Y = M [s, s + b], the maximum of a block of b consecutive Xs. Let V = F (Y )b . For the moment we assume that F is known. Under (1), V has a Beta(θ, 1) distribution, with distribution function v θ , 0 < v < 1. We construct two sets of block maxima: Yid = M [(i − 1)b, ib], i = 1, . . . , bm/bc (disjoint blocks) and Yis = M [i − 1, i + b − 1], i = 1, . . . , m − b + 1 (sliding blocks). We treat {Vid } = {F (Yid )b } as a random sample, and {Vis } = {F (Yis )b } as a dependent sample, from a Beta(θ, 1) distribution. We expect (as in Robert et al. (2009)) that {Yis } contains more information about θ than {Yid } and thus produces a more efficient estimator of θ. The maximum likelihood estimator of θ based on a randomPsample V1 , . . . , Vn from a Beta(θ, 1) distribution is θbSP = −1/log V , where log V = ni=1 log Vi /n. Guenther (1967) shows that the minimum variance unbiased estimator of θ is θbSP 2 = (n − 1) θbSP /n with var(θbSP 2 )=θ2 (n − 2)−1 , so that var(θbSP )=n2 θ2 (n − 2)−1 (n − 1)−2 . Thus, ψ = log θ is an approximate variance-stabilizing transformation. The moment estimator θbSP 3 = V /(1 − V ), has low efficiency for θ 6 1 (Johnson and Kotz, 1970, chapter 24). We use X1 , . . . , Xm to estimate F so that θbSP is a Rank Approximate M-estimate (Dabrowska et al., 1989). We make an adjustment for fact that, for example, the values X1 , . . . , Xb cannot exceed Y1 = M [0, b]. Let Bi be the set of indices of the Xs that contribute to block maximum Yi . Let li = mink∈B / i Xk and ui = maxk∈B / i Xk . For li 6 y 6 ui we let X 1 I(Xk 6 y) Fb−i (y) = m−b+1 k∈B / i and use it to define Vbi = Fb−i (Yi )b . Thus, the Xs that contribute to block maximum Yi are not used in the estimate of F applied to Yi . Following Dabrowska et al. (1989) we set Fb−i (y) to 1/(m − b + n + 1) if y < li , although in practice Yi < li is unlikely. By construction Yi > ui is not possible. Let R = (R1 , . . . , Rn ) be the ranks of the block maxima Y = (Y1 , . . . , Yn ) within X1 , . . . , Xm , so that if Ri = ri then ri − 1 of {Xk , k ∈ / Bi } are larger than Yi and b m − b + 1 − ri are smaller than Yi . Thus, Vi = (1 − ri /(m − b + 1))b . The ranks convey no information about the marginal distribution of Y . Therefore, assuming a GEV(µθ , σθ , ξ) model for the block maxima, the joint likelihood factorises as L(θ, µθ , σθ , ξ; R, Y ) = LBeta (θb ; R)LGEV (µθ , σθ , ξ; Y ), 3 (3) so that independent inferences can be made about θ and (µθ , σθ , ξ). In (3) the rank likeQ lihood LBeta (θ; R) = θn ( ni=1 vi )θ−1 is constructed under the assumption that V1 , . . . , Vn are independent. This is far from the case if sliding blocks are used: successive values of V will be strongly positively associated, and pairs (Vi , Vj ) from disjoint blocks are negatively associated (see appendix B). We attempt to adjust for this using an information b −1 Vb (ψ)J b (ψ) b −1 for the asymptotic variance of sandwich estimator (White, 1982) J (ψ) b where the observed information J (ψ) = n and Vb (ψ) is an estimate of the variance ψ, of the score function. Details of the estimation of V (ψ) are given in appendix B. 2.1 Desirable properties of these estimators The estimator θbSP is simple and non-iterative. Sections 2.3 and 3 show that it is more efficient than its parametric counterparts. The extremal index measures local dependence in extremes and is independent of the marginal distribution of the process. Since θbSP is determined by the ordering of the ranks of the raw data it is invariant to marginal transformation, whereas the parametric alternatives are not. In common with threshold methods (for example, Süveges and Davison (2010)), θ is estimated directly, rather than as part of a larger extreme value model, that is, estimation of extremal dependence and marginal behaviour are separated. It may be that the assumption (G = F bθb ) underlying θbSP is reasonable for smaller block sizes than the parametric GEV assumptions. Section 3 gives examples where the rate of convergence of θb to θ as b → ∞ is O(1/b). Thus, convergence to G = F bθ could be relatively fast even if convergence to the limiting GEV form, which depends on the marginal distribution, is slow. The semiparametric framework permits the use of a relatively small block size, whereas the parametric alternatives do not. 2.2 Link to the blocks estimator Blocks estimators of θ require the choice of a threshold u and a block length b. We show that θbSP is asymptotically equivalent to using the blocks estimator of Smith and Weissman (1994) (sometimes called the logs estimator) with a set of random thresholds. b The Smith and Weissman (1994) blocks estimator is θbB = log G(u)/b log Fb(u). Robert (2009) extends this idea by using a random threshold set at a particular sample quantile. Suppose that we use a set of random thresholds ui = Yi , where Yi , i = 1, . . . , n are a sample of block maxima over blocks of length b. We may think of this as using the data b i )/b log Fb(Yi ) is an estimator to define a set of local thresholds. Each of the ratios log G(Y of θ. We combine these using the ratio estimator n θbRB = 1X b i) log G(Y n i=1 n 1X log Fb(Yi ) b n i=1 n = 1X b i) log G(Y n i=1 n 1X log Vi n i=1 . (4) b (i) ) = i/n and, by Stirling’s formula, If disjoint blocks are used to construct {Yi } then G(Y as n → ∞ the numerator of (4) ↓ −1. Therefore, for sufficiently large n, θbSP ≈ θbRB . 4 2.3 Theoretical comparison with parametric maxima method The calculations in appendix A show that, subject to the assumptions made above, θbSP has a smaller asymptotic variance than θbAT . The efficiency of θbAT relative to θbSP depends on θ (see figure 1) but θbAT is at best 50% efficient (when θ = 1). This is expected because the resampling by the parametric estimator introduces an extra source of variability. % relative efficiency of parametric estimator 50 40 30 20 10 ξ = − 0.4 ξ=0 ξ = 0.4 0.0 0.2 0.4 0.6 0.8 1.0 θ Figure 1: Asymptotic relative efficiency of the parametric maximum likelihood estimator of θ compared to the semiparametric maximum likelihood estimator, for a range of values of GEV shape parameter ξ. 3 Simulation study We conduct a simulation study to compare the performances of the semiparametric estimators to the parametric estimators θbAT and θbG for different processes, marginal distributions and blocks sizes. In the following Zi , i = 1, 2, . . . are independent unit Fréchet random variables with P (Z 6 z) = exp(−1/z), for z > 0. A moving maxima process X1 , X2 , . . . (Deheuvels, 1983) is defined by Xi = max {αj Zi+j }, j=0,...,p i = 1, 2, . . . , where {αj } are constants satisfying α0 > 0, αp > 0 and αj > 0, for j = 1, . . . , p − 1, with P p j=0 αi = 1. For simplicity here we assume that p is finite. This process has extremal index θ = maxi=0,...,p (αi ). A first-order Markov chain X1 , X2 , . . . with symmetric logistic extreme value transition distribution (Tawn, 1988) and unit Fréchet margins is implied by β −1/β −1/β , P (Xi 6 x1 , Xi+1 6 xi+1 ) = exp − xi + xi+1 where β ∈ (0, 1]. The extremal index can be estimated by simulation (Smith, 1992). 5 A max-autoregressive (maxAR) process X1 , X2 , . . . (Davis and Resnick, 1989) is defined by Xi = max{(1 − θ)Xi−1 , θZi }, i = 1, 2, . . . , where 0 < θ 6 1 and X0 has a unit Fréchet distribution. The extremal index of this process is θ. The maxAR process is an infinite-order moving maxima process with αj = θ(1 − θ)j , j = 0, 1, . . .. It is also a first-order Markov chain with a particular bivariate extreme value transition distribution (Ledford and Tawn, 2003). A first-order Gaussian autoregressive (AR(1)) process is defined by Xi = αXi−1 + i , i = 1, 2, . . . , where 1 , 2 , . . . are i.i.d. N(0, 1 − α2 ) random variables, X0 ∼ N (0, 1) and we assume |α| < 1 for second-order stationarity. This process exhibits serial dependence but limiting extremal independence because θ = 1 (Leadbetter et al., 1983, chapter 4). For the maxAR process θb = θ+(1−θ)/b and for the moving maxima process θb = θ+c/b, where 1 − θ 6 c 6 pθ (see appendix C). Where analytical results are not possible simulation can be used to estimate θb but Ancona-Navarrete and Tawn (2000) note that even very long simulations may produce unacceptably imprecise estimates. We have found this to be the case for the AR(1) process. For the logistic Markov chain Smith (1992) used simulation to estimate θb , finding that even longer simulations would be required for estimates to respect the conjectured monotonicity of θb as b increases. For a given process we simulate 500 sequences of length m = 4, 900 and estimate θb using the semiparametric estimators (disjoint and sliding blocks) and θbAT and θbG for b = 20, 70, 245 (n = 245, 70, 20). To examine the impact of marginal distribution we apply θbAT and θbG after transformation to Gumbel, Gaussian and exponential margins. In fact the marginal distribution has a relatively small effect on the overall performance of θbAT and θbG so we present results only for Gumbel margins. However, figure 2 compares the estimates produced by θbAT for a moving maxima process with Gaussian and Gumbel margins. For small b = 20 (n = 245) the agreement is very close but for b = 245 (n = 20) there is large disagreement for some datasets. We present results (tables 1-5) for the b=20 0.32 0.30 0.28 0.26 0.24 ● 0.6 ● 0.24 0.26 0.28 0.30 0.32 0.34 ● ● ● 0.3 ● ● ● ● ● ● ● ● ●● ● ●●● ● ●● ● ● ● ● ●● ● ● ●● ● ●● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● 0.2 ● ● ● ● ● ● 0.1 θ^b (Gumbel margins) ● ● 0.4 0.1 ● ● 0.5 0.2 ● 0.22 ● ● ● 0.22 ● ● θ^b (Gaussian margins) 0.34 θ^b (Gaussian margins) b=245 ● ●● ● ● ● ●● ●● ●● ●● ● ● ●● ● ● ● ●● ●● ● ●● ● ●●●● ● ● ● ● ● ● ●●● ●● ● ● ● ● ●●● ●● ● ● ● ● ●● ●● ● ● ●● ● ● ●● ● ● ●●● ● ●● ●● ● ●● ● ● ● ●●● ● ●● ●● ●●● ● ● ● ● ● ●●●● ●● ● ● ● ● ● ● ● ●●● ●● ● ● ●● ●●● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ● ●● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ●● ●● ● ●●● ● ●●● ● ●●● ●● ●● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ● ●● ● ● ● ● ● ●●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ●● ●● ● ●● ● ●●● ●● ● ●● ● ● 0.3 ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● 0.4 0.5 θ^b (Gumbel margins) Figure 2: Estimates θbAT based on data simulated from a moving maxima process with α = (1/4, 1/4, 1/4, 1/4): Gaussian margins against Gumbel margins. Left: b = 20. Right: b = 245. same processes for which results are presented in Ancona-Navarrete and Tawn (2000), although we have checked that the findings apply across other levels of dependence. We 6 b 20 70 245 estimator SP,d SP,s P,AT P,G SP,d SP,s P,AT P,G SP,d SP,s P,AT P,G θb mean 1 1.00 1.00 1.01 1.00 1 0.99 0.99 1.01 1.01 1 0.98 0.96 1.04 1.04 RMSE 0.045 0.031 0.068 0.060 0.081 0.064 0.116 0.108 0.149 0.118 0.236 0.234 SD 0.045 0.031 0.068 0.060 0.080 0.062 0.116 0.108 0.148 0.111 0.233 0.231 SE adj SE 0.064 0.063 0.014 0.055 0.091 boot SE 0.044 0.033 0.122 0.014 0.171 0.113 0.098 0.083 0.062 0.242 0.014 0.333 0.186 0.158 0.155 0.118 rel eff 2.06 0.43 0.55 1.64 0.48 0.55 1.77 0.40 0.41 Table 1: Independent data; θ = 1. Estimators - SP,d: θbSP (disjoint blocks); SP,s: θbSP (sliding blocks); P,AT: θbAT ; P,G: θbG . Sampling distribution mean, root mean square error (RMSE) and standard deviation (SD), mean standard error (SE), sandwich adjusted standard error (adj SE) and boostrap standard error (boot SE) and efficiency relative to SP,d (rel eff). present results only for θbSP . The performance of θbSP 2 is very similar and, as expected, both these estimator out-perform θbSP 3 . We calculate naive standard errors (for θbSP using section 2 and for θbAT using appendix A). For θbSP we also calculate standard errors based on a sandwich estimator (appendix B) and on 100 stationary block bootstrap resamples (Politis and Romano, 1994) with optimal block length chosen using Patton et al. (2009), implemented using Canty and Ripley (2010) and Hayfield and Racine (2008). All estimators are approximately unbiased for θb (where this is known). When using disjoint blocks the semiparametric estimator outperforms the parametric estimators approximately by the amounts suggested by figure 1. The use of sliding maxima improves this further. Comparison of the mean standard errors (SE) with standard deviations (SD) shows that the (non-bootstrap) standard errors tend to be a little too large, that is, all the estimators perform better than expected. The exception is the logistic Markov chain for which the standard errors are approximately the correct size for b = 20 and b = 70 and there is slight underestimation for b = 245. In general the bootstrap standard errors are more reliable. As expected, for the estimators based on sliding blocks the naive standard errors are far too small. 7 b 20 70 245 estimator SP,d SP,s P,AT P,G SP,d SP,s P,AT P,G SP,d SP,s P,AT P,G mean 0.70 0.70 0.72 0.75 0.79 0.78 0.81 0.83 0.86 0.85 0.91 0.93 SD 0.039 0.030 0.047 0.045 0.072 0.057 0.097 0.094 0.147 0.118 0.218 0.223 SE adj SE 0.045 0.049 0.010 0.044 0.066 boot SE 0.043 0.032 0.097 0.011 0.138 0.099 0.084 0.083 0.062 0.213 0.013 0.292 0.174 0.144 0.154 0.118 rel eff 1.67 0.70 0.76 1.61 0.56 0.60 1.57 0.46 0.44 Table 2: AR(1) process with α=0.5; θ = 1. Estimators - SP,d: θbSP (disjoint blocks); SP,s: θbSP (sliding blocks); P,AT: θbAT ; P,G: θbG . Sampling distribution mean and standard deviation (SD), mean standard error (SE), sandwich adjusted standard error (adj SE) and boostrap standard error (boot SE) and efficiency relative to SP,d (rel eff). b 20 estimator θb mean SP,d 0.38 0.39 SP,s 0.39 P,AT 0.39 P,G 0.38 70 SP,d 0.34 0.35 SP,s 0.35 P,AT 0.36 P,G 0.34 245 SP,d 0.33 0.35 SP,s 0.35 P,AT 0.37 P,G 0.34 RMSE 0.024 0.023 0.033 0.038 0.047 0.044 0.067 0.074 0.085 0.078 0.134 0.148 SD 0.023 0.021 0.033 0.038 0.044 0.041 0.064 0.074 0.083 0.076 0.127 0.148 SE adj SE 0.025 0.024 0.006 0.023 0.038 boot SE 0.025 0.021 0.044 0.005 0.067 0.041 0.037 0.046 0.040 0.087 0.005 0.133 0.071 0.061 0.091 0.078 rel eff 1.16 0.48 0.35 1.17 0.48 0.36 1.19 0.43 0.31 Table 3: logistic Markov chain with β=0.5 and θ ≈0.328. Estimators - SP,d: θbSP (disjoint blocks); SP,s: θbSP (sliding blocks); P,AT: θbAT ; P,G: θbG . Sampling distribution mean, root mean square error (RMSE) and standard deviation (SD), mean standard error (SE), sandwich adjusted standard error (adj SE) and boostrap standard error (boot SE) and efficiency relative to SP,d (rel eff). 8 b 20 estimator θb mean SP,d 0.34 0.35 SP,s 0.35 P,AT 0.35 P,G 0.34 70 SP,d 0.31 0.31 SP,s 0.31 P,AT 0.32 P,G 0.30 245 SP,d 0.30 0.30 SP,s 0.30 P,AT 0.32 P,G 0.28 RMSE 0.015 0.011 0.026 0.026 0.027 0.020 0.042 0.046 0.047 0.037 0.083 0.088 SD 0.015 0.011 0.026 0.026 0.027 0.020 0.042 0.045 0.047 0.037 0.082 0.084 SE adj SE 0.022 0.022 0.005 0.020 0.035 boot SE 0.018 0.013 0.038 0.004 0.060 0.037 0.032 0.032 0.024 0.076 0.004 0.116 0.061 0.052 0.063 0.049 rel eff 1.82 0.34 0.35 1.83 0.43 0.37 1.62 0.33 0.32 Table 4: Moving maxima process with α=(0.3, 0.2, 0.3, 0.3); θ = 0.3. Estimators SP,d: θbSP (disjoint blocks); SP,s: θbSP (sliding blocks); P,AT: θbAT ; P,G: θbG . Sampling distribution mean, root mean square error (RMSE) and standard deviation (SD), mean standard error (SE), sandwich adjusted standard error (adj SE) and boostrap standard error (boot SE) and efficiency relative to SP,d (rel eff). b 20 estimator θb mean SP,d 0.52 0.53 SP,s 0.52 P,AT 0.53 P,G 0.52 70 SP,d 0.51 0.51 SP,s 0.50 P,AT 0.51 P,G 0.50 245 SP,d 0.50 0.51 SP,s 0.50 P,AT 0.54 P,G 0.51 RMSE 0.028 0.023 0.037 0.038 0.050 0.043 0.071 0.075 0.105 0.088 0.152 0.159 SD 0.028 0.023 0.037 0.038 0.050 0.043 0.071 0.075 0.105 0.088 0.148 0.159 SE adj SE 0.034 0.033 0.007 0.030 0.050 boot SE 0.029 0.024 0.062 0.007 0.091 0.059 0.052 0.054 0.045 0.126 0.007 0.179 0.101 0.085 0.108 0.087 rel eff 1.45 0.56 0.54 1.38 0.50 0.45 1.42 0.50 0.44 Table 5: maxAR process with θ=0.5. Estimators - SP,d: θbSP (disjoint blocks); SP,s: θbSP (sliding blocks); P,AT: θbAT ; P,G: θbG . Sampling distribution mean, root mean square error (RMSE) and standard deviation (SD), mean standard error (SE), sandwich adjusted standard error (adj SE) and boostrap standard error (boot SE) and efficiency relative to SP,d (rel eff). 9 4 Example: Newlyn sea-surges Figure 3 shows a series of 2894 measurements of sea-surge heights taken just off the coast at Newlyn, Cornwall, UK, over the period 1971–1976. The data are the maximum hourly surge heights over periods of 15 hours (see Coles (1991)). Fawcett and Walshaw (2012) used several estimators, including the parametric maxima estimator of Gomes (1993), to estimate the extremal index of the underlying process using several estimators. We use θbSP to estimate the extremal index of this series, based on series of disjoint and sliding block maxima. We also fit a GEV distribution to the block maxima in order to make inferences about extreme quantiles of the marginal distribution of sea-surge heights at Newlyn. 0.8 0.6 surge (m) 0.4 0.2 0.0 −0.2 0 500 1000 1500 2000 2500 3000 observation number Figure 3: Time series plot of the Newlyn sea-surge data. The top left plot in figure 4 shows θbSP against block size b based disjoint and sliding maxima. Also given are 95% confidence intervals, based on adjusted log-likelihoods for log θ following Chandler and Bate (2007). The other plots in figure 4 show maximum likelihood estimates for GEV distributions fitted to the disjoint and sliding maxima, with, for the disjoint maxima only, symmetric 95% confidence intervals. As the location and scale of the GEV distribution depend on b we have plotted estimates of the GEV parameters implied for a block size of 1, i.e. the marginal distribution of the data. For these data b = 20 is reasonable. Table 6 shows estimates, standard errors and 95% confidence intervals for θ using this block size. The bootstrap estimates result from the approach detailed in section 3 using 10,000 resamples. Basic confidence intervals, constructed on the log θ scale, are given in table 6 but as the bootstrap distributions of log θbSP are very close to normal these intervals are very similar to normal intervals. Bootstrap bias-adjustment results in a slightly smaller point estimate of θ. √ Following Gomes (1993), Fawcett and Walshaw (2012) used the larger block size m ≈ 54, obtaining an estimate of 0.282 with a standard error of 0.206. For this block size the θbSP compares favourably, with estimates (and adjusted standard errors) of 0.269 (0.044) using disjoint blocks and 0.245 (0.040) using sliding blocks. The bootstrap standard errors are 0.047 and 0.039 respectively. Fawcett and Walshaw (2012) also use 10 0.0 0.6 −0.1 location θ^ 0.5 0.4 −0.2 −0.3 0.3 −0.4 0.2 −0.5 0 10 20 30 40 50 60 0 10 20 block size 30 40 50 60 40 50 60 block size 0.2 0.25 0.1 shape scale 0.20 0.15 0.0 −0.1 0.10 −0.2 0.05 0 10 20 30 40 50 60 0 block size 10 20 30 block size Figure 4: Block size selection for the Newlyn data. Top left: estimates and 95% confidence intervals for θ based on disjoint maxima (solid lines) and sliding maxima (dashed lines). Other plots: estimates of marginal (b = 1) GEV parameters based on disjoint block maxima (solid lines) and sliding maxima (dashed lines). Symmetric 95% confidence intervals are also given for the disjoint maxima. the intervals estimator of Ferro and Segers (2003), based on a threshold of 0.3m selected using a mean residual life plot, obtaining 0.223 (0.050). The standard errors in table 6 suggest that, at least for these data, θbSP is competitive with the intervals estimator when both approaches are allowed to select their tuning parameter (block size for θbSP and threshold for the intervals estimator) using the observed data. naive disjoint adjusted bootstrap adjusted sliding bootstrap b θb SE(θ) 0.241 0.020 0.241 0.026 0.219 0.027 0.238 0.028 0.213 0.023 95% CI (0.204, 0.283) (0.194, 0.295) (0.179, 0.265) (0.188, 0.296) (0.180, 0.251) Table 6: Estimates, standard errors and 95% confidence intervals for the extremal index θ of the Newlyn data using (disjoint or sliding) blocks of size of 20. Naive: independence log-likelihood; adjusted: adjusted log-likelihood; bootstrap: stationary block bootstrap. Figure 5 shows estimates and 95% confidence intervals of high quantiles of the marginal distribution of sea-surge height. The sum of adjusted log-likelihood for log θ based on sliding maxima and the log-likelihood for the GEV parameters based on disjoint maxima is profiled with respect to the desired quantile. The estimates (and standard errors) of the GEV parameters µθ , σθ and ξ are 0.192 (0.012), 0.130 (0.0085) and −0.0546 (0.056) respectively. The underestimation that would result from assuming that θ = 1 is clear. 11 1.8 1.6 xp 1.4 1.2 1.0 0.8 3000 10000 20000 30000 40000 50000 60000 1/p Figure 5: Estimates and 95% confidence intervals for the 100(1 − p)% marginal quantile xp against 1/p. Solid lines: inferring θ using θbSP based on sliding maxima. Dashed lines: using θ = 1. 5 Discussion The semiparametric estimators proposed in this paper improve substantially on existing maxima methods of estimating the extremal index. For the Newlyn data the semiparametric estimators are competitive with the widely-used intervals estimator of Ferro and Segers (2003). The stationary block bootstrap produces more reliable standard errors than the information sandwich estimators. It may be that a more sophisticated implementation of these latter estimators, for example Lumley and Heagerty (1999), fairs better. The simulation study in section 3 showed that there is benefit to using sliding blocks rather than disjoint blocks. Apart from the point estimates in figure 4 we did not use sliding blocks for the GEV analysis in section 4. This could be an area of further research. Acknowledgments I thank Chris Ferro for helpful comments on Northrop (2005) and Nigel Williams and Marianna Demetriou for project work they carried out in this area. Appendix A: asymptotic efficiencies of semiparametric and parametric estimators using disjoint blocks The semiparametric estimator θbSP is based on a random sample V1 , . . . , Vn from a Beta(θ, 1) distribution. The log-likelihood for a single observation V is l(θ) = log θ + (θ − 1) log V. Thus, the asymptotic precision of θbSp is l00 (θ) = 1/θ2 . 12 The parametric estimator θbAT of Ancona-Navarrete and Tawn (2000) treats as independent two random samples, each of size n. Sample 1 is from a GEV(µ, σ, ξ) distribution and sample 2 is from a GEV(µθ , σθ , ξ) distribution. Here we take n = 1. Let I(µ, σ, ξ) denote the Fisher information matrix for a sample of size 1 from a GEV(µ, σ, ξ) distribution. This matrix can be inferred from Prescott and Walden (1980), who use a shape parameter k = −ξ. The calculation of the asymptotic precision of θbP requires that ξ > −1/2 (Smith, 1985). Let I1 and I2 denote the respective Fisher information matrices for the parameter vector (θ, µ, σ, ξ) from samples 1 and 2. I1 is given by 0 0 I1 = , 0T I(µ, σ, ξ) where 0 = (0, 0, 0). Let ψ = (µθ , σθ , ξ) and η = (θ, µ, σ, ξ) and ∆ij = ∂ψi /∂ηj . I2 is given by I2 = ∆T I(µθ , σθ , ξ)∆. The total information is I = I1 + I2 , giving 1/θ2 w I= , wT IGEV where w is a vector with non-zero entries and Im is the total Fisher information for (µ, σ, ξ). Block inversion of I gives the asymptotic precision of θbAT as −1 prec(θbAT ) = 1/θ2 − wIGEV wT > 1/θ2 = prec(θbM LE ), −1 the inequality following because IGEV is positive definite. Appendix B: estimating the variance of the score function Using the notation defined in 2 the log-likelihood for ψ = log θ is n X l(ψ) = ψ + (eψ − 1) log Vi . i=1 The score function is U = U (ψ) = n X Ui (ψ) = n X i=1 ψ 1 + e log Vi = i=1 n X {1 + θ log Vi } . i=1 b equal n. The expected information I(ψ), and the observed information J (ψ), The variance of the score is given by V = var n X ! {1 + θ log Vi } , i=1 = n X i=1 var {1 + θ log Vi } + 2 j−1 n X X j=2 i=1 13 cov {1 + θ log Vi , 1 + θ log Vj } . (5) P The first term of (5) is estimated by ni=1 (1 + θb log Vi )2 . To evaluate the covariance in the second term we ignore the possibility that either Yi < li or Yj < lj , as their contributions to the covariance are negligible. Let 1 (Si + Ti ) , log Vi ≈ b log m−b+1 1 log Vj ≈ b log (Sj + Tj ) , m−b+1 where Si = X I(Xk 6 Yi ), Sj = k∈B / i ∪Bj X Ti = X I(Xk 6 Yj ), k∈B / i ∪Bj I(Xk 6 Yi ), Tj = k∈Bj ∩Bic X I(Xk 6 Yj ). k∈Bi ∩Bjc Suppose that {Xk , k ∈ Bi } q {Xk , k ∈ Bj }, for i 6= j, that is, data from distinct blocks are independent. Consider the case where disjoint blocks are used. Then Si q Sj , Si q Tj and Sj q Ti , because in each case a block of Xs, and/or the maximum of these Xs, are compared to Xs from two other disjoint blocks. Then cov {1+θ log Vi , 1+θ log Vj } = θ2 cov(log Vi , log Vj ), 1 1 2 2 (Si +Ti )−1, (Sj +Tj )−1 , ≈ θ b cov m−b+1 m−b+1 θ 2 b2 cov(Ti , Tj ), = (m − b + 1)2 where we have used log x ≈ x − 1 for x ≈ 1. If Yi > Yj then Ti = b and Tj < b and if Yi < Yj then Ti < b and Tj = b. Thus, (Ti − b)(Tj − b) = 0 and cov(Ti , Tj ) = cov(Ti − b, Tj − b), = E [(Ti − b)(Tj − b)] − E(Ti − b)E(Tj − b), = − [E(T ) − b]2 , where E(T ) = bP (X 6 Y ) = b2 θ/(bθ + 1). As n(n − 1)/2 pairs of blocks contribute to the second term of (5) it is estimated by −n(n − 1)θb2 b4 /(m − b + 1)2 (bθb + 1)2 . For sliding blocks we note that the second term of (5) contains contributions from pairs of blocks that overlap and (n − b)(n − b + 1)/2 pairs that do not. For the latter the total contribution is estimated by −(n − b)(n − b + 1)θb2 b4 /(m − b + 1)2 (bθb + 1)2 . We P Pn−k b b estimate the total contribution of the former by 2 b−1 k=1 i=1 Ui (ψ)Ui+k (ψ). Thus, the estimators of V using disjoint and sliding blocks are respectively Vbd = n X (1 + θb log Vi )2 − i=1 n(n − 1)θb2 b4 (m − b + 1)2 (bθb + 1)2 and Vbs = n X i=1 (1 + θb log Vi )2 + 2 b−1 X n−k X k=1 i=1 b2 4 b i+k (ψ) b − (n − b)(n − b + 1)θ b . Ui (ψ)U (m − b + 1)2 (bθb + 1)2 In practice the contribution to the score function from the largest block maximum Y(n) is non-random, because log V(n) = b log[(m − b)/(m − b + 1)]. We adjust for this by removing from Vbd and Vbs contributions from V(n) . 14 Appendix C: Derivation of θb for some special cases With unit Fréchet margins ub = b/ log 2. The maxAR process. Following (Beirlant et al., 2004, chapter 10) G(x) = P (X1 6 x, . . . , Xb 6 x), = P (X1 6 x, θZ2 6 x, . . . , θZb 6 x), = exp {− [1 + θ(b − 1)] /x} . Therefore, θb = − log G(ub )/ log 2 = θ + (1 − θ)/b. The moving maxima process. Let αi+ = max(α0 , . . . , αi ) and αi− = max(αp , . . . , αi ). Then, for b > p, G(x) = P (X1 6 x, . . . , Xb 6 x), x x x x x x = P Z1 6 + , . . . , Zp 6 + , Zp+1 6 + , . . . , Zb 6 + , Zb+1 6 − , . . . , Zp+b 6 − , αp αp αp α0 αp−1 α1 ! , ) ( p p−1 X X + + αi− x . αi + (b − p)αp + = exp − i=1 i=0 Therefore, θb 1 = αp+ + b p−1 X αi+ + i=0 p X ! αi− − pαp+ , i=1 = θ + c/b. where 1 − θ 6 c 6 pθ. The lower bound is achieved if j = argmaxi {αi } ∈ / {0, p} and αi = (1 − θ)/p for i 6= j, or if {αi } are monotonic in i, and the upper bound when α0 = αp = θ. References Ancona-Navarrete, M. A. and J. A. Tawn (2000). A comparison of methods for estimating the extremal index. Extremes 3 (1), 5–38. Beirlant, J., Y. Goegebeur, J. Segers, and J. Teugels (2004). Statistics of Extremes: Theory and Applications. Chichester, England: Wiley. Canty, A. and B. D. Ripley (2010). boot: Bootstrap R (S-Plus) functions. R package version 1.3-4. Chandler, R. E. and S. B. Bate (2007). Inference for clustered data using the independence loglikelihood. Biometrika 94 (1), 167–183. Chavez-Demoulin, V. and A. C. Davison (2012). Modelling time series extremes. REVSTAT - Statistical Journal 10 (1), 109–133. Coles, S. G. (1991). Modelling extreme multivariate events. Ph. D. thesis, University of Sheffield, Sheffield, UK. 15 Dabrowska, D. M., K. A. Doksum, and R. Miura (1989). Rank estimates in a class of semiparametric two-sample models. Ann. Inst. Statist. Math. 41 (1), 63–79. Davis, R. and S. Resnick (1989). Basic properties and prediction of max-ARMA processes. Adv. Appl. Prob. 21, 781–803. Deheuvels, P. (1983). Point processes and multivariate extreme values. J. Multiv. Anal. 13, 257–272. Fawcett, L. and D. Walshaw (2012). Estimating return levels from serially dependent extremes. Environmetrics 23, 272–283. Ferro, C. and J. Segers (2003). Inference for clusters of extreme values. J. R. Statist. Soc. B 65, 545–556. Gomes, M. (1993). On the estimation of parameters of rare events in environmental time series. In V. Barnett and K. Turkman (Eds.), Statistics for the environment 2: Water Related Issues, pp. 225–241. Wiley. Guenther, W. (1967). A best statistic with variance not equal to the Cramér-Rao lower bound. American Mathematical Monthly 74, 993–994. Hayfield, T. and J. S. Racine (2008). Nonparametric econometrics: The np package. Journal of Statistical Software 27 (5). Johnson, N. L. and S. Kotz (1970). Continuous Univariate Distributions Volume 2. New York: Wiley. Laurini, F. and J. Tawn (2003). New estimators for the extremal index and other cluster characteristics. Extremes 6, 189–211. Leadbetter, M., G. Lindgren, and H. Rootzén (1983). Extremes and related properties of random sequences and series. New York: Springer Verlag. Ledford, A. W. and J. Tawn (2003). Diagnostics for dependence within time series extremes. J. R. Statist. Soc. B 65 (2), 521–543. Lumley, T. and P. Heagerty (1999). Weighted empirical adaptive variance estimators for correlated data regression. J. R. Statist. Soc. B. 61, 459–477. Northrop, P. J. (2005). Semiparametric estimation of the extremal index using block maxima. Technical Report 259, University College London. Patton, A., D. Politis, and H. White (2009). Correction to “Automatic block-length selection for the dependent bootstrap” by D. Politis and H. White. Econometric Reviews 28 (4), 372–375. Politis, D. and J. Romano (1994). The stationary bootstrap. J. Am. Statist. Ass. 89, 1303–1313. Prescott, P. and A. Walden (1980). Maximum likelihood estimation of the parameters of the generalized extreme value distribution. Biometrika 67 (3), 723–724. Robert, C. (2009). Inference for the limiting cluster size distribution of extreme values. Ann. Statist. 37 (1), 271–310. 16 Robert, C., J. Segers, and C. A. T. Ferro (2009). A sliding blocks estimator for the extremal index. Electronic Journal of Statistics 3, 993–1020. Smith, R. L. (1985). Maximum likelihood estimation in a class of non-regular cases. Biometrika 72, 67–92. Smith, R. L. (1992). The extremal index for a Markov Chain. J. App. Prob. 29 (1), 37–45. Smith, R. L. and I. Weissman (1994). Estimating the extremal index. J. R. Statist. Soc. B 56, 515–528. Süveges, M. (2007). Likelihood estimation of the extremal index. Extremes 10, 41–55. Süveges, M. and A. C. Davison (2010). Model misspecification in peaks over threshold analysis. Ann. Appl. Statist. 4, 203–221. Tawn, J. A. (1988). Bivariate extreme value theory: Biometrika 75, 397–415. models and estimation. White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrika 50, 1–25. 17