Please cite: Northrop, P. J. (2015) An efficient semiparametric maxima estimator of
the extremal index. Extremes, 18(4), 585-603, doi: 10.1007/s10687-015-0221-5.
Semiparametric estimation of the extremal index
using block maxima∗
Paul J. Northrop
Department of Statistical Science
University College London, Gower Street, London WC1E 6BT, UK
p.northrop@ucl.ac.uk
September 16, 2012
Abstract
The extremal index θ, a measure of the degree of local dependence in the extremes
of a stationary process, plays an important role in extreme value analyses. We estimate
θ semiparametrically, using the relationship between the distribution of block maxima
and the marginal distribution of a process to define a semiparametric model. We show
that these semiparametric estimators are simpler and substantially more efficient than
their parametric counterparts. We seek to improve efficiency further using maxima over
sliding blocks. An application to sea-surge heights combines inferences about θ with a
standard extreme value analysis of block maxima to estimate marginal quantiles.
Keywords: Extremal index; semiparametric estimation; extreme value theory; block
maxima; sea-surge heights.
1 Introduction
The modelling of rare events in stationary processes is important in many application
areas. The extremal behaviour of such a process is governed by its marginal distribution
and by its extremal dependence structure. Chavez-Demoulin and Davison (2012) provide
a review of this area, concentrating on the latter aspect. For processes satisfying the
$D(u_n)$ condition of Leadbetter et al. (1983), which limits long-range dependence at extreme levels, the extremal index $\theta$ ($0 \le \theta \le 1$) is the primary measure of short-range extremal dependence.
Following Leadbetter et al. (1983), let $X_1, X_2, \ldots$ be a strictly stationary sequence of random variables that satisfies the $D(u_n)$ condition and has marginal distribution function $F$. Let $M_b = \max(X_1, \ldots, X_b)$. For large $b$ and $u_b$ the distribution function $G$ of the block maximum $M_b$ is approximately related to $F$ via
\[ G(u_b) = P(M_b \le u_b) \approx F^{b\theta}(u_b). \tag{1} \]
∗ Research Report number 318, Department of Statistical Science, University College London. Date: September 2012
Further, if there exist normalizing constants $c_b$ and $d_b$ such that
\[ P(M_b \le c_b x + d_b) \to G(x), \qquad \text{as } b \to \infty, \]
then $G(x)$ is the distribution function of a GEV distribution. The corresponding result for $M_b^* = \max(X_1^*, \ldots, X_b^*)$, where $X_1^*, X_2^*, \ldots$ are independent variables with distribution function $F$, gives the limiting distribution function $H(x) = G(x)^{1/\theta}$. Thus, the limiting distributions of $M_b$ and $M_b^*$ are GEV, with respective location, scale and shape parameters $(\mu_\theta, \sigma_\theta, \xi)$ and $(\mu, \sigma, \xi)$, say, related by
\[ \mu_\theta = \mu + \sigma(\theta^\xi - 1)/\xi, \qquad \sigma_\theta = \sigma\theta^\xi. \tag{2} \]
For finite $b$ a block-size-dependent index $\theta_b$ applies, with $\theta_b \to \theta$ as $b \to \infty$. Smith (1992) defines $\theta_b$ as $\theta_b = -\log G(u_b)/\log 2$, where $F^b(u_b) = 1/2$. We will only make explicit the dependence of $\theta$ on $b$ when necessary.
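As an illustration of relation (2), the following Python sketch maps the GEV parameters of independent maxima to those of maxima from a dependent series, and evaluates both distribution functions so that $G = H^\theta$ can be checked numerically. The function names are hypothetical, and the treatment of $\xi = 0$ as the limit $\mu_\theta = \mu + \sigma\log\theta$, $\sigma_\theta = \sigma$ is an assumption based on the standard limit of $(\theta^\xi - 1)/\xi$.

```python
import math


def gev_params_dependent(mu, sigma, xi, theta):
    """Apply relation (2): (mu, sigma, xi) -> (mu_theta, sigma_theta, xi)."""
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    if abs(xi) < 1e-12:
        # Limiting case xi -> 0: (theta**xi - 1) / xi tends to log(theta).
        return mu + sigma * math.log(theta), sigma, xi
    sigma_theta = sigma * theta ** xi
    mu_theta = mu + sigma * (theta ** xi - 1.0) / xi
    return mu_theta, sigma_theta, xi


def gev_cdf(x, mu, sigma, xi):
    """Distribution function of a GEV(mu, sigma, xi) variable."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))  # Gumbel case
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below it if xi > 0, above it if xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


if __name__ == "__main__":
    mu, sigma, xi, theta = 0.5, 0.2, 0.1, 0.3
    g = gev_cdf(1.0, *gev_params_dependent(mu, sigma, xi, theta))
    h = gev_cdf(1.0, mu, sigma, xi)
    print(g, h ** theta)  # the two values should agree
```

The check at the end compares $G(x)$ with $H(x)^\theta$ at a single point; agreement at any $x$ inside the support follows from the algebra behind (2).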
Two types of analysis where the estimation of θ is vital are: block maxima are modelled
and estimates of high marginal quantiles of X are required (type 1); exceedances of a high
threshold are modelled and estimates of return levels (high quantiles of the distribution
of annual maxima) are required (type 2). Ignoring θ may lead to underestimation of
quantiles in analyses of type 1 and overestimation of return levels (and misinterpretation
of their meaning) in analyses of type 2 (Beirlant et al., 2004; Chavez-Demoulin and
Davison, 2012).
Recent advances in the estimation of θ (Ferro and Segers, 2003; Süveges, 2007; Süveges
and Davison, 2010; Laurini and Tawn, 2003) have concentrated on threshold methods,
based on exceedances of a threshold. Maxima methods (Gomes, 1993; Ancona-Navarrete
and Tawn, 2000), described in section 1.1, are also used and Fawcett and Walshaw
(2012) advocate the use of either the intervals estimator of Ferro and Segers (2003) or
the maxima estimator of Gomes (1993) for analyses of type 2. In analyses of type 1 it
seems natural to use a maxima method to estimate θ, but it seems that less attention
has been devoted to improving existing maxima methods. We propose a new maxima
method that is simpler and has much greater statistical efficiency than existing maxima
methods.
1.1 Maxima methods
Parametric maxima methods are based on fitting GEV distributions to two sets of maxima of $b$ consecutive observations. The first sample $M_i$, $i = 1, \ldots, n$, consists of block maxima of the original series. The second sample $M_i^*$, $i = 1, \ldots, n$, consists of block maxima of the series obtained by randomising the index of the original series, to obtain approximately an independent series with the same marginal distribution as the original sequence. Based on (2), Gomes (1993) fits a GEV$(\mu_\theta, \sigma_\theta, \xi)$ distribution to $\{M_i\}$ and a GEV$(\mu, \sigma, \xi)$ distribution to $\{M_i^*\}$ and constructs the estimator $\hat{\theta}_G = (\hat{\sigma}/\hat{\sigma}_\theta)^{-1/\tilde{\xi}}$, where $\tilde{\xi} = (\hat{\sigma} - \hat{\sigma}_\theta)/(\hat{\mu} - \hat{\mu}_\theta)$. A block size of $b = \sqrt{m}$ is suggested, where $m$ is the length of the series. Ancona-Navarrete and Tawn (2000) combine the two GEV fits into one by maximising a likelihood (with respect to $(\mu, \sigma, \xi, \theta)$), assuming that $\{M_i\}$ and $\{M_i^*\}$ are independent. We call the resulting estimator $\hat{\theta}_{AT}$. In one sense parametric maxima methods are anomalous: other methods of estimating $\theta$ do so directly, without embedding $\theta$ in a larger model with nuisance parameters. Northrop (2005) proposes a semiparametric maxima estimator. The relationship $G = H^\theta$ is used but no particular parametric form is assumed for $G$ or $H$.
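The estimator of Gomes (1993) described above is a closed-form function of the two sets of fitted GEV location and scale parameters. A minimal Python sketch follows; it assumes the four estimates have already been obtained by some GEV fitting routine (not shown), and the function name `gomes_estimator` is hypothetical.

```python
def gomes_estimator(mu_hat, sigma_hat, mu_theta_hat, sigma_theta_hat):
    """Return (theta_G, xi_tilde) from two sets of fitted GEV parameters.

    (mu_hat, sigma_hat): fit to maxima of the randomised series.
    (mu_theta_hat, sigma_theta_hat): fit to maxima of the original series.
    """
    if mu_hat == mu_theta_hat:
        # xi_tilde is undefined when the two location estimates coincide.
        raise ValueError("location estimates coincide; xi_tilde is undefined")
    xi_tilde = (sigma_hat - sigma_theta_hat) / (mu_hat - mu_theta_hat)
    if xi_tilde == 0.0:
        raise ValueError("xi_tilde is zero; the estimator is undefined")
    theta_g = (sigma_hat / sigma_theta_hat) ** (-1.0 / xi_tilde)
    return theta_g, xi_tilde


if __name__ == "__main__":
    # Parameters constructed from relation (2) with theta = 0.3, xi = 0.1,
    # so the estimator should return these values.
    mu, sigma, xi, theta = 0.5, 0.2, 0.1, 0.3
    sigma_theta = sigma * theta ** xi
    mu_theta = mu + sigma * (theta ** xi - 1.0) / xi
    print(gomes_estimator(mu, sigma, mu_theta, sigma_theta))
```

When the inputs satisfy (2) exactly the function recovers $\theta$ and $\xi$; with estimated inputs the result can fall outside $(0, 1]$, which the sketch does not guard against.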
An undesirable feature of existing maxima methods is the need to resample the original
data to produce a sample of block maxima with approximate c.d.f. H. In section
2 we show this is unnecessary: more efficient estimators of θ can be constructed by
comparing G directly to F , without generating pseudo-samples from H. The theoretical
gain in efficiency is quantified in section 2.3 and in section 2.1 arguments in favour
of the semiparametric estimators are given. In section 2.2 we show that one of these
estimators is approximately an extended version of the blocks estimator of Robert (2009).
In sections 3 and 4 we carry out a simulation study and an extreme value analysis of
type 1 on sea-surge data.
2 Semiparametric estimators of θ
Let $X_1, \ldots, X_m$ denote a strictly stationary sequence of random variables with marginal distribution function $F$ and extremal index $\theta$. Let $M[s, t] = \max_{s < k \le t} X_k$. Consider $Y = M[s, s + b]$, the maximum of a block of $b$ consecutive $X$s. Let $V = F(Y)^b$. For the moment we assume that $F$ is known. Under (1), $V$ has a Beta$(\theta, 1)$ distribution, with distribution function $v^\theta$, $0 < v < 1$.
We construct two sets of block maxima: $Y_i^d = M[(i-1)b, ib]$, $i = 1, \ldots, \lfloor m/b \rfloor$ (disjoint blocks) and $Y_i^s = M[i-1, i+b-1]$, $i = 1, \ldots, m-b+1$ (sliding blocks). We treat $\{V_i^d\} = \{F(Y_i^d)^b\}$ as a random sample, and $\{V_i^s\} = \{F(Y_i^s)^b\}$ as a dependent sample, from a Beta$(\theta, 1)$ distribution. We expect (as in Robert et al. (2009)) that $\{Y_i^s\}$ contains more information about $\theta$ than $\{Y_i^d\}$ and thus produces a more efficient estimator of $\theta$.
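The two sets of block maxima can be computed directly from their definitions. The Python sketch below uses 0-based list indexing, so block $i$ of the text corresponds to the slice starting at index $(i-1)b$ (disjoint) or $i-1$ (sliding); the function names are hypothetical and the sliding version is deliberately naive rather than efficient.

```python
def disjoint_block_maxima(x, b):
    """Maxima over floor(m / b) non-overlapping blocks of length b."""
    n = len(x) // b
    return [max(x[i * b:(i + 1) * b]) for i in range(n)]


def sliding_block_maxima(x, b):
    """Maxima over all m - b + 1 windows of b consecutive values."""
    return [max(x[i:i + b]) for i in range(len(x) - b + 1)]


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]
    print(disjoint_block_maxima(x, 3))  # two complete blocks
    print(sliding_block_maxima(x, 3))   # six overlapping windows
```

Any incomplete final block is dropped by the disjoint version, in line with the index range $i = 1, \ldots, \lfloor m/b \rfloor$.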
The maximum likelihood estimator of $\theta$ based on a random sample $V_1, \ldots, V_n$ from a Beta$(\theta, 1)$ distribution is $\hat{\theta}_{SP} = -1/\overline{\log V}$, where $\overline{\log V} = \sum_{i=1}^n \log V_i / n$. Guenther (1967) shows that the minimum variance unbiased estimator of $\theta$ is $\hat{\theta}_{SP2} = (n-1)\hat{\theta}_{SP}/n$ with var$(\hat{\theta}_{SP2}) = \theta^2 (n-2)^{-1}$, so that var$(\hat{\theta}_{SP}) = n^2 \theta^2 (n-2)^{-1} (n-1)^{-2}$. Thus, $\psi = \log\theta$ is an approximate variance-stabilizing transformation. The moment estimator $\hat{\theta}_{SP3} = \bar{V}/(1 - \bar{V})$ has low efficiency for $\theta \le 1$ (Johnson and Kotz, 1970, chapter 24).
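The three estimators above are simple functions of a Beta$(\theta, 1)$ sample. The following Python sketch computes them and tries them on simulated data; the simulation step uses the fact that $U^{1/\theta}$ has distribution function $v^\theta$ when $U$ is uniform on $(0, 1)$. The function names, seed and sample size are illustrative choices rather than anything prescribed by the text.

```python
import math
import random


def theta_sp(v):
    """Maximum likelihood estimator: -1 / mean(log V)."""
    mean_log = sum(math.log(vi) for vi in v) / len(v)
    return -1.0 / mean_log


def theta_sp2(v):
    """Minimum variance unbiased estimator: (n - 1) / n times the MLE."""
    n = len(v)
    return (n - 1) * theta_sp(v) / n


def theta_sp3(v):
    """Moment estimator: mean(V) / (1 - mean(V))."""
    vbar = sum(v) / len(v)
    return vbar / (1.0 - vbar)


def simulate_beta_theta_1(theta, n, rng):
    """Draw n values with distribution function v**theta on (0, 1]."""
    # 1 - rng.random() lies in (0, 1], which avoids log(0) later on.
    return [(1.0 - rng.random()) ** (1.0 / theta) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    v = simulate_beta_theta_1(0.5, 20000, rng)
    print(theta_sp(v), theta_sp2(v), theta_sp3(v))
```

With a large simulated sample all three values should be close to the true $\theta$, the moment estimator being the most variable.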
We use $X_1, \ldots, X_m$ to estimate $F$ so that $\hat{\theta}_{SP}$ is a Rank Approximate M-estimate (Dabrowska et al., 1989). We make an adjustment for the fact that, for example, the values $X_1, \ldots, X_b$ cannot exceed $Y_1 = M[0, b]$. Let $B_i$ be the set of indices of the $X$s that contribute to block maximum $Y_i$. Let $l_i = \min_{k \notin B_i} X_k$ and $u_i = \max_{k \notin B_i} X_k$. For $l_i \le y \le u_i$ we let
\[ \hat{F}_{-i}(y) = \frac{1}{m-b+1} \sum_{k \notin B_i} I(X_k \le y) \]
and use it to define $\hat{V}_i = \hat{F}_{-i}(Y_i)^b$. Thus, the $X$s that contribute to block maximum $Y_i$ are not used in the estimate of $F$ applied to $Y_i$. Following Dabrowska et al. (1989) we set $\hat{F}_{-i}(y)$ to $1/(m-b+n+1)$ if $y < l_i$, although in practice $Y_i < l_i$ is unlikely. By construction $Y_i > u_i$ is not possible.
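The leave-block-out estimate of $F$ and the resulting values $\hat{V}_i$ can be sketched as follows for disjoint blocks. This is an illustrative Python implementation of the definition above, not the author's code: the function name is hypothetical, and the case $Y_i < l_i$ is detected by the count of values not exceeding $Y_i$ being zero.

```python
def v_hat_disjoint(x, b):
    """Return V-hat_i = F-hat_{-i}(Y_i)**b for each disjoint block."""
    m = len(x)
    n = m // b
    v = []
    for i in range(n):
        y = max(x[i * b:(i + 1) * b])            # block maximum Y_i
        others = x[:i * b] + x[(i + 1) * b:]     # the m - b values outside block i
        count = sum(1 for xk in others if xk <= y)
        if count > 0:
            f = count / (m - b + 1)
        else:
            # Y_i lies below every value outside its block (y < l_i).
            f = 1.0 / (m - b + n + 1)
        v.append(f ** b)
    return v


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]
    print(v_hat_disjoint(x, 4))
```

Because the divisor is $m - b + 1$ rather than $m - b$, the estimate of $F$ stays strictly below 1 even for the block containing the overall maximum, so $\log \hat{V}_i$ is always finite.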
Let $R = (R_1, \ldots, R_n)$ be the ranks of the block maxima $Y = (Y_1, \ldots, Y_n)$ within $X_1, \ldots, X_m$, so that if $R_i = r_i$ then $r_i - 1$ of $\{X_k, k \notin B_i\}$ are larger than $Y_i$ and $m - b + 1 - r_i$ are smaller than $Y_i$. Thus, $\hat{V}_i = (1 - r_i/(m-b+1))^b$. The ranks convey no information about the marginal distribution of $Y$. Therefore, assuming a GEV$(\mu_\theta, \sigma_\theta, \xi)$ model for the block maxima, the joint likelihood factorises as
\[ L(\theta, \mu_\theta, \sigma_\theta, \xi; R, Y) = L_{Beta}(\theta_b; R)\, L_{GEV}(\mu_\theta, \sigma_\theta, \xi; Y), \tag{3} \]
so that independent inferences can be made about $\theta$ and $(\mu_\theta, \sigma_\theta, \xi)$. In (3) the rank likelihood $L_{Beta}(\theta; R) = \theta^n \left(\prod_{i=1}^n v_i\right)^{\theta-1}$ is constructed under the assumption that $V_1, \ldots, V_n$ are independent. This is far from the case if sliding blocks are used: successive values of $V$ will be strongly positively associated, and pairs $(V_i, V_j)$ from disjoint blocks are negatively associated (see appendix B). We attempt to adjust for this using an information sandwich estimator (White, 1982) $J(\hat{\psi})^{-1} \hat{V}(\hat{\psi}) J(\hat{\psi})^{-1}$ for the asymptotic variance of $\hat{\psi}$, where the observed information $J(\hat{\psi}) = n$ and $\hat{V}(\hat{\psi})$ is an estimate of the variance of the score function. Details of the estimation of $V(\psi)$ are given in appendix B.
2.1 Desirable properties of these estimators
The estimator $\hat{\theta}_{SP}$ is simple and non-iterative. Sections 2.3 and 3 show that it is more efficient than its parametric counterparts. The extremal index measures local dependence in extremes and is independent of the marginal distribution of the process. Since $\hat{\theta}_{SP}$ is determined by the ranks of the raw data it is invariant to marginal transformation, whereas the parametric alternatives are not. In common with threshold methods (for example, Süveges and Davison (2010)), $\theta$ is estimated directly, rather than as part of a larger extreme value model, that is, estimation of extremal dependence and marginal behaviour are separated. It may be that the assumption ($G = F^{b\theta_b}$) underlying $\hat{\theta}_{SP}$ is reasonable for smaller block sizes than the parametric GEV assumptions. Section 3 gives examples where the rate of convergence of $\theta_b$ to $\theta$ as $b \to \infty$ is $O(1/b)$. Thus, convergence to $G = F^{b\theta}$ could be relatively fast even if convergence to the limiting GEV form, which depends on the marginal distribution, is slow. The semiparametric framework permits the use of a relatively small block size, whereas the parametric alternatives do not.
2.2 Link to the blocks estimator
Blocks estimators of $\theta$ require the choice of a threshold $u$ and a block length $b$. We show that $\hat{\theta}_{SP}$ is asymptotically equivalent to using the blocks estimator of Smith and Weissman (1994) (sometimes called the logs estimator) with a set of random thresholds. The Smith and Weissman (1994) blocks estimator is $\hat{\theta}_B = \log \hat{G}(u) / \{b \log \hat{F}(u)\}$. Robert (2009) extends this idea by using a random threshold set at a particular sample quantile.
Suppose that we use a set of random thresholds $u_i = Y_i$, where $Y_i$, $i = 1, \ldots, n$, are a sample of block maxima over blocks of length $b$. We may think of this as using the data to define a set of local thresholds. Each of the ratios $\log \hat{G}(Y_i) / \{b \log \hat{F}(Y_i)\}$ is an estimator of $\theta$. We combine these using the ratio estimator
\[ \hat{\theta}_{RB} = \frac{\frac{1}{n}\sum_{i=1}^n \log \hat{G}(Y_i)}{\frac{b}{n}\sum_{i=1}^n \log \hat{F}(Y_i)} = \frac{\frac{1}{n}\sum_{i=1}^n \log \hat{G}(Y_i)}{\frac{1}{n}\sum_{i=1}^n \log V_i}. \tag{4} \]
If disjoint blocks are used to construct $\{Y_i\}$ then $\hat{G}(Y_{(i)}) = i/n$ and, by Stirling's formula, as $n \to \infty$ the numerator of (4) decreases to $-1$. Therefore, for sufficiently large $n$, $\hat{\theta}_{SP} \approx \hat{\theta}_{RB}$.
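The limit used for the numerator of (4) can be checked numerically. With $\hat{G}(Y_{(i)}) = i/n$ the numerator is $n^{-1}\sum_{i=1}^n \log(i/n) = n^{-1}\log(n!/n^n)$, which Stirling's formula shows is approximately $-1 + \log(2\pi n)/(2n)$. The short Python sketch below (the function name is hypothetical) evaluates it for increasing $n$.

```python
import math


def numerator_of_4(n):
    """Mean of log(i / n) over i = 1, ..., n."""
    return sum(math.log(i / n) for i in range(1, n + 1)) / n


if __name__ == "__main__":
    for n in (20, 70, 245, 10000):
        approx = -1.0 + math.log(2.0 * math.pi * n) / (2.0 * n)
        print(n, numerator_of_4(n), approx)
```

For the block counts used later in the simulation study ($n$ = 20, 70, 245) the numerator is still visibly above $-1$, so the equivalence of $\hat{\theta}_{SP}$ and $\hat{\theta}_{RB}$ is only approximate at those sample sizes.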
2.3 Theoretical comparison with parametric maxima method
The calculations in appendix A show that, subject to the assumptions made above, $\hat{\theta}_{SP}$ has a smaller asymptotic variance than $\hat{\theta}_{AT}$. The efficiency of $\hat{\theta}_{AT}$ relative to $\hat{\theta}_{SP}$ depends on $\theta$ (see figure 1) but $\hat{\theta}_{AT}$ is at best 50% efficient (when $\theta = 1$). This is expected because the resampling by the parametric estimator introduces an extra source of variability.
[Figure 1: plot of % relative efficiency of parametric estimator (vertical axis, 10 to 50) against θ (horizontal axis, 0 to 1), with curves for ξ = −0.4, ξ = 0 and ξ = 0.4.]
Figure 1: Asymptotic relative efficiency of the parametric maximum likelihood estimator of θ compared to the semiparametric maximum likelihood estimator, for a range of values of GEV shape parameter ξ.
3 Simulation study
We conduct a simulation study to compare the performances of the semiparametric estimators to the parametric estimators $\hat{\theta}_{AT}$ and $\hat{\theta}_G$ for different processes, marginal distributions and block sizes. In the following $Z_i$, $i = 1, 2, \ldots$, are independent unit Fréchet random variables with $P(Z \le z) = \exp(-1/z)$, for $z > 0$.
A moving maxima process $X_1, X_2, \ldots$ (Deheuvels, 1983) is defined by
\[ X_i = \max_{j=0,\ldots,p} \{\alpha_j Z_{i+j}\}, \qquad i = 1, 2, \ldots, \]
where $\{\alpha_j\}$ are constants satisfying $\alpha_0 > 0$, $\alpha_p > 0$ and $\alpha_j \ge 0$, for $j = 1, \ldots, p-1$, with $\sum_{j=0}^p \alpha_j = 1$. For simplicity here we assume that $p$ is finite. This process has extremal index $\theta = \max_{i=0,\ldots,p}(\alpha_i)$.
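A moving maxima process of this form is straightforward to simulate. The Python sketch below generates unit Fréchet variables by inversion, $Z = -1/\log U$ with $U$ uniform on $(0, 1)$, and then applies the definition. The function names, the seed and the choice $\alpha = (1/4, 1/4, 1/4, 1/4)$ are illustrative assumptions.

```python
import math
import random


def unit_frechet(rng):
    """One draw with P(Z <= z) = exp(-1 / z), by inversion."""
    u = rng.random()
    while u == 0.0:          # rng.random() lies in [0, 1); reject 0
        u = rng.random()
    return -1.0 / math.log(u)


def moving_maxima(alpha, m, rng):
    """Simulate X_1, ..., X_m with X_i = max_j alpha_j * Z_{i+j}."""
    p = len(alpha) - 1
    z = [unit_frechet(rng) for _ in range(m + p)]
    return [max(a * z[i + j] for j, a in enumerate(alpha)) for i in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = [0.25, 0.25, 0.25, 0.25]     # sums to 1, so theta = max(alpha) = 0.25
    x = moving_maxima(alpha, 4900, rng)
    median = 1.0 / math.log(2.0)          # median of a unit Frechet variable
    print(sum(xi <= median for xi in x) / len(x))  # should be near 0.5
```

Because the $\alpha_j$ sum to one the simulated series has unit Fréchet margins, which the final line checks through the proportion of values below the theoretical median.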
A first-order Markov chain $X_1, X_2, \ldots$ with symmetric logistic extreme value transition distribution (Tawn, 1988) and unit Fréchet margins is implied by
\[ P(X_i \le x_i, X_{i+1} \le x_{i+1}) = \exp\left\{ -\left( x_i^{-1/\beta} + x_{i+1}^{-1/\beta} \right)^{\beta} \right\}, \]
where $\beta \in (0, 1]$. The extremal index can be estimated by simulation (Smith, 1992).
A max-autoregressive (maxAR) process $X_1, X_2, \ldots$ (Davis and Resnick, 1989) is defined by
\[ X_i = \max\{(1 - \theta) X_{i-1}, \theta Z_i\}, \qquad i = 1, 2, \ldots, \]
where $0 < \theta \le 1$ and $X_0$ has a unit Fréchet distribution. The extremal index of this process is $\theta$. The maxAR process is an infinite-order moving maxima process with $\alpha_j = \theta(1 - \theta)^j$, $j = 0, 1, \ldots$. It is also a first-order Markov chain with a particular bivariate extreme value transition distribution (Ledford and Tawn, 2003).
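The maxAR recursion can be simulated in a few lines. The following Python sketch starts from a unit Fréchet $X_0$ and returns $X_1, \ldots, X_m$; the function names and the seed are hypothetical choices for illustration.

```python
import math
import random


def unit_frechet(rng):
    """One draw with P(Z <= z) = exp(-1 / z), by inversion."""
    u = rng.random()
    while u == 0.0:          # rng.random() lies in [0, 1); reject 0
        u = rng.random()
    return -1.0 / math.log(u)


def simulate_maxar(theta, m, rng):
    """Simulate X_1, ..., X_m with X_i = max((1 - theta) X_{i-1}, theta Z_i)."""
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    prev = unit_frechet(rng)             # X_0
    x = []
    for _ in range(m):
        prev = max((1.0 - theta) * prev, theta * unit_frechet(rng))
        x.append(prev)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    x = simulate_maxar(0.5, 4900, rng)
    median = 1.0 / math.log(2.0)
    print(sum(xi <= median for xi in x) / len(x))  # should be near 0.5
```

Starting the recursion from a unit Fréchet $X_0$ keeps every $X_i$ unit Fréchet, since $P(X_i \le x) = \exp\{-(1-\theta)/x\}\exp(-\theta/x)$.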
A first-order Gaussian autoregressive (AR(1)) process is defined by
\[ X_i = \alpha X_{i-1} + \epsilon_i, \qquad i = 1, 2, \ldots, \]
where $\epsilon_1, \epsilon_2, \ldots$ are i.i.d. $N(0, 1 - \alpha^2)$ random variables, $X_0 \sim N(0, 1)$ and we assume $|\alpha| < 1$ for second-order stationarity. This process exhibits serial dependence but limiting extremal independence because $\theta = 1$ (Leadbetter et al., 1983, chapter 4).
For the maxAR process $\theta_b = \theta + (1 - \theta)/b$ and for the moving maxima process $\theta_b = \theta + c/b$, where $1 - \theta \le c \le p\theta$ (see appendix C). Where analytical results are not possible simulation can be used to estimate $\theta_b$ but Ancona-Navarrete and Tawn (2000) note that even very long simulations may produce unacceptably imprecise estimates. We have found this to be the case for the AR(1) process. For the logistic Markov chain Smith (1992) used simulation to estimate $\theta_b$, finding that even longer simulations would be required for estimates to respect the conjectured monotonicity of $\theta_b$ as $b$ increases.
For a given process we simulate 500 sequences of length $m = 4{,}900$ and estimate $\theta_b$ using the semiparametric estimators (disjoint and sliding blocks) and $\hat{\theta}_{AT}$ and $\hat{\theta}_G$ for $b = 20, 70, 245$ ($n = 245, 70, 20$). To examine the impact of marginal distribution we apply $\hat{\theta}_{AT}$ and $\hat{\theta}_G$ after transformation to Gumbel, Gaussian and exponential margins. In fact the marginal distribution has a relatively small effect on the overall performance of $\hat{\theta}_{AT}$ and $\hat{\theta}_G$ so we present results only for Gumbel margins. However, figure 2 compares the estimates produced by $\hat{\theta}_{AT}$ for a moving maxima process with Gaussian and Gumbel margins. For small $b = 20$ ($n = 245$) the agreement is very close but for $b = 245$ ($n = 20$) there is large disagreement for some datasets. We present results (tables 1-5) for the same processes for which results are presented in Ancona-Navarrete and Tawn (2000), although we have checked that the findings apply across other levels of dependence. We present results only for $\hat{\theta}_{SP}$. The performance of $\hat{\theta}_{SP2}$ is very similar and, as expected, both of these estimators outperform $\hat{\theta}_{SP3}$.

[Figure 2: two scatter plots of $\hat{\theta}_b$ (Gaussian margins) against $\hat{\theta}_b$ (Gumbel margins). Left panel: b=20. Right panel: b=245.]
Figure 2: Estimates $\hat{\theta}_{AT}$ based on data simulated from a moving maxima process with α = (1/4, 1/4, 1/4, 1/4): Gaussian margins against Gumbel margins. Left: b = 20. Right: b = 245.

  b    estimator  θ_b   mean   RMSE    SD      SE      adj SE  boot SE  rel eff
  20   SP,d       1     1.00   0.045   0.045   0.064   0.063   0.044    -
  20   SP,s       1     1.00   0.031   0.031   0.014   0.055   0.033    2.06
  20   P,AT       1     1.01   0.068   0.068   0.091   -       -        0.43
  20   P,G        1     1.00   0.060   0.060   -       -       -        0.55
  70   SP,d       1     0.99   0.081   0.080   0.122   0.113   0.083    -
  70   SP,s       1     0.99   0.064   0.062   0.014   0.098   0.062    1.64
  70   P,AT       1     1.01   0.116   0.116   0.171   -       -        0.48
  70   P,G        1     1.01   0.108   0.108   -       -       -        0.55
  245  SP,d       1     0.98   0.149   0.148   0.242   0.186   0.155    -
  245  SP,s       1     0.96   0.118   0.111   0.014   0.158   0.118    1.77
  245  P,AT       1     1.04   0.236   0.233   0.333   -       -        0.40
  245  P,G        1     1.04   0.234   0.231   -       -       -        0.41
Table 1: Independent data; θ = 1. Estimators: SP,d: $\hat{\theta}_{SP}$ (disjoint blocks); SP,s: $\hat{\theta}_{SP}$ (sliding blocks); P,AT: $\hat{\theta}_{AT}$; P,G: $\hat{\theta}_G$. Sampling distribution mean, root mean square error (RMSE) and standard deviation (SD), mean standard error (SE), sandwich adjusted standard error (adj SE) and bootstrap standard error (boot SE) and efficiency relative to SP,d (rel eff).
We calculate naive standard errors (for $\hat{\theta}_{SP}$ using section 2 and for $\hat{\theta}_{AT}$ using appendix
A). For $\hat{\theta}_{SP}$ we also calculate standard errors based on a sandwich estimator (appendix
B) and on 100 stationary block bootstrap resamples (Politis and Romano, 1994) with
optimal block length chosen using Patton et al. (2009), implemented using Canty and
Ripley (2010) and Hayfield and Racine (2008).
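For readers unfamiliar with the stationary bootstrap of Politis and Romano (1994), the Python sketch below shows the resampling step: blocks start at uniformly chosen positions, have geometrically distributed lengths and wrap around the end of the series. It is a simplified illustration, not the implementation used in the study: the mean block length is supplied by the user rather than chosen by the method of Patton et al. (2009), and all function names are hypothetical.

```python
import math
import random


def stationary_bootstrap_indices(m, mean_block_length, rng):
    """Indices of one stationary bootstrap resample of a series of length m."""
    p = 1.0 / mean_block_length           # probability of starting a new block
    idx = [rng.randrange(m)]
    while len(idx) < m:
        if rng.random() < p:
            idx.append(rng.randrange(m))       # start a new block
        else:
            idx.append((idx[-1] + 1) % m)      # continue block, wrapping circularly
    return idx


def bootstrap_se(x, statistic, mean_block_length, n_boot, rng):
    """Standard deviation of a statistic over n_boot bootstrap resamples."""
    stats = []
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(len(x), mean_block_length, rng)
        stats.append(statistic([x[k] for k in idx]))
    mean = sum(stats) / n_boot
    return math.sqrt(sum((s - mean) ** 2 for s in stats) / (n_boot - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(400)]
    # Standard error of the sample mean; roughly 1 / sqrt(400) = 0.05 here.
    print(bootstrap_se(x, lambda s: sum(s) / len(s), 5.0, 200, rng))
```

To obtain a bootstrap standard error for an extremal index estimator, `statistic` would be replaced by a function that recomputes the estimate from each resampled series.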
All estimators are approximately unbiased for $\theta_b$ (where this is known). When using
disjoint blocks the semiparametric estimator outperforms the parametric estimators approximately by the amounts suggested by figure 1. The use of sliding maxima improves
this further. Comparison of the mean standard errors (SE) with standard deviations
(SD) shows that the (non-bootstrap) standard errors tend to be a little too large, that
is, all the estimators perform better than expected. The exception is the logistic Markov
chain for which the standard errors are approximately the correct size for b = 20 and
b = 70 and there is slight underestimation for b = 245. In general the bootstrap standard errors are more reliable. As expected, for the estimators based on sliding blocks
the naive standard errors are far too small.
  b    estimator  mean   SD      SE      adj SE  boot SE  rel eff
  20   SP,d       0.70   0.039   0.045   0.049   0.043    -
  20   SP,s       0.70   0.030   0.010   0.044   0.032    1.67
  20   P,AT       0.72   0.047   0.066   -       -        0.70
  20   P,G        0.75   0.045   -       -       -        0.76
  70   SP,d       0.79   0.072   0.097   0.099   0.083    -
  70   SP,s       0.78   0.057   0.011   0.084   0.062    1.61
  70   P,AT       0.81   0.097   0.138   -       -        0.56
  70   P,G        0.83   0.094   -       -       -        0.60
  245  SP,d       0.86   0.147   0.213   0.174   0.154    -
  245  SP,s       0.85   0.118   0.013   0.144   0.118    1.57
  245  P,AT       0.91   0.218   0.292   -       -        0.46
  245  P,G        0.93   0.223   -       -       -        0.44
Table 2: AR(1) process with α = 0.5; θ = 1. Estimators: SP,d: $\hat{\theta}_{SP}$ (disjoint blocks); SP,s: $\hat{\theta}_{SP}$ (sliding blocks); P,AT: $\hat{\theta}_{AT}$; P,G: $\hat{\theta}_G$. Sampling distribution mean and standard deviation (SD), mean standard error (SE), sandwich adjusted standard error (adj SE) and bootstrap standard error (boot SE) and efficiency relative to SP,d (rel eff).
  b    estimator  θ_b    mean   RMSE    SD      SE      adj SE  boot SE  rel eff
  20   SP,d       0.38   0.39   0.024   0.023   0.025   0.024   0.025    -
  20   SP,s       0.38   0.39   0.023   0.021   0.006   0.023   0.021    1.16
  20   P,AT       0.38   0.39   0.033   0.033   0.038   -       -        0.48
  20   P,G        0.38   0.38   0.038   0.038   -       -       -        0.35
  70   SP,d       0.34   0.35   0.047   0.044   0.044   0.041   0.046    -
  70   SP,s       0.34   0.35   0.044   0.041   0.005   0.037   0.040    1.17
  70   P,AT       0.34   0.36   0.067   0.064   0.067   -       -        0.48
  70   P,G        0.34   0.34   0.074   0.074   -       -       -        0.36
  245  SP,d       0.33   0.35   0.085   0.083   0.087   0.071   0.091    -
  245  SP,s       0.33   0.35   0.078   0.076   0.005   0.061   0.078    1.19
  245  P,AT       0.33   0.37   0.134   0.127   0.133   -       -        0.43
  245  P,G        0.33   0.34   0.148   0.148   -       -       -        0.31
Table 3: Logistic Markov chain with β = 0.5 and θ ≈ 0.328. Estimators: SP,d: $\hat{\theta}_{SP}$ (disjoint blocks); SP,s: $\hat{\theta}_{SP}$ (sliding blocks); P,AT: $\hat{\theta}_{AT}$; P,G: $\hat{\theta}_G$. Sampling distribution mean, root mean square error (RMSE) and standard deviation (SD), mean standard error (SE), sandwich adjusted standard error (adj SE) and bootstrap standard error (boot SE) and efficiency relative to SP,d (rel eff).
  b    estimator  θ_b    mean   RMSE    SD      SE      adj SE  boot SE  rel eff
  20   SP,d       0.34   0.35   0.015   0.015   0.022   0.022   0.018    -
  20   SP,s       0.34   0.35   0.011   0.011   0.005   0.020   0.013    1.82
  20   P,AT       0.34   0.35   0.026   0.026   0.035   -       -        0.34
  20   P,G        0.34   0.34   0.026   0.026   -       -       -        0.35
  70   SP,d       0.31   0.31   0.027   0.027   0.038   0.037   0.032    -
  70   SP,s       0.31   0.31   0.020   0.020   0.004   0.032   0.024    1.83
  70   P,AT       0.31   0.32   0.042   0.042   0.060   -       -        0.43
  70   P,G        0.31   0.30   0.046   0.045   -       -       -        0.37
  245  SP,d       0.30   0.30   0.047   0.047   0.076   0.061   0.063    -
  245  SP,s       0.30   0.30   0.037   0.037   0.004   0.052   0.049    1.62
  245  P,AT       0.30   0.32   0.083   0.082   0.116   -       -        0.33
  245  P,G        0.30   0.28   0.088   0.084   -       -       -        0.32
Table 4: Moving maxima process with α=(0.3, 0.2, 0.3, 0.3); θ = 0.3. Estimators: SP,d: $\hat{\theta}_{SP}$ (disjoint blocks); SP,s: $\hat{\theta}_{SP}$ (sliding blocks); P,AT: $\hat{\theta}_{AT}$; P,G: $\hat{\theta}_G$. Sampling distribution mean, root mean square error (RMSE) and standard deviation (SD), mean standard error (SE), sandwich adjusted standard error (adj SE) and bootstrap standard error (boot SE) and efficiency relative to SP,d (rel eff).
  b    estimator  θ_b    mean   RMSE    SD      SE      adj SE  boot SE  rel eff
  20   SP,d       0.52   0.53   0.028   0.028   0.034   0.033   0.029    -
  20   SP,s       0.52   0.52   0.023   0.023   0.007   0.030   0.024    1.45
  20   P,AT       0.52   0.53   0.037   0.037   0.050   -       -        0.56
  20   P,G        0.52   0.52   0.038   0.038   -       -       -        0.54
  70   SP,d       0.51   0.51   0.050   0.050   0.062   0.059   0.054    -
  70   SP,s       0.51   0.50   0.043   0.043   0.007   0.052   0.045    1.38
  70   P,AT       0.51   0.51   0.071   0.071   0.091   -       -        0.50
  70   P,G        0.51   0.50   0.075   0.075   -       -       -        0.45
  245  SP,d       0.50   0.51   0.105   0.105   0.126   0.101   0.108    -
  245  SP,s       0.50   0.50   0.088   0.088   0.007   0.085   0.087    1.42
  245  P,AT       0.50   0.54   0.152   0.148   0.179   -       -        0.50
  245  P,G        0.50   0.51   0.159   0.159   -       -       -        0.44
Table 5: maxAR process with θ = 0.5. Estimators: SP,d: $\hat{\theta}_{SP}$ (disjoint blocks); SP,s: $\hat{\theta}_{SP}$ (sliding blocks); P,AT: $\hat{\theta}_{AT}$; P,G: $\hat{\theta}_G$. Sampling distribution mean, root mean square error (RMSE) and standard deviation (SD), mean standard error (SE), sandwich adjusted standard error (adj SE) and bootstrap standard error (boot SE) and efficiency relative to SP,d (rel eff).
4 Example: Newlyn sea-surges
Figure 3 shows a series of 2894 measurements of sea-surge heights taken just off the coast at Newlyn, Cornwall, UK, over the period 1971–1976. The data are the maximum hourly surge heights over periods of 15 hours (see Coles (1991)). Fawcett and Walshaw (2012) used several estimators, including the parametric maxima estimator of Gomes (1993), to estimate the extremal index of the underlying process. We use $\hat{\theta}_{SP}$ to estimate the extremal index of this series, based on series of disjoint and sliding block maxima. We also fit a GEV distribution to the block maxima in order to make inferences about extreme quantiles of the marginal distribution of sea-surge heights at Newlyn.
[Figure 3: time series plot of surge (m) (vertical axis, −0.2 to 0.8) against observation number (horizontal axis, 0 to 3000).]
Figure 3: Time series plot of the Newlyn sea-surge data.
The top left plot in figure 4 shows $\hat{\theta}_{SP}$ against block size $b$ based on disjoint and sliding maxima. Also given are 95% confidence intervals, based on adjusted log-likelihoods for $\log\theta$ following Chandler and Bate (2007). The other plots in figure 4 show maximum likelihood estimates for GEV distributions fitted to the disjoint and sliding maxima, with, for the disjoint maxima only, symmetric 95% confidence intervals. As the location and scale of the GEV distribution depend on $b$ we have plotted estimates of the GEV parameters implied for a block size of 1, i.e. the marginal distribution of the data.
For these data $b = 20$ is reasonable. Table 6 shows estimates, standard errors and 95% confidence intervals for $\theta$ using this block size. The bootstrap estimates result from the approach detailed in section 3 using 10,000 resamples. Basic confidence intervals, constructed on the $\log\theta$ scale, are given in table 6 but as the bootstrap distributions of $\log\hat{\theta}_{SP}$ are very close to normal these intervals are very similar to normal intervals. Bootstrap bias-adjustment results in a slightly smaller point estimate of $\theta$.
Following Gomes (1993), Fawcett and Walshaw (2012) used the larger block size $\sqrt{m} \approx 54$, obtaining an estimate of 0.282 with a standard error of 0.206. For this block size $\hat{\theta}_{SP}$ compares favourably, with estimates (and adjusted standard errors) of 0.269 (0.044) using disjoint blocks and 0.245 (0.040) using sliding blocks. The bootstrap standard errors are 0.047 and 0.039 respectively. Fawcett and Walshaw (2012) also use the intervals estimator of Ferro and Segers (2003), based on a threshold of 0.3 m selected using a mean residual life plot, obtaining 0.223 (0.050). The standard errors in table 6 suggest that, at least for these data, $\hat{\theta}_{SP}$ is competitive with the intervals estimator when both approaches are allowed to select their tuning parameter (block size for $\hat{\theta}_{SP}$ and threshold for the intervals estimator) using the observed data.

[Figure 4: four panels plotting $\hat{\theta}$, location, scale and shape against block size (horizontal axis, 0 to 60).]
Figure 4: Block size selection for the Newlyn data. Top left: estimates and 95% confidence intervals for θ based on disjoint maxima (solid lines) and sliding maxima (dashed lines). Other plots: estimates of marginal (b = 1) GEV parameters based on disjoint block maxima (solid lines) and sliding maxima (dashed lines). Symmetric 95% confidence intervals are also given for the disjoint maxima.
  blocks    method     θ̂       SE(θ̂)   95% CI
  disjoint  naive      0.241   0.020   (0.204, 0.283)
  disjoint  adjusted   0.241   0.026   (0.194, 0.295)
  disjoint  bootstrap  0.219   0.027   (0.179, 0.265)
  sliding   adjusted   0.238   0.028   (0.188, 0.296)
  sliding   bootstrap  0.213   0.023   (0.180, 0.251)
Table 6: Estimates, standard errors and 95% confidence intervals for the extremal index θ of the Newlyn data using (disjoint or sliding) blocks of size 20. Naive: independence log-likelihood; adjusted: adjusted log-likelihood; bootstrap: stationary block bootstrap.
Figure 5 shows estimates and 95% confidence intervals of high quantiles of the marginal distribution of sea-surge height. The sum of the adjusted log-likelihood for $\log\theta$ based on sliding maxima and the log-likelihood for the GEV parameters based on disjoint maxima is profiled with respect to the desired quantile. The estimates (and standard errors) of the GEV parameters $\mu_\theta$, $\sigma_\theta$ and $\xi$ are 0.192 (0.012), 0.130 (0.0085) and −0.0546 (0.056) respectively. The underestimation that would result from assuming that $\theta = 1$ is clear.

[Figure 5: plot of $x_p$ (vertical axis, 0.8 to 1.8) against 1/p (horizontal axis, 3000 to 60000).]
Figure 5: Estimates and 95% confidence intervals for the 100(1 − p)% marginal quantile $x_p$ against 1/p. Solid lines: inferring θ using $\hat{\theta}_{SP}$ based on sliding maxima. Dashed lines: using θ = 1.
5 Discussion
The semiparametric estimators proposed in this paper improve substantially on existing
maxima methods of estimating the extremal index. For the Newlyn data the semiparametric estimators are competitive with the widely-used intervals estimator of Ferro and
Segers (2003). The stationary block bootstrap produces more reliable standard errors
than the information sandwich estimators. It may be that a more sophisticated implementation of these latter estimators, for example Lumley and Heagerty (1999), fares
better. The simulation study in section 3 showed that there is benefit to using sliding
blocks rather than disjoint blocks. Apart from the point estimates in figure 4 we did not
use sliding blocks for the GEV analysis in section 4. This could be an area of further
research.
Acknowledgments
I thank Chris Ferro for helpful comments on Northrop (2005) and Nigel Williams and
Marianna Demetriou for project work they carried out in this area.
Appendix A: asymptotic efficiencies of semiparametric and parametric estimators using disjoint blocks
The semiparametric estimator $\hat{\theta}_{SP}$ is based on a random sample $V_1, \ldots, V_n$ from a Beta$(\theta, 1)$ distribution. The log-likelihood for a single observation $V$ is $l(\theta) = \log\theta + (\theta - 1)\log V$. Thus, the asymptotic precision of $\hat{\theta}_{SP}$ is $-l''(\theta) = 1/\theta^2$.
The parametric estimator $\hat{\theta}_{AT}$ of Ancona-Navarrete and Tawn (2000) treats as independent two random samples, each of size $n$. Sample 1 is from a GEV$(\mu, \sigma, \xi)$ distribution and sample 2 is from a GEV$(\mu_\theta, \sigma_\theta, \xi)$ distribution. Here we take $n = 1$. Let $I(\mu, \sigma, \xi)$ denote the Fisher information matrix for a sample of size 1 from a GEV$(\mu, \sigma, \xi)$ distribution. This matrix can be inferred from Prescott and Walden (1980), who use a shape parameter $k = -\xi$. The calculation of the asymptotic precision of $\hat{\theta}_{AT}$ requires that $\xi > -1/2$ (Smith, 1985).
Let $I_1$ and $I_2$ denote the respective Fisher information matrices for the parameter vector $(\theta, \mu, \sigma, \xi)$ from samples 1 and 2. $I_1$ is given by
\[ I_1 = \begin{pmatrix} 0 & \mathbf{0} \\ \mathbf{0}^T & I(\mu, \sigma, \xi) \end{pmatrix}, \]
where $\mathbf{0} = (0, 0, 0)$. Let $\psi = (\mu_\theta, \sigma_\theta, \xi)$ and $\eta = (\theta, \mu, \sigma, \xi)$ and $\Delta_{ij} = \partial\psi_i/\partial\eta_j$. $I_2$ is given by $I_2 = \Delta^T I(\mu_\theta, \sigma_\theta, \xi)\Delta$. The total information is $I = I_1 + I_2$, giving
\[ I = \begin{pmatrix} 1/\theta^2 & w \\ w^T & I_{GEV} \end{pmatrix}, \]
where $w$ is a vector with non-zero entries and $I_{GEV}$ is the total Fisher information for $(\mu, \sigma, \xi)$. Block inversion of $I$ gives the asymptotic precision of $\hat{\theta}_{AT}$ as
\[ \mathrm{prec}(\hat{\theta}_{AT}) = 1/\theta^2 - w I_{GEV}^{-1} w^T < 1/\theta^2 = \mathrm{prec}(\hat{\theta}_{SP}), \]
the inequality following because $I_{GEV}^{-1}$ is positive definite.
Appendix B: estimating the variance of the score function
Using the notation defined in 2 the log-likelihood for ψ = log θ is
n
X
l(ψ) =
ψ + (eψ − 1) log Vi .
i=1
The score function is
U = U (ψ) =
n
X
Ui (ψ) =
n
X
i=1
ψ
1 + e log Vi =
i=1
n
X
{1 + θ log Vi } .
i=1
b equal n.
The expected information I(ψ), and the observed information J (ψ),
The variance of the score is given by
V
= var
n
X
!
{1 + θ log Vi } ,
i=1
=
n
X
i=1
var {1 + θ log Vi } + 2
j−1
n X
X
j=2 i=1
13
cov {1 + θ log Vi , 1 + θ log Vj } .
(5)
P
The first term of (5) is estimated by ni=1 (1 + θb log Vi )2 . To evaluate the covariance
in the second term we ignore the possibility that either Yi < li or Yj < lj , as their
contributions to the covariance are negligible. Let
1
(Si + Ti ) ,
log Vi ≈ b log
m−b+1
1
log Vj ≈ b log
(Sj + Tj ) ,
m−b+1
where
Si =
X
I(Xk 6 Yi ),
Sj =
k∈B
/ i ∪Bj
X
Ti =
X
I(Xk 6 Yj ),
k∈B
/ i ∪Bj
I(Xk 6 Yi ),
Tj =
k∈Bj ∩Bic
X
I(Xk 6 Yj ).
k∈Bi ∩Bjc
Suppose that {Xk , k ∈ Bi } q {Xk , k ∈ Bj }, for i 6= j, that is, data from distinct blocks
are independent. Consider the case where disjoint blocks are used. Then Si q Sj ,
Si q Tj and Sj q Ti , because in each case a block of Xs, and/or the maximum of these
Xs, are compared to Xs from two other disjoint blocks. Then
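\[
\begin{aligned}
\mathrm{cov}\{1 + \theta\log V_i,\, 1 + \theta\log V_j\} &= \theta^2\, \mathrm{cov}(\log V_i, \log V_j) \\
&\approx \theta^2 b^2\, \mathrm{cov}\left( \frac{1}{m-b+1}(S_i + T_i) - 1,\ \frac{1}{m-b+1}(S_j + T_j) - 1 \right) \\
&= \frac{\theta^2 b^2}{(m-b+1)^2}\, \mathrm{cov}(T_i, T_j),
\end{aligned}
\]
where we have used $\log x \approx x - 1$ for $x \approx 1$. If $Y_i > Y_j$ then $T_i = b$ and $T_j < b$, and if
$Y_i < Y_j$ then $T_i < b$ and $T_j = b$. Thus, $(T_i - b)(T_j - b) = 0$ and
\[
\begin{aligned}
\mathrm{cov}(T_i, T_j) &= \mathrm{cov}(T_i - b, T_j - b) \\
&= E[(T_i - b)(T_j - b)] - E(T_i - b)E(T_j - b) \\
&= -[E(T) - b]^2,
\end{aligned}
\]
where $E(T) = bP(X \le Y) = b^2\theta/(b\theta + 1)$. As $n(n-1)/2$ pairs of blocks contribute to
the second term of (5), it is estimated by $-n(n-1)\hat\theta^2 b^4 / \{(m-b+1)^2 (b\hat\theta + 1)^2\}$.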
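The algebra behind this estimate can be written out as a short worked step, using only the expressions given above:

```latex
\begin{align*}
E(T) - b &= \frac{b^2\theta}{b\theta + 1} - b = -\frac{b}{b\theta + 1}, \\
\mathrm{cov}(T_i, T_j) &= -[E(T) - b]^2 = -\frac{b^2}{(b\theta + 1)^2}, \\
2 \cdot \frac{n(n-1)}{2} \cdot \frac{\theta^2 b^2}{(m-b+1)^2}\, \mathrm{cov}(T_i, T_j)
  &= -\frac{n(n-1)\,\theta^2 b^4}{(m-b+1)^2 (b\theta + 1)^2}.
\end{align*}
```

Replacing $\theta$ by $\hat\theta$ in the last line gives the estimate of the second term of (5).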
For sliding blocks we note that the second term of (5) contains contributions from pairs
of blocks that overlap and $(n-b)(n-b+1)/2$ pairs that do not. For the latter the
total contribution is estimated by $-(n-b)(n-b+1)\hat\theta^2 b^4 / \{(m-b+1)^2 (b\hat\theta + 1)^2\}$. We
estimate the total contribution of the former by $2\sum_{k=1}^{b-1}\sum_{i=1}^{n-k} U_i(\hat\psi)U_{i+k}(\hat\psi)$. Thus, the
estimators of $V$ using disjoint and sliding blocks are respectively
\[
\hat V_d = \sum_{i=1}^{n} (1 + \hat\theta\log V_i)^2 - \frac{n(n-1)\hat\theta^2 b^4}{(m-b+1)^2 (b\hat\theta + 1)^2}
\]
and
\[
\hat V_s = \sum_{i=1}^{n} (1 + \hat\theta\log V_i)^2 + 2\sum_{k=1}^{b-1}\sum_{i=1}^{n-k} U_i(\hat\psi)U_{i+k}(\hat\psi) - \frac{(n-b)(n-b+1)\hat\theta^2 b^4}{(m-b+1)^2 (b\hat\theta + 1)^2}.
\]
In practice the contribution to the score function from the largest block maximum $Y_{(n)}$
is non-random, because $\log V_{(n)} = b\log[(m-b)/(m-b+1)]$. We adjust for this by
removing from $\hat V_d$ and $\hat V_s$ contributions from $V_{(n)}$.
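A minimal Python sketch of the two variance estimators follows. It assumes that the values $V_i$ and the estimate $\hat\theta$ have already been computed, and the function names (`pair_term`, `v_disjoint`, `v_sliding`) are hypothetical. The adjustment for the largest block maximum described above is not applied here.

```python
import math

def pair_term(theta_hat, b, m):
    """theta^2 b^4 / {(m-b+1)^2 (b theta + 1)^2}: minus the estimated
    covariance contribution (counted twice) of one pair of non-overlapping blocks."""
    return theta_hat**2 * b**4 / ((m - b + 1) ** 2 * (b * theta_hat + 1) ** 2)

def score_terms(v, theta_hat):
    """U_i = 1 + theta_hat * log V_i for each block."""
    return [1.0 + theta_hat * math.log(vi) for vi in v]

def v_disjoint(v, theta_hat, b, m):
    """Estimator of V for disjoint blocks."""
    n = len(v)
    u = score_terms(v, theta_hat)
    return sum(ui * ui for ui in u) - n * (n - 1) * pair_term(theta_hat, b, m)

def v_sliding(v, theta_hat, b, m):
    """Estimator of V for sliding blocks: overlapping pairs (lags 1..b-1)
    use products of score terms, non-overlapping pairs use pair_term."""
    n = len(v)
    u = score_terms(v, theta_hat)
    lagged = sum(u[i] * u[i + k] for k in range(1, b) for i in range(n - k))
    return (sum(ui * ui for ui in u) + 2.0 * lagged
            - (n - b) * (n - b + 1) * pair_term(theta_hat, b, m))

# Made-up illustrative inputs, not data from the paper.
v_example = [0.9, 0.5, 0.7, 0.95, 0.3, 0.8]
print(v_disjoint(v_example, theta_hat=0.6, b=2, m=12))
```

With $b = 1$ no blocks overlap, so the two functions return the same value, which gives a quick consistency check.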
Appendix C: Derivation of $\theta_b$ for some special cases
With unit Fréchet margins, $u_b = b/\log 2$.
The maxAR process. Following Beirlant et al. (2004, chapter 10),
\[
\begin{aligned}
G(x) &= P(X_1 \le x, \ldots, X_b \le x) \\
&= P(X_1 \le x, \theta Z_2 \le x, \ldots, \theta Z_b \le x) \\
&= \exp\{ -[1 + \theta(b-1)]/x \}.
\end{aligned}
\]
Therefore, $\theta_b = -\log G(u_b)/\log 2 = \theta + (1-\theta)/b$.
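As a numerical check of this result, the Python sketch below evaluates $-\log G(u_b)/\log 2$ directly from the expression for $G$ and compares it with $\theta + (1-\theta)/b$. The function name is hypothetical and the parameter values are arbitrary.

```python
import math

def theta_b_maxar(theta, b):
    """theta_b = -log G(u_b) / log 2 for the maxAR process with unit Frechet
    margins, where G(x) = exp{-[1 + theta (b - 1)] / x} and u_b = b / log 2."""
    u_b = b / math.log(2)
    log_G = -(1.0 + theta * (b - 1)) / u_b
    return -log_G / math.log(2)

theta, b = 0.5, 10  # arbitrary illustrative values
print(theta_b_maxar(theta, b), theta + (1 - theta) / b)
```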
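The moving maxima process. Let $\alpha_i^+ = \max(\alpha_0, \ldots, \alpha_i)$ and $\alpha_i^- = \max(\alpha_i, \ldots, \alpha_p)$.
Then, for $b > p$,
\[
\begin{aligned}
G(x) &= P(X_1 \le x, \ldots, X_b \le x) \\
&= P\left( Z_1 \le \frac{x}{\alpha_0^+}, \ldots, Z_p \le \frac{x}{\alpha_{p-1}^+},\ Z_{p+1} \le \frac{x}{\alpha_p^+}, \ldots, Z_b \le \frac{x}{\alpha_p^+},\ Z_{b+1} \le \frac{x}{\alpha_1^-}, \ldots, Z_{p+b} \le \frac{x}{\alpha_p^-} \right) \\
&= \exp\left\{ -\left( \sum_{i=0}^{p-1} \alpha_i^+ + (b-p)\alpha_p^+ + \sum_{i=1}^{p} \alpha_i^- \right) \Big/ x \right\}.
\end{aligned}
\]
Therefore,
\[
\theta_b = \alpha_p^+ + \frac{1}{b}\left( \sum_{i=0}^{p-1} \alpha_i^+ + \sum_{i=1}^{p} \alpha_i^- - p\alpha_p^+ \right) = \theta + c/b,
\]
where $1 - \theta \le c \le p\theta$. The lower bound is achieved if $j = \mathrm{argmax}_i\{\alpha_i\} \notin \{0, p\}$ and
$\alpha_i = (1-\theta)/p$ for $i \ne j$, or if $\{\alpha_i\}$ are monotonic in $i$, and the upper bound when
$\alpha_0 = \alpha_p = \theta$.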
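The constant $c$ can be computed directly from the weights. The Python sketch below assumes non-negative weights $\alpha_0, \ldots, \alpha_p$ that sum to one (as implied by unit Fréchet margins) and uses a hypothetical function name. The two example weight vectors are made up to illustrate the lower and upper bounds.

```python
def moving_maxima_theta_b(alpha, b):
    """Return (theta, c, theta_b) for a moving maxima process with weights
    alpha[0..p], assumed non-negative and summing to one, and block size b > p."""
    p = len(alpha) - 1
    if b <= p:
        raise ValueError("the formula requires b > p")
    # alpha_i^+ = max(alpha_0, ..., alpha_i); alpha_i^- = max(alpha_i, ..., alpha_p)
    a_plus = [max(alpha[: i + 1]) for i in range(p + 1)]
    a_minus = [max(alpha[i:]) for i in range(p + 1)]
    theta = a_plus[p]  # the largest weight
    c = sum(a_plus[:p]) + sum(a_minus[1:]) - p * theta
    return theta, c, theta + c / b

# Monotonic weights: c attains the lower bound 1 - theta.
print(moving_maxima_theta_b([0.2, 0.3, 0.5], b=10))
# alpha_0 = alpha_p = theta: c attains the upper bound p * theta.
print(moving_maxima_theta_b([0.4, 0.2, 0.4], b=10))
```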
References
Ancona-Navarrete, M. A. and J. A. Tawn (2000). A comparison of methods for estimating the extremal index. Extremes 3 (1), 5–38.
Beirlant, J., Y. Goegebeur, J. Segers, and J. Teugels (2004). Statistics of Extremes:
Theory and Applications. Chichester, England: Wiley.
Canty, A. and B. D. Ripley (2010). boot: Bootstrap R (S-Plus) functions. R package
version 1.3-4.
Chandler, R. E. and S. B. Bate (2007). Inference for clustered data using the independence loglikelihood. Biometrika 94 (1), 167–183.
Chavez-Demoulin, V. and A. C. Davison (2012). Modelling time series extremes. REVSTAT - Statistical Journal 10 (1), 109–133.
Coles, S. G. (1991). Modelling extreme multivariate events. Ph. D. thesis, University of
Sheffield, Sheffield, UK.
Dabrowska, D. M., K. A. Doksum, and R. Miura (1989). Rank estimates in a class of
semiparametric two-sample models. Ann. Inst. Statist. Math. 41 (1), 63–79.
Davis, R. and S. Resnick (1989). Basic properties and prediction of max-ARMA processes. Adv. Appl. Prob. 21, 781–803.
Deheuvels, P. (1983). Point processes and multivariate extreme values. J. Multiv.
Anal. 13, 257–272.
Fawcett, L. and D. Walshaw (2012). Estimating return levels from serially dependent
extremes. Environmetrics 23, 272–283.
Ferro, C. and J. Segers (2003). Inference for clusters of extreme values. J. R. Statist.
Soc. B 65, 545–556.
Gomes, M. (1993). On the estimation of parameters of rare events in environmental
time series. In V. Barnett and K. Turkman (Eds.), Statistics for the environment 2:
Water Related Issues, pp. 225–241. Wiley.
Guenther, W. (1967). A best statistic with variance not equal to the Cramér-Rao lower
bound. American Mathematical Monthly 74, 993–994.
Hayfield, T. and J. S. Racine (2008). Nonparametric econometrics: The np package.
Journal of Statistical Software 27 (5).
Johnson, N. L. and S. Kotz (1970). Continuous Univariate Distributions Volume 2. New
York: Wiley.
Laurini, F. and J. Tawn (2003). New estimators for the extremal index and other cluster
characteristics. Extremes 6, 189–211.
Leadbetter, M., G. Lindgren, and H. Rootzén (1983). Extremes and related properties
of random sequences and series. New York: Springer Verlag.
Ledford, A. W. and J. Tawn (2003). Diagnostics for dependence within time series
extremes. J. R. Statist. Soc. B 65 (2), 521–543.
Lumley, T. and P. Heagerty (1999). Weighted empirical adaptive variance estimators
for correlated data regression. J. R. Statist. Soc. B. 61, 459–477.
Northrop, P. J. (2005). Semiparametric estimation of the extremal index using block
maxima. Technical Report 259, University College London.
Patton, A., D. Politis, and H. White (2009). Correction to “Automatic block-length
selection for the dependent bootstrap” by D. Politis and H. White. Econometric
Reviews 28 (4), 372–375.
Politis, D. and J. Romano (1994). The stationary bootstrap. J. Am. Statist. Ass. 89,
1303–1313.
Prescott, P. and A. Walden (1980). Maximum likelihood estimation of the parameters
of the generalized extreme value distribution. Biometrika 67 (3), 723–724.
Robert, C. (2009). Inference for the limiting cluster size distribution of extreme values.
Ann. Statist. 37 (1), 271–310.
Robert, C., J. Segers, and C. A. T. Ferro (2009). A sliding blocks estimator for the
extremal index. Electronic Journal of Statistics 3, 993–1020.
Smith, R. L. (1985). Maximum likelihood estimation in a class of non-regular cases.
Biometrika 72, 67–92.
Smith, R. L. (1992). The extremal index for a Markov Chain. J. App. Prob. 29 (1),
37–45.
Smith, R. L. and I. Weissman (1994). Estimating the extremal index. J. R. Statist.
Soc. B 56, 515–528.
Süveges, M. (2007). Likelihood estimation of the extremal index. Extremes 10, 41–55.
Süveges, M. and A. C. Davison (2010). Model misspecification in peaks over threshold
analysis. Ann. Appl. Statist. 4, 203–221.
Tawn, J. A. (1988). Bivariate extreme value theory: models and estimation.
Biometrika 75, 397–415.
White, H. (1982). Maximum likelihood estimation of misspecified models.
Econometrica 50, 1–25.