Semiparametric estimation of the extremal index using block maxima

Paul J. Northrop
Department of Statistical Science
University College London
Gower Street
London WC1E 6BT, UK
November 16, 2005
Abstract
The extremal index θ, a measure of the degree of local dependence in the
extremes of a stationary process, plays an important role in extreme value analyses.
We propose a class of rank estimators of θ based on viewing a characterisation of
θ as defining a semiparametric model. These estimators are easy to compute and
have good robustness properties. We study the properties of these estimators using
simulation. An application to temperature data is presented.
Keywords: Extremal index; semiparametric estimation; rank estimator; extreme values.
1 Introduction
Let $\{Z_i, i = 1, 2, \ldots\}$ be a stationary sequence of random variables with marginal distribution function $H$. The presence of short-range dependence in the $\{Z_i\}$ series influences the extremal behaviour of the process. For processes satisfying a weak mixing condition, the $D(u_n)$-condition of Leadbetter et al. (1983), which limits the degree of long-term dependence at extreme levels,

$$P\{\max(Z_1, \ldots, Z_n) \le u_n\} \approx H^{n\theta}(u_n) \tag{1}$$

for large $n$ and $u_n$, where $\theta$ ($0 \le \theta \le 1$) is the extremal index of the process. The extremal index is the primary measure of extremal dependence in such processes, with $\theta = 1$ indicating independence at asymptotically high levels. Estimation of $\theta$ is of fundamental importance when making inferences about the extremal properties of dependent sequences.
Let $\{Z_i^*, i = 1, 2, \ldots\}$ denote a sequence of independent random variables with the same marginal distribution function as $\{Z_i\}$. Equation (1) suggests that, for large $n$ and $u_n$, the distribution functions $G$ of $\max(Z_1, \ldots, Z_n)$ and $F$ of $\max(Z_1^*, \ldots, Z_n^*)$ are related approximately by

$$G(u_n) = F(u_n)^\theta. \tag{2}$$

Research Report No. 259, Department of Statistical Science, University College London. Date: November 2005.
The classical theory of extreme values for i.i.d. sequences (Galambos, 1987; Leadbetter et al., 1983) gives conditions for the existence of normalising sequences $a_n > 0$ and $b_n$ such that

$$P\{\max(Z_1^*, \ldots, Z_n^*) \le a_n x + b_n\} = H^n(a_n x + b_n) \to F_{\mathrm{GEV}}(x)$$

as $n \to \infty$, where $F_{\mathrm{GEV}}$ is a non-degenerate distribution function. If such a limit exists, $F_{\mathrm{GEV}}$ is the distribution function

$$F_{\mathrm{GEV}}(x) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$

where $1 + \xi(x - \mu)/\sigma > 0$, of a generalised extreme value, $\mathrm{GEV}(\mu, \sigma, \xi)$, distribution. As (2) suggests, if the $D(u_n)$-condition is satisfied, the corresponding limiting distribution function for $\max(Z_1, \ldots, Z_n)$ is $F_{\mathrm{GEV}}^\theta(x)$, the distribution function of a $\mathrm{GEV}(\mu_\theta, \sigma_\theta, \xi)$ distribution, where

$$\mu_\theta = \mu - \sigma(1 - \theta^\xi)/\xi \quad \text{and} \quad \sigma_\theta = \sigma\theta^\xi. \tag{3}$$
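As a quick numerical check of (3), the sketch below (Python, standard library only) evaluates $F_{\mathrm{GEV}}^\theta$ and the $\mathrm{GEV}(\mu_\theta, \sigma_\theta, \xi)$ distribution function at a few points and confirms that they coincide. The function names and the parameter values $\mu = 0$, $\sigma = 1$, $\xi = 0.2$, $\theta = 0.5$ are illustrative assumptions, and only the case $\xi \ne 0$ is handled.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV(mu, sigma, xi) distribution function; assumes xi != 0."""
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0) or above the upper end point (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_params_for_dependent_maxima(mu, sigma, xi, theta):
    """(mu_theta, sigma_theta, xi) from equation (3): F_GEV**theta is GEV(mu_theta, sigma_theta, xi)."""
    mu_theta = mu - sigma * (1.0 - theta ** xi) / xi
    sigma_theta = sigma * theta ** xi
    return mu_theta, sigma_theta, xi

mu, sigma, xi, theta = 0.0, 1.0, 0.2, 0.5
mu_t, sigma_t, xi_t = gev_params_for_dependent_maxima(mu, sigma, xi, theta)
for x in (-1.0, 0.0, 1.5, 4.0):
    lhs = gev_cdf(x, mu, sigma, xi) ** theta    # F_GEV(x) ** theta
    rhs = gev_cdf(x, mu_t, sigma_t, xi_t)       # GEV(mu_theta, sigma_theta, xi) at x
    print(f"x = {x:4.1f}:  {lhs:.6f}  {rhs:.6f}")
```

Both columns agree because $\theta[1 + \xi(x - \mu)/\sigma]^{-1/\xi} = [1 + \xi(x - \mu_\theta)/\sigma_\theta]^{-1/\xi}$; note that the two distributions share the same lower end point $\mu - \sigma/\xi$ when $\xi > 0$.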
Another characterisation of the extremal index is that $1/\theta$ is the limiting mean cluster size in the point process of exceedance times of asymptotically high thresholds (Leadbetter, 1983). This motivates methods of estimating $\theta$ based on the observed exceedance times of a suitably chosen high threshold $u$. The resulting estimators can be sensitive to the value of $u$. The blocks and runs estimators (Smith and Weissman, 1994) also require the specification of an extra tuning parameter. Recently Ferro and Segers (2003) have developed an estimator of $\theta$ based on a penultimate approximation to the asymptotic distribution of interexceedance times. Gomes (1993) and Ancona-Navarrete and Tawn (2000) have carried out comparisons of various estimators of $\theta$.
Subject to the conditions stated above, $\theta$ quantifies, via (2), the extent to which the maximum $X$ of $n$ independent observations $Z_1^*, \ldots, Z_n^*$ from $H$ is stochastically larger than the maximum $Y$ of $n$ dependent observations $Z_1, \ldots, Z_n$ from $H$. This is illustrated in figure 1, which shows the empirical distribution functions of maxima of dependent observations and of maxima of approximately independent observations resampled from the dependent series using the regular resampling method detailed in section 2. The plot on the left is based on 100 maxima of sequences of length 90 simulated from a max-autoregressive process with an extremal index of 0.5 (see (5) in section 4). The plot on the right is based on 93 maxima of sequences of length 90 taken from the temperature series analysed in section 5. In section 3.1 we propose a class of semiparametric estimators of $\theta$ based on the ranks of observations sampled from $F$ and $G$ and consider their robustness properties. In section 4 we carry out a simulation study to investigate the performance of the estimators, and in section 5 we present an application to temperature data from Wooster, Ohio.
[Figure 1: two panels; vertical axes: empirical distribution function (0.0 to 1.0); horizontal axes: log(simulated data) (left) and temperature (right).]
Figure 1: Empirical distribution functions of the maxima of 120 dependent observations (solid line) and 120 approximately independent observations (dotted line) from the same marginal distribution. Left: 100 maxima based on (the logarithm of) data simulated from a max-autoregressive process with extremal index 0.5. Right: 93 maxima based on the temperature series of section 5.
2 Block maxima
Suppose that we have a sample $Z_1, \ldots, Z_{mn}$ from a stationary sequence. We divide the sample into $m$ blocks of length $n$ and define $Y_i = \max\{Z_{(i-1)n+1}, \ldots, Z_{in}\}$, $i = 1, \ldots, m$. It follows from section 1 that, for sufficiently large $n$, $Y_1, \ldots, Y_m$ are approximately independent observations from a GEV distribution with distribution function $G = F^\theta$.

To estimate $\theta$ based on the characterisation $G = F^\theta$ we require an i.i.d. sample with the same marginal distribution as $Z_1, \ldots, Z_{mn}$. We achieve this by resampling from $Z_1, \ldots, Z_{mn}$ in such a manner that the resampled sequence is approximately an i.i.d. sample. One possibility (Ancona-Navarrete and Tawn, 2000) is to permute the observations randomly. We call this random resampling. An alternative approach is regular resampling, in which $Z_1, \ldots, Z_{mn}$ are placed in the order $\{(Z_{(j-1)m+i}, j = 1, \ldots, n), i = 1, \ldots, m\}$. In either case we create an approximate sample $X_1, \ldots, X_m$ from $F$ by taking the maxima over disjoint blocks of length $n$ of the resampled sequence.
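The two resampling schemes can be sketched in a few lines of Python (standard library only). The helper names `block_maxima`, `regular_resample` and `random_resample` are hypothetical, and the example series is a toy stand-in; indices in the code are 0-based, whereas the text counts from 1.

```python
import random

def block_maxima(z, n):
    """Maxima Y_1, ..., Y_m of consecutive disjoint blocks of length n (m = len(z) // n)."""
    m = len(z) // n
    return [max(z[i * n:(i + 1) * n]) for i in range(m)]

def regular_resample(z, n):
    """Reorder Z_1..Z_mn as {(Z_(j-1)m+i, j = 1..n), i = 1..m}.
    0-based: block i takes z[i], z[m + i], ..., z[(n - 1) * m + i], values exactly m apart."""
    m = len(z) // n
    return [z[j * m + i] for i in range(m) for j in range(n)]

def random_resample(z, seed=None):
    """Random permutation of the whole series (random resampling)."""
    rng = random.Random(seed)
    perm = list(z)
    rng.shuffle(perm)
    return perm

# Toy example: m = 3 blocks of length n = 4 from a series of length 12
z = list(range(1, 13))                          # stand-in for Z_1, ..., Z_12
n = 4
y = block_maxima(z, n)                          # maxima of the original (dependent) blocks
x = block_maxima(regular_resample(z, n), n)     # maxima after regular resampling
print(y)  # [4, 8, 12]
print(x)  # [10, 11, 12]
```

Because each value is used exactly once, the largest of the $X_i$ always equals the largest of the $Y_i$, a point that matters for the bias of the estimators.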
Under regular resampling the values used to produce each $X_i$ are exactly $m$ time points apart in the original series. Under random resampling the separation of these values varies. For sufficiently large $m$ we expect the dependence within the resampled blocks to be stronger under random resampling than under regular resampling. Residual dependence within a block makes $X_i$ stochastically smaller, pulling the estimate of $F$ towards $G$, so we expect estimators based on random resampling to exhibit some positive bias. However, we would expect dependence between the $X_i$ to be greater under regular resampling.
3 Estimation of the extremal index
Gomes (1993) and Ancona-Navarrete and Tawn (2000) have examined parametric approaches to estimating $\theta$ based on (2). Let $Y = (Y_1, \ldots, Y_m)$ denote a sample of block maxima from $G$ and $X = (X_1, \ldots, X_m)$ a sample of block maxima from $F$. Gomes (1993) fits GEV distributions separately to these samples and constructs an estimator of $\theta$ using the relationships in (3). Ancona-Navarrete and Tawn (2000) estimate $(\mu, \sigma, \xi, \theta)$ simultaneously by maximising a likelihood constructed under the assumption that $X$ and $Y$ are mutually independent. In both cases extra parameters are estimated even though interest focuses on $\theta$, and the estimates of $\theta$ may be sensitive to outlying observations and to departures from the assumed form of $F$.
3.1 Semiparametric estimation of the extremal index
We view the limiting relationship (2), $G(x) = F(x)^\theta$, as defining a semiparametric model in which $\theta$ is the parameter of interest and $F$ and $G$ are treated as unknown nuisance functions. The structure of this model is very similar to that of a two-sample proportional hazards model $G = F^\phi$ (Cox, 1972) for survival data, where $G$ and $F$ are unknown survivor functions and $\phi$ is the relative risk. Thus we consider estimators of $\theta$ that are analogous to those proposed for $\phi$.
We adapt a class of rank estimators of $\phi$ investigated by Begun (1982) and Begun and Reid (1983). Taking logs of $G = F^\theta$ and differentiating gives $dG/G = \theta\,dF/F$, so the constraint can be expressed as

$$J(F, G)\,F\,dG = \theta\,J(F, G)\,G\,dF,$$

where $J(\cdot, \cdot)$ is a score function included with a view to optimising the properties of the resulting estimator. Provided $J$ satisfies the integrability condition $\int |J(F, G)|\,G\,dF < \infty$, $\theta$ can be expressed as

$$\theta = \int J(F, G)\,F\,dG \Big/ \int J(F, G)\,G\,dF.$$
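For the simplest score, $J \equiv 1$, and assuming $F$ is continuous so that we may substitute $u = F(x)$ and $G = u^\theta$, the two integrals can be evaluated directly. This worked check also shows what the resulting estimator targets when $X$ and $Y$ are independent:

$$\int F\,dG = \int_0^1 u\,d(u^\theta) = \int_0^1 \theta u^\theta\,du = \frac{\theta}{1 + \theta} = P(X < Y), \qquad \int G\,dF = \int_0^1 u^\theta\,du = \frac{1}{1 + \theta} = P(Y < X),$$

so the ratio of the two integrals is $\theta$.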
A class of estimators of $\theta$ is obtained by replacing $F$ and $G$ by the empirical distribution functions $\hat F(x) = (1/m)\sum_{i=1}^m I(X_i \le x)$ and $\hat G(y) = (1/m)\sum_{i=1}^m I(Y_i \le y)$:

$$\hat\theta_J = \sum_{j=1}^m J\{\hat F(Y_j), \hat G(Y_j)\}\,\hat F(Y_j) \Big/ \sum_{j=1}^m J\{\hat F(X_j), \hat G(X_j)\}\,\hat G(X_j).$$

Under suitable conditions on $J$ (given by Begun and Reid (1983)) the estimator $\hat\theta_J$ is strongly consistent and asymptotically normal. Because $\hat\theta_J$ depends on the data only through the ranks of the combined sample, its distribution does not depend on $F$, and it is robust to departures from assumptions about the underlying form of $F$. These estimators are not constrained to be less than or equal to 1, so in practice we use $\min(1, \hat\theta_J)$ to estimate $\theta$.
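A minimal Python sketch of $\hat\theta_J$ (standard library only) is given below. The score function is passed in as a Python function of $(\hat F, \hat G)$; the names `ecdf`, `theta_hat_J`, `J1`, `J2` and `J3_two_step` are hypothetical, and the toy samples are made up for illustration. The same function also yields the two-step estimator $\hat\theta_3$ of Begun and Reid (1983), discussed next, by plugging $\hat\theta_1$ into $J_3 = 2/(F + \theta G)$.

```python
from bisect import bisect_right

def ecdf(sample):
    """Empirical distribution function t -> (1/m) #{i : sample_i <= t}."""
    s = sorted(sample)
    return lambda t: bisect_right(s, t) / len(s)

def theta_hat_J(x, y, score):
    """Rank estimator theta_hat_J: x are block maxima from F (resampled series),
    y are block maxima from G (original series); score(f, g) plays the role of J(F, G)."""
    F, G = ecdf(x), ecdf(y)
    num = sum(score(F(yj), G(yj)) * F(yj) for yj in y)
    den = sum(score(F(xj), G(xj)) * G(xj) for xj in x)
    return num / den

def J1(f, g):
    return 1.0                      # J_1 = 1

def J2(f, g):
    return 2.0 / (f + g)            # J_2 = 2 / (F + G)

def J3_two_step(theta_init):
    """J_3 = 2 / (F + theta G) with theta replaced by an initial estimate."""
    return lambda f, g: 2.0 / (f + theta_init * g)

x = [2.0, 5.0, 7.0, 9.0]   # toy sample from F
y = [1.0, 3.0, 4.0, 9.0]   # toy sample from G (shares its maximum with x)
t1 = theta_hat_J(x, y, J1)
t2 = theta_hat_J(x, y, J2)
t3 = theta_hat_J(x, y, J3_two_step(t1))
print(round(t1, 4), round(t2, 4))   # 0.5455 0.5159
print(min(1.0, t3))                 # truncated at 1, as in the text
```

With $J_1 = 1$ the estimate is simply $\sum_j \hat F(Y_j) / \sum_j \hat G(X_j) = 1.5/2.75$ here, an empirical version of $P(X < Y)/P(Y < X)$.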
Begun and Reid (1983) consider three score functions: $J_1 = 1$, $J_2 = 2/(F + G)$ and $J_3 = 2/(F + \theta G)$. For $J_1 = 1$ the resulting estimator $\hat\theta_1$ estimates $P(X < Y)/P(Y < X)$, and $\hat\theta_2$ is an analogue of the Mantel-Haenszel estimator (Crowley, 1975). Clearly $J_3$ depends on the unknown $\theta$. A two-step estimator

$$\hat\theta_3 = \sum_{j=1}^m \frac{\hat F(Y_j)}{\hat F(Y_j) + \hat\theta_1 \hat G(Y_j)} \Big/ \sum_{i=1}^m \frac{\hat G(X_i)}{\hat F(X_i) + \hat\theta_1 \hat G(X_i)}$$

is constructed, based on the initial estimator $\hat\theta_1$. Provided that the initial estimator is consistent, which $\hat\theta_1$ is, $\hat\theta_3$ is optimal among regular rank estimators of $\theta$, in the sense of attaining minimum asymptotic variance. The two-step estimator $\hat\theta_3$ is asymptotically fully efficient relative to the rank-based maximum partial likelihood estimator of Cox (1975) and can be interpreted as a single iteration of an algorithm to solve the partial likelihood equations. Begun and Reid (1983) note that the gain in asymptotic efficiency of $\hat\theta_3$ over $\hat\theta_2$ is small unless $\theta < 1/4$, and Bernstein et al. (1981) find that $\hat\theta_2$ performs well in small samples. Under the assumption that the samples $X$ and $Y$ are mutually independent, the approximate large-sample variances of these estimators are given by
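$$\mathrm{var}(\hat\theta_1) = \frac{1}{m}\,\frac{\theta(1 + \theta)^2(1 + 4\theta + \theta^2)}{(2 + \theta)(1 + 2\theta)},$$

$$\mathrm{var}(\hat\theta_2) = \frac{\theta^2\{I_2(\theta) + I_2(1/\theta)\}}{m\,I_1(\theta)\,I_1(1/\theta)}, \tag{4}$$

$$\mathrm{var}(\hat\theta_3) = \frac{\theta^2}{m}\,\{I_3(\theta)\}^{-1},$$

where $m$ is the number of block maxima in each sample, $I_1(t) = \int_0^1 (1 + u^{1-t})^{-1}\,du$, $I_2(t) = \int_0^1 u^{t-1}(1 + u^{t-1})^{-2}\,du$ and $I_3(t) = \int_0^1 (1 + t\,u^{1-1/t})^{-1}\,du$.

The influence functions (Hampel, 1974)

$$IC_1(x) = \Big[-\theta J\{F(x), G(x)\}\,G(x) + \int_x^\infty J(F, G)\,dG\Big] \Big/ \int J(F, G)\,G\,dF,$$

$$IC_2(y) = \Big[J\{F(y), G(y)\}\,F(y) - \theta \int_y^\infty J(F, G)\,dF\Big] \Big/ \int J(F, G)\,G\,dF,$$

which quantify the sensitivity of $\hat\theta_J$ to individual observations $x$ and $y$ from $X$ and $Y$ respectively, are bounded, indicating that $\hat\theta_J$ is robust to the presence of outlying observations.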
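The variances in (4) involve one-dimensional integrals that are easy to evaluate numerically. The Python sketch below (standard library only) uses a midpoint rule with 20000 cells, which is an arbitrary choice, and forms an approximate 95% interval on the log scale via the delta method, $\mathrm{var}(\log\hat\theta) \approx \mathrm{var}(\hat\theta)/\theta^2$. Plugging $\hat\theta$ in for $\theta$ in the variance, and capping the upper limit at 1, are assumptions of this sketch; the function names and the values $\hat\theta = 0.6$, $m = 93$ are illustrative.

```python
import math

def integrate01(f, k=20000):
    """Midpoint rule on (0, 1); never evaluates f at the end points."""
    h = 1.0 / k
    return h * sum(f((i + 0.5) * h) for i in range(k))

def I1(t):
    return integrate01(lambda u: 1.0 / (1.0 + u ** (1.0 - t)))

def I2(t):
    return integrate01(lambda u: u ** (t - 1.0) / (1.0 + u ** (t - 1.0)) ** 2)

def I3(t):
    return integrate01(lambda u: 1.0 / (1.0 + t * u ** (1.0 - 1.0 / t)))

def var_theta1(theta, m):
    return theta * (1 + theta) ** 2 * (1 + 4 * theta + theta ** 2) / ((2 + theta) * (1 + 2 * theta) * m)

def var_theta2(theta, m):
    return theta ** 2 * (I2(theta) + I2(1.0 / theta)) / (m * I1(theta) * I1(1.0 / theta))

def var_theta3(theta, m):
    return theta ** 2 / (m * I3(theta))

def log_scale_ci(theta_hat, var, z=1.959964):
    """Normal approximation for log(theta_hat), back-transformed; upper limit capped at 1."""
    se_log = math.sqrt(var) / theta_hat
    return theta_hat * math.exp(-z * se_log), min(1.0, theta_hat * math.exp(z * se_log))

theta_hat, m = 0.6, 93
for name, v in (("theta1", var_theta1), ("theta2", var_theta2), ("theta3", var_theta3)):
    var = v(theta_hat, m)   # plug-in: variance evaluated at the estimate
    print(name, round(var, 5), log_scale_ci(theta_hat, var))
```

At $\theta = 1$ all three integrals are constant, giving $\mathrm{var}(\hat\theta_2) = \mathrm{var}(\hat\theta_3) = 2/m$ and $\mathrm{var}(\hat\theta_1) = 8/(3m)$, which makes a convenient check.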
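Since $X$ and $Y$ are based on the same underlying data there is some dependence between them. In particular, since each value is resampled exactly once to produce $X$, the maximum values $X_{(m)}$ and $Y_{(m)}$ of $X$ and $Y$ are equal. Writing $X_{(1)} \le \cdots \le X_{(m)}$ and $Y_{(1)} \le \cdots \le Y_{(m)}$ for the order statistics, consider

$$\hat\theta_1 = \Big\{\hat F(Y_{(m)}) + \sum_{i=1}^{m-1} \hat F(Y_{(i)})\Big\} \Big/ \Big\{\hat G(X_{(m)}) + \sum_{i=1}^{m-1} \hat G(X_{(i)})\Big\}.$$

The constraint $X_{(m)} = Y_{(m)}$ means that $\hat F(Y_{(m)}) = \hat G(X_{(m)}) = 1$. Under the model $G = F^\theta$ with independent samples, $P(X_{(m)} > Y_{(m)}) = 1/(1 + \theta)$, so that $\hat F(Y_{(m)}) < 1$ and $\hat G(X_{(m)}) = 1$ with probability $1/(1 + \theta)$, which exceeds 1/2 when $\theta < 1$. The constraint therefore inflates the numerator of $\hat\theta_1$ more often than the denominator: for $\theta < 1$, $\hat\theta_1$ incurs positive bias as a result of the constraint $X_{(m)} = Y_{(m)}$, especially for small $m$. Similar arguments apply to $\hat\theta_2$ and $\hat\theta_3$, but the effects on these estimators are smaller due to the weighting provided by $J_2$ and $J_3$.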
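The probability $1/(1 + \theta)$ can be checked by simulation. Because only ranks matter, the Python sketch below (standard library only) takes $F$ to be uniform on (0, 1), so that $Y = U^{1/\theta}$ has distribution function $G = F^\theta$; the function name, seed and number of replications are illustrative assumptions. The estimate should be close to $1/(1 + \theta)$ whatever the value of $m$.

```python
import random

def prob_top_order_stats(theta, m, reps=20000, seed=1):
    """Monte Carlo estimate of P(X_(m) > Y_(m)) for independent samples of size m
    from F = Uniform(0, 1) and G = F**theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x_max = max(rng.random() for _ in range(m))
        # Y = U**(1/theta) has distribution function y**theta on (0, 1)
        y_max = max(rng.random() ** (1.0 / theta) for _ in range(m))
        hits += x_max > y_max
    return hits / reps

theta = 0.5
for m in (10, 50):
    print(m, prob_top_order_stats(theta, m), 1.0 / (1.0 + theta))
```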
The sample size $m$ is determined by the choice of block size $n$. Here the main requirement is that $n$ is sufficiently large that the limiting relationship (2) holds approximately. Gomes (1993) and Ancona-Navarrete and Tawn (2000) set $m = n$, a pragmatic choice motivated partly by the need for $m$ to be large enough to estimate all unknown parameters reliably. In the current approach this requirement is less crucial. We examine the sensitivity of the semiparametric estimates of $\theta$ to $m$ and $n$ in sections 4 and 5.
4 Simulation study
We examine the performance of the estimators $\hat\theta_1$, $\hat\theta_2$ and $\hat\theta_3$ for a range of sample sizes, in terms of bias, root mean square error (RMSE) and coverage probabilities of confidence intervals constructed using the asymptotic variances (4). We construct confidence intervals based on normal approximations to the distributions of $\hat\theta$ and of $\log\hat\theta$, an approximate variance-stabilising transformation. We simulate data from a max-autoregressive process $\{Z_i, i = 1, 2, \ldots\}$ with extremal index $\theta$, namely,

$$Z_1 = W_1/\theta, \qquad Z_i = \max\{(1 - \theta)Z_{i-1}, W_i\}, \quad i = 2, 3, \ldots, \tag{5}$$

where $W_i$, $i = 1, 2, \ldots$, are independent unit Fréchet random variables. This is the process used by Ferro and Segers (2003) to study the properties of the intervals estimator of $\theta$. We simulate 1000 sequences of length $90m$, $m = 10, 25, 50$ and 100, for each of three values, 0.25, 0.5 and 0.75, of $\theta$. For comparability with the example in section 5 we use a block size of 90, equivalent to data from one winter period.
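Data from (5) can be simulated with a short Python sketch (standard library only). Unit Fréchet variables are generated by inversion, $W = -1/\log U$ with $U$ uniform on (0, 1); the function names and seeds are illustrative. Under (5), $P(Z_i \le z) = \exp\{-1/(\theta z)\}$ for every $i$, so starting from $Z_1 = W_1/\theta$ places the process in its stationary distribution.

```python
import math
import random

def unit_frechet(rng):
    """Unit Frechet variate by inversion: P(W <= w) = exp(-1/w), so W = -1/log(U)."""
    u = 0.0
    while u == 0.0:           # rng.random() lies in [0, 1); exclude 0 to avoid log(0)
        u = rng.random()
    return -1.0 / math.log(u)

def simulate_max_ar(theta, length, seed=None):
    """Max-autoregressive process (5): Z_1 = W_1/theta, Z_i = max{(1 - theta) Z_(i-1), W_i}."""
    rng = random.Random(seed)
    z = [unit_frechet(rng) / theta]
    for _ in range(length - 1):
        z.append(max((1.0 - theta) * z[-1], unit_frechet(rng)))
    return z

# One simulated data set as in the study: m = 50 blocks of length 90
z = simulate_max_ar(0.5, 90 * 50, seed=1)
print(len(z), min(z) > 0)
```

Splitting `z` into blocks of length 90 gives the $Y_i$, and regular resampling of `z` gives the $X_i$.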
Table 1 shows the estimated bias and RMSE of the estimators based on regular resampling of the original data series. As anticipated, $\hat\theta_1$ exhibits positive bias, the bias generally decreasing with $m$ and with $\theta$, but $\hat\theta_2$ and $\hat\theta_3$ have little bias. For the sample sizes studied $\hat\theta_2$ tends to have a slightly lower RMSE than the other estimators. The estimated coverage probabilities in Table 2 are generally close to the nominal value, although confidence intervals based on $\hat\theta_1$ tend to be conservative. Estimator $\hat\theta_2$ performs slightly better than $\hat\theta_3$ in this respect. Confidence intervals constructed on the $\log\theta$-scale appear preferable for $\theta = 0.25$, and for $\theta = 0.5$ and $\theta = 0.75$ they have coverage similar to those constructed on the $\theta$-scale. These observations are broadly consistent with the findings of Bernstein et al. (1981). These results suggest that any dependence between and within $X$ and $Y$ is sufficiently limited that the asymptotic variances in (4) are useful.
Noticeable positive bias results if random resampling is used, especially for small $m$. For example, the bias of $\hat\theta_2$ for $m = 10$ is estimated to be 0.07 when $\theta = 0.25$. RMSEs are comparable to those under regular resampling, but confidence intervals have poorer coverage properties.
In order to make comparisons with the intervals estimator we note that the results of Ferro and Segers (2003) are based on a sample size of approximately 56. For $m = 50$ the magnitudes of the estimated RMSEs of the semiparametric estimators are similar to those obtained by Ferro and Segers (2003) for a threshold corresponding to the 98th percentile of the marginal Fréchet distribution. For lower thresholds the intervals estimator has smaller RMSE. The coverage probabilities of the confidence intervals associated with $\hat\theta_2$ compare favourably with those of the bootstrapped confidence intervals based on the intervals estimator.
                       bias                          RMSE
  θ       m      θ̂1       θ̂2       θ̂3         θ̂1     θ̂2     θ̂3
  0.25    10     0.049   -0.004   -0.019      0.150  0.130  0.140
          25     0.026    0.002   -0.006      0.100  0.093  0.098
          50     0.020    0.006    0.001      0.075  0.070  0.073
         100     0.012    0.006    0.004      0.051  0.048  0.050
  0.5     10     0.038   -0.009   -0.010      0.200  0.200  0.210
          25     0.030    0.009    0.010      0.140  0.140  0.140
          50     0.016    0.003    0.003      0.098  0.098  0.100
         100     0.011    0.004    0.004      0.067  0.067  0.068
  0.75    10     0.007   -0.025   -0.023      0.180  0.200  0.200
          25     0.019    0.005    0.006      0.140  0.160  0.160
          50     0.013    0.003    0.004      0.110  0.110  0.120
         100     0.009    0.004    0.004      0.076  0.084  0.084
Table 1: Estimated bias and root mean square error (RMSE) of the three semiparametric
estimators of the extremal index of a max-autoregressive process.
                     θ-scale                  log θ-scale
  θ       m      θ̂1     θ̂2     θ̂3         θ̂1     θ̂2     θ̂3
  0.25    10     1      0.90   0.84       0.98   0.98   0.98
          25     0.95   0.90   0.85       0.97   0.95   0.91
          50     0.96   0.91   0.89       0.95   0.92   0.89
         100     0.95   0.93   0.90       0.95   0.94   0.91
  0.5     10     0.98   0.93   0.91       1      0.98   0.97
          25     0.98   0.96   0.95       1      0.97   0.97
          50     0.98   0.95   0.95       0.99   0.97   0.96
         100     0.99   0.97   0.97       0.98   0.97   0.96
  0.75    10     0.99   0.97   0.96       1      0.99   0.98
          25     0.99   0.97   0.97       1      0.98   0.98
          50     1      0.98   0.98       1      0.98   0.98
         100     1      0.99   0.99       1      0.99   0.99
Table 2: Estimated coverage probabilities of 95% confidence intervals for the extremal index θ of a max-autoregressive process. Left: calculated on the θ-scale using a normal approximation to the distribution of $\hat\theta$; right: calculated on the log θ-scale.
5 Application to temperature data
We use the semiparametric estimators to estimate the extremal index of a series of negated daily minimum temperatures, recorded to the nearest degree Fahrenheit, at Wooster, Ohio, from June 1893 to December 1987. Smith et al. (1997) and Coles et al. (1994) give further details and analyses of these data. In common with Smith et al. (1997) and Ferro and Segers (2003) we study the winter months of December to February, a period over which the series is approximately stationary. The data contain 93 winter periods, each of length 90 days. There are very few missing values in these data. For data with appreciable numbers of missing observations, $\theta$ could be estimated by treating blocks that contain missing values as providing right-censored block maxima and using the estimators proposed by Begun and Reid (1983) for censored survival data.
We estimate $\theta$ for block sizes ranging from 30 days to 360 days using regular resampling from the original series. Quantile plots (not shown) suggest that for these data the GEV limit applies approximately for block sizes as small as 30 days. Results are presented for $\hat\theta_2$ only. As expected, very similar results are obtained for $\hat\theta_3$, with $\hat\theta_1$ producing slightly larger estimates of $\theta$ than the other two estimators. The estimates of $\theta$ are plotted against block size in figure 2 with 95% confidence limits. The estimates are stable over a wide range of block sizes and suggest an extremal index of approximately 0.6, in agreement with the findings of Ferro and Segers (2003) and Smith et al. (1997). The relationship between $\hat\theta_2$ and block size in figure 2 mirrors that between $\hat\theta$ and threshold in these papers. This makes sense, since larger blocks produce maxima at higher levels and so we would expect some approximate correspondence between threshold and block size. Also shown in figure 2 for each block size are the averages of 100 estimates of $\theta$ based on 100 different random resamples from the original series. For small block sizes these averages are very similar to the estimates based on regular resampling, but for large block sizes (corresponding to smaller sample sizes) these averages vary more smoothly with block size.
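The block-size analysis can be sketched as follows, assuming the series is held in a Python list (standard library only). Because the Wooster data are not reproduced here, the sketch uses a hypothetical stand-in: a series simulated from the max-autoregressive process (5) with $\theta = 0.6$ and the same length as the 93 winter periods ($93 \times 90 = 8370$ days); its output says nothing about the temperature data. For block size $n$ the sketch uses the first $mn$ values, with $m = \lfloor 8370/n \rfloor$; this truncation rule is an assumption, although it reproduces the sample sizes quoted for figure 2. All function names are illustrative.

```python
import math
import random
from bisect import bisect_right

def unit_frechet(rng):
    u = 0.0
    while u == 0.0:                      # avoid log(0)
        u = rng.random()
    return -1.0 / math.log(u)

def simulate_max_ar(theta, length, rng):
    """Max-autoregressive process (5); a stand-in for an observed series."""
    z = [unit_frechet(rng) / theta]
    for _ in range(length - 1):
        z.append(max((1.0 - theta) * z[-1], unit_frechet(rng)))
    return z

def block_maxima(z, n):
    return [max(z[k:k + n]) for k in range(0, len(z), n)]

def regular_resample(z, m, n):
    # Block i (0-based) takes the values at time indices i, m + i, ..., (n - 1)m + i
    return [z[j * m + i] for i in range(m) for j in range(n)]

def theta_hat_2(x, y):
    """Rank estimator with score J_2 = 2/(F + G), truncated at 1."""
    xs, ys = sorted(x), sorted(y)
    F = lambda t: bisect_right(xs, t) / len(xs)
    G = lambda t: bisect_right(ys, t) / len(ys)
    num = sum(2.0 * F(t) / (F(t) + G(t)) for t in y)
    den = sum(2.0 * G(t) / (F(t) + G(t)) for t in x)
    return min(1.0, num / den)

def theta_by_block_size(z, block_sizes):
    """Map block size n -> (m, theta_hat_2), using the first m*n values, m = len(z) // n."""
    out = {}
    for n in block_sizes:
        m = len(z) // n
        zz = z[:m * n]
        y = block_maxima(zz, n)                          # maxima of the original blocks
        x = block_maxima(regular_resample(zz, m, n), n)  # maxima after regular resampling
        out[n] = (m, theta_hat_2(x, y))
    return out

rng = random.Random(2005)
z = simulate_max_ar(0.6, 93 * 90, rng)   # hypothetical stand-in, same length as the winter series
results = theta_by_block_size(z, range(30, 361, 30))
for n, (m, est) in results.items():
    print(f"block size {n:3d}  m = {m:3d}  theta_hat_2 = {est:.2f}")
```

Averaging estimates over repeated random permutations in place of `regular_resample` would give the random-resampling curve described above.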
[Figure 2: extremal index (vertical axis, 0.0 to 1.0) against block size / days (lower axis, 30 to 360); the upper axis gives the corresponding sample sizes 279, 139, 93, 69, 55, 46, 39, 34, 31, 27, 25, 23.]
Figure 2: Estimates $\hat\theta_2$ (circles) and 95% confidence limits (dashed lines) of the extremal index for the Wooster temperature series for block sizes ranging from 30 days to 360 days, using regular resampling from the original series. The confidence limits are calculated using a normal approximation on the log θ scale. Also shown (dotted line) are the averages of 100 estimates using random resampling. The sample size corresponding to each block size is given on the upper axis.

6 Acknowledgements
I am very grateful to Chris Ferro for supplying the Wooster temperature data.
References
Ancona-Navarrete, M. A. and J. A. Tawn (2000). A comparison of methods for estimating the extremal index. Extremes 3 (1), 5–38.
Begun, J. M. (1982). Estimates of relative risk. Institute of Statistics Mimeo Series No. 1382, University of North Carolina at Chapel Hill.
Begun, J. M. and N. Reid (1983). Estimating the relative risk with censored data. Journal of the American Statistical Association 78 (382), 337–341.
Bernstein, L., J. Anderson, and M. C. Pike (1981). Estimation of the proportional
hazard in two treatment group clinical trials. Biometrics 37, 513–520.
Coles, S. G., J. A. Tawn, and R. L. Smith (1994). A seasonal Markov model for extremely
low temperatures. Environmetrics 5, 221–239.
Cox, D. R. (1972). Regression models for life tables. J. Roy. Statist. Soc. B 34, 187–220.
Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269–276.
Crowley, J. (1975). Estimation of relative risk in survival studies. Technical report no.
423, Department of Statistics, University of Wisconsin.
Ferro, C. A. T. and J. Segers (2003). Inference for clusters of extreme values. J. Roy.
Statist. Soc. B 65 (2), 545–556.
Galambos, J. (1987). The asymptotic theory of extreme order statistics. Melbourne:
Krieger. 2nd edn.
Gomes, M. I. (1993). On the estimation of parameters of rare events in environmental time series. In V. Barnett and K. F. Turkman (Eds.), Statistics for the Environment 2: Water Related Issues, Chichester, pp. 225–241. Wiley.
Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of
the American Statistical Association 69 (346), 383–393.
Leadbetter, M. R. (1983). Extremes and local dependence in stationary sequences. Z.
Wahrsch. Ver. Geb. 65, 291–306.
Leadbetter, M. R., G. Lindgren, and H. Rootzén (1983). Extremes and related properties
of random sequences and series. New York: Springer.
Smith, R. and I. Weissman (1994). Estimating the extremal index. J. Roy. Statist. Soc.
B 56 (3), 515–528.
Smith, R. L., J. A. Tawn, and S. Coles (1997). Markov chain models for threshold
exceedances. Biometrika 84, 249–268.