Confidence intervals of the tail index

Jan Picek
Department of Applied Mathematics, Technical University of Liberec, Hálkova 6,
461 17 Liberec, Czech Republic jan.picek@vslib.cz
Summary. Jurečková and Picek recently constructed several nonparametric tests
of one-sided hypotheses on the value of the tail index in the family of distributions
with nondegenerate right tails. Inverting the tests in the Hodges-Lehmann manner,
they considered strongly consistent estimators of the tail index m. The simulation
study demonstrates good behavior of such estimators. We also consider one-sided
confidence intervals obtained on the basis of the tests.
Key words: tail index, Hill estimator, confidence interval, domain of attraction.
1 Introduction
Jurečková and Picek recently constructed several nonparametric tests of one-sided
hypotheses on the value of the Pareto-type tail index in the family of distributions
with nondegenerate right tails; see Jurečková (2000), Jurečková and Picek (2001),
Fialová, Jurečková and Picek (2004), and Picek and Jurečková (2001). The proposed
tests are fully nonparametric: they split the set of observations into N subsamples
of size n and use the empirical distribution functions of N subsample statistics
of various types, namely the subsample extremes, the subsample means, and
the averages of two sub-subsample extremes.
The model assumes that the distribution function F of the observations satisfies
$$\lim_{a \to \infty} \frac{-\log(1 - F(a))}{m \log a} = 1, \tag{1}$$
where m > 0 is the parameter of interest, i.e. the tail index.
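As a quick numerical check of condition (1), the sketch below evaluates the ratio for one Pareto-type law, F(x) = 1 − (1 + x)^(−m). The function names are illustrative only and do not come from the cited papers.

```python
import math

def pareto_log_survival(a: float, m: float) -> float:
    """log(1 - F(a)) for the illustrative law F(x) = 1 - (1 + x)^(-m), x >= 0."""
    return -m * math.log1p(a)

def tail_ratio(a: float, m: float) -> float:
    """The ratio -log(1 - F(a)) / (m log a) appearing in condition (1)."""
    return -pareto_log_survival(a, m) / (m * math.log(a))

# The ratio approaches 1 as a grows, which is what (1) requires.
for a in (1e1, 1e3, 1e6, 1e12):
    print(f"a = {a:.0e}: ratio = {tail_ratio(a, m=2.0):.6f}")
```

Because the condition is stated on the logarithmic scale, it is insensitive to constant rescaling of the observations.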
The tests of the one-sided hypothesis $H_{m_0}: m \le m_0$ (or $H^*_{m_0}: m \ge m_0$)
are consistent against the one-sided alternatives $K_{m_0}: m > m_0$ (or $K^*_{m_0}: m < m_0$,
respectively), with $m_0 > 0$ fixed; the asymptotic null distributions of the test criteria are
normal. The simulation studies in the cited papers show that the tests distinguish
the distribution tails well already for moderate sample sizes.
Inverting such a test in the Hodges-Lehmann manner, we obtain a strongly consistent estimate $M_N$ of m; see Jurečková and Picek (2004). In this paper the simulation
study demonstrates a surprisingly good approximation of the tail index m by $M_N$.
Cheng and Peng (2001) proposed confidence intervals for the tail index that are
based on the asymptotic normal approximation of the Hill estimator. In
this paper, one-sided confidence intervals are obtained, again, on the basis of
the nonparametric tests.
2 Confidence intervals
Consider N independent samples of equal fixed size n, $X_j = (X_{j1}, \ldots, X_{jn})'$, $j = 1, \ldots, N$, from the population with distribution function F satisfying (1), where
m, $0 < m < \infty$, is the parameter of interest (i.e., the tail index).
Let $X_{(n)}^{(1)}, \ldots, X_{(n)}^{(N)}$ be the respective sample maxima and $\bar{X}_n^{(1)}, \ldots, \bar{X}_n^{(N)}$ the
respective sample means. Fix ν, $1 \le \nu \le n - 1$, and denote
$$\hat{\theta}_n^{(j)} = \frac{1}{4}\left(X_j^{(1)} + X_j^{(2)}\right), \tag{2}$$
where $X_j^{(1)} = \max\{X_{j1}, \ldots, X_{j\nu}\}$, $X_j^{(2)} = \max\{X_{j(\nu+1)}, \ldots, X_{jn}\}$, $j = 1, \ldots, N$.
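The three subsample statistics can be computed as in the following minimal sketch. The function name and the `scale` argument are hypothetical; the default constant 1/4 simply follows formula (2) as printed.

```python
from statistics import fmean

def subsample_statistics(subsamples, nu, scale=0.25):
    """Return (maxima, means, thetas) for a list of equal-size subsamples.

    scale multiplies the sum of the two partial maxima; the default 1/4
    is the constant printed in formula (2).
    """
    maxima, means, thetas = [], [], []
    for x in subsamples:
        if not 1 <= nu <= len(x) - 1:
            raise ValueError("nu must satisfy 1 <= nu <= n - 1")
        maxima.append(max(x))                               # subsample maximum
        means.append(fmean(x))                              # subsample mean
        thetas.append(scale * (max(x[:nu]) + max(x[nu:])))  # statistic (2)
    return maxima, means, thetas

# Two toy subsamples of size n = 5, split after the first nu = 2 values.
subsamples = [[1.0, 4.0, 2.0, 8.0, 3.0], [5.0, 1.0, 1.0, 2.0, 6.0]]
maxima, means, thetas = subsample_statistics(subsamples, nu=2)
print(maxima, means, thetas)
```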
Then $X_{(n)}^{(1)}, \ldots, X_{(n)}^{(N)}$ is a random sample from the distribution $F^*(x) = F^n(x)$,
$\bar{X}_n^{(1)}, \ldots, \bar{X}_n^{(N)}$ from a distribution $F^{**}(x)$, and $\hat{\theta}_n^{(1)}, \ldots, \hat{\theta}_n^{(N)}$ from $F^{***}(x)$. Denote by
$\hat{F}_N^*$ the empirical distribution function of the sample maxima, i.e.
$$\hat{F}_N^*(a) = \frac{1}{N} \sum_{j=1}^{N} I\left[X_{(n)}^{(j)} \le a\right]. \tag{3}$$
Similarly, denote by $\hat{F}_N^{**}$ and $\hat{F}_N^{***}$ the corresponding empirical distribution functions.
For any fixed m > 0, put
$$a_{N,m} = \left(nN^{1-\delta}\right)^{1/m}, \tag{4}$$
where $0 < \delta < \frac{1}{2}$ is a chosen constant.
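A minimal sketch of the two ingredients defined so far, the empirical distribution function (3) and the threshold (4), might look as follows. The function names are illustrative, and the values n = 5, N = 200, δ = 0.1 are only example inputs.

```python
def threshold(n, N, m, delta):
    """a_{N,m} = (n * N^(1 - delta))^(1/m), formula (4)."""
    return (n * N ** (1.0 - delta)) ** (1.0 / m)

def ecdf(sample, a):
    """Empirical distribution function (3): share of values not exceeding a."""
    return sum(1 for x in sample if x <= a) / len(sample)

# The threshold shrinks as the hypothesised tail index m grows.
for m in (0.5, 1.0, 2.0):
    print(f"m = {m}: a_N,m = {threshold(5, 200, m, 0.1):.2f}")
```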
The test (see Jurečková and Picek 2001) of $H_{m_0}: m \le m_0$ against $K_{m_0}: m > m_0$
rejects the hypothesis $H_{m_0}$ at the asymptotic significance level $\alpha \in (0, 1)$
provided
either $1 - \hat{F}_N^*(a_{N,m_0}) = 0$,
or $1 - \hat{F}_N^*(a_{N,m_0}) > 0$ and
$$N^{\delta/2}\left[-\log\left(1 - \hat{F}_N^*(a_{N,m_0})\right) - (1 - \delta)\log N\right] \ge \Phi^{-1}(1 - \alpha),$$
where Φ is the standard normal distribution function.
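The rejection rule just stated can be sketched in a few lines. This is an illustration under the stated definitions, not the authors' implementation; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def rejects_upper_hypothesis(maxima, n, m0, delta, alpha):
    """True if H_{m0}: m <= m0 is rejected in favour of m > m0."""
    N = len(maxima)
    a = (n * N ** (1.0 - delta)) ** (1.0 / m0)      # threshold a_{N,m0}
    survival = sum(1 for x in maxima if x > a) / N  # 1 - F_N^*(a_{N,m0})
    if survival == 0.0:
        return True                                 # the "either" branch
    stat = N ** (delta / 2.0) * (-math.log(survival) - (1.0 - delta) * math.log(N))
    return stat >= NormalDist().inv_cdf(1.0 - alpha)

# No maximum exceeds the threshold: evidence of a lighter tail, so reject.
print(rejects_upper_hypothesis([1.0] * 200, n=5, m0=1.0, delta=0.1, alpha=0.05))
# Every maximum exceeds the threshold: no evidence against m <= m0.
print(rejects_upper_hypothesis([1e9] * 200, n=5, m0=1.0, delta=0.1, alpha=0.05))
```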
Analogously, we reject the hypothesis $H^*_{m_0}: m \ge m_0$ against $K^*_{m_0}: m < m_0$
provided
either $\hat{F}_N^*(a_{N,m_0}) = 0$,
or $\hat{F}_N^*(a_{N,m_0}) > 0$ and
$$N^{\delta/2}\left[-\log\left(1 - \hat{F}_N^*(a_{N,m_0})\right) - (1 - \delta)\log N\right] \le \Phi^{-1}(\alpha),$$
or equivalently
$$N^{\delta/2}\left[\log\left(1 - \hat{F}_N^*(a_{N,m_0})\right) + (1 - \delta)\log N\right] \ge \Phi^{-1}(1 - \alpha).$$
The quantity $a_{N,m}$ is decreasing in m for any fixed N, n; hence the statistic
$$\log\left(1 - \hat{F}_N^*(a_{N,m})\right) + (1 - \delta)\log N$$
is nondecreasing in m for fixed N, n.
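To see what inverting the test of $H_{m_0}$ amounts to, the following derivation sketches the set of values $m_0$ that are not rejected. The shorthands $z$ and $p$ are introduced here only for brevity, and the convention $-\log 0 = +\infty$ covers the case $1 - \hat{F}_N^* = 0$.

```latex
\begin{align*}
z &:= \Phi^{-1}(1-\alpha), \qquad
p := 1 - \exp\{-N^{-\delta/2} z - (1-\delta)\log N\}, \\[4pt]
H_{m_0} \text{ is not rejected}
 &\iff N^{\delta/2}\bigl[-\log\bigl(1-\hat F_N^*(a_{N,m_0})\bigr) - (1-\delta)\log N\bigr] < z \\
 &\iff 1-\hat F_N^*(a_{N,m_0}) > \exp\{-N^{-\delta/2} z - (1-\delta)\log N\} \\
 &\iff \hat F_N^*(a_{N,m_0}) < p \\
 &\;\Longrightarrow\; a_{N,m_0} = \bigl(nN^{1-\delta}\bigr)^{1/m_0}
      < \inf\{s : \hat F_N^*(s) > p\} \\
 &\iff m_0 > \frac{\log\bigl(nN^{1-\delta}\bigr)}{\log \inf\{s : \hat F_N^*(s) > p\}} .
\end{align*}
```

The last step divides by the logarithm of the empirical quantile, which is positive because the quantile exceeds $a_{N,m_0} > 1$. The non-rejected values of $m_0$ therefore form a half-line that is unbounded to the right.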
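Then we can consider a left-side confidence interval of the tail index m as
$$\left( \frac{\log(nN^{1-\delta})}{\log \hat{F}_N^{*-1}\left(1 - \exp\{-N^{-\delta/2}\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\}\right)} ;\ \infty \right) \tag{5}$$
and a right-side interval of m as
$$\left( 0 ;\ \frac{\log(nN^{1-\delta})}{\log \hat{F}_N^{*-1}\left(1 - \exp\{N^{-\delta/2}\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\}\right)} \right), \tag{6}$$
where
$$\hat{F}_N^{*-1}(x) = \inf\{s : \hat{F}_N^*(s) > x\}. \tag{7}$$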
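The endpoints in (5) and (6) can be computed directly from the sorted maxima. The sketch below is one possible reading of the formulas; the function names are hypothetical, and the input is an idealised set of maxima (exact quantiles of F^n for F(x) = 1 − 1/(1 + x), i.e. m = 1) chosen only so that the example is deterministic.

```python
import math
from statistics import NormalDist

def ecdf_inverse(sample, x):
    """Generalized inverse (7): inf{s : F_N(s) > x} for the empirical d.f. F_N."""
    xs = sorted(sample)
    k = max(0, math.floor(x * len(xs)))  # F_N(xs[k]) >= (k + 1)/N > x
    if k >= len(xs):
        raise ValueError("x must be smaller than 1")
    return xs[k]

def maxima_bounds(maxima, n, delta, alpha):
    """Left endpoint of (5) and right endpoint of (6), from subsample maxima."""
    N = len(maxima)
    z = NormalDist().inv_cdf(1.0 - alpha)
    numerator = math.log(n * N ** (1.0 - delta))
    bounds = []
    for sign in (-1.0, 1.0):  # -1 gives (5), +1 gives (6)
        p = 1.0 - math.exp(sign * N ** (-delta / 2.0) * z - (1.0 - delta) * math.log(N))
        q = ecdf_inverse(maxima, p)
        if q <= 1.0:
            raise ValueError("quantile must exceed 1 for a positive finite bound")
        bounds.append(numerator / math.log(q))
    return tuple(bounds)

# Idealised maxima: the ((j - 0.5)/N)-quantiles of F^n with F(x) = 1 - 1/(1 + x).
N, n = 200, 5
maxima = [1.0 / (1.0 - ((j - 0.5) / N) ** (1.0 / n)) - 1.0 for j in range(1, N + 1)]
lower, upper = maxima_bounds(maxima, n=n, delta=0.1, alpha=0.05)
print(f"(5): ({lower:.3f}; inf)   (6): (0; {upper:.3f})")
```

For small N the argument of the inverse in (6) can become negative; the sketch then falls back to the smallest observation.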
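Analogously,
$$\left( \frac{\log(N^{1-\delta})}{\log \hat{F}_N^{**-1}\left(1 - \exp\{-N^{-\delta/2}\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\}\right)} ;\ \infty \right) \tag{8}$$
and
$$\left( 0 ;\ \frac{\log(N^{1-\delta})}{\log \hat{F}_N^{**-1}\left(1 - \exp\{N^{-\delta/2}\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\}\right)} \right) \tag{9}$$
are one-sided intervals of m based on the subsample means.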
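Similarly,
$$\left( \frac{\log(nN^{1-\delta})}{\log\left(2\,\hat{F}_N^{***-1}\left(1 - \exp\{-N^{-\delta/2}\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\}\right)\right)} ;\ \infty \right) \tag{10}$$
and
$$\left( 0 ;\ \frac{\log(nN^{1-\delta})}{\log\left(2\,\hat{F}_N^{***-1}\left(1 - \exp\{N^{-\delta/2}\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\}\right)\right)} \right) \tag{11}$$
are one-sided intervals of m based on the averages of the extremes of two subsamples.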
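The three pairs of intervals differ only in the numerator and in the factor multiplying the empirical quantile. The sketch below collects these differences in a small lookup table; the table, its keys and the function name are hypothetical conveniences, and the quantile q is assumed to have been computed beforehand from the relevant empirical distribution function.

```python
import math

# (numerator contains n?, factor inside the logarithm), read off (5)-(11).
RECIPES = {
    "maxima": (True, 1.0),    # intervals (5), (6)
    "means": (False, 1.0),    # intervals (8), (9)
    "averages": (True, 2.0),  # intervals (10), (11)
}

def bound_from_quantile(q, n, N, delta, kind):
    """Endpoint log(c * N^(1 - delta)) / log(factor * q), with c = n or 1."""
    uses_n, factor = RECIPES[kind]
    numerator = math.log((n if uses_n else 1) * N ** (1.0 - delta))
    return numerator / math.log(factor * q)

print(bound_from_quantile(150.0, n=5, N=200, delta=0.1, kind="maxima"))
print(bound_from_quantile(150.0, n=5, N=200, delta=0.1, kind="means"))
```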
3 Numerical illustration
The performance of the proposed estimation procedures is illustrated on simulated random samples of size 1000, split into N = 200 subsamples of size n = 5;
the samples were generated 1000 times from the following distributions:
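Pareto
$$F(x) = 1 - \left(\frac{1}{1+x}\right)^{m}, \quad x \ge 0$$
Burr
$$F(x) = 1 - \left(\frac{1}{1+x^{m}}\right)^{\alpha}, \quad x \ge 0$$
Generalized Pareto
$$F(x) = \begin{cases}
1 - \left(1 + \dfrac{x}{m\beta}\right)^{-m} & \text{if } x \ge 0,\ 0 < m < \infty,\ \beta > 0, \\[6pt]
1 - \left(1 + \dfrac{x}{m\beta}\right)^{-m} & \text{if } 0 \le x \le -m\beta,\ m < 0,\ \beta > 0, \\[6pt]
1 - e^{-x/\beta} & \text{if } m = \infty,\ \beta > 0, \\[6pt]
0 & \text{otherwise.}
\end{cases}$$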
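Samples from these laws can be drawn by inverse-transform sampling. The sketch below assumes the forms F(x) = 1 − (1 + x)^(−m), F(x) = 1 − (1 + x^m)^(−α) and, for 0 < m < ∞, F(x) = 1 − (1 + x/(mβ))^(−m). The defaults α = 1 and β = 1 are assumptions, since the text does not state the values used, and all function names are illustrative.

```python
import random

def pareto_quantile(u, m):
    """Solve u = 1 - (1 + x)^(-m) for x."""
    return (1.0 - u) ** (-1.0 / m) - 1.0

def burr_quantile(u, m, alpha=1.0):
    """Solve u = 1 - (1 + x^m)^(-alpha) for x (alpha = 1 is an assumed default)."""
    return ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** (1.0 / m)

def gpd_quantile(u, m, beta=1.0):
    """Solve u = 1 - (1 + x/(m*beta))^(-m) for x, valid for 0 < m < infinity."""
    return m * beta * ((1.0 - u) ** (-1.0 / m) - 1.0)

def draw(quantile, size, rng, **params):
    """Inverse-transform sampling: apply the quantile function to uniforms."""
    return [quantile(rng.random(), **params) for _ in range(size)]

rng = random.Random(0)
sample = draw(pareto_quantile, 1000, rng, m=1.0)
print(len(sample), min(sample) >= 0.0)
```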
For each of these distributions we proceeded as follows:
(1) generated independent observations $(X_{j1}, \ldots, X_{jn})'$, $j = 1, \ldots, N$;
(2) computed
A) the maxima $X_{(n)}^{(j)} = \max(X_{j1}, \ldots, X_{jn})$, $j = 1, \ldots, N$;
B) the sample means $\bar{X}_n^{(1)}, \ldots, \bar{X}_n^{(N)}$;
C) the means of two maxima
$$\frac{1}{4}\left(\max(X_{j1}, \ldots, X_{j\nu}) + \max(X_{j(\nu+1)}, \ldots, X_{jn})\right), \quad j = 1, \ldots, N, \text{ where } \nu = [n/2];$$
(3) found the corresponding empirical distribution function for each of the quantities
A–C;
(4) computed the confidence intervals for δ = 0.01, 0.02, . . . , 0.60;
(5) for comparison, computed the confidence intervals based on the asymptotic normal approximation of the Hill estimator,
$$\left( \frac{\sqrt{k}}{\sqrt{k}\,H(k) + \Phi^{-1}(1 - \alpha)\,H(k)} ;\ \infty \right),$$
where H(k) is the Hill estimator
$$H(k) = \frac{1}{k}\sum_{i=1}^{k} \log X_{(n-i+1:n)} - \log X_{(n-k:n)};$$
(6) repeated steps (1)–(5) 1000 times;
(7) computed selected statistics and the "coverage probability".
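The Hill-based comparison interval of step (5) can be sketched as follows. Here n is simply the length of the sample passed in, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def hill_estimator(sample, k):
    """H(k): mean log-excess of the k largest values over X_(n-k:n)."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_threshold = math.log(xs[n - k - 1])  # log X_(n-k:n)
    top = xs[n - k:]                         # X_(n-k+1:n), ..., X_(n:n)
    return sum(math.log(x) - log_threshold for x in top) / k

def hill_lower_bound(sample, k, alpha):
    """Left endpoint sqrt(k) / (sqrt(k) H(k) + z H(k)) of the Hill interval."""
    h = hill_estimator(sample, k)
    z = NormalDist().inv_cdf(1.0 - alpha)
    root_k = math.sqrt(k)
    return root_k / (root_k * h + z * h)

# Toy sample with log-values 0, 1, 2, 3; for k = 2 the log-excesses are 1 and 2.
toy = [1.0, math.e, math.e ** 2, math.e ** 3]
print(hill_estimator(toy, 2), hill_lower_bound(toy, 2, alpha=0.05))
```

Since H(k) estimates 1/m, the endpoint is the reciprocal of H(k) shrunk by the factor 1 + z/√k.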
Figures 1–6 show the empirical coverage probability, and Figure 7 shows
the quantiles of the left-side interval values, as functions of the choice of δ (or k,
respectively).
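An empirical coverage probability of the kind plotted in the figures can be estimated by a small Monte Carlo loop. The sketch below does this for the left-side interval based on subsample maxima and the Pareto law F(x) = 1 − (1 + x)^(−m). The values α = 0.05 and δ = 0.1, the seed and the reduced number of replications are assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def lower_bound_from_maxima(maxima, n, delta, alpha):
    """Left endpoint of the maxima-based interval; inf if the quantile is <= 1."""
    N = len(maxima)
    z = NormalDist().inv_cdf(1.0 - alpha)
    p = 1.0 - math.exp(-N ** (-delta / 2.0) * z - (1.0 - delta) * math.log(N))
    q = sorted(maxima)[min(N - 1, math.floor(p * N))]  # inf{s : F_N(s) > p}
    if q <= 1.0:
        return math.inf
    return math.log(n * N ** (1.0 - delta)) / math.log(q)

def coverage(m, n, N, delta, alpha, reps, seed=0):
    """Share of replications in which the interval (lower; inf) contains m."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # N subsample maxima, each over n Pareto draws by inverse transform.
        maxima = [
            max((1.0 - rng.random()) ** (-1.0 / m) - 1.0 for _ in range(n))
            for _ in range(N)
        ]
        hits += lower_bound_from_maxima(maxima, n, delta, alpha) < m
    return hits / reps

print(coverage(m=1.0, n=5, N=200, delta=0.1, alpha=0.05, reps=200))
```

Repeating the call over a grid of δ values gives one curve of empirical coverage against δ.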
[Figure 1: coverage probability (0.0 to 1.0) plotted against delta (0.0 to 0.6)]
Fig. 1. The empirical coverage probability for the Pareto (m=1.0) distribution as a function of δ for the intervals based on subsample maxima (solid), on subsample
means (dotted) and on averages of the maxima of two subsamples (dashed).
[Figure 2: coverage probability (0.0 to 1.0) plotted over the horizontal range 0 to 1000]
Fig. 2. The empirical coverage probability for the Burr (m=1.0) distribution as a function of the sample fraction k for the interval based on the Hill estimator.
Conclusion
The simulation study shows better coverage probability for the intervals based on
the Hill estimator. However, the left-side values of our intervals seem to be more stable,
that is, less sensitive to the choice of δ, than the values of the Hill interval are with respect to k.
[Figure 3: coverage probability (0.0 to 1.0) plotted against delta (0.0 to 0.6)]
Fig. 3. The empirical coverage probability for the Burr (m=1.0) distribution as a function of δ for the intervals based on subsample maxima (solid), on subsample
means (dotted) and on averages of the maxima of two subsamples (dashed).
[Figure 4: coverage probability (0.0 to 1.0) plotted over the horizontal range 0 to 1000]
Fig. 4. The empirical coverage probability for the Burr (m=1.0) distribution as a function of the sample fraction k for the interval based on the Hill estimator.
Acknowledgements
The author thanks two referees for their valuable comments and suggestions, which
improved the readability of the text.
This work was supported by Czech Republic Grant 201/05/2340.
[Figure 5: coverage probability (0.0 to 1.0) plotted against delta (0.0 to 0.6)]
Fig. 5. The empirical coverage probability for the Generalized Pareto (m=1.0) distribution as a function of δ for the intervals based on subsample maxima (solid), on
subsample means (dotted) and on averages of the maxima of two subsamples (dashed).
[Figure 6: coverage probability (0.0 to 1.0) plotted over the horizontal range 0 to 1000]
Fig. 6. The empirical coverage probability for the Generalized Pareto (m=1.0) distribution as a function of the sample fraction k for the interval based on the Hill
estimator.
[Figure 7: two panels of quantiles, plotted against delta (0.0 to 0.6) on the left and against k (0 to 1000) on the right]
Fig. 7. The quantiles of the left-side interval values for the Burr (m=1.0) distribution:
50% (solid), 5% (dotted) and 95% (dashed). The quantiles corresponding to the interval based on subsample maxima are on the left and the quantiles corresponding to the
interval based on the Hill estimator are on the right.
References
[CHP01] S. Cheng and L. Peng, Confidence intervals for the tail index. Bernoulli
7(5) (2001), 751–760.
[FJP03] A. Fialová, J. Jurečková and J. Picek, Estimating Pareto tail index using
all observations. REVSTAT 2 (2004), 75–99.
[Jur00] J. Jurečková, Tests of tails based on extreme regression quantiles. Statist.
& Probab. Letters 49 (2000), 53–61.
[JP01] J. Jurečková and J. Picek, A class of tests on the tail index. Extremes 4:2
(2001), 165–183.
[JP04] J. Jurečková and J. Picek, Estimates of the tail index based on nonparametric tests. In: Theory and Applications of Recent Robust Methods,
edited by M. Hubert, G. Pison, A. Struyf and S. Van Aelst, Series: Statistics for Industry and Technology, Birkhauser, Basel, 2004.
[PJ01] J. Picek and J. Jurečková, A class of tests on the tail index using the modified extreme regression quantiles. "ROBUST'2000" (J. Antoch, G. Dohnal,
Eds.), Union of Czech Mathematicians and Physicists, Prague, 2001, 217–226.