Confidence intervals of the tail index

Jan Picek

Department of Applied Mathematics, Technical University of Liberec, Hálkova 6, 461 17 Liberec, Czech Republic
jan.picek@vslib.cz

Summary. Jurečková and Picek recently constructed several nonparametric tests of one-sided hypotheses on the value of the tail index in the family of distributions with nondegenerate right tails. Inverting the tests in the Hodges-Lehmann manner, they considered strongly consistent estimators of the tail index $m$. The simulation study demonstrates good behavior of such estimators. We also consider one-sided confidence intervals obtained on the basis of the tests.

Key words: Tail index, Hill estimator, confidence interval, domain of attraction.

1 Introduction

Jurečková and Picek recently constructed several nonparametric tests of one-sided hypotheses on the value of the Pareto-type tail index in the family of distributions with nondegenerate right tails; see Jurečková (2000), Jurečková and Picek (2001), Fialová, Jurečková and Picek (2004), and Picek and Jurečková (2001). The proposed tests are fully nonparametric. They are based on splitting the set of observations into $N$ subsamples of size $n$ and on the empirical distribution functions of $N$ subsample statistics of various types, namely the subsample extremes, the subsample means, and the averages of two sub-subsample extremes. The model assumes that the distribution function $F$ of the observations satisfies

\[
\lim_{a \to \infty} \frac{-\log(1 - F(a))}{m \log a} = 1, \tag{1}
\]

where $m > 0$ is the parameter of interest, i.e. the tail index. The tests of the one-sided hypothesis $H_{m_0}: m \le m_0$ (or $H^*_{m_0}: m \ge m_0$) are consistent against the one-sided alternatives $K_{m_0}: m > m_0$ (or $K^*_{m_0}: m < m_0$, respectively), with $m_0 > 0$ fixed; the asymptotic null distributions of the test criteria are normal. The simulation studies in the cited papers show that the tests distinguish the distribution tails well already for moderate sample sizes.
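As a quick check of condition (1), consider, purely for illustration, a distribution with the exact Pareto-type tail $1 - F(a) = (1 + a)^{-m}$, $a \ge 0$, and compare it with the exponential tail $1 - F(a) = e^{-a}$. Both choices of $F$ are assumptions made here for the example only. In the first case the ratio in (1) tends to one, in the second it diverges for every finite $m$, so the exponential distribution does not belong to the model with a finite tail index:

\[
\frac{-\log (1 + a)^{-m}}{m \log a} = \frac{\log(1 + a)}{\log a} \longrightarrow 1,
\qquad
\frac{-\log e^{-a}}{m \log a} = \frac{a}{m \log a} \longrightarrow \infty
\qquad (a \to \infty).
\]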
Inverting such a test in the Hodges-Lehmann manner, we obtain a strongly consistent estimate $M_N$ of $m$; see Jurečková and Picek (2004). In this paper the simulation study demonstrates a surprisingly good approximation of the tail index $m$ by $M_N$.

Cheng and Peng (2001) proposed confidence intervals for the tail index that are based on the asymptotic normal approximation of the Hill estimator. In this paper the one-sided confidence intervals are obtained again on the basis of the nonparametric tests.

2 Confidence intervals

Consider $N$ independent samples of equal fixed size $n$, $X_j = (X_{j1}, \dots, X_{jn})'$, $j = 1, \dots, N$, from a population with distribution function $F$ satisfying (1), where $m$, $0 < m < \infty$, is the parameter of interest (i.e., the tail index). Let $X_{(n)}^{(1)}, \dots, X_{(n)}^{(N)}$ be the respective sample maxima and $\bar{X}_n^{(1)}, \dots, \bar{X}_n^{(N)}$ the respective sample means. Fix $\nu$, $1 \le \nu \le n - 1$, and denote

\[
\hat{\theta}_n^{(j)} = \frac{1}{4}\left(X_j^{(1)} + X_j^{(2)}\right), \tag{2}
\]

where $X_j^{(1)} = \max\{X_{j1}, \dots, X_{j\nu}\}$ and $X_j^{(2)} = \max\{X_{j(\nu+1)}, \dots, X_{jn}\}$, $j = 1, \dots, N$. Then $X_{(n)}^{(1)}, \dots, X_{(n)}^{(N)}$ is a random sample from the distribution $F^*(x) = F^n(x)$, $\bar{X}_n^{(1)}, \dots, \bar{X}_n^{(N)}$ is a random sample from a distribution $F^{**}(x)$, and $\hat{\theta}_n^{(1)}, \dots, \hat{\theta}_n^{(N)}$ is a random sample from a distribution $F^{***}(x)$. Denote by $\hat{F}_N^*$ the empirical distribution function of the sample maxima, i.e.

\[
\hat{F}_N^*(a) = \frac{1}{N} \sum_{j=1}^{N} I\left[X_{(n)}^{(j)} \le a\right]. \tag{3}
\]

Similarly, denote by $\hat{F}_N^{**}$ and $\hat{F}_N^{***}$ the corresponding empirical distribution functions of the sample means and of the statistics (2). For any fixed $m > 0$, put

\[
a_{N,m} = \left(n N^{1-\delta}\right)^{1/m}, \tag{4}
\]

where $0 < \delta < \frac{1}{2}$ is a chosen constant.

The test of $H_{m_0}: m \le m_0$ against $K_{m_0}: m > m_0$ (see Jurečková and Picek 2001) rejects the hypothesis $H_{m_0}$ at the asymptotic significance level $\alpha \in (0, 1)$ provided either $1 - \hat{F}_N^*(a_{N,m_0}) = 0$, or $1 - \hat{F}_N^*(a_{N,m_0}) > 0$ and

\[
N^{\delta/2}\left[-\log\left(1 - \hat{F}_N^*(a_{N,m_0})\right) - (1 - \delta)\log N\right] \ge \Phi^{-1}(1 - \alpha),
\]

where $\Phi$ is the standard normal distribution function.
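The rejection rule for $H_{m_0}$ based on subsample maxima can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `rejects_h_m0`, the splitting of the data into consecutive blocks, and the default values of `delta` and `alpha` are hypothetical choices made for the example.

```python
import math
from statistics import NormalDist


def rejects_h_m0(sample, n, m0, delta=0.25, alpha=0.05):
    """Return True if H: m <= m0 is rejected by the test based on subsample maxima.

    Hypothetical helper; the observations are split into consecutive blocks
    of length n, which is an assumption of this sketch.
    """
    N = len(sample) // n  # number of complete subsamples
    maxima = [max(sample[j * n:(j + 1) * n]) for j in range(N)]
    a = (n * N ** (1 - delta)) ** (1 / m0)  # threshold a_{N, m0} from (4)
    survival = sum(x > a for x in maxima) / N  # 1 - F_hat*_N(a_{N, m0})
    if survival == 0:
        return True  # first branch of the rejection rule
    stat = N ** (delta / 2) * (-math.log(survival) - (1 - delta) * math.log(N))
    return stat >= NormalDist().inv_cdf(1 - alpha)
```

With a sample of 1000 observations one would call, for instance, `rejects_h_m0(sample, n=5, m0=1.0)`, which uses $N = 200$ subsamples as in the numerical illustration of Section 3.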
Analogously, we reject the hypothesis $H^*_{m_0}: m \ge m_0$ against $K^*_{m_0}: m < m_0$ provided either $\hat{F}_N^*(a_{N,m_0}) = 0$, or $\hat{F}_N^*(a_{N,m_0}) > 0$ and

\[
N^{\delta/2}\left[-\log\left(1 - \hat{F}_N^*(a_{N,m_0})\right) - (1 - \delta)\log N\right] \le \Phi^{-1}(\alpha),
\]

or equivalently

\[
N^{\delta/2}\left[\log\left(1 - \hat{F}_N^*(a_{N,m_0})\right) + (1 - \delta)\log N\right] \ge \Phi^{-1}(1 - \alpha).
\]

The quantity $a_{N,m}$ is decreasing in $m$ for any fixed $N$, $n$; hence the statistic $\log\left(1 - \hat{F}_N^*(a_{N,m})\right) + (1 - \delta)\log N$ is nondecreasing in $m$ for fixed $N$, $n$. Then we can consider a left-side confidence interval of the tail index $m$ as

\[
\left( \frac{\log\left(n N^{1-\delta}\right)}{\log \hat{F}_N^{*-1}\left(1 - \exp\left\{-N^{-\delta/2}\,\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\right\}\right)} ;\ \infty \right) \tag{5}
\]

and a right-side interval of $m$ as

\[
\left( 0 ;\ \frac{\log\left(n N^{1-\delta}\right)}{\log \hat{F}_N^{*-1}\left(1 - \exp\left\{N^{-\delta/2}\,\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\right\}\right)} \right), \tag{6}
\]

where

\[
\hat{F}_N^{*-1}(x) = \inf\{s : \hat{F}_N^*(s) > x\}. \tag{7}
\]

Analogously,

\[
\left( \frac{\log\left(N^{1-\delta}\right)}{\log \hat{F}_N^{**-1}\left(1 - \exp\left\{-N^{-\delta/2}\,\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\right\}\right)} ;\ \infty \right) \tag{8}
\]

and

\[
\left( 0 ;\ \frac{\log\left(N^{1-\delta}\right)}{\log \hat{F}_N^{**-1}\left(1 - \exp\left\{N^{-\delta/2}\,\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\right\}\right)} \right) \tag{9}
\]

are one-sided intervals of $m$ based on the subsample means. Similarly,

\[
\left( \frac{\log\left(n N^{1-\delta}\right)}{\log\left(2 \hat{F}_N^{***-1}\left(1 - \exp\left\{-N^{-\delta/2}\,\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\right\}\right)\right)} ;\ \infty \right) \tag{10}
\]

and

\[
\left( 0 ;\ \frac{\log\left(n N^{1-\delta}\right)}{\log\left(2 \hat{F}_N^{***-1}\left(1 - \exp\left\{N^{-\delta/2}\,\Phi^{-1}(1 - \alpha) - (1 - \delta)\log N\right\}\right)\right)} \right) \tag{11}
\]

are one-sided intervals of $m$ based on the averages of extremes of two subsamples.

3 Numerical illustration

The performance of the proposed estimation procedures is illustrated on simulated random samples of size 1000, split into $N = 200$ subsamples of size $n = 5$; the samples were generated 1000 times from the following distributions:

\[
\text{Pareto:} \quad F(x) = 1 - \left(\frac{1}{1 + x}\right)^{m}, \quad x \ge 0,
\]

\[
\text{Burr:} \quad F(x) = 1 - \left(\frac{1}{1 + x^{m}}\right)^{\alpha}, \quad x \ge 0,
\]

\[
\text{Generalized Pareto:} \quad F(x) =
\begin{cases}
1 - \left(1 + \dfrac{x}{m\beta}\right)^{-m} & \text{if } x \ge 0,\ 0 < m < \infty,\ \beta > 0, \\[2mm]
1 - \left(1 + \dfrac{x}{m\beta}\right)^{-m} & \text{if } 0 \le x \le -m\beta,\ m < 0,\ \beta > 0, \\[2mm]
1 - e^{-x/\beta} & \text{if } m = \infty,\ \beta > 0, \\[1mm]
0 & \text{otherwise.}
\end{cases}
\]

For each of these distributions we proceeded as follows:

(1) Generated independent observations $(X_{j1}, \dots, X_{jn})'$, $j = 1, \dots, N$;
(2) computed
    A) the maxima $X_{(n)}^{(j)} = \max(X_{j1}, \dots, X_{jn})$, $j = 1, \dots, N$;
    B) the sample means $\bar{X}_n^{(1)}, \dots, \bar{X}_n^{(N)}$;
    C) the means of two maxima $\frac{1}{4}\left(\max(X_{j1}, \dots, X_{j\nu}) + \max(X_{j(\nu+1)}, \dots, X_{jn})\right)$, $j = 1, \dots, N$, where $\nu = [n/2]$;
(3) found the corresponding empirical distribution function for each of the quantities A–C;
(4) computed the confidence intervals for $\delta = 0.01, 0.02, \dots, 0.60$.
(5) For comparison, we computed the confidence intervals based on the asymptotic normal approximation of the Hill estimator,

\[
\left( \frac{\sqrt{k}}{\sqrt{k}\,H(k) + \Phi^{-1}(1 - \alpha)\,H(k)} ;\ \infty \right),
\]

where $H(k)$ is the Hill estimator

\[
H(k) = \frac{1}{k} \sum_{i=1}^{k} \log X_{(n-i+1:n)} - \log X_{(n-k:n)}.
\]

(6) The steps (1)–(5) were repeated 1000 times.
(7) Selected statistics and the "coverage probability" were computed.

Figures 1–6 show the empirical coverage probability, and Figure 7 shows the quantiles of the left-side interval values as functions of $\delta$ (or $k$, respectively).

Fig. 1. The empirical coverage probability for the Pareto ($m = 1.0$) distribution as a function of $\delta$ for the intervals based on subsample maxima (solid), on subsample means (dotted) and on averages of maxima of two subsamples (dashed).

Fig. 2. The empirical coverage probability for the Burr ($m = 1.0$) distribution as a function of the sample fraction $k$ for the interval based on the Hill estimator.

Fig. 3. The empirical coverage probability for the Burr ($m = 1.0$) distribution as a function of $\delta$ for the intervals based on subsample maxima (solid), on subsample means (dotted) and on averages of maxima of two subsamples (dashed).

Fig. 4. The empirical coverage probability for the Burr ($m = 1.0$) distribution as a function of the sample fraction $k$ for the interval based on the Hill estimator.

Fig. 5. The empirical coverage probability for the Generalized Pareto ($m = 1.0$) distribution as a function of $\delta$ for the intervals based on subsample maxima (solid), on subsample means (dotted) and on averages of maxima of two subsamples (dashed).

Fig. 6. The empirical coverage probability for the Generalized Pareto ($m = 1.0$) distribution as a function of the sample fraction $k$ for the interval based on the Hill estimator.

Fig. 7. The quantiles of the left-side interval values for the Burr ($m = 1.0$) distribution: 50% (solid), 5% (dotted) and 95% (dashed). Left panel: the interval based on subsample maxima, plotted against $\delta$. Right panel: the interval based on the Hill estimator, plotted against $k$.

Conclusion

The simulation study shows better coverage probability for the intervals based on the Hill estimator. However, the left endpoints of our intervals appear to be more stable with respect to the choice of $\delta$ than the endpoints of the Hill interval are with respect to the choice of $k$.

Acknowledgements

The author thanks two referees for their valuable comments and suggestions, which led to a better readability of the text. This work was supported by Czech Republic Grant 201/05/2340.

References

[CHP01] S. Cheng and L. Peng, Confidence intervals for the tail index. Bernoulli 7(5) (2001), 751–760.
[FJP03] A. Fialová, J. Jurečková and J. Picek, Estimating Pareto tail index using all observations. REVSTAT 2 (2004), 75–99.
[Jur00] J. Jurečková, Tests of tails based on extreme regression quantiles. Statist. & Probab. Letters 49 (2000), 53–61.
[JP01] J. Jurečková and J. Picek, A class of tests on the tail index. Extremes 4:2 (2001), 165–183.
[JP04] J. Jurečková and J. Picek, Estimates of the tail index based on nonparametric tests. In: Theory and Applications of Recent Robust Methods, edited by M. Hubert, G. Pison, A. Struyf and S. Van Aelst, Series: Statistics for Industry and Technology, Birkhäuser, Basel, 2004.
[PJ01] J. Picek and J. Jurečková, A class of tests on the tail index using the modified extreme regression quantiles. "ROBUST'2000" (J. Antoch, G. Dohnal, Eds.), Union of Czech Mathematicians and Physicists, Prague, 2001, 217–226.
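As a supplementary illustration, the lower endpoint of the left-side interval (5) based on subsample maxima, with the empirical quantile function taken as in (7), can be sketched in Python as follows. This is only a sketch under stated assumptions: the function name `left_side_bound`, the splitting into consecutive blocks, the default values of `delta` and `alpha`, and the decision to raise an error when the empirical quantile does not exceed 1 are hypothetical choices made for the example, not part of the paper.

```python
import math
from statistics import NormalDist


def left_side_bound(sample, n, delta=0.25, alpha=0.05):
    """Lower endpoint of the left-side interval (5), based on subsample maxima.

    Hypothetical helper; assumes 0 < alpha < 0.5 and consecutive blocks of length n.
    """
    if not 0 < alpha < 0.5:
        raise ValueError("this sketch assumes 0 < alpha < 0.5")
    N = len(sample) // n  # number of complete subsamples
    maxima = sorted(max(sample[j * n:(j + 1) * n]) for j in range(N))
    z = NormalDist().inv_cdf(1 - alpha)
    # Level at which the empirical quantile function (7) is evaluated.
    p = 1 - math.exp(-N ** (-delta / 2) * z - (1 - delta) * math.log(N))
    # inf{s : F_hat(s) > p} is the order statistic with 0-based index floor(N * p).
    q = maxima[min(math.floor(N * p), N - 1)]
    if q <= 1:
        # Assumption of this sketch: the bound is only reported when log(q) > 0.
        raise ValueError("empirical quantile must exceed 1")
    return math.log(n * N ** (1 - delta)) / math.log(q)
```

Repeating the call over a grid such as $\delta = 0.01, 0.02, \dots, 0.49$ (respecting $0 < \delta < \frac{1}{2}$ from (4)) and over many simulated samples gives the kind of coverage and quantile summaries described in Section 3.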