Pál Rakonczai, László Varga, András Zempléni
Copula fitting to time-dependent data, with applications to wind speed maxima
Eötvös Loránd University, Faculty of Science, Institute of Mathematics, Department of Probability Theory and Statistics

Outline
1. Copulae
2. Goodness-of-fit tests
3. Serial dependence
4. Bootstrap methods
5. Applications to wind speed maxima

1. Copulae
• C is a copula if it is the c.d.f. of a d-dimensional random vector whose marginals are Unif[0,1].
• Existence (Sklar's theorem): for any d-dimensional random vector X with c.d.f. H and marginals F_i (i=1,...,d) there exists a copula C such that
$$H(x_1,\dots,x_d) = C\big(F_1(x_1),\dots,F_d(x_d)\big).$$
• Uniqueness: C is unique if the F_i (i=1,...,d) are continuous.
• Separation of the marginal model and the dependence structure.

1. Copulae – Examples
Elliptical copulae – copulae of elliptical distributions
– Gaussian: $X \sim N_d(0,\Sigma)$
$$C^{Ga}_{\Sigma}(u) = \int_{-\infty}^{\Phi^{-1}(u_1)}\cdots\int_{-\infty}^{\Phi^{-1}(u_d)} \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\Big(-\tfrac12\, x^{T}\Sigma^{-1}x\Big)\,dx_1\cdots dx_d,$$
where Φ: c.d.f. of N(0,1).
– Student's t: $X \sim t_{d,v}(0,\Sigma)$
$$C^{t}_{\Sigma,v}(u) = \int_{-\infty}^{t_v^{-1}(u_1)}\cdots\int_{-\infty}^{t_v^{-1}(u_d)} \frac{\Gamma\big(\frac{v+d}{2}\big)}{\Gamma\big(\frac{v}{2}\big)(v\pi)^{d/2}|\Sigma|^{1/2}} \Big(1+\frac{x^{T}\Sigma^{-1}x}{v}\Big)^{-\frac{v+d}{2}}\,dx_1\cdots dx_d,$$
where t_v: c.d.f. of the univariate Student's t distribution with v degrees of freedom.

1. Copulae – Examples
Archimedean copulae
Copula generator function: $\varphi:[0,1]\to[0,\infty]$; φ is continuous, strictly decreasing and φ(1)=0.
d-variate Archimedean copula:
$$C(u) = \varphi^{-1}\big(\varphi(u_1)+\dots+\varphi(u_d)\big)$$
– Gumbel: $\varphi(u) = (-\ln u)^{\theta}$, where θ ≥ 1:
$$C^{Gumbel}_{\theta}(u) = \exp\Big(-\Big[\sum_{i=1}^{d}(-\ln u_i)^{\theta}\Big]^{1/\theta}\Big)$$
– Clayton: $\varphi(u) = u^{-\theta}-1$, where θ > 0:
$$C^{Clayton}_{\theta}(u) = \Big(\sum_{i=1}^{d}u_i^{-\theta} - d + 1\Big)^{-1/\theta}$$

1. Copulae – Examples
[Figure: simulated samples of size n = 1500 from the Gauss, t, Gumbel and Clayton copulae (Gumbel and Clayton with θ = 2.5), plotted on the unit square.]

2. Goodness-of-fit tests in one dimension
1. Estimation of the model parameter θ
2. Goodness-of-fit test: $H_0: C \in \mathcal{C}_0 = \{C_\theta,\ \theta\in\Theta\}$
a) Cramér–von Mises type tests:
$$T_n = n\int\big(F_n(x)-F(x)\big)^2\,\Phi(x)\,dF(x)$$
• F_n: empirical c.d.f.
• F: c.d.f. under H_0
• Φ: weight function (Φ ≡ 1: Cramér–von Mises)
• Anderson–Darling: $\Phi(x) = \dfrac{1}{F(x)\,(1-F(x))}$
b) Critical value by simulation:
1) Simulate a sample from the fitted copula model $C_{\hat\theta}$ under H_0
2) Re-estimate $\hat\theta$ from the simulated sample by the ML method
3) Calculate the test statistic
Repeat, and estimate the p-value from the simulated statistics.

2. Goodness-of-fit tests in more dimensions
• Probability integral transformation (PIT) – mapping into the d-dimensional unit cube:
observations $X_i = (X_{i1},\dots,X_{id}) \sim H$ $\xrightarrow{\ \text{PIT}\ }$ pseudo-observations $U_i = (U_{i1},\dots,U_{id}) \sim C$, for i=1,...,n
• Kendall's transform (K function):
$$K(\theta,t) = P\big(C(F_1(X_1),\dots,F_d(X_d))\le t\big) = P\big(C(U_1,\dots,U_d)\le t\big)$$
Advantage: one-dimensional
– Example: Archimedean copulas:
$$K(\theta,t) = t + \sum_{i=1}^{d-1}\frac{(-1)^i}{i!}\,\varphi(t)^i\,f_i(\theta,t), \qquad f_i(\theta,t) = \frac{d^i}{dx^i}\,\varphi^{-1}(x)\Big|_{x=\varphi(t)}$$

2. Goodness-of-fit tests in more dimensions
• Empirical version:
$$K_n(t) = \frac1n\sum_{i=1}^{n}\mathbf 1\{E_{in}\le t\},\quad t\in[0,1], \qquad E_{in} = \frac{1}{n-1}\sum_{j=1}^{n}\mathbf 1\{U_{j1}<U_{i1},\dots,U_{jd}<U_{id}\}$$
• Kendall's process: $\kappa_n(t) = \sqrt n\,\big(K(\theta_n,t)-K_n(t)\big)$ (θ_n: estimate of θ), with favorable asymptotic properties
• Cramér–von Mises type statistic: $S_n = \int_0^1 \kappa_n(t)^2\,\Phi(t)\,dt$, where Φ: weight function

3. Serial dependence
• Let X_1, X_2, ..., X_n be univariate stationary observations; EX_i = μ, Var(X_i) = σ².
• If X_1, X_2, ..., X_n are i.i.d., then $\mathrm{Var}(\bar X) = \sigma^2/n$.
• Positive serial dependence → higher variance of $\bar X$
• Effective sample size: $n_e = \dfrac{\sigma^2}{\widehat{\mathrm{Var}}(\bar X)}$, where $\widehat{\mathrm{Var}}(\bar X)$ is the estimated variance (← bootstrap)

4. Bootstrap methods – Bootstrap intro
• Efron (1979)
• Let X_1, X_2, ... be i.i.d. random variables with (unknown) common distribution F
– $\mathcal X_n = \{X_1,\dots,X_n\}$: random sample
– $T_n = t_n(\mathcal X_n; F)$: random variable of interest, with distribution G_n
• Goal: approximation of the distribution G_n
• Bootstrap method:
– For given $\mathcal X_n$, draw a simple random sample with replacement $\mathcal X^*_m = \{X^*_1,\dots,X^*_m\}$ of size m (usually m ≈ n)
– Common distribution of the $X^*_i$: $F_n = \frac1n\sum_{i=1}^{n}\delta_{X_i}$
– $T^*_{m,n} = t_m(\mathcal X^*_m; F_n)$
– Repetition → $\hat G_{m,n}$, the bootstrap approximation of G_n
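The bootstrap recipe above can be sketched in Python with NumPy for the simplest case, T_n = X̄ (the sample mean) with m = n. This is a minimal illustration, not the authors' implementation; the function name `iid_bootstrap`, the number of replications `n_boot` and the Gumbel-distributed demo data are assumptions. The variance of the replicates estimates Var(X̄), the quantity the effective sample size n_e is built on.

```python
import numpy as np


def iid_bootstrap(x, stat=np.mean, n_boot=2000, m=None, seed=None):
    """Efron's i.i.d. bootstrap: n_boot replicates of T*_{m,n} = stat(X*_m).

    Each bootstrap sample X*_m has size m (default m = n) and is drawn
    with replacement from x, i.e. from the empirical distribution F_n.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    m = n if m is None else m
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, n, size=(n_boot, m))  # uniform indices, with replacement
    return np.array([stat(x[row]) for row in idx])


# Hypothetical i.i.d. data; the spread of the replicates approximates G_n
data_rng = np.random.default_rng(1)
x = data_rng.gumbel(loc=10.0, scale=2.0, size=500)
reps = iid_bootstrap(x, n_boot=4000, seed=2)
print("bootstrap Var(mean):", reps.var())
print("sigma^2 / n        :", x.var() / x.size)
```

For i.i.d. data the two printed numbers should be close. Under serial dependence this resampling breaks up neighbouring observations, which is why block versions of the bootstrap are needed.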
4. Bootstrap methods – CBB
• Nonparametric bootstrap (sample size: n)
– Block bootstrap
• Circular block bootstrap (CBB)
1. Let $Y_t = X_{t \bmod n}$ (with $X_0 := X_n$), t = 1, 2, ..., i.e. the series is wrapped around a circle
2. For some m, let i_1, i_2, ..., i_m be a uniform sample from the set {1, 2, ..., n}
3. For block size b, construct n' = m·b (n' ≈ n) pseudo-data: $Y^*_{(k-1)b+j} = Y_{i_k+j-1}$ for j = 1,...,b and k = 1,...,m
4. Functional of interest, e.g. the bootstrap sample mean: $\bar Y^*_{n'} = \frac{1}{n'}\big(Y^*_1+\dots+Y^*_{n'}\big)$

4. Bootstrap methods – Block-length selection
D.N. Politis, H. White (2004): automatic block-length selection
• Minimize the mean squared error of the CBB variance estimator $\hat\sigma^2_b$ as a function of the block size b:
$$\mathrm{MSE}(\hat\sigma^2_b) = \frac{G^2}{b^2} + D\,\frac{b}{n} + o(b^{-2}) + o\Big(\frac bn\Big),$$
where $D = \frac43\,g(0)^2$ and $G = \sum_{k=-\infty}^{\infty}|k|\,R(k)$
g(·): spectral density function; R(·): autocovariance function
• Optimal block size: $b_{opt} = \Big(\dfrac{2G^2}{D}\Big)^{1/3} n^{1/3}$
• Estimation of G and D from the data

5. Applications to wind speed maxima
• Sample: n = 2591 observations of weekly wind speed maxima for 5 German towns
• Automatic block-length selection results:

Town          Optimal block length
Hamburg       31
Hannover      11
Bremerhaven   28
Fehmarn       31
Schleswig     15

• These block lengths make no sense meteorologically.

5. Applications to wind speed maxima
Method:
1. Fit an AR(1) model to the data: $X_t = \rho X_{t-1} + Z_t$, Z_t ~ extreme value distribution
2. Calculate the theoretical $\mathrm{Var}(\bar X_n)$ from the estimated AR(1) parameters $\hat\rho$ and $\hat\sigma^2 = \widehat{\mathrm{Var}}(X_t)$:
$$\mathrm{Var}(\bar X_n) = \frac{\hat\sigma^2}{n}\left[\frac{1+\hat\rho}{1-\hat\rho} - \frac{2\hat\rho\,(1-\hat\rho^{\,n})}{n\,(1-\hat\rho)^2}\right]$$
3. Optimal block size b*: the block size at which the simulated (bootstrap) variance of the mean first crosses the theoretical value

5. Applications to wind speed maxima
Bootstrap simulation results
[Figure: bootstrap variance of the mean as a function of the block size, compared with the theoretical value; b* = 6.]

Town          Optimal   X-mean variance   Theoretical   Deviation   IID X-mean   Sample size
              block     (bootstrap)       value         (%)         variance     reduction
              length
Hamburg       8-9       0.0038            0.0034        10.90%      0.0020       1.85
Hannover      7         0.0067            0.0071        -5.29%      0.0042       1.59
Bremerhaven   6         0.0073            0.0077        -6.15%      0.0043       1.71
Fehmarn       7         0.0035            0.0034        3.43%       0.0020       1.74
Schleswig     13        0.0037            0.0030        22.79%      0.0018       2.09

Deviation: bootstrap variance relative to the theoretical value. Sample size reduction: ratio of the bootstrap and the IID variance of the mean, i.e. n/n_e.
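The three-step method above (AR(1) fit, theoretical variance, first crossing) and the CBB resampling it relies on can be sketched in Python with NumPy. This is a minimal sketch under assumptions that are not taken from the talk: ρ is estimated by the lag-1 sample autocorrelation and σ² by the sample variance; the concatenated blocks are truncated to exactly n values (a common variant of the n' = m·b ≈ n rule); Gumbel innovations stand in for the unspecified extreme value distribution in the demo; and `cbb_means`, `ar1_mean_variance` and `first_crossing_block` are hypothetical names.

```python
import numpy as np


def cbb_means(x, b, n_boot=1000, seed=None):
    """Circular block bootstrap replicates of the sample mean.

    Block starts i_1, ..., i_m are uniform on the n positions; each block
    takes b consecutive values and wraps around the end of the series.
    The m = ceil(n / b) blocks are concatenated and truncated to n values.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    m = -(-n // b)  # ceil(n / b)
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, n, size=(n_boot, m))
    idx = (starts[:, :, None] + np.arange(b)) % n  # circular wrap-around
    pseudo = x[idx.reshape(n_boot, m * b)[:, :n]]
    return pseudo.mean(axis=1)


def ar1_mean_variance(sigma2, rho, n):
    """Variance of the mean of n observations of a stationary AR(1)
    with |rho| < 1, where sigma2 = Var(X_t)."""
    return sigma2 / n * ((1 + rho) / (1 - rho)
                         - 2 * rho * (1 - rho ** n) / (n * (1 - rho) ** 2))


def first_crossing_block(x, b_max=40, n_boot=1000, seed=0):
    """Smallest b whose CBB variance of the mean reaches the AR(1) value."""
    x = np.asarray(x, dtype=float)
    rho = np.corrcoef(x[:-1], x[1:])[0, 1]  # lag-1 autocorrelation as rho-hat
    target = ar1_mean_variance(x.var(), rho, x.size)
    for b in range(1, b_max + 1):
        if cbb_means(x, b, n_boot, seed).var() >= target:
            return b, target
    return None, target  # no crossing up to b_max


# Hypothetical AR(1) series with Gumbel innovations, same length as the data
rng = np.random.default_rng(42)
n, rho_true, burn_in = 2591, 0.4, 200
z = rng.gumbel(size=n + burn_in)
x = np.empty_like(z)
x[0] = z[0]
for t in range(1, z.size):
    x[t] = rho_true * x[t - 1] + z[t]
x = x[burn_in:]
b_star, target = first_crossing_block(x)
print("theoretical Var(mean):", target, " b* =", b_star)
```

With b = 1 the CBB reduces to the i.i.d. bootstrap, so its variance starts near σ²/n and grows with b towards the AR(1) value. The crossing point depends on the seed and on `n_boot`, so in practice one would repeat the search over several runs.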
5. Applications to wind speed maxima
Bremerhaven & Fehmarn
[Figure: observed test statistics and 95% critical values (n = 1514 and n = 2571) for the Gauss, Student-t, Gumbel and Clayton copulae; empirical K vs. theoretical K for each of the four families.]

5. Applications to wind speed maxima
Bremerhaven & Schleswig
[Figure: same panels as for Bremerhaven & Fehmarn (n = 1514 and n = 2571).]

5. Applications to wind speed maxima
Fehmarn & Schleswig
[Figure: same panels as for Bremerhaven & Fehmarn (n = 1514 and n = 2571).]

5. Applications to wind speed maxima
Prediction regions (Bremerhaven & Fehmarn)
[Figure: 50%, 95% and 99.8% prediction regions for the wind speeds (m/s) at Bremerhaven and Fehmarn, with lower (5%) and upper (95%) bounds for block = 1, 7 and 30.]
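The goodness-of-fit panels compare the empirical Kendall function K_n(t) with the theoretical K(θ,t). A minimal Python sketch of these quantities for the bivariate case follows. Assumptions, not taken from the talk: d = 2, where the Archimedean formula reduces to K(θ,t) = t − φ(t)/φ'(t), giving closed forms for Clayton and Gumbel from the generators listed earlier; weight Φ ≡ 1; θ treated as known rather than re-estimated by ML; a Marshall–Olkin (gamma frailty) Clayton sample as demo data; and the function names are hypothetical.

```python
import numpy as np


def empirical_kendall(x, y, t):
    """Empirical Kendall function K_n(t) for a bivariate sample (t: 1-D array).

    E_in = #{j : x_j < x_i and y_j < y_i} / (n - 1); K_n(t) = mean(E_in <= t).
    Only componentwise ranks matter, so raw data and PIT pseudo-observations
    give the same K_n.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    below = (x[None, :] < x[:, None]) & (y[None, :] < y[:, None])  # [i, j]
    e = below.sum(axis=1) / (x.size - 1)
    return (e[:, None] <= np.asarray(t)[None, :]).mean(axis=0)


def kendall_clayton(t, theta):
    """Bivariate Clayton: K(theta, t) = t + (t - t^(theta+1)) / theta."""
    t = np.asarray(t, dtype=float)
    return t + (t - t ** (theta + 1)) / theta


def kendall_gumbel(t, theta):
    """Bivariate Gumbel: K(theta, t) = t - t*ln(t)/theta, for 0 < t <= 1."""
    t = np.asarray(t, dtype=float)
    return t - t * np.log(t) / theta


def cvm_kendall(x, y, k_theo, grid_size=1000):
    """S_n = int_0^1 kappa_n(t)^2 dt (weight Phi = 1), midpoint rule on [0, 1]."""
    t = (np.arange(grid_size) + 0.5) / grid_size
    kappa = np.sqrt(len(x)) * (k_theo(t) - empirical_kendall(x, y, t))
    return float(np.mean(kappa ** 2))


# Hypothetical Clayton(theta) sample via the Marshall-Olkin (gamma frailty) method
rng = np.random.default_rng(7)
theta, n = 2.5, 1000
v = rng.gamma(shape=1 / theta, scale=1.0, size=n)
u = (1 + rng.exponential(size=(n, 2)) / v[:, None]) ** (-1 / theta)

tau = theta / (theta + 2)      # Kendall's tau of Clayton(theta)
theta_gumbel = 1 / (1 - tau)   # Gumbel parameter with the same tau
s_clayton = cvm_kendall(u[:, 0], u[:, 1], lambda t: kendall_clayton(t, theta))
s_gumbel = cvm_kendall(u[:, 0], u[:, 1], lambda t: kendall_gumbel(t, theta_gumbel))
print("S_n (Clayton):", s_clayton, " S_n (Gumbel):", s_gumbel)
```

Matching the Gumbel parameter through Kendall's tau makes both candidate families agree on overall dependence strength, so S_n reflects the shape of K rather than its level. Critical values would still have to come from the simulation procedure described for the goodness-of-fit tests.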
Final remarks
Conclusions
• Copula choice is important
• Serial dependence strongly influences the critical values of GoF tests
• Block size does not have a major impact on the estimated prediction region
Future work
• Multivariate effective sample size
• Parametric bootstrap
Acknowledgement
• We are grateful to the Doctoral School of Mathematics of ELTE for supporting L. Varga's participation in the SMTDA Conference.
Thank you for your attention

References
• P. Rakonczai, A. Zempléni: Copulas and goodness of fit tests. In: Recent Advances in Stochastic Modeling and Data Analysis, World Scientific, pp. 198–206, 2007.
• S.N. Lahiri: Resampling Methods for Dependent Data. Springer, 2003.
• D.N. Politis, H. White: Automatic Block-Length Selection for the Dependent Bootstrap. Econometric Reviews, Vol. 23, pp. 53–70, 2004.
• P. Embrechts, F. Lindskog, A. McNeil: Modelling Dependence with Copulas and Applications to Risk Management. Department of Mathematics, ETH Zürich, 2001.
• L. Kish: Survey Sampling. J. Wiley, 1965.