Pál Rakonczai, László Varga,
András Zempléni
Copula fitting to time-dependent data,
with applications to wind speed maxima
Eötvös Loránd University
Faculty of Science
Institute of Mathematics
Department of Probability
Theory and Statistics
Outline
1. Copulae
2. Goodness-of-fit tests
3. Serial dependence
4. Bootstrap methods
5. Applications to wind speed maxima
1. Copulae
• C is a copula if it is the c.d.f. of a d-dimensional random vector
with marginals ~ Unif [0,1]
• Existence (Sklar’s Theorem): for any d-dimensional
random vector X with c.d.f. H and marginals $F_i$
(i=1,...,d) there exists a copula C such that
$H(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d))$
• Uniqueness: C is unique if the $F_i$ are continuous (i=1,...,d)
• Consequence: the marginal models and the dependence structure
can be modelled separately
1. Copulae – Examples
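Elliptical Copulae – copulae of elliptical distributions
– Gaussian: $X \sim N_d(0, \Sigma)$
$$C^{Ga}_{\Sigma}(u) = \int_{-\infty}^{\Phi^{-1}(u_1)} \cdots \int_{-\infty}^{\Phi^{-1}(u_d)} \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2}\, x^{T} \Sigma^{-1} x\right) dx_1 \dots dx_d$$
where Φ: c.d.f. of N(0,1)
– Student’s t: $X \sim \mathrm{Student}_{d,\nu}(0, \Sigma)$
$$C^{t}_{\Sigma,\nu}(u) = \int_{-\infty}^{t_\nu^{-1}(u_1)} \cdots \int_{-\infty}^{t_\nu^{-1}(u_d)} \frac{\Gamma\left(\frac{\nu+d}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu\pi)^{d/2} |\Sigma|^{1/2}} \left(1 + \frac{x^{T} \Sigma^{-1} x}{\nu}\right)^{-\frac{\nu+d}{2}} dx_1 \dots dx_d$$
where $t_\nu$: c.d.f. of Student’s t distribution with ν degrees of freedom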
1. Copulae - Examples
Archimedean Copulae
Copula generator function: $\phi(u): [0,1] \to [0,\infty]$
ϕ is continuous, strictly decreasing and ϕ(1)=0.
d-variate Archimedean copula:
$$C_\phi(u) = \phi^{-1}\left(\sum_{i=1}^{d} \phi(u_i)\right)$$
– Gumbel: $\phi_\theta(u) = (-\ln u)^{\theta}$ where $\theta \in [1, \infty)$
$$C^{Gumbel}_\theta(u) = \exp\left(-\left[\sum_{i=1}^{d} (-\ln u_i)^{\theta}\right]^{1/\theta}\right)$$
– Clayton: $\phi_\theta(u) = u^{-\theta} - 1$ where $\theta > 0$
$$C^{Clayton}_\theta(u) = \left(\sum_{i=1}^{d} u_i^{-\theta} - d + 1\right)^{-1/\theta}$$
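The two Archimedean families above can be evaluated directly from their closed forms. The following is a minimal sketch, assuming NumPy; the function names `clayton_cdf` and `gumbel_cdf` are chosen here for illustration and do not come from the slides.

```python
import numpy as np

def clayton_cdf(u, theta):
    """Clayton copula: (sum_i u_i^(-theta) - d + 1)^(-1/theta), theta > 0."""
    u = np.asarray(u, dtype=float)
    d = u.shape[-1]  # dimension = length of the last axis
    return (np.sum(u ** (-theta), axis=-1) - d + 1.0) ** (-1.0 / theta)

def gumbel_cdf(u, theta):
    """Gumbel copula: exp(-(sum_i (-ln u_i)^theta)^(1/theta)), theta >= 1."""
    u = np.asarray(u, dtype=float)
    return np.exp(-np.sum((-np.log(u)) ** theta, axis=-1) ** (1.0 / theta))
```

A quick sanity check is the uniform-margin property: setting all but one argument to 1 should return the remaining argument, and the Gumbel copula with θ = 1 should reduce to the independence copula (the product of its arguments).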
1. Copulae - Examples
[Figure: four scatter plots on the unit square, each with n = 1500 simulated points: Gauss copula, t-copula, Gumbel copula and Clayton copula. Parameter annotations: 0.8 and 0.75 (elliptical panels), 2.5 (Gumbel and Clayton panels).]
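Samples like those in the figure can be generated with standard constructions. The sketch below is not the authors' code: it assumes NumPy and SciPy, draws from a bivariate Gauss copula by transforming correlated normals with Φ, and draws from a Clayton copula with the Marshall-Olkin (gamma frailty) method. The parameter values 0.8 and 2.5 are read from the figure annotations; which panel each belongs to is an assumption.

```python
import numpy as np
from scipy.stats import norm

def sample_gauss_copula(n, rho, rng):
    """Bivariate Gauss copula: U = Phi(Z) with Z ~ N(0, [[1, rho], [rho, 1]])."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    return norm.cdf(z)

def sample_clayton_copula(n, theta, rng, d=2):
    """Clayton copula via Marshall-Olkin: V ~ Gamma(1/theta), E_i ~ Exp(1),
    U_i = (1 + E_i / V)^(-1/theta)."""
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    e = rng.exponential(scale=1.0, size=(n, d))
    return (1.0 + e / v) ** (-1.0 / theta)

rng = np.random.default_rng(0)
u_gauss = sample_gauss_copula(1500, 0.8, rng)      # assumed panel parameter
u_clayton = sample_clayton_copula(1500, 2.5, rng)  # assumed panel parameter
```

Both arrays have shape (1500, 2) with approximately uniform columns, so they can be passed straight to a scatter plot.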
2. Goodness-of-fit tests
in one dimension
1. Estimation of the model parameter
2. Goodness-of-fit test: $H_0: C \in \mathcal{C}_0 = \{C_\theta : \theta \in \Theta\}$
a) Cramér-von Mises tests:
$$T_n = n \int (F_n(x) - F(x))^2\, \Phi(x)\, dF(x)$$
• $F_n$: empirical c.d.f.
• F: c.d.f.
• Φ: weight function
Anderson-Darling: $\Phi(x) = \frac{1}{F(x)\,(1 - F(x))}$
b) Critical value – simulation:
1) Simulate a sample from the copula model $C_\theta$ under $H_0$
2) Re-estimate θ by the ML method to obtain $\hat\theta$
3) Calculate the test statistic
Repeat steps 1)–3) and estimate the p-value from the simulated statistics
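The simulation scheme in b) can be sketched for a one-dimensional sample as follows. This is an illustration only: the exponential family used as the null model is a hypothetical stand-in (its ML estimate is simply 1/mean), and the function names are chosen here. The statistics are computed from the usual closed-form sums for the Cramér-von Mises (Φ = 1) and Anderson-Darling weights.

```python
import numpy as np

def cvm_statistic(x, cdf, weight="cvm"):
    """n * integral (F_n - F)^2 Phi dF, via the standard computing formulas."""
    z = np.sort(cdf(np.asarray(x, dtype=float)))
    n = len(z)
    i = np.arange(1, n + 1)
    if weight == "ad":  # Phi = 1 / (F (1 - F))
        return -n - np.mean((2 * i - 1) * (np.log(z) + np.log1p(-z[::-1])))
    return 1.0 / (12 * n) + np.sum((z - (2 * i - 1) / (2 * n)) ** 2)

def parametric_bootstrap_pvalue(x, n_boot, rng, weight="ad"):
    """p-value by simulation under H0 (hypothetical exponential family)."""
    fit = lambda s: 1.0 / np.mean(s)                           # ML rate estimate
    cdf = lambda theta: (lambda t: -np.expm1(-theta * t))      # 1 - exp(-theta t)
    theta_hat = fit(x)
    t_obs = cvm_statistic(x, cdf(theta_hat), weight)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.exponential(scale=1.0 / theta_hat, size=len(x))  # 1) simulate
        t_boot[b] = cvm_statistic(xb, cdf(fit(xb)), weight)       # 2) refit, 3) statistic
    return t_obs, np.mean(t_boot >= t_obs)
```

Re-estimating the parameter inside the loop matters: the null distribution of the statistic depends on the fact that θ was fitted, so comparing against a fixed-parameter table would give critical values that are too large.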
2. Goodness-of-fit tests
in more dimensions
• Probability integral transformation (PIT) – mapping into
the d-dimensional unit cube:
Observations $X_i = (X_{i1}, \dots, X_{id}) \sim H$ → (PIT) →
pseudo-observations $U_i = (U_{i1}, \dots, U_{id}) \sim C$, for i=1,...,n
• Kendall’s transform (K function):
$$K(\theta, t) = P(C_\theta(F_1(X_1), \dots, F_d(X_d)) \le t) = P(C_\theta(U_1, \dots, U_d) \le t)$$
Advantage: one-dimensional
– Example: Archimedean copulas:
$$K(\theta, t) = t + \sum_{i=1}^{d-1} \frac{(-1)^i}{i!}\, \phi_\theta(t)^i\, f_i(\theta, t)$$
where
$$f_i(\theta, t) = \left.\frac{d^i}{dx^i}\, \phi_\theta^{-1}(x)\right|_{x = \phi_\theta(t)}$$
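As a worked instance of the Archimedean formula, consider the bivariate case d = 2, where only the i = 1 term remains. The derivation below uses the Gumbel and Clayton generators from the Copulae slides; it is a short check rather than part of the original talk.

```latex
% d = 2: f_1 is the derivative of the inverse generator, i.e. 1 / phi'(t)
K(\theta, t) = t - \phi_\theta(t)\, f_1(\theta, t)
             = t - \frac{\phi_\theta(t)}{\phi_\theta'(t)}

% Clayton: \phi_\theta(t) = t^{-\theta} - 1, \quad \phi_\theta'(t) = -\theta\, t^{-\theta - 1}
K_{Clayton}(\theta, t) = t + \frac{t\,(1 - t^{\theta})}{\theta}

% Gumbel: \phi_\theta(t) = (-\ln t)^{\theta}, \quad \phi_\theta'(t) = -\theta\, (-\ln t)^{\theta - 1} / t
K_{Gumbel}(\theta, t) = t - \frac{t \ln t}{\theta}
```

Both expressions satisfy K(θ, 1) = 1 and tend to 0 as t → 0, as a distribution function on [0, 1] should.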
2. Goodness-of-fit tests
in more dimensions
• Empirical version:
$$K_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(E_{in} \le t), \quad t \in [0,1]$$
where
$$E_{in} = \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}(U_{j1} \le U_{i1}, \dots, U_{jd} \le U_{id})$$
• Kendall’s process: $\kappa_n(t) = \sqrt{n}\,(K(\theta_n, t) - K_n(t))$
favorable asymptotic properties
• Cramér-von Mises type statistic: $S_n = \int_0^1 (\kappa_n(t))^2\, \Phi(t)\, dt$
where Φ: weight function
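The empirical Kendall function and the statistic S_n can be computed directly from the pseudo-observations. The sketch below assumes NumPy, a bivariate Clayton model with K(θ, t) = t + t(1 − t^θ)/θ, the constant weight Φ = 1, and a midpoint rule for the integral; the parameter `theta` is taken as already estimated, and all function names are illustrative.

```python
import numpy as np

def empirical_kendall(u):
    """E_in = (1/n) * #{j : U_j <= U_i componentwise}, for i = 1..n."""
    u = np.asarray(u, dtype=float)
    # below[i, j] is True when U_j <= U_i in every coordinate
    below = np.all(u[None, :, :] <= u[:, None, :], axis=2)
    return below.mean(axis=1)

def clayton_K(t, theta):
    """Kendall function of the bivariate Clayton copula."""
    return t + t * (1.0 - t ** theta) / theta

def cvm_kendall(u, theta, grid_size=1000):
    """S_n = integral_0^1 kappa_n(t)^2 dt (weight Phi = 1), midpoint rule."""
    n = len(u)
    e = np.sort(empirical_kendall(u))
    t = (np.arange(grid_size) + 0.5) / grid_size
    K_n = np.searchsorted(e, t, side="right") / n  # (1/n) * #{i : E_in <= t}
    kappa = np.sqrt(n) * (clayton_K(t, theta) - K_n)
    return np.mean(kappa ** 2)
```

The pairwise comparison uses O(n²) memory, which is unproblematic for a few thousand observations but would need chunking for much larger samples.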
3. Serial dependence
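• Let $X_1, X_2, \dots, X_n$ be univariate stationary
observations; $E X_i = \mu$, $\mathrm{Var}(X_i) = \sigma^2$.
• If $X_1, X_2, \dots, X_n$ are i.i.d., then $\mathrm{Var}(\bar{X}) = \sigma^2 / n$
• (Positive) serial dependence → higher variance of $\bar{X}$
• Effective sample size ($n_e$):
$$n_e = \frac{\sigma^2}{\widehat{\mathrm{Var}}(\bar{X})}$$
where $\widehat{\mathrm{Var}}(\bar{X})$: estimated variance ← bootstrap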
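The effective sample size is a one-line computation once an estimate of Var(X̄) is available. A minimal sketch, assuming NumPy and replacing σ² by the sample variance (a choice made here; the helper name is illustrative):

```python
import numpy as np

def effective_sample_size(x, var_of_mean):
    """n_e = sigma^2 / Var(mean), with sigma^2 estimated by the sample variance."""
    return np.var(x, ddof=1) / var_of_mean
```

For i.i.d. data the estimated variance of the mean is s²/n, so n_e equals n; an estimate twice as large halves the effective sample size.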
4. Bootstrap methods - Bootstrap intro
• Efron (1979)
• Let $X_1, X_2, \dots$ be i.i.d. random variables with (unknown)
common distribution F
– $\mathcal{X}_n = \{X_1, \dots, X_n\}$ random sample
– $T_n = t_n(\mathcal{X}_n; F)$ random variable of interest; its distribution: $G_n$
• Goal: approximation of the distribution $G_n$
• Bootstrap method:
– For given $\mathcal{X}_n$, we draw a simple random sample $\mathcal{X}^*_m = \{X^*_1, \dots, X^*_m\}$
of size m (usually m ≈ n)
– Common distribution of the $X^*_i$: $F_n = n^{-1} \sum_{i=1}^{n} \delta_{X_i}$
– $T^*_{m,n} = t_m(\mathcal{X}^*_m; F_n)$
– Repetition ⇒ $\hat{G}_{m,n}$
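The i.i.d. bootstrap described above amounts to resampling with replacement from the observed sample and recomputing the statistic. A minimal sketch, assuming NumPy; the function name and the `statistic` callback are illustrative choices.

```python
import numpy as np

def bootstrap_distribution(x, statistic, n_boot, rng, m=None):
    """Approximate the distribution of statistic(X) by n_boot resamples of size m."""
    x = np.asarray(x)
    m = len(x) if m is None else m
    # each row is one sample X*_1, ..., X*_m drawn from the empirical distribution F_n
    draws = rng.choice(x, size=(n_boot, m), replace=True)
    return np.array([statistic(row) for row in draws])
```

For example, `bootstrap_distribution(x, np.mean, 2000, rng).var()` estimates the variance of the sample mean; for independent data it should be close to the sample variance divided by n.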
4. Bootstrap methods - CBB
• Nonparametric bootstrap (sample size: n)
– Block bootstrap
• Circular block bootstrap (CBB)
1. Let $Y_t = X_{\mathrm{mod}_n(t)}$ (the sample is extended periodically)
2. For some m, let $i_1, i_2, \dots, i_m$ be a uniform sample
from the set {1, 2, ..., n}
3. For block size b, construct n’ = m·b (n’ ≈ n)
pseudo-data: $Y^*_{(k-1)b + j} = Y_{i_k + j - 1}$ for j=1,...,b and k=1,...,m
4. Functional of interest, e.g. bootstrap sample
mean: $\bar{Y}^*_{n'} = (n')^{-1} (Y^*_1 + \dots + Y^*_{n'})$
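The four CBB steps translate into a few lines of array indexing. The sketch below assumes NumPy, uses 0-based start indices, and takes m = ⌈n/b⌉ blocks so that n’ ≥ n; these choices and the function names are assumptions made for illustration.

```python
import numpy as np

def circular_block_bootstrap(x, b, rng):
    """One CBB resample: m = ceil(n / b) blocks of length b with wrap-around."""
    x = np.asarray(x)
    n = len(x)
    m = int(np.ceil(n / b))
    starts = rng.integers(0, n, size=m)                   # i_1, ..., i_m (0-based)
    idx = (starts[:, None] + np.arange(b)[None, :]) % n   # Y_t = X_{t mod n}
    return x[idx.ravel()]                                 # n' = m * b pseudo-data

def cbb_variance_of_mean(x, b, n_boot, rng):
    """Bootstrap estimate of Var(mean) from n_boot CBB resamples."""
    means = np.array([circular_block_bootstrap(x, b, rng).mean()
                      for _ in range(n_boot)])
    return means.var(ddof=1)
```

With b = 1 the procedure reduces to the ordinary i.i.d. bootstrap; for positively autocorrelated data, larger blocks should give a visibly larger variance estimate for the mean.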
4. Bootstrap methods – Block-length selection
D.N. Politis, H. White (2004): automatic block-length
selection
• Minimize:
$$\mathrm{MSE}(\hat\sigma^2_b) = \frac{G^2}{b^2} + D\,\frac{b}{n} + o(b^{-2}) + o\left(\frac{b}{n}\right)$$
($\hat\sigma^2_b$: bootstrap variance estimator with block size b)
where $D = \frac{4}{3}\, g^2(0)$ and $G = \sum_{k=-\infty}^{\infty} |k|\, R(k)$
g(.): spectral density function
R(.): autocovariance function
• Optimal block size: $b_{opt} = \left(\frac{2G^2}{D}\right)^{1/3} n^{1/3}$
• Estimation of G and D
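The optimal block size follows from setting the derivative of G²/b² + D·b/n to zero. The sketch below evaluates b_opt for given G and D and, purely as an illustration, supplies closed forms for an AR(1) process with autocovariance R(k) = σ²α^|k|. The AR(1) example, the symbol α, the convention g(0) = Σ_k R(k), and the function names are assumptions made here; the data-driven estimation of G and D used by Politis and White is not reproduced.

```python
import numpy as np

def optimal_block_size(G, D, n):
    """b_opt = (2 G^2 / D)^(1/3) * n^(1/3)."""
    return (2.0 * G ** 2 / D) ** (1.0 / 3.0) * n ** (1.0 / 3.0)

def ar1_G_and_D(alpha, sigma2=1.0):
    """G and D (CBB) for a hypothetical AR(1) with R(k) = sigma2 * alpha^|k|."""
    G = sigma2 * 2.0 * alpha / (1.0 - alpha) ** 2   # sum over k of |k| R(k)
    g0 = sigma2 * (1.0 + alpha) / (1.0 - alpha)     # g(0) = sum over k of R(k)
    return G, 4.0 / 3.0 * g0 ** 2

G, D = ar1_G_and_D(alpha=0.5)
b_opt = optimal_block_size(G, D, n=2591)
```

Because b_opt grows like n^(1/3), even long series call for moderate block lengths unless the dependence is very strong.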
5. Applications to wind speed maxima
• Sample: n = 2591 observations of weekly wind speed
maxima for 5 German towns
• Automatic block-length selection results:
Town           Optimal block length
Hamburg        31
Hannover       11
Bremerhaven    28
Fehmarn        31
Schleswig      15
→ These block lengths make no sense meteorologically
5. Applications to wind speed maxima
Method:
1. Fitting an AR(1) model to the data:
$X_t = \alpha X_{t-1} + Z_t$, $Z_t \sim$ extreme value distribution
2. Calculation of the theoretical $\mathrm{Var}(\bar{X}_n)$
from the AR(1) parameters:
$$\mathrm{Var}(\bar{X}_n) = \hat\sigma^2\, \frac{2\hat\alpha^{\,n+1} - n\hat\alpha^2 - 2\hat\alpha + n}{n^2 (1 - \hat\alpha)^2}$$
where $\hat\sigma^2$ estimates $\mathrm{Var}(X_t)$
3. b* optimal block size: where the
simulated variance of the mean first
crosses the theoretical value
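Steps 2 and 3 can be sketched as follows. Writing α for the AR(1) coefficient (a symbol chosen here) and σ² for the stationary variance of X_t, the variance of the mean has the closed form σ²(n − 2α − nα² + 2α^(n+1)) / (n²(1 − α)²), which the second function verifies against the direct sum over autocovariances σ²α^|k|. The function names, and the reading of "first crosses" as the smallest block size whose simulated variance reaches the theoretical value, are assumptions.

```python
import numpy as np

def ar1_var_of_mean(alpha, sigma2, n):
    """Closed-form Var(mean) for a stationary AR(1) with Var(X_t) = sigma2."""
    num = n - 2.0 * alpha - n * alpha ** 2 + 2.0 * alpha ** (n + 1)
    return sigma2 * num / (n ** 2 * (1.0 - alpha) ** 2)

def ar1_var_of_mean_direct(alpha, sigma2, n):
    """Same quantity from the autocovariances R(k) = sigma2 * alpha^|k|."""
    k = np.arange(1, n)
    return sigma2 / n ** 2 * (n + 2.0 * np.sum((n - k) * alpha ** k))

def first_crossing_block_size(block_sizes, simulated_vars, theoretical_var):
    """Smallest block size whose simulated variance reaches the theoretical value."""
    for b, v in zip(block_sizes, simulated_vars):
        if v >= theoretical_var:
            return b
    return None  # the simulated curve never reaches the theoretical value
```

In use, `simulated_vars` would hold the block-bootstrap variance of the mean for each candidate block size, and the returned b* is the first size at which the bootstrap reproduces the model-implied variance.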
5. Applications to wind speed maxima
Bootstrap simulation results
[Figure: bootstrap simulation results; the optimal block size marked in the plot is b* = 6.]
5. Applications to wind speed maxima
Bootstrap simulation results
Town          Optimal        X-mean     Theoretical  Deviation  IID X-mean  Sample size
              block length   variance   value        (%)        variance    reduction
Hamburg       8-9            0.0038     0.0034       10.90%     0.0020      1.85
Hannover      7              0.0067     0.0071       -5.29%     0.0042      1.59
Bremerhaven   6              0.0073     0.0077       -6.15%     0.0043      1.71
Fehmarn       7              0.0035     0.0034       3.43%      0.0020      1.74
Schleswig     13             0.0037     0.0030       22.79%     0.0018      2.09
5. Applications to wind speed maxima
Bremerhaven & Fehmarn
[Figure: empirical vs. theoretical K functions for the Gauss, Student-t, Gumbel and Clayton copulas, and observed test statistics against 95% critical values (GAUSS, STUD-T, GUMBEL, CLAYTON) for n=1514 and n=2571.]
5. Applications to wind speed maxima
Bremerhaven & Schleswig
[Figure: empirical vs. theoretical K functions for the Gauss, Student-t, Gumbel and Clayton copulas, and observed test statistics against 95% critical values (GAUSS, STUD-T, GUMBEL, CLAYTON) for n=1514 and n=2571.]
5. Applications to wind speed maxima
Fehmarn & Schleswig
[Figure: empirical vs. theoretical K functions K(t) for the Gauss, Student-t, Gumbel and Clayton copulas, and observed test statistics against 95% critical values (GAUSS, STUD-T, GUMBEL, CLAYTON) for n=1514 and n=2571.]
5. Applications to wind speed maxima
Prediction regions (Bremerhaven & Fehmarn)
[Figure: prediction regions (50%, 95%, 99.8%) with lower (5%) and upper (95%) bounds for block sizes 1, 7 and 30; both axes: wind speed (m/s).]
Final remarks
Conclusions
• Copula choice is important
• Serial dependence strongly influences the critical values of
GoF tests
• Block size does not have a major impact on the estimated
prediction region
Future work
• Multivariate effective sample size
• Parametric bootstrap
Acknowledgement
• We are grateful to the Doctoral School of Mathematics of
ELTE for supporting L. Varga’s participation at the SMTDA
Conference.
Thank you for your attention
References
• P. Rakonczai, A. Zempléni: Copulas and goodness of fit tests.
Recent Advances in Stochastic Modeling and Data Analysis,
World Scientific, pp. 198-206, 2007.
• S.N. Lahiri: Resampling Methods for Dependent Data. Springer,
2003.
• D.N. Politis, H. White: Automatic Block-Length Selection for the
Dependent Bootstrap. Econometric Reviews, Vol. 23, pp. 53-70,
2004.
• P. Embrechts, F. Lindskog, A. McNeil: Modelling Dependence
with Copulas and Applications to Risk Management.
Department of Mathematics, ETHZ, Zürich, 2001.
• L. Kish: Survey Sampling. J. Wiley, 1965.