Archimedean Copulas

Theodore Charitos
MSc. Student
CROSS
Task-Goal
• Examination of relations between Archimedean
copulas and diagonal band or minimum
information copulas given correlation constraints
• Computation of the relative information with respect
to the uniform distribution for each family.
Accomplishment of tasks
• Use of the algorithm proposed in the paper by
Christian Genest and Louis-Paul Rivest,
“Statistical Inference Procedures for Bivariate
Archimedean Copulas”.
• Use of a small Matlab program to compute
numerically the relative information with respect
to the uniform distribution.
Structure of presentation
• Theoretical Background
• Explanation and description of the whole
procedure proposed by Genest and Rivest.
• Analysis of datasets sampled with the Unicorn
software.
• Results
• Conclusions-Discussions
Theoretical Background
Definition: A bivariate distribution function with
marginals $F(x)$ and $G(y)$ is said to be generated by an
Archimedean copula if it can be expressed in the form
$$H(x,y) = \varphi^{-1}\{\varphi(F(x)) + \varphi(G(y))\}$$
for some convex, decreasing function $\varphi$ on $(0,1]$ in such a way that $\varphi(1) = 0$.
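A worked instance of the definition, chosen here for illustration: take uniform margins $F(x) = x$, $G(y) = y$ and the Clayton generator $\varphi(t) = t^{-a} - 1$ with $a > 0$, which is convex and decreasing on $(0,1]$ with $\varphi(1) = 0$.

```latex
\begin{aligned}
\varphi(t) &= t^{-a} - 1, \qquad \varphi^{-1}(s) = (1 + s)^{-1/a}, \qquad a > 0,\\
H(x,y) &= \varphi^{-1}\{\varphi(x) + \varphi(y)\} = \left(x^{-a} + y^{-a} - 1\right)^{-1/a}.
\end{aligned}
```

This is the Clayton form of $H(x,y)$ that appears among the families analyzed later.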
Proposition: Let X and Y be uniform random variates whose
dependence function $H(x,y)$ is of the form $\varphi^{-1}\{\varphi(x) + \varphi(y)\}$
for some convex decreasing function $\varphi$ defined on $(0,1]$ with the
property that $\varphi(1) = 0$. Set
$$U = \frac{\varphi(X)}{\varphi(X) + \varphi(Y)}, \qquad V = H(X,Y) \qquad \text{and} \qquad \lambda(u) = \frac{\varphi(u)}{\varphi'(u)}.$$
Then U is uniformly distributed on $(0,1)$, V is distributed as
$K(u) = u - \lambda(u)$, and U and V are
independent random variables.
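Proposition: Let X and Y be uniform random variables with
dependence function $H(x,y)$. For $0 < u < 1$ let
$K(u) = \Pr\{H(X,Y) \le u\}$ and $K(u^-) = \lim_{t \to u^-} K(t)$, and define $\lambda(u) = u - K(u)$ and
$$\varphi(u) = \exp\left\{\int_{u_0}^{u} \frac{dt}{\lambda(t)}\right\},$$
where $0 < u_0 < 1$ is an arbitrary constant.
The function $\varphi(u)$ is convex, decreasing and satisfies $\varphi(1) = 0$
if and only if $K(u^-) > u$ for all $0 < u < 1$.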
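The second proposition can be checked numerically: since $\lambda = \varphi/\varphi'$, integrating $1/\lambda$ recovers $\varphi(u)/\varphi(u_0)$, i.e. $\varphi$ up to a positive constant factor. A minimal sketch, assuming the Clayton generator $\varphi(t) = t^{-a} - 1$ as the test case, $u_0 = 0.5$, and Simpson's rule for the integral; the helper names `clayton_lambda` and `phi_from_lambda` are hypothetical.

```python
import math

def clayton_lambda(t, a):
    """lambda(t) = phi(t) / phi'(t) for the Clayton generator phi(t) = t**(-a) - 1."""
    return -t * (1.0 - t ** a) / a

def phi_from_lambda(lam, u, u0=0.5, steps=2000):
    """Return exp(int_{u0}^{u} dt / lam(t)), i.e. phi(u) / phi(u0), via Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = (u - u0) / steps
    total = 1.0 / lam(u0) + 1.0 / lam(u)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) / lam(u0 + k * h)
    return math.exp(total * h / 3.0)

a = 1.5
for u in (0.1, 0.3, 0.7, 0.9):
    recovered = phi_from_lambda(lambda t: clayton_lambda(t, a), u)
    exact_ratio = (u ** -a - 1.0) / (0.5 ** -a - 1.0)
    print(f"u={u}: recovered {recovered:.6f}, exact {exact_ratio:.6f}")
```

The unknown constant $\varphi(u_0)$ cancels in $\varphi^{-1}\{\varphi(x) + \varphi(y)\}$, which is why fixing $u_0$ arbitrarily is harmless.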
It follows from the above propositions that $\varphi(u)$ is determined
(up to a positive constant factor, which leaves $H$ unchanged) as soon as
$\lambda(u)$ can be determined from the dataset. In our case this is done
via a nonparametric estimate of the distribution $K(u)$ of V, based on a
decomposition of Kendall’s tau.
A pair of random variables is concordant if large
values of one tend to be associated with large
values of the other, and small values with small values. More
precisely, if we have two observations $(x_i, y_i)$
and $(x_j, y_j)$ from a vector $(X, Y)$ of continuous
random variables, we say that $(x_i, y_i)$ and $(x_j, y_j)$
are concordant if $x_i < x_j$ and $y_i < y_j$, or $x_i > x_j$ and $y_i > y_j$. Similarly,
$(x_i, y_i)$ and $(x_j, y_j)$ are discordant if $x_i < x_j$
and $y_i > y_j$, or vice versa.
Definition
Kendall’s tau for the sample is defined as
$$\tau = \frac{c - d}{\binom{n}{2}},$$
where c is the number of concordant pairs, d is
the number of discordant pairs among n
observations of a vector $(X, Y)$, and $\binom{n}{2} = n(n-1)/2$ is the
number of distinct pairs of observations in the
sample.
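The definition translates directly into a pair count. A minimal sketch, assuming no tied values (ties would need a correction that the definition above does not cover); `kendall_tau` is a hypothetical helper name, not a library function.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Sample Kendall's tau (c - d) / C(n, 2); assumes no tied values."""
    n = len(xs)
    c = d = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            c += 1  # concordant: both coordinates move in the same direction
        elif s < 0:
            d += 1  # discordant: the coordinates move in opposite directions
    return (c - d) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant -> 4/6
```

The double loop is O(n^2), which is still quick for the dataset sizes used here (n = 1000 and n = 5000).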
For Archimedean copulas, Kendall’s tau can be
conveniently computed via the identity
$$\tau = 4E(V) - 1 = 4\int_0^1 \lambda(u)\,du + 1.$$
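The second equality follows from $K(u) = u - \lambda(u)$ in the first proposition and the fact that V takes values in $[0,1]$; the step left implicit is:

```latex
\begin{aligned}
E(V) &= \int_0^1 \Pr(V > u)\,du = \int_0^1 \{1 - K(u)\}\,du
      = \int_0^1 \{1 - u + \lambda(u)\}\,du = \tfrac{1}{2} + \int_0^1 \lambda(u)\,du,\\
\tau &= 4E(V) - 1 = 4\int_0^1 \lambda(u)\,du + 1.
\end{aligned}
```

For example, with the Clayton function $\lambda(u) = -u(1 - u^{a})/a$ one gets $\int_0^1 \lambda(u)\,du = -1/\{2(a+2)\}$, hence $\tau = 1 - 2/(a+2) = a/(a+2)$.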
The problem of estimating the
bivariate dependence function therefore reduces to the
estimation of $\lambda$. Genest and Rivest provide a
nonparametric procedure for estimating $K(u)$,
and hence $\lambda(u) = u - K(u)$.
Analysis of various datasets
The algorithm proposed by Genest and Rivest uses
the variables
$$V_i = \frac{\#\{(X_j, Y_j) : X_j < X_i,\ Y_j < Y_i\}}{n - 1}, \qquad i = 1, \ldots, n,$$
where the symbol # stands for the cardinality of a set. If
$\delta(t)$ denotes the distribution function of a point
mass at the origin, then a nonparametric
estimator of $K(u)$ is given by
$$K_n(u) = \frac{1}{n}\sum_{i=1}^{n} \delta(u - V_i).$$
Knowing that $\tau = 4E(V) - 1$, a sample
equivalent for the estimation of $\tau$ is $\tau_n = 4\bar{V} - 1$,
where $\bar{V}$ is the mean of the $V_i$.
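Family: generator $\varphi(v)$, function $\lambda(v)$ and Kendall’s tau $\tau$
Clayton: $\varphi(v) = v^{-a} - 1$, $\lambda(v) = -v(1 - v^{a})/a$, $\tau = a/(a+2)$
Frank: $\varphi(v) = \log\{(1 - e^{-a})/(1 - e^{-av})\}$, $\lambda(v) = \frac{1 - e^{-av}}{a e^{-av}} \log\frac{1 - e^{-av}}{1 - e^{-a}}$, $\tau = 1 + 4\{D_1(a) - 1\}/a$
Gumbel: $\varphi(v) = (-\log v)^{a+1}$, $\lambda(v) = v \log v/(a+1)$, $\tau = a/(a+1)$
where $D_1(a) = \frac{1}{a}\int_0^a \frac{t}{e^t - 1}\,dt$ is the Debye function.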
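The parameters of each family are obtained by setting the $\tau$ expressions above equal to $\tau_n$. Clayton and Gumbel invert in closed form; Frank needs a numerical root. A minimal sketch, assuming Simpson's rule for the Debye integral and bisection on $a \in (10^{-3}, 100)$ (both choices made here, not taken from the slides); the function names are hypothetical.

```python
import math

def clayton_from_tau(tau):
    # Clayton: tau = a / (a + 2)  =>  a = 2 tau / (1 - tau)
    return 2.0 * tau / (1.0 - tau)

def gumbel_from_tau(tau):
    # Gumbel with phi(v) = (-log v)**(a + 1): tau = a / (a + 1)  =>  a = tau / (1 - tau)
    return tau / (1.0 - tau)

def debye1(a, steps=2000):
    # D_1(a) = (1/a) * integral_0^a t / (e^t - 1) dt, by Simpson's rule (steps must be even)
    def f(t):
        return 1.0 if t == 0.0 else t / math.expm1(t)
    h = a / steps
    s = f(0.0) + f(a) + sum((4.0 if k % 2 else 2.0) * f(k * h) for k in range(1, steps))
    return s * h / 3.0 / a

def frank_tau(a):
    # Frank: tau = 1 + 4 (D_1(a) - 1) / a, increasing in a > 0
    return 1.0 + 4.0 * (debye1(a) - 1.0) / a

def frank_from_tau(tau, lo=1e-3, hi=100.0, tol=1e-10):
    # Bisection on a; assumes frank_tau(lo) < tau < frank_tau(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

tau = 0.431007
print(clayton_from_tau(tau), frank_from_tau(tau), gumbel_from_tau(tau))
```

With $\tau_n = 0.431007$ (one of the values in the chi-square results), these give roughly $a \approx 1.515$ (Clayton), $4.60$ (Frank) and $0.757$ (Gumbel), which appear to match the parameter values quoted in the joint-density captions.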
The next step of the analysis is a Pearson chi-squared
goodness-of-fit test for each family, in order to
assess the fit of the various models. Each time, the
dataset is cross-classified into cells, which give the
observed frequencies. Since the chi-squared test also
requires expected frequencies, it is necessary to
generate random variates $(u, v)$ whose joint
distribution belongs to one of the above
Archimedean families.
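The slides do not spell out how cells and expected frequencies were formed. A minimal sketch, assuming cells defined by interior cut points (for example 0.1, 0.5, 0.9 for the 4 × 4 table), half-open intervals, and expected counts obtained by rescaling a simulated table to the observed sample size; cells with zero expected count are simply skipped here. All function names are hypothetical, and the degrees-of-freedom rule used in the reported results is not reproduced.

```python
import bisect

def cross_classify(us, vs, cuts):
    """Count pairs in the cells defined by interior cut points, e.g. cuts=[0.1, 0.5, 0.9].

    Cells are half-open intervals [c_k, c_{k+1}); rows index u, columns index v.
    """
    k = len(cuts) + 1
    table = [[0] * k for _ in range(k)]
    for u, v in zip(us, vs):
        table[bisect.bisect_right(cuts, u)][bisect.bisect_right(cuts, v)] += 1
    return table

def expected_counts(sim_table, n_obs):
    """Rescale a table of simulated counts to the observed sample size."""
    n_sim = sum(map(sum, sim_table))
    return [[c * n_obs / n_sim for c in row] for row in sim_table]

def pearson_chi2(observed, expected):
    """Pearson statistic sum (O - E)^2 / E, skipping cells with E == 0."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row) if e > 0)
```

In use, the observed table comes from the Unicorn sample, the simulated table from a large sample of the fitted family, and `pearson_chi2(observed, expected_counts(simulated, n))` gives the statistic.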
Algorithm for sampling from Archimedean families
1. Generate two independent uniform (0,1) variates u and t.
2. Set $w = (\varphi')^{-1}\{\varphi'(u)/t\}$.
3. Set $v = \varphi^{-1}\{\varphi(w) - \varphi(u)\}$.
4. The desired pair is $(u, v)$.
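For the Clayton generator $\varphi(t) = t^{-a} - 1$, both inverses in the algorithm have closed forms: $(\varphi')^{-1}\{\varphi'(u)/t\} = u\,t^{1/(a+1)}$ and $\varphi^{-1}(s) = (1+s)^{-1/a}$. A minimal sketch of the four steps under that assumption; Frank and Gumbel would need a numerical inverse of $\varphi'$ in step 2, which is not shown. `sample_clayton` is a hypothetical name.

```python
import random

def sample_clayton(n, a, rng=random):
    """Draw n pairs (u, v) from the Clayton copula with phi(t) = t**(-a) - 1, a > 0."""
    pairs = []
    for _ in range(n):
        # Step 1: independent uniforms on (0, 1]; 1 - random() avoids an exact zero
        u = 1.0 - rng.random()
        t = 1.0 - rng.random()
        # Step 2: w = (phi')^{-1}(phi'(u) / t); with phi'(x) = -a x**(-a-1) this is u * t**(1/(a+1))
        w = u * t ** (1.0 / (a + 1.0))
        # Step 3: v = phi^{-1}(phi(w) - phi(u)), using phi^{-1}(s) = (1 + s)**(-1/a)
        v = (w ** -a - u ** -a + 1.0) ** (-1.0 / a)
        # Step 4: the desired pair
        pairs.append((u, v))
    return pairs

pairs = sample_clayton(1000, 1.514, random.Random(1))
```

Passing an explicit `random.Random` instance keeps simulated expected frequencies reproducible between runs.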
Archimedean family: $H(x,y)$
Clayton: $H(x,y) = \left(x^{-a} + y^{-a} - 1\right)^{-1/a}$
Frank: $H(x,y) = -\frac{1}{a}\ln\left\{1 + \frac{(e^{-ax} - 1)(e^{-ay} - 1)}{e^{-a} - 1}\right\}$
Gumbel: $H(x,y) = \exp\left[-\left\{(-\ln x)^{a+1} + (-\ln y)^{a+1}\right\}^{1/(a+1)}\right]$
Clayton’s joint density with a=1.514
Frank’s joint density with a=4.604
Gumbel’s joint density with a=0.757
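The closed forms of $H(x,y)$ can be checked against the definition $H(x,y) = \varphi^{-1}\{\varphi(x) + \varphi(y)\}$. A minimal sketch, assuming the generators $\varphi(v) = v^{-a} - 1$ (Clayton), $\varphi(v) = -\log\{(e^{-av} - 1)/(e^{-a} - 1)\}$ (Frank) and $\varphi(v) = (-\log v)^{a+1}$ (Gumbel), with the parameter values from the density captions; all function names are hypothetical.

```python
import math

# Generators phi and their inverses (a > 0)
def phi_clayton(t, a):
    return t ** -a - 1.0

def phi_inv_clayton(s, a):
    return (1.0 + s) ** (-1.0 / a)

def phi_frank(t, a):
    return -math.log(math.expm1(-a * t) / math.expm1(-a))

def phi_inv_frank(s, a):
    return -math.log1p(math.exp(-s) * math.expm1(-a)) / a

def phi_gumbel(t, a):
    return (-math.log(t)) ** (a + 1.0)

def phi_inv_gumbel(s, a):
    return math.exp(-(s ** (1.0 / (a + 1.0))))

def archimedean_H(phi, phi_inv, x, y, a):
    """H(x, y) = phi^{-1}(phi(x) + phi(y)) for uniform margins."""
    return phi_inv(phi(x, a) + phi(y, a), a)

# Closed forms of H(x, y)
def H_clayton(x, y, a):
    return (x ** -a + y ** -a - 1.0) ** (-1.0 / a)

def H_frank(x, y, a):
    return -math.log1p(math.expm1(-a * x) * math.expm1(-a * y) / math.expm1(-a)) / a

def H_gumbel(x, y, a):
    s = (-math.log(x)) ** (a + 1.0) + (-math.log(y)) ** (a + 1.0)
    return math.exp(-(s ** (1.0 / (a + 1.0))))

for name, phi, inv, H, a in [("Clayton", phi_clayton, phi_inv_clayton, H_clayton, 1.514),
                             ("Frank", phi_frank, phi_inv_frank, H_frank, 4.604),
                             ("Gumbel", phi_gumbel, phi_inv_gumbel, H_gumbel, 0.757)]:
    print(name, archimedean_H(phi, inv, 0.3, 0.7, a), H(0.3, 0.7, a))
```

`math.expm1` and `math.log1p` are used to keep the Frank expressions accurate when $a x$ is small.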
In general, the relative information with respect to the
uniform distribution in the bivariate case is computed as
$$I(h \mid u) = \int_0^1 \int_0^1 h(x,y) \log h(x,y)\,dx\,dy,$$
where $h(x,y)$ is the joint density of X and Y.
Only a numerical approximation of this integral is computed
in each case; it is nevertheless sufficient to
indicate what one should expect from each
Archimedean family.
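The slides computed this integral with a Matlab program that is not shown. As a stand-in, a Python sketch using the midpoint rule on an m × m grid, assuming the Clayton copula density $c(u,v) = (1+a)(uv)^{-a-1}(u^{-a} + v^{-a} - 1)^{-2-1/a}$ as the example. The density is singular at the corner (0,0), so the approximation converges slowly and should not be expected to reproduce the tabulated values exactly; the grid mass is returned as a sanity check.

```python
import math

def clayton_density(u, v, a):
    """Clayton copula density c(u, v) = (1 + a)(uv)^(-a-1) (u^-a + v^-a - 1)^(-2 - 1/a)."""
    return (1.0 + a) * (u * v) ** (-a - 1.0) * (u ** -a + v ** -a - 1.0) ** (-2.0 - 1.0 / a)

def relative_information(density, m=400):
    """Midpoint-rule approximation on an m x m grid.

    Returns (I, mass): I approximates the double integral of h log h over the unit
    square, and mass approximates the integral of h (should be close to 1).
    """
    cell = 1.0 / (m * m)
    info = mass = 0.0
    for i in range(m):
        x = (i + 0.5) / m
        for j in range(m):
            d = density(x, (j + 0.5) / m)
            mass += d
            if d > 0.0:
                info += d * math.log(d)
    return info * cell, mass * cell

info, mass = relative_information(lambda u, v: clayton_density(u, v, 1.514))
print(f"I(h|u) ~ {info:.4f}  (grid mass {mass:.4f})")
```

For the uniform density $h \equiv 1$ the function returns $I = 0$, as expected for the reference distribution itself.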
• To illustrate the above procedure, six datasets
(n=1000) were first sampled and analyzed in detail.
The correlations were 0.2, 0.65 and 0.9
for both the diagonal band and the minimum
information copulas, and a 4 × 4 cross-classification of
the frequencies was used.
• For completeness, six more datasets
with the same correlation constraints but a larger
size (n=5000) and a 6 × 6 cross-classification were
also analyzed, in order to compare the results.
Recapitulation-Steps
• Sample from the diagonal band and minimum information copulas.
• Estimate Kendall’s tau and the empirical lambda function.
• Estimate the parameters of each family from these estimates.
• Compute the lambda function of each family.
• Cross-classify the dataset into cells and simulate values from each family with its
estimated parameters.
• Perform the chi-square goodness-of-fit test and compare the resulting fits.
• Compute the relative information with respect to the uniform distribution.
• Repeat the whole procedure for different correlations and dataset sizes.
Examples of cross-classifications from the diagonal band copula with correlation 0.2

4 × 4 Cross-Classification of X and Y (Observed values)
X\Y        0-0.1   0.1-0.5   0.5-0.9   0.9-1
0-0.1          8        47        40       0
0.1-0.5       43       166       131      49
0.5-0.9       36       144       186      46
0.9-1          0        38        52      14

6 × 6 Cross-Classification of X and Y (Observed values)
X\Y        0-0.1   0.1-0.3   0.3-0.5   0.5-0.7   0.7-0.9   0.9-1
0-0.1         67       105       129       126        80       0
0.1-0.3      126       222       211       196       119      84
0.3-0.5      118       230       207       121       183     123
0.5-0.7      111       173       131       199       233     152
0.7-0.9       78       118       203       244       239     126
0.9-1          1        78       131       128       127      77

Pearson chi-square statistics (degrees of freedom in parentheses)
τ_n          Clayton          Frank            Gumbel
0.0877273    47.2727 (8)      34.4113 (8)      49.5668 (8)
0.1076116    52.3819 (7)      32.7662 (8)      47.2727 (8)
0.431007     219.842 (5)      66.528 (7)       129.861 (6)
0.424140     113.601 (5)      24.808 (7)       119.047 (7)
0.6820981    265.805 (3)      67.031 (3)       182.181 (3)
0.6991151    215.628 (3)      39.309 (3)       140.177 (3)
0.1083165    356.184 (25)     301.967 (25)     357.828 (25)
0.1040581    148.598 (25)     65.967 (25)      94.2128 (25)
0.4270299    1707.548 (22)    604.8511 (23)    1398.062 (23)
0.4398811    1044.165 (23)    127.982 (23)     557.023 (23)
0.6801317    1685.918 (15)    473.3928 (17)    1128.659 (17)
0.6906168    1142.479 (14)    124.384 (16)     677.908 (17)
Relative Information with respect to the uniform distribution
Dataset                     Clayton   Frank    Gumbel
Diag.band 0.2 (n=1000)      0.0144    0.8261   0.0887
Min.inf. 0.2 (n=1000)       0.0215    0.7925   0.1099
Diag.band 0.65 (n=1000)     0.3207    0.4932   0.5222
Min.inf. 0.65 (n=1000)      0.3107    0.4944   0.5126
Diag.band 0.9 (n=1000)      0.8653    0.6884   0.8794
Min.inf. 0.9 (n=1000)       0.9202    0.7247   0.9015
Diag.band 0.2 (n=5000)      0.0218    0.7913   0.1106
Min.inf. 0.2 (n=5000)       0.0202    0.7983   0.1060
Diag.band 0.65 (n=5000)     0.3150    0.4939   0.5167
Min.inf. 0.65 (n=5000)      0.3343    0.4920   0.5351
Diag.band 0.9 (n=5000)      0.8592    0.6844   0.8768
Min.inf. 0.9 (n=5000)       0.8924    0.7060   0.8905
Conclusions-Comments
• For correlation 0.2, all three families seem to fit reasonably well when n=1000, but
when n=5000 only Frank’s family does, together with Gumbel’s for the minimum information copula.
• For correlation 0.65, the results are quite promising only when n=1000.
• For correlation 0.9, Frank’s and Gumbel’s families seem to fit the data when n=1000,
and only Frank’s family when n=5000.
• The results are more promising for the minimum information copula, and this
holds for all the datasets regardless of the correlation.
• The results are much better when the dataset is smaller.
• The chi-square test statistic is clearly sensitive to the number of cells: for the
larger n and the 6 × 6 cross-classification, the results are disappointing in almost all
cases. Another goodness-of-fit test might lead to more encouraging conclusions.
• For correlation 0.2, Clayton’s family has the smallest values of relative information
with respect to the uniform distribution.
• For correlation 0.65, Clayton’s family still has the smallest values, whereas for
correlation 0.9 Frank’s family has the smallest values.