Archimedean Copulas
Theodore Charitos
MSc. Student CROSS

Task-Goal
• Examination of relations between Archimedean copulas and diagonal band or minimum information copulas given correlation constraints.
• Computation of the relative information with respect to the uniform distribution for each family.

Accomplishment of tasks
• Use of the algorithm provided in the paper of Christian Genest and Louis-Paul Rivest, “Statistical Inference Procedures for Bivariate Archimedean Copulas”.
• Use of a small Matlab program for numerically calculating the relative information with respect to the uniform distribution.

Structure of presentation
• Theoretical background
• Explanation and description of the whole procedure proposed by Genest and Rivest
• Analysis of datasets sampled from the Unicorn software
• Results
• Conclusions-Discussions

Theoretical Background

Definition: A bivariate distribution function $H(x,y)$ with marginals $F(x)$ and $G(y)$ is said to be generated by an Archimedean copula if it can be expressed in the form
$$H(x,y) = \varphi^{-1}\left[\varphi(F(x)) + \varphi(G(y))\right]$$
for some convex, decreasing function $\varphi$ on $(0,1]$ such that $\varphi(1) = 0$.

Proposition: Let $X$ and $Y$ be uniform random variates whose dependence function $H(x,y)$ is of the form $\varphi^{-1}[\varphi(x) + \varphi(y)]$ for some convex, decreasing function $\varphi$ defined on $(0,1]$ with the property that $\varphi(1) = 0$. Set
$$U = \frac{\varphi(X)}{\varphi(X) + \varphi(Y)}, \qquad V = H(X,Y), \qquad \lambda(u) = \frac{\varphi(u)}{\varphi'(u)}.$$
Then
• $U$ is uniformly distributed on $(0,1)$,
• $V$ is distributed as $K(u) = u - \lambda(u)$ on $(0,1)$, and
• $U$ and $V$ are independent random variables.

Proposition: Let $X$ and $Y$ be uniform random variables with dependence function $H(x,y)$. For $0 < u < 1$ let $K(u) = \Pr\{H(X,Y) \le u\}$ and define $K(u^-) = \lim_{t \to u^-} K(t)$. With $\lambda(u) = u - K(u)$ and an arbitrary constant $0 < u_0 < 1$, the function
$$\varphi(u) = \exp\left\{\int_{u_0}^{u} \frac{1}{\lambda(t)}\,dt\right\}$$
is convex, decreasing and satisfies $\varphi(1) = 0$ if and only if $K(u^-) > u$ for all $0 < u < 1$.

It follows from the above propositions that $\varphi(u)$ is determined as soon as $K(u)$, or equivalently $\lambda(u) = u - K(u)$, can be determined from the dataset. In our case this is done via a nonparametric estimate of the distribution $K(u)$ of $V$, based on a decomposition of Kendall’s tau.
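As a quick check of the first proposition, consider the independence case. This example is added here for illustration and is not part of the original slides. The generator $\varphi(t) = -\ln t$ is convex, decreasing and vanishes at 1, so the quantities $\lambda$ and $K$ can be worked out by hand:

```latex
\begin{align*}
\varphi(t) &= -\ln t, \qquad \varphi^{-1}(s) = e^{-s}, \\
H(x,y) &= \varphi^{-1}\left[\varphi(x) + \varphi(y)\right] = e^{\ln x + \ln y} = xy, \\
\lambda(u) &= \frac{\varphi(u)}{\varphi'(u)} = \frac{-\ln u}{-1/u} = u \ln u, \\
K(u) &= u - \lambda(u) = u - u \ln u .
\end{align*}
```

This agrees with the known distribution function of the product of two independent uniform variates, $\Pr\{XY \le u\} = u - u \ln u$ for $0 < u \le 1$.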
A pair of random variables is concordant if large values of one tend to be associated with large values of the other, and small values with small values. More precisely, if $(x_i, y_i)$ and $(x_j, y_j)$ are two observations from a vector $(X, Y)$ of continuous random variables, we say that they are concordant if $x_i < x_j$ and $y_i < y_j$ (or both inequalities are reversed). Similarly, $(x_i, y_i)$ and $(x_j, y_j)$ are discordant if $x_i < x_j$ and $y_i > y_j$, or vice versa.

Definition: Kendall’s tau for the sample is defined as
$$\tau = \frac{c - d}{\binom{n}{2}},$$
where $c$ is the number of concordant pairs, $d$ is the number of discordant pairs among $n$ observations of the vector $(X, Y)$, and $\binom{n}{2}$ is the number of distinct pairs of observations in the sample.

For Archimedean copulas, Kendall’s tau can be conveniently computed via the identity
$$\tau = 4E(V) - 1 = 1 + 4\int_0^1 \lambda(u)\,du.$$
The problem of estimating the bivariate dependence function therefore reduces to the estimation of $\lambda$. Genest and Rivest provide a nonparametric procedure for estimating $K(u)$ and also $\tau$.

Analysis of various datasets

The algorithm proposed by Genest and Rivest uses the variables
$$V_i = \frac{\#\{(X_j, Y_j) : X_j < X_i,\ Y_j < Y_i\}}{n - 1},$$
where the symbol $\#$ stands for the cardinality of a set. If $\delta(t)$ denotes the distribution function of a point mass at the origin, then a nonparametric estimator of $K(u)$ is given by
$$K_n(u) = \frac{1}{n}\sum_{i=1}^{n} \delta(u - V_i).$$
Knowing that $\tau = 4E(V) - 1$, a sample equivalent for the estimation of $\tau$ is
$$\tau_n = 4\bar{V} - 1.$$

Family     $\varphi(v)$                              $\lambda(v)$                                                  $\tau$
Clayton    $\dfrac{v^{-a} - 1}{a}$                    $\dfrac{v^{a+1} - v}{a}$                                       $\dfrac{a}{a+2}$
Frank      $\log\dfrac{1 - e^{-a}}{1 - e^{-av}}$      $\dfrac{e^{av} - 1}{a}\,\log\dfrac{1 - e^{-av}}{1 - e^{-a}}$   $1 + \dfrac{4\,[D_1(a) - 1]}{a}$
Gumbel     $(-\log v)^{a+1}$                          $\dfrac{v \log v}{a+1}$                                        $\dfrac{a}{a+1}$

Here $D_1$ denotes the first-order Debye function.

The next step of the analysis is a Pearson chi-squared goodness-of-fit test for each family, in order to assess the fit of the various models. For this, the dataset is cross-classified each time, and the resulting cell counts constitute the observed frequencies.
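The estimation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the program used for the presentation: the function names are hypothetical, the four-point sample is made up, and only the Clayton and Gumbel parameters are obtained by inverting the $\tau$ formulas of the family table (Frank's parameter would need a numerical root search involving the Debye function and is left out).

```python
import numpy as np


def pseudo_observations(x, y):
    """V_i = #{j : x_j < x_i and y_j < y_i} / (n - 1) for each observation i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # below[i, j] is True when point j lies strictly below and to the left of point i.
    below = (x[None, :] < x[:, None]) & (y[None, :] < y[:, None])
    return below.sum(axis=1) / (n - 1)


def tau_n(v):
    """Sample version of the identity tau = 4 E(V) - 1."""
    return 4.0 * np.mean(v) - 1.0


def K_n(u, v):
    """Empirical distribution function of the V_i, evaluated at each point of u."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    v = np.asarray(v, dtype=float)
    return np.mean(v[None, :] <= u[:, None], axis=1)


def clayton_a(tau):
    """Invert tau = a / (a + 2)."""
    return 2.0 * tau / (1.0 - tau)


def gumbel_a(tau):
    """Invert tau = a / (a + 1), the Gumbel parameterization used in the family table."""
    return tau / (1.0 - tau)


# Tiny illustrative sample (made up for this sketch, not taken from the slides).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
v = pseudo_observations(x, y)
print(v, tau_n(v), K_n([0.5], v))
print(clayton_a(0.431007), gumbel_a(0.431007))
```

With $\tau_n = 0.431007$, one of the values reported in the results, these inversions give roughly $a = 1.51$ for Clayton and $a = 0.76$ for Gumbel, in line with the parameter values quoted with the joint density plots.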
However, since the chi-squared test requires predicted values for its computation, it is necessary to generate random variates $(u, v)$ whose joint distribution belongs to one of the mentioned Archimedean families.

Algorithm for sampling from Archimedean families
1. Generate two independent uniform $(0,1)$ variates $u$ and $t$.
2. Set $w = \varphi'^{-1}\left(\varphi'(u)/t\right)$.
3. Set $v = \varphi^{-1}\left(\varphi(w) - \varphi(u)\right)$.
4. The desired pair is $(u, v)$.

Archimedean family     $H(x,y)$
Clayton                $\left(x^{-a} + y^{-a} - 1\right)^{-1/a}$
Frank                  $-\dfrac{1}{a}\ln\left[1 + \dfrac{(e^{-ax} - 1)(e^{-ay} - 1)}{e^{-a} - 1}\right]$
Gumbel                 $\exp\left\{-\left[(-\ln x)^{a+1} + (-\ln y)^{a+1}\right]^{1/(a+1)}\right\}$

[Figure: Clayton’s joint density with a = 1.514]
[Figure: Frank’s joint density with a = 4.604]
[Figure: Gumbel’s joint density with a = 0.757]

In general, the relative information with respect to the uniform distribution for the bivariate case is computed as
$$I(h \mid u) = \int_0^1\!\!\int_0^1 h(x,y)\,\log h(x,y)\,dx\,dy,$$
where $h(x,y)$ is the joint density of $X$ and $Y$. Only a numerical approximation of the exact value is provided in each case; it is nonetheless enough to indicate what one should expect from each Archimedean family.

• To illustrate the above procedure, six datasets (n=1000) were first sampled and thoroughly analyzed. The correlations were 0.2, 0.65 and 0.9 for both the diagonal band and the minimum information copulas. A 4×4 classification of the frequencies was used.
• For the sake of completeness, six more datasets with the same correlation constraints but a different size (n=5000) and a 6×6 classification were also analyzed in order to compare results.

Recapitulation-Steps
• Sample from the diagonal band and minimum information copulas.
• Estimate Kendall’s tau and the empirical lambda function.
• Estimate the parameters for each family according to the previous results.
• Estimate the lambda functions for each family.
• Classify the dataset in categories and simulate values from each family according to their estimated parameters.
• Perform the chi-square goodness-of-fit test and compare the resulting fits.
• Compute the relative information with respect to the uniform distribution.
• Repeat the whole procedure for different correlations and sizes of the dataset.

Examples of classifications from the diagonal band copula with correlation 0.2

4×4 cross-classification of X and Y (observed values)
X\Y        0-0.1   0.1-0.5   0.5-0.9   0.9-1
0-0.1          8        47        40       0
0.1-0.5       43       166       131      49
0.5-0.9       36       144       186      46
0.9-1          0        38        52      14

6×6 cross-classification of X and Y (observed values)
X\Y        0-0.1   0.1-0.3   0.3-0.5   0.5-0.7   0.7-0.9   0.9-1
0-0.1         67       105       129       126        80       0
0.1-0.3      126       222       211       196       119      84
0.3-0.5      118       230       207       121       183     123
0.5-0.7      111       173       131       199       233     152
0.7-0.9       78       118       203       244       239     126
0.9-1          1        78       131       128       127      77

Results of the chi-square goodness-of-fit tests (one row per dataset, in the order presented; each entry is the $\chi^2$ statistic with its degrees of freedom in parentheses)
$\tau_n$      Clayton           Frank            Gumbel
0.0877273     47.2727 (8)       34.4113 (8)      49.5668 (8)
0.1076116     52.3819 (7)       32.7662 (8)      47.2727 (8)
0.431007      219.842 (5)       66.528 (7)       129.861 (6)
0.424140      113.601 (5)       24.808 (7)       119.047 (7)
0.6820981     265.805 (3)       67.031 (3)       182.181 (3)
0.6991151     215.628 (3)       39.309 (3)       140.177 (3)
0.1083165     356.184 (25)      301.967 (25)     357.828 (25)
0.1040581     148.598 (25)      65.967 (25)      94.2128 (25)
0.4270299     1707.548 (22)     604.8511 (23)    1398.062 (23)
0.4398811     1044.165 (23)     127.982 (23)     557.023 (23)
0.6801317     1685.918 (15)     473.3928 (17)    1128.659 (17)
0.6906168     1142.479 (14)     124.384 (16)     677.908 (17)

Relative information with respect to the uniform distribution
Dataset                   Clayton   Frank    Gumbel
Diag.band 0.2 (n=1000)    0.0144    0.8261   0.0887
Min.inf. 0.2 (n=1000)     0.0215    0.7925   0.1099
Diag.band 0.65 (n=1000)   0.3207    0.4932   0.5222
Min.inf. 0.65 (n=1000)    0.3107    0.4944   0.5126
Diag.band 0.9 (n=1000)    0.8653    0.6884   0.8794
Min.inf. 0.9 (n=1000)     0.9202    0.7247   0.9015
Diag.band 0.2 (n=5000)    0.0218    0.7913   0.1106
Min.inf. 0.2 (n=5000)     0.0202    0.7983   0.1060
Diag.band 0.65 (n=5000)   0.3150    0.4939   0.5167
Min.inf. 0.65 (n=5000)    0.3343    0.4920   0.5351
Diag.band 0.9 (n=5000)    0.8592    0.6844   0.8768
Min.inf. 0.9 (n=5000)     0.8924    0.7060   0.8905

Conclusions-Comments
• For correlation 0.2, all three families seem to fit reasonably well when n=1000; when n=5000 only Frank’s family does, together with Gumbel’s for the minimum information copula.
• For correlation 0.65, the results are quite promising only when n=1000.
• For correlation 0.9, Frank’s and Gumbel’s families seem to fit the data when n=1000, and only Frank’s family when n=5000.
• The results are more promising for the minimum information copula, and this holds for all the datasets regardless of the correlation.
• The results are much better when the size of the dataset is smaller.
• The chi-square test statistic is clearly sensitive to the number of cells. For the larger size n=5000 and the 6×6 cross-classification, the results are disappointing in almost all cases. Another goodness-of-fit test might lead to more encouraging conclusions.
• For correlation 0.2, Clayton’s family has the smallest values of relative information with respect to the uniform distribution.
• For correlation 0.9, by contrast, Frank’s family has the smallest values; at correlation 0.65 the tabulated values are still smallest for Clayton’s family.
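The four-step sampling algorithm and the relative information integral described earlier can be tied together in a short Python sketch for the Clayton family. It is an illustration under stated assumptions, not the Matlab program used for the tables: the function names are hypothetical, the inverse functions are worked out by hand from $\varphi(v) = (v^{-a} - 1)/a$, and the double integral is replaced by a Monte Carlo average of $\log h$ over simulated pairs, which is a different numerical technique from a grid-based quadrature.

```python
import numpy as np


def sample_clayton(a, n, rng):
    """Draw n pairs (u, v) from a Clayton copula with parameter a > 0."""
    # Step 1: two independent uniforms, shifted to (0, 1] to avoid dividing by zero.
    u = 1.0 - rng.random(n)
    t = 1.0 - rng.random(n)
    # Step 2: w = phi'^{-1}(phi'(u) / t). With phi'(s) = -s**(-a - 1)
    # this simplifies to w = u * t**(1 / (a + 1)).
    w = u * t ** (1.0 / (a + 1.0))
    # Step 3: v = phi^{-1}(phi(w) - phi(u)), where phi^{-1}(s) = (1 + a*s)**(-1/a).
    phi_w = (w ** (-a) - 1.0) / a
    phi_u = (u ** (-a) - 1.0) / a
    diff = np.maximum(phi_w - phi_u, 0.0)  # guard against rounding below zero
    v = (1.0 + a * diff) ** (-1.0 / a)
    # Step 4: the desired pairs.
    return u, v


def log_clayton_density(x, y, a):
    """log h(x, y) for the Clayton copula density."""
    s = x ** (-a) + y ** (-a) - 1.0
    return (np.log1p(a)
            - (a + 1.0) * (np.log(x) + np.log(y))
            - (2.0 + 1.0 / a) * np.log(s))


def relative_information_mc(a, n=20000, seed=0):
    """Monte Carlo estimate of I(h|u) = E_h[log h(X, Y)] for the Clayton family."""
    rng = np.random.default_rng(seed)
    x, y = sample_clayton(a, n, rng)
    return float(np.mean(log_clayton_density(x, y, a)))


print(relative_information_mc(1.514))
```

For this family a direct calculation gives $I = \log(1+a) - a/(1+a)$, about 0.320 at $a = 1.514$, so the Monte Carlo estimate can be compared both with that value and with the Clayton entries of the relative information table.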