FIN 685: Risk Management
Topic 4: Dependencies
Larry Schrenk, Instructor

Dependency
– Rank order statistics
  – Spearman's rho
  – Kendall's tau
– Correlation
– Copulae

Dependency requires that you ask three questions:
– Does a dependency exist?
– If a dependency exists, how strong is it?
– What is the pattern or direction of the dependency?

When we consider these questions, we begin to explain the nature (if there is one) of the relationship between our variables. When we do this, we cross a threshold into a higher form of scientific inquiry.

In our example we looked at the independence of two variables. When the test rejected the null of independence, it revealed evidence of some association between our variables. It is important to recognize that the statistics do not prove a causal relationship, but they do give evidence that such a relationship is likely to exist. This can have far-reaching implications because it opens up the possibility for modeling and prediction.

A huge chi-square value is some indication of a strong association, but this "impressionistic" approach is somewhat limited (a numerical sketch of the test appears at the end of this section). As we progress in this subject, we will investigate specific indices for describing the strength of an association. These indices are generally scaled from 0 to 1 or from –1 to +1. A value of zero generally means no association, while a value of 1 means a nearly perfect association.

Because of the pseudo-ordinal nature of the data in our example, attending lecture is clearly associated with higher exam scores. In the simplest sense, more lectures attended translates into a higher score. This means that the dependency is not only present, but also positive in its effect (big yields big and small yields small).

Performance on Final Exam vs. Lecture Attendance

                      Attendance >66.7%   Attendance ≤66.6%   Row Sum
Exam score ≥80%              31                   1              32
Exam score <79.9%             2                  13              15
Column Sum                   33                  14              47

What Is the Pattern or Direction of the Dependency?

A silly example: Activity on Saturday Night vs. How You Feel on Sunday

                            "Where the     "Woohoo!"   A few beers        "I am      Row Sum
                            #*&% am I?"                with friends ;)    a monk"
Great!                           0             1             1               7           9
Good                             0             2             4               2           8
Ok                               1             3             4               1           9
I think I am going to die        9             4             1               0          14
Column Sum                      10            10            10              10          40

Note that the dependency here is negative. It appears that getting so drunk that you forget where you are on Saturday night can have an adverse effect on how you feel on Sunday. This is negative association: big boozing yields little contentment on Sunday morning, while little alcohol yields big contentment on Sunday morning.

If a dependency exists, then we should measure the strength of the relationship in a standard manner, via an index. We do this so that we can compare associations between many variables and thereby determine which have the strongest influence over others.

The relationship between any two variables can be portrayed graphically on x- and y-axes. Each subject i has a pair (x_i, y_i). When the scores for an entire sample are plotted, the result is called a scatter plot.

Variables can be positively or negatively correlated:
– Positive correlation: as the value of one variable increases, the value of the other variable increases.
– Negative correlation: as the value of one variable increases, the value of the other variable decreases.

The magnitude of the correlation, its numerical value ignoring the sign, expresses the strength of the linear relationship between the variables.
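Before turning to the correlation indices illustrated below, here is a minimal sketch (assumed; the lecture itself presents no code) of the chi-square independence test on the attendance table, using scipy's chi2_contingency:

```python
# Chi-square test of independence on the attendance / exam-score table.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: exam score >=80%, <79.9%; columns: attendance >66.7%, <=66.6%
table = np.array([[31,  1],
                  [ 2, 13]])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.2g}, dof = {dof}")
# A tiny p-value rejects the null of independence: there is evidence of
# association between attendance and exam score, though not of causation.
```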
[Figure: four scatter plots illustrating r = 1.00, r = .85, r = .42, and r = .17]

Rank Order Statistics: Spearman's rho
– Non-parametric
– Range: –1.0 through zero to +1.0
– Like a correlation between ranked variables
– Ordinal data

Ordinal data is defined as data that has a clear hierarchy. This form of data can often appear similar to nominal data (in categories) or interval data (ranked from 1 to n). However, there is more information in ordinal categories than in nominal categories, and there is less in ranks than in real data at an interval level of measure.

– 1.00 means that the rankings are in perfect agreement
– –1.00 means that they are in perfect disagreement
– 0 signifies that there is no relationship

Convert the data to ranks x_i, y_i (Excel: RANK function). Assuming no tied ranks:

\rho = 1 - \frac{6 \sum_i (x_i - y_i)^2}{n(n^2 - 1)}

Spearman's rho example:

         Obs 1   Obs 2   Obs 3   Obs 4
A          4       6       8       7
B          2      13      11       5
Rank A     4       3       1       2
Rank B     4       1       2       3
Diff       0       2      -1      -1
Diff²      0       4       1       1     Sum = 6

\rho = 1 - \frac{6(6)}{4(4^2 - 1)} = 0.4

Rank Order Statistics: Kendall's tau
– Non-parametric
– Range: –1.0 through zero to +1.0
– 'Pairs' oriented
– Ordinal data

The basic premise behind Kendall's tau is that for observations with two pieces of information (two variables), you can rank the value of each and treat it as a pair to be compared to all other pairs. Each pair has an X (independent variable) value and a Y (dependent variable) value. If we order the X values, we would expect the Y values to have a similar order if there is a strong positive correlation between X and Y. Kendall's tau ranges from –1 to +1, with large positive values denoting positive association and large negative values denoting negative association; 0 denotes no association.

This test works off the comparison of each pair to all other pairs. Any comparison of pairs can have only three possible results:
– Concordant (Nc): ordinally correct
– Discordant (Nd): ordinally incorrect
– Tied: exactly the same

Note that for n observations there are n(n-1)/2 comparisons, hence the equation:

\tau = \frac{N_c - N_d}{n(n-1)/2}

Correlation: Pearson's Product Moment Correlation

Devised by Francis Galton (and later formalized by Karl Pearson), the coefficient is essentially the sum of the products of the z-scores for each variable divided by the degrees of freedom. Its computation can take a number of forms depending on your resources.

– Parametric: elliptical, linear
– Range: –1.0 through zero to +1.0
– Cardinal data

r = \frac{\sum z_x z_y}{n - 1}

Mathematically simplified:

r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1)\, s_x s_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}},
\quad \text{where } s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}

Computationally easier:

r = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\left(\sum x^2 - (\sum x)^2/n\right)\left(\sum y^2 - (\sum y)^2/n\right)}}

The sample covariance is the same sum of cross-products divided by n - 1, without the sample standard deviations in the denominator. Covariance measures how two variables covary, and it is this measure that serves as the numerator in Pearson's r:

s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}

[Figure: how it works graphically — a scatter plot split into quadrants at x̄ and ȳ; points in the (+,+) and (–,–) quadrants pull the covariance positive. Here r = 0.89, cov = 788.69.]

So we now understand covariance, and standard deviation is also a comfortable term by now. So we can calculate Pearson's r, but what does it mean? r is scaled from –1 to +1; its magnitude gives the strength of the association, while its sign shows how the variables covary.

[Figure: scatter of multivariate normal data with correlation = 0.58, showing an elliptical, linear relationship]
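All three measures can be computed on the small worked example above. A minimal sketch using scipy (assumed; the lecture presents the calculations by hand):

```python
# Spearman's rho, Kendall's tau, and Pearson's r on the worked example.
from scipy.stats import spearmanr, kendalltau, pearsonr

a = [4, 6, 8, 7]
b = [2, 13, 11, 5]

rho, _ = spearmanr(a, b)    # rank-based; reproduces the hand result of 0.4
tau, _ = kendalltau(a, b)   # (Nc - Nd) / (n(n-1)/2); here (4 - 2)/6 = 0.33
r, _   = pearsonr(a, b)     # linear correlation on the raw values

print(f"Spearman rho = {rho:.2f}")   # 0.40
print(f"Kendall tau  = {tau:.2f}")   # 0.33
print(f"Pearson r    = {r:.2f}")     # 0.58
```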
Limitations of Correlation

1. Correlation represents a linear relationship.

Correlation tells you how much two variables are linearly related, not necessarily how much they are related in general. Two variables may have a strong, even perfect, relationship that is not linear; for example, there can be a curvilinear relationship.

An extreme example (reproduced numerically in the sketch at the end of this list):

x     1    2    3    4     5     6     7     8     9     10
x²    1    4    9   16    25    36    49    64    81    100
x³    1    8   27   64   125   216   343   512   729   1000

r(x, x²) = 0.9746
r(x, x³) = 0.9284

2. Restricted range.

Correlation can be deceiving if full information about each of the variables is not available. The correlation between two variables is smaller if the range of one or both variables is truncated. Because the full variation of one variable is not available, there is not enough information to see how the two variables covary together.

3. Outliers.

Outliers are scores that are obviously deviant from the remainder of the data.
– On-line outliers artificially inflate the correlation coefficient.
– Off-line outliers artificially deflate the correlation coefficient.

An outlier that falls near where the regression line would normally fall will increase the size of the correlation coefficient. [Figure: r = .457] An outlier that falls some distance away from the original regression line will decrease the size of the correlation coefficient. [Figure: r = .336]

4. Distributional assumptions.
– Multivariate normal is assumed
– Asset returns are not normal
– Combining distributions

5. Time stability.
– Higher correlation in bad markets

6. Correlation does not imply causation.

Two things that go together do not necessarily involve causation. One variable can be strongly related to another, yet not cause it. When there is a correlation between X and Y: does X cause Y, does Y cause X, or both? Or is there a third variable Z causing both X and Y, so that X and Y are correlated?
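A minimal sketch (assumed; the lecture shows only the table) reproducing the curvilinear extreme example with numpy:

```python
# Perfect but nonlinear relationships still give r < 1.
import numpy as np

x = np.arange(1, 11)                 # 1..10

r_x2 = np.corrcoef(x, x**2)[0, 1]    # ~0.9746
r_x3 = np.corrcoef(x, x**3)[0, 1]    # ~0.9284

print(f"r(x, x^2) = {r_x2:.6f}")
print(f"r(x, x^3) = {r_x3:.6f}")
# x^2 and x^3 are deterministic functions of x, yet Pearson's r is
# below 1 because the relationship is not linear.
```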
Copulae

– First introduced in 1959 by Abe Sklar.
– Have since played an important role in probability and statistics, especially in dependence studies.
– Most easily viewed as connecting two univariate marginal distributions to their joint distribution.

David X. Li, a statistician who moved over to business, worked in the credit derivatives market in 1997 and knew of the need to measure default correlation. His colleagues in actuarial science were working on a solution for death correlation: a function called the copula, from the Latin word meaning "to fasten or fit."

A copula is a bridge between marginal distributions and a joint distribution. In the case of death correlation, each marginal distribution is made up of probabilities of time until death for one person, and the joint distribution shows the probability of two people dying in close succession.

[Figure: a joint distribution and its two marginal distributions (http://www.mathworks.com/access/helpdesk/help/toolbox/stats/copula_14.gif)]

Sklar's theorem: if you have a joint distribution function along with marginal distribution functions, then there exists a copula function that links them; if the marginal distributions are continuous, then the copula is unique.

The Gaussian copula assumes that if the marginal probability distributions are normal, then the joint probability distribution will also be normal:

C(u, v) = \Phi_2\left(\Phi^{-1}(u), \Phi^{-1}(v); \rho\right), \quad -1 \le \rho \le 1

where
– C is the copula function of two normal distributions,
– \Phi_2 is the bivariate normal distribution function with correlation coefficient \rho, and
– \Phi^{-1} is the inverse of the cumulative univariate normal distribution function, applied to u and v.

From this definition of the Gaussian copula, it is clear we need two other pieces of information aside from the choice of copula:
– the (normal) marginal distribution functions, and
– a correlation coefficient.

The correlation specifies the shape of the multivariate distribution:
– zero correlation = circular contours
– positive or negative correlation = elliptical contours

The correlation number is always independent of the marginals (Hull 515).

Assumptions:
– There is a one-to-one relationship between asset correlation and default correlation, based on the definition of default as an asset falling below a certain value.
– The correlation number is always positive (Li 11-12).

The correlation number is an extremely important factor in this model because it determines the information you get out of the model.
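A minimal sketch (assumed; not from the lecture) evaluating the Gaussian copula with scipy, using norm.ppf for \Phi^{-1} and the bivariate normal CDF for \Phi_2:

```python
# Gaussian copula: C(u, v) = Phi2(Phi^-1(u), Phi^-1(v); rho).
from scipy.stats import norm, multivariate_normal

def gaussian_copula(u, v, rho):
    # Map the uniform marginals back to standard-normal quantiles
    x, y = norm.ppf(u), norm.ppf(v)
    # Bivariate normal CDF with unit variances and correlation rho
    biv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return biv.cdf([x, y])

# With rho = 0 the copula factorizes into the product of the marginals:
print(gaussian_copula(0.5, 0.5, 0.0))   # ~0.25 = 0.5 * 0.5
# Positive rho concentrates joint probability along the diagonal:
print(gaussian_copula(0.5, 0.5, 0.9))   # ~0.43
```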