FIN 685: Risk Management

Topic 4: Dependencies
Larry Schrenk, Instructor

Dependency

Rank Order Statistics
– Spearman’s ρ
– Kendall’s τ

Correlation

Copulae
Dependency

Requires that you ask three questions:
– Does a dependency exist?
– If a dependency exists, how strong is it?
– What is the pattern or direction of the
dependency?


When we consider these questions, we
begin to explain the nature (if there is one)
of the relationship between our variables
When we do this, we cross a threshold into a
higher form of scientific inquiry




In our example we looked at the independence of
two variables
When the test rejected the null of independence, it
revealed evidence of some
association between our variables
It is important to recognize that the statistics do
not prove a causal relationship, but they do give
evidence that such a relationship is likely to exist
This can have far-reaching implications because it
opens up the possibility for modeling and
prediction




A large chi-square value is some indication
that there is a strong association, but this
“impressionistic” approach is somewhat limited
As we progress in this subject, we will
investigate specific indices for describing the
strength of an association
These indices are generally scaled from 0 to 1
or from –1 to +1
A zero value generally means no association,
while a value of 1 means that there is a nearly
perfect association



What Is the Pattern or Direction of the Dependency?

Because of the pseudo-ordinal nature of the data in our
example, attending lecture is clearly associated with higher
exam scores
In the simplest sense, more lectures attended translates into
a higher score
This means that the dependency is not only present, but also
positive in its effect (big yields big and small yields small)

Performance on Final Exam vs. Lecture Attendance

                      Attendance >66.7%   Attendance ≤66.6%   Row Sum
Exam Score ≥80%              31                   1              32
Exam Score <79.9%             2                  13              15
Column Sum                   33                  14              47 (Total Sum)
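To make the earlier independence test concrete, here is a minimal sketch that runs a chi-square test of independence on the attendance table above. The counts come from the table; the use of scipy is my choice, not the slides':

```python
# Chi-square test of independence on the attendance/exam-score table.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: exam score >=80%, <79.9%; columns: attendance >66.7%, <=66.6%
table = np.array([[31, 1],
                  [2, 13]])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.4g}, dof = {dof}")
# A tiny p-value rejects the null of independence: attendance and exam
# performance show evidence of association (not proof of causation).
```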

A silly example:

Activity on Saturday Night (columns) vs. How You Feel on Sunday (rows)

                            "Where the     "Woohoo!"   A few beers       "I am a    Row
                            #*&% am I?"                with friends ;)    monk"     Sum
Great!                           0             1             1              7        9
Good                             0             2             4              2        8
Ok                               1             3             4              1        9
I think I am going to die        9             4             1              0       14
Column Sum                      10            10            10             10       40



Note that the dependency here is negative
It appears that getting so drunk that you forget
where you are on Saturday night can have an
adverse effect on how you feel on Sunday
This is negative association: big boozing provides
little contentment on Sunday morning, while little
alcohol will yield big contentment on Sunday
morning
If a dependency exists, then we should measure
the strength of the relationship in a
standard manner via an index
We do this so that we can compare
associations between many variables
and thereby determine which have the
strongest influence over others


The relationship between any two
variables can be portrayed graphically on
an x- and y- axis.

Each subject i has a pair of scores (xi, yi). When the scores
for an entire sample are plotted, the result is called a
scatter plot.
Variables can be positively or negatively correlated.
– Positive correlation: as the value of one variable
increases, the value of the other variable increases.
– Negative correlation: as the value of one variable
increases, the value of the other variable decreases.
The magnitude of correlation, indicated by its numerical
value ignoring the sign, expresses the strength of the
linear relationship between the variables.

[Scatter plots illustrating correlations of r = 1.00, r = .85, r = .42, and r = .17]
Rank Order Statistics:
Spearman’s ρ

Non-Parametric

Range: –1.0 to +1.0

Like correlation between ranked variables

Ordinal
Ordinal data is defined as data that has a
clear hierarchy
This form of data can often appear similar
to nominal data (in categories) or interval
data (ranked from 1 to N)
However, there is more information in
ordinal categories than in nominal categories
And there is less in ranks than in
real data at an interval level of measure


1.00 means that the rankings are in perfect
agreement

-1.00 means that they are in perfect disagreement

0 signifies that there is no relationship

Convert data to ranks, xi, yi
– Excel: RANK function

Assuming no tied ranks:

$$\rho = 1 - \frac{6 \sum_i (x_i - y_i)^2}{n(n^2 - 1)}$$
Spearman's ρ

 A    B    Rank(A)   Rank(B)   Diff   Diff²
 4    2       4         4        0      0
 6   13       3         1        2      4
 8   11       1         2       -1      1
 7    5       2         3       -1      1
                                Sum:    6

ρ = 1 − 6(6) / (4(4² − 1)) = 0.4
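A minimal Python sketch of the same calculation; the descending ranking mirrors Excel's RANK default, and scipy.stats.spearmanr would return the same value:

```python
# Spearman's rho for the worked example above, assuming no tied ranks.
def spearman_rho(x, y):
    n = len(x)
    # Rank descending: largest value gets rank 1, as in Excel's RANK
    rx = [sorted(x, reverse=True).index(v) + 1 for v in x]
    ry = [sorted(y, reverse=True).index(v) + 1 for v in y]
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

A = [4, 6, 8, 7]
B = [2, 13, 11, 5]
print(spearman_rho(A, B))  # 0.4, matching the table
```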
Rank Order Statistics:
Kendall’s τ

Non-Parametric

Range: –1.0 to +1.0

‘Pairs’ Oriented

Ordinal




The basic premise behind Kendall’s τ is that for
observations with two pieces of information (two
variables) you can rank the value of each and treat
it as a pair to be compared to all other pairs
Each pair will have an X (dependent variable) value
and a Y (independent variable) value
If we order the X values, we would expect the
Y values to have a similar order if there is a
strong positive correlation between X and Y
Kendall’s τ has a range from –1 to +1, with large
positive values denoting positive associations and
large negative values denoting negative
associations; a 0 denotes no association

This series of tests works off of the
comparison of pairs to all other pairs

Any comparison of pairs can have only
three possible results
– Concordant (Nc) – Ordinally correct
– Discordant (Nd) – Ordinally incorrect
– Tied – Exactly the same

Note that for n pairs there are n(n-1)/2
comparisons, hence the equation
$$\tau = \frac{N_c - N_d}{n(n-1)/2}$$
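A brute-force sketch of this pair counting in Python, assuming no ties; scipy.stats.kendalltau is the library route:

```python
# Kendall's tau by comparing every pair of observations to every other.
from itertools import combinations

def kendall_tau(x, y):
    n = len(x)
    nc = nd = 0
    for i, j in combinations(range(n), 2):   # all n(n-1)/2 comparisons
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            nc += 1   # concordant: ordinally correct
        elif sign < 0:
            nd += 1   # discordant: ordinally incorrect
        # sign == 0 would be a tie
    return (nc - nd) / (n * (n - 1) / 2)

print(kendall_tau([4, 6, 8, 7], [2, 13, 11, 5]))  # 4 concordant, 2 discordant: 0.33
```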
Correlation
Pearson’s Product Moment Correlation
– Devised by Francis Galton
– The coefficient is essentially the sum of
the products of the z-scores for each
variable divided by the degrees of
freedom
– Its computation can take on a number of
forms depending on your resources


Parametric: Elliptical

Linear

Range: –1.0 to +1.0

Cardinal
$$r = \frac{\sum z_x z_y}{n-1}$$

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1)\,s_x s_y},
\qquad s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$$

Mathematically Simplified:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}
{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

Computationally Easier:

$$r = \frac{\sum xy - (\sum x)(\sum y)/n}
{\sqrt{\left[\sum x^2 - (\sum x)^2/n\right]\left[\sum y^2 - (\sum y)^2/n\right]}}$$
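A minimal sketch of the "computationally easier" form in Python; the data are hypothetical, chosen only for illustration:

```python
# Pearson's r via the computational formula (no means required up front).
import math

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = sum(a * b for a, b in zip(x, y)) - sx * sy / n
    den = math.sqrt((sum(a * a for a in x) - sx ** 2 / n) *
                    (sum(b * b for b in y) - sy ** 2 / n))
    return num / den

x = [50, 60, 70, 80, 90]          # hypothetical data
y = [250, 300, 450, 400, 550]
print(pearson_r(x, y))             # about 0.93
```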
The sample covariance is the second equation above
without the sample standard deviations in the
denominator
Covariance measures how two variables
covary, and it is this measure that serves as
the numerator in Pearson's r:
$$s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n-1}$$

How it works graphically:
[Scatter plot with r = 0.89, cov = 788.6944: the lines x = x̄ and y = ȳ divide the points into quadrants, and points in the (+,+) and (−,−) quadrants push the covariance upward]
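A sketch of the covariance calculation behind that picture, reusing the hypothetical data from above:

```python
# Sample covariance: average cross-product of deviations from the means.
# Points with both deviations positive or both negative (the +,+ and -,-
# quadrants of the plot) contribute positively to the sum.
def sample_cov(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)

x = [50, 60, 70, 80, 90]          # hypothetical data
y = [250, 300, 450, 400, 550]
print(sample_cov(x, y))            # 1750.0
```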
So we now understand covariance
Standard deviation is also a comfortable
term by now
So we can calculate Pearson's r, but what
does it mean?
– r is scaled from –1 to +1; its magnitude
gives the strength of association, while its
sign shows how the variables covary

[Scatter plot: correlation = 0.58]

Multivariate Normal
– Elliptical

Linear Relationships
1. Correlation represents a linear relationship.
Correlation tells you how much two variables
are linearly related, not necessarily how much
they are related in general.
There are cases in which two variables have a
strong, even perfect, relationship that is not
linear. For example, there can be a curvilinear
relationship.

An Extreme Example

  x    x²     x³
  1     1      1
  2     4      8
  3     9     27
  4    16     64
  5    25    125
  6    36    216
  7    49    343
  8    64    512
  9    81    729
 10   100   1000

r(x, x²) = 0.974559
r(x, x³) = 0.928391
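These numbers can be checked with numpy's corrcoef; x and x³ are perfectly related, yet r falls well short of 1:

```python
# Perfect but nonlinear relationships still give r < 1.
import numpy as np

x = np.arange(1, 11)
for p in (2, 3):
    r = np.corrcoef(x, x ** p)[0, 1]
    print(f"r(x, x^{p}) = {r:.6f}")
# r(x, x^2) = 0.974559 and r(x, x^3) = 0.928391, matching the table.
```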
2. Restricted range
Correlation can be deceiving if the full
information about each of the variables is not
available. The correlation between two variables is
smaller if the range of one or both variables is
truncated.
Because the full variation of one variable is not
available, there is not enough information to
see how the two variables covary together (see the
sketch below).
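A minimal simulation of the restricted-range effect; the seed, slope, and cutoff are arbitrary choices of mine:

```python
# Truncating the range of one variable shrinks the observed correlation.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + rng.normal(scale=0.5, size=2000)    # true correlation about 0.89

full = np.corrcoef(x, y)[0, 1]
keep = x > 0                                 # throw away the lower half of x
restricted = np.corrcoef(x[keep], y[keep])[0, 1]
print(f"full range r = {full:.2f}, restricted r = {restricted:.2f}")
```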
3. Outliers
Outliers are scores that are obviously
deviant from the remainder of the data.
– On-line outliers artificially inflate the
correlation coefficient.
– Off-line outliers artificially deflate the
correlation coefficient.

An outlier that falls near where the regression
line would normally fall will
increase the size of the correlation coefficient,
as seen below:

[Scatter plot with an on-line outlier: r = .457]

An outlier that falls some distance away from
the original regression line would decrease the
size of the correlation coefficient, as seen
below:

[Scatter plot with an off-line outlier: r = .336]
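Both effects are easy to reproduce; in this sketch the data, seed, and outlier positions are arbitrary:

```python
# An on-line outlier inflates r; an off-line outlier deflates it.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = x + rng.normal(scale=1.5, size=30)   # weak linear relation

base = np.corrcoef(x, y)[0, 1]
on = np.corrcoef(np.append(x, 10), np.append(y, 10))[0, 1]    # along the trend
off = np.corrcoef(np.append(x, 10), np.append(y, -10))[0, 1]  # far off the trend
print(f"base r = {base:.3f}, on-line outlier r = {on:.3f}, "
      f"off-line outlier r = {off:.3f}")
```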
4. Distributional Assumptions
– Multivariate Normal
– Assets not Normal
– Combining Distributions
5. Time Stability

Higher Correlation in Bad Markets





Two things that go together do not
necessarily mean that there is causation.
One variable can be strongly related to
another, yet not cause it. Correlation does not
imply causality.
When there is a correlation between X and Y,
does X cause Y, does Y cause X, or both?
Or is there a third variable Z causing both X and
Y, so that X and Y are correlated only through Z?
Copulae
First introduced in 1959 by Abe Sklar.
– Has since played an important role in
areas of probability and statistics,
especially in dependence studies.
– Most easily viewed as connecting two
univariate marginal distributions to their
joint distribution.




A statistician who moved over to business
Worked in the credit derivatives market in 1997 and
knew about the need to measure default
correlation
Colleagues in actuarial science were working on a
solution for death correlation, a function called
the copula
"Copula" is a Latin word that means “to fasten or fit”
A bridge between marginal distributions
and a joint distribution
– In the case of death correlation, the
marginal distribution is made up of
probabilities of time until death for one
person, and the joint distribution shows
the probability of two people dying in
close succession
[Figure: a joint distribution and its marginal distributions]

If you have a joint distribution function
along with marginal distribution
functions, then there exists a copula
function that links them; if the marginal
distributions are continuous, then the
copula is unique.
(http://www.mathworks.com/access/helpdesk/help/toolbox/stats/copula_14.gif)

The Gaussian copula assumes that if the marginal probability
distributions are normal, then the joint
probability distribution will also be normal:

$$C(u, v) = \Phi_2\!\left(\Phi^{-1}(u), \Phi^{-1}(v); \rho\right), \qquad -1 \le \rho \le 1$$

– C is the copula function of the two normal distributions,
– Φ₂ is the multivariate normal distribution function
with correlation coefficient ρ, and
– Φ⁻¹ is the inverse of the cumulative univariate
normal distribution function, applied to u and v

From this definition of the Gaussian copula, it
is clear we will need two other pieces of
information aside from the choice of copula:
– the normal marginal distribution
functions
– a correlation coefficient
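A minimal scipy sketch of this construction; ρ and the sample size here are arbitrary choices, not values from the slides:

```python
# Gaussian copula: C(u, v) = Phi2(Phi^-1(u), Phi^-1(v); rho).
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.7
biv = multivariate_normal(mean=[0, 0], cov=[[1.0, rho], [rho, 1.0]])

# Evaluate the copula at a point: the joint normal CDF at the normal
# quantiles of u and v.
c = biv.cdf([norm.ppf(0.9), norm.ppf(0.9)])
print(f"C(0.9, 0.9) = {c:.4f}")   # above 0.81 = 0.9 * 0.9 since rho > 0

# Sample from the copula: draw correlated normals, then push each margin
# through the normal CDF to get uniforms that carry the dependence.
z = biv.rvs(size=5000, random_state=0)
u, v = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])
print(f"corr(u, v) = {np.corrcoef(u, v)[0, 1]:.2f}")  # dependence survives
```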




The correlation coefficient specifies the shape of the
multivariate distribution:
– Zero correlation = circular
– Positive or negative correlation = ellipse
The correlation number is always independent of the
marginals (Hull 515).
Assumptions:
– A one-to-one relationship between asset correlation and
default correlation, based on the definition of default as an
asset falling below a certain value.
– The correlation number is always positive (Li 11-12).
The correlation number is an extremely important factor in
this model because it determines the information you get out
of the model.