Statistics for Finance

1. Lecture 2: Some Basic Distributions.
We start by extending the notions of skewness and kurtosis to random variables. Let X be a random variable with mean µ and variance σ². The skewness is defined as

\[ E\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right] \]

and the kurtosis as

\[ E\left[\left(\frac{X-\mu}{\sigma}\right)^{4}\right]. \]
The following lemma will be useful. It tells us how to derive the pdf of a function of a random variable whose pdf is known.

Lemma 1. Let X be a continuous random variable with pdf f_X(x) and g a differentiable and strictly monotonic function. Then Y = g(X) has the pdf

\[ f_Y(y) = f_X\bigl(g^{-1}(y)\bigr)\,\left|\frac{d}{dy}\, g^{-1}(y)\right|, \]

for y such that y = g(x) for some x; otherwise f_Y(y) = 0.
Proof. The proof is an easy application of the change of variables and is left as an
exercise.
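As a quick illustration (a Python sketch, not part of the original notes; it assumes numpy and scipy are available), one can check Lemma 1 numerically for g(x) = exp(x) applied to a standard normal variable. The resulting density is the lognormal density of Section 1.2.

    # Sketch: check Lemma 1 for Y = g(X) = exp(X), X ~ N(0,1).
    # Here g^{-1}(y) = log(y) and |d/dy g^{-1}(y)| = 1/y.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    y = np.exp(rng.standard_normal(1_000_000))

    edges = np.linspace(0.0, 20.0, 201)
    centers = 0.5 * (edges[:-1] + edges[1:])
    hist, _ = np.histogram(y, bins=edges, density=True)

    f_Y = norm.pdf(np.log(centers)) / centers   # f_X(g^{-1}(y)) |(g^{-1})'(y)|
    print(np.max(np.abs(hist - f_Y)))           # small, up to sampling noise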
The following is also a well-known fact.

Proposition 1. Let X₁, X₂, ..., Xₙ be independent random variables. Then

\[ \mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n). \]
Some important notions are the characteristic function and the moment generating function of a random variable X. The characteristic function is defined as

\[ \phi(t) = E\bigl[e^{itX}\bigr], \qquad t \in \mathbb{R}, \]

and the moment generating function, or Laplace transform, is defined as

\[ M(t) = E\bigl[e^{tX}\bigr], \qquad t \in \mathbb{R}. \]

The importance of these quantities is that they uniquely determine the corresponding distribution.
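Moreover, the moments of a distribution can be read off from the derivatives of M at zero, since M⁽ᵏ⁾(0) = E[Xᵏ]. A small symbolic sketch (using sympy, an assumption of this illustration, and the well-known normal mgf M(t) = e^{µt + σ²t²/2}):

    # Sketch: moments via derivatives of the mgf, M^(k)(0) = E[X^k].
    import sympy as sp

    t, mu, sigma = sp.symbols("t mu sigma", real=True, positive=True)
    M = sp.exp(mu * t + sigma**2 * t**2 / 2)    # mgf of N(mu, sigma^2)

    for k in (1, 2, 3):
        print(k, sp.expand(sp.diff(M, t, k).subs(t, 0)))
    # prints: mu; mu**2 + sigma**2; mu**3 + 3*mu*sigma**2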
1.1. Normal Distribution.
Normal distributions are probably the most fundamental ones. The reason for this lies in the Central Limit Theorem, which states that if X₁, X₂, ... are independent random variables with mean zero and variance one, then

\[ \frac{X_1 + \cdots + X_n}{\sqrt{n}} \]

converges in distribution to a standard normal distribution.
The standard normal distribution, often denoted by N(0, 1), is the distribution with probability density function (pdf)

\[ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \]
The normal distribution with mean µ and variance σ², often denoted by N(µ, σ²), has pdf

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}. \]
Let

\[ F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy \]

be the cumulative distribution function of the N(0, 1) distribution. The q-quantile of the N(0, 1) distribution is F⁻¹(q). The (1 − α)-quantile of the N(0, 1) distribution is denoted by z_α. We will see later on that z_α is widely used for confidence intervals.
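As a concrete illustration (a sketch assuming scipy is available; norm.ppf is scipy's name for the inverse cdf F⁻¹):

    # Sketch: z_alpha as the (1 - alpha)-quantile of N(0,1).
    from scipy.stats import norm

    alpha = 0.05
    z_alpha = norm.ppf(1 - alpha)   # F^{-1}(1 - alpha)
    print(z_alpha)                  # ~1.6449
    print(norm.cdf(z_alpha))        # recovers 1 - alpha = 0.95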
The characteristic function of a random variable X with N(µ, σ²) distribution is

\[ E\bigl[e^{itX}\bigr] = \int_{-\infty}^{\infty} e^{itx}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}\, dx = e^{i\mu t - \sigma^2 t^2/2}. \]
1.2. Lognormal Distribution.
Consider a N(µ, σ²) random variable Z; then the random variable X = exp(Z) is said to have a lognormal distribution. In other words, X is lognormal if its logarithm log X has a normal distribution. It is easy to see (using Lemma 1) that the pdf of the lognormal distribution associated to a N(µ, σ²) distribution is

\[ f(x) = \frac{1}{x\sqrt{2\pi\sigma^2}}\, e^{-(\log x-\mu)^2/(2\sigma^2)}, \qquad x > 0. \]
The median of the above distribution is exp(µ), while its mean is exp(µ + σ²/2). The mean is larger than the median, which indicates that the lognormal distribution is right-skewed. In fact, the larger the variance σ² of the associated normal distribution, the more skewed the lognormal distribution is.
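A quick simulation sketch (assuming numpy; the parameter values are illustrative) confirming the mean and median formulas:

    # Sketch: lognormal mean exp(mu + sigma^2/2) vs median exp(mu).
    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma = 0.0, 1.0
    x = np.exp(rng.normal(mu, sigma, size=1_000_000))

    print(np.mean(x), np.exp(mu + sigma**2 / 2))   # both ~1.6487
    print(np.median(x), np.exp(mu))                # both ~1.0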
Lognormal distributions are particularly important in mathematical finance, as they appear in the modelling of returns, where geometric Brownian motion arises. We will try to sketch this relation below. For a more detailed discussion you can look at Ruppert, "Statistics and Finance: An Introduction", pp. 75-83.
The net return of an asset measures the change in the price of the asset expressed as a fraction of the initial price. For example, if P_t is the price of the asset at time t, then the net return at time t is defined as

\[ R_t = \frac{P_t}{P_{t-1}} - 1 = \frac{P_t - P_{t-1}}{P_{t-1}}. \]
The revenue from holding an asset is

revenue = initial investment × net return.

The simple gross return is

\[ \frac{P_t}{P_{t-1}} = R_t + 1. \]
The gross return over a period of k units of time is

\[ 1 + R_t(k) = \frac{P_t}{P_{t-k}} = \frac{P_t}{P_{t-1}} \cdot \frac{P_{t-1}}{P_{t-2}} \cdots \frac{P_{t-k+1}}{P_{t-k}} = (1 + R_t)(1 + R_{t-1}) \cdots (1 + R_{t-k+1}). \]
Often it is easier to work with log returns (also known as continuously compounded returns). These are

\[ r_t = \log(1 + R_t) = \log \frac{P_t}{P_{t-1}}. \]

By analogy with the above, the log return over a period of k units of time is

\[ r_t(k) = r_t + \cdots + r_{t-k+1}. \]
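The following sketch (with a made-up four-day price series, purely for illustration) shows that log returns add over periods while gross returns multiply:

    # Sketch: net, gross and log returns from a hypothetical price series.
    import numpy as np

    P = np.array([100.0, 102.0, 99.0, 105.0])   # made-up prices P_0, ..., P_3

    R = P[1:] / P[:-1] - 1          # net returns R_t
    r = np.log(P[1:] / P[:-1])      # log returns r_t

    print(np.prod(1 + R), P[-1] / P[0])     # gross returns multiply: 1 + R_t(k)
    print(np.sum(r), np.log(P[-1] / P[0]))  # log returns add: r_t(k)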
A very common assumption in finance is that the log returns at different times are independent and identically distributed. By the definition of the return, the price of the asset at time t is given by the formula

\[ P_t = P_0 \exp(r_t + \cdots + r_1). \]

If the distribution of each r_i is N(µ, σ²), then the distribution of the sum in the above exponential is N(tµ, tσ²). Therefore, the price of the asset at time t has a lognormal distribution.

Later on, you will see that if the time increments are taken to be infinitesimal, the sum in the above exponential approaches a Brownian motion with drift, and the price of the asset then follows an exponential (geometric) Brownian motion.
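A simulation sketch of this mechanism (assuming numpy; the drift and volatility values are illustrative only):

    # Sketch: P_t = P_0 exp(r_1 + ... + r_t) with iid N(mu, sigma^2) log returns,
    # so that log P_t is N(log P_0 + t*mu, t*sigma^2), i.e. P_t is lognormal.
    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, P0, T = 0.0005, 0.01, 100.0, 250

    r = rng.normal(mu, sigma, size=T)   # one log return per period
    P = P0 * np.exp(np.cumsum(r))       # the resulting price path
    print(P[-1])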
1.3. Exponential, Laplace, Gamma. The exponential distribution with scale parameter θ > 0, often denoted by Exp(θ), has pdf

\[ \frac{e^{-x/\theta}}{\theta}, \qquad x > 0, \]

mean θ and standard deviation θ. The Laplace distribution with mean µ and scale parameter θ has pdf

\[ \frac{e^{-|x-\mu|/\theta}}{2\theta}, \qquad x \in \mathbb{R}. \]

The standard deviation of the Laplace distribution is √2 θ.

The Gamma distribution with scale parameter θ and shape parameter α has pdf

\[ \frac{\theta^{-\alpha} x^{\alpha-1} e^{-x/\theta}}{\Gamma(\alpha)}, \qquad x > 0, \]

with the normalisation

\[ \Gamma(\alpha) = \int_0^{\infty} u^{\alpha-1} e^{-u}\, du, \qquad \alpha > 0, \]

which is the so-called gamma function. Notice that when α = 1 one recovers the exponential distribution with scale parameter θ.
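A one-line check of this last remark (a sketch with scipy, whose gamma and expon distributions are parametrized by a shape a and a scale equal to θ):

    # Sketch: shape a = 1 reduces the gamma pdf to the exponential pdf.
    import numpy as np
    from scipy.stats import expon, gamma

    theta = 2.0
    x = np.linspace(0.5, 10.0, 5)
    print(gamma.pdf(x, a=1.0, scale=theta))
    print(expon.pdf(x, scale=theta))     # the same values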
Proposition 2. Consider two independent random variables X₁, X₂, gamma distributed with shape parameters α₁, α₂ respectively and scale parameters both equal to θ. Then the distribution of X₁ + X₂ is gamma with shape parameter α₁ + α₂ and scale parameter θ.
The proof uses the following lemma.
Lemma 2. If X₁, X₂ are independent random variables with continuous probability density functions f_{X₁}(x) and f_{X₂}(x), then the pdf of X₁ + X₂ is

\[ f_{X_1+X_2}(x) = \int f_{X_1}(x-y)\, f_{X_2}(y)\, dy. \]
Proof. Write, formally, f_{X₁+X₂}(x) = P(X₁ + X₂ = x). We know that, strictly speaking, the right-hand side is zero in the case of a continuous random variable. We think of it, though, as P(X₁ + X₂ ≈ x). We then have

\[
P(X_1 + X_2 = x) = \int P(X_1 + X_2 = x,\, X_2 = y)\, dy
= \int P(X_1 = x - y,\, X_2 = y)\, dy
= \int P(X_1 = x - y)\, P(X_2 = y)\, dy
= \int f_{X_1}(x-y)\, f_{X_2}(y)\, dy,
\]

where in the last step we used the independence of X₁, X₂.
We are now ready for the proof of Proposition 2.
Proof. For simplicity we will assume that θ = 1. The general case follows along the same lines. Based on the previous lemma we have

\[
f_{X_1+X_2}(x) = \int f_{X_1}(x-y)\, f_{X_2}(y)\, dy
= \int_0^x \frac{(x-y)^{\alpha_1-1}}{\Gamma(\alpha_1)}\, e^{-(x-y)}\, \frac{y^{\alpha_2-1}}{\Gamma(\alpha_2)}\, e^{-y}\, dy
= \frac{x^{\alpha_1+\alpha_2-1}\, e^{-x}}{\Gamma(\alpha_1)\Gamma(\alpha_2)} \int_0^1 (1-u)^{\alpha_1-1} u^{\alpha_2-1}\, du
= \frac{x^{\alpha_1+\alpha_2-1}\, e^{-x}}{\Gamma(\alpha_1+\alpha_2)}.
\]
The third equality follows from the change of variables y = xu, while the last one follows from the well-known property of gamma functions that

\[ \int_0^1 (1-u)^{\alpha_1-1} u^{\alpha_2-1}\, du = \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2)}. \]
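Proposition 2 can also be checked by simulation (a sketch assuming numpy and scipy):

    # Sketch: Monte Carlo check that Gamma(a1, theta) + Gamma(a2, theta)
    # is Gamma(a1 + a2, theta) for independent summands.
    import numpy as np
    from scipy.stats import gamma, kstest

    rng = np.random.default_rng(3)
    a1, a2, theta, n = 1.5, 2.5, 1.0, 100_000

    s = rng.gamma(a1, theta, n) + rng.gamma(a2, theta, n)
    print(kstest(s, gamma(a=a1 + a2, scale=theta).cdf))
    # the KS test should typically not reject, since the null holds exactly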
1.4. χ2 Distribution.
If X is a N (0, 1) random variable, then the distribution of X 2 is called the χ2
distribution with 1 degree of freedom. Often we denote the χ2 distribution with
one degree of freedom by χ21 .
It follows easily, using Lemma 1, that the χ²₁ distribution is actually a Gamma distribution with shape and scale parameters 1/2 and 2, respectively. (Check this!)
If now U₁, U₂, ..., Uₙ are independent χ²₁ random variables, then the distribution of the sum U₁ + ··· + Uₙ is called the χ² distribution with n degrees of freedom and is denoted by χ²ₙ.
Since the χ²₁ is a Gamma(1/2, 2) distribution, we know from Proposition 2 that the χ²ₙ distribution is actually the Gamma distribution with shape parameter n/2 and scale parameter 2.
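Again a quick scipy sketch of this identification:

    # Sketch: the chi^2_n pdf coincides with the Gamma(n/2, scale = 2) pdf.
    import numpy as np
    from scipy.stats import chi2, gamma

    n = 5
    x = np.linspace(0.5, 20.0, 5)
    print(chi2.pdf(x, df=n))
    print(gamma.pdf(x, a=n / 2, scale=2.0))   # the same values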
The χ² distribution is important since it is used to estimate the variance of a random variable, based on the sample variance as measured in a sampling process. To see the relevance, compare with the definition of the sample variance given in Definition 4 of Lecture 1.
1.5. t-Distribution.
The t-distribution is important when we want to derive confidence intervals (we will study this later on) for certain parameters of interest, when the (population) variance of the underlying distribution is not known. At this stage we would need a sampling estimate of the (population) variance; thus the t-distribution is related to the χ² distribution.

Let's proceed with the definition of the t-distribution. If Z ∼ N(0, 1) and U ∼ χ²ₙ are independent, then the distribution of Z/√(U/n) is called the t-distribution with n degrees of freedom, often denoted by tₙ.
The pdf of the t-distribution with n degrees of freedom is given by

\[ \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\, \Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}. \]

To prove this we will need the following lemma.
Lemma 3. Let X, Y be two continuous random variables with joint pdf f_{X,Y}(x, y). Then the pdf of the quotient Z = Y/X is given by

\[ f_Z(z) = \int |x|\, f_{X,Y}(x, xz)\, dx. \]
Proof. We have

\[
P\left(\frac{Y}{X} < z\right) = \int_0^{\infty} \int_{-\infty}^{zx} f_{X,Y}(x, y)\, dy\, dx + \int_{-\infty}^{0} \int_{zx}^{\infty} f_{X,Y}(x, y)\, dy\, dx.
\]

Differentiating both sides with respect to z we get

\[
f_{Y/X}(z) = \int_0^{\infty} x\, f_{X,Y}(x, xz)\, dx - \int_{-\infty}^{0} x\, f_{X,Y}(x, xz)\, dx = \int |x|\, f_{X,Y}(x, xz)\, dx,
\]

and the result follows.
The rest is left as an exercise.
As the degrees of freedom n tend to infinity, the tₙ distribution approaches the standard normal distribution. To see this, one needs to use the fact that

\[ \left(1 + \frac{x^2}{n}\right)^{(n+1)/2} \sim e^{\frac{n+1}{2n} x^2} \sim e^{x^2/2} \]

and the asymptotics of the gamma function

\[ \Gamma(n+1) \sim \sqrt{2\pi n}\, n^n e^{-n}. \]

Recall that when n is an integer Γ(n + 1) = n!, so the above is just Stirling's formula, but it also holds in the general case where n is not an integer.
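Numerically (a scipy sketch), the discrepancy between the tₙ and N(0, 1) densities indeed shrinks as n grows:

    # Sketch: the t_n density approaches the standard normal density.
    import numpy as np
    from scipy.stats import norm, t

    x = np.linspace(-4.0, 4.0, 81)
    for n in (5, 50, 500):
        print(n, np.max(np.abs(t.pdf(x, df=n) - norm.pdf(x))))
    # the maximum difference decreases roughly like 1/n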
1.6. F-Distribution.
If U, V are independent and U ∼ χ²_{n₁}, V ∼ χ²_{n₂}, then the distribution of

\[ W = \frac{U/n_1}{V/n_2} \]

is called the F-distribution with n₁, n₂ degrees of freedom. F-distributions are used in regression analysis. The pdf of the F-distribution is given by

\[ \frac{\Gamma((n_1+n_2)/2)}{\Gamma(n_1/2)\, \Gamma(n_2/2)} \left(\frac{n_1}{n_2}\right)^{n_1/2} x^{n_1/2 - 1} \left(1 + \frac{n_1}{n_2}\, x\right)^{-(n_1+n_2)/2}. \]
The proof is similar to the derivation of the pdf of the t−distribution and is therefore
left as an exercise.
1.7. Heavy-Tailed Distributions.
Distributions with high tail probabilities compared to a normal distribution with the same mean and variance are called heavy-tailed. In other words, a distribution F with mean zero and variance one is heavy (right-)tailed if

\[ \frac{1 - F(x)}{\int_x^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy} \gg 1, \qquad x \to +\infty. \]

A similar statement holds for the left tail. A heavy-tailed distribution can also be detected from high kurtosis (why?).
A heavy-tailed distribution is more prone to extreme values, often called outliers.
In finance applications one is especially concerned with heavy-tailed returns, since
the possibility of an extreme negative value can deplete the capital reserves of a
firm.
For example, the t-distribution is heavy-tailed, since its density is proportional to

\[ \frac{1}{\left(1 + x^2/n\right)^{(n+1)/2}} \sim |x|^{-(n+1)} \gg e^{-x^2/2} \]

for large x.
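A sketch (with scipy) of the defining ratio, here for t₅ against N(0, 1); sf denotes the survival function 1 − F:

    # Sketch: the ratio (1 - F_t5(x)) / (1 - Phi(x)) grows without bound.
    from scipy.stats import norm, t

    for x in (2.0, 4.0, 6.0, 8.0):
        print(x, t.sf(x, df=5) / norm.sf(x))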
A particular class of heavy-tailed distributions are the Pareto distributions, or simply power law distributions. These are distributions with pdf

\[ \frac{L(x)}{|x|^{\alpha}}. \]

Here L(x) is a slowly varying function, that is, a function with the property that, for any constant c,

\[ \frac{L(cx)}{L(x)} \to 1, \qquad x \to \infty. \]

An example of a slowly varying function is log x, or exp((log x)^β) for β < 1. In the Pareto distribution α > 1, or α = 1 if L(x) decays sufficiently fast.
1.8. Multivariate Normal Distributions.
The random vector (X₁, X₂, ..., Xₙ) ∈ Rⁿ is said to have a multivariate normal distribution if for every constant vector (c₁, c₂, ..., cₙ) ∈ Rⁿ the distribution of c₁X₁ + ··· + cₙXₙ is normal.

Multivariate normal distributions facilitate the modelling of portfolios. A portfolio is a weighted average of assets, with weights that sum up to one. The weights specify what fraction of the total investment is allocated to each asset.
As in the one-dimensional case, a multivariate normal distribution is determined by its mean vector

\[ (\mu_1, \ldots, \mu_n), \]

with µᵢ = E[Xᵢ], and its covariance matrix, that is, the matrix G = (G_{ij}) with entries

\[ G_{ij} = E[X_i X_j] - \mu_i \mu_j. \]

For simplicity let's assume that µᵢ = 0 for i = 1, 2, ..., n. Then the multivariate normal density function is given by

\[ \frac{1}{(2\pi)^{n/2} (\det G)^{1/2}} \exp\left(-\frac{1}{2} \langle x, G^{-1}x \rangle\right), \qquad x \in \mathbb{R}^n, \]

where G⁻¹ is the inverse of G, det G is the determinant of G, and ⟨·,·⟩ denotes the inner product in Rⁿ, that is, if x, y ∈ Rⁿ, then ⟨x, y⟩ = x₁y₁ + ··· + xₙyₙ.
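A sketch of a two-asset portfolio under a multivariate normal model (assuming numpy; the mean vector, covariance matrix and weights below are illustrative):

    # Sketch: the portfolio c1*X1 + ... + cn*Xn of a multivariate normal
    # vector is again normal, with mean <w, mu> and variance <w, G w>.
    import numpy as np

    rng = np.random.default_rng(4)
    mu = np.array([0.001, 0.002])
    G = np.array([[0.010, 0.004],
                  [0.004, 0.020]])   # covariance matrix
    w = np.array([0.6, 0.4])         # portfolio weights, summing to one

    X = rng.multivariate_normal(mu, G, size=100_000)
    port = X @ w

    print(np.mean(port), w @ mu)     # sample vs theoretical mean
    print(np.var(port), w @ G @ w)   # sample vs theoretical variance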
1.9. Exercises.
1. (Lognormal Distributions) Compute the moments of a lognormal distribution X = e^Z, with Z a normal N(µ, σ²) random variable. In particular compute its mean, standard deviation, skewness and kurtosis.
2. (Exponentials and Poissons) Exponential distributions often arise in the study of arrivals, queueing etc., modelling the times between arrivals. Consider T to be the time of the first arrival in a system and suppose it has an exponential distribution with scale parameter 1. Compute P(T > t + s | T > s).

Suppose that the number of arrivals in a system is a Poisson process with parameter λ. That is,

\[ P(\#\{\text{arrivals before time } t\} = k) = e^{-\lambda t}\, \frac{(\lambda t)^k}{k!}, \]

and arrivals in disjoint time intervals are independent. What is the distribution of the interarrival times?
3. Compute the moments of a gamma distribution with shape parameter α and scale parameter 1.

4. Prove Lemma 1.

5. Prove that the χ²₁ distribution is a Gamma distribution with shape parameter 1/2 and scale parameter 2.

6. Show that a heavy-tailed distribution has high kurtosis.

7. Derive the pdf of the t-distribution.

8. Compute the kurtosis of (a) a N(0, 1) distribution, (b) an exponential distribution with scale parameter one.
9. (Mixture models) Let X₁ ∼ N(0, σ₁²) and X₂ ∼ N(0, σ₂²) be two independent normal random variables. Let also Y be another independent random variable with a Bernoulli distribution, that is, P(Y = 1) = p and P(Y = 0) = 1 − p, for some 0 < p < 1.

A. What are the mean and the variance of Z = Y X₁ + (1 − Y)X₂?

B. Are the tails of its distribution heavier or lighter when compared to a normal distribution with the same mean and variance? If so, for what values of p? Give also an intuitive explanation of your mathematical derivation.
Use some statistical software to draw the distribution of the mixture model for some values of the parameter p, and compare it (especially the tails) with that of the corresponding normal.
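A possible starting point for the plotting part (a sketch assuming numpy, scipy and matplotlib; the parameter values are illustrative):

    # Sketch: histogram of Z = Y*X1 + (1 - Y)*X2 against the normal with the
    # same mean and variance; a log scale makes the tails easier to compare.
    import numpy as np
    from scipy.stats import norm
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(5)
    p, s1, s2, n = 0.1, 1.0, 5.0, 200_000

    Y = rng.random(n) < p
    Z = np.where(Y, rng.normal(0.0, s1, n), rng.normal(0.0, s2, n))

    var_Z = p * s1**2 + (1 - p) * s2**2     # the mean of Z is zero
    x = np.linspace(-20.0, 20.0, 400)
    plt.hist(Z, bins=200, density=True, alpha=0.5, log=True)
    plt.plot(x, norm.pdf(x, scale=np.sqrt(var_Z)))
    plt.show()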