Statistics for Finance 1. Lecture 2: Some Basic Distributions.

We start by extending the notions of skewness and kurtosis to random variables. Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. The skewness is defined as
\[ E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] \]
and the kurtosis as
\[ E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right]. \]
The following lemma will be useful. It tells us how to derive the pdf of a function of a random variable whose pdf is known.

Lemma 1. Let $X$ be a continuous random variable with pdf $f_X(x)$ and let $g$ be a differentiable and strictly monotonic function. Then $Y = g(X)$ has the pdf
\[ f_Y(y) = f_X\left(g^{-1}(y)\right) \left|\frac{d}{dy} g^{-1}(y)\right|, \]
for $y$ such that $y = g(x)$ for some $x$; otherwise $f_Y(y) = 0$.

Proof. The proof is an easy application of the change of variables formula and is left as an exercise.

The following is also a well-known fact.

Proposition 1. Let $X_1, X_2, \dots, X_n$ be independent random variables. Then
\[ \mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n). \]

Some important notions are the characteristic function and the moment generating function of a random variable $X$. The characteristic function is defined as
\[ \phi(t) = E\left[e^{itX}\right], \quad t \in \mathbb{R}, \]
and the moment generating function, or Laplace transform, is defined as
\[ M(t) = E\left[e^{tX}\right], \quad t \in \mathbb{R}. \]
The importance of these quantities is that they uniquely determine the corresponding distribution.

1.1. Normal Distribution. Normal distributions are probably the most fundamental ones. The reason for this lies in the Central Limit Theorem, which states that if $X_1, X_2, \dots$ are independent random variables with mean zero and variance one, then
\[ \frac{X_1 + \cdots + X_n}{\sqrt{n}} \]
converges in distribution to a standard normal distribution.

The standard normal distribution, often denoted by $N(0,1)$, is the distribution with probability density function (pdf)
\[ f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}. \]
The normal distribution with mean $\mu$ and variance $\sigma^2$, often denoted by $N(\mu, \sigma^2)$, has pdf
\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \]
Let
\[ F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2} \, dy \]
be the cumulative distribution function of the $N(0,1)$ distribution. The $q$-quantile of the $N(0,1)$ distribution is $F^{-1}(q)$. The $(1-\alpha)$-quantile of the $N(0,1)$ distribution is denoted by $z_\alpha$. We will see later on that $z_\alpha$ is widely used for confidence intervals.

The characteristic function of a random variable $X$ with $N(\mu, \sigma^2)$ distribution is
\[ E\left[e^{itX}\right] = \int_{-\infty}^{\infty} e^{itx} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx = e^{i\mu t - \frac{\sigma^2 t^2}{2}}. \]

1.2. Lognormal Distribution. Consider a $N(\mu, \sigma^2)$ random variable $Z$; then the random variable $X = \exp(Z)$ is said to have a lognormal distribution. In other words, $X$ is lognormal if its logarithm $\log X$ has a normal distribution. It is easy to see, using Lemma 1, that the pdf of the lognormal distribution associated to a $N(\mu, \sigma^2)$ distribution is
\[ f(x) = \frac{1}{x\sqrt{2\pi\sigma^2}} e^{-\frac{(\log x - \mu)^2}{2\sigma^2}}, \quad x > 0. \]
The median of the above distribution is $\exp(\mu)$, while its mean is $\exp(\mu + \sigma^2/2)$. The mean is larger than the median, which indicates that the lognormal distribution is right-skewed. In fact, the larger the variance $\sigma^2$ of the associated normal distribution, the more skewed the lognormal distribution is.

Lognormal distributions are particularly important in mathematical finance, as they appear in the modelling of returns, where geometric Brownian motion arises. We will try to sketch this relation below. For a more detailed discussion you can look at Ruppert, "Statistics and Finance: An Introduction", pp. 75-83.

The net return of an asset measures the change in the price of the asset expressed as a fraction of the initial price. For example, if $P_t$ is the price of the asset at time $t$, then the net return at time $t$ is defined as
\[ R_t = \frac{P_t}{P_{t-1}} - 1 = \frac{P_t - P_{t-1}}{P_{t-1}}. \]
The revenue from holding an asset is revenue = initial investment $\times$ net return. The simple gross return is
\[ \frac{P_t}{P_{t-1}} = R_t + 1. \]
The gross return over a period of $k$ units of time is
\[ 1 + R_t(k) = \frac{P_t}{P_{t-k}} = \frac{P_t}{P_{t-1}} \cdot \frac{P_{t-1}}{P_{t-2}} \cdots \frac{P_{t-k+1}}{P_{t-k}} = (1 + R_t)(1 + R_{t-1}) \cdots (1 + R_{t-k+1}). \]
Often it is easier to work with log returns (also known as continuously compounded returns), that is,
\[ r_t = \log(1 + R_t) = \log \frac{P_t}{P_{t-1}}. \]
By analogy with the above, the log return over a period of $k$ units of time is
\[ r_t(k) = r_t + \cdots + r_{t-k+1}. \]

A very common assumption in finance is that the log returns at different times are independent and identically distributed. By the definition of the log return, the price of the asset at time $t$ is given by the formula
\[ P_t = P_0 \exp\left(r_t + \cdots + r_1\right). \]
If the distribution of each $r_i$ is $N(\mu, \sigma^2)$, then the distribution of the sum in the above exponential is $N(t\mu, t\sigma^2)$. Therefore the price of the asset at time $t$ has a lognormal distribution. Later on you will see that if the time increments are taken to be infinitesimal, the sum in the above exponential approaches a Brownian motion with drift, and the price of the asset then follows the exponential (geometric) Brownian motion.
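The lognormality of $P_t$ is easy to check by simulation. Below is a minimal Python sketch (numpy is assumed available; the values of $\mu$, $\sigma$ and the horizon $t$ are purely illustrative). It simulates many price paths built from iid normal log returns and compares the sample moments of $\log P_t$ with the $N(\log P_0 + t\mu, \, t\sigma^2)$ prediction.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, t = 0.0005, 0.01, 250  # illustrative daily log-return parameters
P0 = 100.0

# Simulate 100000 price paths: P_t = P_0 exp(r_1 + ... + r_t), r_i iid N(mu, sigma^2)
r = rng.normal(mu, sigma, size=(100_000, t))
Pt = P0 * np.exp(r.sum(axis=1))

# log(P_t) should be N(log P_0 + t*mu, t*sigma^2); compare sample moments
print(np.log(Pt).mean(), np.log(P0) + t * mu)  # means
print(np.log(Pt).var(), t * sigma**2)          # variances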
1.3. Exponential, Laplace, Gamma. The exponential distribution with scale parameter $\theta > 0$, often denoted by $\mathrm{Exp}(\theta)$, has pdf
\[ f(x) = \frac{e^{-x/\theta}}{\theta}, \quad x > 0, \]
mean $\theta$ and standard deviation $\theta$.

The Laplace distribution with mean $\mu$ and scale parameter $\theta$ has pdf
\[ f(x) = \frac{e^{-|x-\mu|/\theta}}{2\theta}, \quad x \in \mathbb{R}. \]
The standard deviation of the Laplace distribution is $\sqrt{2}\,\theta$.

The Gamma distribution with scale parameter $\theta$ and shape parameter $\alpha$ has pdf
\[ f(x) = \frac{\theta^{-\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-x/\theta}, \quad x > 0, \]
with the normalisation
\[ \Gamma(\alpha) = \int_0^{\infty} u^{\alpha-1} e^{-u} \, du, \quad \alpha > 0, \]
which is the so-called gamma function. Notice that when $\alpha = 1$ one recovers the exponential distribution with scale parameter $\theta$.

Proposition 2. Consider two independent random variables $X_1, X_2$, gamma distributed with shape parameters $\alpha_1, \alpha_2$ respectively and scale parameters equal to $\theta$. Then the distribution of $X_1 + X_2$ is gamma with shape parameter $\alpha_1 + \alpha_2$ and scale parameter $\theta$.

The proof uses the following lemma.

Lemma 2. If $X_1, X_2$ are independent random variables with continuous probability density functions $f_{X_1}(x)$ and $f_{X_2}(x)$, then the pdf of $X_1 + X_2$ is
\[ f_{X_1+X_2}(x) = \int f_{X_1}(x-y) f_{X_2}(y) \, dy. \]

Proof. Let us formally write $f_{X_1+X_2}(x) = P(X_1 + X_2 = x)$. We know that, strictly speaking, the right-hand side is zero in the case of a continuous random variable; we think of it, though, as $P(X_1 + X_2 \simeq x)$. We then have
\begin{align*}
P(X_1 + X_2 = x) &= \int P(X_1 + X_2 = x, \, X_2 = y) \, dy \\
&= \int P(X_1 = x - y, \, X_2 = y) \, dy \\
&= \int P(X_1 = x - y) \, P(X_2 = y) \, dy \\
&= \int f_{X_1}(x-y) f_{X_2}(y) \, dy,
\end{align*}
where in the last step we used the independence of $X_1, X_2$.

We are now ready for the proof of Proposition 2.

Proof. For simplicity we will assume that $\theta = 1$; the general case follows along the same lines. Based on the previous lemma we have
\begin{align*}
f_{X_1+X_2}(x) &= \int f_{X_1}(x-y) f_{X_2}(y) \, dy \\
&= \int_0^x \frac{(x-y)^{\alpha_1-1}}{\Gamma(\alpha_1)} e^{-(x-y)} \, \frac{y^{\alpha_2-1}}{\Gamma(\alpha_2)} e^{-y} \, dy \\
&= \frac{x^{\alpha_1+\alpha_2-1}}{\Gamma(\alpha_1)\Gamma(\alpha_2)} e^{-x} \int_0^1 (1-u)^{\alpha_1-1} u^{\alpha_2-1} \, du \\
&= \frac{x^{\alpha_1+\alpha_2-1}}{\Gamma(\alpha_1+\alpha_2)} e^{-x}.
\end{align*}
The third equality follows from the change of variables $y = xu$, while the last one from the well-known property of gamma functions that
\[ \int_0^1 (1-u)^{\alpha_1-1} u^{\alpha_2-1} \, du = \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2)}. \]
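Proposition 2 can also be checked numerically. The following is a minimal Python sketch (numpy and scipy are assumed available; the parameter values are illustrative): it samples the two gammas, adds them, and compares the sum with the claimed Gamma$(\alpha_1+\alpha_2, \theta)$ distribution, first through moments and then through a Kolmogorov-Smirnov test.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a1, a2, theta = 2.0, 3.5, 1.5  # illustrative shape and scale parameters
n = 100_000

# X1 ~ Gamma(a1, theta) and X2 ~ Gamma(a2, theta), sampled independently
x1 = rng.gamma(a1, theta, size=n)
x2 = rng.gamma(a2, theta, size=n)
s = x1 + x2

# Gamma(a1 + a2, theta) has mean (a1 + a2)*theta and variance (a1 + a2)*theta^2
print(s.mean(), (a1 + a2) * theta)
print(s.var(), (a1 + a2) * theta**2)

# Kolmogorov-Smirnov test of the sample against the claimed distribution
print(stats.kstest(s, stats.gamma(a=a1 + a2, scale=theta).cdf))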
1.4. χ² Distribution. If $X$ is a $N(0,1)$ random variable, then the distribution of $X^2$ is called the $\chi^2$ distribution with 1 degree of freedom, often denoted by $\chi^2_1$. It follows easily, using Lemma 1, that the $\chi^2_1$ distribution is actually a Gamma distribution with shape and scale parameters $1/2$ and $2$, respectively. (Check this!) If now $Z_1, Z_2, \dots, Z_n$ are independent $N(0,1)$ random variables, then the distribution of the sum $Z_1^2 + \cdots + Z_n^2$ is called the $\chi^2$ distribution with $n$ degrees of freedom and is denoted by $\chi^2_n$. Since $\chi^2_1$ is a Gamma$(1/2, 2)$ distribution, we know from Proposition 2 that the $\chi^2_n$ distribution is actually the Gamma distribution with shape parameter $n/2$ and scale parameter $2$.

The $\chi^2$ distribution is important since it is used to estimate the variance of a random variable, based on the sample variance as this is measured in a sampling process. To see the relevance, compare with the definition of the sample variance given in Definition 4 of Lecture 1.

1.5. t-Distribution. The t-distribution is important when we want to derive confidence intervals (we will study this later on) for certain parameters of interest, when the (population) variance of the underlying distribution is not known. At this stage we need a sampling estimate of the (population) variance; thus the t-distribution is related to the $\chi^2$ distribution. Let us proceed with the definition of the t-distribution. If $Z \sim N(0,1)$ and $U \sim \chi^2_n$ are independent, then the distribution of $Z / \sqrt{U/n}$ is called the t-distribution with $n$ degrees of freedom, often denoted by $t_n$. The pdf of the t-distribution with $n$ degrees of freedom is given by
\[ f(t) = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}. \]
To prove this we will need the following lemma.

Lemma 3. Let $X, Y$ be two continuous random variables with joint pdf $f_{X,Y}(x,y)$. Then the pdf of the quotient $Z = Y/X$ is given by
\[ f_Z(z) = \int |x| \, f_{X,Y}(x, xz) \, dx. \]

Proof. We have
\[ P\left(\frac{Y}{X} < z\right) = \int_0^{\infty} \int_{-\infty}^{zx} f_{X,Y}(x,y) \, dy \, dx + \int_{-\infty}^0 \int_{zx}^{\infty} f_{X,Y}(x,y) \, dy \, dx. \]
Differentiating both sides with respect to $z$ we get
\[ f_{Y/X}(z) = \int_0^{\infty} x f_{X,Y}(x, xz) \, dx - \int_{-\infty}^0 x f_{X,Y}(x, xz) \, dx, \]
and the result follows. The rest is left as an exercise.

As the degrees of freedom $n$ tend to infinity, the $t_n$ distribution approximates the standard normal distribution. To see this one needs to use the fact that
\[ \left(1 + \frac{x^2}{n}\right)^{(n+1)/2} \sim e^{\frac{n+1}{2n} x^2} \sim e^{x^2/2}, \quad n \to \infty, \]
and the asymptotics of the Gamma function
\[ \Gamma(n+1) \sim \sqrt{2\pi n} \, n^n e^{-n}. \]
Recall that when $n$ is an integer $\Gamma(n+1) = n!$, so the above is just Stirling's formula, but it also holds in the general case where $n$ is not an integer.

1.6. F-Distribution. If $U, V$ are independent and $U \sim \chi^2_{n_1}$, $V \sim \chi^2_{n_2}$, then the distribution of
\[ W = \frac{U/n_1}{V/n_2} \]
is called the F-distribution with $n_1, n_2$ degrees of freedom. F-distributions are used in regression analysis. The pdf of the F-distribution is given by
\[ f(x) = \frac{\Gamma((n_1+n_2)/2)}{\Gamma(n_1/2)\Gamma(n_2/2)} \left(\frac{n_1}{n_2}\right)^{n_1/2} x^{n_1/2-1} \left(1 + \frac{n_1}{n_2} x\right)^{-(n_1+n_2)/2}, \quad x > 0. \]
The proof is similar to the derivation of the pdf of the t-distribution and is therefore left as an exercise.
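Both the t- and the F-constructions above can be sanity-checked by simulation. A minimal Python sketch (numpy and scipy assumed available; the degrees of freedom are illustrative) builds the defining ratios from independent samples and compares them with the corresponding library distributions via Kolmogorov-Smirnov tests:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N = 100_000
n, n1, n2 = 5, 4, 7  # illustrative degrees of freedom

# t_n from the definition: Z / sqrt(U/n) with Z ~ N(0,1), U ~ chi^2_n independent
z = rng.standard_normal(N)
u = rng.chisquare(n, size=N)
t_sample = z / np.sqrt(u / n)
print(stats.kstest(t_sample, stats.t(df=n).cdf))

# F_{n1,n2} from the definition: (U/n1) / (V/n2) with independent chi-squares
u1 = rng.chisquare(n1, size=N)
v = rng.chisquare(n2, size=N)
f_sample = (u1 / n1) / (v / n2)
print(stats.kstest(f_sample, stats.f(dfn=n1, dfd=n2).cdf))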
1.7. Heavy-Tailed Distributions. Distributions with high tail probabilities compared to a normal distribution with the same mean and variance are called heavy-tailed. In other words, a distribution $F$ with mean zero and variance one is heavy (right-)tailed if
\[ \frac{1 - F(x)}{\int_x^{\infty} e^{-y^2/2}/\sqrt{2\pi} \, dy} \gg 1, \quad x \to +\infty. \]
A similar statement holds for the left tail. A heavy-tailed distribution can also be detected from high kurtosis (why?). A heavy-tailed distribution is more prone to extreme values, often called outliers. In finance applications one is especially concerned with heavy-tailed returns, since the possibility of an extreme negative value can deplete the capital reserves of a firm.

For example, the t-distribution is heavy-tailed, since its density is proportional to
\[ \frac{1}{\left(1 + x^2/n\right)^{(n+1)/2}} \sim |x|^{-(n+1)} \gg e^{-x^2/2} \]
for large $x$.

A particular class of heavy-tailed distributions are the Pareto distributions, or simply power-law distributions. These are distributions with pdf
\[ f(x) = \frac{L(x)}{|x|^{\alpha}}, \]
where $L(x)$ is a slowly varying function, that is, a function with the property that, for any constant $c$,
\[ \frac{L(cx)}{L(x)} \to 1, \quad x \to \infty. \]
Examples of slowly varying functions are $\log x$, or $\exp\left((\log x)^{\beta}\right)$ for $\beta < 1$. In the Pareto distribution $\alpha > 1$, or $\alpha = 1$ if $L(x)$ decays sufficiently fast.

1.8. Multivariate Normal Distributions. The random vector $(X_1, X_2, \dots, X_n) \in \mathbb{R}^n$ is said to have a multivariate normal distribution if for every constant vector $(c_1, c_2, \dots, c_n) \in \mathbb{R}^n$ the distribution of $c_1 X_1 + \cdots + c_n X_n$ is normal. Multivariate normal distributions facilitate the modelling of portfolios. A portfolio is a weighted average of assets with weights that sum up to one; the weights specify what fraction of the total investment is allocated to each asset.

As with one-dimensional normal distributions, the multivariate normal distribution is determined by the mean vector $(\mu_1, \dots, \mu_n)$, with $\mu_i = E[X_i]$, and the covariance matrix, that is, the matrix $G = (G_{ij})$ with entries
\[ G_{ij} = E[X_i X_j] - \mu_i \mu_j. \]
For simplicity let us assume that $\mu_i = 0$ for $i = 1, 2, \dots, n$. Then the multivariate normal density function is given by
\[ f(x) = \frac{1}{(2\pi)^{n/2} (\det G)^{1/2}} \exp\left(-\frac{1}{2} \langle x, G^{-1} x \rangle\right), \quad x \in \mathbb{R}^n, \]
where $G^{-1}$ is the inverse of $G$, $\det G$ the determinant of $G$, and $\langle \cdot, \cdot \rangle$ denotes the inner product in $\mathbb{R}^n$, that is, if $x, y \in \mathbb{R}^n$, then $\langle x, y \rangle = \sum_{i=1}^n x_i y_i$.

1.9. Exercises.
1. (Lognormal distributions) Compute the moments of a lognormal distribution $X = e^Z$, with $Z$ a normal $N(\mu, \sigma^2)$ random variable. In particular compute its mean, standard deviation, skewness and kurtosis.
2. (Exponentials and Poissons) Exponential distributions often arise in the study of arrivals, queueing etc., modelling the time between arrivals. Consider $T$ to be the time of the first arrival in a system and suppose it has an exponential distribution with scale parameter 1. Compute $P(T > t + s \mid T > s)$. Suppose that the number of arrivals in a system is a Poisson process with parameter $\lambda$, that is,
\[ P\left(\#\{\text{arrivals before time } t\} = k\right) = e^{-\lambda t} \frac{(\lambda t)^k}{k!}, \]
and arrivals in disjoint time intervals are independent. What is the distribution of the interarrival times?
3. Compute the moments of a gamma distribution with shape parameter $\alpha$ and scale parameter 1.
4. Prove Lemma 1.
5. Prove that the $\chi^2_1$ distribution is a Gamma distribution with shape parameter $1/2$ and scale parameter $2$.
6. Show that a heavy-tailed distribution has high kurtosis.
7. Derive the pdf of the t-distribution.
8. Compute the kurtosis of (a) $N(0,1)$, (b) an exponential with scale parameter one.
9. (Mixture models) Let $X_1 \sim N(0, \sigma_1^2)$ and $X_2 \sim N(0, \sigma_2^2)$ be two independent normal random variables. Let also $Y$ be another independent random variable with a Bernoulli distribution, that is, $P(Y = 1) = p$ and $P(Y = 0) = 1 - p$, for some $0 < p < 1$.
A. What are the mean and the variance of $Z = Y X_1 + (1 - Y) X_2$?
B. Are the tails of its distribution heavier or lighter when compared to a normal distribution with the same mean and variance? If so, for what values of $p$? Give also an intuitive explanation of your mathematical derivation. Use some statistical software to draw the distribution of the mixture model for some values of the parameter $p$ and compare it (especially the tails) with that of the corresponding normal; a starting sketch is given below.
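As a starting point for the plotting part of Exercise 9, here is a minimal Python sketch (numpy and matplotlib assumed available; the values of $p$, $\sigma_1$, $\sigma_2$ are illustrative). It samples the mixture $Z$ and plots its histogram on a logarithmic vertical scale against the normal density with the same mean and variance, so that the tails can be compared.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
p, s1, s2 = 0.9, 1.0, 5.0  # illustrative: P(Y=1) = p, sigma_1 = s1, sigma_2 = s2
N = 200_000

# Z = Y*X1 + (1-Y)*X2 with Y ~ Bernoulli(p), X1 ~ N(0, s1^2), X2 ~ N(0, s2^2)
y = rng.binomial(1, p, size=N)
z = y * rng.normal(0.0, s1, N) + (1 - y) * rng.normal(0.0, s2, N)

# Normal with the same mean (zero) and variance p*s1^2 + (1-p)*s2^2
var = p * s1**2 + (1 - p) * s2**2
x = np.linspace(-20, 20, 400)
pdf = np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

plt.hist(z, bins=200, density=True, alpha=0.5, label="mixture")
plt.plot(x, pdf, label="normal, same mean and variance")
plt.yscale("log")  # a log scale makes the tail behaviour visible
plt.legend()
plt.show()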