Useful Univariate Distributions

Modeling Univariate Distributions

Location, Scale, and Shape Parameters

Location parameter: shifts a distribution to the right or left without changing its shape or variability. If f is a pdf, µ is a location parameter for the density f if f can be written as a function of x − µ, that is, f(x; µ) = f(x − µ; 0). This means that if µ is the location parameter of X, then a + µ is the location parameter of a + X.

Scale parameter: quantifies dispersion. A parameter is a scale parameter for a univariate sample if it is multiplied by |a| when the data are multiplied by a. So if σ(X) is a scale parameter for a random variable X, then σ(aX) = |a|σ(X), provided that the standard deviation is finite. If F is a CDF and f the corresponding pdf, λ is a scale parameter if

$$ F(x; \lambda) = F(x/\lambda;\, 1) $$

or, equivalently,

$$ f(x; \lambda) = \lambda^{-1} f(x/\lambda;\, 1). $$

If λ is a scale parameter (quantifying dispersion), then λ⁻¹ is a precision parameter.

Examples:
• If X ∼ N[µ, σ²], then µ is a location parameter, σ is a scale parameter, and σ⁻¹ is a precision parameter.
• If X ∼ U[a, a + b], then a is a location parameter, b is a scale parameter, and b⁻¹ is a precision parameter.

Location-scale family of distributions: if f is a pdf such that

$$ f(x; \mu, \lambda) = \lambda^{-1} f\!\left(\frac{x - \mu}{\lambda};\, 0, 1\right), $$

then f defines a family of distributions with location parameter µ and scale parameter λ (the normal and uniform families are examples).

Shape parameter: any parameter that is not changed by location and scale changes; it affects the shape of the distribution rather than shifting or stretching it. The most important moments used to characterize shape are the skewness and the kurtosis.

Skewness: measures the degree of asymmetry, with:
• Symmetry implying zero skewness.
• Positive skewness (right skewness) indicating a relatively long right tail compared to the left tail.
• Negative skewness (left skewness) indicating the opposite.

The skewness of a random variable X is

$$ Sk = E\!\left[\left(\frac{X - E[X]}{sd[X]}\right)^{3}\right]. $$

Kurtosis: the kurtosis of a random variable X is

$$ Kur = E\!\left[\left(\frac{X - E[X]}{sd[X]}\right)^{4}\right]. $$

It is possible to prove that Kur ≥ 1. Furthermore, the kurtosis is usually only considered for symmetric distributions, even though symmetry is not necessary for its definition.

We can estimate skewness and kurtosis from an i.i.d. sample, with sample mean X̄n and sample standard deviation sn, using the estimators

$$ \widehat{Sk} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}_n}{s_n}\right)^{3}, \qquad \widehat{Kur} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}_n}{s_n}\right)^{4}. $$

Deviations of the sample skewness from 0 and/or of the sample kurtosis from 3 are possible indicators of nonnormality, but these estimators do not have good properties: they are biased, and their sampling distribution cannot, in general, be obtained.

Tests of Normality

The null hypothesis is that the sample comes from a normal distribution, and the alternative is that it comes from a nonnormal distribution. The distribution of the test statistic for these tests is often not of a common family, so for small values of n the results should be considered with care.

The Shapiro-Wilk test uses a test statistic that can be interpreted as the squared correlation between the sample quantiles and the quantiles of a standard normal distribution.

The Jarque-Bera test uses a test statistic combining skewness and kurtosis:

$$ JB = n\left(\frac{\widehat{Sk}^{2}}{6} + \frac{(\widehat{Kur} - 3)^{2}}{24}\right). $$

JB = 0 when the sample skewness is 0 and the sample kurtosis is 3, as in a normal sample, and it increases in the other cases.
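As a quick illustration, the estimators and the JB statistic can be computed directly in R; this is a minimal sketch in which the names sk_hat, kur_hat, and jb are ours, while shapiro.test() is the built-in Shapiro-Wilk test. For large n, JB is approximately χ² with 2 degrees of freedom under the null.

#Sketch: sample skewness, sample kurtosis, and the JB statistic
set.seed(1)
x <- rnorm(200)                #a sample that is normal by construction
m <- mean(x)
s <- sqrt(mean((x - m)^2))     #sd with denominator n, matching the estimators above
z <- (x - m) / s
sk_hat  <- mean(z^3)           #sample skewness, should be near 0
kur_hat <- mean(z^4)           #sample kurtosis, should be near 3
jb <- length(x) * (sk_hat^2 / 6 + (kur_hat - 3)^2 / 24)
pchisq(jb, df = 2, lower.tail = FALSE)   #approximate JB p-value for large n
shapiro.test(x)                #built-in Shapiro-Wilk test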
The Normal Distribution

The pdf of a normal distribution is

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. $$

The normal distribution with µ = 0 and σ² = 1 is called the standard normal distribution. Given X ∼ N[µ, σ²], the transformed r.v. Z = (X − µ)/σ follows a standard normal distribution. The q-quantile of X can be obtained as a linear transformation of the corresponding quantile of the standard normal: x_q = µ + σz_q.

• A linear transformation of a normal random variable is itself normal.
• If X ∼ N[µ, σ²] then, for any value of µ and σ², Sk = 0 and Kur = 3.

#dnorm(x, mean, sd) returns the PDF evaluated at x
dnorm(0, mean = 0, sd = 1)
## [1] 0.3989423

curve(dnorm, from = -4, to = 4)
[Figure: the standard normal density dnorm(x) for x in (−4, 4).]

#pnorm(q) returns the integral from -Inf to q of the normal pdf, where q is a z-score
pnorm(0)
## [1] 0.5
#if we include lower.tail = FALSE it returns the integral from q to +Inf
pnorm(4.5, 5, 1, lower.tail = FALSE)
## [1] 0.6914625

curve(pnorm(x, 3, 0.5), from = 0, to = 6)
[Figure: the CDF pnorm(x, 3, 0.5) for x in (0, 6).]

curve(qnorm(x, 3, 0.5), from = 0, to = 1)
[Figure: the quantile function qnorm(x, 3, 0.5) for x in (0, 1).]

#The quantile function is given by qnorm(x), which is the inverse of pnorm()
qnorm(c(0.025, 0.975)) #critical values for a two-tailed 95% CI
## [1] -1.959964  1.959964
pnorm(qnorm(c(0.025, 0.975)))
## [1] 0.025 0.975
qnorm(0.05)
## [1] -1.644854
qnorm(0.95, lower.tail = FALSE)
## [1] -1.644854

Skew Normal Distribution

It is an extension of the normal pdf that allows for skewness. Let φ and Φ be, respectively, the standard normal pdf and CDF. The pdf

$$ f(x; \alpha) = 2\,\phi(x)\,\Phi(\alpha x), $$

where α is the shape parameter, has the following properties:
• If α = 0 it coincides with the standard normal (symmetric).
• The skewness increases as α increases in absolute value: to the right if α > 0, to the left if α < 0.

[Figure: skew-normal densities dsn(x) for α = −3, 0, 2, plotted for x in (−4, 4).]

Location-scale transformation: given a random variable X with pdf f(x; α), the linear transformation Y = µ + λX is said to have a skew-normal distribution with location parameter µ, scale parameter λ, and shape parameter α, so Y ∼ SN[µ, λ², α].

Let δ = α/√(1 + α²). Then:
• E[Y] = µ + λ√(2/π) δ.
• Var[Y] = λ²(1 − 2δ²/π).
• Sk[Y] = ((4 − π)/2) · (δ√(2/π))³ / (1 − 2δ²/π)^(3/2).
• Kur[Y] = 3 + 2(π − 3) · (δ√(2/π))⁴ / (1 − 2δ²/π)².

Student's t Distribution

T has a Student's t distribution with ν degrees of freedom if it has density

$$ f_T(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, $$

where:
• E[T] = 0; it exists only if ν > 1.
• Var[T] = ν/(ν − 2) if ν > 2, ∞ for 1 < ν ≤ 2, and undefined otherwise.
• Sk[T] = 0 for ν > 3.
• Kur[T] = 3 + 6/(ν − 4) for ν > 4, so it is always greater than 3 (approaching 3 as ν grows).
• When ν → ∞ it converges to a Gaussian.

[Figure: Student's t densities dt(x, v) for v = 1, 5, 10, 400, plotted for x in (−4, 4).]

Comparing a Student's t with a Gaussian:

[Figure: Student's t densities for v = 1, 5, 10 overlaid on the Gaussian density dnorm(x), for x in (−4, 4).]

Location-Scale t-Distribution

We can extend the standardized t-distribution by introducing a location and a scale parameter via a linear transformation. If T has a tν distribution, then Y = µ + λT, with µ ∈ ℝ and λ ≠ 0, is said to have a tν[µ, λ²] distribution, where µ is the location parameter and λ is the scale parameter; tν[µ = 0, λ = 1] = tν. We have that:
• E[Y] = µ for ν > 1.
• Var[Y] = λ²ν/(ν − 2) if ν > 2.
• Sk[Y] = 0 for ν > 3.
• Kur[Y] = 3 + 6/(ν − 4) for ν > 4, and Kur[Y] = +∞ for 2 < ν ≤ 4.
• ν can take any value above 0, not just integer values.
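Base R only provides the standardized tν through dt(). As a minimal sketch of this location-scale construction (the helper name dt_ls is ours, not part of base R), the tν[µ, λ²] density follows from the generic rule f(x; µ, λ) = λ⁻¹ f((x − µ)/λ):

#Sketch: density of the location-scale t via the generic location-scale rule
dt_ls <- function(x, nu, mu = 0, lambda = 1) {
  dt((x - mu) / lambda, df = nu) / lambda
}
curve(dt_ls(x, nu = 5, mu = 1, lambda = 2), from = -7, to = 9)
#check Var[Y] = lambda^2 * nu/(nu - 2) by simulation:
y <- 1 + 2 * rt(1e5, df = 5)
var(y)   #should be near 2^2 * 5/3 = 6.67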
Lognormal Distribution

The lognormal distribution is a distribution whose logarithm has a normal distribution; it can be used (though not always optimally) to model right-skewed financial data. Let X ∼ N[µ, σ²] and Y = exp(X). Then Y is said to have a lognormal distribution, Y ∼ lnN[µ, σ²].

The two parameters are called log-mean and log-variance, but they are actually the expected value and the variance of log(Y) (the normal). We have that

$$ E(Y) = \exp(\mu + \sigma^2/2) $$

and

$$ Var(Y) = \exp(2\mu + \sigma^2)\left(\exp(\sigma^2) - 1\right) = (E(Y))^2\left(\exp(\sigma^2) - 1\right). $$

Note that E(Y) > exp(µ): recall Jensen's inequality, E[exp(X)] > exp(E[X]).

Let Y ∼ lnN[µ, σ²]:
• The log-mean µ is a scale parameter (multiplying Y by c > 0 shifts µ by log c).
• The log-standard deviation σ is a shape parameter.
• Sk(Y) = (exp(σ²) + 2)√(exp(σ²) − 1) > 0, so the distribution is right skewed.
• Kur(Y) = exp(4σ²) + 2exp(3σ²) + 3exp(2σ²) − 3, often implying substantial kurtosis.

curve(dlnorm, to = 6)
[Figure: the lognormal density dlnorm(x) for x in (0, 6).]

curve(plnorm, to = 6)
[Figure: the lognormal CDF plnorm(x) for x in (0, 6).]

Notice that quantiles (unlike expected values) commute with increasing transformations, so the q-quantile of Y = exp(X) is the exponential of the q-quantile of X:

qlnorm(p = 0.95)
## [1] 5.180252
exp(qnorm(p = 0.95))
## [1] 5.180252

The Binomial Distribution

We conduct n experiments, and on each there are two possible outcomes: the probability of one of them ("success") is p, and the probability of "failure" is q = 1 − p. It is assumed that p and q are constant across experiments. The pmf of the Binomial(n, p) is

$$ P(Y = k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, 2, \ldots, n, $$

where $\binom{n}{k} = \frac{n!}{k!(n-k)!}$, E(Y) = np, and Var(Y) = npq.

The Binomial(1, p) is also called the Bernoulli distribution, and its pmf is

$$ P(Y = y) = p^y (1 - p)^{1-y}, \quad y = 0, 1. $$

Here p^y equals either p (when y = 1) or 1 (when y = 0), and similarly for (1 − p)^(1−y).

plot(0:20, dbinom(0:20, 20, 0.3), type = 'h')
[Figure: the Binomial(20, 0.3) probabilities dbinom(0:20, 20, 0.3) as vertical bars.]

The Uniform Distribution

A Uniform(a, b) on the interval (a, b) has a pdf equal to 1/(b − a) on (a, b) and 0 outside this interval, with

$$ E(Y) = \frac{a + b}{2}, \qquad Var(Y) = \frac{(b - a)^2}{12}. $$

The χ² Distribution

Definition: let Z ∼ N(0, 1). Then Z² ∼ χ²₁, a chi-square with 1 degree of freedom.

Sum: let Zi ∼ N(0, 1), i = 1, ..., k, independently. Then

$$ X = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k, $$

with:
• E[X] = k.
• Var[X] = 2k.
• Mode = max(k − 2, 0).

Its density function is

$$ f_X(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, $$

where Γ(·) is the Gamma function:

$$ \Gamma(z) = \int_0^\infty \exp(-x)\, x^{z-1}\, dx. $$

[Figure: chi-square densities dchisq(x, k) for k = 1, 2, 3, 5, plotted for x in (0, 10).]

Double Exponential (Laplace) Distribution

The random variable X follows a Laplace distribution with mean µ and scale parameter θ, so X ∼ Laplace[µ, θ], if X has pdf

$$ f(x) = \frac{1}{2\theta}\exp\left(-\frac{|x - \mu|}{\theta}\right), \quad -\infty < x < \infty, $$

with −∞ < µ < ∞ and θ > 0. The distribution is symmetric about µ with:
• E[X] = µ.
• Var[X] = 2θ².
• Sk[X] = 0.
• Kur[X] = 6.

[Figure: the Laplace[0, 1] and Laplace[0, √2] densities compared with the N[0, 1] density, for x in (−4, 4).]
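Base R has no Laplace functions, but the density is straightforward to write from the pdf above. A minimal sketch (the name dlaplace is ours), which also checks Var[X] = 2θ² by simulating a Laplace[0, 1] variable as the difference of two independent Exp(1) variables:

#Sketch: Laplace density written from the formula above
dlaplace <- function(x, mu = 0, theta = 1) {
  exp(-abs(x - mu) / theta) / (2 * theta)
}
curve(dlaplace(x), from = -4, to = 4)                    #Laplace[0, 1]
curve(dnorm(x), from = -4, to = 4, add = TRUE, lty = 2)  #N[0, 1] for comparison
#the difference of two independent Exp(1) r.v.'s is Laplace[0, 1]:
z <- rexp(1e5) - rexp(1e5)
c(mean(z), var(z))   #should be near 0 and 2*1^2 = 2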