Department of Statistics, The Chinese University of Hong Kong
RMSC 2001 Introduction to Risk Management (Term 1, 2020–21)
Tutorial 1 · Sihan Chen · 17th September 2020

1 Basic Probability

1.1 Probability

1. Basic set theory and notations: empty set ∅, subset ⊂, union ∪, intersection ∩, complement A^c.
2. The sample space Ω is the set containing all possible outcomes.
3. An event E is a subset of Ω.
4. The event space F is a collection of events of Ω.
5. For a sample space containing n distinct elements, there are 2^n distinct events and hence there are 2^n elements in F.
6. A probability measure is a function P : F → [0, 1] satisfying:
   (a) P(Ω) = 1;
   (b) P(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P(A_n) for any disjoint events {A_n ∈ F}_{n=1}^∞.

Remark: A probability space refers to the triple (Ω, F, P).

Example: Consider flipping a coin once. The sample space and event space are
   Ω = {H, T},   F = {∅, {H}, {T}, {H, T}}.
We can then assign probabilities to the events in F by defining a function P such that
   P({H}) = 0.4,   P({T}) = 0.6.
Then we can verify that P is a probability measure.

Exercise: How about flipping a coin twice?

1.2 Random Variables

A random variable (r.v.) is NOT a variable that is random. It is defined as follows:

1. A random variable is a function X : Ω → R.
2. Given x ∈ R, write {X = x} := {ω ∈ Ω : X(ω) = x} as an event in the event space F of Ω.
3. Suppose a r.v. X is defined on (Ω, F, P). A set S ⊆ R is called a support of X if P{X ∈ S} = 1.

Remark: A r.v. is a representation of outcomes ω ∈ Ω by real numbers for convenience of calculation, as the ω's themselves may not be numerical.

Example: Let Ω = {ω_1, ω_2, ω_3}. Define a r.v. X : Ω → R s.t. X(ω_1) = X(ω_2) = 1, X(ω_3) = 0. Therefore we have {X = 1} = {ω_1, ω_2}.

Most r.v.'s are either discrete or continuous; here we discuss some of their properties.

1.2.1 Discrete r.v.

1. The support S is a countable set.
2. f(x) = P(X = x) is the probability mass function (pmf) of X, satisfying f(x) ∈ [0, 1] and Σ_{x∈S} f(x) = 1.
3. F(x) = P(X ≤ x) is the cumulative distribution function (cdf) of X.

Distribution       | Notation     | pmf f(x)                                                   | Mean    | Variance
-------------------+--------------+------------------------------------------------------------+---------+---------------------
Uniform            | U{1,...,m}   | m^{-1} 1{x = 1,...,m}                                      | (m+1)/2 | (m^2 - 1)/12
Bernoulli          | B(1, p)      | p^x (1-p)^{1-x} 1{x = 0,1}                                 | p       | p(1-p)
Binomial           | B(n, p)      | C(n,x) p^x (1-p)^{n-x} 1{x = 0,...,n}                      | np      | np(1-p)
Poisson            | Po(λ)        | λ^x e^{-λ}/x! 1{x = 0,1,...}                               | λ       | λ
Geometric          | Geom(p)      | p(1-p)^{x-1} 1{x = 1,2,...}                                | 1/p     | (1-p)/p^2
Hyper-Geometric    | HG(r, n, m)  | C(n,x) C(m,r-x)/C(N,r) 1{x = 0,...,r∧n; r-x ≤ m}, N = n+m  | rn/N    | rnm(N-r)/[N^2(N-1)]
Negative Binomial  | NB(p, r)     | C(x-1,r-1) p^r (1-p)^{x-r} 1{x = r,r+1,...}                | r/p     | r(1-p)/p^2

Table 1: Some Common Discrete Distributions

1.2.2 Continuous r.v.

1. F(x) = P(X ≤ x), the cdf of X, is continuous.
2. The probability density function (pdf) of X, f(x), satisfies f(x) ≥ 0 and ∫_{-∞}^{∞} f(x) dx = 1.
3. P(a ≤ X ≤ b) = ∫_a^b f(x) dx and P(X = x) = 0, ∀x ∈ R.

Distribution | Notation     | pdf f(x)                                 | Mean    | Variance
-------------+--------------+------------------------------------------+---------+------------------------
Uniform      | U[a, b]      | (b-a)^{-1} 1{a < x < b}                  | (a+b)/2 | (b-a)^2/12
Normal       | N(µ, σ^2)    | (2πσ^2)^{-1/2} exp{-(x-µ)^2/(2σ^2)}      | µ       | σ^2
Exponential  | Exp(θ)       | θ^{-1} exp{-x/θ} 1{x > 0}                | θ       | θ^2
Gamma        | Gamma(α, γ)  | x^{α-1} e^{-x/γ}/[Γ(α)γ^α] 1{x > 0}      | αγ      | αγ^2
Beta         | Beta(α, β)   | x^{α-1}(1-x)^{β-1}/B(α, β) 1{0 < x < 1}  | α/(α+β) | αβ/[(α+β+1)(α+β)^2]

Table 2: Some Common Continuous Distributions
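Optional: The mean and variance formulas in Tables 1 and 2 can be checked numerically from the pmf definitions above. Below is a minimal Python sketch (standard library only; the values n = 10, p = 0.3 are illustrative choices) that builds the Binomial(n, p) pmf from the formula in Table 1 and verifies that it sums to one, with mean np and variance np(1 - p).

# Numerical check of the Binomial(n, p) row of Table 1 (illustrative n, p).
from math import comb

n, p = 10, 0.3
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

total = sum(pmf)                                          # should be 1
mean = sum(x * f for x, f in enumerate(pmf))              # should be n*p = 3.0
var = sum(x**2 * f for x, f in enumerate(pmf)) - mean**2  # should be n*p*(1-p) = 2.1

print(total, mean, var)  # 1.0 3.0 2.1 (up to floating-point error)

The same pattern (sum x·f(x) and x^2·f(x) over the support) checks any discrete row of Table 1.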
1.2.3 Joint distribution

1. Joint pmf (discrete):
   • f_{X,Y}(x, y) = P{(X, Y) = (x, y)} ∈ [0, 1], where (x, y) ∈ S_X × S_Y.
   • Σ_{x∈S_X} Σ_{y∈S_Y} f_{X,Y}(x, y) = 1.
2. Joint pdf (continuous):
   • f_{X,Y}(x, y) ≥ 0.
   • ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y}(x, y) dx dy = 1.
   • P{(X, Y) ∈ [x_1, x_2] × [y_1, y_2]} = ∫_{y_1}^{y_2} ∫_{x_1}^{x_2} f_{X,Y}(x, y) dx dy, ∀x_1 < x_2, y_1 < y_2.

Remark: If F_X(t) = F_Y(t) ∀t, then X and Y are identically distributed, written X =_D Y.

Exercise: Find the value of λ s.t. f(x) = λe^{-|x|}, x ∈ R, is a pdf.
Solution: ∫_{-∞}^{∞} f(x) dx = ∫_{-∞}^{∞} λe^{-|x|} dx = ∫_{-∞}^0 λe^x dx + ∫_0^∞ λe^{-x} dx = 2∫_0^∞ λe^{-x} dx = 2[-λe^{-x}]_0^∞ = 2 × [-(0 - λ × 1)] = 2λ. In order that f(x) is a pdf, it must satisfy ∫_{-∞}^{∞} f(x) dx = 1, hence λ = 1/2.

1.3 Independence

1. Two events A and B are independent, denoted by A ⊥⊥ B, iff P(A ∩ B) = P(A)P(B).
2. Two r.v.'s X and Y are independent, denoted by X ⊥⊥ Y, iff F_{X,Y}(x, y) = F_X(x)F_Y(y), ∀x, y; equivalently, f_{X,Y}(x, y) = f_X(x)f_Y(y), ∀x, y, provided that the densities exist.

Remark 1: X and Y are independent and identically distributed (i.i.d.) iff X =_D Y and X ⊥⊥ Y.
Remark 2: If A ∩ B = ∅, A and B are mutually exclusive, which does not mean independence.

Exercise: Can two events A, B be both independent and mutually exclusive?
Solution. Suppose two such events A, B exist. Since they are mutually exclusive, we must have A ∩ B = ∅, therefore P(A ∩ B) = 0, as for any probability measure P we must have P(∅) = 0. Also, since A, B are independent, we have 0 = P(A ∩ B) = P(A)P(B), therefore at least one of the events A and B has probability zero. This is the only such case.

Optional: To prove that for any probability measure P we must have P(∅) = 0: note that ∅ ∩ ∅ = ∅, so empty sets are disjoint from one another, and the union of empty sets is still an empty set. Apply property (b) of a probability measure in 1.1.6: let A_1, A_2, ... all be empty sets; they are disjoint, so P(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P(A_n) implies P(∅) = P(∅) + P(∅) + ..., which forces P(∅) = 0.

1.4 Expectation, Variance & Covariance

Let X, Y be two r.v.'s. Suppose S_X and f_X are the support and pdf (pmf in the discrete case) of X, and let g be a nice enough function. Here are some definitions:

1. Law of the Unconscious Statistician:
   • Discrete: E[g(X)] := Σ_{x∈S_X} g(x)f_X(x).
   • Continuous: E[g(X)] := ∫_{S_X} g(x)f_X(x) dx.
2. Expectation of X: EX.
3. Variance of X: Var(X) := E[(X - EX)^2] = E(X^2) - (EX)^2.
4. Standard deviation of X: SD(X) := √Var(X).
5. Covariance of X and Y: Cov(X, Y) := E[(X - EX)(Y - EY)] = E(XY) - (EX)(EY).
6. Correlation of X and Y: Corr(X, Y) := Cov(X, Y)/√(Var(X)Var(Y)).
7. X and Y are uncorrelated, written X ⊥ Y, iff Corr(X, Y) = 0.

Some useful properties:

1. E(aX + bY + c) = aEX + bEY + c.
2. E(XY) = (EX)(EY) if X ⊥⊥ Y.
3. E[g(X)] ≠ g(EX) in general, e.g., E(X^2) ≠ (EX)^2.
4. Cov(X, Y) = Cov(Y, X).
5. Var(X) = Cov(X, X).
6. Cov(aX + b, cY + d) = ac Cov(X, Y), ∀a, b, c, d ∈ R.
7. Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z).
8. Var(X ± Y) = Var(X) + Var(Y) ± 2Cov(X, Y).
9. Var(X ± Y) = Var(X) + Var(Y) if X, Y are uncorrelated.
10. Corr(X, Y) ∈ [-1, 1].
11. Corr(aX + b, cY + d) = Corr(X, Y) if a, c have the same sign.
12. Corr(aX + b, cY + d) = -Corr(X, Y) if a, c have opposite signs.

Remark: In general, for two r.v.'s X, Y, {X ⊥⊥ Y} ⇒ {X ⊥ Y}, but the converse is not true.
Example: Let X ~ U[-1, 1] and Y = X^2. Then Cov(X, Y) = E(XY) - E(X)E(Y) = E(X^3) - E(X)E(X^2) = 0 - 0 = 0, so Corr(X, Y) = 0 and X ⊥ Y. However, X and Y are not independent, as Y is a function of X.
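Optional: The example above can be illustrated by simulation. The following minimal Python sketch (standard library only; the sample size and seed are arbitrary choices) estimates Corr(X, Y) ≈ 0 for X ~ U[-1, 1] and Y = X^2, while the conditional mean of Y given |X| > 0.9 differs visibly from EY = 1/3, showing that knowing X changes the distribution of Y, i.e., X and Y are dependent.

# X ~ U[-1, 1], Y = X^2: uncorrelated but dependent (simulation).
import random

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100000)]
ys = [x * x for x in xs]

def mean(v):
    return sum(v) / len(v)

mx, my = mean(xs), mean(ys)
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
sx = mean([(x - mx) ** 2 for x in xs]) ** 0.5
sy = mean([(y - my) ** 2 for y in ys]) ** 0.5

print(cov / (sx * sy))  # sample correlation, close to 0
print(my)               # overall mean of Y, close to 1/3
print(mean([y for x, y in zip(xs, ys) if abs(x) > 0.9]))  # ~0.90, far from 1/3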
2 Asymptotic Statistics

2.1 Markov's Inequality

Markov's inequality gives an upper bound for the probability that a non-negative random variable is greater than or equal to some positive constant.

Theorem 2.1. Let X be a random variable with density f such that E|X| exists, and let a > 0. Then
   P(|X| ≥ a) ≤ E|X|/a.   (1)

2.2 Chebyshev's Inequality

Chebyshev's inequality is another important inequality; it can be used to prove the Weak Law of Large Numbers, and it can be proved in a similar way.

Theorem 2.2. Let X be a random variable with density f, with EX = µ and Var(X) = σ^2. Then ∀k > 0, we have
   P(|X - µ| ≥ k) ≤ Var(X)/k^2.   (2)

2.3 Weak Law of Large Numbers

The Weak Law of Large Numbers (WLLN) concerns the limiting behaviour (i.e., when n → ∞) of X̄_n := n^{-1} Σ_{i=1}^n X_i.

Theorem 2.3. If X_1, X_2, ... are iid (independent and identically distributed) r.v.'s with EX_i = µ and Var(X_i) = σ^2 < ∞ for all i, then for all ε > 0,
   lim_{n→∞} P(|X̄_n - µ| > ε) = 0.   (3)

Optional: A sequence of r.v.'s Y_1, Y_2, ... converges to a r.v. Y in probability if ∀ε > 0, P(|Y_n - Y| > ε) → 0 as n → ∞, denoted by Y_n →_p Y. The r.v. Y can also be replaced by a constant, like µ above, hence (3) can be rewritten as X̄_n →_p µ.

Proof. Let X̄_n := n^{-1} Σ_{i=1}^n X_i. Since EX_i = µ and Var(X_i) = σ^2 for all i, we have EX̄_n = µ and Var(X̄_n) = σ^2/n (why?). Applying Chebyshev's Inequality with X̄_n in place of X, we have, ∀ε > 0,
   P(|X̄_n - µ| > ε) ≤ P(|X̄_n - µ| ≥ ε) ≤ σ^2/(nε^2).
As n → ∞, σ^2/n → 0; since ε^2 > 0 is fixed, the result follows.

Optional: You may have noticed that Chebyshev's Inequality bounds P(|X̄_n - µ| ≥ ε), while the WLLN concerns P(|X̄_n - µ| > ε). However, for any 0 < ε' < ε we have {|X̄_n - µ| ≥ ε} ⊆ {|X̄_n - µ| > ε'}, so the statement "∀ε > 0, P(|X̄_n - µ| ≥ ε) → 0" is equivalent to "∀ε > 0, P(|X̄_n - µ| > ε) → 0".

Exercise 2.1. Let L_i be a r.v. representing the loss of a company in the i-th year. Assume that L_1, L_2, ... are iid and 0 < Var(L_1) < ∞. Suppose that you are the risk manager, and you are asked to predict the loss in the 10th year.
1. Give a (meaningful) bound on the probability that L_10 will deviate from its mean by at most k times its standard deviation, where k > 1.
2. You would like to model L_10 by an exponential distribution with mean µ > 0. Express the probability stated in (1) in terms of k. Compare it with the bound you found in (1).
3. Having observed several years' data L_1, ..., L_9, predict L_10 directly by a reasonable estimate. It is given that Σ_{i=1}^9 L_i = 8.

Solution.
1. We want to find a bound for P(|L_10 - µ| < kσ), where we let µ = EL_10 and σ^2 = Var(L_10). By Chebyshev's Inequality, we have
   P(|L_10 - µ| ≥ kσ) ≤ σ^2/(kσ)^2 = 1/k^2.
Therefore we have P(|L_10 - µ| < kσ) ≥ 1 - k^{-2}.
2. Suppose L_10 ~ Exp(µ). Then the pdf of L_10 is f(x) = (1/µ)e^{-x/µ}, x > 0, and Var(L_10) = µ^2, so σ = µ. As in (1), suppose k > 1, and note that the value of an exponential r.v. must be non-negative. Then:
   P(|L_10 - µ| < kσ) = ∫_0^{µ+kµ} (1/µ)e^{-x/µ} dx = ∫_0^{1+k} e^{-y} dy = 1 - e^{-(k+1)}.
You can plug in different values of k to compare the results in (1) and (2); see the sketch after this solution.
3. Since {L_i}_{i=1}^{10} are iid, we predict L_10 by µ̂, the estimated mean of the L_i's, where µ̂ = (1/9)Σ_{i=1}^9 L_i = 8/9.
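Optional: A minimal Python sketch of the comparison suggested in part (2); the grid of k values is an arbitrary choice. It plugs several k > 1 into the Chebyshev lower bound 1 - k^{-2} from (1) and the exact exponential probability 1 - e^{-(k+1)} from (2); the exact probability always dominates the distribution-free bound.

# Chebyshev lower bound vs. exact exponential probability, for several k > 1.
from math import exp

for k in (1.5, 2.0, 3.0, 5.0):
    bound = 1 - 1 / k**2        # part (1): holds for ANY distribution
    exact = 1 - exp(-(k + 1))   # part (2): exact for the exponential model
    print(f"k = {k}: Chebyshev bound {bound:.4f} <= exact probability {exact:.4f}")

The gap illustrates that Chebyshev's Inequality is distribution-free and therefore conservative: knowing the exact (exponential) model gives a much sharper answer.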
Exercise 2.2. (Optional) Suppose X_1, ..., X_n are iid r.v.'s with mean 2µ and variance σ^2 < ∞, and Y_1, ..., Y_{3n} are iid r.v.'s with mean µ and variance ν^2 < ∞, n ∈ N. Assume all X_i's and Y_j's are also independent. Let µ̂_n := (1/5n)(Σ_{i=1}^n X_i + Σ_{j=1}^{3n} Y_j). Prove that ∀ε > 0, as n → ∞, we have P(|µ̂_n - µ| > ε) → 0.

Proof. Define Z_i := (X_i + Y_i + Y_{n+i} + Y_{2n+i})/5, i = 1, ..., n. Note that the Z_i's are iid r.v.'s since the X_i's and Y_j's are iid. Also note that EZ_i = (1/5)(EX_i + 3EY_i) = µ and Var(Z_i) = (1/25)(σ^2 + 3ν^2) < ∞. Let Z̄_n = n^{-1} Σ_{i=1}^n Z_i; then µ̂_n = Z̄_n by definition. Applying the WLLN to Z̄_n yields P(|Z̄_n - µ| > ε) → 0, ∀ε > 0, as n → ∞. The result follows.

2.4 Central Limit Theorem

The WLLN tells the limiting location of X̄_n, while the CLT tells its limiting variability.

Theorem 2.4. If X_1, X_2, ... are iid r.v.'s with EX_i = µ and Var(X_i) = σ^2 ∈ (0, ∞), ∀i, then as n → ∞,
   (X̄_n - µ)/(σ/√n) →_D N(0, 1),   where X̄_n = n^{-1} Σ_{i=1}^n X_i.

3 Basic Mathematics

3.1 Limit of Series

Consider a sequence {x_i}_{i∈N} of real numbers.

1. We write lim_{n→∞} Σ_{i=1}^n x_i = C, or just Σ_{i=1}^∞ x_i = C, if the value of Σ_{i=1}^n x_i tends to some constant C as n goes to infinity (formally: if ∃C ∈ R s.t. ∀ε > 0, ∃N ∈ N_+ s.t. |Σ_{i=1}^n x_i - C| < ε, ∀n > N).
2. Arithmetic Progression (AP): If x_{i+1} - x_i = d, ∀i ∈ N, then
   • Σ_{i=1}^n x_i = (x_1 + x_n)n/2 = [2x_1 + (n-1)d]n/2.
   • Σ_{i=1}^∞ x_i does not exist unless x_i = 0, ∀i.
3. Geometric Progression (GP): If x_{i+1}/x_i = r, ∀i ∈ N, then
   • Σ_{i=1}^n x_i = x_1(1 - r^n)/(1 - r) for r ≠ 1.
   • Σ_{i=1}^∞ x_i = x_1/(1 - r) only if |r| < 1; otherwise it does not exist.
4. Euler's number: For any real x,
   Σ_{n=0}^∞ x^n/n! = e^x,   lim_{n→∞} (1 + x/n)^n = e^x.

3.2 Differentiation

Let f, g be real-valued functions of x, and write f'(x) = (d/dx)f(x), and similarly for g.

1. Sum: (d/dx)[f(x) + g(x)] = f'(x) + g'(x).
2. Product: (d/dx)[f(x)g(x)] = f(x)g'(x) + f'(x)g(x).
3. Quotient: (d/dx)[f(x)/g(x)] = [f'(x)g(x) - f(x)g'(x)]/g^2(x).
4. Chain Rule: (d/dx)f(g(x)) = [df(g(x))/dg(x)] · [dg(x)/dx].

Exercise: Differentiate f(x) = cos(x^2 e^x).
Solution.
   (d/dx)f(x) = [d cos(x^2 e^x)/d(x^2 e^x)] · [d(x^2 e^x)/dx] = -sin(x^2 e^x)(2xe^x + x^2 e^x).

3.3 Integration

Integration by parts:
   ∫_a^b f(x) dg(x) = [f(x)g(x)]_a^b - ∫_a^b g(x) df(x).
The idea comes from the product rule of differentiation: (d/dx)[f(x)g(x)] = f(x)g'(x) + f'(x)g(x).

Example: Compute ∫ ln x dx.
   ∫ ln x dx = x ln x - ∫ x d ln x = x ln x - ∫ x · (1/x) dx = x ln x - x + C.

Exercise: Compute ∫ e^x cos x dx.
Solution.
   ∫ e^x cos x dx = ∫ e^x d sin x = e^x sin x - ∫ sin x de^x = e^x sin x - ∫ e^x sin x dx;
also
   -∫ e^x sin x dx = ∫ e^x (-sin x) dx = ∫ e^x d cos x = e^x cos x - ∫ cos x de^x = e^x cos x - ∫ e^x cos x dx;
therefore
   ∫ e^x cos x dx = e^x sin x + e^x cos x - ∫ e^x cos x dx  ⟹  ∫ e^x cos x dx = (e^x sin x + e^x cos x)/2 + C.

3.4 Convex Functions

Let f : C → R, C ⊂ R, be a function. f is said to be convex if ∀x, y ∈ C, ∀λ ∈ [0, 1], we have f((1-λ)x + λy) ≤ (1-λ)f(x) + λf(y).

• If f is differentiable, i.e., f'(x) exists, then f is convex iff f'(x) is monotonically non-decreasing, iff f(x) ≥ f(y) + f'(y)(x - y), ∀x, y ∈ C.
• If f is twice differentiable, i.e., f''(x) exists, then f is convex iff f''(x) ≥ 0, ∀x ∈ C.
• Strictly convex functions can be defined likewise, by replacing all the '≤' and '≥' above with '<' and '>' respectively (for x ≠ y and λ ∈ (0, 1)).
• f is concave iff -f is convex.
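Optional: The defining inequality can be checked numerically on sampled points. Below is a minimal Python sketch, taking f = exp as the test function (convex, since f'' = exp ≥ 0); the sampling range and number of trials are arbitrary choices, and the small tolerance guards against floating-point error.

# Check f((1-λ)x + λy) <= (1-λ)f(x) + λf(y) at random points for f = exp.
import random
from math import exp

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    lam = random.random()  # λ in [0, 1)
    lhs = exp((1 - lam) * x + lam * y)
    rhs = (1 - lam) * exp(x) + lam * exp(y)
    assert lhs <= rhs + 1e-9, (x, y, lam)

print("convexity inequality holds at all sampled points")

Replacing exp with a non-convex function such as sin makes the assertion fail, which is a quick way to see the definition at work.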