Department of Statistics, The Chinese University of Hong Kong
RMSC 2001 Introduction to Risk Management (Term 1, 2020–21)
Tutorial 1 · Sihan Chen · 17th September 2020
1 Basic Probability
1.1 Probability
1. Basic set theory and notations: empty set ∅, subset ⊂, union ∪, intersection ∩, complement $A^c$.
2. The sample space Ω is the set containing all possible outcomes.
3. An event E is a subset of Ω.
4. The event space F is a collection of events of Ω.
5. For a sample space containing n distinct elements, there are $2^n$ distinct events (subsets of Ω), and hence $2^n$ elements in F when F is taken to be the collection of all events.
6. A probability measure is a function P : F → [0, 1] satisfying:
(a) P (Ω) = 1;
(b) $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$ for any disjoint events $\{A_n\}_{n=1}^{\infty} \subset F$.
Remark: A probability space refers to the triple (Ω, F, P ).
Example: Consider flipping a coin once. The sample space and event space are:
Ω = {H, T },
F = {∅, {H}, {T }, {H, T }}.
We can then assign probabilities to the events in F by defining a function P such that
P({H}) = 0.4, P({T}) = 0.6, P(∅) = 0, P({H, T}) = 1.
Then we can verify that P is a probability measure.
Exercise: How about flipping a coin twice?
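One way to explore the exercise is to enumerate the sample space and the event space by computer. The sketch below is only an illustration: it assumes the two flips are independent with P(H) = 0.4 as in the one-flip example (the notes do not state this), and the names `omega`, `power_set` and `prob` are made up for the example.

```python
from itertools import chain, combinations, product

# Sample space for two flips: ordered pairs of H/T.
omega = ["".join(p) for p in product("HT", repeat=2)]

def power_set(outcomes):
    """All subsets of `outcomes`, i.e. the largest possible event space."""
    subsets = chain.from_iterable(
        combinations(outcomes, k) for k in range(len(outcomes) + 1)
    )
    return [frozenset(s) for s in subsets]

events = power_set(omega)

# Assumption (not in the notes): independent flips with P(H) = 0.4.
p_single = {"H": 0.4, "T": 0.6}
p_outcome = {w: p_single[w[0]] * p_single[w[1]] for w in omega}

def prob(event):
    """P(E) is the sum of the probabilities of the outcomes in E."""
    return sum(p_outcome[w] for w in event)

print(omega)                   # the 4 outcomes
print(len(events))             # number of events, 2**4
print(prob(frozenset(omega)))  # P(Omega), should be 1 up to rounding
```

With 4 outcomes the event space has $2^4 = 16$ elements, in line with item 5 above.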
1.2 Random Variables
A random variable (r.v.) is NOT a variable that is random. It is defined as:
1. A random variable is a function X : Ω → R.
2. Given x ∈ R, write {X = x} := {ω ∈ Ω : X(ω) = x} as an event in the event space F of Ω.
3. Suppose a r.v. X is defined on (Ω, F, P). A set S ⊂ R is called a support of X if P(X ∈ S) = 1.
Remark: A r.v. is a representation of outcomes ω ∈ Ω by real numbers for convenience of calculation, as ω’s
themselves may not be numerical.
Example: Let Ω = {ω1 , ω2 , ω3 }. Define a r.v. X : Ω → R s.t.
X(ω1) = X(ω2) = 1, X(ω3) = 0.
Therefore we have {X = 1} = {ω1 , ω2 }.
Most r.v.'s are either discrete or continuous; here we discuss some of their properties.
1.2.1 Discrete r.v.
1. The support S is a countable set.
2. f(x) = P(X = x) is the probability mass function (pmf) of X, satisfying f(x) ∈ [0, 1] and $\sum_{x \in S} f(x) = 1$.
3. F (x) = P (X ≤ x) is the cumulative distribution function (cdf) of X.
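| Discrete Dist. | Notation | pmf | Mean | Variance |
|---|---|---|---|---|
| Uniform | U{1, ..., m} | $f(x) = m^{-1}\,\mathbf{1}\{x = 1, \dots, m\}$ | $(m + 1)/2$ | $(m^2 - 1)/12$ |
| Bernoulli | B(1, p) | $f(x) = p^x (1 - p)^{1 - x}\,\mathbf{1}\{x = 0, 1\}$ | $p$ | $p(1 - p)$ |
| Binomial | B(n, p) | $f(x) = \binom{n}{x} p^x (1 - p)^{n - x}\,\mathbf{1}\{x = 0, \dots, n\}$ | $np$ | $np(1 - p)$ |
| Poisson | Po(λ) | $f(x) = \lambda^x e^{-\lambda} / x!\;\mathbf{1}\{x = 0, 1, \dots\}$ | $\lambda$ | $\lambda$ |
| Geometric | Geom(p) | $f(x) = p(1 - p)^{x - 1}\,\mathbf{1}\{x = 1, 2, \dots\}$ | $1/p$ | $(1 - p)/p^2$ |
| Hyper-Geometric | HG(r, n, m) | $f(x) = \binom{n}{x}\binom{m}{r - x} / \binom{N}{r}\;\mathbf{1}\{x = 0, \dots, r \wedge n;\; r - x \le m\}$ | $rn/N$ | $rnm(N - r)/[N^2 (N - 1)]$ |
| Negative Binomial | NB(p, r) | $f(x) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}\,\mathbf{1}\{x = r, r + 1, \dots\}$ | $r/p$ | $r(1 - p)/p^2$ |

Table 1: Some Common Discrete Distributions (in the Hyper-Geometric row, N = n + m)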
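1.2.2 Continuous r.v.
1. F(x) = P(X ≤ x), the cdf of X, is continuous.
2. Probability density function (pdf) of X, f(x), satisfies f(x) ≥ 0 and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
3. $P(a \le X \le b) = \int_a^b f(x)\,dx$ and P(X = x) = 0, ∀x ∈ R.

| Continuous Dist. | Notation | pdf | Mean | Variance |
|---|---|---|---|---|
| Uniform | U[a, b] | $f(x) = (b - a)^{-1}\,\mathbf{1}\{a < x < b\}$ | $(a + b)/2$ | $(b - a)^2/12$ |
| Normal | N(µ, σ²) | $f(x) = (2\pi\sigma^2)^{-1/2} \exp\{-(x - \mu)^2/(2\sigma^2)\}$ | $\mu$ | $\sigma^2$ |
| Exponential | Exp(θ) | $f(x) = \theta^{-1} \exp\{-x/\theta\}\,\mathbf{1}\{x > 0\}$ | $\theta$ | $\theta^2$ |
| Gamma | Gamma(α, γ) | $f(x) = x^{\alpha - 1} e^{-x/\gamma} / [\Gamma(\alpha)\gamma^{\alpha}]\;\mathbf{1}\{x > 0\}$ | $\alpha\gamma$ | $\alpha\gamma^2$ |
| Beta | Beta(α, β) | $f(x) = x^{\alpha - 1}(1 - x)^{\beta - 1} / B(\alpha, \beta)\;\mathbf{1}\{0 < x < 1\}$ | $\alpha/(\alpha + \beta)$ | $\alpha\beta/[(\alpha + \beta + 1)(\alpha + \beta)^2]$ |

Table 2: Some Common Continuous Distributions

1.2.3 Joint distribution
1. Joint pmf (discrete):
• $f_{X,Y}(x, y) = P\{(X, Y) = (x, y)\} \in [0, 1]$, where $(x, y) \in S_X \times S_Y$.
• $\sum_{x \in S_X} \sum_{y \in S_Y} f_{X,Y}(x, y) = 1$.
2. Joint pdf (continuous):
• $f_{X,Y}(x, y) \ge 0$.
• $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$.
• $P\{(X, Y) \in [x_1, x_2] \times [y_1, y_2]\} = \int_{y_1}^{y_2} \int_{x_1}^{x_2} f_{X,Y}(x, y)\,dx\,dy$, ∀ $x_1 < x_2$, $y_1 < y_2$.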
Remark: If $F_X(t) = F_Y(t)$ ∀t, then X and Y are identically distributed, written $X \overset{D}{=} Y$.
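Exercise: Find the value of λ s.t. $f(x) = \lambda e^{-|x|}$, x ∈ R, is a pdf.
Solution:
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} \lambda e^{-|x|}\,dx = \int_{-\infty}^{0} \lambda e^{x}\,dx + \int_{0}^{\infty} \lambda e^{-x}\,dx = 2\int_{0}^{\infty} \lambda e^{-x}\,dx = 2\left[-\lambda e^{-x}\right]_{0}^{\infty} = 2 \times [-(0 - \lambda \times 1)] = 2\lambda.$$
For f to be a pdf, it must satisfy $\int_{-\infty}^{\infty} f(x)\,dx = 1$, hence λ = 1/2.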
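The value λ = 1/2 can be sanity-checked numerically. The following is a minimal sketch using a midpoint rule from the standard library only; the helper `integrate` and the truncation of the real line to [−40, 40] are assumptions made for the illustration (the neglected tails are of order $e^{-40}$).

```python
import math

def integrate(f, a, b, n=200_000):
    """Composite midpoint rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

lam = 0.5

def pdf(x):
    """Candidate density f(x) = lam * exp(-|x|)."""
    return lam * math.exp(-abs(x))

# Truncated range is an assumption; the exact integral is over the whole real line.
total = integrate(pdf, -40.0, 40.0)
print(total)  # should be very close to 1
```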
1.3 Independence
1. Two events A and B are independent, denoted by A ⊥⊥ B, iff P (A ∩ B) = P (A)P (B).
2. Two r.v.’s X and Y are independent, denoted by X ⊥⊥ Y , iff FX,Y (x, y) = FX (x)FY (y), ∀x, y; equivalently, fX,Y (x, y) = fX (x)fY (y), ∀x, y, provided that the densities exist.
Remark 1: X and Y are independent and identically distributed (i.i.d.) iff $X \overset{D}{=} Y$ and X ⊥⊥ Y.
Remark 2: If A ∩ B = ∅, then A and B are mutually exclusive; this does not imply independence.
Exercise: Can two events A, B be both independent and mutually exclusive?
Solution. Suppose two such events A, B exist. Since they are mutually exclusive, we must have A ∩ B = ∅, therefore P(A ∩ B) = 0, as P(∅) = 0 for any probability measure P. Also, since A, B are independent, we have 0 = P(A ∩ B) = P(A)P(B), therefore at least one of the events A and B has probability zero. Conversely, any mutually exclusive A, B with P(A) = 0 or P(B) = 0 satisfy P(A ∩ B) = 0 = P(A)P(B), so this is the only such case.
Optional: To prove that P(∅) = 0 for any probability measure P: note that ∅ ∩ ∅ = ∅, so any collection of empty sets is disjoint, and the union of empty sets is still the empty set. Apply property (b) of a probability measure (item 6 in Section 1.1) with $A_1, A_2, \dots$ all equal to ∅: they are disjoint, so $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$ implies P(∅) = P(∅) + P(∅) + ..., which means P(∅) = 0.
1.4 Expectation, Variance & Covariance
Let X, Y be two r.v.'s. Suppose $S_X$ and $f_X$ are the support and the pmf (discrete case) or pdf (continuous case) of X, and let g be a nice enough function. Here are some definitions:
1. Law of the Unconscious Statistician:
• Discrete: $E[g(X)] := \sum_{x \in S_X} g(x) f_X(x)$.
• Continuous: $E[g(X)] := \int_{S_X} g(x) f_X(x)\,dx$.
2. Expectation of X: EX.
3. Variance of X: $Var(X) := E[(X - EX)^2] = E(X^2) - (EX)^2$.
4. Standard deviation of X: $SD(X) := \sqrt{Var(X)}$.
5. Covariance of X and Y: $Cov(X, Y) := E[(X - EX)(Y - EY)] = E(XY) - (EX)(EY)$.
6. Correlation of X and Y: $Corr(X, Y) := \dfrac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}}$.
7. X and Y are uncorrelated, written X ⊥ Y, iff Corr(X, Y) = 0.
Some useful properties:
1. E(aX + bY + c) = aEX + bEY + c.
2. E(XY) = EX·EY if X ⊥⊥ Y.
3. E(g(X)) ≠ g(EX) in general, e.g., $E(X^2) \ne (EX)^2$.
4. Cov(X, Y) = Cov(Y, X).
5. Var(X) = Cov(X, X).
6. Cov(aX + b, cY + d) = ac·Cov(X, Y), ∀a, b, c, d ∈ R.
7. Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z).
8. Var(X ± Y) = Var(X) + Var(Y) ± 2Cov(X, Y).
9. Var(X ± Y) = Var(X) + Var(Y) if X, Y are uncorrelated.
10. Corr(X, Y) ∈ [−1, 1].
11. Corr(aX + b, cY + d) = Corr(X, Y) if a, c have the same sign.
12. Corr(aX + b, cY + d) = −Corr(X, Y) if a, c have opposite signs.
Remark: In general, for two r.v.'s X, Y, {X ⊥⊥ Y} ⇒ {X ⊥ Y}, but the converse is not true.
Example: Let X ∼ U[−1, 1], $Y = X^2$. Then $E(XY) = E(X^3) = 0$ and E(X) = 0, so Cov(X, Y) = E(XY) − E(X)E(Y) = 0 − 0 = 0 ⇒ Corr(X, Y) = 0 ⇒ X ⊥ Y. However, X, Y are not independent, as Y is a function of X.
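The example can also be illustrated by simulation. The sketch below is a Monte Carlo check, not a proof: the sample size, the seed and the threshold 0.9 are arbitrary choices made for the illustration.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 200_000
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [x * x for x in xs]

def mean(values):
    return sum(values) / len(values)

# Sample covariance of X and Y = X^2; the true value is 0.
mx, my = mean(xs), mean(ys)
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

# Dependence check: compare P(Y > 0.81) with P(Y > 0.81 | |X| > 0.9).
p_y = mean([y > 0.81 for y in ys])
p_y_given_x = mean([y > 0.81 for x, y in zip(xs, ys) if abs(x) > 0.9])

print(cov)          # close to 0: uncorrelated
print(p_y)          # roughly 0.1
print(p_y_given_x)  # essentially 1: knowing X changes the distribution of Y
```

If X and Y were independent, the last two numbers would agree up to sampling error.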
2 Asymptotic Statistics
2.1 Markov's Inequality
Markov’s inequality gives an upper bound for the probability that a non-negative random variable is greater than
or equal to some positive constant.
Theorem 2.1. Let X be a random variable with density f such that E|X| exists, and let a > 0. Then
$$P(|X| \ge a) \le \frac{E|X|}{a}. \tag{1}$$
2.2 Chebyshev's Inequality
Chebyshev's Inequality is another important inequality; it can be used to prove the Weak Law of Large Numbers, and it can be proved in a similar way to Markov's inequality.
Theorem 2.2. Let X be a random variable with density f, with EX = µ and $Var(X) = \sigma^2$. Then ∀k > 0, we have
$$P(|X - \mu| \ge k) \le \frac{Var(X)}{k^2}. \tag{2}$$
2.3 Weak Law of Large Numbers
The Weak Law of Large Numbers (WLLN) concerns the limiting behaviour (i.e., when n → ∞) of $\bar{X}_n := n^{-1} \sum_{i=1}^{n} X_i$.
Theorem 2.3. If $X_1, X_2, \dots$ are iid (independent and identically distributed) r.v.'s, with $EX_i = \mu$ and $Var(X_i) = \sigma^2 < \infty$ for all i, then for all ε > 0,
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0. \tag{3}$$
Optional: A sequence of r.v.'s $Y_1, Y_2, \dots$ converges to a r.v. Y in probability if ∀ε > 0, $P(|Y_n - Y| > \varepsilon) \to 0$ as n → ∞, denoted by $Y_n \xrightarrow{p} Y$. The r.v. Y can also be replaced by a constant, like µ above, hence (3) can be rewritten as $\bar{X}_n \xrightarrow{p} \mu$.
Proof. Let $\bar{X}_n := n^{-1} \sum_{i=1}^{n} X_i$. Since $EX_i = \mu$ and $Var(X_i) = \sigma^2$ for all i, we have $E\bar{X}_n = \mu$ and $Var(\bar{X}_n) = \sigma^2/n$ (Why?). Applying Chebyshev's Inequality with X replaced by $\bar{X}_n$, we have, ∀ε > 0,
$$P(|\bar{X}_n - \mu| > \varepsilon) \le \frac{\sigma^2/n}{\varepsilon^2}.$$
As n → ∞, $\sigma^2/n \to 0$; since $\varepsilon^2 > 0$, the result follows.
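Optional: You may have noticed that Chebyshev's Inequality is stated for $P(|\bar{X}_n - \mu| \ge \varepsilon)$, while the WLLN concerns $P(|\bar{X}_n - \mu| > \varepsilon)$. However, for any 0 < ε′ < ε we have $\{|\bar{X}_n - \mu| > \varepsilon\} \subset \{|\bar{X}_n - \mu| \ge \varepsilon\} \subset \{|\bar{X}_n - \mu| > \varepsilon'\}$. Therefore "∀ε > 0, $P(|\bar{X}_n - \mu| \ge \varepsilon) \to 0$" is equivalent to "∀ε > 0, $P(|\bar{X}_n - \mu| > \varepsilon) \to 0$".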
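The WLLN and the Chebyshev bound used in its proof can be illustrated by simulation. The sketch below assumes, purely for illustration, that the $X_i$ are Exp(1) (so µ = 1 and σ² = 1); the function names, ε = 0.1, the sample sizes and the number of replications are all arbitrary choices.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def sample_mean(n):
    """Mean of n iid Exp(1) draws (mean 1, variance 1); the distribution is an assumption."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

def deviation_prob(n, eps, reps=1000):
    """Monte Carlo estimate of P(|Xbar_n - mu| > eps) with mu = 1."""
    hits = sum(abs(sample_mean(n) - 1.0) > eps for _ in range(reps))
    return hits / reps

eps = 0.1
results = {n: deviation_prob(n, eps) for n in (10, 100, 1000)}

for n, p in results.items():
    bound = min(1.0, 1.0 / (n * eps**2))  # Chebyshev: sigma^2 / (n * eps^2), capped at 1
    print(n, p, bound)
```

The estimated probabilities shrink as n grows and stay below the Chebyshev bound, which is typically far from tight.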
Exercise 2.1. Let $L_i$ be a r.v. representing the loss of a company in the ith year. Assume that $L_1, L_2, \dots$ are iid, and $0 < Var(L_1) < \infty$. Suppose that you are the risk manager, and you are asked to predict the loss in the 10th year.
1. Give a (meaningful) bound on the probability that $L_{10}$ will deviate from its mean by at most k times its standard deviation, where k > 1.
2. You'd like to model $L_{10}$ by an exponential distribution with mean µ > 0. Express the probability stated in (1) in terms of k. Compare it with the bound you found in (1).
3. Having observed several years' data $L_1, \dots, L_9$, please predict $L_{10}$ directly by a reasonable estimate of $L_{10}$. It is given that $\sum_{i=1}^{9} L_i = 8$.
Solution.
1. We want to find a bound for $P(|L_{10} - \mu| < k\sigma)$, where we let $\mu = EL_{10}$ and $\sigma^2 = Var(L_{10})$. By Chebyshev's Inequality, we have
$$P(|L_{10} - \mu| \ge k\sigma) \le \frac{\sigma^2}{(k\sigma)^2} = \frac{1}{k^2}.$$
Therefore we have $P(|L_{10} - \mu| < k\sigma) \ge 1 - k^{-2}$.
2. Suppose $L_{10} \sim Exp(\mu)$; then the pdf of $L_{10}$ is $f(x) = \frac{1}{\mu} e^{-x/\mu}$ for x > 0, and $Var(L_{10}) = \mu^2$, so σ = µ. As in (1), suppose k > 1. Since an exponential r.v. is non-negative and µ − kµ < 0, we have
$$P(|L_{10} - \mu| < k\sigma) = \int_{0}^{\mu + k\mu} \frac{1}{\mu} e^{-x/\mu}\,dx = \int_{0}^{1 + k} e^{-y}\,dy = 1 - e^{-(k + 1)}.$$
You can plug in different values of k to compare the results in 1 and 2.
3. Since $\{L_i\}_{i=1}^{10}$ are iid, we predict $L_{10}$ by µ̂, the estimated mean of the $L_i$'s, where $\hat{\mu} = \frac{1}{9} \sum_{i=1}^{9} L_i = \frac{8}{9}$.
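To compare parts 1 and 2 concretely, the two expressions can be tabulated for a few values of k. This is a small sketch; the function names and the chosen values of k are illustrative only.

```python
import math

def chebyshev_lower_bound(k):
    """Distribution-free lower bound 1 - 1/k^2 from part 1."""
    return 1.0 - 1.0 / k**2

def exponential_exact(k):
    """Exact probability 1 - exp(-(k + 1)) from part 2 (valid for k >= 1)."""
    return 1.0 - math.exp(-(k + 1.0))

rows = [(k, chebyshev_lower_bound(k), exponential_exact(k)) for k in (1.5, 2, 3, 4)]
for k, bound, exact in rows:
    print(f"k={k}: Chebyshev bound {bound:.4f}, exponential model {exact:.4f}")
```

The exact probability under the exponential model is always at least the Chebyshev bound, as it must be, and noticeably larger for moderate k.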
Exercise 2.2. (Optional) Suppose $X_1, \dots, X_n$ are iid r.v.'s with mean 2µ and variance $\sigma^2 < \infty$, and $Y_1, \dots, Y_{3n}$ are iid r.v.'s with mean µ and variance $\nu^2 < \infty$, n ∈ N. Assume all $X_i$'s and $Y_j$'s are also independent, and let
$$\hat{\mu}_n := \frac{1}{5n}\left(\sum_{i=1}^{n} X_i + \sum_{j=1}^{3n} Y_j\right).$$
Prove that ∀ε > 0, as n → ∞, we have $P(|\hat{\mu}_n - \mu| > \varepsilon) \to 0$.
Proof. Define $Z_i := (X_i + Y_i + Y_{n+i} + Y_{2n+i})/5$, i = 1, ..., n. Note that the $Z_i$'s are iid r.v.'s since the $X_i$'s and $Y_j$'s are iid. Also note that $EZ_i = \frac{1}{5}(EX_i + 3EY_i) = \mu$ and $Var(Z_i) = \frac{1}{25}(\sigma^2 + 3\nu^2) < \infty$. Let $\bar{Z}_n = \frac{1}{n} \sum_{i=1}^{n} Z_i$; then $\hat{\mu}_n = \bar{Z}_n$ by definition. Applying the WLLN to $\bar{Z}_n$ yields $P(|\bar{Z}_n - \mu| > \varepsilon) \to 0$, ∀ε > 0, as n → ∞. The result follows.
2.4 Central Limit Theorem
The WLLN tells the limiting location of X̄, while the CLT tells the limiting variability of it.
Theorem 2.4. If $X_1, X_2, \dots$ are iid r.v.'s, with $EX_i = \mu$, $Var(X_i) = \sigma^2 \in (0, \infty)$, ∀i, then as n → ∞,
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} N(0, 1), \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
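A simulation can make the CLT statement tangible. The sketch below assumes, for illustration only, that the $X_i$ are U(0, 1) (so µ = 1/2 and σ² = 1/12) and compares the empirical cdf of the standardized sample mean with the standard normal cdf; n = 50, the number of replications and all function names are arbitrary choices.

```python
import math
import random

random.seed(2)  # fixed seed so the run is reproducible

MU = 0.5                    # mean of U(0, 1)
SIGMA = math.sqrt(1 / 12)   # standard deviation of U(0, 1)

def standardized_mean(n):
    """(Xbar - mu) / (sigma / sqrt(n)) for n iid U(0, 1) draws."""
    xbar = sum(random.random() for _ in range(n)) / n
    return (xbar - MU) / (SIGMA / math.sqrt(n))

def std_normal_cdf(z):
    """Standard normal cdf written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

reps, n = 20_000, 50
zs = [standardized_mean(n) for _ in range(reps)]

def empirical_cdf(z):
    """Fraction of simulated standardized means that are <= z."""
    return sum(v <= z for v in zs) / reps

for z in (-1.0, 0.0, 1.0, 1.96):
    print(z, round(empirical_cdf(z), 3), round(std_normal_cdf(z), 3))
```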
3 Basic Mathematics
3.1 Limit of Series
Consider a sequence {xi }i∈N of real numbers.
1. We write $\lim_{n \to \infty} \sum_{i=1}^{n} x_i = C$, or just $\sum_{i=1}^{\infty} x_i = C$, if the value of $\sum_{i=1}^{n} x_i$ tends to some constant C as n goes to infinity (formally: if ∃C ∈ R s.t. ∀ε > 0, ∃N ∈ N+ s.t. $\left|\sum_{i=1}^{n} x_i - C\right| < \varepsilon$, ∀n > N).
2. Arithmetic Progression (AP): If $x_{i+1} - x_i = d$, ∀i ∈ N, then
• $\sum_{i=1}^{n} x_i = \dfrac{(x_1 + x_n)n}{2} = \dfrac{[2x_1 + (n - 1)d]n}{2}$.
• $\sum_{i=1}^{\infty} x_i$ does not exist unless $x_i = 0$, ∀i.
3. Geometric Progression (GP): If $x_{i+1}/x_i = r$, ∀i ∈ N, with r ≠ 1, then
• $\sum_{i=1}^{n} x_i = \dfrac{x_1(1 - r^n)}{1 - r}$.
• $\sum_{i=1}^{\infty} x_i = \dfrac{x_1}{1 - r}$ only if |r| < 1, otherwise it does not exist.
4. Euler's number: For any real x,
$$\sum_{n=0}^{\infty} \frac{x^n}{n!} = e^x, \qquad \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x.$$
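The formulas in this subsection are easy to check numerically. The sketch below compares a GP partial sum with its closed form and evaluates both representations of $e^x$; the specific numbers ($x_1 = 3$, r = 0.5, x = 1.5) and the function names are arbitrary illustrations. Note that the power series is summed from n = 0.

```python
import math

def gp_partial_sum(x1, r, n):
    """Sum of the first n terms x1, x1*r, ..., x1*r^(n-1)."""
    return sum(x1 * r ** (i - 1) for i in range(1, n + 1))

def gp_closed_form(x1, r, n):
    """Closed form x1 (1 - r^n) / (1 - r), valid for r != 1."""
    return x1 * (1 - r**n) / (1 - r)

def exp_series(x, terms=30):
    """Partial sum of x^n / n! for n = 0, ..., terms - 1."""
    return sum(x**n / math.factorial(n) for n in range(terms))

def exp_limit(x, n):
    """The finite-n expression (1 + x/n)^n."""
    return (1 + x / n) ** n

x1, r = 3.0, 0.5
print(gp_partial_sum(x1, r, 20), gp_closed_form(x1, r, 20), x1 / (1 - r))

x = 1.5
print(exp_series(x), exp_limit(x, 10**6), math.exp(x))
```

The series converges much faster than the limit expression, whose error decays only like 1/n.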
3.2 Differentiation
Let f, g be real-valued functions of x, write $f'(x) = \frac{d}{dx} f(x)$, and the same for g.
1. Sum: $\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)$.
2. Product: $\frac{d}{dx}[f(x)g(x)] = f(x)g'(x) + f'(x)g(x)$.
3. Fraction: $\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x)g(x) - f(x)g'(x)}{g^2(x)}$.
4. Chain Rule: $\frac{d}{dx} f(g(x)) = \frac{df(g(x))}{dg(x)} \cdot \frac{dg(x)}{dx}$.
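The rules above can be checked against a numerical derivative. The sketch below uses a central difference and the example pair f = sin, g = exp; these functions, the evaluation point x = 0.7 and the helper `central_diff` are arbitrary choices for illustration.

```python
import math

def central_diff(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Example functions (illustrative choice) and their known derivatives.
f, df = math.sin, math.cos
g, dg = math.exp, math.exp

x = 0.7

# Right-hand sides of the product, fraction and chain rules at x.
product_rule = f(x) * dg(x) + df(x) * g(x)
quotient_rule = (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2
chain_rule = df(g(x)) * dg(x)

print(central_diff(lambda t: f(t) * g(t), x), product_rule)
print(central_diff(lambda t: f(t) / g(t), x), quotient_rule)
print(central_diff(lambda t: f(g(t)), x), chain_rule)
```

Each printed pair should agree to several decimal places.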
Exercise: Differentiate $f(x) = \cos(x^2 e^x)$.
Solution.
$$\frac{d}{dx} f(x) = \frac{d[\cos(x^2 e^x)]}{d(x^2 e^x)} \cdot \frac{d(x^2 e^x)}{dx} = -\sin(x^2 e^x)(2x e^x + x^2 e^x).$$
3.3 Integration
Integration by parts:
$$\int_a^b f(x)\,dg(x) = [f(x)g(x)]_a^b - \int_a^b g(x)\,df(x).$$
The idea comes from the product rule of differentiation: $\frac{d}{dx}[f(x)g(x)] = f(x)g'(x) + f'(x)g(x)$.
Example: Compute $\int \ln x\,dx$.
$$\int \ln x\,dx = x \ln x - \int x\,d\ln x = x \ln x - \int x \cdot \frac{1}{x}\,dx = x \ln x - x + C.$$
Exercise: Compute $\int e^x \cos x\,dx$.
Solution.
$$\int e^x \cos x\,dx = \int e^x\,d\sin x = e^x \sin x - \int \sin x\,de^x = e^x \sin x - \int e^x \sin x\,dx,$$
also
$$-\int e^x \sin x\,dx = \int e^x(-\sin x)\,dx = \int e^x\,d\cos x = e^x \cos x - \int e^x \cos x\,dx,$$
therefore
$$\int e^x \cos x\,dx = e^x \sin x + e^x \cos x - \int e^x \cos x\,dx \;\Rightarrow\; \int e^x \cos x\,dx = \frac{e^x \sin x + e^x \cos x}{2} + C.$$
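The antiderivative found above can be verified numerically by comparing a definite integral with the difference of the antiderivative at the endpoints. This is a minimal sketch; the interval [0, 2], the midpoint rule and the function names are choices made for the illustration.

```python
import math

def antiderivative(x):
    """Candidate antiderivative e^x (sin x + cos x) / 2 (constant omitted)."""
    return math.exp(x) * (math.sin(x) + math.cos(x)) / 2

def integrand(x):
    return math.exp(x) * math.cos(x)

def midpoint(func, a, b, n=100_000):
    """Composite midpoint rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

a, b = 0.0, 2.0
numeric = midpoint(integrand, a, b)
exact = antiderivative(b) - antiderivative(a)
print(numeric, exact)  # the two values should agree closely
```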
3.4 Convex Functions
Let C ⊂ R be an interval and f : C → R be a function. f is said to be convex if ∀x, y ∈ C, ∀λ ∈ [0, 1], we have $f((1 - \lambda)x + \lambda y) \le (1 - \lambda)f(x) + \lambda f(y)$.
• If f is differentiable, i.e., $f'(x)$ exists, then f is convex iff $f'(x)$ is monotonically non-decreasing, iff $f(x) \ge f(y) + f'(y)(x - y)$, ∀x, y ∈ C.
• If f is twice differentiable, i.e., $f''(x)$ exists, then f is convex iff $f''(x) \ge 0$, ∀x ∈ C.
• Strictly convex functions can be defined likewise, by replacing the '≤' and '≥' above with '<' and '>' respectively (for x ≠ y and λ ∈ (0, 1)). Note that $f''(x) > 0$ is sufficient but not necessary for strict convexity, e.g., $f(x) = x^4$.
• f is concave iff −f is convex.
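The defining inequality can be tested on random points. The sketch below is a heuristic check rather than a proof: passing it does not establish convexity, while a single violation disproves it. The function name `is_convex_on_samples`, the tolerance and the test functions are assumptions made for the illustration.

```python
import math
import random

def is_convex_on_samples(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Check f((1-l)x + l*y) <= (1-l)f(x) + l*f(y) at random x, y in [lo, hi], l in [0, 1]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, lam = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        lhs = f((1 - lam) * x + lam * y)
        rhs = (1 - lam) * f(x) + lam * f(y)
        if lhs > rhs + tol:  # tol absorbs floating-point rounding
            return False
    return True

print(is_convex_on_samples(lambda x: x * x, -5, 5))            # x^2 is convex
print(is_convex_on_samples(math.exp, -5, 5))                   # e^x is convex
print(is_convex_on_samples(math.sin, 0, math.pi))              # sin is concave on [0, pi]
print(is_convex_on_samples(lambda x: -math.sin(x), 0, math.pi))  # so -sin is convex there
```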