Department of Statistics, The Chinese University of Hong Kong
RMSC 2001 Introduction to Risk Management (Term 1, 2020–21)
Tutorial 1 · Sihan Chen · 17th September 2020
1 Basic Probability

1.1 Probability
1. Basic set theory and notations: empty set ∅, subset ⊂, union ∪, intersection ∩, complement A^c.
2. The sample space Ω is the set containing all possible outcomes.
3. An event E is a subset of Ω.
4. The event space F is a collection of events of Ω.
5. For a sample space containing n distinct elements, there are 2^n distinct events and hence there are 2^n elements in F.
6. A probability measure is a function P : F → [0, 1] satisfying:
(a) P (Ω) = 1;
(b) P(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P(A_n) for any disjoint events {A_n}_{n=1}^∞ ⊂ F.
Remark: A probability space refers to the triple (Ω, F, P ).
Example: Consider flipping a coin once. The sample space and event space are:
Ω = {H, T },
F = {∅, {H}, {T }, {H, T }}.
We can then assign probabilities to the events in F by defining a function P such that
P({H}) = 0.4,    P({T}) = 0.6.
Then we can verify that P is a probability measure.
Exercise: How about flipping a coin twice?
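As a quick sanity check on the example above (a minimal Python sketch; the probabilities 0.4 and 0.6 are the ones from the example), the snippet below enumerates the event space of a single flip and verifies the defining properties of a probability measure. The two-flip exercise can be checked the same way by replacing omega.

from itertools import combinations

# Sample space for one flip and the probability of each outcome (from the example).
omega = ["H", "T"]
p_outcome = {"H": 0.4, "T": 0.6}

def power_set(s):
    """All subsets of s, i.e. the event space F."""
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def P(event):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum(p_outcome[w] for w in event)

F = power_set(omega)
assert len(F) == 2 ** len(omega)                 # 2^n events
assert abs(P(frozenset(omega)) - 1) < 1e-12      # P(Omega) = 1
# Additivity over disjoint events, e.g. {H} and {T}.
assert abs(P(frozenset("H")) + P(frozenset("T")) - P(frozenset(omega))) < 1e-12
print("P is a valid probability measure on F")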
1.2 Random Variables
A random variable (r.v.) is NOT a variable that is random. It is defined as:
1. A random variable is a function X : Ω → R.
2. Given x ∈ R, write {X = x} := {ω ∈ Ω : X(ω) = x} as an event in the event space F of Ω.
3. Suppose a r.v. X is defined on (Ω, F, P). A set S ⊆ R is called a support of X if P(X ∈ S) = 1.
Remark: A r.v. is a representation of outcomes ω ∈ Ω by real numbers for convenience of calculation, as ω’s
themselves may not be numerical.
Example: Let Ω = {ω1, ω2, ω3}. Define a r.v. X : Ω → R s.t.
X(ω1) = X(ω2) = 1,    X(ω3) = 0.
Therefore we have {X = 1} = {ω1, ω2}.
Most r.v.'s are either discrete or continuous; here we discuss some of their properties.
1.2.1 Discrete r.v.
1. The support S is a countable set.
2. f(x) = P(X = x) is the probability mass function (pmf) of X, satisfying f(x) ∈ [0, 1] and Σ_{x∈S} f(x) = 1.
3. F (x) = P (X ≤ x) is the cumulative distribution function (cdf) of X.
Table 1: Some Common Discrete Distributions
• Uniform U{1, ..., m}: f(x) = m^{-1} 1{x = 1, ..., m}; mean (m + 1)/2; variance (m^2 − 1)/12.
• Bernoulli B(1, p): f(x) = p^x (1 − p)^{1−x} 1{x = 0, 1}; mean p; variance p(1 − p).
• Binomial B(n, p): f(x) = C(n, x) p^x (1 − p)^{n−x} 1{x = 0, ..., n}; mean np; variance np(1 − p).
• Poisson Po(λ): f(x) = λ^x e^{−λ}/x! 1{x = 0, 1, ...}; mean λ; variance λ.
• Geometric Geom(p): f(x) = p(1 − p)^{x−1} 1{x = 1, 2, ...}; mean 1/p; variance (1 − p)/p^2.
• Hyper-Geometric HG(r, n, m): f(x) = C(n, x) C(m, r − x)/C(N, r) 1{x = 0, ..., r∧n; r−x ≤ m}, where N = n + m; mean rn/N; variance rnm(N − r)/[N^2 (N − 1)].
• Negative Binomial NB(p, r): f(x) = C(x − 1, r − 1) p^r (1 − p)^{x−r} 1{x = r, r + 1, ...}; mean r/p; variance r(1 − p)/p^2.
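The mean and variance columns of Table 1 can be checked empirically with a quick simulation (a small numpy sketch; the parameter values and sample size below are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Binomial B(n, p): mean np, variance np(1 - p)
n, p = 10, 0.3
x = rng.binomial(n, p, N)
print(x.mean(), n * p, x.var(), n * p * (1 - p))

# Poisson Po(lambda): mean = variance = lambda
lam = 4.0
x = rng.poisson(lam, N)
print(x.mean(), lam, x.var(), lam)

# Geometric Geom(p) on {1, 2, ...}: mean 1/p, variance (1 - p)/p^2
p = 0.2
x = rng.geometric(p, N)
print(x.mean(), 1 / p, x.var(), (1 - p) / p ** 2)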
1.2.2 Continuous r.v.
1. F(x) = P(X ≤ x), the cdf of X, is continuous.
2. The probability density function (pdf) of X, f(x), satisfies f(x) ≥ 0 and ∫_{−∞}^{∞} f(x) dx = 1.
3. P(a ≤ X ≤ b) = ∫_a^b f(x) dx and P(X = x) = 0, ∀x ∈ R.

Table 2: Some Common Continuous Distributions
• Uniform U[a, b]: f(x) = (b − a)^{−1} 1{a < x < b}; mean (a + b)/2; variance (b − a)^2/12.
• Normal N(µ, σ^2): f(x) = (2πσ^2)^{−1/2} exp{−(x − µ)^2/(2σ^2)}; mean µ; variance σ^2.
• Exponential Exp(θ): f(x) = θ^{−1} exp{−x/θ} 1{x > 0}; mean θ; variance θ^2.
• Gamma Gamma(α, γ): f(x) = x^{α−1} e^{−x/γ}/[Γ(α)γ^α] 1{x > 0}; mean αγ; variance αγ^2.
• Beta Beta(α, β): f(x) = x^{α−1}(1 − x)^{β−1}/B(α, β) 1{0 < x < 1}; mean α/(α + β); variance αβ/[(α + β + 1)(α + β)^2].

1.2.3 Joint distribution

1. Joint pmf (discrete):
• f_{X,Y}(x, y) = P{(X, Y) = (x, y)} ∈ [0, 1], where (x, y) ∈ S_X × S_Y.
• Σ_{x∈S_X} Σ_{y∈S_Y} f_{X,Y}(x, y) = 1.
2. Joint pdf (continuous):
• f_{X,Y}(x, y) ≥ 0.
• ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1.
• P{(X, Y) ∈ [x1, x2] × [y1, y2]} = ∫_{y1}^{y2} ∫_{x1}^{x2} f_{X,Y}(x, y) dx dy, ∀x1 < x2, y1 < y2.
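The last two bullets can be illustrated numerically (a sketch using scipy; the joint pdf of two independent standard normals is an arbitrary illustrative choice):

import numpy as np
from scipy.integrate import dblquad

# Joint pdf of two independent N(0, 1) r.v.'s: f_{X,Y}(x, y) = f_X(x) f_Y(y).
def f_xy(y, x):   # dblquad integrates the inner variable (y) first
    return np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

# Total mass should be ~ 1 (a wide box stands in for R^2).
total, _ = dblquad(f_xy, -10, 10, -10, 10)
print(total)

# P{(X, Y) in [0, 1] x [0, 2]}.
prob, _ = dblquad(f_xy, 0, 1, 0, 2)
print(prob)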
Remark: If F_X(t) = F_Y(t) ∀t, then X and Y are identically distributed, written X =^D Y.
Exercise: Find the value of λ s.t. f(x) = λe^{−|x|}, x ∈ R, is a pdf.
Solution: ∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{∞} λe^{−|x|} dx = ∫_{−∞}^{0} λe^{x} dx + ∫_{0}^{∞} λe^{−x} dx = 2∫_{0}^{∞} λe^{−x} dx = 2[−λe^{−x}]_0^∞ = 2 × [−(0 − λ × 1)] = 2λ. In order that f(x) is a pdf, it must satisfy ∫_{−∞}^{∞} f(x) dx = 1, hence λ = 1/2.
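The value λ = 1/2 can also be checked numerically (a small scipy sketch, splitting the integral at 0 as in the solution):

import numpy as np
from scipy.integrate import quad

lam = 0.5
left, _ = quad(lambda x: lam * np.exp(x), -np.inf, 0)    # integral over (-inf, 0]
right, _ = quad(lambda x: lam * np.exp(-x), 0, np.inf)   # integral over [0, inf)
print(left + right)   # ~ 1.0, consistent with lambda = 1/2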
1.3 Independence
1. Two events A and B are independent, denoted by A ⊥⊥ B, iff P (A ∩ B) = P (A)P (B).
2. Two r.v.’s X and Y are independent, denoted by X ⊥⊥ Y , iff FX,Y (x, y) = FX (x)FY (y), ∀x, y; equivalently, fX,Y (x, y) = fX (x)fY (y), ∀x, y, provided that the densities exist.
Remark 1: X and Y are independent and identically distributed (i.i.d.) iff X =^D Y and X ⊥⊥ Y.
Remark 2: If A ∩ B = ∅, A and B are mutually exclusive, which does not mean independence.
Exercise: Can two sets A, B be both independent and mutually exclusive?
Solution. Suppose two such sets A, B exist. Since they are mutually exclusive, we must have A ∩ B = ∅, so P(A ∩ B) = 0, because for any probability measure P we have P(∅) = 0. Since A and B are also independent, 0 = P(A ∩ B) = P(A)P(B), so at least one of A and B has probability zero. Hence two events can be both independent and mutually exclusive only when at least one of them has probability zero.
Optional: To prove that for any probability measure P we must have P(∅) = 0, note that ∅ ∩ ∅ = ∅, so copies of the empty set are disjoint, and a union of empty sets is still empty. Apply property (b) of a probability measure (Section 1.1, item 6): let A1, A2, ... all be the empty set; then they are disjoint, and P(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P(A_n) gives P(∅) = P(∅) + P(∅) + ⋯, which forces P(∅) = 0.
1.4 Expectation, Variance & Covariance
Let X, Y be two r.v.'s. Suppose S_X and f_X are the support and pmf/pdf of X, and let g be a nice enough function. Here are some definitions:
1. Law of the Unconscious Statistician:
• Discrete: E[g(X)] := Σ_{x∈S_X} g(x)f_X(x).
• Continuous: E[g(X)] := ∫_{S_X} g(x)f_X(x) dx.
2. Expectation of X: EX.
3. Variance of X: Var(X) := E[(X − EX)^2] = E(X^2) − (EX)^2.
4. Standard deviation of X: SD(X) := √Var(X).
5. Covariance of X and Y: Cov(X, Y) := E[(X − EX)(Y − EY)] = E(XY) − (EX)(EY).
6. Correlation of X and Y: Corr(X, Y) := Cov(X, Y)/√(Var(X)Var(Y)).
7. X and Y are uncorrelated, written X ⊥ Y, iff Corr(X, Y) = 0.
Some useful properties:
1. E(aX + bY + c) = aEX + bEY + c.
2. E(XY) = (EX)(EY) if X ⊥⊥ Y.
3. E(g(X)) ≠ g(EX) in general, e.g., E(X^2) ≠ (EX)^2.
4. Cov(X, Y) = Cov(Y, X).
5. Var(X) = Cov(X, X).
6. Cov(aX + b, cY + d) = acCov(X, Y ), ∀a, b, c, d ∈ R.
7. Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z).
8. V ar(X ± Y ) = V ar(X) + V ar(Y ) ± 2Cov(X, Y ).
9. V ar(X ± Y ) = V ar(X) + V ar(Y ) if X, Y are uncorrelated.
10. Corr(X, Y ) ∈ [−1, 1].
11. Corr(aX + b, cY + d) = Corr(X, Y ) if a, c have the same sign.
12. Corr(aX + b, cY + d) = −Corr(X, Y ) if a, c have the opposite sign.
Remark: In general, for two r.v.'s X, Y, {X ⊥⊥ Y} ⇒ {X ⊥ Y}, but the converse is not true.
Example: Let X ∼ U[−1, 1], Y = X^2. Then Cov(X, Y) = E(XY) − E(X)E(Y) = 0 − 0 = 0 ⇒ Corr(X, Y) = 0 ⇒ X ⊥ Y. However, X and Y are not independent, as Y is a function of X.
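A short simulation illustrating this example (a sketch; the sample size is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500_000)
y = x ** 2

# Sample covariance is ~ 0, so X and Y are (nearly) uncorrelated ...
print(np.cov(x, y)[0, 1])

# ... yet they are clearly dependent: |X| and Y are strongly associated.
print(np.corrcoef(np.abs(x), y)[0, 1])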
2 Asymptotic Statistics

2.1 Markov's Inequality
Markov’s inequality gives an upper bound for the probability that a non-negative random variable is greater than
or equal to some positive constant.
Theorem 2.1. Let X be a random variable with density f such that E|X| exists, and let a > 0. Then
P(|X| ≥ a) ≤ E|X|/a.    (1)

2.2 Chebyshev's Inequality
Chebyshev's Inequality is another important inequality; it can be used to prove the Weak Law of Large Numbers, and it can be proved in a similar way to Markov's Inequality.
Theorem 2.2. Let X be a random variable with density f, EX = µ and Var(X) = σ^2. Then ∀k > 0, we have
P(|X − µ| ≥ k) ≤ Var(X)/k^2.    (2)
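Both bounds can be checked empirically (a small sketch; the exponential distribution and the constants a, k are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)   # E|X| = 2, Var(X) = 4

# Markov: P(|X| >= a) <= E|X| / a
a = 5.0
print((x >= a).mean(), x.mean() / a)

# Chebyshev: P(|X - mu| >= k) <= Var(X) / k^2 (using the empirical mean and variance)
mu, k = x.mean(), 4.0
print((np.abs(x - mu) >= k).mean(), x.var() / k ** 2)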
2.3 Weak Law of Large Numbers
The Weak Law of Large Numbers (WLLN) concerns the limiting behaviour (i.e., as n → ∞) of X̄_n := n^{−1} Σ_{i=1}^n X_i.
Theorem 2.3. If X1, X2, . . . are iid (independent and identically distributed) r.v.'s with EX_i = µ and Var(X_i) = σ^2 < ∞ for all i, then for all ε > 0,
lim_{n→∞} P(|X̄_n − µ| > ε) = 0.    (3)
Optional: A sequence of r.v.'s Y1, Y2, . . . converges to a r.v. Y in probability if ∀ε > 0, P(|Y_n − Y| > ε) → 0 as n → ∞, denoted by Y_n →^p Y. The r.v. Y can also be replaced by a constant, like µ above, hence (3) can be rewritten as X̄_n →^p µ.
P
Proof. Let X¯n := n−1 ni=1 Xi , since EXi = µ and V ar(Xi ) = σ 2 for all i, we have E X¯n = µ and
2
V ar(X¯n ) = σn (Why?). Apply Chebyshev’s Inequality by replacing X by X¯n , we have ∀ > 0,
σ 2 /n
P (|X¯n − µ| > ) ≤ 2 .
As n → ∞, σ 2 /n → 0, since 2 > 0, the result follows.
Optional: You may have noticed that by Chebyshev’s Inequality, we should talk about P (|X¯n − µ| ≥ ), while
WLLN concerns on P (|X¯n − µ| > ). However, note that X > is equivalent to (X ≥ 0 , ∀0 < ), therefore
∀ > 0, P (|X¯n − µ| ≥ ) is equivalent to ∀ > 0, P (|X¯n − µ| > ).
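A small simulation illustrating the WLLN (the exponential population, the tolerance ε and the number of repetitions are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
mu, eps = 1.5, 0.05

# For each n, estimate P(|X_bar_n - mu| > eps) over many repetitions.
for n in [10, 100, 1000, 10000]:
    samples = rng.exponential(scale=mu, size=(1000, n))   # iid with mean mu
    xbar = samples.mean(axis=1)
    print(n, (np.abs(xbar - mu) > eps).mean())   # decreases towards 0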
Exercise 2.1. Let Li be a r.v. representing the loss of a company on the ith year. Assume that L1 , L2 , . . . are iid, and
0 < V ar(L1 ) < ∞. Suppose that you are the risk manager, and you are asked to predict the loss on the 10th year.
1. Give a (meaningful) bound on the probability that L10 will deviate from its mean by at most k times its standard deviation, where k > 1.
2. You'd like to model L10 by an exponential distribution with mean µ > 0. Express the probability stated in (1) in terms of k. Compare it with the bound you found in (1).
3. Having observed several years' data L1, . . . , L9, please predict L10 directly by a reasonable estimate. It is given that Σ_{i=1}^9 L_i = 8.
Solution.
1. We want to find a bound for P(|L10 − µ| < kσ), where we let µ = EL10 and σ^2 = Var(L10). By Chebyshev's Inequality, we have
P(|L10 − µ| ≥ kσ) ≤ σ^2/(kσ)^2 = 1/k^2.
Therefore we have P(|L10 − µ| < kσ) ≥ 1 − k^{−2}.
2. Suppose L10 ∼ Exp(µ); then the pdf of L10 is f(x) = (1/µ)e^{−x/µ}, x > 0, and Var(L10) = µ^2.
As in (1), suppose k > 1; note that an exponential r.v. must be non-negative. Then:
P(|L10 − µ| < kσ) = ∫_0^{µ+kµ} (1/µ)e^{−x/µ} dx = ∫_0^{1+k} e^{−y} dy = 1 − e^{−(k+1)}.
You can plug in different values of k to compare the results in 1 and 2.
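For instance (a small sketch tabulating a few values of k):

import numpy as np

# Chebyshev lower bound from part 1 vs. the exact exponential probability from part 2.
for k in [1.5, 2.0, 3.0, 5.0]:
    chebyshev = 1 - 1 / k ** 2
    exact = 1 - np.exp(-(k + 1))
    print(k, round(chebyshev, 4), round(exact, 4))
# The exact probability is always at least as large as the Chebyshev bound.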
3. Since {L_i}_{i=1}^{10} are iid, we predict L10 by µ̂, the estimated mean of the L_i's, where µ̂ = (1/9) Σ_{i=1}^9 L_i = 8/9.
Exercise 2.2. (Optional) Suppose X1, . . . , Xn are iid r.v.'s with mean 2µ and variance σ^2 < ∞, and Y1, . . . , Y3n are iid r.v.'s with mean µ and variance ν^2 < ∞, n ∈ N. Assume all X_i's and Y_j's are also independent. Let µ̂_n := (1/(5n)) (Σ_{i=1}^n X_i + Σ_{j=1}^{3n} Y_j). Prove that ∀ε > 0, as n → ∞, we have P(|µ̂_n − µ| > ε) → 0.
Proof. Define Z_i := (X_i + Y_i + Y_{n+i} + Y_{2n+i})/5, i = 1, . . . , n. Note that the Z_i's are iid r.v.'s since the X_i's and Y_i's are iid. Also note that EZ_i = (1/5)(EX_i + 3EY_i) = µ and Var(Z_i) = (1/25)(σ^2 + 3ν^2) < ∞. Let Z̄_n = (1/n) Σ_{i=1}^n Z_i; then µ̂_n = Z̄_n by definition. Applying the WLLN to Z̄_n yields P(|Z̄_n − µ| > ε) → 0, ∀ε > 0, as n → ∞. The result follows.
2.4 Central Limit Theorem
The WLLN tells the limiting location of X̄, while the CLT tells the limiting variability of it.
Theorem 2.4. If X1, X2, . . . are iid r.v.'s with EX_i = µ and Var(X_i) = σ^2 ∈ (0, ∞) for all i, then as n → ∞,
(X̄ − µ)/(σ/√n) →^D N(0, 1),    where X̄ = n^{−1} Σ_{i=1}^n X_i.
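A quick simulation illustrating the CLT (a sketch; the exponential population, sample size and number of repetitions are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 50_000
mu, sigma = 1.0, 1.0                      # Exp(1) has mean 1 and variance 1

# Standardised sample means from a skewed (exponential) population.
xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
z = (xbar - mu) / (sigma / np.sqrt(n))

# Compare with N(0, 1): P(Z <= 1.96) should be close to 0.975.
print((z <= 1.96).mean())
print(z.mean(), z.std())                  # ~ 0 and ~ 1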
3 Basic Mathematics

3.1 Limit of Series
Consider a sequence {xi }i∈N of real numbers.
1. We write lim_{n→∞} Σ_{i=1}^n x_i = C, or just Σ_{i=1}^∞ x_i = C, if the value of Σ_{i=1}^n x_i tends to some constant C as n goes to infinity (formally: if ∃C ∈ R s.t. ∀ε > 0, ∃N ∈ N+ s.t. |Σ_{i=1}^n x_i − C| < ε, ∀n > N).
2. Arithmetic Progression (AP): If x_{i+1} − x_i = d, ∀i ∈ N, then
• Σ_{i=1}^n x_i = (x_1 + x_n)n/2 = [2x_1 + (n − 1)d]n/2.
• Σ_{i=1}^∞ x_i does not exist unless x_i = 0, ∀i.
3. Geometric Progression (GP): If x_{i+1}/x_i = r, ∀i ∈ N, then
• Σ_{i=1}^n x_i = x_1(1 − r^n)/(1 − r) for r ≠ 1 (and nx_1 if r = 1).
• Σ_{i=1}^∞ x_i = x_1/(1 − r) only if |r| < 1; otherwise it does not exist.
4. Euler's number: For any real x,
Σ_{n=0}^∞ x^n/n! = e^x,    lim_{n→∞} (1 + x/n)^n = e^x.
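Both the GP sum and the limit definition of e^x are easy to check numerically (a small sketch with arbitrary values of x_1, r and x):

import math

# Geometric progression: partial sum vs. x1 / (1 - r) for |r| < 1.
x1, r = 3.0, 0.5
partial = sum(x1 * r ** i for i in range(100))   # sum_{i=1}^{100} x1 r^{i-1}
print(partial, x1 / (1 - r))                     # both ~ 6.0

# Limit definition of e^x.
x, n = 2.0, 10 ** 6
print((1 + x / n) ** n, math.exp(x))             # both ~ e^2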
3.2 Differentiation
Let f, g be real-valued functions of x, and write f′(x) = (d/dx)f(x), and the same for g.
1. Sum: (d/dx)[f(x) + g(x)] = f′(x) + g′(x).
2. Product: (d/dx)[f(x)g(x)] = f(x)g′(x) + f′(x)g(x).
3. Fraction: (d/dx)[f(x)/g(x)] = [f′(x)g(x) − f(x)g′(x)]/g^2(x).
4. Chain Rule: (d/dx)f(g(x)) = [df(g(x))/dg(x)] · [dg(x)/dx].
Exercise: Differentiate f(x) = cos(x^2 e^x).
Solution.
(d/dx)f(x) = [d cos(x^2 e^x)/d(x^2 e^x)] · [d(x^2 e^x)/dx] = − sin(x^2 e^x)(2xe^x + x^2 e^x).
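The result can be verified symbolically (a small sympy sketch):

import sympy as sp

x = sp.symbols('x')
f = sp.cos(x**2 * sp.exp(x))

# Symbolic derivative; sympy applies the chain rule automatically.
print(sp.diff(f, x))
# -> -(x**2*exp(x) + 2*x*exp(x))*sin(x**2*exp(x)), matching the hand computation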
3.3 Integration
Integration by parts:
∫_a^b f(x) dg(x) = [f(x)g(x)]_a^b − ∫_a^b g(x) df(x).
The idea comes from the product rule of differentiation: (d/dx)[f(x)g(x)] = f(x)g′(x) + f′(x)g(x).
Example: Compute ∫ ln x dx.
∫ ln x dx = x ln x − ∫ x d(ln x) = x ln x − ∫ x · (1/x) dx = x ln x − x + C.
Exercise: Compute ∫ e^x cos x dx.
Solution.
∫ e^x cos x dx = ∫ e^x d(sin x) = e^x sin x − ∫ sin x d(e^x) = e^x sin x − ∫ e^x sin x dx,
and also
−∫ e^x sin x dx = ∫ e^x(− sin x) dx = ∫ e^x d(cos x) = e^x cos x − ∫ e^x cos x dx,
therefore
∫ e^x cos x dx = e^x sin x + e^x cos x − ∫ e^x cos x dx  ⇒  ∫ e^x cos x dx = (e^x sin x + e^x cos x)/2.
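The antiderivative can be verified symbolically (a small sympy sketch):

import sympy as sp

x = sp.symbols('x')
print(sp.integrate(sp.exp(x) * sp.cos(x), x))
# -> exp(x)*sin(x)/2 + exp(x)*cos(x)/2, matching the result above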
3.4 Convex Functions
Let f : C → R, C ⊂ R, be a function. f is said to be convex if ∀x, y ∈ C and ∀λ ∈ [0, 1], we have f((1 − λ)x + λy) ≤ (1 − λ)f(x) + λf(y).
• If f is differentiable, i.e., f′(x) exists, then f is convex iff f′(x) is monotonically non-decreasing, iff f(x) ≥ f(y) + f′(y)(x − y) for all x, y ∈ C.
• If f is twice differentiable, i.e., f 00 (x) exists, then f is convex iff f 00 (x) ≥ 0, ∀x ∈ C.
• Strictly convex functions can be defined likewise, by replacing all the ‘≤’ and ‘≥’ above with ‘<’ and
‘>’ respectively.
• f is concave iff −f is convex.
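The defining inequality can be spot-checked numerically, e.g., for f(x) = x^2 (a small sketch with randomly drawn points):

import numpy as np

rng = np.random.default_rng(0)
f = lambda t: t ** 2   # a convex function to test

ok = True
for _ in range(10_000):
    x, y = rng.uniform(-5, 5, 2)
    lam = rng.uniform(0, 1)
    lhs = f((1 - lam) * x + lam * y)
    rhs = (1 - lam) * f(x) + lam * f(y)
    ok &= bool(lhs <= rhs + 1e-12)   # convexity inequality at this (x, y, lambda)
print(ok)   # True: x^2 satisfies the inequality at every sampled point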