ACTEX Learning
Exam MAS-I Formula & Review Sheet
(updated 08/18/2023)

Need help? Email support@actuarialuniversity.com
Copyright © ArchiMedia Advantage Inc. All Rights Reserved
Probability

REVIEW OF PROBABILITY

Cumulative distribution function: F(x) = Pr(X ≤ x)
Survival function: S(x) = 1 − F(x) = Pr(X > x)
Probability density function: f(x) = (d/dx) F(x) = −(d/dx) S(x)
Hazard rate function: λ(x) = f(x)/S(x) = −(d/dx) log S(x)
Cumulative hazard function: Λ(x) = ∫_{−∞}^{x} λ(t) dt  →  S(x) = e^{−Λ(x)}

Expectation of discrete X: E[X] = Σ_x x Pr(X = x)  →  E[g(X)] = Σ_x g(x) Pr(X = x)
Expectation of continuous X: E[X] = ∫_{−∞}^{∞} x f(x) dx  →  E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

Variance: Var(X) = E[X²] − E[X]²
Standard deviation: SD(X) = √Var(X)
Covariance: Cov(X, Y) = E[XY] − E[X]E[Y]
Correlation coefficient: Corr(X, Y) = Cov(X, Y) / (SD(X) SD(Y))

Linear combination: W = aX + bY
  E[W] = aE[X] + bE[Y]
  Var(W) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)

Sum of iid X: S = X₁ + · · · + Xₙ
  E[S] = nE[X],  Var(S) = n Var(X)

Normal approximation: S ≈ N(µ = E[S], σ² = Var(S))  →  Pr(S ≤ k) = N((k − µ)/σ)
Continuity correction:
  Pr(X ≤ k) → Pr(X < k + 0.5)      Pr(X < k) → Pr(X < k − 0.5)
  Pr(X ≥ k) → Pr(X > k − 0.5)      Pr(X > k) → Pr(X > k + 0.5)

Conditional probability:
  Pr(X = x | Y = y) = Pr(X = x, Y = y)/Pr(Y = y) = Pr(Y = y | X = x) Pr(X = x)/Pr(Y = y)
Conditional density function: f(x | y) = f(x, y)/f(y) = f(x) f(y | x)/f(y)
Conditional expectations:
  Discrete X: E[X | Y = y] = Σ_x x Pr(X = x | Y = y)
  Continuous X: E[X | Y = y] = ∫_{−∞}^{∞} x f(x | y) dx
Law of total probability: Pr(X ≤ x) = E[Pr(X ≤ x | Y)]
Law of total expectation: E[X] = E[E[X | Y]]
Law of total variance: Var(X) = E[Var(X | Y)] + Var(E[X | Y])

Order statistics CDF, for iid Xᵢ's:
  F_{X(n)}(x) = Pr(X₁ ≤ x) · · · Pr(Xₙ ≤ x) = Pr(X ≤ x)ⁿ
  F_{X(1)}(x) = 1 − Pr(X(1) > x) = 1 − Pr(X₁ > x) · · · Pr(Xₙ > x) = 1 − Pr(X > x)ⁿ
Order statistics pdf, for iid Xᵢ's:
  f_{X(k)}(x) = n!/((k − 1)! 1! (n − k)!) · (F_X(x))^{k−1} f_X(x) (1 − F_X(x))^{n−k}

INSURANCE LOSSES AND PAYMENTS

Loss: X
Payment per loss with deductible d:
  Y^L = X − X ∧ d = 0 if X < d;  x − d if X ≥ d
Payment per loss with maximum covered loss u:
  Y^L = X ∧ u = X if X < u;  u if X ≥ u
Payment per loss with both u and d:
  Y^L = X ∧ u − X ∧ d = 0 if X < d;  X − d if d ≤ X < u;  u − d if X ≥ u
Payment per payment: Y^P = Y^L | X > d  →  E[Y^P] = E[Y^L]/Pr(X > d)

COMMONLY USED CONTINUOUS DISTRIBUTIONS

Uniform:
  f(x) = 1/(b − a), a ≤ x ≤ b
  F(x) = (x − a)/(b − a), a ≤ x ≤ b
  E[X] = (a + b)/2,  Var(X) = (b − a)²/12

Exponential:
  f(x) = (1/θ) e^{−x/θ}, x > 0
  F(x) = 1 − e^{−x/θ}
  E[X] = θ,  Var(X) = θ²

Weibull:
  f(x) = (τ/x)(x/θ)^τ e^{−(x/θ)^τ}, x > 0
  F(x) = 1 − e^{−(x/θ)^τ}
  E[X] = θ Γ(1 + 1/τ)

Gamma:
  f(x) = (1/(Γ(α) θ^α)) x^{α−1} e^{−x/θ}, x > 0
  F(x) = Pr(X* ≥ α), where X* is Poisson with λ = x/θ, if α is an integer
  E[X] = αθ,  Var(X) = αθ²

Beta:
  f(x) = (Γ(a + b)/(Γ(a) Γ(b))) x^{a−1} (1 − x)^{b−1}, 0 < x < 1
  E[X] = a/(a + b),  E[X²] = a(a + 1)/((a + b)(a + b + 1))
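The payment-per-loss decomposition above can be checked numerically. Below is a minimal Monte Carlo sketch (not from the sheet): it simulates exponential losses and compares the simulated E[Y^L] under a deductible d and limit u against the closed form E[X ∧ u] − E[X ∧ d] = θ(e^{−d/θ} − e^{−u/θ}) for the exponential; the values of θ, d, u are arbitrary illustrative choices.

```python
# Monte Carlo check of the payment-per-loss formulas (illustrative sketch;
# theta, d, u are arbitrary choices, not values from the sheet).
import math
import random

random.seed(1)
theta, d, u = 100.0, 50.0, 200.0
n = 200_000

losses = [random.expovariate(1 / theta) for _ in range(n)]

# Y^L with both a deductible d and a maximum covered loss u:
# 0 if X < d, X - d if d <= X < u, u - d if X >= u.
def per_loss(x):
    return min(x, u) - min(x, d)

yl = [per_loss(x) for x in losses]
print("E[Y^L] simulated:  ", sum(yl) / n)
print("E[Y^L] closed form:", theta * (math.exp(-d / theta) - math.exp(-u / theta)))

# Payment per payment conditions on X > d.
yp = [per_loss(x) for x in losses if x > d]
print("E[Y^P] simulated:  ", sum(yp) / len(yp))
```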
Pareto:
  f(x) = αθ^α/(x + θ)^{α+1}, x > 0
  F(x) = 1 − (θ/(x + θ))^α
  E[X] = θ/(α − 1),  E[X²] = 2θ²/((α − 1)(α − 2))

Single-parameter Pareto:
  f(x) = αθ^α/x^{α+1}, x > θ
  F(x) = 1 − (θ/x)^α
  E[X] = αθ/(α − 1),  Var(X) = αθ²/((α − 1)²(α − 2))

Lognormal:
  f(x) = (1/(xσ√(2π))) e^{−(log x − µ)²/(2σ²)}, x > 0
  F(x) = N((log x − µ)/σ)
  E[X] = e^{µ + σ²/2},  E[X²] = e^{2µ + 2σ²}

COMMONLY USED DISCRETE DISTRIBUTIONS

Poisson:
  P(x) = e^{−λ} λ^x / x!, x = 0, 1, 2, . . .
  E[X] = λ,  Var(X) = λ
Binomial:
  P(x) = C(m, x) q^x (1 − q)^{m−x}, x = 0, 1, . . . , m
  E[X] = mq,  Var(X) = mq(1 − q)
Geometric:
  P(x) = (1/(1 + β)) (β/(1 + β))^x, x = 0, 1, 2, . . .
  E[X] = β,  Var(X) = β(1 + β)
Negative Binomial:
  P(x) = C(x + r − 1, x) (1/(1 + β))^r (β/(1 + β))^x, x = 0, 1, 2, . . .
  E[X] = rβ,  Var(X) = rβ(1 + β)
Stochastic Processes

MARKOV CHAINS

Markov chain property: X_t only depends on X_{t−1}.
Transition probability: P_ij = Pr(X₁ = j | X₀ = i) = Pr(X_{t+1} = j | X_t = i)
Chapman-Kolmogorov equations: P_ij^t = Σ_{k=1}^{n} P_ik^u P_kj^{t−u}

CLASSIFICATION OF STATES

• An absorbing state is one that cannot be exited.
• State j is accessible from state i if the probability of eventually going to state j from state i is greater than 0.
• Two states communicate if each state is accessible from the other.
• A class of states is a maximal set of states that communicate with each other. An absorbing state is a class by itself.
• A chain is irreducible if it has only one class.
• A state is recurrent if the probability of ever reentering the state is 1.
• A state is transient if it is not recurrent.

A finite Markov chain must have at least one recurrent class. If it is irreducible, then it is recurrent.

Reenter a transient state:
Let N be the number of times a life reenters a transient state, and let p be the probability of reentering it.
  Pr(N > n) = p^{n+1}
  E[N] = Σ_{n=0}^{∞} Pr(N > n) = Σ_{n=0}^{∞} p^{n+1} = p/(1 − p)

Random walks:
Let p be the probability of moving up one state in each period, and q = 1 − p the probability of moving down one state in each period.
  When p = q = 0.5, the chain is recurrent.
  When pq < 0.25, the chain is transient.

Gambler's ruin:
Let p = Pr(win 1 chip at each round) and q = Pr(lose 1 chip at each round).
  Pr(Win N chips | Currently have j chips) = P_j = j/N, if q/p = 1
  P_j = ((q/p)^j − 1)/((q/p)^N − 1), if q/p ≠ 1
  Pr(Lose all chips | Currently have j chips) = 1 − P_j

LONG-RUN PROPORTION OF TIME

Consider a recurrent state i. Let N_i be the number of transitions until the state recurs.
  E[N_i] is finite  →  the state is positive recurrent.
  Otherwise  →  the state is null recurrent.

Consider an irreducible and recurrent Markov chain. The long-run proportion of time the chain is in state i, π_i, satisfies:
  π_i = Σ_{j=1}^{k} π_j P_ji
  π₁ + π₂ + · · · + π_k = 1

Algorithmic efficiency:
Let N_j = number of steps to the best solution, given that there are j − 1 better solutions.
  E[N_j] = 1 + 1/2 + 1/3 + · · · + 1/(j − 1)
  Var(N_j) = 1(1 − 1) + (1/2)(1 − 1/2) + (1/3)(1 − 1/3) + · · · + (1/(j − 1))(1 − 1/(j − 1))
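A minimal sketch of the two computations above (the 3-state transition matrix and the gambler's-ruin inputs are made-up examples, and numpy is assumed available): the stationary probabilities solve π = πP together with Σπ_i = 1, so one balance equation is replaced by the normalization row.

```python
# Sketch: long-run proportions pi solve pi = pi P with sum(pi) = 1, plus the
# gambler's-ruin formula P_j; the transition matrix is a made-up example.
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# pi (I - P) = 0: keep two balance equations, replace the third with sum = 1.
A = np.vstack([(np.eye(3) - P).T[:-1], np.ones(3)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)
print("stationary probabilities:", pi)

# Gambler's ruin: probability of reaching N chips starting from j chips.
def win_prob(j, N, p):
    q = 1 - p
    if abs(q / p - 1) < 1e-12:
        return j / N
    r = q / p
    return (r**j - 1) / (r**N - 1)

print("P_j for j=3, N=10, p=0.49:", win_prob(3, 10, 0.49))
```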
Limiting probability:
Consider an aperiodic positive recurrent irreducible Markov chain.
  α_i = lim_{n→∞} Pr(X_n = i)
  Note that α_i = π_i.
Note the difference: the limiting probability of being in state i is π_i; the limiting probability of transitioning from state i to state j is π_i × P_ij.
Long-run proportions of time π_j are often called stationary probabilities. If the Markov chain has been around for a long time and is ergodic, then the states have stationary probabilities π_j.

Probability of ever transitioning to state j for one in state i:
  f_ij = P_ij + Σ_{k≠j} P_ik f_kj
  f_ij = s_ij/s_jj, since s_ij = f_ij s_jj (for i ≠ j)
  f_jj = (s_jj − 1)/s_jj, since s_jj = 1 + f_jj s_jj
where s_ij is the expected number of periods in state j given that one is currently in state i.

TIME IN TRANSIENT STATES

Expected time in transient states: S = (I − P_T)^{−1}, where P_T is the transient-state submatrix.
Note that s_jj includes the period of being in state j currently.
Inverting a matrix:
  (a₁₁ a₁₂; a₂₁ a₂₂)^{−1} = (1/(a₁₁a₂₂ − a₁₂a₂₁)) (a₂₂ −a₁₂; −a₂₁ a₁₁)

TIME REVERSIBILITY

Recall Bayes' theorem: Pr(X_{t−1} = j | X_t = i) = Pr(X_{t−1} = j) Pr(X_t = i | X_{t−1} = j) / Pr(X_t = i)
Define the following:
  Q_ij = Pr(X_{t−1} = j | X_t = i)  (from i to j, go backward)
  P_ij = Pr(X_t = j | X_{t−1} = i)  (from i to j, go forward)
We have: Q_ij = π_j P_ji / π_i  →  π_i Q_ij = π_j P_ji
If Q = P, then P is said to be time-reversible.
If the matrix is time-reversible:
  P₁₂P₂₃P₃₁ = Q₁₂Q₂₃Q₃₁ = (π₂P₂₁/π₁)(π₃P₃₂/π₂)(π₁P₁₃/π₃) = P₁₃P₃₂P₂₁

BRANCHING PROCESS

Let J be the number of offspring produced by an individual:
  P_j = Pr(J = j),  µ = E[J],  σ² = Var(J)
Let X_n be the size of the nth generation.
  E[X_n | X₀ = x] = xµⁿ
  Var(X_n | X₀ = x) = xnσ², if µ = 1;  xσ²(µ^{n−1} − µ^{2n−1})/(1 − µ), if µ ≠ 1
Probability of extinction, given that X₀ = 1:
  π_{X₀=1} = 1, if µ ≤ 1
  π_{X₀=1} = Σ_{j=0}^{∞} P_j (π_{X₀=1})^j (smallest solution), if µ > 1
Probability of extinction, given that X₀ = x:  π_{X₀=x} = (π_{X₀=1})^x

Poisson Process

EXPONENTIAL DISTRIBUTION

X ∼ Exp(θ = 1/λ):
  f(x) = (1/θ) e^{−x/θ} = λe^{−λx}, x > 0
  F(x) = 1 − e^{−x/θ} = 1 − e^{−λx}
Note the difference between the two parameterizations:
  E[X] = θ or 1/λ;  Var(X) = θ² or (1/λ)²

Lack of memory: Pr(X > k + x | X > k) = Pr(X > x)  →  X − k | X > k ∼ Exp(θ)

With deductible d:
  E[Y^L] = E[X] − E[X ∧ d] = θ − θ(1 − e^{−d/θ}) = θe^{−d/θ}
  →  E[Y^P] = E[Y^L]/Pr(X > d) = θ
With limit u:
  E[Y^L] = E[X ∧ u] = θ(1 − e^{−u/θ})

Minimum: for independent Xᵢ ∼ Exp(θᵢ = 1/λᵢ):
  min(X₁, . . . , Xₙ) ∼ Exp(θ = 1/(1/θ₁ + 1/θ₂ + · · · + 1/θₙ)) = Exp(1/(λ₁ + λ₂ + · · · + λₙ))

Order statistics: for iid X₁, . . . , Xₙ ∼ Exp(θ):
  E[X(k)] = θ(1/n + 1/(n−1) + 1/(n−2) + · · · + 1/(n−k+1))

Greedy algorithm: there are n persons and k ≤ n jobs to do; the cost of each person is X ∼ Exp(θ).
  Total expected cost = θ/n + θ/(n−1) + θ/(n−2) + · · · + θ/(n−k+1)

Sum: for iid X₁, . . . , Xₙ ∼ Exp(θ):  X₁ + · · · + Xₙ ∼ Gamma(α = n, θ)

Gamma distribution: X ∼ Gamma(α, θ)
  F_X(x; α = 1) = 1 − e^{−x/θ}
  F_X(x; α = 2) = 1 − e^{−x/θ} − e^{−x/θ}(x/θ)/1!
  F_X(x; α = 3) = 1 − e^{−x/θ} − e^{−x/θ}(x/θ)/1! − e^{−x/θ}(x/θ)²/2!

Chi-square: X ∼ χ²(n)  →  X ∼ Gamma(α = n/2, θ = 2)

POISSON PROCESS

Counting process: X(t) is the number of events that occur at or before time t.
Poisson process: X(t) ∼ Poisson(λ(t))
  X(t) − X(s) is independent of X(v) − X(u), for t > s > v > u > 0.
  X(t + s) − X(t) is a Poisson random variable, for s > 0.

Homogeneous PP with parameter λ (λ is a constant):
  X(t + s) − X(t) ∼ Poi(λs)
  E[X] = Var(X) = λ per unit time.

TIME TO NEXT EVENT

Homogeneous PP:
  Interarrival time/time to next event: T ∼ Exp(θ = 1/λ)  →  F_T(s) = 1 − e^{−λs}
  Time to nth event: T ∼ Gamma(α = n, θ = 1/λ)

Nonhomogeneous PP with intensity function λ(t):
  X(t + s) − X(t) ∼ Poi(λ*), where λ* = ∫_t^{t+s} λ(r) dr
  Interarrival time: T is not exponential  →  F_T(s) = 1 − e^{−λ*}
  At time 0, Pr(next arrival between time 2 and time 4) = e^{−∫₀² λ(r)dr} − e^{−∫₀⁴ λ(r)dr}
  At time 2, Pr(next arrival before time 4) = 1 − e^{−∫₂⁴ λ(r)dr}

Given that exactly k independent Poisson events occurred before time τ:
  The time of each event T_j ∼ Unif(0, τ).
  The joint distribution of event times is f_{T₁,...,T_k}(t₁, . . . , t_k) = k!/τ^k.
  The expected time of the jth event is E[X(j)] = jτ/(k + 1).

SUMS, THINNING, AND MIXTURES

Sum: for independent Xᵢ ∼ Poisson(λᵢ):  X₁ + · · · + Xₙ ∼ Poisson(λ₁ + · · · + λₙ)
Sum of ind. PPs: X₁(t) + · · · + Xₙ(t) is a Poisson process with λ(t) = λ₁(t) + · · · + λₙ(t).

Thinning/splitting:
  X(t) is a Poisson process with intensity function λ(t).
  X_A(t) is a Poisson process with intensity function P(A) × λ(t).
  The Xᵢ(t)'s are independent Poisson processes with λᵢ(t)'s.

Sum of two ind. PPs:
  X_A(t) is a Poisson process with parameter λ_A; X_B(t) is a Poisson process with parameter λ_B.
  X_A(t) + X_B(t) is a Poisson process with parameter λ_A + λ_B, with P(A) = λ_A/(λ_A + λ_B) and P(B) = λ_B/(λ_A + λ_B).

Mixture of two PPs:
  X_A(t) is a Poisson process with parameter λ_A; X_B(t) is a Poisson process with parameter λ_B.
  However, X(t), a mixture of X_A(t) and X_B(t), is NOT a Poisson process.

Negative binomial distribution: let X | λ ∼ Poisson(λ), where λ ∼ Gamma(α, θ). Then X ∼ NB(r = α, β = θ), which is NOT Poisson.

COMPOUND POISSON PROCESS

Define the following:
  N is a Poisson random variable (the primary r.v.).
  The Xᵢ's all follow the same distribution (the secondary r.v.).
  N and the Xᵢ's are all independent.

Compound Poisson r.v.: S = X₁ + · · · + X_N
  Mean: E[S] = E[N]E[X]
  Variance: Var(S) = E[N]Var(X) + Var(N)E[X]²
  For Poisson N with mean λ:  E[S] = λE[X],  Var(S) = λE[X²]
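The thinning idea above also gives a standard way to simulate a nonhomogeneous Poisson process. Below is a minimal sketch (the intensity λ(t) = 2 + sin t and the bound lam_max = 3 are assumed for illustration, not from the sheet): simulate a homogeneous PP at the bounding rate, then keep each event with probability λ(t)/lam_max.

```python
# Sketch of simulating a nonhomogeneous Poisson process by thinning
# (assumed intensity lambda(t) = 2 + sin(t), bounded above by lam_max = 3).
import math
import random

random.seed(7)

def lam(t):
    return 2 + math.sin(t)   # illustrative intensity function

def simulate_nhpp(T, lam_max):
    """Event times on [0, T]: simulate a homogeneous PP with rate lam_max,
    then keep each event at time t with probability lam(t)/lam_max."""
    t, events = 0.0, []
    while True:
        t += random.expovariate(lam_max)   # interarrival ~ Exp(theta = 1/lam_max)
        if t > T:
            return events
        if random.random() <= lam(t) / lam_max:
            events.append(t)

events = simulate_nhpp(T=4.0, lam_max=3.0)
expected = 2 * 4 + (1 - math.cos(4))   # E[X(4)] = integral of lambda over [0, 4]
print(len(events), "events; E[X(4)] =", expected)
```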
Reliability

STRUCTURE FUNCTIONS

State vector: x = (x₁, . . . , xₙ), where xᵢ = 1 if component i functions, and xᵢ = 0 if it fails.
Structure function: φ(x) = 1 if the structure functions.
Note that xᵢ^k = xᵢ for k ≥ 1.

Series system: functions when n out of n components function.
  Structure function: φ(x) = ∏ xᵢ
  The entire system is a minimal path set. Each component is a minimal cut set.
Parallel system: functions when 1 out of n components functions.
  Structure function: φ(x) = 1 − ∏(1 − xᵢ)
  Every component is a minimal path set. The entire system is a minimal cut set.
k out of n system: functions when k out of n components function.
  There are C(n, k) minimal path sets and C(n, n − k + 1) minimal cut sets.

Minimal path sets: the system functions when all the components of a minimal path set u_j function.
  φ(u_j) = ∏ᵢ u_ji
Minimal cut sets: the system fails when all the components of a minimal cut set v_j fail.
  φ(v_j) = 1 − ∏ᵢ (1 − v_ji)

Two ways to express a system:
1. With J minimal path sets u₁, . . . , u_J:  φ(x) = 1 − ∏_j (1 − φ(u_j))
2. With J minimal cut sets v₁, . . . , v_J:  φ(x) = ∏_j φ(v_j)

Bernoulli probability:
  xᵢ ∼ Bernoulli(pᵢ), where pᵢ = Pr(xᵢ = 1)  →  E[xᵢ] = pᵢ
  Assume that x₁, . . . , xₙ are all mutually independent.
  φ(x) ∼ Bernoulli(r(p))  →  E[φ(x)] = r(p)

Reliability function: r(p) = Pr(φ(x) = 1)
  Series system: r(p) = ∏ pᵢ
  Parallel system: r(p) = 1 − ∏(1 − pᵢ)
  k out of n system, assuming p₁ = p₂ = · · · = pₙ = p:  r(p) = Σ_{i=k}^{n} C(n, i) pⁱ (1 − p)^{n−i}

Example (Inclusion/Exclusion Bounds):
The minimal path sets are {1, 3}, {1, 4} and {2}.
  Structure function: φ(x) = 1 − (1 − x₁x₃)(1 − x₁x₄)(1 − x₂)
    = x₁x₃ + x₁x₄ + x₂ − x₁x₃x₄ − x₁x₂x₃ − x₁x₂x₄ + x₁x₂x₃x₄
  Reliability function: r(p) = p₁p₃ + p₁p₄ + p₂ − p₁p₃p₄ − p₁p₂p₃ − p₁p₂p₄ + p₁p₂p₃p₄
  First upper bound: r(p) ≤ p₁p₃ + p₁p₄ + p₂
  First lower bound: r(p) ≥ p₁p₃ + p₁p₄ + p₂ − p₁p₃p₄ − p₁p₂p₃ − p₁p₂p₄
  Second upper bound: r(p) ≤ p₁p₃ + p₁p₄ + p₂ − p₁p₃p₄ − p₁p₂p₃ − p₁p₂p₄ + p₁p₂p₃p₄  ← the same as the exact r(p)
The minimal cut sets are {1, 2} and {2, 3, 4}.
  Structure function: φ(x) = (1 − (1 − x₁)(1 − x₂))(1 − (1 − x₂)(1 − x₃)(1 − x₄))
    = 1 − (1 − x₁)(1 − x₂) − (1 − x₂)(1 − x₃)(1 − x₄) + (1 − x₁)(1 − x₂)(1 − x₃)(1 − x₄)
  Reliability function: r(p) = 1 − (1 − p₁)(1 − p₂) − (1 − p₂)(1 − p₃)(1 − p₄) + (1 − p₁)(1 − p₂)(1 − p₃)(1 − p₄)
  First lower bound: r(p) ≥ 1 − (1 − p₁)(1 − p₂) − (1 − p₂)(1 − p₃)(1 − p₄)
  First upper bound: r(p) ≤ 1 − (1 − p₁)(1 − p₂) − (1 − p₂)(1 − p₃)(1 − p₄) + (1 − p₁)(1 − p₂)(1 − p₃)(1 − p₄)

Example (Intersection Bounds):
The minimal path sets are {1, 3}, {1, 4} and {2}.
  Structure function: φ(x) = 1 − (1 − x₁x₃)(1 − x₁x₄)(1 − x₂)
  Upper bound: r(p) ≤ 1 − (1 − p₁p₃)(1 − p₁p₄)(1 − p₂)
The minimal cut sets are {1, 2} and {2, 3, 4}.
  Structure function: φ(x) = (1 − (1 − x₁)(1 − x₂))(1 − (1 − x₂)(1 − x₃)(1 − x₄))
  Lower bound: r(p) ≥ (1 − (1 − p₁)(1 − p₂))(1 − (1 − p₂)(1 − p₃)(1 − p₄))

TIME TO FAILURE

Time to failure: T is a continuous random variable.
Expectation formula: E[T] = ∫₀^∞ Pr(T > t) dt
Survival probability: pᵢ(t) = Pr(Tᵢ > t)  →  E[Tᵢ] = ∫₀^∞ pᵢ(t) dt
Note that Tᵢ and xᵢ are NOT the same thing. We have xᵢ = 1 when Tᵢ > t. Thus, Pr(xᵢ = 1) = Pr(Tᵢ > t) = pᵢ(t).

Reliability function at time t: r(p(t))
  E[system life] = ∫₀^∞ r(p(t)) dt
  Series system: r(p(t)) = ∏ pᵢ(t)
  Parallel system: r(p(t)) = 1 − ∏(1 − pᵢ(t))

Failure rate function: λ(t) = f(t)/S(t) = −(d/dt) r(p(t)) / r(p(t))
Hazard function: Λ(t) = ∫₀^t λ(s) ds = − log r(p(t))  →  r(p(t)) = e^{−Λ(t)}

For a k out of n system with iid Tᵢ ∼ Exp(θ):
  E[system life] = E[T(n−k+1)] = θ(1/n + 1/(n−1) + · · · + 1/k)

Life Contingencies

SURVIVAL MODELS

Actuarial notation:
  t p_x = Pr(T_x > t)
  t q_x = 1 − t p_x = Pr(T_x ≤ t)
Relations:
  t+u p_x = t p_x · u p_{x+t}
  t|u q_x = t p_x · u q_{x+t} = t p_x − t+u p_x = t+u q_x − t q_x

Life table:
  Number of lives: L_{x+t} ∼ Binomial(l_x, t p_x)
  E[L_{x+t}] = l_x · t p_x,  Var(L_{x+t}) = l_x · t p_x · t q_x
  t p_x = l_{x+t}/l_x,  t q_x = (l_x − l_{x+t})/l_x,  t|u q_x = (l_{x+t} − l_{x+t+u})/l_x
Expectation: e_x = Σ_{k=1}^{∞} k · k p_x q_{x+k} = Σ_{k=1}^{∞} k p_x

INSURANCE

Endowment function: n E_x = vⁿ · n p_x  (pure endowment of 1)
Insurance functions:
  A_x = Σ_{k=0}^{∞} v^{k+1} · k p_x q_{x+k}  (whole life insurance of 1)
  A¹_{x:n} = Σ_{k=0}^{n−1} v^{k+1} · k p_x q_{x+k}  (n-year term insurance of 1)
  A_{x:n} = Σ_{k=0}^{n−1} v^{k+1} · k p_x q_{x+k} + vⁿ · n p_x  (n-year endowment insurance of 1)
  n| A_x = n E_x · A_{x+n}  (n-year deferred whole life insurance of 1)
Relations:
  A_x = A¹_{x:n} + n E_x · A_{x+n}
  A_{x:n} = A¹_{x:n} + n E_x
Recursive formulas:
  A_x = v q_x + v p_x A_{x+1}
  A¹_{x:n} = v q_x + v p_x A¹_{x+1:n−1}

ANNUITIES

Annuity functions:
  ä_x = Σ_{k=0}^{∞} ((1 − v^{k+1})/d) · k p_x q_{x+k}  (whole life annuity-due of 1 per year)
  ä_{x:n} = Σ_{k=0}^{n−1} ((1 − v^{k+1})/d) · k p_x q_{x+k} + ((1 − vⁿ)/d) · n p_x  (n-year term annuity-due of 1 per year)
Simpler formulas:
  ä_x = Σ_{k=0}^{∞} vᵏ · k p_x
  ä_{x:n} = Σ_{k=0}^{n−1} vᵏ · k p_x
Relations:
  ä_x = ä_{x:n} + n E_x · ä_{x+n}
  n| ä_x = n E_x · ä_{x+n}  (n-year deferred whole life annuity-due of 1 per year)
  ä = ä_n̄ + n E_x · ä_{x+n}  (n-year certain and whole life annuity-due of 1 per year)
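As a quick check of E[φ(x)] = r(p), the example system above can be evaluated exactly by enumerating all component states. A minimal sketch (the pᵢ values are arbitrary illustrative choices):

```python
# Sketch: exact r(p) for the example system with minimal path sets
# {1,3}, {1,4}, {2}, computed by enumerating all component states.
from itertools import product

p = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6}   # made-up component reliabilities

def phi(x):
    # phi(x) = 1 - (1 - x1 x3)(1 - x1 x4)(1 - x2)
    return 1 - (1 - x[1] * x[3]) * (1 - x[1] * x[4]) * (1 - x[2])

r = 0.0
for bits in product([0, 1], repeat=4):
    x = dict(zip([1, 2, 3, 4], bits))
    weight = 1.0
    for i in [1, 2, 3, 4]:
        weight *= p[i] if x[i] == 1 else 1 - p[i]
    r += weight * phi(x)                # E[phi(x)] = r(p)

inclusion_exclusion = (p[1]*p[3] + p[1]*p[4] + p[2]
                       - p[1]*p[3]*p[4] - p[1]*p[2]*p[3] - p[1]*p[2]*p[4]
                       + p[1]*p[2]*p[3]*p[4])
print(r, inclusion_exclusion)           # the two values agree
```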
PREMIUMS

Equivalence principle: EPV(Net Premiums) = EPV(Insurance Benefits)
Premium functions:
  P_x = A_x / ä_x  (whole life insurance of 1, premiums payable for life)
  P¹_{x:n} = A¹_{x:n} / ä_{x:n}  (n-year term insurance of 1, premiums payable for n years)
  P_{x:n} = A_{x:n} / ä_{x:n}  (n-year endowment insurance of 1, premiums payable for n years)
Note that v = 1/(1 + i), where i is the effective annual interest rate.

Recursive formulas:
  ä_x = 1 + v p_x ä_{x+1}
  ä_{x:n} = 1 + v p_x ä_{x+1:n−1}
Insurance to annuity:
  ä_x = (1 − A_x)/d
  ä_{x:n} = (1 − A_{x:n})/d

Simulation

SIMULATION GENERATOR

X_{n+1} = aX_n + c (mod m)
To generate uniform numbers:
1. Specify an initial integer x₀ called the "seed".
2. Calculate X₁ = ax₀ + c.
3. Divide X₁ by m, obtaining the first remainder x₁.
4. The first uniform number is u₁ = x₁/m.
5. Repeat steps 2-4 using x₁ to obtain the second remainder x₂ and the second uniform number u₂ = x₂/m. And so on…

INVERSE TRANSFORMATION METHOD

1. Generate uniform numbers u₁, . . . , uₙ.
2. Specify a distribution function F_Y(y) = Pr(Y ≤ y).
3. Calculate yᵢ = F_Y^{−1}(uᵢ).

REJECTION METHOD

Specify the following:
  f(x) is the density function of the variable to simulate; F(x) is the corresponding CDF.
  g(x) is the base density function; G(x) is the corresponding CDF.
General method:
1. Determine c = max f(x)/g(x).
2. Generate two uniform numbers u₁ and u₂.
3. Calculate x₁ = G^{−1}(u₁).
4. Accept x₁ if u₂ ≤ f(x₁)/(c · g(x₁)).

Gamma distribution:
  f(x) is the density function of the gamma distribution with mean αθ.
  g(x) is the density function of the exponential distribution with mean θ* = αθ.
  Use the general method. In step 1, c = max f(x)/g(x) occurs at x = θ* = αθ.

Normal distribution:
  f(x) is the density function of the standard normal distribution.
  g(x) is the density function of the exponential distribution with mean 1.
Use the following method:
1. Generate three uniform numbers u₁, u₂ and u₃.
2. Calculate x₁ = − log u₁ and x₂ = − log u₂.
3. Accept x₁ if x₂ ≥ (x₁ − 1)²/2.
4. If u₃ < 0.5 → use x₁; if u₃ ≥ 0.5 → use −x₁.
Note that x₁ or −x₁ is a standard normal number.

Estimation

KERNEL DENSITY ESTIMATION

Suppose we are given n observations and the base distribution function β(x).
For each observation xᵢ, we set k_{xᵢ}(x) = (1/b) β((x − xᵢ)/b).
The kernel distribution is an equally weighted mixture of the n distributions:
  f̂(x) = (1/n) Σ_{i=1}^{n} k_{xᵢ}(x)
  F̂(x) = (1/n) Σ_{i=1}^{n} K_{xᵢ}(x)
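The two sampling recipes above translate directly to code. A minimal sketch (standard-library Python; the mean θ = 2 is an arbitrary illustrative choice): the inverse transform uses F^{−1}(u) = −θ log(1 − u) for the exponential, and the normal sampler follows steps 1-4 of the sheet's rejection method with an Exp(1) base density.

```python
# Sketch of the inverse transformation and rejection methods on this page.
import math
import random

random.seed(3)

# Inverse transformation: exponential with mean theta.
theta = 2.0
u = random.random()
y = -theta * math.log(1 - u)          # F^{-1}(u) for F(y) = 1 - e^{-y/theta}

# Rejection method for a standard normal, per steps 1-4 in the sheet.
def std_normal():
    while True:
        x1 = -math.log(random.random())   # x1 = -log u1 ~ Exp(1)
        x2 = -math.log(random.random())   # x2 = -log u2 ~ Exp(1)
        if x2 >= (x1 - 1) ** 2 / 2:       # acceptance condition
            u3 = random.random()
            return x1 if u3 < 0.5 else -x1

sample = [std_normal() for _ in range(100_000)]
m = sum(sample) / len(sample)
v = sum((s - m) ** 2 for s in sample) / len(sample)
print(y, m, v)   # mean near 0, variance near 1
```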
Rectangular kernel:
  β(x) = 1/2 for −1 ≤ x ≤ 1; 0 otherwise
  k_{xᵢ}(x) = 1/(2b) for xᵢ − b ≤ x ≤ xᵢ + b; 0 otherwise
  K_{xᵢ}(x) = 0 for x ≤ xᵢ − b;  (x − (xᵢ − b))/(2b) for xᵢ − b ≤ x ≤ xᵢ + b;  1 for x > xᵢ + b
  E[X | Xᵢ = xᵢ] = xᵢ,  Var(X | Xᵢ = xᵢ) = (2b)²/12 = b²/3
  E[X] = Σxᵢ/n,  Var(X) = b²/3 + (Σxᵢ²/n − (Σxᵢ/n)²)

Triangle kernel:
  β(x) = x + 1 for −1 ≤ x ≤ 0;  1 − x for 0 ≤ x ≤ 1;  0 otherwise
  k_{xᵢ}(x) = (x − (xᵢ − b))/b² for xᵢ − b ≤ x ≤ xᵢ;  ((xᵢ + b) − x)/b² for xᵢ < x ≤ xᵢ + b;  0 otherwise
  K_{xᵢ}(x) = 0 for x ≤ xᵢ − b;  (x − (xᵢ − b))²/(2b²) for xᵢ − b ≤ x ≤ xᵢ;  1 − ((xᵢ + b) − x)²/(2b²) for xᵢ < x ≤ xᵢ + b;  1 otherwise
  E[X | Xᵢ = xᵢ] = xᵢ,  Var(X | Xᵢ = xᵢ) = b²/6
  E[X] = Σxᵢ/n,  Var(X) = b²/6 + (Σxᵢ²/n − (Σxᵢ/n)²)

Gaussian kernel:
  β(x) = (1/√(2π)) e^{−x²/2}
  k_{xᵢ}(x) = (1/√(2πb²)) e^{−(x−xᵢ)²/(2b²)},  i.e., Xᵢ ∼ N(µ = xᵢ, σ² = b²)
  K_{xᵢ}(x) = N((x − xᵢ)/b)
  E[X | Xᵢ = xᵢ] = xᵢ,  Var(X | Xᵢ = xᵢ) = b²
  E[X] = Σxᵢ/n,  Var(X) = b² + (Σxᵢ²/n − (Σxᵢ/n)²)

PERCENTILE MATCHING

Smoothed empirical percentile: π̂_p = x(k), where k = p(n + 1). If k is not an integer, use interpolation.
Set the theoretical percentiles equal to the smoothed empirical percentiles.
To estimate 1 parameter, use one percentile. To estimate 2 parameters, use two percentiles.

Complete data: Pr(X ≤ π̂_p) = p, where π̂_p is calculated using complete losses data.
Left-truncated data:
  Pr(X ≤ π̂_p | X > d) = p, where π̂_p is calculated using incomplete losses data.
  Pr(X − d ≤ π̂_p | X > d) = p, where π̂_p is calculated using claims data.
Right-censored data: Pr(X ≤ π̂_p) = p, where π̂_p is calculated using incomplete losses data.

METHOD OF MOMENTS

Set the theoretical moments equal to the sample moments.
To estimate 1 parameter, use the first moment. To estimate 2 parameters, use the first and second moments.

Complete data: E[X] = Σxᵢ/n,  E[X²] = Σxᵢ²/n, where the xᵢ's are complete losses data.
Left-truncated data:
  E[X | X > d] = Σxᵢ/n, where the xᵢ's are incomplete losses data; they are all greater than d.
  E[X − d | X > d] = Σyᵢ/n, where the yᵢ's are claims data.
Right-censored data: E[X ∧ u] = Σxᵢ/n, where the xᵢ's are incomplete losses data.

MAXIMUM LIKELIHOOD ESTIMATION

Maximize the likelihood function L. The value(s) of the parameter(s) that maximize the likelihood function are called the maximum likelihood estimate(s).
1. Determine L(θ).
2. Apply the natural logarithm, obtaining l(θ) = log L(θ).
3. Take the first derivative with respect to the parameter, obtaining l′(θ).
4. Find the value of θ such that l′(θ) = 0.

Complete data: L = f(x₁) · · · f(xₙ)
  (f = Pr for a discrete distribution.) There are n observations with exact values.
Grouped data: L = (F(c₁) − F(c₀)) · · · (F(cₙ) − F(cₙ₋₁)), where c₀ < c₁ < . . . < cₙ are interval boundaries.
Left-truncated data: L = (f(x₁)/S(d)) · · · (f(xₙ)/S(d))
  There are n observations with exact values; they are all greater than d.
Right-censored data: L = f(x₁) · · · f(xₙ) S(u)^m
  There are n observations with exact values and m observations greater than u.
Left-truncated and right-censored data: L = (f(x₁)/S(d)) · · · (f(xₙ)/S(d)) · (S(u)/S(d))^m
  There are n observations with exact values and m observations greater than u. These n + m observations are all greater than d.

For distributions that belong to the exponential family, the MLE depends on the data only through the sufficient statistic Σ q(xᵢ) (see Sufficient Statistics below).
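For the exponential distribution, the left-truncated and right-censored likelihood above has a closed-form maximizer: writing l(θ) = −n log θ − (Σ(xᵢ − d) + m(u − d))/θ and setting l′(θ) = 0 gives θ̂ = (Σ(xᵢ − d) + m(u − d))/n. A minimal sketch with made-up data (the data values, d, u and m are assumptions for illustration):

```python
# Sketch: the MLE recipe applied to an exponential model with left
# truncation at d and right censoring at u (made-up data).
# L = prod f(x_i)/S(d) * (S(u)/S(d))^m gives the closed form
# theta_hat = (sum(x_i - d) + m (u - d)) / n.
d, u = 100.0, 500.0
exact = [130.0, 210.0, 160.0, 340.0, 480.0]   # n exact observations, all above d
m = 2                                          # observations censored at u

n = len(exact)
theta_hat = (sum(x - d for x in exact) + m * (u - d)) / n
print("theta_hat =", theta_hat)
```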
With complete data:

  Distribution | Maximum likelihood estimate(s) | Method of moments estimate(s)
  Exponential  | θ̂ = x̄ | Same.
  Normal       | µ̂ = x̄,  σ̂² = (1/n) Σ(xᵢ − µ̂)² | Same.
  Lognormal    | µ̂ = (1/n) Σ log xᵢ,  σ̂² = (1/n) Σ(log xᵢ − µ̂)² | Solve e^{µ̂+σ̂²/2} = Σxᵢ/n and e^{2µ̂+2σ̂²} = Σxᵢ²/n.
  Poisson      | λ̂ = x̄ | Same.
  Binomial     | q̂ = x̄/m | Same.
  Uniform(0,b) | b̂ = max(x₁, . . . , xₙ) | b̂ = 2x̄.

RAO-CRAMER LOWER BOUND

Fisher information matrix:
  I = −E[d²l(x;θ)/dθ²] = E[(dl(x;θ)/dθ)²], where l(x;θ) = Σ log f(xᵢ; θ) is the loglikelihood function.
Rao-Cramer lower bound: I⁻¹. This is the lowest possible variance of an unbiased estimator θ̂.
Efficiency:
  An unbiased estimator is efficient if E = 1, where the efficiency of an unbiased estimator is E = I⁻¹/Var(θ̂).
  An estimator is more efficient than another estimator if its variance is lower.
  The relative efficiency of θ̂₁ with respect to θ̂₂ is RE = Var(θ̂₂)/Var(θ̂₁).

ESTIMATOR QUALITY

Estimator vs estimate: prior to knowing the observations, one must first obtain an estimator using MOM, percentile matching, MLE, etc. An estimator is a random variable, and an estimate is just an outcome of the estimator.
Bias: the bias of an estimator is Bias = E[θ̂] − θ.
  An estimator is unbiased if Bias = 0.
  An estimator is asymptotically unbiased if lim_{n→∞} Bias = 0.
Mean square error: MSE = E[(θ̂ − θ)²] = Var(θ̂) + Bias².
Consistency: an estimator is (weakly) consistent if lim_{n→∞} Pr(|θ̂ − θ| < ϵ) = 1.
  If lim_{n→∞} Bias(θ̂) = 0 and lim_{n→∞} Var(θ̂) = 0, then the estimator is consistent. However, the converse is not true.
A uniformly minimum variance unbiased estimator (MVUE) is an unbiased estimator that has lower variance than any other unbiased estimator for all possible values of the parameter.

SUFFICIENT STATISTICS

A sufficient statistic is a function T(x) whose value contains all the information needed to compute any estimate of the parameter.
Factorization of the maximum likelihood function: L(x; θ) = g(T(x); θ) h(x), where h(x) is a function that does not depend on θ.
The MLE (if unique) is a function of a sufficient statistic T(x).

Rao-Blackwell theorem: for any unbiased estimator θ̂ and sufficient statistic T(x), the estimator E[θ̂ | T(x)] is the MVUE. This means that E[θ̂ | T(x)] is at least as efficient as θ̂.
  If the MLE is unbiased, the MLE is the MVUE.
  If the MLE is not unbiased, obtain an unbiased estimator by adjusting the MLE. This new estimator is the MVUE.

Exponential family: f(x; θ) = e^{p(θ)q(x) + r(θ) + s(x)} for a < x < b
It is a regular case if:
  • a and b do not depend on θ.
  • p(θ) is nontrivial and continuous.
  • q(x) is nontrivial.
For such a family, Σ q(xᵢ) is a sufficient statistic.

BOOTSTRAP METHOD

A method for estimating the standard error of an estimator. Suppose we have n observations:
1. Select n observations, with replacement, at random. Calculate the statistic α̂.
2. Repeat a total of m times to obtain α̂ᵢ for i = 1, 2, . . . , m.
3. Estimate the standard error of α̂:
  SE(α̂) = √((1/(m − 1)) Σ_{i=1}^{m} (α̂ᵢ − ᾱ)²), where ᾱ = (1/m) Σ_{i=1}^{m} α̂ᵢ.

Hypothesis Testing

HYPOTHESES

Null hypothesis: H₀.  Alternative hypothesis: H₁.

Terminology — the following are the same thing:
  • Probability of Type I error  • Size of critical region  • Significance level  • α = Pr(Reject H₀ | H₀ true)

            | Accept H₀              | Reject H₀
  H₀ true   | 1 − α                  | α = Pr(Type I error)
  H₁ true   | β = Pr(Type II error)  | 1 − β = Power of test

Power = Pr(Reject H₀ | H₁ true)
P-value = Pr(Observations | H₀ true);  P-value = 2 Pr(Observations | H₀ true) for a two-tailed test.
Reject H₀ if:
  • p-value ≤ α
  • the test statistic is beyond the critical point (the observations fall within the critical region).
Note that as α increases → power increases → β decreases. One can decrease both α and β by increasing the sample size.

Distributions:
  Normal: X ∼ N(µ, σ²)  →  (X − µ)/σ ∼ N(0, 1)
  χ²: Σ_{i=1}^{n} ((Xᵢ − µ)/σ)² ∼ χ²(n);  Σ_{i=1}^{n} ((Xᵢ − X̄)/σ)² ∼ χ²(n − 1);  S²(n − 1)/σ² ∼ χ²(n − 1)
  Student's t: N(0,1)/√(χ²(n)/n) ∼ t(n)  →  (X̄ − µ)/√(S²/n) ∼ t(n − 1)
  F: (χ²(n₁)/n₁)/(χ²(n₂)/n₂) ∼ F(n₁, n₂)  →  (S²_X/σ²_X)/(S²_Y/σ²_Y) ∼ F(n_X − 1, n_Y − 1)
  Large n: (X̄ − µ)/√(S²/n) ∼ N(0, 1) approximately.

HYPOTHESIS TESTS FOR MEANS, PROPORTIONS AND VARIANCES

µ = µ₀, X ∼ N(µ, σ²), σ² known:
  z = (x̄ − µ₀)/√(σ²/n);  CI: x̄ ± z √(σ²/n)
µ = µ₀, X ∼ N(µ, σ²), σ² unknown:
  t = (x̄ − µ₀)/√(s²/n);  CI: x̄ ± t √(s²/n);  DOF = n − 1
µ_X − µ_Y = 0, X ∼ N(µ_X, σ²_X), Y ∼ N(µ_Y, σ²_Y), σ²_X and σ²_Y known:
  z = (x̄ − ȳ)/√(σ²_x/n_x + σ²_y/n_y);  CI: (x̄ − ȳ) ± z √(σ²_x/n_x + σ²_y/n_y)
µ_X − µ_Y = 0, σ²_X and σ²_Y unknown but σ²_X = σ²_Y:
  t = (x̄ − ȳ)/√(s²(1/n_x + 1/n_y)), where s² = ((n_x − 1)s²_x + (n_y − 1)s²_y)/(n_x + n_y − 2)
  CI: (x̄ − ȳ) ± t √(s²(1/n_x + 1/n_y));  DOF = n_x + n_y − 2
p = p₀, X ∼ Bin(n, p):
  z = (p̂ − p₀)/√(p₀(1 − p₀)/n);  CI: p̂ ± z √(p̂(1 − p̂)/n)
p_X − p_Y = 0, X ∼ Bin(n_X, p_X), Y ∼ Bin(n_Y, p_Y):
  z = (p̂_x − p̂_y)/√(p̃(1 − p̃)(1/n_x + 1/n_y)), where p̃ = (x + y)/(n_x + n_y)
  CI: (p̂_x − p̂_y) ± z √(p̂_x(1 − p̂_x)/n_x + p̂_y(1 − p̂_y)/n_y)
σ² = σ₀², X ∼ N(µ, σ²), µ unknown:
  χ² = s²(n − 1)/σ₀²;  CI: (s²(n − 1)/χ²_{n−1,1−α/2}, s²(n − 1)/χ²_{n−1,α/2});  DOF = n − 1
σ²_X/σ²_Y = 1, µ_X and µ_Y unknown:
  F = s²_x/s²_y;  CI: ((1/F_{n_x−1,n_y−1,1−α/2}) s²_x/s²_y, (1/F_{n_x−1,n_y−1,α/2}) s²_x/s²_y);  DOF = (n_x − 1, n_y − 1)
  Note that 1/F_{n_x−1,n_y−1,1−α/2} = F_{n_y−1,n_x−1,α/2}.
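The bootstrap recipe above is purely mechanical and easy to code. A minimal sketch (the data and the choice of the mean as the statistic α̂ are illustrative assumptions):

```python
# Sketch of the bootstrap: resample with replacement, recompute the
# statistic m times, and take the standard deviation (made-up data).
import random

random.seed(11)
data = [3.1, 5.4, 2.2, 8.0, 4.5, 6.7, 3.9, 5.0]

def stat(xs):                      # the estimator alpha-hat: here, the mean
    return sum(xs) / len(xs)

m = 2000
alphas = []
for _ in range(m):
    resample = [random.choice(data) for _ in data]   # step 1
    alphas.append(stat(resample))                    # step 2

abar = sum(alphas) / m
se = (sum((a - abar) ** 2 for a in alphas) / (m - 1)) ** 0.5   # step 3
print("bootstrap SE of the mean:", se)
```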
KOLMOGOROV-SMIRNOV TEST

Null hypothesis: the parametric model fits its data well.
Example: observations are 1.6, 1.6, 1.7 and 1.9. The parametric CDF is F_θ(x) = x²/4.

  x   | F_e(x⁻) – F_e(x) | F_θ(x) | Max difference
  1.6 | 0.00 – 0.50      | 0.64   | 0.64
  1.7 | 0.50 – 0.75      | 0.72   | 0.22
  1.9 | 0.75 – 1.00      | 0.90   | 0.15

So, D = 0.64. 1-tailed test. Reject H₀ if D > c.

CHI-SQUARE TEST

Null hypothesis: the parametric model fits its data well / the mean is the same across categories.
Suppose there are k categories. Test statistic:
  Q = Σ_{i=1}^{k} (Oᵢ − Eᵢ)²/Eᵢ ∼ χ²(k − 1)
where Oᵢ is the observed value for category i and Eᵢ is the expected value for category i.
1-tailed test. Reject H₀ if Q > c.
Note: subtract an additional 1 degree of freedom for each parameter fitted from the data.

Suppose there are k₁ × k₂ categories. Test statistic:
  Q = Σ_{j=1}^{k₂} Σ_{i=1}^{k₁} (O_ij − E_ij)²/E_ij ∼ χ²((k₁ − 1)(k₂ − 1))
1-tailed test. Reject H₀ if Q > c.

Q-Q PLOT

QQ plot: x-coordinates are quantiles of the fitted distribution; y-coordinates are quantiles of the observations.

SCORE TEST

Suppose X follows a parametric distribution with parameter θ.
Loglikelihood function: l(θ)
Score: U(θ) = l′(θ);  U(θ̂) = 0  →  the MLE is θ̂.
Information: I = −E[U′(θ)] = −E[l′′(θ)]  →  Var(U(θ)) = I
H₀: θ = θ₀ vs H₁: θ ≠ θ₀
Test statistic: U(θ₀)/√(Var(U(θ₀))) ∼ N(0, 1)

LIKELIHOOD RATIO TEST

H₀: θ = θ₀ vs H₁: θ ≠ θ₀
  R = L(X | θ₀)/L(X | θ_MLE), where θ_MLE is the maximum likelihood estimate of θ.
Test statistic: −2 log R ∼ χ²(1)
1-tailed test. Reject H₀ if −2 log R > c.
The DOF depends on the number of constraints in H₀ and H₁.
Properties:
  Var(θ̂) = I⁻¹;  Var(θ̂) can be estimated using θ̂.
  CI for θ: θ̂ ± z √(Var(θ̂))

Normal Linear Model

NORMAL LINEAR MODEL

Normal linear model: Y = β₁ + β₂x₂ + · · · + β_p x_p + ε
  Y is the response variable; the x's are the explanatory variables; ε ∼ N(0, σ²).
  E[Y] = β₁ + β₂x₂ + · · · + β_p x_p
  Var(Y) = Var(ε) = σ²
Types of variables: continuous variables, count variables, and categorical variables (the base category coefficient is 0).
Box-Cox transformation: Y* = (Y^λ − 1)/λ for λ ≠ 0;  log Y for λ = 0.
  Thus, Y* = β₁ + β₂x₂ + · · · + β_p x_p + ε.

ESTIMATING PARAMETERS

Define the following matrices: we have n observations of p explanatory variables, the n × p matrix x with entries x_ij, corresponding to n observations of the response variable, y = (y₁, . . . , yₙ)ᵀ. We want to estimate p coefficients β = (β₁, . . . , β_p)ᵀ; the estimates are b = (b₁, . . . , b_p)ᵀ.

Linear regression for the "full" model:
Model: E[Y] = xβ
To minimize: Σ (y − xb)²
Estimates: b = (xᵀx)⁻¹(xᵀy)
Fitted values: ŷ = xb
Properties:
  E[b] = β,  VCOV(b) = I⁻¹ = σ²(xᵀx)⁻¹ (the inverse of the information matrix)
  E[ŷ] = xβ,  Var(ŷ) = σ²H, where H = x(xᵀx)⁻¹xᵀ is the hat matrix.

Linear regression for the two-variable model:
Model: E[Y] = β₁ + β₂x
To minimize: Σ_{i=1}^{n} (yᵢ − (b₁ + b₂xᵢ))²
Estimates:
  b₂ = (Σxᵢyᵢ − n x̄ȳ)/(Σxᵢ² − n x̄²) = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)²
  b₁ = ȳ − b₂x̄
Fitted values: ŷ = b₁ + b₂x
Properties:
  E[b₁] = β₁,  Var(b₁) = σ²(1/n + x̄²/Σ(xᵢ − x̄)²)
  E[b₂] = β₂,  Var(b₂) = σ²(1/Σ(xᵢ − x̄)²)
  Cov(b₁, b₂) = σ²(−x̄/Σ(xᵢ − x̄)²)
  E[ŷᵢ] = β₁ + β₂xᵢ,  Var(ŷᵢ) = σ²(1/n + (xᵢ − x̄)²/Σ_{k=1}^{n}(x_k − x̄)²)

MEASURES OF FIT

ANOVA table:
  Source     | Sum of Squares          | DF    | Mean square
  Regression | SSR = Σ(ŷᵢ − ȳ)²       | p − 1 | MSR = SSR/(p − 1)
  Error      | SSE = Σ(yᵢ − ŷᵢ)²      | n − p | MSE = SSE/(n − p)
  Total      | SST = Σ(yᵢ − ȳ)²       | n − 1 | MST = SST/(n − 1)

Error sum of squares: SSE = Σ(yᵢ − ŷᵢ)² = (y − ŷ)ᵀ(y − ŷ) = yᵀy − bᵀxᵀy
SSE is also called the residual sum of squares, RSS = Σ ε̂ᵢ², where ε̂ᵢ = yᵢ − ŷᵢ is the residual.
SSE is also called the scaled deviance, σ²D, where D is the deviance.
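The matrix formulas above can be checked on a toy data set (made up for illustration; numpy is assumed available). The sketch solves the normal equations b = (xᵀx)⁻¹xᵀy and verifies the ANOVA identity SSE + SSR = SST:

```python
# Sketch: b = (x'x)^{-1} x'y and the ANOVA sums of squares (toy data).
import numpy as np

x = np.array([[1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0], [1, 5.0]])  # intercept + one predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b = np.linalg.solve(x.T @ x, x.T @ y)    # estimates
yhat = x @ b                             # fitted values

sse = np.sum((y - yhat) ** 2)
ssr = np.sum((yhat - y.mean()) ** 2)
sst = np.sum((y - y.mean()) ** 2)
n, p = x.shape
print("b =", b)
print("SSE + SSR = SST:", sse + ssr, sst)
print("R^2 =", ssr / sst, " s^2 = MSE =", sse / (n - p))
```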
Standard error of the regression:
  s = √MSE = √(SSE/(n − p)) = √(Σ_{i=1}^{n}(yᵢ − ŷᵢ)²/(n − p))
  s² is an unbiased estimator of σ² = Var(Y) = Var(ε). s is also called the residual standard error.

Coefficient of determination:
  R² = SSR/SST = 1 − SSE/SST
  R² is the proportion of variation in y that is explained by variation in x.
  For the full model: R² = (Corr(y, ŷ))².  For the two-variable model: R² = (Corr(y, x))².
  In general, the higher the R², the better the model fits your data. However, adding more variables to the model always increases R². Also, R² cannot be evaluated statistically.

t test:
  H₀: βⱼ = β vs H₁: βⱼ ≠ β
  Test statistic: t = (bⱼ − β)/√(V̂ar(bⱼ)) ∼ t(n − p), where V̂COV(b) = s²(xᵀx)⁻¹
  CI for βⱼ: bⱼ ± t_{n−p} √(V̂ar(bⱼ))
  If σ² is known, then use (bⱼ − β)/√(Var(bⱼ)) ∼ N(0, 1).

F test (minimal model):
  H₀: minimal model, with β₂ = · · · = β_p = 0.
  Test statistic: F = ((SST − SSE)/(p − 1))/(SSE/(n − p)) = (SSR/(p − 1))/(SSE/(n − p)) = (R²/(p − 1))/((1 − R²)/(n − p)) ∼ F(p − 1, n − p)
  1-tailed test. Reject H₀ if F > F_{p−1,n−p,1−α}.
  For the two-variable model, the F statistic is the square of the t statistic for β₂ = 0.
  That is: F = ((b₂ − 0)/√(V̂ar(b₂)))² = b₂²/V̂ar(b₂) ∼ F(1, n − 2)

F test (reduced model):
  H₀: reduced model, with q parameters equal to 0.
  Test statistic: F = ((SSE_R − SSE_F)/q)/(SSE_F/(n − p)) = ((R_F² − R_R²)/q)/((1 − R_F²)/(n − p)) ∼ F(q, n − p)
  1-tailed test. Reject H₀ if F > F_{q,n−p,1−α}.
  DOF of SSE_R is n − (p − q); DOF of SSE_F is n − p. Thus, DOF of SSE_R − SSE_F is (n − (p − q)) − (n − p) = q.

Collinearity of explanatory variables:
  VIF = 1/(1 − R²_(j)), where R²_(j) is the coefficient of determination from the regression of xⱼ = Σ_{k≠j} βₖxₖ.
  The larger the VIF, the more collinear the variable j is.

ANOVA & ANCOVA

Here, ANOVA refers to the normal linear model with categorical variables. In ANCOVA, continuous variables are added to the model.
Note that if the model has an intercept, the base coefficient of a categorical variable is 0.

One-factor ANOVA:
One is given observations of J treatments. The numbers of observations are n₁, n₂, . . . , n_J; the total number of observations is n = n₁ + · · · + n_J.
The full model is: Y = µⱼ + ε, j = 1, 2, . . . , J.  (µ₁ ≠ 0 is the base parameter.)

  Source                         | Sum of Squares                     | DF    | Mean square
  Treatment (between treatment)  | SStr = Σ_j Σ_i (ȳⱼ − ȳ)²          | J − 1 | MStr = SStr/(J − 1)
  Error (within treatment)       | SSE = Σ_j Σ_i (y_ij − ȳⱼ)²        | n − J | MSE = SSE/(n − J)
  Total                          | SST = Σ_j Σ_i (y_ij − ȳ)²         | n − 1 | MST = SST/(n − 1)

H₀: Y = µ + ε (reduced model with J − 1 dummy variables removed). Test statistic:
  F = ((SSE_R − SSE_F)/(J − 1))/(SSE_F/(n − J)) = (SStr/(J − 1))/(SSE_F/(n − J)) ∼ F(J − 1, n − J)

Two-factor ANOVA without replication:
There are J treatments applied to K blocks, so there is a total of n = JK observations.
The full model is: Y = µ + αⱼ + βₖ + ε,  j = 1, 2, . . . , J;  k = 1, 2, . . . , K.

  Source       | Sum of Squares               | DF             | Mean square
  Treatments α | SStr = Σ_j Σ_k (ȳⱼ − ȳ)²    | J − 1          | MStr = SStr/(J − 1)
  Blocks β     | SSb = Σ_j Σ_k (ȳₖ − ȳ)²     | K − 1          | MSb = SSb/(K − 1)
  Error        | SSE_F = SST − SStr − SSb     | (J − 1)(K − 1) | MSE = SSE_F/((J − 1)(K − 1))
  Total        | SST = Σ_j Σ_k (y_jk − ȳ)²   | JK − 1         | MST = SST/(JK − 1)

H₀: Y = µ + βₖ + ε (reduced model with J − 1 dummy variables removed). Test statistic:
  F = ((SSE_R − SSE_F)/(J − 1))/(SSE_F/((J − 1)(K − 1))) = (SStr/(J − 1))/(SSE_F/((J − 1)(K − 1))) ∼ F(J − 1, (J − 1)(K − 1))
H₀: Y = µ + αⱼ + ε (reduced model with K − 1 dummy variables removed). Test statistic:
  F = ((SSE_R − SSE_F)/(K − 1))/(SSE_F/((J − 1)(K − 1))) = (SSb/(K − 1))/(SSE_F/((J − 1)(K − 1))) ∼ F(K − 1, (J − 1)(K − 1))

Two-factor ANOVA with replication:
There are J treatments applied to K blocks, with L replications, so there is a total of n = JKL observations.
The full model is: Y = µ + αⱼ + βₖ + γ_jk + ε.

  Source        | Sum of Squares | DF             | Mean square
  Treatments α  | SStr           | J − 1          | MStr
  Blocks β      | SSb            | K − 1          | MSb
  Interaction γ | SSint          | (J − 1)(K − 1) | MSint
  Error         | SSE_F          | JK(L − 1)      | MSE
  Total         | SST            | JKL − 1        | MST

  SSE_F + SStr + SSb + SSint = SST

  Model                      | SSE = Scaled Deviance | Deviance | SSE DF
  Y = µ + αⱼ + βₖ + γ_jk + ε | SSE_F                 | D_F      | JK(L − 1)
  Y = µ + αⱼ + βₖ + ε        | SSE_{α+β}             | D_{α+β}  | JKL − J − K + 1
  Y = µ + αⱼ + ε             | SSE_α                 | D_α      | JKL − J
  Y = µ + βₖ + ε             | SSE_β                 | D_β      | JKL − K
  Y = µ + ε                  | SSE_M                 | D_M      | JKL − 1

  SSE_{α+β} + SStr + SSb = SST;  SSE_α + SStr = SST;  SSE_β + SSb = SST;  SSE_M = SST

H₀: Y = µ + αⱼ + βₖ + ε, compared to the full model. Test statistic:
  F = ((SSE_{α+β} − SSE_F)/((J − 1)(K − 1)))/(SSE_F/(JK(L − 1))) = (SSint/((J − 1)(K − 1)))/(SSE_F/(JK(L − 1))) ∼ F((J − 1)(K − 1), JK(L − 1))
H₀: Y = µ + βₖ + ε, compared to the α + β model. Test statistic:
  F = ((SSE_β − SSE_{α+β})/(J − 1))/(SSE_F/(JK(L − 1))) = (SStr/(J − 1))/(SSE_F/(JK(L − 1))) ∼ F(J − 1, JK(L − 1))
H₀: Y = µ + αⱼ + ε, compared to the α + β model. Test statistic:
  F = ((SSE_α − SSE_{α+β})/(K − 1))/(SSE_F/(JK(L − 1))) = (SSb/(K − 1))/(SSE_F/(JK(L − 1))) ∼ F(K − 1, JK(L − 1))
H₀: Y = µ + ε, compared to the α model. Test statistic:
  F = ((SSE_M − SSE_α)/(J − 1))/(SSE_F/(JK(L − 1))) = (SStr/(J − 1))/(SSE_F/(JK(L − 1))) ∼ F(J − 1, JK(L − 1))  ← this is the same as the second one.

One-factor ANCOVA with a continuous variable added to the model:
The full model is: Y = β₀ + β₁x + β₂ⱼ + ε, where ε ∼ N(0, σ²), j = 1, 2, . . . , J.

  Source     | Sum of Squares | DF              | Mean square
  Regression | SSR            | 1 + (J − 1) = J | MSR
  Error      | SSE            | n − 1 − J       | MSE
  Total      | SST            | n − 1           | MST

H₀: Y = β₀ + β₁x + ε (reduced model with J − 1 dummy variables removed). Test statistic:
  F = ((SSE_R − SSE_F)/(J − 1))/(SSE_F/(n − 1 − J)) ∼ F(J − 1, n − 1 − J)
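A minimal sketch of the one-factor ANOVA F test above (the three treatment groups are made-up data):

```python
# Sketch of the one-factor ANOVA F test (J = 3 made-up treatment groups).
groups = [[4.1, 5.0, 4.4, 4.8],
          [6.2, 5.8, 6.5],
          [5.1, 4.9, 5.5, 5.2, 5.0]]

n = sum(len(g) for g in groups)
J = len(groups)
grand = sum(sum(g) for g in groups) / n

# SStr: between-treatment sum of squares; SSE: within-treatment sum of squares.
sstr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
sse = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)

F = (sstr / (J - 1)) / (sse / (n - J))   # ~ F(J - 1, n - J) under H0
print("F =", F, "with DOF", (J - 1, n - J))
```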
VALIDATION

Residual: ε̂ = y − ŷ  →  ε̂ᵢ = yᵢ − ŷᵢ
Variance:
  VCOV(ε̂) = σ²(I − H)
  Var(ε̂ᵢ) = σ²(1 − h_ii)  →  V̂ar(ε̂ᵢ) = s²(1 − h_ii)
  Cov(ε̂ᵢ, ε̂ⱼ) = σ²(0 − h_ij)
Standardized residual: rᵢ = ε̂ᵢ/(s√(1 − h_ii))

We need to validate model assumptions by checking the pattern of residuals:
1. Response is linear: plot ŷ or ε̂ against each xⱼ.
2. Response is normal: plot r against Z ∼ N(0, 1).
3. Response has constant variance (homoscedasticity): plot r or ε̂ against ŷ.
4. Observations are independent: plot ε̂ in order.

Influential points: some observations may have a strong influence on the predicted value of y.
Outliers: outliers have unusually high/low residuals.
  |rᵢ| = |ε̂ᵢ/(s√(1 − h_ii))| > 2 or 3  →  outlier
Leverage:
  For the two-variable model: h_ii = 1/n + (xᵢ − x̄)²/Σ_{k=1}^{n}(xₖ − x̄)²
  h_ii > 2(p/n) or 3(p/n)  →  influential point
Cook's distance: Dᵢ = rᵢ² (h_ii/(1 − h_ii)) (1/p);  Dᵢ > 1 → influential point
DFITS: DFITSᵢ = rᵢ (h_ii/(1 − h_ii))^{1/2}

PREDICTIONS

Two-variable model: y* = β₁ + β₂x* + ε
  E[y*] = β₁ + β₂x*,  Var(y*) = σ²
Predicted value: ŷ* = b₁ + b₂x*
  E[ŷ*] = β₁ + β₂x*,  Var(ŷ*) = σ²(1/n + (x* − x̄)²/Σ_{i=1}^{n}(xᵢ − x̄)²)
Confidence interval for E[y*]: ŷ* ± t_{n−2} √(s²(1/n + (x* − x̄)²/Σ(xᵢ − x̄)²))
Prediction interval for y*: ŷ* ± t_{n−2} √(s²(1 + 1/n + (x* − x̄)²/Σ(xᵢ − x̄)²))
Note that we use ŷ* to predict the mean E[y*], but the actual value y* is normally distributed around the mean. Thus, the PI is wider than the CI.

SUBSET SELECTION

Suppose there are k possible predictors:
• Best subset selection: for each number of predictors, find the best candidate. Then, select the best among these candidates. There is a total of 2^k fitted models.
• Forward stepwise selection: begin with the minimal model. Each time, add the best predictor to the model. Then, select the best among these candidates. There is a total of 1 + k(k + 1)/2 fitted models.
• Backward stepwise selection: begin with the full model. Each time, remove the worst predictor from the model. Then, select the best among these candidates. There is a total of 1 + k(k + 1)/2 fitted models.

Models with the same number of predictors:
• The one with the lower SSE is better.
• The one with the higher R² is better.
Models with different numbers of predictors:
• The one with lower Mallow's C_p is better:  C_p = (1/n)(SSE_S + 2(p − 1)σ̂²)
• The one with higher adjusted R² is better:  Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p) = 1 − (SSE/(n − p))/(SST/(n − 1))
• The one with lower AIC/BIC is better:
  Alternative AIC = (1/(nσ̂²))(SSE_S + 2(p − 1)σ̂²)
  Alternative BIC = (1/(nσ̂²))(SSE_S + (log n)(p − 1)σ̂²)
• The one with lower cross-validation error is better.
SSE_S is the SSE of the subset model. σ² = Var(ε) can be estimated using s² = MSE of the full model. p is the number of parameters including the intercept.

Cross-validation:
Leave One Out Cross-Validation (LOOCV) = CV(n) = (1/n) Σ_{i=1}^{n} MSEᵢ
  One observation as test data, the rest as training data. There is a total of n fits. MSEᵢ is calculated using the test data.
  LOOCV minimizes bias but maximizes variance.
  For the 2-variable model: CV(n) = (1/n) Σ_{i=1}^{n} (ε̂ᵢ/(1 − h_ii))²
k-fold cross-validation = CV(k) = (1/k) Σ_{i=1}^{k} MSEᵢ
  The data is randomly partitioned into k subsets, each subset serving in turn as test data, the rest as training data. There is a total of k fits. MSEᵢ is calculated using the test data.
  k-fold CV has higher bias but lower variance. k-fold CV has a computational advantage over LOOCV.

SHRINKAGE AND DIMENSION REDUCTION METHODS

Shrinkage methods:
Ridge regression and the lasso apply a specific penalty function to the sum of squared differences. Minimizing the modified function results in shrinking the coefficients of less important variables.

Ridge regression:
  Minimize Σ_{i=1}^{n} (yᵢ − β₁ − Σ_{j=2}^{p} βⱼx_ij)² + λ Σ_{j=2}^{p} βⱼ²
  Or minimize Σ_{i=1}^{n} (yᵢ − β₁ − Σ_{j=2}^{p} βⱼx_ij)² subject to Σ_{j=2}^{p} βⱼ² ≤ s.
  Ridge regression shrinks the coefficients but does not set them equal to 0. Thus all variables are left in the regression, but the less important ones have small coefficients.
The lasso:
  Minimize Σ_{i=1}^{n} (yᵢ − β₁ − Σ_{j=2}^{p} βⱼx_ij)² + λ Σ_{j=2}^{p} |βⱼ|
  Or minimize Σ_{i=1}^{n} (yᵢ − β₁ − Σ_{j=2}^{p} βⱼx_ij)² subject to Σ_{j=2}^{p} |βⱼ| ≤ s.
  The lasso shrinks the coefficients and forces some of them to equal 0. It drops the less important variables from the model. In other words, the lasso performs feature selection.
λ is a tuning parameter selected using cross-validation. As λ increases → the βⱼ decrease → bias increases, variance decreases.
Both ridge regression and the lasso can decrease the MSE of the estimate on the test data.

Dimension reduction methods:
PCR and PLS create new variables that are linear combinations (but not necessarily weighted averages) of the original variables. These new variables capture the most important information from the original variables.
  Principal components regression: unsupervised method; higher bias, lower variance.
  Partial least squares: supervised method; lower bias, higher variance.

EXTENSIONS TO THE LINEAR MODEL

Basis functions: Y = β₀ + Σⱼ βⱼbⱼ(x) + ε,  e.g., bⱼ(x) = x^j or bⱼ(x) = I{cⱼ ≤ x < cⱼ₊₁}.

Splines:
  Regression splines have a smaller number of knots and are fitted with least squares.
  Cubic splines match function values, first derivatives, and second derivatives at the knots.
  Natural splines have second derivative equal to 0 at the endpoints.
  Smoothing splines g(x) have knots at each point of the training data and are fitted by minimizing:
    Σ_{i=1}^{n} (yᵢ − g(xᵢ))² + λ ∫ g′′(t)² dt
  where λ is again a tuning parameter selected using cross-validation.
  When λ = 0 → the smoothing spline is the natural spline, with n effective DOF.
  When λ = ∞ → it is the least squares regression line, with 2 effective DOF.

Local regression:
The value of the predictor variable ŷ is calculated using a different linear regression at each point x:
1. Select 0 ≤ s ≤ 1, the span. This determines the number of points to use for each regression.
2. Assign weights to these points. The point furthest away gets a weight of 0.
3. Perform a weighted regression of y on x. The regression may be constant, linear, or quadratic.

Generalized Additive Models:
We can generalize to models with any number of predictors:
  Y = β₀ + f₁(x₁) + f₂(x₂) + · · · + f_{p−1}(x_{p−1}) + ε
The model is additive in that there is no interaction between the predictors.
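The ridge penalty above has a closed-form solution, which makes its shrinkage behavior easy to demonstrate. A minimal sketch (simulated data; numpy assumed; the choice λ = 5 and leaving the intercept unpenalized are illustrative assumptions):

```python
# Sketch: ridge estimates via the penalized normal equations
# b_ridge = (x'x + lambda*I)^{-1} x'y, with the intercept left unpenalized.
import numpy as np

rng = np.random.default_rng(0)
x = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
beta = np.array([1.0, 2.0, 0.0, -1.0])
y = x @ beta + rng.normal(scale=0.5, size=50)

lam = 5.0
penalty = lam * np.eye(4)
penalty[0, 0] = 0.0                  # do not shrink beta_1 (the intercept)
b_ridge = np.linalg.solve(x.T @ x + penalty, x.T @ y)
b_ols = np.linalg.solve(x.T @ x, x.T @ y)
print("OLS:  ", b_ols)
print("ridge:", b_ridge)             # coefficients shrink toward 0, none exactly 0
```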
Generalized Linear Models

GENERALIZED LINEAR MODELS

Generalized linear model: g(µ) = β₁ + β₂x₂ + · · · + β_p x_p
We model µ = E[Y] rather than Y. g is the link function.

Exponential family: Y may have a pdf in the form f(y; θ) = e^{p(θ)q(y) + r(θ) + s(y)}.
The GLM requires q(y) = y, which is called the canonical form.
Canonical link function: the link function that makes the GLM estimates unbiased.

Tweedie distributions: a family of distributions for which Var(Y) = a E[Y]^b.

Members of the exponential family in canonical form:

  Distribution       | Canonical Link Function                | Tweedie power
  Normal             | g(µ) = µ (identity link)               | b = 0
  Poisson            | g(µ) = log µ (log link)                | b = 1
  Exponential/Gamma  | g(µ) = −1/µ (negative inverse link)    | b = 2
  Inverse Gaussian   | g(µ) = 1/µ² (inverse squared link)     | b = 3
  Binomial           | g(µ) = log(µ/(1 − µ)) (logit link)     |
  Negative Binomial  |                                        |

Binomial Response:
There are two categories, Yes or No, with probabilities q and 1 − q. Treat E[Y] = q.
Let η = β₁ + β₂x₂ + · · · + β_p x_p.
  Logit link: log(q/(1 − q)) = η  →  q = e^η/(1 + e^η)
  Probit link: N⁻¹(q) = η  →  q = N(η)
  Complementary log-log link: log(− log(1 − q)) = η  →  q = 1 − e^{−e^η}
Note that for the logit link, the odds of YES are Odds = q/(1 − q) = e^η. Or we can write q = Odds/(1 + Odds).

POISSON RESPONSE

Let Yᵢ ∼ Poisson(λ). Then S = Y₁ + · · · + Yₙ ∼ Poisson(nλ).
With log link: log E[S] = log n + β₁ + β₂x₂ + · · · + β_p x_p
  →  Σ βᵢxᵢ produces an estimate of log λ;  log n is called an offset.

CATEGORICAL RESPONSE

Nominal response: there are multiple categories with no particular order. The probability is qⱼ for category j. Category 1 is the base category.
  Logit link: log(qⱼ/q₁) = ηⱼ = β₁ⱼ + Σ_{i=2}^{p} β_ij xᵢ, for j ≠ 1
  q₁ = 1/(1 + e^{η₂} + e^{η₃} + · · ·),  qⱼ = e^{ηⱼ}/(1 + e^{η₂} + e^{η₃} + · · ·), for j ≠ 1
For a binary explanatory variable x₂, assume all other variables are held constant. The odds ratio of category j to the base category is:
  ORⱼ = Odds_{j,x₂=1}/Odds_{j,x₂=0} = (q_{j,x₂=1}/q_{1,x₂=1})/(q_{j,x₂=0}/q_{1,x₂=0})
      = e^{β₁ⱼ + β₂ⱼ + Σ_{i=3}^{p} β_ij xᵢ} / e^{β₁ⱼ + 0 + Σ_{i=3}^{p} β_ij xᵢ} = e^{β₂ⱼ}
It does not vary with the values of the other explanatory variables.

Ordinal response: there are multiple categories that follow a logical order. The probability is qⱼ for category j. There is no base category.
  Cumulative logit link: log((q₁ + · · · + qⱼ)/(q_{j+1} + · · · + q_J)) = β₁ⱼ + Σ_{i=2}^{p} β_ij xᵢ
  Cumulative logit link (proportional odds model): log((q₁ + · · · + qⱼ)/(q_{j+1} + · · · + q_J)) = β₁ⱼ + Σ_{i=2}^{p} βᵢxᵢ
  Adjacent categories logit link: log(qⱼ/q_{j+1}) = β₁ⱼ + Σ_{i=2}^{p} βᵢxᵢ
  Continuation ratio logit link: log(qⱼ/(q_{j+1} + · · · + q_J)) = β₁ⱼ + Σ_{i=2}^{p} βᵢxᵢ

For the proportional odds model, suppose there are two sets of values of the explanatory variables, {xᵢ = kᵢ} and {xᵢ = gᵢ}. The (cumulative) odds ratio for category j is:
  ORⱼ = C.Odds_{j,xᵢ=kᵢ}/C.Odds_{j,xᵢ=gᵢ}
      = ((q₁ + · · · + qⱼ)/(q_{j+1} + · · · + q_J))|_{xᵢ=kᵢ} ÷ ((q₁ + · · · + qⱼ)/(q_{j+1} + · · · + q_J))|_{xᵢ=gᵢ}
      = e^{β₁ⱼ + Σ βᵢkᵢ}/e^{β₁ⱼ + Σ βᵢgᵢ} = e^{Σ_{i=2}^{p} βᵢ(kᵢ − gᵢ)}
This is independent of category j.

ESTIMATING PARAMETERS

With link function: g(µ) = Σ_{j=1}^{p} βⱼxⱼ;  µ = E[Y], v = Var(Y) (refer to Tweedie distributions).

Method of scoring. Given the initial estimate b⁽⁰⁾:
  g(µ) = xb⁽⁰⁾  →  µ = g⁻¹(xb⁽⁰⁾)
We calculate:
  z = g(µ) + G(y − µ), where G is diagonal with G_ii = (∂g/∂µ)|_{µᵢ}
  w is diagonal with w_ii = 1/((∂g/∂µ)|²_{µᵢ} × vᵢ)
The next estimate: b⁽¹⁾ = (xᵀwx)⁻¹(xᵀwz)
We can write: b⁽¹⁾ = b⁽⁰⁾ + (I⁽⁰⁾)⁻¹U⁽⁰⁾, where
  I = xᵀwx is the information matrix, and
  U, with Uⱼ = Σ_{i=1}^{n} ((yᵢ − µᵢ)/((∂g/∂µ)|_{µᵢ} × vᵢ)) x_ij, is the score vector.

MEASURES OF FIT

Suppose Y follows a parametric distribution with parameter θ. The parameter θ can be estimated using MLE, GLM, etc.
Using MLE, the loglikelihood function is: l = Σ_{i=1}^{n} log f(yᵢ; θ̂). There is only one estimate, θ̂.
Using GLM, the loglikelihood function is: l = Σ_{i=1}^{n} log f(yᵢ; θ̂ᵢ). There are n estimates; each θ̂ᵢ depends on the resulting ŷᵢ.

  Response | Loglikelihood function
  Poisson  | l = Σ_{i=1}^{n} (−λ̂ᵢ + yᵢ log λ̂ᵢ − log yᵢ!)
  Binomial | l = Σ_{i=1}^{n} (log C(mᵢ, yᵢ) + yᵢ log q̂ᵢ + (mᵢ − yᵢ) log(1 − q̂ᵢ))

           | Saturated model (n parameters) | Your model (p parameters) | Minimal model (intercept only)
  Poisson  | Set λ̂ᵢ = yᵢ                    | Set λ̂ᵢ = ŷᵢ               | Set λ̂ᵢ = ȳ
  Binomial | Set q̂ᵢ = yᵢ/mᵢ                 | Set q̂ᵢ = ŷᵢ/mᵢ            | Set q̂ᵢ = Σyᵢ/Σmᵢ

Deviance test:
  H₀: model under consideration with l, with p parameters. H₁: saturated model with l_s, with n parameters.
  Test statistic: D = −2(l − l_s) ∼ χ²(n − p). 1-tailed test. Reject H₀ if D > c.
Likelihood ratio test (minimal model):
  H₀: minimal model with l_m, with only an intercept. H₁: model under consideration with l, with p parameters.
  Test statistic: C = −2(l_m − l) ∼ χ²(p − 1). 1-tailed test. Reject H₀ if C > c.
Likelihood ratio test (constrained model):
  H₀: constrained model with l_c, with q parameters removed from the model under consideration. H₁: model under consideration with l, with p parameters.
  Test statistic: C = −2(l_c − l) ∼ χ²(q). 1-tailed test. Reject H₀ if C > c.
Pseudo R²: = (l_m − l)/l_m = 1 − l/l_m

Pearson chi-square test:
  H₀: no significant difference between the observed and the expected values. H₁: significant difference between the observed and the expected values.
  Poisson: Q = Σ (Oᵢ − Eᵢ)²/Vᵢ = Σ (yᵢ − ŷᵢ)²/ŷᵢ ∼ χ²(n − p)
  Binomial: Q = Σ (Oᵢ − Eᵢ)²/Vᵢ = Σ (yᵢ − ŷᵢ)²/(mᵢq̂ᵢ(1 − q̂ᵢ)) ∼ χ²(n − p), where q̂ᵢ = ŷᵢ/mᵢ
  1-tailed test. Reject H₀ if Q > c.

Wald test:
  H₀: βⱼ = β vs H₁: βⱼ ≠ β
  Test statistic: (bⱼ − β)²/Var(bⱼ) ∼ χ²(1), where VCOV(b) = I⁻¹ = (xᵀwx)⁻¹
  1-tailed test. Reject H₀ if the statistic > c.
  CI for βⱼ: bⱼ ± z √(Var(bⱼ)), since (bⱼ − β)/√(Var(bⱼ)) ∼ N(0, 1) approximately.

Type I/Type III tests:
  Type I tests are sequential, adding one variable (or a group of variables) at a time to the model in a prescribed order.
  Type III tests check one variable (or a group of variables) assuming all other variables are in the model.

Penalized loglikelihood tests:
  AIC = −2l + 2p;  BIC = −2l + p log n.  Lower AIC or BIC → better model.

VALIDATION

Pearson/chi-square residuals: Xᵢ = (yᵢ − ŷᵢ)/√Vᵢ
  Poisson response: Vᵢ = λ̂ᵢ, with λ̂ᵢ = ŷᵢ
  Binomial response: Vᵢ = mᵢq̂ᵢ(1 − q̂ᵢ), with q̂ᵢ = ŷᵢ/mᵢ
Standardized Pearson residuals: rᵢ = Xᵢ/√(1 − h_ii)
Deviance residuals: ±√dᵢ, where D = −2(l − l_s) = Σ_{i=1}^{n} dᵢ is the deviance test statistic.
  Use +√dᵢ if yᵢ − ŷᵢ > 0, and −√dᵢ if yᵢ − ŷᵢ < 0.
Standardized deviance residuals: rᵢ = ±√dᵢ/√(1 − h_ii)
Validation:
  • Standardized residuals can be plotted to check normality, serial correlation, and linearity.
  • High deviance may indicate overdispersion.
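The method of scoring above specializes neatly to a Poisson GLM with log link, where ∂g/∂µ = 1/µ and v = µ, so G_ii = 1/µᵢ and w_ii = µᵢ. A minimal sketch of that iteration (toy data; numpy assumed; this is one possible implementation, not the sheet's own):

```python
# Sketch of the method of scoring for a Poisson GLM with log link:
# G_ii = 1/mu_i, w_ii = mu_i, z = g(mu) + G(y - mu) (toy data).
import numpy as np

x = np.column_stack([np.ones(6), np.array([1., 2., 3., 4., 5., 6.])])
y = np.array([2., 3., 6., 7., 8., 13.])

b = np.zeros(2)                       # initial estimate b^(0)
for _ in range(25):
    eta = x @ b
    mu = np.exp(eta)                  # mu = g^{-1}(x b)
    z = eta + (y - mu) / mu           # working response z = g(mu) + G(y - mu)
    w = np.diag(mu)                   # w_ii = 1/((dg/dmu)^2 v_i) = mu_i
    b = np.linalg.solve(x.T @ w @ x, x.T @ w @ z)   # b^(k+1) = (x'wx)^{-1} x'wz
print("GLM coefficients:", b)
```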