SRM Formula Sheet

Updated 03/13/23
Statistical Learning

Data

Types of Variables
• Response: a variable of primary interest.
• Explanatory: a variable used to study the response variable.
• Count: a quantitative variable usually valid on non-negative integers.
• Continuous: a real-valued quantitative variable.
• Nominal: a categorical (qualitative) variable having categories without a meaningful or logical order.
• Ordinal: a categorical (qualitative) variable having categories with a meaningful or logical order.

Notation
• y: response variable
• x: explanatory variable
• Subscript i: index for observations
• n: no. of observations
• Subscript j: index for variables except response
• p: no. of variables except response
• Superscript T: transpose of a matrix
• Superscript −1: inverse of a matrix
• ε: error term
• f̂, f̂(x): estimate/estimator of f(x)

Modeling Problems

Contrasting Statistical Learning Elements
• Supervised: has a response variable. Unsupervised: no response variable.
• Regression: quantitative response variable. Classification: categorical response variable.
• Parametric: functional form of f specified. Non-parametric: functional form of f not specified.
• Prediction: output of f̂. Inference: comprehension of f.
• Flexibility: f̂'s ability to follow the data. Interpretability: f̂'s ability to be understood.
• Training: observations used to train/obtain f̂. Test: observations not used to train/obtain f̂.

Regression Problems
y = f(x₁, …, xₚ) + ε where E[ε] = 0, so E[y] = f(x₁, …, xₚ).
For fixed inputs x₁, …, xₚ, the test MSE is
  E[(y − f̂(x₁, …, xₚ))²] = Var[f̂(x₁, …, xₚ)] + [Bias(f̂(x₁, …, xₚ))]² + Var[ε]
which can be estimated using (1/n) ∑_{i=1}^n (y_i − ŷ_i)².

Classification Problems
Test error rate = E[I(y ≠ ŷ)],
which can be estimated using (1/n) ∑_{i=1}^n I(y_i ≠ ŷ_i).
Bayes classifier: classify an observation into the category c that maximizes
  Pr(Y = c | X₁ = x₁, …, Xₚ = xₚ).

Key Ideas
• The disadvantage of parametric methods is the danger of choosing a form for f that is not close to the truth.
• The disadvantage of non-parametric methods is the need for an abundance of observations.
• Flexibility and interpretability are typically at odds.
• As flexibility increases, the training MSE (or error rate) decreases, but the test MSE (or error rate) follows a U-shaped pattern.
• Low flexibility leads to a method with low variance and high bias; high flexibility leads to a method with high variance and low bias.
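The sheet itself has no code; as a rough illustration of the U-shaped test MSE above, the following Python sketch fits polynomials of increasing flexibility to toy data. The data-generating function and all names are assumptions for the demo, not from the source.

```python
# Sketch: training MSE falls with flexibility; test MSE follows a U-shape.
import numpy as np

rng = np.random.default_rng(0)

def f(x):  # assumed "true" f for the demo
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 50)
y_train = f(x_train) + rng.normal(0, 0.3, 50)
x_test = rng.uniform(0, 1, 200)
y_test = f(x_test) + rng.normal(0, 0.3, 200)

for degree in (1, 3, 10):  # flexibility increases with polynomial degree
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```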
Linear Models

Estimation - Ordinary Least Squares (OLS)

Multiple Linear Regression (MLR)
y = β₀ + β₁x₁ + ⋯ + βₚxₚ + ε
ŷ = b₀ + b₁x₁ + ⋯ + bₚxₚ

Simple Linear Regression (SLR)
Special case of MLR where p = 1.

Notation
• βⱼ: the jth regression coefficient
• bⱼ: estimate of βⱼ
• σ²: variance of response, irreducible error
• s²: estimate of σ², MSE
• X: design matrix
• H: hat matrix
• e: residual
• TSS: total sum of squares
• RSS: regression sum of squares
• ESS: error sum of squares
• se: estimated standard error
• H₀: null hypothesis; H₁: alternative hypothesis
• df: degrees of freedom
• t_{1−α/2, df}: quantile of a t-distribution
• α: significance level; 1 − α: confidence level
• ndf: numerator degrees of freedom; ddf: denominator degrees of freedom
• F_{1−α, ndf, ddf}: quantile of an F-distribution
• y_{n+1}: response of a new observation
• Reduced model: model restricted under H₀; full model: unrestricted model

Assumptions
1. y_i = β₀ + β₁x_{i,1} + ⋯ + βₚx_{i,p} + ε_i with E[ε_i] = 0
2. Var[ε_i] = σ²
3. ε_i's are independent
4. ε_i's are normally distributed
5. x_{i,j}'s are non-random
6. No predictor x_j is a linear combination of the other predictors

SLR Estimation
b₁ = ∑_{i=1}^n (x_i − x̄)(y_i − ȳ) / ∑_{i=1}^n (x_i − x̄)²
b₀ = ȳ − b₁x̄

SLR Inferences - Standard Errors
se(b₁) = √[ s² / ∑_{i=1}^n (x_i − x̄)² ]
se(b₀) = √[ s² (1/n + x̄² / ∑_{i=1}^n (x_i − x̄)²) ]
se(ŷ) = √[ s² (1/n + (x − x̄)² / ∑_{i=1}^n (x_i − x̄)²) ]  (estimator for E[y])
se(ŷ_{n+1}) = √[ s² (1 + 1/n + (x_{n+1} − x̄)² / ∑_{i=1}^n (x_i − x̄)²) ]  (new observation)

MLR Estimation and Other Numerical Results
b = (XᵀX)⁻¹Xᵀy
ŷ = Xb = Hy, where H = X(XᵀX)⁻¹Xᵀ
e = y − ŷ
TSS = ∑_{i=1}^n (y_i − ȳ)² = total variability
RSS = ∑_{i=1}^n (ŷ_i − ȳ)² = explained variability
ESS = ∑_{i=1}^n (y_i − ŷ_i)² = unexplained variability
TSS = RSS + ESS
s² = ESS / (n − p − 1); residual standard error = s
R² = RSS / TSS = 1 − ESS / TSS
Rₐ² = 1 − (1 − R²)(n − 1)/(n − p − 1)

MLR Inferences - Standard Errors and Variance-Covariance Matrix
V̂ar[b] = s²(XᵀX)⁻¹, the matrix whose diagonal entries are V̂ar[b₀], V̂ar[b₁], …, V̂ar[bₚ] and whose (j, k) off-diagonal entries are Ĉov[b_j, b_k]
se(b_j) = √V̂ar[b_j]

t Tests
t statistic = (estimate − hypothesized value) / standard error
H₀: βⱼ = hypothesized value, for j = 0, 1, …, p
Rejection regions:
• Two-tailed: |t statistic| ≥ t_{1−α/2, n−p−1}
• Left-tailed: t statistic ≤ −t_{1−α, n−p−1}
• Right-tailed: t statistic ≥ t_{1−α, n−p−1}

F Tests
H₀: β₁ = β₂ = ⋯ = βₚ = 0
F statistic = (RSS / p) / (ESS / (n − p − 1))
Reject H₀ if F statistic ≥ F_{1−α, ndf, ddf} with
• ndf = p
• ddf = n − p − 1

Key Ideas
• R² is a poor measure for model comparison because it will increase simply by adding more predictors to a model.
• Polynomials do not change consistently by unit increases of their variable, i.e. no constant slope.
• Only w − 1 dummy variables are needed to represent w classes of a categorical predictor; one of the classes acts as a baseline.
• In effect, dummy variables define a distinct intercept for each class. Without the interaction between a dummy variable and a predictor, the dummy variable cannot additionally affect that predictor's regression coefficient.
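As a minimal sketch of the matrix formulas above (toy data and all names are assumptions), the OLS estimates, ANOVA quantities, and coefficient standard errors can be computed directly:

```python
# Sketch: b = (X'X)^{-1} X'y, sums of squares, s^2, R^2, and se(b_j).
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 0.4, n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS coefficient estimates
y_hat = X @ b
e = y - y_hat                           # residuals
tss = np.sum((y - y.mean()) ** 2)
ess = np.sum(e ** 2)                    # error sum of squares
s2 = ess / (n - p - 1)                  # estimate of sigma^2 (MSE)
r2 = 1 - ess / tss
se_b = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # standard errors
print(np.round(b, 3), round(r2, 3), np.round(se_b, 3))
```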
Partial F Tests
H₀: some of the βⱼ's = 0 (the reduced model is adequate)
F statistic = [(ESS_reduced − ESS_full) / (no. of β's set to 0)] / [ESS_full / (n − p − 1)]
Reject H₀ if F statistic ≥ F_{1−α, ndf, ddf} with
• ndf = no. of β's set to 0 in H₀
• ddf = n − p − 1
For all hypothesis tests, reject H₀ if p-value ≤ α.

Variance Inflation Factor (VIF)
VIF_j = se(b_j)² · s²_{x_j} (n − 1) / s² = 1 / (1 − R_j²)
where R_j² is the R² from regressing x_j on the other predictors.
• Tolerance is the reciprocal of VIF.
• Frees rule of thumb: any VIF_j ≥ 10 indicates severe collinearity.

Confidence and Prediction Intervals
estimate ± (t quantile)(standard error)
• For βⱼ: b_j ± t_{1−α/2, n−p−1} se(b_j)
• For E[y]: ŷ ± t_{1−α/2, n−p−1} se(ŷ)
• For y_{n+1}: ŷ_{n+1} ± t_{1−α/2, n−p−1} se(ŷ_{n+1})
Linear Model Diagnostics

Leverage
h_i = x_iᵀ(XᵀX)⁻¹x_i, the ith diagonal element of H
For SLR: h_i = 1/n + (x_i − x̄)² / ∑_{i=1}^n (x_i − x̄)²
• 1/n ≤ h_i ≤ 1
• ∑_{i=1}^n h_i = p + 1
• Frees rule of thumb: h_i > 3(p + 1)/n flags a high-leverage point.

Standardized and Studentized Residuals
Standardized: e_{std,i} = e_i / [s √(1 − h_i)]
Studentized: e_{stu,i} = e_i / [s_{(i)} √(1 − h_i)], where s_{(i)} is computed from the fit with observation i deleted.
• Frees rule of thumb: |e_{stu,i}| > 2 flags a potential outlier.

Cook's Distance
D_i = ∑_{j=1}^n (ŷ_j − ŷ_{j(i)})² / [(p + 1)s²] = e_i² h_i / [(p + 1) s² (1 − h_i)²]

Plots of Residuals
• e versus ŷ: residuals are well-behaved if
  o points appear to be randomly scattered,
  o residuals seem to average to 0, and
  o the spread of residuals does not change.
• e versus i: detects dependence of error terms across the order of observations.

Key Ideas
• As realizations of a t-distribution, studentized residuals can help identify outliers.
• When residuals have a larger spread for larger predictions, one solution is to transform the response variable with a concave function.
• There is no universal approach to handling multicollinearity; it is even possible to accept it, such as when there is a suppressor variable. On the other hand, it can be eliminated by using a set of orthogonal predictors.
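A minimal sketch of these diagnostics (the regression data are an assumption) computes leverages from the hat matrix, standardized residuals, and Cook's distance exactly as defined above:

```python
# Sketch: hat matrix leverages, standardized residuals, Cook's distance.
import numpy as np

rng = np.random.default_rng(2)
n, p = 25, 1
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1 + 2 * X[:, 1] + rng.normal(0, 0.5, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
h = np.diag(H)                           # leverages; sum(h) = p + 1
e = y - H @ y                            # residuals
s2 = np.sum(e ** 2) / (n - p - 1)
std_res = e / np.sqrt(s2 * (1 - h))      # standardized residuals
cooks_d = e ** 2 * h / ((p + 1) * s2 * (1 - h) ** 2)  # Cook's distance
print(round(h.sum(), 3), round(np.abs(std_res).max(), 3), round(cooks_d.max(), 3))
```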
Model Selection

Notation
• p: total no. of predictors in consideration
• q: no. of predictors for a specific model
• s²: MSE of the model that uses all predictors
• M_q: the best model with q predictors

Best Subset Selection
1. For q = 0, 1, …, p, fit all models with q predictors. The model with the largest R² is M_q.
2. Choose the best model among M₀, M₁, …, M_p using a selection criterion of choice.

Forward Stepwise Selection
1. Fit all simple linear regression models. The model with the largest R² is M₁.
2. For q = 2, …, p, fit the models that add one of the remaining predictors to M_{q−1}. The model with the largest R² is M_q.
3. Choose the best model among M₀, M₁, …, M_p using a selection criterion of choice.

Backward Stepwise Selection
1. Fit the model with all p predictors, M_p.
2. For q = p − 1, …, 1, fit the models that drop one of the predictors from M_{q+1}. The model with the largest R² is M_q.
3. Choose the best model among M₀, M₁, …, M_p using a selection criterion of choice.

Selection Criteria
• Mallows' C_p: C_p = ESS/s² − n + 2(q + 1)
• Akaike information criterion: AIC = −2 × (maximized log-likelihood) + 2 × (no. of parameters)
• Bayesian information criterion: BIC = −2 × (maximized log-likelihood) + ln(n) × (no. of parameters)
• Adjusted R²
• Cross-validation error

Validation Set
• Randomly splits all available observations into two groups: the training set and the validation set.
• Only the observations in the training set are used to attain the fitted model, and those in the validation set are used to estimate the test MSE.

k-fold Cross-Validation
1. Randomly divide all available observations into k folds.
2. For j = 1, …, k, obtain the jth fit by training with all observations except those in the jth fold.
3. For j = 1, …, k, use ŷ from the jth fit to calculate a test MSE estimate with observations in the jth fold.
4. To calculate the CV error, average the k test MSE estimates in the previous step.

Leave-one-out Cross-Validation (LOOCV)
• Calculate the LOOCV error as a special case of k-fold cross-validation where k = n.
• For MLR:
  CV error = (1/n) ∑_{i=1}^n [(y_i − ŷ_i) / (1 − h_i)]²

Key Ideas on Cross-Validation
• The validation set approach has unstable results and will tend to overestimate the test MSE. The two other approaches mitigate these issues.
• With respect to bias: LOOCV < k-fold CV < validation set.
• With respect to variance: LOOCV > k-fold CV > validation set.
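The k-fold procedure above translates directly into a short sketch (the helper name, fold count, and toy data are assumptions):

```python
# Sketch: k-fold cross-validation error for an OLS fit.
import numpy as np

def kfold_cv_mse(X, y, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    mses = []
    for j in range(k):
        test = folds[j]
        train = np.concatenate([folds[m] for m in range(k) if m != j])
        b = np.linalg.lstsq(X[train], y[train], rcond=None)[0]  # fit w/o fold j
        mses.append(np.mean((y[test] - X[test] @ b) ** 2))      # fold-j test MSE
    return np.mean(mses)                                        # CV error

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, 100)
print(round(kfold_cv_mse(X, y, k=5), 3))
```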
Other Regression Approaches

Standardizing Variables
• A centered variable is the result of subtracting the sample mean from a variable.
• A scaled variable is the result of dividing a variable by its sample standard deviation.
• A standardized variable is the result of first centering a variable, then scaling it.

Weighted Least Squares
Var[ε_i] = σ² / w_i
• Equivalent to running OLS with √w_i · y_i as the response and √w_i · x_i as the predictors, hence minimizing ∑_{i=1}^n w_i (y_i − ŷ_i)².
• b = (XᵀWX)⁻¹XᵀWy, where W is the diagonal matrix of the weights.

Ridge Regression
x₁, …, xₚ are scaled predictors. Coefficients are estimated by minimizing the ESS while constrained by ∑_{j=1}^p b_j² ≤ c, or equivalently by minimizing the expression ESS + λ ∑_{j=1}^p b_j².

Lasso Regression
Coefficients are estimated by minimizing the ESS while constrained by ∑_{j=1}^p |b_j| ≤ c, or equivalently by minimizing the expression ESS + λ ∑_{j=1}^p |b_j|.

Key Ideas on Ridge and Lasso
• λ is inversely related to flexibility.
• With a finite λ, none of the ridge estimates will equal 0, but the lasso estimates could equal 0.

Partial Least Squares
• The first partial least squares direction z₁ is a linear combination of standardized predictors x₁, …, xₚ, with coefficients based on the relation between x_j and y.
• Every subsequent partial least squares direction is calculated iteratively as a linear combination of updated predictors, which are the residuals of fits with the previous predictors explained by the previous direction.
• The directions z₁, …, z_k are used as predictors in a multiple linear regression. The number of directions is a measure of flexibility.

k-Nearest Neighbors (KNN)
1. Identify the center of the neighborhood, i.e. the location of an observation with inputs x₁, …, xₚ.
2. Starting from the center of the neighborhood, identify the k nearest training observations.
3. For classification, ŷ is the most frequent category among the k observations; for regression, ŷ is the average of the response among the k observations.
k is inversely related to flexibility.

Results for Distributions in the Linear Exponential Family
For each distribution: probability function f(y); b(θ); canonical link θ(μ).
• Normal: f(y) = (1/√(2πσ²)) exp[−(y − μ)²/(2σ²)]; b(θ) = θ²/2; link: μ
• Binomial (n fixed): f(y) = C(n, y) πʸ(1 − π)^{n−y}; b(θ) = n ln(1 + e^θ); link: ln[μ/(n − μ)]
• Poisson: f(y) = λʸ e^{−λ} / y!; b(θ) = e^θ; link: ln μ
• Negative binomial (r fixed): f(y) = [Γ(y + r)/(y! Γ(r))] pʳ(1 − p)ʸ; b(θ) = −r ln(1 − e^θ); link: ln[μ/(μ + r)]
• Gamma (α fixed): f(y) = [λ^α / Γ(α)] y^{α−1} e^{−λy}; b(θ) = −α ln(−θ); link: −α/μ
• Inverse Gaussian: f(y) = (2πy³σ²)^{−1/2} exp[−(y − μ)²/(2μ²yσ²)]; b(θ) = −√(−2θ); link: −1/(2μ²)
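Ridge regression has the closed form b_ridge = (XᵀX + λI)⁻¹Xᵀy for the penalized problem above; the sketch below (toy data and λ grid are assumptions) shows the estimates shrinking toward, but never exactly to, zero. The lasso has no closed form and is usually fit by coordinate descent instead.

```python
# Sketch: ridge estimates on standardized predictors for increasing lambda.
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 5
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
y = 2 * X[:, 0] - X[:, 1] + rng.normal(0, 1, n)
y = y - y.mean()                           # center the response

for lam in (0.0, 1.0, 10.0, 100.0):
    b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    print(lam, np.round(b, 3))             # coefficients shrink toward 0
```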
Non-Linear Models

Generalized Linear Models (GLMs)

Notation
• θ, φ: linear exponential family parameters
• E[y] = μ: mean response
• b′(θ): mean function
• v(μ): variance function
• h(μ): link function
• b: maximum likelihood estimate of β
• l(b): maximized log-likelihood
• l₀: maximized log-likelihood for the null model
• l_SAT: maximized log-likelihood for the saturated model
• e: residual
• I: information matrix
• χ²_{1−α, df}: quantile of a chi-square distribution
• D*: scaled deviance
• D: deviance statistic

Linear Exponential Family
Pr or density of y = exp{ [yθ − b(θ)] / φ + S(y, φ) }
E[y] = b′(θ)
Var[y] = φ b″(θ) = φ v(μ)

Model Framework
• h(μ) = xᵀβ
• The canonical link is the link function where h(μ) = (b′)⁻¹(μ).

Parameter Estimation
l(β) = ∑_{i=1}^n { [y_i θ_i − b(θ_i)] / φ + S(y_i, φ) }, where θ_i = (b′)⁻¹(h⁻¹(x_iᵀβ))
The score equations are the partial derivatives of l(β) with respect to each β_j, all set equal to 0. The solution to the score equations is b. Then μ̂ = h⁻¹(xᵀb).

Inference
• Maximum likelihood estimators b asymptotically have a multivariate normal distribution with mean β and asymptotic variance-covariance matrix I⁻¹.
• To address overdispersion, change the variance to Var[y_i] = σ²φ v(μ_i) and estimate σ² as the Pearson chi-square statistic divided by n − p − 1.

Numerical Results
D* = 2[l_SAT − l(b)]
D = φ D*
For MLR, D = ESS.
Pseudo-R² = [l(b) − l₀] / [l_SAT − l₀]
Max-scaled R² = {1 − exp(2[l₀ − l(b)]/n)} / {1 − exp(2l₀/n)}
AIC = −2 l(b) + 2(p + 1)
BIC = −2 l(b) + ln(n) (p + 1)
AIC and BIC as given assume only β needs to be estimated. If estimating φ is required, replace p + 1 with p + 2.

Residuals
Raw residual: e_i = y_i − μ̂_i
Pearson residual: e_i = (y_i − μ̂_i) / √v(μ̂_i)
• The Pearson chi-square statistic is ∑_{i=1}^n e_i².
Deviance residual: e_i = √D_i, whose sign follows the ith raw residual.
Anscombe residual: e_i = { g(y_i) − E[g(y_i)] } / √Var[g(y_i)] for a normalizing transformation g.

Likelihood Ratio Tests
χ² statistic = 2[ l(b_full) − l(b_reduced) ]
H₀: some of the βⱼ's = 0
Reject H₀ if χ² statistic ≥ χ²_{1−α, df} with df = p_full − p_reduced.

Goodness-of-Fit Tests
Used when y follows a distribution of choice with m free parameters whose domain is split into w mutually exclusive intervals.
χ² statistic = ∑_{c=1}^w (observed count_c − expected count_c)² / expected count_c
Reject H₀ if χ² statistic ≥ χ²_{1−α, w−1−m}.

Tweedie Distribution
E[y] = μ, Var[y] = φ μ^ξ, where the power ξ identifies the distribution:
• Normal: ξ = 0
• Poisson: ξ = 1
• Tweedie: ξ ∈ (1, 2)
• Gamma: ξ = 2
• Inverse Gaussian: ξ = 3
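A minimal sketch of the estimation machinery above for a Poisson GLM with log link (simulated data and starting values are assumptions): the score equations ∑ x_i(y_i − μ_i) = 0 are solved by Fisher scoring with information matrix I = ∑ μ_i x_i x_iᵀ, and I⁻¹ supplies the asymptotic standard errors.

```python
# Sketch: Poisson log-link GLM fit by Fisher scoring on the score equations.
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.8])))

beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)                  # inverse of the log link
    score = X.T @ (y - mu)                 # score equations
    info = X.T @ (mu[:, None] * X)         # information matrix I
    beta = beta + np.linalg.solve(info, score)
se = np.sqrt(np.diag(np.linalg.inv(info))) # asymptotic std errors from I^{-1}
print(np.round(beta, 3), np.round(se, 3))
```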
Logistic and Probit Regression
• The odds of an event are the ratio of the probability that the event will occur to the probability that the event will not occur.
• The odds ratio is the ratio of the odds of an event with the presence of a characteristic to the odds of the same event without the presence of that characteristic.

Binary Response Link Functions
• Logit: h(μ) = ln[μ/(1 − μ)]
• Probit: h(μ) = Φ⁻¹(μ)
• Complementary log-log: h(μ) = ln(−ln(1 − μ))

Binary Response Results
l(β) = ∑_{i=1}^n [ y_i ln μ_i + (1 − y_i) ln(1 − μ_i) ]
Score equations (general binary link):
  ∂l(β)/∂β = ∑_{i=1}^n x_i (y_i − μ_i) μ_i′ / [μ_i(1 − μ_i)] = 0
For the logit (canonical) link this simplifies to ∑_{i=1}^n x_i (y_i − μ_i) = 0.
D* = 2 ∑_{i=1}^n { y_i ln(y_i/μ̂_i) + (1 − y_i) ln[(1 − y_i)/(1 − μ̂_i)] }
Pearson residual: e_i = (y_i − μ̂_i) / √[μ̂_i(1 − μ̂_i)]
Pearson chi-square statistic = ∑_{i=1}^n (y_i − μ̂_i)² / [μ̂_i(1 − μ̂_i)]

Nominal Response - Generalized Logit
Let π_{i,c} be the probability that the ith observation is classified as category c. The reference category is k.
ln(π_{i,c}/π_{i,k}) = x_iᵀβ_c
π_{i,c} = exp(x_iᵀβ_c) / [1 + ∑_{u≠k} exp(x_iᵀβ_u)] for c ≠ k
π_{i,k} = 1 / [1 + ∑_{u≠k} exp(x_iᵀβ_u)]
l(β) = ∑_{i=1}^n ∑_c I(y_i = c) ln π_{i,c}

Ordinal Response - Proportional Odds Cumulative Model
Π_c = π₁ + ⋯ + π_c
h(Π_c) = α_c + x_iᵀβ

Poisson Count Regression
ln μ = xᵀβ
l(β) = ∑_{i=1}^n [ y_i ln μ_i − μ_i − ln(y_i!) ]
Score equations: ∑_{i=1}^n x_i (y_i − μ_i) = 0
Information matrix: I = ∑_{i=1}^n μ_i x_i x_iᵀ
D* = 2 ∑_{i=1}^n { y_i [ln(y_i/μ̂_i) − 1] + μ̂_i }
Pearson residual: e_i = (y_i − μ̂_i) / √μ̂_i
Pearson chi-square statistic = ∑_{i=1}^n (y_i − μ̂_i)² / μ̂_i

Poisson Regression with Exposures Model
ln μ = ln w + xᵀβ, where w is the exposure.

Alternative Count Models
These models can incorporate a Poisson distribution while letting the mean of the response differ from the variance of the response:

Model               Mean < Variance   Mean > Variance
Negative binomial   Yes               No
Zero-inflated       Yes               No
Hurdle              Yes               Yes
Heterogeneity       Yes               No
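For the logit model above, a minimal sketch (simulated data are an assumption) fits β by Newton's method and reads off the odds ratio e^{b₁} for a one-unit increase in the predictor:

```python
# Sketch: logistic regression by Newton's method, plus an odds ratio.
import numpy as np

rng = np.random.default_rng(6)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p_true = 1 / (1 + np.exp(-(X @ np.array([-0.3, 1.2]))))
y = rng.binomial(1, p_true)

beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-(X @ beta)))     # pi_i under the logit link
    W = mu * (1 - mu)                      # Var[y_i]
    info = X.T @ (W[:, None] * X)          # information matrix
    beta = beta + np.linalg.solve(info, X.T @ (y - mu))
print(np.round(beta, 3), "odds ratio:", round(np.exp(beta[1]), 3))
```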
Time Series

Trend Models

Notation
• Subscript t: index for observations
• T_t: trends in time
• S_t: seasonal trends
• ε_t: random patterns
• ŷ_{n+l}: l-step ahead forecast
• se: estimated standard error
• t_{1−α/2, df}: quantile of a t-distribution
• n₁: training sample size
• n₂: test sample size

Trends
• Additive: y_t = T_t + S_t + ε_t
• Multiplicative: y_t = T_t × S_t × ε_t

Autoregressive Models

Notation
• ρ_k: lag k autocorrelation
• r_k: lag k sample autocorrelation
• σ²: variance of white noise
• s²: estimate of σ²
• b₀: estimate of β₀; b₁: estimate of β₁
• ȳ₋: sample mean of the first n − 1 observations
• ȳ₊: sample mean of the last n − 1 observations

Autocorrelation
r_k = ∑_{t=k+1}^n (y_{t−k} − ȳ)(y_t − ȳ) / ∑_{t=1}^n (y_t − ȳ)²

Testing Autocorrelation
test statistic = r_k / se(r_k), where se(r_k) = 1/√n
Reject H₀: ρ_k = 0 against H₁: ρ_k ≠ 0 if |test statistic| exceeds the corresponding t quantile.

White Noise
ŷ_{n+l} = ȳ
se(forecast) = s √(1 + 1/n)
100(1 − α)% prediction interval for y_{n+l}:
  ȳ ± t_{1−α/2, n−1} · s √(1 + 1/n)

Random Walk
Differences w_t = y_t − y_{t−1} follow a white noise process.
ŷ_{n+l} = y_n + l w̄
se(forecast) = s_w √l
Approximate 100(1 − α)% prediction interval for y_{n+l}:
  ŷ_{n+l} ± t_{1−α/2, n−2} · s_w √l

Model Comparison
With forecast errors e_t = y_t − ŷ_t over the test set, t = n₁ + 1, …, n₁ + n₂:
• ME = (1/n₂) ∑ e_t
• MPE = (100/n₂) ∑ (e_t / y_t)
• MSE = (1/n₂) ∑ e_t²
• MAE = (1/n₂) ∑ |e_t|
• MAPE = (100/n₂) ∑ |e_t / y_t|

Stationarity
Stationarity describes how something does not vary with respect to time. Control charts can be used to identify stationarity.

AR(1) Model
y_t = β₀ + β₁ y_{t−1} + ε_t
• If β₁ = 0, y_t follows a white noise process.
• If β₁ = 1, y_t follows a random walk process.
• If −1 < β₁ < 1, y_t is stationary.

Properties of a Stationary AR(1) Model
E[y_t] = β₀ / (1 − β₁)
Var[y_t] = σ² / (1 − β₁²)
ρ_k = β₁^k

Estimation
b₁ = ∑_{t=2}^n (y_{t−1} − ȳ₋)(y_t − ȳ₊) / ∑_{t=2}^n (y_{t−1} − ȳ₋)²
b₀ = ȳ₊ − b₁ ȳ₋ ≈ ȳ(1 − b₁)
s² = ∑_{t=2}^n (e_t − ē)² / (n − 3)

Smoothing and Predictions
ŷ_t = b₀ + b₁ y_{t−1}
ŷ_{n+1} = b₀ + b₁ y_n; for l > 1, ŷ_{n+l} = b₀ + b₁ ŷ_{n+l−1}
se(forecast) = s √(1 + b₁² + b₁⁴ + ⋯ + b₁^{2(l−1)})
100(1 − α)% prediction interval for y_{n+l}:
  ŷ_{n+l} ± t_{1−α/2, n−3} · se(forecast)

Other Time Series Models

Notation
• k: moving average length
• w: smoothing parameter
• SB: seasonal base
• m: no. of trigonometric functions

Smoothing with Moving Averages
Model: y_t = β₀ + ε_t
ŝ_t = (y_t + y_{t−1} + ⋯ + y_{t−k+1}) / k
ŝ_t = ŝ_{t−1} + (y_t − y_{t−k}) / k
Predictions: ŷ_{n+l} = ŝ_n

Double Smoothing with Moving Averages
Model: y_t = β₀ + β₁ t + ε_t
ŝ_t^(2) = (ŝ_t + ŝ_{t−1} + ⋯ + ŝ_{t−k+1}) / k
ŝ_t^(2) = ŝ_{t−1}^(2) + (ŝ_t − ŝ_{t−k}) / k
Predictions:
b₀ = 2 ŝ_n − ŝ_n^(2)
b₁ = 2 (ŝ_n − ŝ_n^(2)) / (k − 1)
ŷ_{n+l} = b₀ + b₁ l
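A minimal sketch of the AR(1) estimation and forecasting recursion above (the simulated series is an assumption):

```python
# Sketch: conditional least squares AR(1) fit and l-step ahead forecasts.
import numpy as np

rng = np.random.default_rng(7)
n, beta0, beta1 = 300, 2.0, 0.7
y = np.empty(n)
y[0] = beta0 / (1 - beta1)            # start near the stationary mean
for t in range(1, n):
    y[t] = beta0 + beta1 * y[t - 1] + rng.normal(0, 1)

lagged, current = y[:-1], y[1:]       # means play the roles of ybar- and ybar+
b1 = np.sum((lagged - lagged.mean()) * (current - current.mean())) \
     / np.sum((lagged - lagged.mean()) ** 2)
b0 = current.mean() - b1 * lagged.mean()

forecast = y[-1]
for l in range(1, 4):                 # iterate y-hat_{n+l} = b0 + b1*y-hat_{n+l-1}
    forecast = b0 + b1 * forecast
    print(f"{l}-step ahead forecast: {forecast:.3f}")
```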
Exponential Smoothing
Model: y_t = β₀ + ε_t
Smoothing: ŝ_t = (1 − w)(y_t + w y_{t−1} + ⋯ + wᵗ y₀), 0 < w < 1
Recursively: ŝ_t = (1 − w) y_t + w ŝ_{t−1}
The value of w is determined by minimizing SS(w) = ∑_{t=1}^n (y_t − ŝ_{t−1})².
Predictions: ŷ_{n+l} = ŝ_n

Double Exponential Smoothing
Model: y_t = β₀ + β₁ t + ε_t
Smoothing:
ŝ_t^(1) = (1 − w) y_t + w ŝ_{t−1}^(1)
ŝ_t^(2) = (1 − w) ŝ_t^(1) + w ŝ_{t−1}^(2)
Predictions:
b₀ = 2 ŝ_n^(1) − ŝ_n^(2)
b₁ = [(1 − w)/w] (ŝ_n^(1) − ŝ_n^(2))
ŷ_{n+l} = b₀ + b₁ l

Key Ideas for Smoothing
• Single exponential smoothing is only appropriate for time series data without a linear trend.
• It is related to weighted least squares.
• A double smoothing procedure can be used to forecast time series data with a linear trend.
• Holt-Winter double exponential smoothing is a generalization of the double exponential smoothing.

Seasonal Time Series Models

Fixed Seasonal Effects - Trigonometric Functions
S_t = ∑_{i=1}^m [β_{1,i} sin(f_i t) + β_{2,i} cos(f_i t)], where f_i = 2πi/SB
Predictions: ŷ_{n+l} = Ŝ_{n+l}

Seasonal Autoregressive Models
y_t = β₀ + β₁ y_{t−SB} + ⋯ + βₚ y_{t−p·SB} + ε_t

Holt-Winter Seasonal Additive Model
y_t = β₀ + β₁ t + S_t + ε_t

Unit Root Test
• A unit root test is used to evaluate the fit of a random walk model.
• A random walk model is a good fit if the time series possesses a unit root.
• The Dickey-Fuller test and augmented Dickey-Fuller test are two examples of unit root tests.

Volatility Models

ARCH(p) Model
σ_t² = ω + γ₁ ε_{t−1}² + ⋯ + γₚ ε_{t−p}²
Var[ε_t] = ω / (1 − ∑_{j=1}^p γ_j)
Assumptions: ω > 0, γ_j ≥ 0, and ∑_{j=1}^p γ_j < 1

GARCH(p, q) Model
σ_t² = ω + γ₁ ε_{t−1}² + ⋯ + γₚ ε_{t−p}² + δ₁ σ_{t−1}² + ⋯ + δ_q σ_{t−q}²
Var[ε_t] = ω / (1 − ∑_{j=1}^p γ_j − ∑_{j=1}^q δ_j)
Assumptions: ω > 0, γ_j ≥ 0, δ_j ≥ 0, and ∑ γ_j + ∑ δ_j < 1
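The exponential smoothing recursion and the SS(w) criterion above fit in a few lines (the series and the grid of w values are assumptions):

```python
# Sketch: exponential smoothing with w chosen by minimizing one-step SS(w).
import numpy as np

rng = np.random.default_rng(8)
y = 10 + rng.normal(0, 1, 100)          # series with no linear trend

def smooth(y, w):
    s = np.empty_like(y)
    s[0] = y[0]
    for t in range(1, len(y)):
        s[t] = (1 - w) * y[t] + w * s[t - 1]   # s_t = (1-w) y_t + w s_{t-1}
    return s

best_w = min(np.arange(0.05, 1.0, 0.05),
             key=lambda w: np.sum((y[1:] - smooth(y, w)[:-1]) ** 2))
print("chosen w:", round(best_w, 2), "forecast:", round(smooth(y, best_w)[-1], 3))
```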
Decision Trees

Regression and Classification Trees

Notation
• R_m: region of the predictor space (node m)
• n_m: no. of observations in node m
• n_{m,c}: no. of category c observations in node m
• I_m: impurity
• E_m: classification error rate
• G_m: Gini index
• D_m: cross entropy
• T: subtree
• |T|: no. of terminal nodes in T
• λ: tuning parameter

Algorithm
1. Construct a large tree with b terminal nodes using recursive binary splitting.
2. Obtain a sequence of best subtrees, as a function of λ, using cost complexity pruning.
3. Choose λ by applying k-fold cross-validation. Select the λ that results in the lowest cross-validation error.
4. The best subtree is the subtree created in step 2 with the selected λ value.

Recursive Binary Splitting
• Regression: minimize ∑_{m=1}^{|T|} ∑_{i ∈ R_m} (y_i − ŷ_{R_m})²
• Classification: minimize (1/n) ∑_{m=1}^{|T|} n_m I_m

Cost Complexity Pruning
• Regression: minimize ∑_{m=1}^{|T|} ∑_{i ∈ R_m} (y_i − ŷ_{R_m})² + λ|T|
• Classification: minimize (1/n) ∑_{m=1}^{|T|} n_m I_m + λ|T|

More under Classification
p̂_{m,c} = n_{m,c} / n_m
E_m = 1 − max_c p̂_{m,c}
G_m = ∑_{c=1}^C p̂_{m,c}(1 − p̂_{m,c})
D_m = −∑_{c=1}^C p̂_{m,c} ln p̂_{m,c}
deviance = −2 ∑_{m=1}^{|T|} ∑_{c=1}^C n_{m,c} ln p̂_{m,c}
residual mean deviance = deviance / (n − |T|)

Key Ideas
• Terminal nodes or leaves represent the partitions of the predictor space.
• Internal nodes are points along the tree where splits occur.
• Terminal nodes do not have child nodes, but internal nodes do.
• Branches are lines that connect any two nodes.
• A decision tree with only one internal node is called a stump.

Advantages of Trees
• Easy to interpret and explain
• Can be presented visually
• Manage categorical variables without the need of dummy variables
• Mimic human decision-making

Disadvantages of Trees
• Not robust
• Do not have the same degree of predictive accuracy as other statistical methods

Multiple Trees

Bagging
1. Create b bootstrap samples from the original training dataset.
2. Construct a decision tree for each bootstrap sample using recursive binary splitting.
3. Predict the response of a new observation by averaging the predictions (regression trees) or by using the most frequent category (classification trees) across all b trees.

Properties
• Increasing b does not cause overfitting.
• Bagging reduces variance.
• Out-of-bag error is a valid estimate of test error.

Random Forests
1. Create b bootstrap samples from the original training dataset.
2. Construct a decision tree for each bootstrap sample using recursive binary splitting. At each split, a random subset of k variables is considered.
3. Predict the response of a new observation by averaging the predictions (regression trees) or by using the most frequent category (classification trees) across all b trees.

Properties
• Bagging is a special case of random forests.
• Increasing b does not cause overfitting.
• Decreasing k reduces the correlation between predictions.

Boosting
Let z₁ be the actual response variable y.
1. For k = 1, 2, …, b:
• Use recursive binary splitting to fit a tree with d splits to the data with z_k as the response.
• Update z_k by subtracting λ f̂_k(x), i.e. let z_{k+1} = z_k − λ f̂_k(x).
2. Calculate the boosted model prediction as f̂(x) = λ ∑_{k=1}^b f̂_k(x).

Properties
• Increasing b can cause overfitting.
• Boosting reduces bias.
• d controls the complexity of the boosted model.
• λ controls the rate at which boosting learns.
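As a minimal sketch of random forests and the out-of-bag error above, the snippet below uses scikit-learn (the library choice, data, and settings are assumptions; any comparable implementation works). Setting max_features equal to the total number of predictors would reduce it to bagging, the special case noted above.

```python
# Sketch: random forest with k = max_features variables per split; OOB error.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(9)
X = rng.normal(size=(300, 6))
y = X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(0, 0.5, 300)

rf = RandomForestRegressor(n_estimators=300, max_features=2,  # k per split
                           oob_score=True, random_state=0).fit(X, y)
oob_mse = np.mean((y - rf.oob_prediction_) ** 2)  # out-of-bag test error estimate
print(round(oob_mse, 3))
```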
Unsupervised Learning

Principal Components Analysis

Notation
• z: principal component score
• Subscript m: index for principal components
• φ_{j,m}: principal component loading
• x_j: centered explanatory variable

Principal Components
z_m = ∑_{j=1}^p φ_{j,m} x_j,  z_{i,m} = ∑_{j=1}^p φ_{j,m} x_{i,j}
• ∑_{j=1}^p φ_{j,m}² = 1
• ∑_{j=1}^p φ_{j,m} φ_{j,u} = 0 for m ≠ u

Proportion of Variance Explained (PVE)
Total variance = ∑_{j=1}^p (1/(n − 1)) ∑_{i=1}^n x_{i,j}²
Variance of the mth component = (1/(n − 1)) ∑_{i=1}^n z_{i,m}²
PVE_m = (variance of the mth component) / (total variance)

Key Ideas
• The variance explained by each subsequent principal component is always less than the variance explained by the previous principal component.
• All principal components are uncorrelated with one another.
• A dataset has min(n − 1, p) distinct principal components.
• The first k principal component scores and loadings approximate the original dataset: x_{i,j} ≈ ∑_{m=1}^k z_{i,m} φ_{j,m}.

Principal Components Regression
y = θ₀ + θ₁z₁ + ⋯ + θ_k z_k + ε
• If k = p, then β_j = ∑_{m=1}^p θ_m φ_{j,m}.

Cluster Analysis

Notation
• C: cluster containing indices
• W(C): within-cluster variation of cluster C
• |C|: no. of observations in cluster C

Euclidean distance = √( ∑_{j=1}^p (x_{i,j} − x_{u,j})² )

k-Means Clustering
1. Randomly assign a cluster to each observation. This serves as the initial cluster assignments.
2. Calculate the centroid of each cluster.
3. For each observation, identify the closest centroid and reassign to that cluster.
4. Repeat steps 2 and 3 until the cluster assignments stop changing.
W(C) = (1/|C|) ∑_{i,u ∈ C} ∑_{j=1}^p (x_{i,j} − x_{u,j})² = 2 ∑_{i ∈ C} ∑_{j=1}^p (x_{i,j} − x̄_{C,j})²

Hierarchical Clustering
1. Select the dissimilarity measure and linkage to be used. Treat each observation as its own cluster.
2. For k = n, n − 1, …, 2:
• Compute the inter-cluster dissimilarity between all k clusters.
• Examine all k(k − 1)/2 pairwise dissimilarities. The two clusters with the lowest inter-cluster dissimilarity are fused. The dissimilarity indicates the height in the dendrogram at which these two clusters join.

Linkage: Inter-cluster dissimilarity
• Complete: the largest dissimilarity
• Single: the smallest dissimilarity
• Average: the arithmetic mean
• Centroid: the dissimilarity between the cluster centroids

Key Ideas
• For k-means clustering, the algorithm needs to be repeated for each k.
• For hierarchical clustering, the algorithm only needs to be performed once for any number of clusters.
• The result of clustering depends on many parameters, such as:
  o Choice of k in k-means clustering
  o Choice of number of clusters, linkage, and dissimilarity measure in hierarchical clustering
  o Choice to standardize variables
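A minimal sketch of both techniques above (toy data are assumptions): PVE computed from the singular values of the centered data matrix, followed by a few assign-and-recompute iterations of k-means.

```python
# Sketch: principal component scores and PVE via SVD, then simple k-means.
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(50, 3)) @ np.diag([3.0, 1.0, 0.3])
Xc = X - X.mean(axis=0)                 # centered explanatory variables

U, sing, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                      # principal component scores z_{i,m}
pve = sing ** 2 / np.sum(sing ** 2)     # proportion of variance explained
print(np.round(pve, 3))

k = 2
centroids = Xc[rng.choice(len(Xc), k, replace=False)]
for _ in range(10):                     # k-means: assign, then recompute
    labels = np.argmin(((Xc[:, None] - centroids) ** 2).sum(axis=2), axis=1)
    centroids = np.array([Xc[labels == j].mean(axis=0) for j in range(k)])
print(np.bincount(labels))
```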
© 2023 Coaching Actuaries. All Rights Reserved
www.coachingactuaries.com