SRM Formula Sheet (Updated 03/13/23)
© 2023 Coaching Actuaries. All Rights Reserved. www.coachingactuaries.com

STATISTICAL LEARNING

Data Modeling Problems

Types of Variables
• Response: a variable of primary interest.
• Explanatory: a variable used to study the response variable.
• Count: a quantitative variable, usually valid on the non-negative integers.
• Continuous: a real-valued quantitative variable.
• Nominal: a categorical (qualitative) variable whose categories have no meaningful or logical order.
• Ordinal: a categorical (qualitative) variable whose categories have a meaningful or logical order.

Notation
• $y$: response variable; $x$: explanatory variable.
• Subscript $i$: index for observations; $n$: no. of observations.
• Subscript $j$: index for variables other than the response; $p$: no. of variables other than the response.
• $\mathbf{A}^{T}$: transpose of a matrix; $\mathbf{A}^{-1}$: inverse of a matrix.
• $\varepsilon$: error term.
• A hat denotes an estimate or estimator; $\hat{f}(x)$ is the estimator of $f(x)$.
• Training observations: used to train/obtain $\hat{f}$. Test observations: not used to train/obtain $\hat{f}$.

Contrasting Statistical Learning Elements
• Supervised (has a response variable) vs. unsupervised (no response variable).
• Regression (quantitative response) vs. classification (categorical response).
• Parametric (functional form of $f$ specified) vs. non-parametric (functional form of $f$ not specified).
• Prediction (output of $\hat{f}$) vs. inference (comprehension of $f$).
• Flexibility ($\hat{f}$'s ability to follow the data) vs. interpretability ($\hat{f}$'s ability to be understood).

Regression Problems
$y = f(x_1, \dots, x_p) + \varepsilon$, where $\mathrm{E}[\varepsilon] = 0$.
For fixed inputs $x_1, \dots, x_p$, the expected test MSE is
$\mathrm{E}\big[(y - \hat{f}(x_1, \dots, x_p))^2\big] = \mathrm{Var}\big[\hat{f}(x_1, \dots, x_p)\big] + \big(\mathrm{Bias}\big[\hat{f}(x_1, \dots, x_p)\big]\big)^2 + \mathrm{Var}[\varepsilon]$,
which can be estimated using $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.

Classification Problems
The test error rate is $\mathrm{E}[I(y \ne \hat{y})]$, which can be estimated using $\frac{1}{n}\sum_{i=1}^{n} I(y_i \ne \hat{y}_i)$.
Bayes classifier: $\hat{y} = \arg\max_c \Pr(y = c \mid x_1, \dots, x_p)$.

Key Ideas
• The disadvantage of parametric methods is the danger of choosing a form for $f$ that is not close to the truth.
• The disadvantage of non-parametric methods is the need for an abundance of observations.
• Flexibility and interpretability are typically at odds.
• As flexibility increases, the training MSE (or error rate) decreases, but the test MSE (or error rate) follows a U-shaped pattern; see the simulation sketch below.
• Low flexibility leads to a method with low variance and high bias; high flexibility leads to a method with high variance and low bias.
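The U-shaped test MSE can be seen in a short simulation. The following is a minimal Python sketch, not part of the original sheet; the true function, noise level, and use of polynomial degree as a flexibility proxy are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)             # assumed true regression function
n = 100
x_tr, x_te = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
y_tr = f(x_tr) + rng.normal(0, 0.3, n)          # Var[eps] = 0.09 is the irreducible error
y_te = f(x_te) + rng.normal(0, 0.3, n)

for degree in (1, 3, 5, 9, 12):                 # polynomial degree as a flexibility proxy
    coefs = np.polyfit(x_tr, y_tr, degree)      # least-squares polynomial fit
    mse_tr = np.mean((y_tr - np.polyval(coefs, x_tr)) ** 2)
    mse_te = np.mean((y_te - np.polyval(coefs, x_te)) ** 2)
    print(f"degree {degree:2d}: training MSE {mse_tr:.3f}, test MSE {mse_te:.3f}")
```

Training MSE falls as the degree grows, while test MSE bottoms out near the irreducible error and then rises again.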
LINEAR MODELS: SIMPLE AND MULTIPLE LINEAR REGRESSION

Multiple linear regression (MLR): $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$.
Simple linear regression (SLR): the special case of MLR where $p = 1$.

Notation
• $\beta_j$: the $j$th regression coefficient; $b_j$: estimate of $\beta_j$.
• $\sigma^2$: variance of the response (irreducible error); $s^2$ (the MSE): estimate of $\sigma^2$.
• $\mathbf{X}$: design matrix; $\mathbf{H}$: hat matrix; $e$: residual; $\hat{y}$: estimator for $\mathrm{E}[y]$.
• $\mathrm{SS_{TOT}}$: total sum of squares; $\mathrm{SS_{REG}}$: regression sum of squares; $\mathrm{SS_{ERR}}$: error sum of squares.

Estimation: Ordinary Least Squares (OLS)
SLR: $b_1 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$, $\quad b_0 = \bar{y} - b_1 \bar{x}$
MLR: $\mathbf{b} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$

Assumptions
1. $\mathrm{E}[y_i] = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p}$, i.e. $\mathrm{E}[\varepsilon_i] = 0$.
2. $\mathrm{Var}[\varepsilon_i] = \sigma^2$.
3. The $\varepsilon_i$'s are independent.
4. The $\varepsilon_i$'s are normally distributed.
5. The $x_{i,j}$'s are non-random.
6. No predictor $x_j$ is a linear combination of the other predictors, for $j = 0, 1, \dots, p$.

Other Numerical Results
• $\mathbf{H} = \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}$, so $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$; $e_i = y_i - \hat{y}_i$.
• $\mathrm{SS_{TOT}} = \sum_{i=1}^{n}(y_i - \bar{y})^2$ = total variability.
• $\mathrm{SS_{REG}} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$ = explained variability.
• $\mathrm{SS_{ERR}} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ = unexplained variability; $\mathrm{SS_{TOT}} = \mathrm{SS_{REG}} + \mathrm{SS_{ERR}}$.
• $s^2 = \mathrm{SS_{ERR}}/(n - p - 1)$; $s$ is the residual standard error.
• $R^2 = \mathrm{SS_{REG}}/\mathrm{SS_{TOT}} = 1 - \mathrm{SS_{ERR}}/\mathrm{SS_{TOT}}$.
• $R^2_{adj} = 1 - (1 - R^2)\dfrac{n - 1}{n - p - 1}$.

Key Ideas
• $R^2$ is a poor measure for model comparison because it increases simply by adding more predictors to a model.
• Polynomial terms do not change by a constant amount per unit increase of their variable; there is no constant slope.
• Only $w - 1$ dummy variables are needed to represent $w$ classes of a categorical predictor; one of the classes acts as the baseline.
• In effect, dummy variables define a distinct intercept for each class. Without an interaction between a dummy variable and a predictor, the dummy variable cannot additionally affect that predictor's regression coefficient.
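A minimal Python sketch of the OLS quantities above, on simulated data; the design matrix and coefficients are illustrative assumptions, not from the sheet.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, n)

b = np.linalg.solve(X.T @ X, X.T @ y)        # b = (X'X)^(-1) X'y
y_hat = X @ b
ss_tot = np.sum((y - y.mean()) ** 2)         # total variability
ss_err = np.sum((y - y_hat) ** 2)            # unexplained variability
s2 = ss_err / (n - p - 1)                    # MSE, estimates sigma^2
r2 = 1 - ss_err / ss_tot
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(b, s2, r2, r2_adj)
```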
SLR/MLR INFERENCE

Notation
• $se$: estimated standard error; $H_0$: null hypothesis; $H_1$: alternative hypothesis.
• df: degrees of freedom; $t_{1-\alpha/2,\,df}$: quantile of a $t$-distribution.
• $\alpha$: significance level; $1 - \alpha$: confidence level.
• ndf, ddf: numerator and denominator degrees of freedom; $F_{1-\alpha,\,ndf,\,ddf}$: quantile of an $F$-distribution.
• $y_{n+1}$: response of a new observation.
• Reduced model: drops the tested predictors; full model: keeps them.

Standard Errors (SLR)
• $se(b_1) = \sqrt{\dfrac{s^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$
• $se(b_0) = \sqrt{s^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]}$
• Mean response at $x^*$: $se(\hat{y}) = \sqrt{s^2\left[\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]}$
• New response: $se(\hat{y}_{n+1}) = \sqrt{s^2\left[1 + \dfrac{1}{n} + \dfrac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]}$

Variance-Covariance Matrix (MLR)
$\widehat{\mathrm{Var}}[\mathbf{b}] = s^2(\mathbf{X}^{T}\mathbf{X})^{-1}$; its diagonal entries are $\widehat{\mathrm{Var}}[b_j] = se(b_j)^2$ and its off-diagonal entries are $\widehat{\mathrm{Cov}}[b_j, b_k]$.

t-Tests
$t$ statistic $= \dfrac{\text{estimate} - \text{hypothesized value}}{\text{standard error}}$, with $n - p - 1$ df for a test of $\beta_j$.
• Two-tailed: reject $H_0$ if $|t| \ge t_{1-\alpha/2,\,n-p-1}$.
• Left-tailed: reject $H_0$ if $t \le -t_{1-\alpha,\,n-p-1}$.
• Right-tailed: reject $H_0$ if $t \ge t_{1-\alpha,\,n-p-1}$.

F-Tests
Overall test of $H_0\!: \beta_1 = \beta_2 = \cdots = \beta_p = 0$:
$F$ statistic $= \dfrac{\mathrm{SS_{REG}}/p}{\mathrm{SS_{ERR}}/(n - p - 1)}$; reject $H_0$ if $F \ge F_{1-\alpha,\,p,\,n-p-1}$.
Partial F-test of $H_0$: the $q$ dropped coefficients are all 0:
$F$ statistic $= \dfrac{(\mathrm{SS_{ERR}^{reduced}} - \mathrm{SS_{ERR}^{full}})/q}{\mathrm{SS_{ERR}^{full}}/(n - p - 1)}$; reject $H_0$ if $F \ge F_{1-\alpha,\,q,\,n-p-1}$.
For all hypothesis tests, reject $H_0$ if the p-value $\le \alpha$.

Variance Inflation Factor
$\mathrm{VIF}_j = \dfrac{1}{1 - R_j^2} = \dfrac{se(b_j)^2 (n - 1) s_{x_j}^2}{s^2}$, where $R_j^2$ comes from regressing $x_j$ on the other predictors.
• Tolerance is the reciprocal of VIF.
• Frees' rule of thumb: any $\mathrm{VIF}_j \ge 10$ signals severe multicollinearity.

Confidence and Prediction Intervals
estimate $\pm$ ($t$ quantile)(standard error):
• $\beta_j$: $b_j \pm t_{1-\alpha/2,\,n-p-1}\, se(b_j)$
• $\mathrm{E}[y]$: $\hat{y} \pm t_{1-\alpha/2,\,n-p-1}\, se(\hat{y})$
• $y_{n+1}$: $\hat{y}_{n+1} \pm t_{1-\alpha/2,\,n-p-1}\, se(\hat{y}_{n+1})$

Leverage
$h_i = \mathbf{x}_i^{T}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{x}_i$, the $i$th diagonal entry of $\mathbf{H}$; for SLR, $h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum_{k=1}^{n}(x_k - \bar{x})^2}$.
• $1/n \le h_i \le 1$ and $\sum_{i=1}^{n} h_i = p + 1$.
• Frees' rule of thumb: $h_i > 3(p + 1)/n$ indicates high leverage.

Standardized and Studentized Residuals
• Standardized residual: $e_i \big/ \big(s\sqrt{1 - h_i}\big)$.
• Studentized residual: $e_i \big/ \big(s_{(i)}\sqrt{1 - h_i}\big)$, where $s_{(i)}$ comes from the regression fitted without observation $i$.
• Frees' rule of thumb: a studentized residual exceeding 2 in absolute value flags a potential outlier.

Cook's Distance
$D_i = \dfrac{\sum_{k=1}^{n}\big(\hat{y}_k - \hat{y}_{k(i)}\big)^2}{(p + 1)\, s^2} = \dfrac{e_i^2\, h_i}{(p + 1)\, s^2\, (1 - h_i)^2}$

Plots of Residuals
• $\hat{e}$ versus $\hat{y}$: residuals are well-behaved if the points appear randomly scattered, the residuals seem to average to 0, and the spread of the residuals does not change.
• $\hat{e}$ versus $t$ (observation order): detects dependence of the error terms.
• qq-plot of $\hat{e}$: detects non-normality of the error terms.

Key Ideas
• As realizations of a $t$-distribution, studentized residuals can help identify outliers.
• When residuals have a larger spread for larger predictions, one solution is to transform the response variable with a concave function.
• There is no universal approach to handling multicollinearity; it is even possible to accept it, such as when there is a suppressor variable. On the other hand, it can be eliminated by using a set of orthogonal predictors.
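The leverage, standardized residual, and Cook's distance formulas above can all be computed from the hat matrix. A minimal Python sketch on simulated data; every input is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.5, 1.5]) + rng.normal(0, 1.0, n)

H = X @ np.linalg.solve(X.T @ X, X.T)             # hat matrix
h = np.diag(H)                                    # leverages
e = y - H @ y                                     # residuals
s2 = e @ e / (n - p - 1)
std_res = e / np.sqrt(s2 * (1 - h))               # standardized residuals
cooks = e**2 * h / ((p + 1) * s2 * (1 - h)**2)    # Cook's distance

print("sum of leverages:", h.sum())                        # equals p + 1
print("high leverage:", np.where(h > 3 * (p + 1) / n)[0])  # Frees' rule of thumb
print("potential outliers:", np.where(np.abs(std_res) > 2)[0])
print("max Cook's distance:", cooks.max())
```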
MODEL SELECTION

Notation
• $k$: total no. of predictors in consideration; $p$: no. of predictors for a specific model.
• $s_k^2$: MSE of the model that uses all $k$ predictors.
• $\mathcal{M}_p$: the best model with $p$ predictors.

Best Subset Selection
1. For $p = 0, 1, \dots, k$, fit all $\binom{k}{p}$ models with $p$ predictors. The model with the largest $R^2$ is $\mathcal{M}_p$.
2. Choose the best model among $\mathcal{M}_0, \dots, \mathcal{M}_k$ using a selection criterion of choice.

Forward Stepwise Selection
1. Fit all $k$ simple linear regression models. The model with the largest $R^2$ is $\mathcal{M}_1$.
2. For $p = 2, \dots, k$, fit the models that add one of the remaining predictors to $\mathcal{M}_{p-1}$. The model with the largest $R^2$ is $\mathcal{M}_p$.
3. Choose the best model among $\mathcal{M}_1, \dots, \mathcal{M}_k$ using a selection criterion of choice.

Backward Stepwise Selection
1. Fit the model with all $k$ predictors, $\mathcal{M}_k$.
2. For $p = k - 1, \dots, 1$, fit the models that drop one of the predictors from $\mathcal{M}_{p+1}$. The model with the largest $R^2$ is $\mathcal{M}_p$.
3. Choose the best model among $\mathcal{M}_1, \dots, \mathcal{M}_k$ using a selection criterion of choice.

Selection Criteria
• Mallows' $C_p = \dfrac{\mathrm{SS_{ERR}}}{s_k^2} - n + 2(p + 1)$
• Akaike information criterion: $\mathrm{AIC} = -2\, l(\hat{\boldsymbol\beta}) + 2(p + 1)$
• Bayesian information criterion: $\mathrm{BIC} = -2\, l(\hat{\boldsymbol\beta}) + (p + 1)\ln n$
• Adjusted $R^2$
• Validation set error or cross-validation error

Validation Set
• Randomly splits all available observations into two groups: the training set and the validation set.
• Only the observations in the training set are used to attain the fitted model; those in the validation set are used to estimate the test MSE.

k-fold Cross-Validation (here $k$ is the number of folds)
1. Randomly divide all available observations into $k$ folds.
2. For $j = 1, \dots, k$, obtain the $j$th fit by training on all observations except those in the $j$th fold.
3. For $j = 1, \dots, k$, use $\hat{y}$ from the $j$th fit to calculate a test MSE estimate with the observations in the $j$th fold.
4. To calculate the CV error, average the $k$ test MSE estimates from the previous step.

Leave-One-Out Cross-Validation (LOOCV)
• Calculate the LOOCV error as a special case of $k$-fold cross-validation where $k = n$.
• For MLR, no refitting is required (see the sketch below): $\mathrm{CV\ error} = \dfrac{1}{n}\sum_{i=1}^{n}\left(\dfrac{y_i - \hat{y}_i}{1 - h_i}\right)^2$.

Key Ideas on Cross-Validation
• The validation set approach has unstable results and tends to overestimate the test MSE. The two other approaches mitigate these issues.
• With respect to bias: LOOCV < $k$-fold CV < validation set.
• With respect to variance: LOOCV > $k$-fold CV > validation set.
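A minimal Python sketch of $k$-fold cross-validation for an OLS fit, plus the LOOCV leverage shortcut above; the data and the fold count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, folds = 100, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

idx = rng.permutation(n)                        # step 1: random fold assignment
parts = np.array_split(idx, folds)
mses = []
for j in range(folds):                          # steps 2-3: fit without fold j, score on fold j
    test = parts[j]
    train = np.setdiff1d(idx, test)
    b = np.linalg.solve(X[train].T @ X[train], X[train].T @ y[train])
    mses.append(np.mean((y[test] - X[test] @ b) ** 2))
print("k-fold CV error:", np.mean(mses))        # step 4: average the k estimates

# LOOCV via the MLR leverage shortcut, with no refitting
H = X @ np.linalg.solve(X.T @ X, X.T)
e = y - H @ y
print("LOOCV error:", np.mean((e / (1 - np.diag(H))) ** 2))
```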
OTHER REGRESSION APPROACHES

Standardizing Variables
• A centered variable is the result of subtracting the sample mean from a variable.
• A scaled variable is the result of dividing a variable by its sample standard deviation.
• A standardized variable is the result of first centering a variable, then scaling it.

Ridge Regression
Coefficients are estimated by minimizing $\mathrm{SS_{ERR}}$ while constrained by $\sum_{j=1}^{p}\beta_j^2 \le c$, or equivalently by minimizing $\mathrm{SS_{ERR}} + \lambda\sum_{j=1}^{p}\beta_j^2$.

Lasso Regression
Coefficients are estimated by minimizing $\mathrm{SS_{ERR}}$ while constrained by $\sum_{j=1}^{p}|\beta_j| \le c$, or equivalently by minimizing $\mathrm{SS_{ERR}} + \lambda\sum_{j=1}^{p}|\beta_j|$.

Key Ideas on Ridge and Lasso
• $x_1, \dots, x_p$ are scaled predictors.
• $\lambda$ is inversely related to flexibility.
• With a finite $\lambda$, none of the ridge estimates will equal 0, but the lasso estimates could equal 0.

Partial Least Squares
• The first partial least squares direction $z_1$ is a linear combination of the standardized predictors $x_1, \dots, x_p$, with coefficients based on the relation between each $x_j$ and $y$.
• Every subsequent direction is calculated iteratively as a linear combination of updated predictors, which are the residuals from regressing each previous predictor on the previous direction.
• The directions $z_1, \dots, z_m$ are used as predictors in a multiple linear regression. The number of directions $m$ is a measure of flexibility.

Weighted Least Squares
$\mathrm{Var}[\varepsilon_i] = \sigma^2 / w_i$
• Equivalent to running OLS with $\sqrt{w_i}\, y_i$ as the response and $\sqrt{w_i}\, \mathbf{x}_i$ as the predictors, hence minimizing $\sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2$.
• $\mathbf{b} = (\mathbf{X}^{T}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{W}\mathbf{y}$, where $\mathbf{W}$ is the diagonal matrix of the weights.

k-Nearest Neighbors (KNN)
1. Identify the center of the neighborhood, i.e. the location of an observation with inputs $x_1, \dots, x_p$.
2. Starting from the center of the neighborhood, identify the $k$ nearest training observations.
3. For classification, $\hat{y}$ is the most frequent category among the $k$ observations; for regression, $\hat{y}$ is the average of the response among the $k$ observations.
$k$ is inversely related to flexibility.
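A minimal Python sketch of the KNN steps above for a regression target; the simulated training set and the values of $k$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
X_train = rng.uniform(0, 1, (200, 2))
y_train = np.sin(X_train[:, 0]) + X_train[:, 1] + rng.normal(0, 0.1, 200)

def knn_predict(x0, k):
    d = np.sqrt(((X_train - x0) ** 2).sum(axis=1))  # step 2: distances from the center x0
    nearest = np.argsort(d)[:k]                     # indices of the k nearest neighbors
    return y_train[nearest].mean()                  # step 3 (regression): average response

x0 = np.array([0.5, 0.5])
for k in (1, 10, 50):                               # small k = high flexibility
    print(k, knn_predict(x0, k))
```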
NON-LINEAR MODELS: GENERALIZED LINEAR MODELS

Notation
• $\theta, \phi$: linear exponential family parameters; $\mu = \mathrm{E}[y]$: mean response.
• $b(\theta)$: cumulant function, with mean function $b'(\theta)$; $V(\mu)$: variance function.
• $g(\mu)$: link function.
• $\hat{\boldsymbol\beta}$: maximum likelihood estimate of $\boldsymbol\beta$; $l(\hat{\boldsymbol\beta})$: maximized log-likelihood.
• $l_0$: maximized log-likelihood, null model; $l_{SAT}$: maximized log-likelihood, saturated model.
• $e$: residual; $\mathbf{I}$: information matrix.
• $\chi^2_{1-\alpha,\,df}$: quantile of a chi-square distribution.
• $D^{*}$: scaled deviance; $D$: deviance statistic.

Linear Exponential Family
Probability function: $f(y; \theta, \phi) = \exp\left[\dfrac{y\theta - b(\theta)}{\phi} + S(y, \phi)\right]$
$\mathrm{E}[y] = b'(\theta) = \mu$, $\quad \mathrm{Var}[y] = \phi\, b''(\theta) = \phi\, V(\mu)$

Model Framework
• $g(\mu) = \mathbf{x}^{T}\boldsymbol\beta$
• The canonical link is the link function $g(\mu) = (b')^{-1}(\mu)$, i.e. the link for which $\theta = \mathbf{x}^{T}\boldsymbol\beta$.

Results for Distributions in the Exponential Family
• Normal ($\sigma^2$ fixed): $f(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$; $\theta = \mu$; $b(\theta) = \theta^2/2$; canonical link: $\mu$ (identity).
• Binomial ($n$ fixed): $f(y) = \binom{n}{y}\pi^{y}(1-\pi)^{n-y}$; $\theta = \ln\frac{\pi}{1-\pi}$; $b(\theta) = n\ln(1+e^{\theta})$; canonical link: $\ln\frac{\mu}{n-\mu}$ (logit).
• Poisson: $f(y) = \frac{\lambda^{y}e^{-\lambda}}{y!}$; $\theta = \ln\lambda$; $b(\theta) = e^{\theta}$; canonical link: $\ln\mu$ (log).
• Negative binomial ($r$ fixed): $f(y) = \binom{y+r-1}{y}(1-p)^{r}p^{y}$; $\theta = \ln p$; $b(\theta) = -r\ln(1-e^{\theta})$; canonical link: $\ln\frac{\mu}{\mu+r}$.
• Gamma ($\alpha$ fixed): $f(y) = \frac{y^{\alpha-1}e^{-y/\gamma}}{\Gamma(\alpha)\gamma^{\alpha}}$; $\theta = -1/\mu$; $b(\theta) = -\ln(-\theta)$; canonical link: $-1/\mu$.
• Inverse Gaussian ($\lambda$ fixed): $f(y) = \sqrt{\frac{\lambda}{2\pi y^{3}}}\exp\left[-\frac{\lambda(y-\mu)^2}{2\mu^2 y}\right]$; $\theta = -\frac{1}{2\mu^2}$; $b(\theta) = -\sqrt{-2\theta}$; canonical link: $-\frac{1}{2\mu^2}$.

Numerical Results
• $D^{*} = 2\big[l_{SAT} - l(\hat{\boldsymbol\beta})\big]$; $D = \phi D^{*}$. For MLR, $D = \mathrm{SS_{ERR}}$.
• Pseudo-$R^2 = \dfrac{l(\hat{\boldsymbol\beta}) - l_0}{l_{SAT} - l_0}$; max-scaled $R^2 = \dfrac{1 - \exp\{2[l_0 - l(\hat{\boldsymbol\beta})]/n\}}{1 - \exp\{2 l_0 / n\}}$.
• $\mathrm{AIC} = -2\, l(\hat{\boldsymbol\beta}) + 2(p+1)$; $\mathrm{BIC} = -2\, l(\hat{\boldsymbol\beta}) + (p+1)\ln n$. These assume only $\boldsymbol\beta$ needs to be estimated; if estimating $\phi$ is required, replace $p+1$ with $p+2$.

Residuals
• Raw residual: $e_i = y_i - \hat{\mu}_i$.
• Pearson residual: $e_i^{P} = \dfrac{y_i - \hat{\mu}_i}{\sqrt{V(\hat{\mu}_i)}}$; the Pearson chi-square statistic is $\sum_{i=1}^{n}(e_i^{P})^2$.
• Deviance residual: $e_i^{D} = \sqrt{D_i^{*}}$, whose sign follows the $i$th raw residual.
• Anscombe residual: $e_i^{A} = \dfrac{t(y_i) - \mathrm{E}[t(y_i)]}{\sqrt{\mathrm{Var}[t(y_i)]}}$ for a normalizing transformation $t$.

Parameter Estimation
$l(\boldsymbol\beta) = \sum_{i=1}^{n}\left[\dfrac{y_i\theta_i - b(\theta_i)}{\phi} + S(y_i, \phi)\right]$, where $\theta_i = (b')^{-1}\big(g^{-1}(\mathbf{x}_i^{T}\boldsymbol\beta)\big)$.
The score equations are the partial derivatives of $l(\boldsymbol\beta)$ with respect to each $\beta_j$, all set equal to 0. The solution to the score equations is $\hat{\boldsymbol\beta}$; then $\hat{\mu} = g^{-1}(\mathbf{x}^{T}\hat{\boldsymbol\beta})$.

Likelihood Ratio Tests
$\chi^2$ statistic $= 2\big[l(\hat{\boldsymbol\beta}_{full}) - l(\hat{\boldsymbol\beta}_{reduced})\big]$ for $H_0$: the dropped coefficients are all 0. Reject $H_0$ if the statistic $\ge \chi^2_{1-\alpha,\,df}$, where df is the number of dropped coefficients.

Goodness-of-Fit Tests
$H_0$: the data follow a distribution of choice with $r$ free parameters, whose domain is split into $w$ mutually exclusive intervals.
$\chi^2$ statistic $= \sum_{k=1}^{w}\dfrac{(n_k - E_k)^2}{E_k}$, where $n_k$ and $E_k$ are the observed and expected counts in interval $k$. Reject $H_0$ if the statistic $\ge \chi^2_{1-\alpha,\,w-r-1}$.

Tweedie Distribution
$\mathrm{E}[y] = \mu$, $\mathrm{Var}[y] = \phi\mu^{d}$. Power $d$: Normal 0; Poisson 1; Tweedie (compound Poisson-gamma) $1 < d < 2$; Gamma 2; Inverse Gaussian 3.

Inference
• Maximum likelihood estimators $\hat{\boldsymbol\beta}$ asymptotically have a multivariate normal distribution with mean $\boldsymbol\beta$ and asymptotic variance-covariance matrix $\mathbf{I}^{-1}$.
• To address overdispersion, scale the variance to $\mathrm{Var}[y_i] = \sigma^2 \phi\, b''(\theta_i)$ and estimate $\sigma^2$ as the Pearson chi-square statistic divided by $n - p - 1$.

Logistic and Probit Regression
• The odds of an event are the ratio of the probability that the event will occur to the probability that it will not: $\pi/(1-\pi)$.
• The odds ratio is the ratio of the odds of an event with the presence of a characteristic to the odds of the same event without the presence of that characteristic.

Binary Response Functions
• Logit: $g(\pi) = \ln\dfrac{\pi}{1-\pi}$
• Probit: $g(\pi) = \Phi^{-1}(\pi)$
• Complementary log-log: $g(\pi) = \ln(-\ln(1-\pi))$

Binary Response with Logit Link
$l(\boldsymbol\beta) = \sum_{i=1}^{n}\big[y_i\ln\pi_i + (1-y_i)\ln(1-\pi_i)\big]$
Score equations: $\dfrac{\partial}{\partial\boldsymbol\beta}\, l(\boldsymbol\beta) = \sum_{i=1}^{n}\mathbf{x}_i(y_i - \pi_i) = \mathbf{0}$; information matrix: $\mathbf{I} = \sum_{i=1}^{n}\pi_i(1-\pi_i)\,\mathbf{x}_i\mathbf{x}_i^{T}$.
$D^{*} = 2\sum_{i=1}^{n}\left[y_i\ln\dfrac{y_i}{\hat{\pi}_i} + (1-y_i)\ln\dfrac{1-y_i}{1-\hat{\pi}_i}\right]$
Pearson residual: $\dfrac{y_i - \hat{\pi}_i}{\sqrt{\hat{\pi}_i(1-\hat{\pi}_i)}}$; Pearson chi-square statistic: $\sum_{i=1}^{n}\dfrac{(y_i - \hat{\pi}_i)^2}{\hat{\pi}_i(1-\hat{\pi}_i)}$.

Nominal Response: Generalized Logit
Let $\pi_{i,c}$ be the probability that the $i$th observation is classified as category $c$, with reference category $r$:
$\ln\dfrac{\pi_{i,c}}{\pi_{i,r}} = \mathbf{x}_i^{T}\boldsymbol\beta_c$, so $\pi_{i,c} = \dfrac{\exp(\mathbf{x}_i^{T}\boldsymbol\beta_c)}{1 + \sum_{u\ne r}\exp(\mathbf{x}_i^{T}\boldsymbol\beta_u)}$ for $c \ne r$, and $\pi_{i,r} = \dfrac{1}{1 + \sum_{u\ne r}\exp(\mathbf{x}_i^{T}\boldsymbol\beta_u)}$.
$l(\boldsymbol\beta) = \sum_{i=1}^{n}\sum_{c} I(y_i = c)\ln\pi_{i,c}$

Ordinal Response: Proportional Odds Cumulative Model
$\Pi_c = \pi_1 + \cdots + \pi_c$; $\quad g(\Pi_c) = \alpha_c + \mathbf{x}_i^{T}\boldsymbol\beta$, where $\mathbf{x}_i = (x_{i,1}, \dots, x_{i,p})^{T}$ and $\boldsymbol\beta = (\beta_1, \dots, \beta_p)^{T}$ exclude the intercept.

Poisson Count Regression
$\ln\mu = \mathbf{x}^{T}\boldsymbol\beta$
$l(\boldsymbol\beta) = \sum_{i=1}^{n}\big[y_i\ln\mu_i - \mu_i - \ln(y_i!)\big]$; score equations: $\sum_{i=1}^{n}\mathbf{x}_i(y_i - \mu_i) = \mathbf{0}$.
$D^{*} = 2\sum_{i=1}^{n}\left[y_i\ln\dfrac{y_i}{\hat{\mu}_i} - (y_i - \hat{\mu}_i)\right]$
Pearson residual: $\dfrac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i}}$; Pearson chi-square statistic: $\sum_{i=1}^{n}\dfrac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}$.

Poisson Regression with Exposures
$\ln\mu = \ln w + \mathbf{x}^{T}\boldsymbol\beta$, where $w$ is the exposure (an offset); a fitting sketch follows below.

Alternative Count Models
These models can incorporate a Poisson distribution while letting the mean of the response differ from the variance of the response:

Model             | Mean < Variance | Mean > Variance
Negative binomial | Yes             | No
Zero-inflated     | Yes             | No
Hurdle            | Yes             | Yes
Heterogeneity     | Yes             | No
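As a worked illustration of Poisson regression with exposures, the following minimal Python sketch maximizes the log-likelihood above by Fisher scoring (Newton-Raphson under the canonical log link), using the score equations and information matrix shown above. The simulated data and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w = rng.uniform(0.5, 2.0, n)                    # exposures (illustrative)
y = rng.poisson(w * np.exp(X @ np.array([-1.0, 0.5])))

beta = np.zeros(2)
for _ in range(25):                             # Fisher scoring iterations
    mu = w * np.exp(X @ beta)                   # ln(mu) = ln(w) + x'beta
    score = X.T @ (y - mu)                      # score equations
    info = X.T @ (X * mu[:, None])              # information matrix
    beta = beta + np.linalg.solve(info, score)

mu = w * np.exp(X @ beta)
se = np.sqrt(np.diag(np.linalg.inv(X.T @ (X * mu[:, None]))))  # asymptotic se from I^(-1)
pearson = np.sum((y - mu) ** 2 / mu)            # Pearson chi-square statistic
print("beta_hat:", beta, "se:", se)
print("overdispersion estimate:", pearson / (n - 2))  # divided by n - p - 1, here p = 1
```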
TIME SERIES

Trend Models

Notation
• Subscript $t$: index for observations; $T_t$: trends in time; $S_t$: seasonal trends; $\varepsilon_t$: random patterns.
• $\hat{y}_{n+k}$: $k$-step ahead forecast; $se$: estimated standard error; $t_{1-\alpha/2,\,df}$: quantile of a $t$-distribution.
• $n_1$: training sample size; $n_2$: test sample size.

Trends
• Additive: $y_t = T_t + S_t + \varepsilon_t$
• Multiplicative: $y_t = T_t \times S_t \times \varepsilon_t$

Autoregressive Models

Notation
• $\rho_k$: lag-$k$ autocorrelation; $r_k$: lag-$k$ sample autocorrelation.
• $\sigma^2$: variance of white noise; $s^2$: estimate of $\sigma^2$.
• $b_0, b_1$: estimates of $\beta_0, \beta_1$.
• $\bar{y}_{-}$: sample mean of the first $n-1$ observations; $\bar{y}_{+}$: sample mean of the last $n-1$ observations.

Autocorrelation
$r_k = \dfrac{\sum_{t=k+1}^{n}(y_{t-k} - \bar{y})(y_t - \bar{y})}{\sum_{t=1}^{n}(y_t - \bar{y})^2}$

Testing Autocorrelation
test statistic $= \dfrac{r_k}{se(r_k)}$, with $se(r_k) \approx 1/\sqrt{n}$. Reject $H_0\!: \rho_k = 0$ against $H_1\!: \rho_k \ne 0$ if the absolute value of the statistic is at least the $t_{1-\alpha/2}$ quantile.

White Noise
$\hat{y}_{n+k} = \bar{y}$; $\;se(\text{forecast}) = s\sqrt{1 + 1/n}$.
$100(1-\alpha)\%$ prediction interval for $y_{n+k}$: $\bar{y} \pm t_{1-\alpha/2,\,n-1}\; s\sqrt{1 + 1/n}$.

Random Walk
Differences: $w_t = y_t - y_{t-1}$, with sample mean $\bar{w}$ and standard deviation $s_w$.
$\hat{y}_{n+k} = y_n + k\bar{w}$; $\;se(\text{forecast}) = s_w\sqrt{k}$.
Approximate prediction interval for $y_{n+k}$: $\hat{y}_{n+k} \pm 2\, s_w\sqrt{k}$.

Model Comparison
Computed over the test set of size $n_2$, with forecast errors $e_t = y_t - \hat{y}_t$:
• $\mathrm{ME} = \dfrac{1}{n_2}\sum e_t$
• $\mathrm{MPE} = \dfrac{100}{n_2}\sum \dfrac{e_t}{y_t}$
• $\mathrm{MSE} = \dfrac{1}{n_2}\sum e_t^2$
• $\mathrm{MAE} = \dfrac{1}{n_2}\sum |e_t|$
• $\mathrm{MAPE} = \dfrac{100}{n_2}\sum \left|\dfrac{e_t}{y_t}\right|$

Stationarity
Stationarity describes how something does not vary with respect to time. Control charts can be used to identify stationarity.

AR(1) Model
$y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t$
• If $\beta_1 = 0$, $y_t$ follows a white noise process.
• If $\beta_1 = 1$, $y_t$ follows a random walk process.
• If $-1 < \beta_1 < 1$, $y_t$ is stationary.
Assumptions: $\mathrm{E}[\varepsilon_t] = 0$; $\mathrm{Var}[\varepsilon_t] = \sigma^2$; $\mathrm{Cov}[\varepsilon_t, \varepsilon_{t+k}] = 0$ for $k \ne 0$.

Properties of a Stationary AR(1) Model
$\mathrm{E}[y_t] = \dfrac{\beta_0}{1 - \beta_1}$, $\quad \mathrm{Var}[y_t] = \dfrac{\sigma^2}{1 - \beta_1^2}$, $\quad \rho_k = \beta_1^{k}$

Estimation
$b_1 = \dfrac{\sum_{t=2}^{n}(y_{t-1} - \bar{y}_{-})(y_t - \bar{y}_{+})}{\sum_{t=2}^{n}(y_{t-1} - \bar{y}_{-})^2}$, $\quad b_0 = \bar{y}_{+} - b_1\bar{y}_{-} \approx \bar{y}(1 - b_1)$, $\quad s^2 = \dfrac{\sum_{t=2}^{n}(e_t - \bar{e})^2}{n - 3}$

Predictions
$\hat{y}_{n+k} = b_0 + b_1\hat{y}_{n+k-1}$, starting from $\hat{y}_n = y_n$.
$se(\text{forecast}) = s\sqrt{1 + b_1^2 + b_1^4 + \cdots + b_1^{2(k-1)}}$
$100(1-\alpha)\%$ prediction interval for $y_{n+k}$: $\hat{y}_{n+k} \pm t_{1-\alpha/2,\,n-3}\; se(\text{forecast})$.
A fitting-and-forecasting sketch follows at the end of this section.

Other Time Series Models

Notation
• $k$: moving average length; $w$: smoothing parameter; $SB$: seasonal base; $m$: no. of trigonometric functions.

Smoothing with Moving Averages ($y_t = \beta_0 + \varepsilon_t$)
$\hat{s}_t = \dfrac{y_t + y_{t-1} + \cdots + y_{t-k+1}}{k} = \hat{s}_{t-1} + \dfrac{y_t - y_{t-k}}{k}$
Predictions: $\hat{y}_{n+l} = \hat{s}_n$, $\;l = 1, 2, \dots$

Double Smoothing with Moving Averages ($y_t = \beta_0 + \beta_1 t + \varepsilon_t$)
$\hat{s}_t^{(2)} = \dfrac{\hat{s}_t + \hat{s}_{t-1} + \cdots + \hat{s}_{t-k+1}}{k}$
Predictions: $\hat{y}_{n+l} = \big(2\hat{s}_n - \hat{s}_n^{(2)}\big) + l\,\dfrac{2\big(\hat{s}_n - \hat{s}_n^{(2)}\big)}{k - 1}$

Exponential Smoothing ($y_t = \beta_0 + \varepsilon_t$)
$\hat{s}_t = (1 - w)\big(y_t + w y_{t-1} + \cdots + w^{t} y_0\big) = (1 - w)y_t + w\hat{s}_{t-1}$, $\;0 < w < 1$
The value of $w$ is determined by minimizing $SS(w) = \sum_{t=1}^{n}(y_t - \hat{s}_{t-1})^2$.
Predictions: $\hat{y}_{n+l} = \hat{s}_n$.

Double Exponential Smoothing ($y_t = \beta_0 + \beta_1 t + \varepsilon_t$)
$\hat{s}_t^{(2)} = (1 - w)\hat{s}_t + w\hat{s}_{t-1}^{(2)}$
$b_1 = \dfrac{1 - w}{w}\big(\hat{s}_n - \hat{s}_n^{(2)}\big)$; predictions: $\hat{y}_{n+l} = \big(2\hat{s}_n - \hat{s}_n^{(2)}\big) + l\, b_1$

Key Ideas for Smoothing
• Single smoothing is only appropriate for time series data without a linear trend.
• Exponential smoothing is related to weighted least squares.
• A double smoothing procedure can be used to forecast time series data with a linear trend.
• Holt-Winter double exponential smoothing is a generalization of double exponential smoothing.

Seasonal Time Series Models
• Fixed seasonal effects with trigonometric functions: $S_t = \sum_{i=1}^{m}\big[\beta_{1,i}\sin(f_i t) + \beta_{2,i}\cos(f_i t)\big]$, where $f_i = \dfrac{2\pi i}{SB}$.
• Seasonal autoregressive models: $y_t = \beta_0 + \beta_1 y_{t-SB} + \cdots + \beta_p y_{t-p \cdot SB} + \varepsilon_t$.
• Holt-Winter seasonal additive model: $y_t = \beta_0 + \beta_1 t + S_t + \varepsilon_t$.

Unit Root Test
• A unit root test is used to evaluate the fit of a random walk model.
• A random walk model is a good fit if the time series possesses a unit root.
• The Dickey-Fuller test and augmented Dickey-Fuller test are two examples of unit root tests.

Volatility Models
ARCH($p$): $\sigma_t^2 = \gamma_0 + \gamma_1\varepsilon_{t-1}^2 + \cdots + \gamma_p\varepsilon_{t-p}^2$, with $\mathrm{Var}[\varepsilon_t] = \dfrac{\gamma_0}{1 - \sum_{j=1}^{p}\gamma_j}$.
GARCH($p, q$): $\sigma_t^2 = \gamma_0 + \gamma_1\varepsilon_{t-1}^2 + \cdots + \gamma_p\varepsilon_{t-p}^2 + \delta_1\sigma_{t-1}^2 + \cdots + \delta_q\sigma_{t-q}^2$, with $\mathrm{Var}[\varepsilon_t] = \dfrac{\gamma_0}{1 - \sum_{j=1}^{p}\gamma_j - \sum_{j=1}^{q}\delta_j}$.
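A minimal Python sketch of the AR(1) estimation and $k$-step forecast formulas above, on simulated data; the model parameters and series length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta0, beta1 = 200, 2.0, 0.6
y = np.empty(n)
y[0] = beta0 / (1 - beta1)                     # start at the stationary mean
for t in range(1, n):
    y[t] = beta0 + beta1 * y[t - 1] + rng.normal(0, 1.0)

lag, cur = y[:-1], y[1:]                       # y_(t-1) and y_t
b1 = np.sum((lag - lag.mean()) * (cur - cur.mean())) / np.sum((lag - lag.mean()) ** 2)
b0 = cur.mean() - b1 * lag.mean()
e = cur - (b0 + b1 * lag)
s2 = np.sum((e - e.mean()) ** 2) / (n - 3)     # n - 1 residuals, 2 parameters

yhat = y[-1]
for k in range(1, 4):                          # k-step forecasts and standard errors
    yhat = b0 + b1 * yhat
    se_k = np.sqrt(s2 * np.sum(b1 ** (2 * np.arange(k))))  # s*sqrt(1 + b1^2 + ... + b1^(2(k-1)))
    print(f"{k}-step forecast {yhat:.3f}, se {se_k:.3f}")
```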
DECISION TREES

Regression and Classification Trees

Notation
• $R_m$: region of the predictor space; $n_m$: no. of observations in node $m$; $n_{m,c}$: no. of category-$c$ observations in node $m$.
• $I_m$: impurity; $E_m$: classification error rate; $G_m$: Gini index; $D_m$: cross entropy.
• $T$: subtree; $|T|$: no. of terminal nodes in $T$; $\lambda$: tuning parameter.

Algorithm
1. Construct a large tree with terminal nodes using recursive binary splitting.
2. Obtain a sequence of best subtrees, as a function of $\lambda$, using cost complexity pruning.
3. Choose $\lambda$ by applying $k$-fold cross-validation. Select the $\lambda$ that results in the lowest cross-validation error.
4. The best subtree is the subtree created in step 2 with the selected value of $\lambda$.

Recursive Binary Splitting
• Regression: minimize $\sum_{m=1}^{|T|}\sum_{i:\,\mathbf{x}_i\in R_m}\big(y_i - \hat{y}_{R_m}\big)^2$.
• Classification: minimize $\sum_{m=1}^{|T|} n_m I_m$.

Cost Complexity Pruning
• Regression: minimize $\sum_{m=1}^{|T|}\sum_{i:\,\mathbf{x}_i\in R_m}\big(y_i - \hat{y}_{R_m}\big)^2 + \lambda|T|$.
• Classification: minimize $\sum_{m=1}^{|T|} n_m I_m + \lambda|T|$.

Node Impurity Measures for Classification
$\hat{p}_{m,c} = n_{m,c}/n_m$
• Classification error rate: $E_m = 1 - \max_c \hat{p}_{m,c}$
• Gini index: $G_m = \sum_{c}\hat{p}_{m,c}\big(1 - \hat{p}_{m,c}\big)$
• Cross entropy: $D_m = -\sum_{c}\hat{p}_{m,c}\ln\hat{p}_{m,c}$
• Deviance $= -2\sum_{m=1}^{|T|}\sum_{c} n_{m,c}\ln\hat{p}_{m,c}$; residual mean deviance $=$ deviance$/(n - |T|)$.

Key Ideas
• Terminal nodes, or leaves, represent the partitions of the predictor space.
• Internal nodes are points along the tree where splits occur.
• Terminal nodes do not have child nodes, but internal nodes do.
• Branches are lines that connect any two nodes.
• A decision tree with only one internal node is called a stump.

Advantages of Trees
• Easy to interpret and explain.
• Can be presented visually.
• Manage categorical variables without the need for dummy variables.
• Mimic human decision-making.

Disadvantages of Trees
• Not robust.
• Do not have the same degree of predictive accuracy as other statistical methods.

Multiple Trees

Bagging
1. Create $B$ bootstrap samples from the original training dataset.
2. Construct a decision tree for each bootstrap sample using recursive binary splitting.
3. Predict the response of a new observation by averaging the predictions (regression trees) or by using the most frequent category (classification trees) across all $B$ trees.
Properties: increasing $B$ does not cause overfitting; bagging reduces variance; out-of-bag error is a valid estimate of test error.

Random Forests
1. Create $B$ bootstrap samples from the original training dataset.
2. Construct a decision tree for each bootstrap sample using recursive binary splitting. At each split, only a random subset of $m$ variables is considered.
3. Predict the response of a new observation by averaging the predictions (regression trees) or by using the most frequent category (classification trees) across all $B$ trees.
Properties: bagging is a special case of random forests ($m = p$); increasing $B$ does not cause overfitting; decreasing $m$ reduces the correlation between predictions.

Boosting
Let $r_1$ be the actual response variable $y$.
1. For $b = 1, 2, \dots, B$:
   • Use recursive binary splitting to fit a tree $\hat{f}_b$ with $d$ splits to the data, with $r_b$ as the response.
   • Update $r_b$ by subtracting $\lambda\hat{f}_b(\mathbf{x})$, i.e. let $r_{b+1} = r_b - \lambda\hat{f}_b(\mathbf{x})$.
2. Calculate the boosted model prediction as $\hat{f}(\mathbf{x}) = \lambda\sum_{b=1}^{B}\hat{f}_b(\mathbf{x})$.
Properties: increasing $B$ can cause overfitting; boosting reduces bias; $d$ controls the complexity of the boosted model; $\lambda$ controls the rate at which boosting learns.

UNSUPERVISED LEARNING

Principal Components Analysis

Notation
• $z$: principal component score; subscript $m$: index for principal components.
• $\phi_{j,m}$: principal component loading; $x_j$: centered explanatory variable.

Principal Components
$z_m = \sum_{j=1}^{p}\phi_{j,m}\, x_j$, $\quad z_{i,m} = \sum_{j=1}^{p}\phi_{j,m}\, x_{i,j}$
• $\sum_{j=1}^{p}\phi_{j,m}^2 = 1$
• $\sum_{j=1}^{p}\phi_{j,m}\phi_{j,u} = 0$ for $m \ne u$

Proportion of Variance Explained (PVE)
Total variance $= \sum_{j=1}^{p}\dfrac{1}{n-1}\sum_{i=1}^{n}x_{i,j}^2$; variance of the $m$th component $= \dfrac{1}{n-1}\sum_{i=1}^{n}z_{i,m}^2$.
$\mathrm{PVE}_m = \dfrac{\text{variance of the } m\text{th component}}{\text{total variance}}$

Key Ideas
• The variance explained by each subsequent principal component is always less than the variance explained by the previous principal component.
• All principal components are uncorrelated with one another.
• A dataset has $\min(n - 1, p)$ distinct principal components.
• The first $M$ principal component scores and loadings approximate the original dataset: $x_{i,j} \approx \sum_{m=1}^{M} z_{i,m}\phi_{j,m}$.

Principal Components Regression
$y = \theta_0 + \theta_1 z_1 + \cdots + \theta_M z_M + \varepsilon$
• If $M = p$, then $\beta_j = \sum_{m=1}^{M}\theta_m\phi_{j,m}$.

Cluster Analysis

Notation
• $C_k$: the set of indices in cluster $k$; $|C_k|$: no. of observations in cluster $k$; $W(C_k)$: within-cluster variation of cluster $k$.

Euclidean Distance
$d(\mathbf{x}_i, \mathbf{x}_u) = \sqrt{\sum_{j=1}^{p}(x_{i,j} - x_{u,j})^2}$

k-Means Clustering
1. Randomly assign a cluster to each observation. This serves as the initial cluster assignment.
2. Calculate the centroid of each cluster.
3. For each observation, identify the closest centroid and reassign it to that cluster.
4. Repeat steps 2 and 3 until the cluster assignments stop changing.
The goal is to minimize the total within-cluster variation, where
$W(C_k) = \dfrac{1}{|C_k|}\sum_{i,u\in C_k}\sum_{j=1}^{p}(x_{i,j} - x_{u,j})^2 = 2\sum_{i\in C_k}\sum_{j=1}^{p}(x_{i,j} - \bar{x}_{k,j})^2$.
A sketch of this algorithm follows at the end of this section.

Hierarchical Clustering
1. Select the dissimilarity measure and linkage to be used. Treat each observation as its own cluster.
2. For $k = n, n-1, \dots, 2$:
   • Compute the inter-cluster dissimilarity between all $k$ clusters, examining all $\binom{k}{2}$ pairwise dissimilarities.
   • The two clusters with the lowest inter-cluster dissimilarity are fused. The dissimilarity indicates the height in the dendrogram at which these two clusters join.

Linkage (inter-cluster dissimilarity)
• Complete: the largest pairwise dissimilarity.
• Single: the smallest pairwise dissimilarity.
• Average: the arithmetic mean of the pairwise dissimilarities.
• Centroid: the dissimilarity between the cluster centroids.

Key Ideas
• For $k$-means clustering, the algorithm needs to be repeated for each candidate number of clusters.
• For hierarchical clustering, the algorithm only needs to be performed once for any number of clusters.
• The result of clustering depends on many parameters, such as: the choice of the number of clusters in $k$-means clustering; the choice of the number of clusters, linkage, and dissimilarity measure in hierarchical clustering; and the choice to standardize variables.
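A minimal Python sketch of the $k$-means steps above, including the within-cluster variation identity; the two well-separated simulated groups are an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),     # two well-separated groups
               rng.normal(3, 0.5, (50, 2))])
K = 2

labels = rng.integers(K, size=len(X))           # step 1: random initial clusters
for _ in range(100):                            # steps 2-4
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    new_labels = d2.argmin(axis=1)              # reassign each point to the closest centroid
    if np.array_equal(new_labels, labels):      # stop when assignments stop changing
        break
    labels = new_labels

# within-cluster variation via the centroid identity above
W = [2 * ((X[labels == k] - centroids[k]) ** 2).sum() for k in range(K)]
print("cluster sizes:", np.bincount(labels), "W(C_k):", W)
```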