SRM Formula Sheet (Updated 03/13/23)
© 2023 Coaching Actuaries. All Rights Reserved | www.coachingactuaries.com

STATISTICAL LEARNING

Data Modeling Problems

Types of Variables
• Response: a variable of primary interest.
• Explanatory: a variable used to study the response variable.
• Count: a quantitative variable usually valid on non-negative integers.
• Continuous: a real-valued quantitative variable.
• Nominal: a categorical/qualitative variable whose categories have no meaningful or logical order.
• Ordinal: a categorical/qualitative variable whose categories have a meaningful or logical order.

Notation
• $y$, $Y$: response variable
• $x$, $X$: explanatory variable
• Subscript $i$: index for observations
• $n$: no. of observations
• Subscript $j$: index for variables except the response
• $p$: no. of variables except the response
• $\mathbf{X}^{T}$: transpose of matrix $\mathbf{X}$
• $\mathbf{X}^{-1}$: inverse of matrix $\mathbf{X}$
• $\varepsilon$: error term
• $\hat{y}$, $\hat{f}(x)$: estimate/estimator of $y$, $f(x)$

Statistical Learning Problems
$Y = f(x_1, \ldots, x_p) + \varepsilon$ where $\mathrm{E}[\varepsilon] = 0$, so $\mathrm{E}[Y] = f(x_1, \ldots, x_p)$.

Regression Problems
Test MSE $= \mathrm{E}\big[(Y - \hat{Y})^2\big]$, which can be estimated using $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.
For fixed inputs $x_1, \ldots, x_p$, the test MSE is
$\underbrace{\mathrm{Var}\big[\hat{f}(x_1,\ldots,x_p)\big] + \big(\mathrm{Bias}\big[\hat{f}(x_1,\ldots,x_p)\big]\big)^2}_{\text{reducible error}} + \underbrace{\mathrm{Var}[\varepsilon]}_{\text{irreducible error}}$

Classification Problems
Test Error Rate $= \mathrm{E}\big[I(Y \neq \hat{Y})\big]$, which can be estimated using $\frac{1}{n}\sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$.
Bayes Classifier: $f(x_1,\ldots,x_p) = \arg\max_k \Pr(Y = k \mid X_1 = x_1, \ldots, X_p = x_p)$

Contrasting Statistical Learning Elements
• Regression: quantitative response variable. Classification: categorical response variable.
• Supervised: has a response variable. Unsupervised: no response variable.
• Parametric: functional form of $f$ specified. Non-parametric: functional form of $f$ not specified.
• Prediction: output of $\hat{f}$. Inference: comprehension of $f$.
• Flexibility: $\hat{f}$'s ability to follow the data. Interpretability: $\hat{f}$'s ability to be understood.
• Training: observations used to train/obtain $\hat{f}$. Test: observations not used to train/obtain $\hat{f}$.

Method Properties – Key Ideas
• The disadvantage of parametric methods is the danger of choosing a form for $f$ that is not close to the truth.
• The disadvantage of non-parametric methods is the need for an abundance of observations.
• Flexibility and interpretability are typically at odds.
• As flexibility increases, the training MSE (or error rate) decreases, but the test MSE (or error rate) follows a U-shaped pattern.
• Low flexibility leads to a method with low variance and high bias; high flexibility leads to a method with high variance and low bias.

LINEAR MODELS

Simple Linear Regression (SLR)
Special case of MLR where $p = 1$.

Estimation
$b_1 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1\bar{x}$

SLR Inferences – Standard Errors
$se_{b_0} = \sqrt{\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)}$
$se_{b_1} = \sqrt{\dfrac{\mathrm{MSE}}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$
$se_{\hat{y}} = \sqrt{\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)}$
$se_{\hat{y}_{n+1}} = \sqrt{\mathrm{MSE}\left(1 + \dfrac{1}{n} + \dfrac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)}$
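The SLR formulas above translate directly into code. Below is a minimal NumPy sketch; the data values and variable names are made up purely for illustration.

```python
import numpy as np

# Toy data (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# OLS estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

# MSE = SSE / (n - 2) for SLR, since p = 1
e = y - (b0 + b1 * x)
mse = np.sum(e ** 2) / (n - 2)

# Standard errors of b0 and b1 from the formulas above
se_b0 = np.sqrt(mse * (1 / n + x.mean() ** 2 / Sxx))
se_b1 = np.sqrt(mse / Sxx)
print(b0, b1, se_b0, se_b1)
```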
Multiple Linear Regression (MLR)
$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$

Notation
• $\beta_j$: the $j$th regression coefficient
• $b_j$: estimate of $\beta_j$
• $\sigma^2$: variance of response / irreducible error
• MSE: estimate of $\sigma^2$
• $\mathbf{X}$: design matrix
• $\mathbf{H}$: hat matrix
• $e$: residual
• SST: total sum of squares
• SSR: regression sum of squares
• SSE: error sum of squares

Assumptions
1. $y_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p} + \varepsilon_i$
2. $x_{i,j}$'s are non-random
3. $\mathrm{E}[\varepsilon_i] = 0$
4. $\mathrm{Var}[\varepsilon_i] = \sigma^2$
5. $\varepsilon_i$'s are independent
6. $\varepsilon_i$'s are normally distributed
7. The predictor $x_j$ is not a linear combination of the other $p$ predictors, for $j = 0, 1, \ldots, p$

Estimation – Ordinary Least Squares (OLS)
$\hat{y} = b_0 + b_1 x_1 + \cdots + b_p x_p$
$\mathbf{b} = \begin{bmatrix} b_0 \\ \vdots \\ b_p \end{bmatrix} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$
$\mathrm{MSE} = \mathrm{SSE}/(n - p - 1)$; residual standard error $= \sqrt{\mathrm{MSE}}$

Other Numerical Results
$\mathbf{H} = \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}$, $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$, $e = y - \hat{y}$
$\mathrm{SST} = \sum_{i=1}^{n}(y_i - \bar{y})^2$ = total variability
$\mathrm{SSR} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$ = explained variability
$\mathrm{SSE} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ = unexplained variability
$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$
$R^2 = \mathrm{SSR}/\mathrm{SST}$
$R^2_{adj} = 1 - \dfrac{\mathrm{MSE}}{s_y^2} = 1 - (1 - R^2)\left(\dfrac{n-1}{n-p-1}\right)$

Key Ideas
• $R^2$ is a poor measure for model comparison because it will increase simply by adding more predictors to a model.
• Polynomials do not change at a constant rate per unit increase of their variable, i.e. there is no constant slope.
• Only $w - 1$ dummy variables are needed to represent $w$ classes of a categorical predictor; one of the classes acts as a baseline.
• In effect, dummy variables define a distinct intercept for each class. Without an interaction between a dummy variable and a predictor, the dummy variable cannot additionally affect that predictor's regression coefficient.

MLR Inferences

Notation
• $s$, $se$: estimated standard error
• $H_0$: null hypothesis; $H_1$: alternative hypothesis
• df: degrees of freedom
• $t_{1-q,\,df}$: $q$ quantile of a $t$-distribution
• $\alpha$: significance level; $c$: confidence level
• ndf: numerator degrees of freedom; ddf: denominator degrees of freedom
• $F_{1-q,\,ndf,\,ddf}$: $q$ quantile of an $F$-distribution
• $y_{n+1}$: response of a new observation
• Subscript $R$: reduced model; subscript $F$: full model

Variance-Covariance Matrix
$\widehat{\mathrm{Var}}[\mathbf{b}] = \mathrm{MSE}\,(\mathbf{X}^{T}\mathbf{X})^{-1} =
\begin{bmatrix}
\widehat{\mathrm{Var}}[b_0] & \widehat{\mathrm{Cov}}[b_0, b_1] & \cdots & \widehat{\mathrm{Cov}}[b_0, b_p] \\
\widehat{\mathrm{Cov}}[b_0, b_1] & \widehat{\mathrm{Var}}[b_1] & \cdots & \widehat{\mathrm{Cov}}[b_1, b_p] \\
\vdots & \vdots & \ddots & \vdots \\
\widehat{\mathrm{Cov}}[b_0, b_p] & \widehat{\mathrm{Cov}}[b_1, b_p] & \cdots & \widehat{\mathrm{Var}}[b_p]
\end{bmatrix}$
Estimator for $\beta_j$: $b_j$, with $se_{b_j} = \sqrt{\widehat{\mathrm{Var}}[b_j]}$. Estimator for $\mathrm{E}[Y]$: $\hat{y}$.

$t$ Tests
$t\ \text{statistic} = \dfrac{\text{estimate} - \text{hypothesized value}}{\text{standard error}}$
$H_0$: $\beta_j =$ hypothesized value
• Two-tailed: reject $H_0$ if $|t\ \text{statistic}| \ge t_{\alpha/2,\,n-p-1}$
• Left-tailed: reject $H_0$ if $t\ \text{statistic} \le -t_{\alpha,\,n-p-1}$
• Right-tailed: reject $H_0$ if $t\ \text{statistic} \ge t_{\alpha,\,n-p-1}$

$F$ Tests
$F\ \text{statistic} = \dfrac{\mathrm{MSR}}{\mathrm{MSE}} = \dfrac{\mathrm{SSR}/p}{\mathrm{SSE}/(n-p-1)}$
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$
Reject $H_0$ if $F\ \text{statistic} \ge F_{\alpha,\,ndf,\,ddf}$, where ndf $= p$ and ddf $= n - p - 1$.

Partial $F$ Tests
$F\ \text{statistic} = \dfrac{(\mathrm{SSE}_R - \mathrm{SSE}_F)/(p_F - p_R)}{\mathrm{SSE}_F/(n - p_F - 1)}$
$H_0$: some $\beta_j$'s $= 0$
Reject $H_0$ if $F\ \text{statistic} \ge F_{\alpha,\,ndf,\,ddf}$, where ndf $= p_F - p_R$ and ddf $= n - p_F - 1$.
For all hypothesis tests, reject $H_0$ if $p$-value $\le \alpha$.

Confidence and Prediction Intervals
estimate ± ($t$ quantile)(standard error)
• $\beta_j$: $b_j \pm t_{(1-c)/2,\,n-p-1}\cdot se_{b_j}$
• $\mathrm{E}[Y]$: $\hat{y} \pm t_{(1-c)/2,\,n-p-1}\cdot se_{\hat{y}}$
• $y_{n+1}$: $\hat{y}_{n+1} \pm t_{(1-c)/2,\,n-p-1}\cdot se_{\hat{y}_{n+1}}$
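As a companion to the OLS matrix formula and the overall $F$ test above, here is a minimal NumPy sketch; the design matrix and response are toy values chosen only to make the script runnable.

```python
import numpy as np

# Toy data (made up): n = 6 observations, p = 2 predictors, intercept column first
X = np.column_stack([np.ones(6),
                     [1, 2, 3, 4, 5, 6],
                     [2, 1, 4, 3, 6, 5]])
y = np.array([3.0, 4.1, 7.9, 8.2, 12.1, 11.8])
n, p = X.shape[0], X.shape[1] - 1

# b = (X'X)^{-1} X'y, solved without forming the inverse explicitly
b = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ b
sse = np.sum((y - y_hat) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
mse = sse / (n - p - 1)

r2 = ssr / (ssr + sse)                     # SST = SSR + SSE
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F test of H0: beta_1 = ... = beta_p = 0
f_stat = (ssr / p) / mse
print(b, r2, r2_adj, f_stat)
```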
Linear Model Assumptions

Leverage
$h_i = \mathbf{x}_i^{T}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{x}_i = \dfrac{se_{\hat{y}_i}^2}{\mathrm{MSE}}$
For SLR: $h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum_{k=1}^{n}(x_k - \bar{x})^2}$
• $1/n \le h_i \le 1$
• $\sum_{i=1}^{n} h_i = p + 1$
• Frees rule of thumb: $h_i > 3\left(\dfrac{p+1}{n}\right)$

Studentized and Standardized Residuals
$e_{stu,i} = \dfrac{e_i}{\sqrt{\mathrm{MSE}_{(i)}(1 - h_i)}}, \qquad e_{sta,i} = \dfrac{e_i}{\sqrt{\mathrm{MSE}(1 - h_i)}}$
• Frees rule of thumb: $|e_{sta,i}| > 2$

Cook's Distance
$D_i = \dfrac{\sum_{k=1}^{n}\big(\hat{y}_k - \hat{y}_{(i)k}\big)^2}{\mathrm{MSE}(p+1)} = \dfrac{e_i^2\, h_i}{\mathrm{MSE}(p+1)(1-h_i)^2}$

Plots of Residuals
• $e$ versus $\hat{y}$: residuals are well-behaved if points appear to be randomly scattered, residuals seem to average to 0, and the spread of the residuals does not change.
• $e$ versus $i$: detects dependence of error terms.
• $qq$ plot of $e$: detects departures from normality in the error terms.

Variance Inflation Factor
$\mathrm{VIF}_j = \dfrac{s_{x_j}^2 (n-1)\, se_{b_j}^2}{\mathrm{MSE}} = \dfrac{1}{1 - R_j^2}$
Tolerance is the reciprocal of VIF.
• Frees rule of thumb: any $\mathrm{VIF}_j \ge 10$

Key Ideas
• As realizations of a $t$-distribution, studentized residuals can help identify outliers.
• When residuals have a larger spread for larger predictions, one solution is to transform the response variable with a concave function.
• There is no universal approach to handling multicollinearity; it is even possible to accept it, such as when there is a suppressor variable. On the other hand, it can be eliminated by using a set of orthogonal predictors.

Model Selection

Notation
• $k$: total no. of predictors in consideration
• $p$: no. of predictors for a specific model
• $\mathrm{MSE}_k$: MSE of the model that uses all $k$ predictors
• $M_p$: the "best" model with $p$ predictors

Best Subset Selection
1. For $p = 0, 1, \ldots, k$, fit all $\binom{k}{p}$ models with $p$ predictors. The model with the largest $R^2$ is $M_p$.
2. Choose the best model among $M_0, \ldots, M_k$ using a selection criterion of choice.

Forward Stepwise Selection
1. Fit all $k$ simple linear regression models. The model with the largest $R^2$ is $M_1$.
2. For $p = 2, \ldots, k$, fit the models that add one of the remaining predictors to $M_{p-1}$. The model with the largest $R^2$ is $M_p$.
3. Choose the best model among $M_0, \ldots, M_k$ using a selection criterion of choice.

Backward Stepwise Selection
1. Fit the model with all $k$ predictors, $M_k$.
2. For $p = k-1, \ldots, 1$, fit the models that drop one of the predictors from $M_{p+1}$. The model with the largest $R^2$ is $M_p$.
3. Choose the best model among $M_0, \ldots, M_k$ using a selection criterion of choice.

Selection Criteria
• Mallows' $C_p$: $C_p = \dfrac{\mathrm{SSE} + 2p \cdot \mathrm{MSE}_k}{n}$; alternatively, $C_p = \dfrac{\mathrm{SSE}}{\mathrm{MSE}_k} - n + 2(p+1)$
• Akaike information criterion: $\mathrm{AIC} = \dfrac{\mathrm{SSE} + 2p \cdot \mathrm{MSE}_k}{n \cdot \mathrm{MSE}_k}$
• Bayesian information criterion: $\mathrm{BIC} = \dfrac{\mathrm{SSE} + \ln n \cdot p \cdot \mathrm{MSE}_k}{n \cdot \mathrm{MSE}_k}$
• Adjusted $R^2$
• Cross-validation error

Validation Set
• Randomly splits all available observations into two groups: the training set and the validation set.
• Only the observations in the training set are used to obtain the fitted model; the observations in the validation set are used to estimate the test MSE.

$k$-fold Cross-Validation
1. Randomly divide all available observations into $k$ folds.
2. For $v = 1, \ldots, k$, obtain the $v$th fit by training with all observations except those in the $v$th fold.
3. For $v = 1, \ldots, k$, use $\hat{y}$ from the $v$th fit to calculate a test MSE estimate with the observations in the $v$th fold.
4. To calculate the CV error, average the $k$ test MSE estimates from the previous step.

Leave-one-out Cross-Validation (LOOCV)
• Calculate the LOOCV error as a special case of $k$-fold cross-validation where $k = n$.
• For MLR: $\text{LOOCV Error} = \dfrac{1}{n}\sum_{i=1}^{n}\left(\dfrac{y_i - \hat{y}_i}{1 - h_i}\right)^2$
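The LOOCV shortcut for MLR above avoids refitting the model $n$ times, since the leverages already capture each observation's influence. A minimal NumPy sketch, again with made-up data:

```python
import numpy as np

# Toy data (made up): intercept column plus one predictor
X = np.column_stack([np.ones(5), [1.0, 2.0, 3.0, 4.0, 5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Hat matrix, leverages h_i, and ordinary residuals
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = y - H @ y

# LOOCV error via the shortcut: no refitting required
loocv = np.mean((e / (1 - h)) ** 2)
print(loocv)
```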
Key Ideas on Cross-Validation
• The validation set approach has unstable results and will tend to overestimate the test MSE. The other two approaches mitigate these issues.
• With respect to bias, LOOCV < $k$-fold CV < validation set.
• With respect to variance, LOOCV > $k$-fold CV > validation set.

Other Regression Approaches

Standardizing Variables
• A centered variable is the result of subtracting the sample mean from a variable.
• A scaled variable is the result of dividing a variable by its sample standard deviation.
• A standardized variable is the result of first centering a variable, then scaling it.

Weighted Least Squares
• $\mathrm{Var}[\varepsilon_i] = \sigma^2 / w_i$
• Equivalent to running OLS with $\sqrt{w}\,y$ as the response and $\sqrt{w}\,\mathbf{x}$ as the predictors, hence minimizing $\sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2$.
• $\mathbf{b} = (\mathbf{X}^{T}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{W}\mathbf{y}$, where $\mathbf{W}$ is the diagonal matrix of the weights.

Ridge Regression
Coefficients are estimated by minimizing the SSE while constrained by $\sum_{j=1}^{p} b_j^2 \le c$, or equivalently, by minimizing the expression $\mathrm{SSE} + \lambda \sum_{j=1}^{p} b_j^2$.

Lasso Regression
Coefficients are estimated by minimizing the SSE while constrained by $\sum_{j=1}^{p} |b_j| \le c$, or equivalently, by minimizing the expression $\mathrm{SSE} + \lambda \sum_{j=1}^{p} |b_j|$.

Key Ideas on Ridge and Lasso
• $x_1, \ldots, x_p$ are scaled predictors.
• $\lambda$ is inversely related to flexibility.
• With a finite $\lambda$, none of the ridge estimates will equal 0, but the lasso estimates could equal 0. (A sketch contrasting the two penalties follows the KNN subsection below.)

Partial Least Squares
• The first partial least squares direction, $z_1$, is a linear combination of the standardized predictors $x_1, \ldots, x_p$, with coefficients based on the relation between $x_j$ and $y$.
• Every subsequent partial least squares direction is calculated iteratively as a linear combination of "updated predictors", which are the residuals from regressing each of the previous predictors on the previous direction.
• The directions $z_1, \ldots, z_m$ are used as predictors in a multiple linear regression. The number of directions, $m$, is a measure of flexibility.

$k$-Nearest Neighbors (KNN)
1. Identify the "center of the neighborhood", i.e. the location of an observation with inputs $x_1, \ldots, x_p$.
2. Starting from the center of the neighborhood, identify the $k$ nearest training observations.
3. For classification, $\hat{y}$ is the most frequent category among the $k$ observations; for regression, $\hat{y}$ is the average of the response among the $k$ observations.
$k$ is inversely related to flexibility.
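The KNN steps above are simple enough to sketch from scratch. In this illustration, knn_predict is a hypothetical helper (not from any library) and the training data are invented:

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k, classify=False):
    """k-nearest-neighbors prediction at point x0 using Euclidean distance."""
    d = np.sqrt(np.sum((X_train - x0) ** 2, axis=1))
    nearest = np.argsort(d)[:k]             # indices of the k closest observations
    if classify:
        vals, counts = np.unique(y_train[nearest], return_counts=True)
        return vals[np.argmax(counts)]      # most frequent category
    return y_train[nearest].mean()          # average response for regression

# Toy data (made up)
X_train = np.array([[1.0, 1.0], [2.0, 1.5], [3.0, 3.0], [5.0, 4.0]])
y_train = np.array([10.0, 12.0, 20.0, 30.0])
print(knn_predict(X_train, y_train, np.array([2.5, 2.0]), k=2))
```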
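To see the key idea noted earlier, that lasso can set coefficients exactly to 0 while ridge only shrinks them, here is a sketch using scikit-learn's Ridge and Lasso (scikit-learn calls the penalty weight alpha rather than $\lambda$). The data are simulated so that only two of five predictors matter:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler

# Toy data (made up): y depends on only the first two of five predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)

Xs = StandardScaler().fit_transform(X)      # ridge/lasso assume scaled predictors

ridge = Ridge(alpha=10.0).fit(Xs, y)
lasso = Lasso(alpha=0.5).fit(Xs, y)

print(ridge.coef_)   # all coefficients shrunk toward 0, none exactly 0
print(lasso.coef_)   # the noise coefficients are exactly 0
```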
Key Results for Distributions in the Exponential Family
(Each entry lists the probability function, the canonical link $(b')^{-1}(\mu)$, $b(\theta)$, and $\phi$.)
• Normal: $f(y) = \dfrac{1}{\sigma\sqrt{2\pi}}\exp\left[-\dfrac{(y-\mu)^2}{2\sigma^2}\right]$; canonical link $\mu$; $b(\theta) = \dfrac{\theta^2}{2}$; $\phi = \sigma^2$
• Binomial (fixed $n$): $f(y) = \dbinom{n}{y}\pi^{y}(1-\pi)^{n-y}$; canonical link $\ln\left(\dfrac{\mu}{n-\mu}\right)$; $b(\theta) = n\ln(1 + e^{\theta})$; $\phi = 1$
• Poisson: $f(y) = \dfrac{\mu^{y} e^{-\mu}}{y!}$; canonical link $\ln \mu$; $b(\theta) = e^{\theta}$; $\phi = 1$
• Negative Binomial (fixed $r$): $f(y) = \dfrac{\Gamma(y+r)}{y!\,\Gamma(r)}\pi^{r}(1-\pi)^{y}$; canonical link $\ln\left(\dfrac{\mu}{\mu+r}\right)$; $b(\theta) = -r\ln(1 - e^{\theta})$; $\phi = 1$
• Gamma: $f(y) = \dfrac{\gamma^{\alpha} y^{\alpha-1} e^{-y\gamma}}{\Gamma(\alpha)}$; canonical link $-\dfrac{1}{\mu}$; $b(\theta) = -\ln(-\theta)$; $\phi = \dfrac{1}{\alpha}$
• Inverse Gaussian: $f(y) = \sqrt{\dfrac{\lambda}{2\pi y^{3}}}\exp\left[-\dfrac{\lambda(y-\mu)^2}{2\mu^2 y}\right]$; canonical link $-\dfrac{1}{2\mu^2}$; $b(\theta) = -\sqrt{-2\theta}$; $\phi = \dfrac{1}{\lambda}$

NON-LINEAR MODELS

Generalized Linear Models

Notation
• $\theta$, $\phi$: linear exponential family parameters
• $\mathrm{E}[Y]$, $\mu$: mean response
• $b'(\theta)$: mean function
• $v(\mu)$: variance function
• $h(\mu)$: link function
• $\mathbf{b}$: maximum likelihood estimate of $\boldsymbol{\beta}$
• $l(\mathbf{b})$: maximized log-likelihood
• $l_0$: maximized log-likelihood for the null model
• $l_{sat}$: maximized log-likelihood for the saturated model
• $e$: residual
• $\mathbf{I}$: information matrix
• $\chi^2_{1-q,\,df}$: $q$ quantile of a chi-square distribution
• $D^{*}$: scaled deviance
• $D$: deviance statistic

Linear Exponential Family
Prob. fn. of $Y = \exp\left[\dfrac{y\theta - b(\theta)}{\phi} + S(y, \phi)\right]$
$\mathrm{E}[Y] = b'(\theta)$
$\mathrm{Var}[Y] = \phi \cdot b''(\theta) = \phi \cdot v(\mu)$

Model Framework
• $h(\mu) = \mathbf{x}^{T}\boldsymbol{\beta}$
• The canonical link is the link function where $h(\mu) = \theta$.

Numerical Results
$D^{*} = 2\big[l_{sat} - l(\mathbf{b})\big]$
$D = \phi D^{*}$
For MLR, $D = \mathrm{SSE}$.
Max-scaled $R^2$: $\dfrac{1 - \exp\{2[l_0 - l(\mathbf{b})]/n\}}{1 - \exp\{2 l_0 / n\}}$
Pseudo-$R^2$: $\dfrac{l(\mathbf{b}) - l_0}{l_{sat} - l_0}$
$\mathrm{AIC} = -2 \cdot l(\mathbf{b}) + 2(p+1)^{*}$
$\mathrm{BIC} = -2 \cdot l(\mathbf{b}) + \ln n \cdot (p+1)^{*}$
*Assumes only $\boldsymbol{\beta}$ needs to be estimated. If estimating $\phi$ is required, replace $p+1$ with $p+2$.

Residuals
• Raw residual: $e_i = y_i - \hat{\mu}_i$
• Pearson residual: $e_i = \dfrac{y_i - \hat{\mu}_i}{\sqrt{\phi \cdot v(\hat{\mu}_i)}}$. The Pearson chi-square statistic is $\sum_{i=1}^{n} e_i^2$.
• Deviance residual: $e_i = \pm\sqrt{D_i^{*}}$, whose sign follows the $i$th raw residual.
• Anscombe residual: $e_i = \dfrac{t(y_i) - \widehat{\mathrm{E}}[t(Y_i)]}{\sqrt{\widehat{\mathrm{Var}}[t(Y_i)]}}$

Inference
• The maximum likelihood estimators $\hat{\boldsymbol{\beta}}$ asymptotically have a multivariate normal distribution with mean $\boldsymbol{\beta}$ and asymptotic variance-covariance matrix $\mathbf{I}^{-1}$.
• To address overdispersion, change the variance to $\mathrm{Var}[Y_i] = \delta \cdot \phi \cdot b''(\theta_i)$ and estimate $\delta$ as the Pearson chi-square statistic divided by $n - p - 1$.

Likelihood Ratio Tests
$\chi^2\ \text{statistic} = 2\big[l(\mathbf{b}_F) - l(\mathbf{b}_R)\big]$
$H_0$: some $\beta_j$'s $= 0$
Reject $H_0$ if $\chi^2\ \text{statistic} \ge \chi^2_{\alpha,\,p_F - p_R}$

Goodness-of-Fit Tests
$Y$ follows a distribution of choice with $m$ free parameters, whose domain is split into $w$ mutually exclusive intervals.
$\chi^2\ \text{statistic} = \sum_{k=1}^{w}\dfrac{(n_k - n p_k)^2}{n p_k}$
$H_0$: each interval probability equals its hypothesized value $p_k$, for all $k = 1, \ldots, w$
Reject $H_0$ if $\chi^2\ \text{statistic} \ge \chi^2_{\alpha,\,w-m-1}$

Tweedie Distribution
$\mathrm{E}[Y] = \mu$, $\mathrm{Var}[Y] = \phi\mu^{d}$
• Normal: $d = 0$
• Poisson: $d = 1$
• Tweedie: $1 < d < 2$
• Gamma: $d = 2$
• Inverse Gaussian: $d = 3$

Parameter Estimation
$l(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left[\dfrac{y_i\theta_i - b(\theta_i)}{\phi} + S(y_i, \phi)\right]$, where $\theta_i = (b')^{-1}\big(h^{-1}(\mathbf{x}_i^{T}\boldsymbol{\beta})\big)$
The score equations are the partial derivatives of $l(\boldsymbol{\beta})$ with respect to each $\beta_j$, all set equal to 0. The solution to the score equations is $\mathbf{b}$. Then, $\hat{\mu} = h^{-1}(\mathbf{x}^{T}\mathbf{b})$.
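To make the score equations concrete, the sketch below specializes them to a Poisson GLM with its canonical log link (score $\sum_i \mathbf{x}_i(y_i - \mu_i)$, information $\sum_i \mu_i \mathbf{x}_i\mathbf{x}_i^{T}$) and solves them by Newton-Raphson, then computes the deviance and Pearson chi-square statistic. The data are simulated, and the loop structure is one reasonable implementation choice, not the only one:

```python
import numpy as np

# Simulated count data (made up)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.8])))

# Newton-Raphson on the Poisson score equations: X'(y - mu) = 0
beta = np.zeros(X.shape[1])
for _ in range(25):
    mu = np.exp(X @ beta)
    score = X.T @ (y - mu)                 # gradient of the log-likelihood
    info = X.T @ (mu[:, None] * X)         # information matrix sum(mu_i x_i x_i')
    beta += np.linalg.solve(info, score)

mu = np.exp(X @ beta)
# Deviance: 2 * sum[y ln(y/mu) - (y - mu)], with y ln(y/mu) taken as 0 when y = 0
with np.errstate(divide="ignore", invalid="ignore"):
    term = np.where(y > 0, y * np.log(y / mu), 0.0)
deviance = 2 * np.sum(term - (y - mu))
pearson_chi2 = np.sum((y - mu) ** 2 / mu)
print(beta, deviance, pearson_chi2)
```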
Logistic and Probit Regression
• The odds of an event are the ratio of the probability that the event will occur to the probability that the event will not occur.
• The odds ratio is the ratio of the odds of an event with the presence of a characteristic to the odds of the same event without the presence of that characteristic.

Binary Response Functions
• Logit: $h(\pi) = \ln\left(\dfrac{\pi}{1-\pi}\right)$
• Probit: $h(\pi) = \Phi^{-1}(\pi)$
• Complementary log-log: $h(\pi) = \ln(-\ln(1-\pi))$

Binary Logistic Regression
$l(\boldsymbol{\beta}) = \sum_{i=1}^{n}\big[y_i \ln \pi_i + (1 - y_i)\ln(1 - \pi_i)\big]$
$\dfrac{\partial}{\partial \boldsymbol{\beta}}\, l(\boldsymbol{\beta}) = \sum_{i=1}^{n} \mathbf{x}_i (y_i - \pi_i)\dfrac{\pi_i'}{\pi_i(1-\pi_i)} = \mathbf{0}$
$D = 2\sum_{i=1}^{n}\left[y_i \ln\left(\dfrac{y_i}{\hat{\pi}_i}\right) + (1-y_i)\ln\left(\dfrac{1-y_i}{1-\hat{\pi}_i}\right)\right]$
Pearson residual: $e_i = \dfrac{y_i - \hat{\pi}_i}{\sqrt{\hat{\pi}_i(1-\hat{\pi}_i)}}$
Pearson chi-square statistic $= \sum_{i=1}^{n}\dfrac{(y_i - \hat{\pi}_i)^2}{\hat{\pi}_i(1-\hat{\pi}_i)}$

Nominal Response – Generalized Logit
Let $\pi_{i,k}$ be the probability that the $i$th observation is classified as category $k$. The reference category is $c$.
$\ln\left(\dfrac{\pi_{i,k}}{\pi_{i,c}}\right) = \mathbf{x}_i^{T}\boldsymbol{\beta}_k$
$\pi_{i,k} = \begin{cases} \dfrac{\exp(\mathbf{x}_i^{T}\boldsymbol{\beta}_k)}{1 + \sum_{u \ne c}\exp(\mathbf{x}_i^{T}\boldsymbol{\beta}_u)}, & k \ne c \\[1ex] \dfrac{1}{1 + \sum_{u \ne c}\exp(\mathbf{x}_i^{T}\boldsymbol{\beta}_u)}, & k = c \end{cases}$
$l(\boldsymbol{\beta}) = \sum_{i=1}^{n}\sum_{k} I(y_i = k)\ln \pi_{i,k}$

Ordinal Response – Proportional Odds Cumulative
$h(\Pi_k) = \alpha_k + \mathbf{x}_i^{T}\boldsymbol{\beta}$, where
• $\Pi_k = \pi_1 + \cdots + \pi_k$
• $\mathbf{x}_i = (x_{i,1}, \ldots, x_{i,p})^{T}$, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^{T}$

Poisson Count Regression
$\ln \mu = \mathbf{x}^{T}\boldsymbol{\beta}$
$l(\boldsymbol{\beta}) = \sum_{i=1}^{n}\big[y_i \ln \mu_i - \mu_i - \ln(y_i!)\big]$
$\dfrac{\partial}{\partial \boldsymbol{\beta}}\, l(\boldsymbol{\beta}) = \sum_{i=1}^{n}\mathbf{x}_i (y_i - \mu_i) = \mathbf{0}$
$\mathbf{I} = \sum_{i=1}^{n}\mu_i \mathbf{x}_i \mathbf{x}_i^{T}$
$D = 2\sum_{i=1}^{n}\left[y_i \ln\left(\dfrac{y_i}{\hat{\mu}_i}\right) - (y_i - \hat{\mu}_i)\right]$
Pearson residual: $e_i = \dfrac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i}}$
Pearson chi-square statistic $= \sum_{i=1}^{n}\dfrac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}$

Poisson Regression with Exposures
Model: $\ln \mu = \ln w + \mathbf{x}^{T}\boldsymbol{\beta}$

Alternative Count Models
These models can incorporate a Poisson distribution while letting the mean of the response differ from the variance of the response:
• Negative binomial: mean < variance – yes; mean > variance – no
• Zero-inflated: mean < variance – yes; mean > variance – no
• Hurdle: mean < variance – yes; mean > variance – yes
• Heterogeneity: mean < variance – yes; mean > variance – no

TIME SERIES

Trend Models

Notation
• Subscript $t$: index for observations
• $T_t$: trends in time
• $S_t$: seasonal trends
• $\varepsilon_t$: random patterns
• $\hat{y}_{n+l}$: $l$-step ahead forecast
• $se$: estimated standard error
• $t_{1-q,\,df}$: $q$ quantile of a $t$-distribution
• $n_1$: training sample size; $n_2$: test sample size

Trends
• Additive: $Y_t = T_t + S_t + \varepsilon_t$
• Multiplicative: $Y_t = T_t \times S_t + \varepsilon_t$

Stationarity
A process is stationary when its statistical properties do not vary with respect to time. Control charts can be used to identify stationarity.

White Noise
$\hat{y}_{n+l} = \bar{y}$
$se_{\hat{y}_{n+l}} = s_y\sqrt{1 + 1/n}$
$100c\%$ prediction interval for $y_{n+l}$: $\hat{y}_{n+l} \pm t_{(1-c)/2,\,n-1}\cdot se_{\hat{y}_{n+l}}$

Random Walk
$w_t = y_t - y_{t-1}$
$\hat{y}_{n+l} = y_n + l\bar{w}$
$se_{\hat{y}_{n+l}} = s_w\sqrt{l}$
Approximate 95% prediction interval for $y_{n+l}$: $\hat{y}_{n+l} \pm 2\cdot se_{\hat{y}_{n+l}}$

Model Comparison (each sum runs over the test set, $t = n_1 + 1, \ldots, n_1 + n_2$)
$\mathrm{ME} = \dfrac{1}{n_2}\sum e_t$
$\mathrm{MPE} = \dfrac{100}{n_2}\sum \dfrac{e_t}{y_t}$
$\mathrm{MSE} = \dfrac{1}{n_2}\sum e_t^2$
$\mathrm{MAE} = \dfrac{1}{n_2}\sum |e_t|$
$\mathrm{MAPE} = \dfrac{100}{n_2}\sum \left|\dfrac{e_t}{y_t}\right|$

Autoregressive Models

Notation
• $\rho_k$: lag $k$ autocorrelation
• $r_k$: lag $k$ sample autocorrelation
• $\sigma^2$: variance of white noise; $s^2$: estimate of $\sigma^2$
• $b_0$: estimate of $\beta_0$; $b_1$: estimate of $\beta_1$
• $\bar{y}_{-}$: sample mean of the first $n-1$ observations
• $\bar{y}_{+}$: sample mean of the last $n-1$ observations

Autocorrelation
$r_k = \dfrac{\sum_{t=k+1}^{n}(y_{t-k} - \bar{y})(y_t - \bar{y})}{\sum_{t=1}^{n}(y_t - \bar{y})^2}$

Testing Autocorrelation
test statistic $= r_k / se_{r_k}$, where $se_{r_k} = 1/\sqrt{n}$
$H_0$: $\rho_k = 0$ against $H_1$: $\rho_k \ne 0$
Reject $H_0$ if $|\text{test statistic}| \ge z_{1-\alpha/2}$
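A minimal sketch of the lag-$k$ sample autocorrelation and its significance test; lag_autocorr is a hypothetical helper name and the AR(1)-style series is simulated for illustration:

```python
import numpy as np

def lag_autocorr(y, k):
    """Lag-k sample autocorrelation r_k."""
    ybar = y.mean()
    num = np.sum((y[:-k] - ybar) * (y[k:] - ybar))
    den = np.sum((y - ybar) ** 2)
    return num / den

# Simulated AR(1)-style series (made up parameters)
rng = np.random.default_rng(2)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 2.0 + 0.6 * y[t - 1] + rng.normal()

n = len(y)
r1 = lag_autocorr(y, 1)
test_stat = r1 / (1 / np.sqrt(n))   # se(r_k) = 1/sqrt(n)
print(r1, test_stat)                # reject H0: rho_1 = 0 if |stat| >= z_{1-alpha/2}
```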
AR(1) Model
$Y_t = \beta_0 + \beta_1 Y_{t-1} + \varepsilon_t$
Assumptions
1. $\mathrm{E}[\varepsilon_t] = 0$
2. $\mathrm{Var}[\varepsilon_t] = \sigma^2$
3. $\mathrm{Cov}[\varepsilon_{t+k}, \varepsilon_t] = 0$ for $k > 0$
• If $\beta_1 = 0$, $Y_t$ follows a white noise process.
• If $\beta_1 = 1$, $Y_t$ follows a random walk process.
• If $-1 < \beta_1 < 1$, $Y_t$ is stationary.

Properties of the Stationary AR(1) Model
$\mathrm{E}[Y_t] = \dfrac{\beta_0}{1 - \beta_1}$
$\mathrm{Var}[Y_t] = \dfrac{\sigma^2}{1 - \beta_1^2}$
$\rho_k = \beta_1^{k}$

Estimation
$b_1 = \dfrac{\sum_{t=2}^{n}(y_{t-1} - \bar{y}_{-})(y_t - \bar{y}_{+})}{\sum_{t=2}^{n}(y_{t-1} - \bar{y}_{-})^2} \approx r_1$
$b_0 = \bar{y}_{+} - b_1\bar{y}_{-} \approx \bar{y}(1 - b_1)$
$s^2 = \dfrac{\sum_{t=2}^{n}(e_t - \bar{e})^2}{n - 3}$
$\widehat{\mathrm{Var}}[Y_t] = \dfrac{s^2}{1 - b_1^2}$

Smoothing and Predictions
$\hat{y}_t = b_0 + b_1 y_{t-1}, \quad 2 \le t \le n$
$\hat{y}_{n+l} = \begin{cases} b_0 + b_1 y_n, & l = 1 \\ b_0 + b_1 \hat{y}_{n+l-1}, & l > 1 \end{cases}$
$se_{\hat{y}_{n+l}} = s\sqrt{1 + b_1^2 + b_1^4 + \cdots + b_1^{2(l-1)}}$
$100c\%$ prediction interval for $y_{n+l}$: $\hat{y}_{n+l} \pm t_{(1-c)/2,\,n-3}\cdot se_{\hat{y}_{n+l}}$

Other Time Series Models

Notation
• $k$: moving average length
• $w$: smoothing parameter
• $SB$: seasonal base
• $m$: no. of trigonometric functions

Smoothing with Moving Averages
Model: $Y_t = \beta_0 + \varepsilon_t$
Smoothing:
$\hat{s}_t = \dfrac{y_t + y_{t-1} + \cdots + y_{t-k+1}}{k}$
$\hat{s}_t = \hat{s}_{t-1} + \dfrac{y_t - y_{t-k}}{k}$
Predictions:
$b_0 = \hat{s}_n$, $\hat{y}_{n+l} = b_0$

Double Smoothing with Moving Averages
Model: $Y_t = \beta_0 + \beta_1 t + \varepsilon_t$
Smoothing:
$\hat{s}_t^{(2)} = \dfrac{\hat{s}_t + \hat{s}_{t-1} + \cdots + \hat{s}_{t-k+1}}{k}$
$\hat{s}_t^{(2)} = \hat{s}_{t-1}^{(2)} + \dfrac{\hat{s}_t - \hat{s}_{t-k}}{k}$
Predictions:
$b_0 = \hat{s}_n$
$b_1 = \dfrac{2}{k-1}\left(\hat{s}_n - \hat{s}_n^{(2)}\right)$
$\hat{y}_{n+l} = b_0 + b_1 \cdot l$

Exponential Smoothing
Model: $Y_t = \beta_0 + \varepsilon_t$
Smoothing:
$\hat{s}_t = (1-w)(y_t + w y_{t-1} + \cdots + w^{t} y_0)$
$\hat{s}_t = (1-w)y_t + w\hat{s}_{t-1}, \quad 0 \le w < 1$
The value of $w$ is determined by minimizing $SS(w) = \sum_{t=1}^{n}(y_t - \hat{s}_{t-1})^2$.
Predictions:
$b_0 = \hat{s}_n$, $\hat{y}_{n+l} = b_0$

Double Exponential Smoothing
Model: $Y_t = \beta_0 + \beta_1 t + \varepsilon_t$
Smoothing:
$\hat{s}_t^{(2)} = (1-w)(\hat{s}_t + w\hat{s}_{t-1} + \cdots + w^{t}\hat{s}_0)$
$\hat{s}_t^{(2)} = (1-w)\hat{s}_t + w\hat{s}_{t-1}^{(2)}, \quad 0 \le w < 1$
Predictions:
$b_0 = 2\hat{s}_n - \hat{s}_n^{(2)}$
$b_1 = \dfrac{1-w}{w}\left(\hat{s}_n - \hat{s}_n^{(2)}\right)$
$\hat{y}_{n+l} = b_0 + b_1 \cdot l$

Key Ideas for Smoothing
• It is only appropriate for time series data without a linear trend.
• It is related to weighted least squares.
• A double smoothing procedure can be used to forecast time series data with a linear trend.
• Holt-Winter double exponential smoothing is a generalization of double exponential smoothing.

Seasonal Time Series Models

Fixed Seasonal Effects – Trigonometric Functions
$S_t = \sum_{i=1}^{m}\big[\beta_{1,i}\sin(f_i t) + \beta_{2,i}\cos(f_i t)\big]$
• $f_i = 2\pi i / SB$
• $m \le SB/2$

Seasonal Autoregressive Models, SAR(p)
$Y_t = \beta_0 + \beta_1 Y_{t-SB} + \cdots + \beta_p Y_{t-p \cdot SB} + \varepsilon_t$

Holt-Winter Seasonal Additive Model
$Y_t = \beta_0 + \beta_1 t + S_t + \varepsilon_t$
• $S_t = S_{t-SB}$
• $\sum_{t=1}^{SB} S_t = 0$

Unit Root Test
• A unit root test is used to evaluate the fit of a random walk model.
• A random walk model is a good fit if the time series possesses a unit root.
• The Dickey-Fuller test and augmented Dickey-Fuller test are two examples of unit root tests.

Volatility Models
$ARCH(p)$ model: $\sigma_t^2 = \omega + \gamma_1 \varepsilon_{t-1}^2 + \cdots + \gamma_p \varepsilon_{t-p}^2$
$GARCH(p, q)$ model: $\sigma_t^2 = \omega + \gamma_1 \varepsilon_{t-1}^2 + \cdots + \gamma_p \varepsilon_{t-p}^2 + \delta_1 \sigma_{t-1}^2 + \cdots + \delta_q \sigma_{t-q}^2$
$\mathrm{Var}[\varepsilon_t] = \dfrac{\omega}{1 - \sum_{j=1}^{p}\gamma_j - \sum_{j=1}^{q}\delta_j}$
Assumptions
• $\omega > 0$
• $\gamma_j \ge 0$
• $\delta_j \ge 0$
• $\sum_{j=1}^{p}\gamma_j + \sum_{j=1}^{q}\delta_j < 1$
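The GARCH recursion above is straightforward to simulate. Below is a sketch for a GARCH(1,1) process with made-up parameter values, checking the simulated series against the unconditional variance formula:

```python
import numpy as np

# GARCH(1,1) with invented parameters satisfying the assumptions above
omega, gamma1, delta1 = 0.1, 0.2, 0.7       # omega > 0; gamma, delta >= 0
assert gamma1 + delta1 < 1                  # stationarity condition

rng = np.random.default_rng(7)
T = 500
eps = np.zeros(T)
sig2 = np.zeros(T)
sig2[0] = omega / (1 - gamma1 - delta1)     # start at the unconditional variance
for t in range(1, T):
    sig2[t] = omega + gamma1 * eps[t - 1] ** 2 + delta1 * sig2[t - 1]
    eps[t] = np.sqrt(sig2[t]) * rng.normal()

# Sample variance should be near Var[eps_t] = omega / (1 - gamma1 - delta1)
print(eps.var(), omega / (1 - gamma1 - delta1))
```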
DECISION TREES

Regression and Classification Trees

Notation
• $R$: region of the predictor space
• $n_m$: no. of observations in node $m$
• $n_{m,k}$: no. of category $k$ observations in node $m$
• $I$: impurity
• $E$: classification error rate
• $G$: Gini index
• $D$: cross entropy
• $T$: subtree
• $|T|$: no. of terminal nodes in $T$
• $\lambda$: tuning parameter

Algorithm
1. Construct a large tree with many terminal nodes using recursive binary splitting.
2. Obtain a sequence of best subtrees, as a function of $\lambda$, using cost complexity pruning.
3. Choose $\lambda$ by applying $k$-fold cross-validation. Select the $\lambda$ that results in the lowest cross-validation error.
4. The best subtree is the subtree created in step 2 with the selected $\lambda$ value.

Recursive Binary Splitting
• Regression: minimize $\displaystyle\sum_{m}\sum_{i:\,\mathbf{x}_i \in R_m}\big(y_i - \bar{y}_{R_m}\big)^2$
• Classification: minimize $\dfrac{1}{n}\displaystyle\sum_{m} n_m \cdot I_m$

More Under Classification
$\hat{p}_{m,k} = n_{m,k}/n_m$
$E_m = 1 - \max_k \hat{p}_{m,k}$
$G_m = \sum_{k}\hat{p}_{m,k}\big(1 - \hat{p}_{m,k}\big)$
$D_m = -\sum_{k}\hat{p}_{m,k}\ln \hat{p}_{m,k}$
deviance $= -2\sum_{m}\sum_{k} n_{m,k}\ln \hat{p}_{m,k}$
residual mean deviance $= \dfrac{\text{deviance}}{n - |T|}$

Cost Complexity Pruning
• Regression: minimize $\displaystyle\sum_{m=1}^{|T|}\sum_{i:\,\mathbf{x}_i \in R_m}\big(y_i - \bar{y}_{R_m}\big)^2 + \lambda|T|$
• Classification: minimize $\dfrac{1}{n}\displaystyle\sum_{m=1}^{|T|} n_m \cdot I_m + \lambda|T|$

Key Ideas
• Terminal nodes or leaves represent the partitions of the predictor space.
• Internal nodes are points along the tree where splits occur.
• Terminal nodes do not have child nodes, but internal nodes do.
• Branches are lines that connect any two nodes.
• A decision tree with only one internal node is called a stump.

Advantages of Trees
• Easy to interpret and explain
• Can be presented visually
• Manage categorical variables without the need for dummy variables
• Mimic human decision-making

Disadvantages of Trees
• Not robust
• Do not have the same degree of predictive accuracy as other statistical methods

Multiple Trees

Bagging
1. Create $b$ bootstrap samples from the original training dataset.
2. Construct a decision tree for each bootstrap sample using recursive binary splitting.
3. Predict the response of a new observation by averaging the predictions (regression trees) or by using the most frequent category (classification trees) across all $b$ trees.
Properties
• Increasing $b$ does not cause overfitting.
• Bagging reduces variance.
• Out-of-bag error is a valid estimate of test error.

Random Forests
1. Create $b$ bootstrap samples from the original training dataset.
2. Construct a decision tree for each bootstrap sample using recursive binary splitting. At each split, a random subset of $m$ variables is considered.
3. Predict the response of a new observation by averaging the predictions (regression trees) or by using the most frequent category (classification trees) across all $b$ trees.
Properties
• Bagging is a special case of random forests.
• Increasing $b$ does not cause overfitting.
• Decreasing $m$ reduces the correlation between predictions.

Boosting
Let $z_1$ be the actual response variable, $y$.
1. For $k = 1, 2, \ldots, b$:
• Use recursive binary splitting to fit a tree with $d$ splits to the data with $z_k$ as the response.
• Update $z_k$ by subtracting $\lambda \cdot \hat{f}_k(\mathbf{x})$, i.e. let $z_{k+1} = z_k - \lambda \cdot \hat{f}_k(\mathbf{x})$.
2. Calculate the boosted model prediction as $\hat{f}(\mathbf{x}) = \sum_{k=1}^{b}\lambda \cdot \hat{f}_k(\mathbf{x})$.
Properties
• Increasing $b$ can cause overfitting.
• Boosting reduces bias.
• $d$ controls the complexity of the boosted model.
• $\lambda$ controls the rate at which boosting learns.
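The boosting loop above can be sketched with scikit-learn's DecisionTreeRegressor standing in for the $d$-split trees (for binary splits, max_leaf_nodes = d + 1 gives a tree with $d$ splits); the data and tuning values are invented for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (made up)
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=200)

B, lam, d = 100, 0.1, 2              # no. of trees, shrinkage lambda, splits per tree
z = y.copy()                         # z_1 is the actual response
trees = []
for _ in range(B):
    tree = DecisionTreeRegressor(max_leaf_nodes=d + 1)   # d splits -> d + 1 leaves
    tree.fit(X, z)
    trees.append(tree)
    z = z - lam * tree.predict(X)    # z_{k+1} = z_k - lambda * fhat_k(x)

# Boosted prediction: fhat(x) = sum_k lambda * fhat_k(x)
f_hat = sum(lam * t.predict(X) for t in trees)
print(np.mean((y - f_hat) ** 2))     # training MSE of the boosted model
```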
UNSUPERVISED LEARNING

Principal Components Analysis

Notation
• $z$, $Z$: principal component (score)
• Subscript $m$: index for principal components
• $\phi$: principal component loading
• $x$, $X$: centered explanatory variable

Principal Components
$z_m = \sum_{j=1}^{p}\phi_{j,m}\, x_j, \qquad z_{i,m} = \sum_{j=1}^{p}\phi_{j,m}\, x_{i,j}$
• $\sum_{j=1}^{p}\phi_{j,m}^2 = 1$
• $\sum_{j=1}^{p}\phi_{j,m}\cdot\phi_{j,u} = 0$ for $m \ne u$

Proportion of Variance Explained (PVE)
$\sum_{j=1}^{p} s_{x_j}^2 = \dfrac{1}{n-1}\sum_{j=1}^{p}\sum_{i=1}^{n} x_{i,j}^2$
$s_{z_m}^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} z_{i,m}^2$
$\mathrm{PVE}_m = \dfrac{s_{z_m}^2}{\sum_{j=1}^{p} s_{x_j}^2}$

Key Ideas
• The variance explained by each subsequent principal component is always less than the variance explained by the previous principal component.
• All principal components are uncorrelated with one another.
• A dataset has $\min(n-1, p)$ distinct principal components.
• The first $k$ principal component scores and loadings approximate the original dataset: $x_{i,j} \approx \sum_{m=1}^{k} z_{i,m}\,\phi_{j,m}$.

Principal Components Regression
$y = \theta_0 + \theta_1 z_1 + \cdots + \theta_k z_k + \varepsilon$
• If $k = p$, then $\beta_j = \sum_{m=1}^{k}\theta_m \phi_{j,m}$.

Cluster Analysis

Notation
• $C$: cluster containing indices
• $W(C)$: within-cluster variation of a cluster
• $|C|$: no. of observations in a cluster

Euclidean distance $= \sqrt{\sum_{j=1}^{p}\big(x_{i,j} - x_{m,j}\big)^2}$

$k$-Means Clustering
1. Randomly assign a cluster to each observation. This serves as the initial cluster assignments.
2. Calculate the centroid of each cluster.
3. For each observation, identify the closest centroid and reassign the observation to that cluster.
4. Repeat steps 2 and 3 until the cluster assignments stop changing.
$W(C_v) = \dfrac{1}{|C_v|}\sum_{i,m \in C_v}\sum_{j=1}^{p}\big(x_{i,j} - x_{m,j}\big)^2 = 2\sum_{i \in C_v}\sum_{j=1}^{p}\big(x_{i,j} - \bar{x}_{v,j}\big)^2$

Hierarchical Clustering
1. Select the dissimilarity measure and linkage to be used. Treat each observation as its own cluster.
2. For $q = n, n-1, \ldots, 2$:
• Compute the inter-cluster dissimilarity between all $q$ clusters.
• Examine all $\binom{q}{2}$ pairwise dissimilarities. The two clusters with the lowest inter-cluster dissimilarity are fused. The dissimilarity indicates the height in the dendrogram at which these two clusters join.

Linkage
• Complete: the largest dissimilarity
• Single: the smallest dissimilarity
• Average: the arithmetic mean of the dissimilarities
• Centroid: the dissimilarity between the cluster centroids

Key Ideas
• For $k$-means clustering, the algorithm needs to be repeated for each $k$.
• For hierarchical clustering, the algorithm only needs to be performed once for any number of clusters.
• The result of clustering depends on many parameters, such as:
o Choice of $k$ in $k$-means clustering
o Choice of number of clusters, linkage, and dissimilarity measure in hierarchical clustering
o Choice to standardize variables
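A sketch of principal components via the eigendecomposition of the sample covariance matrix, including the PVE calculation; the data are random toy values, and this is one of several equivalent ways to compute the loadings (an SVD of the centered data works as well):

```python
import numpy as np

# Toy data (made up): 50 observations of 4 variables
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)              # PCA uses centered variables

# Loadings phi are eigenvectors of the sample covariance matrix
n = Xc.shape[0]
cov = Xc.T @ Xc / (n - 1)
eigval, eigvec = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]     # sort components by variance explained
eigval, phi = eigval[order], eigvec[:, order]

Z = Xc @ phi                         # principal component scores z_{i,m}
pve = eigval / eigval.sum()          # proportion of variance explained
print(pve)                           # non-increasing across components
```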
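And a from-scratch sketch of the $k$-means steps above; k_means is a hypothetical helper with no empty-cluster handling, so treat it as illustrative rather than production code:

```python
import numpy as np

def k_means(X, k, n_iter=50, seed=0):
    """Naive k-means: random initial assignments, then alternate between
    computing centroids and reassigning each point to its closest centroid."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=X.shape[0])      # step 1: random clusters
    for _ in range(n_iter):
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)              # steps 2-3
        if np.array_equal(new_labels, labels):     # step 4: stop when stable
            break
        labels = new_labels
    return labels

# Toy data (made up): two well-separated groups
X = np.vstack([np.random.default_rng(5).normal(0, 0.5, (20, 2)),
               np.random.default_rng(6).normal(5, 0.5, (20, 2))])
print(k_means(X, k=2))
```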