Stat2008 Cheat sheet – Regression Modelling (Australian National University)

Basics
Predictor – x-axis – independent variable. Response – y-axis – dependent variable.
Homoscedasticity: the error term has the same variance across all values of the independent variables (constant variance). Heteroscedasticity: non-constant (e.g. increasing) variance.
df = n − 1 (degrees of freedom): this many values are free to vary if the mean is to stay the same.
If the p-value is less than α, reject the null.

Sums of squares
SST = Σ(y_i − ȳ)² (total variability in Y).
SSE = Σ(y_i − ŷ_i)² (unexplained variability).
SSR = Σ(ŷ_i − ȳ)² = b₁² Σ(x_i − x̄)² (explained variability).
SST = SSE + SSR, and in MLR SST = SS_v1 + SS_v2 + … + SSE (Cochran's theorem).
MSE = SSE/(n − 2) (mean square error); MSR = SSR/1 (regression mean square).
E{MSE} = σ² (expected mean square); E{MSR} = σ² + β₁² Σ(x_i − x̄)² (expected regression mean square).
Residual SE = √(SSE/(n − (1 + k))), where k = length(model$coefficients) − 1 (to ignore the intercept); RSE = √MSE.

R² and adjusted R²
R² = SSR/SST (coefficient of determination / multiple R-squared); r = correlation. Measures how close the data are to the fitted regression line: explained variation over total variation. 0 means the model explains none of the variability of the response around its mean; 100% means it explains all of it. It says nothing about whether the model fit is appropriate or biased – you need residual plots (think Anscombe's quartet).
Adjusted R² measures the proportion of variation explained by only those independent variables that really help in explaining the dependent variable: R²_adj = 1 − (1 − R²)(n − 1)/(n − k − 1), where k = number of predictors.
Difference between R² and adjusted R²: more independent variables will increase R² no matter what, but adjusted R² increases only if the added variable is actually significant.

Tests and intervals
F* = MSR/MSE (F statistic); a large F* supports H_a. Compare with the critical F value at significance level α with numerator df = 1 and denominator df = n − 2. Don't use the F-test for hypotheses that aren't of the form "="; use a t-test for < or >.
t* = b₁/SE(b₁) (t-test); a large |t*| supports H_a. Two-tailed: H₀: β₁ = 0 vs H_a: β₁ ≠ 0; otherwise one-tailed. General form: H₀: β₁ = β₁₀ vs H_a: β₁ ≠ β₁₀, so t* = (b₁ − β₁₀)/SE(b₁).
Confidence interval: β̂₁ ± t_{n−2}(α/2) · SE(β̂₁).
Prediction interval for a new response at x_h: ŷ_h ± t_{n−2}(α/2) · √(MSE (1 + 1/n + (x_h − x̄)²/SS_xx)).

Summary table
SS_xy = Σ(X_i − X̄)(Y_i − Ȳ); SS_xx = Σ(X_i − X̄)².
β̂₁ = SS_xy/SS_xx = r·S_y/S_x (slope coefficient), where r = correlation and S_x, S_y are the sample standard deviations of x and y. LOOK AT T VAL – CAN CALC.
β̂₀ = ȳ − β̂₁x̄ (intercept coefficient); x̄ = mean of the predictor, ȳ = mean of the response.
SE(β̂₁) = σ̂/√SS_xx = σ̂/√((n − 1)s_x²) (standard error), where σ̂ = residual SE, n = number of samples and s_x² = sample variance of x.
t_val(β₁) = (β̂₁ − 0)/SE(β̂₁) and t_val(β₀) = (β̂₀ − 0)/SE(β̂₀) (t-value = estimate/Std. Error).
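A minimal R sketch of these simple-linear-regression quantities, assuming R's built-in cars data (dist ~ speed) and the new x value 15 purely for illustration – none of these are from the course notes:

```r
# Minimal sketch, assuming R's built-in `cars` data (dist ~ speed) purely for
# illustration -- this data set is not part of the course notes.
fit <- lm(dist ~ speed, data = cars)

summary(fit)   # estimates, SEs, t-values, R^2, adjusted R^2, F*
anova(fit)     # SSR, SSE, MSR, MSE and the same F statistic

# The same quantities from the formulas above:
n   <- nrow(cars)
SST <- sum((cars$dist - mean(cars$dist))^2)  # total variability
SSE <- sum(residuals(fit)^2)                 # unexplained variability
SSR <- SST - SSE                             # explained variability
MSE <- SSE / (n - 2)
c(R2 = SSR / SST, RSE = sqrt(MSE), Fstar = (SSR / 1) / MSE)

confint(fit, level = 0.95)                      # CI: beta_hat +/- t_{n-2}(alpha/2) * SE
predict(fit, newdata = data.frame(speed = 15),
        interval = "prediction", level = 0.95)  # prediction interval for a new response
```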
Diagnostics
Partial regression (added variable) plots help isolate the effect on y of each individual predictor x_j: check whether there is a non-linearity problem and whether a transformation of the predictor would help. Added variable plot: regress y on all x except x_j and take the residuals (δ) – this represents y with the other effects taken out; regress x_j on all the other x and take the residuals (γ); plot δ against γ (y adjusted for the other x vs x_j adjusted for the other x). Any non-linear pattern suggests that that variable has a non-linear relation with y.
Partial residual plots (use termplot) are better than added variable plots for detecting non-linearity: a plot of the response with the predicted effects of all x other than x_j taken out, plotted against x_j.
Transformations are applied to the response variable (y-axis). Can use the Box-Cox transformation – only for positive y: g(y) = (y^λ − 1)/λ for λ ≠ 0 and log(y) for λ = 0; in practice just take y^λ as the transformed value, and round λ̂ to a value that is easy to interpret. Use boxcox on the untransformed model (you don't need the maths): just find the λ that maximises the output plot (λ̂). Sometimes the resulting fit is bad – then don't use it.
Standardised residuals all have variance 1: r_i = ê_i/(σ̂√(1 − h_ii)), where ê_i is the ordinary residual, σ̂ = √MSE and h_ii is the leverage of the i-th observation (R: rstandard). They give a standard axis on the Q-Q plot: most values should be within ±2, and anything outside could be an outlier or influential point.
Studentised residuals: predict y for case i with case i left out of the model. t_i = r_i √(((n − 1) − p)/((n − p) − r_i²)), with (n − 1) − p degrees of freedom (R: rstudent).
Multicollinearity: in the exploratory data analysis a relationship looks positive, but the summary table shows a negative coefficient – this is multicollinearity. It relates to the degree of interrelation among the covariate values: the predictors are correlated. In cor(data), large off-diagonal entries mean those predictors are highly correlated and will cause multicollinearity; remove a highly correlated variable to fix it. The inflation of the coefficient variances is due to this correlation. Variance inflation factor: VIF_i = 1/(1 − R_i²), where R_i² is obtained by regressing x_i on all the other predictors. The smallest possible VIF is 1; a high VIF means investigate for multicollinearity. Unlike pairwise correlation, VIF looks at all predictors at once. Once a variable is removed you must check all the tests again. Interaction terms: COME BACK TO THIS.

Model selection
Favour models with the smallest error – lowest RSE = σ̂ – but there are other criteria. F-tests can compare whether σ̂ drops across models (depends on the scale of y). R² is standardised, but how big should it be? It does not protect against overfitting (fitting too close to the observed data makes predictions useless), and every additional x increases R².
Best subsets: a table of the best predictors for each model size; plot the SE and adjusted R² against the number of predictors.
Other criteria: PRESS_p. The predicted residuals are ê_(i) = y_i − ŷ_(i) = e_i/(1 − h_ii) (also useful for finding influential points); PRESS is the sum of their squares, and the smaller the PRESS statistic the better. Good model → small residuals → small SSE → small σ̂_(p).
Mallows' Cp is based on the idea that if the model is mis-specified or over-fitted then the variance of the error terms is inflated: a poor model gives a large Cp. Cp = p + (n − p)(s_p² − σ̂²)/σ̂², where s_p² = SSE_p/(n − p) is the residual mean square of the candidate model, σ̂² is the error variance of the true model (use the MSE from the full model) and p is the number of regressors in the candidate model. Prefer models where Cp ≈ p (with the full model this always holds, so use a small model with Cp as close to p as possible).
AIC (Akaike information criterion; equivalent to Mallows' Cp for linear models): AIC = −2 log L(θ̂) + 2p = (1/(nσ̂²))(SSE + 2pσ̂²). The model with the lowest AIC is best – pick it.
BIC (Bayesian information criterion): BIC = −2 log L(θ̂) + p log(n) = (1/(nσ̂²))(SSE + log(n)·p·σ̂²). BIC always selects a smaller (or equal) model than AIC because it penalises larger models more strictly. Lowest BIC is best.
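A rough R sketch of the diagnostics and selection tools above; the mtcars data and the model mpg ~ wt + hp + disp are assumptions made only for this example, not models from the notes:

```r
# Sketch only, assuming the built-in `mtcars` data and an illustrative model
# mpg ~ wt + hp + disp (neither is from the course notes).
library(MASS)   # for boxcox()

full <- lm(mpg ~ wt + hp + disp, data = mtcars)
n <- nrow(mtcars)
p <- length(coef(full))                      # parameters, including the intercept

r.std  <- rstandard(full)                    # standardised residuals: most within +/- 2
t.stud <- rstudent(full)                     # studentised residuals
h      <- hatvalues(full)                    # leverages: flag h_ii > 2p/n
cook   <- cooks.distance(full)               # Cook's distance
which(abs(dffits(full))  > 2 * sqrt(p / n))  # DFFITS rough cut-off (large data)
which(abs(dfbetas(full)) > 2 / sqrt(n))      # DFBETAS rough cut-off (large data)

# VIF for wt straight from the formula 1 / (1 - R_j^2):
Rj2 <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared
1 / (1 - Rj2)

# PRESS from the predicted residuals e_i / (1 - h_ii):
sum((residuals(full) / (1 - h))^2)

boxcox(full)                                 # pick the lambda that maximises the profile
termplot(full, partial.resid = TRUE)         # partial residual plots vs each predictor

AIC(full); BIC(full)                         # smaller is better
step(full, direction = "backward")           # backward selection using AIC
```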
How good a fit is our MLR? F_val = (SS explained by the model / number of predictors) / MSE, with degrees of freedom (number of predictors, n − p). If you focus on a particular predictor, have it last in the ANOVA table (so its sequential SS is adjusted for all the others).

Model Refinement
Forward selection: start at the base (null) model; choose the best predictor to add to the model; repeat until you reach the optimal model (as per R², AIC, BIC).
Backward selection: start with the "full" model including all covariates; re-order and select the variable with the smallest sequential F-stat or largest p-value (this doesn't account for scientific importance); if it is not statistically significant, remove it; repeat.

Outliers and Influential Points
Residuals are the difference between the actual and the fitted response. Residuals vs fitted plots are overleaf. For MLR, plot the covariates against the residuals – look for clear points of interest, and transform the variables to reduce skew.
Large studentised residuals (R code: rstudent) mean the point is potentially an outlier or influential. Doing this at every data point means n tests, so use the Bonferroni correction α* = α/(2n).
Leverages over 2p/n are potentially highly influential. Half-normal plots make it easier to see the highest values.
Cook's distance looks at the difference between the fitted values when all observations are used and when one observation is removed. Large Cook's distances are potentially influential; the cut-off line comes from the F-distribution – a rough guide.
DFFITS: the change in the fitted value for an observation when that observation is removed from the model fit. DFFITS_i = (ŷ_i − ŷ_i(i))/√(MSE_(i) h_ii) = t_i (h_ii/(1 − h_ii))^0.5. Check the absolute value for influential points: over 2√(p/n) if the data set is large, or 1 if it is small – a rough guide. DFFITS carries much the same information as Cook's distance, so you don't need both.
DFBETAS: assess the change in each coefficient due to observation i when that observation is removed from the model fit. DFBETAS_k(i) = (b_k − b_k(i))/√(MSE_(i) c_kk), where c_kk is the k-th diagonal element of the (X'X)⁻¹ matrix (recall var(β̂) = σ²(X'X)⁻¹, so var(β̂_k) = σ² c_kk). Plot the DFBETAS; large points are influential. A guide is values over 2/√n for large data, or 1 otherwise.
Residuals vs fitted – check homoscedasticity (constant variance). Q-Q plots – check for normality.
Shapiro-Wilk test – a test of normality; use it only together with Q-Q plots, because it detects even mild non-normality in large samples. Null hypothesis: "the residuals are normally distributed". Gives a test statistic and a p-value.

Exam Q's
a) Calculate the values in the ANOVA table:
MS_residuals = (residual standard error)² = 55.75² = 3108.062
DF_residuals = n − p = 30 − 3 = 27
SS_residuals = (n − p) × MS_residuals = 27 × 3108.062 = 83917.67
(The row directly above Residuals has the same p-value as the summary table, and its F-value is special: F = t². See below.)
Pr(>F)_Shots = 2.44e−05
F-value_Shots = t² = 5.083² = 25.83689
MS_Shots = F-value_Shots × MS_residuals = 25.83689 × 3108.062 = 80302.66
DF_Shots = 1
SS_Shots = MS_Shots × DF_Shots = 80302.66
MS_addition = F-value_addition × MS_residuals = 100.9 × 3108.062 = 313603.5
SS_addition = DF_addition × MS_addition = 2 × 313603.5 = 627207
SS_Passes = SS_addition − SS_Shots = 627207 − 80302.66 = 546904.3
DF_Passes = 1
MS_Passes = SS_Passes/DF_Passes = 546904.3
F-value_Passes = MS_Passes/MS_residuals = 546904.3/3108.062 = 175.96
Regression SS = (Overall F × MSE) × Regression df
b) Are Attend and Employ significant?
Prediction interval example: 95% CI for Beta1.

Random Help
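A small R sketch of how the ANOVA-table relationships used in the exam example can be checked; the Shots/Passes data are not available, so mtcars with mpg ~ wt + hp is assumed instead:

```r
# Sketch of the ANOVA-table relationships; the exam's Shots/Passes data are
# not available, so this assumes the built-in mtcars data with mpg ~ wt + hp.
fit <- lm(mpg ~ wt + hp, data = mtcars)
a <- anova(fit)     # sequential (Type I) sums of squares
s <- summary(fit)

s$sigma^2                                     # = MS_residuals = (residual SE)^2
a$"Sum Sq" / a$Df                             # = Mean Sq for every row (SS = DF x MS)
a$"Mean Sq"[1:2] / a["Residuals", "Mean Sq"]  # = the F values for wt and hp

# For the LAST term in the table (1 df), the sequential F equals t^2 from the
# summary table, with the same p-value -- the fact used to fill in the Shots row:
coef(s)["hp", "t value"]^2                    # matches a["hp", "F value"]
```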