Stat2008 Cheat sheet - Summary Regression Modelling
Regression Modelling (Australian National University)
df = n − 1 (degrees of freedom): this many values have the freedom to vary if the mean remains the same.
Residual SE = √(SSE/(n − (1 + k))), where k = length(model$coefficients) − 1 (to ignore the intercept).
Stat2008 Cheat sheet.
Predictor – x-axis – independent variable.
Response – y-axis – dependent variable.
Homoscedasticity: the error variance is the same across all values of the independent variables (constant variance). Heteroscedasticity – increasing (non-constant) variance.
If the p-value is less than α, reject the null.
R² = r² = SSR/SST (Coefficient of Determination OR Multiple R-squared); r = correlation. Measures how close the data are to the fitted regression line: explained variation over total variation. 0 means the model explains none of the variability of the response data around its mean; 100% means it explains all of it. Doesn't comment on the model fit or identify bias – you still need residual plots. Think Anscombe's Quartet.
Adjusted R squared measures the proportion of variation explained by only those independent variables that really help in explaining the dependent variable:
R²adjusted = 1 − (1 − R²)(n − 1)/(n − k − 1), k = number of predictors.
SST = Σ(yi − ȳ)² (total variability in Y)
SSE = Σ(yi − ŷi)² (unexplained variability)
SSR = Σ(ŷi − ȳ)² = b1²Σ(xi − x̄)² (explained variability)
SST = SSE + SSR (Cochran's Theorem), and in MLR SST = SSv1 + SSv2 + … + SSE.
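A quick numerical check of this decomposition (a minimal R sketch; the mtcars data and the mpg ~ wt model are placeholders, not from the notes):
# Verify SST = SSE + SSR and R^2 = SSR/SST for a fitted simple regression.
fit <- lm(mpg ~ wt, data = mtcars)
y <- mtcars$mpg
SST <- sum((y - mean(y))^2)              # total variability in Y
SSE <- sum(resid(fit)^2)                 # unexplained variability
SSR <- sum((fitted(fit) - mean(y))^2)    # explained variability
c(SST = SST, SSE_plus_SSR = SSE + SSR)   # the two numbers agree
c(R2 = SSR / SST, summary(fit)$r.squared)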
MSE = SSE/(n − 2) = SSE/df (mean square error)
MSR = SSR/dfR = SSR/1 (regression mean square)
E{MSE} = σ² (expected mean square error)
E{MSR} = σ² + β1²Σ(xi − x̄)² (expected regression mean square)
F* = MSR/MSE (F statistic). Large F* supports Ha. Compute the critical F-statistic at the α level of significance with df numerator = 1 and df denominator = error df. Don't use the F-test for hypotheses that aren't '='; use a t-test for < or >.
t* = b1/SE(b1) (t test). Large t* supports Ha. Two-tailed: H0: β1 = 0, Ha: β1 ≠ 0; else one-tailed.
Types of t-test: H0: β1 = β10, Ha: β1 ≠ β10, so t* = (b1 − β10)/SE(b1).
Confidence intervals: β̂1 ± t(n−2)(α/2)·SE(β̂1).
Prediction intervals (new response at xh): ŷh ± t(n−2)(α/2)·√(MSE·(1 + 1/n + (xh − x̄)²/SSxx)).
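Both intervals are available directly in R (a minimal sketch; the model and new-data values are placeholders):
fit <- lm(mpg ~ wt, data = mtcars)
confint(fit, level = 0.95)                            # CIs for beta0 and beta1
new <- data.frame(wt = 3)
predict(fit, newdata = new, interval = "confidence")  # CI for the mean response
predict(fit, newdata = new, interval = "prediction")  # PI for a new observation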
Difference between R² and adjusted R²: more independent variables will increase R² no matter what, but adjusted R² will increase only if the added independent variable is significant.
SE = σ/√n.
Diagnostics
Partial regression or added-variable plots help isolate the effect on y of the different predictor variables (xi). Check if there is an issue with non-linearity and whether a transformation of the predictors will help. Added-variable plots regress y on all x except xi and calculate the residuals (δ) – this represents y with the other effects taken out. Then regress xi on all the other x and calculate the residuals (γ). Plot δ against γ (y|other x vs xi|other x). Any non-linearity in the pattern suggests that variable has a non-linear relation with y.
A partial residual plot is better than the added-variable plot for detecting non-linearity: it plots the response with the predicted effects of all x other than xi taken out, against xi. Use termplot.
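A hand-rolled added-variable plot along these lines, plus termplot for partial residuals (a sketch; the data set and the choice of which predictor to isolate are assumptions):
# Added-variable plot for 'wt' in a larger model (hypothetical example).
fit_y <- lm(mpg ~ hp + qsec, data = mtcars)   # y on all x except wt
fit_x <- lm(wt ~ hp + qsec, data = mtcars)    # wt on the other x
delta <- resid(fit_y)                         # y with other effects removed
gamma <- resid(fit_x)                         # wt with other effects removed
plot(gamma, delta, xlab = "wt | others", ylab = "mpg | others")
abline(lm(delta ~ gamma))                     # this slope equals the MLR coefficient of wt
# Partial residual plots for every term of the full fit:
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)
termplot(full, partial.resid = TRUE)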
Transformations are applied to the response variable (y-axis). Can use the Box-Cox transformation (only for positive y): g(y) = (y^λ − 1)/λ if λ ≠ 0, or log(y) if λ = 0. Just take y^λ as the value, and round λ (e.g. to 1 or 1/2) to make it easy to interpret. Use boxcox on the untransformed model – you don't need to know the maths, just find the λ that maximises the output plot (λ̂); sometimes the fit is bad, then don't use the transformation.
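A minimal sketch with MASS::boxcox (the data set is a placeholder; y must be positive):
library(MASS)                                # boxcox() lives in MASS
fit <- lm(mpg ~ wt + hp, data = mtcars)
bc <- boxcox(fit, lambda = seq(-2, 2, 0.1))  # plots the profile log-likelihood vs lambda
lambda_hat <- bc$x[which.max(bc$y)]          # lambda that maximises the plot
# Round lambda_hat to something interpretable before refitting, e.g. if it is near 0.5:
fit2 <- lm(sqrt(mpg) ~ wt + hp, data = mtcars)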
Standardised residuals all have the same variance = 1: ri = êi/(σ̂√(1 − hii)), where êi is the ordinary residual, σ̂ = √MSE (these are the std. residuals) and hii = leverage of the ith observation. Gives a standard axis on the Q-Q plot; most values should be within ±2 – anything outside of this could be an outlier or influential.
Outliers recap
Studentised residuals predict y when case i is left out of the model: ti = ri·((n − p − 1)/(n − p − ri²))^0.5, which follows a t-distribution with df = n − p − 1 (ri is the standardised residual).
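Both kinds of residuals are built into R (a sketch; the model is a placeholder):
fit <- lm(mpg ~ wt + hp, data = mtcars)
r_std <- rstandard(fit)    # standardised residuals, variance ~ 1
t_stud <- rstudent(fit)    # studentised (leave-one-out) residuals
h <- hatvalues(fit)        # leverages h_ii
# Check the studentised-residual formula by hand for observation 1:
n <- nrow(mtcars); p <- length(coef(fit))
r_std[1] * sqrt((n - p - 1) / (n - p - r_std[1]^2))   # matches t_stud[1]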
Summary Table
SSxy = Σ(xi − x̄)(yi − ȳ)
SSxx = Σ(xi − x̄)²
β̂1 = SSxy/SSxx = r·Sy/Sx (slope coefficient); r = correlation, Sx and Sy = standard deviations of x and y (look at the t-value – can calculate from it).
β̂0 = ȳ − β̂1·x̄ (intercept coefficient); x̄ = mean of the predictor, ȳ = mean of the response.
SE(β̂1) = σ̂/√((n − 1)sx²) = σ̂/√SSxx (standard error); σ̂ = residual SE, n = number of samples, sx² = sample variance of x.
tobs(β1) = (β̂1 − 0)/SE(β̂1) (t-value = estimate/Std. Error)
tobs(β0) = (β̂0 − 0)/SE(β̂0)
Model selection
Favour models with the smallest error: lowest residual SE = σ̂. But there are other ways. Can do F-tests to compare whether σ̂ drops across models (depends on scale). Can also use R² as it is standardised – but how big should it be? It does not protect against overfitting (fitting too close to the observed data, making predictions useless), and every additional x increases R².
Table of best predictors – best subsets. Plot SE and adjusted R² against the number of predictors.
Other criterion: PRESSp (predicted sum of squares). Its residuals are ê(i) = yi − ŷ(i) = êi/(1 − hii) (also useful to find influential points); PRESS is the sum of the squares of these residuals. The smaller the PRESS statistic the better: good model → small residuals → small SSE → small σ̂(i).
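A sketch of the PRESS computation using the leverage shortcut above (the model and data are placeholders):
fit <- lm(mpg ~ wt + hp, data = mtcars)
press_resid <- resid(fit) / (1 - hatvalues(fit))   # e_(i) = e_i / (1 - h_ii)
PRESS <- sum(press_resid^2)                        # smaller is better
PRESS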
Multicollinearity: in exploratory data analysis a relationship looks positive, but the summary table shows a negative coefficient – this is multicollinearity. It relates to the degree of interrelation among the covariate values: the predictors are correlated. In cor(data), if any off-diagonals are large then those predictors are highly correlated and thus result in multicollinearity. Remove a highly correlated variable to fix it. The inflation of the variance is due to the correlation.
Variance inflation factor: VIFj = 1/(1 − Rj²); to get Rj², regress xj on all the other predictors. High VIF (the smallest possible value is 1) – investigate for multicollinearity! It looks at all the predictors together, not just one pair (unlike correlation).
Mallow's Cp is based on the idea that if the model is mis-specified or over-fitted then the variance of the error terms is inflated; for a poor model Cp is large. Cp = p + (n − p)(sp² − σ̂²)/σ̂², where σ̂² = error variance of the true model (use the MSE from the full model) and p = number of regressors in the model. Prefer models where Cp = p (but for the full model this will always be the case, so use a small model with Cp as close to p as possible).
AIC – Akaike Information Criterion (same as Mallow's Cp for linear models): AIC = −2·log(L(θ̂)) + 2p = (1/σ̂²)(SSE + 2pσ̂²). The lowest AIC of the various models is best – pick it.
BIC – Bayesian Information Criterion: BIC = −2·log(L(θ̂)) + p·log(n) = (1/σ̂²)(SSE + log(n)·p·σ̂²). BIC favours a smaller model because it penalises larger models more strictly than AIC. Lowest BIC is best.
How good a fit is our MLR? In multiple regression Fobs = (SS due to the model divided by its degrees of freedom) / (SSE divided by its degrees of freedom, n − p).
Interaction terms – COME BACK TO THIS.
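A sketch comparing a candidate model against the full model with these criteria (the models and data are placeholders; R's AIC()/BIC() include constants, so the values differ from the SSE-based formulas above, but the rankings are comparable):
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # hypothetical 'full' model
m <- lm(mpg ~ wt, data = mtcars)                  # candidate model
sigma2_full <- summary(full)$sigma^2              # error variance from the full model
n <- nrow(mtcars); p <- length(coef(m))
Cp <- p + (n - p) * (summary(m)$sigma^2 - sigma2_full) / sigma2_full
c(Cp = Cp, AIC = AIC(m), BIC = BIC(m), adjR2 = summary(m)$adj.r.squared)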
Model Refinement
Forward selection: start at the base (null) model. Choose the best predictor to add to the model – repeat until you reach the optimal model (as per R-squared, AIC, BIC). Backward selection: start with the "full" model including all covariates. Re-order, and select the variable with the smallest sequential F-stat or largest p-value (this doesn't account for scientific importance); if it is not statistically significant, remove it. Repeat. Once a variable is removed you must check all the tests again.
If the focus is on a particular predictor, have it last in the anova table.
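Automated versions of both searches (a sketch with step(); the formulas and data are placeholders, and AIC is step()'s default criterion):
null <- lm(mpg ~ 1, data = mtcars)                      # base (null) model
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)  # full model
step(null, scope = formula(full), direction = "forward")
step(full, direction = "backward")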
Outliers and Influential Points
Residuals are the difference between the actual and the fitted response. Residuals vs fitted plots are overleaf. For MLR plot the covariates vs the residuals – look for clear points of interest, and transform the variables to reduce skew.
Large studentised residuals (R code is rstudent) mean the point is potentially influential. Doing this at each data point means n tests, so use the Bonferroni correction α* = α/(2n).
Leverages over 2p/n are potentially highly influential. Half-normal plots make it easier to see the highest values.
Cook's distance looks at the difference in fitted values between the fit where all observations are used and the fit where one is removed. Large Cook's distances are potentially influential; the cut-off line comes from the F-distribution – a rough guide.
DFFITS – the change in the fitted value for an observation when that observation is removed from the model fit: DFFITSi = (ŷi − ŷi(i))/√(MSE(i)·hii) = ti·(hii/(1 − hii))^0.5. The absolute value of DFFITS checks for influential points: values over 2√(p/n) if the data set is large, or over 1 if it is small – a rough guide. Similar to Cook's distance, so you don't need both.
DFBETAS – the change in coefficient j due to observation i when that observation is removed from the model fit: DFBETASj(i) = (bj − bj(i))/√(MSE(i)·cjj), where cjj is the jth diagonal element of (X'X)⁻¹ (recall var(β̂) = σ²(X'X)⁻¹). Plot the DFBETAS; large points are influential. A guide is values over 2/√n for large data, or 1 otherwise.
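All of these diagnostics are built into R (a sketch; the model is a placeholder and the cut-offs follow the rough guides above):
fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars); p <- length(coef(fit))
h <- hatvalues(fit);      which(h > 2 * p / n)            # high leverage
d <- cooks.distance(fit); which(d > qf(0.5, p, n - p))    # large Cook's distance
ff <- dffits(fit);        which(abs(ff) > 2 * sqrt(p / n))
db <- dfbetas(fit);       which(abs(db) > 2 / sqrt(n), arr.ind = TRUE)
# Bonferroni-corrected outlier test on the studentised residuals:
ts <- rstudent(fit)
which(abs(ts) > qt(1 - 0.05 / (2 * n), df = n - p - 1))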
Residuals vs fitted – check homoscedasticity (constant variance).
QQ plots – check for normality.
Shapiro-Wilk test – a test of normality. Use it only together with Q-Q plots, because it detects even mild non-normality in large samples. Null hypothesis: "the residuals are normally distributed". Gives a test statistic and a p-value.
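The corresponding R checks (a minimal sketch; the model is a placeholder):
fit <- lm(mpg ~ wt + hp, data = mtcars)
qqnorm(rstandard(fit)); qqline(rstandard(fit))   # Q-Q plot of standardised residuals
shapiro.test(resid(fit))                         # H0: residuals are normally distributed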
Exam Q's: a) Calculate the values in the ANOVA table.
MSresiduals = (residual standard error)² = 55.75² = 3108.062
DFresiduals = n − p = 30 − 3 = 27
SSresiduals = (n − p) × MSresiduals = 27 × 3108.062 = 83917.67
(The row above Residuals has the same p-value as the summary table, and its F-value is special – see below.)
Pr(>F)Shots = 2.44e−05
F-valueShots = t² = 5.083² = 25.83689
MSShots = F-valueShots × MSresiduals = 25.83689 × 3108.062 = 80302.66
DFShots = 1
SSShots = MSShots × DFShots = 80302.66
MSaddition = F-valueaddition × MSresiduals = 100.9 × 3108.062 = 313603.5
SSaddition = DFaddition × MSaddition = 2 × 313603.5 = 627207
SSPasses = SSaddition − SSShots = 627207 − 80302.66 = 546904.3
DFPasses = 1
MSPasses = SSPasses/DFPasses = 546904.3
F-valuePasses = MSPasses/MSresiduals = 546904.3/3108.062 = 175.96
Regression SS = (Overall F × MSE) × Regression df
b) Are Attend and Employ significant?
Prediction interval example:
95% CI for Beta1
Random Help
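The same arithmetic as a quick R check, using only the numbers quoted from the summary output in part a) above:
MS_res <- 55.75^2                      # 3108.062
DF_res <- 30 - 3                       # 27
SS_res <- DF_res * MS_res              # 83917.67
F_shots <- 5.083^2                     # t^2 = 25.83689
MS_shots <- F_shots * MS_res           # 80302.66
SS_shots <- 1 * MS_shots               # 80302.66
MS_add <- 100.9 * MS_res               # 313603.5
SS_add <- 2 * MS_add                   # 627207
SS_passes <- SS_add - SS_shots         # 546904.3
F_passes <- (SS_passes / 1) / MS_res   # 175.96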