STAT 3A03 Applied Regression With SAS
Checking your understanding
Last updated: Nov 13, 2019
Q. 1 Explain why residual plots are used to check the linear regression assumptions on the random errors.
Q. 2 What information could be gained about the assumptions on the random errors $\varepsilon_i$ by plotting the residuals versus
a) Fitted values
b) Covariates
c) Time
Give brief examples.
Q. 3 Explain the difference between outliers, high-leverage points, and influential points.
Q. 4 Why do we remove multiple observations and refit the model to check for influence after using plots
such as Cook’s Distance, DFFITS, and DFBETAS?
Q. 5 What are some reasons you might choose to remove an outlier and/or influential point from your
analysis?
Q. 6 Consider a categorical variable x with three levels labelled 1, 2, and 3. From class we know we can
include this variable as a covariate in a linear regression by using two dummy variables labelled
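$$
u_{ij} =
\begin{cases}
1 & \text{if the $i$th observation falls in category $j$,} \\
0 & \text{otherwise,}
\end{cases}
$$
leading to the linear model
$$
y_i = \beta_0 + \beta_1 u_{i1} + \beta_2 u_{i2} + \varepsilon_i,
$$
for $i = 1, \dots, n$.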
Rather than using dummy variables, why don't we include the categorical variable directly as $x$? That is, why not fit
$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n,
$$
instead?
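As a starting point for Q. 6, the following minimal sketch builds the design matrix for each of the two formulations so their columns can be compared side by side. It uses Python with NumPy rather than SAS. The helper names `dummy_design` and `numeric_design` and the six sample category labels are hypothetical, made up purely for illustration.

```python
import numpy as np


def dummy_design(x, levels=(1, 2)):
    """Design matrix [1, u_1, u_2], where u_j = 1 if x == j and 0 otherwise.

    The level not listed in `levels` (here category 3) is the baseline.
    """
    x = np.asarray(x)
    columns = [np.ones(len(x))]  # intercept column
    for j in levels:
        columns.append((x == j).astype(float))  # indicator for category j
    return np.column_stack(columns)


def numeric_design(x):
    """Design matrix [1, x], treating the category labels as numbers."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones(len(x)), x])


# Hypothetical category labels for six observations (illustration only).
x = np.array([1, 2, 3, 1, 2, 3])

U = dummy_design(x)    # columns: intercept, u_i1, u_i2
X = numeric_design(x)  # columns: intercept, x_i

print(U)
print(X)
```

Either matrix could then be passed to a least-squares routine such as `np.linalg.lstsq` together with a response vector, to compare the fitted values each formulation allows.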
Q. 7 When might we choose to use Weighted Least Squares regression?
Q. 8 Why might we choose to perform variable selection?
Q. 9 Why might you consider it unnecessary to plot the residuals versus both the fitted values and the covariate in a
simple linear regression?
Q. 10 Explain extrapolation and why it is not usually a good idea.