Y
y | x
• Y represents a value of the response variable.
•
y | x represents the population mean response for a given value of the explanatory variable, x .
•
represents the random error
1
Linear Regression Model
Y
y | x
0
1 x
0
1
The Y-intercept parameter.
The slope parameter.
2
(Observed Y – Fitted Y)
Fit
Y
ˆ ˆ y | x
ˆ
0
ˆ
1 x
Residual
ˆ
Y
Y
ˆ
Y
ˆ
0
ˆ
1 x
3
The relationship is linear.
Independent
Identically distributed
Normally distributed with
4
Residual vs. Explanatory
0.3
0.2
0.1
0.0
-0.1
-0.2
-0.3
300 350
CO2
400
5
Random scatter around the zero line indicates that the linear model is as good as we can do with these data.
6
Over/Under/Over or
Under/Over/Under
The linear model may not be adequate. We could do better by accounting for curvature with a different model.
7
Two, or more, groups
May require separate regression models for each group.
8
Independence.
Hard to check this but the fact that we obtained the data through a random sample of years assures us that the statistical methods should work.
9
Identically distributed.
Check using an outlier box plot.
Unusual points may come from a different distribution
Check using a histogram. Bimodal shape could indicate two different distributions.
10
Normally distributed.
Check with a histogram.
Symmetric and mounded in the middle.
Check with a normal quantile plot. Points falling close to a diagonal line.
11
Distributions
Residuals Temp
3
.99
.95
.90
.75
.50
.25
.10
.05
.01
2
1
0
-1
-2
-3
-0.2
-0.1
0 0.1
0.2
0.3
10
8
6
4
2
12
Histogram is skewed right and mounded to the left of zero.
Box plot is skewed right with no unusual points.
Normal quantile plot has points that do not follow the diagonal, normal model, line very well.
13
Constant variance.
Check the plot of residuals versus the explanatory variable values. Points should show the same spread for all values of the explanatory variable.
14
Residual vs. Explanatory
0.3
0.2
0.1
0.0
-0.1
-0.2
-0.3
300 350
CO2
400
15
Points show about the same amount of spread for all values of the explanatory variable.
16
The independence, identically distributed and common variance conditions appear to be satisfied.
The normal distribution condition may not be met for these data.
17
The P-values for tests may not be correct.
The stated confidence level may not give the true coverage rate.
18