Uploaded by Kolade Arisekola

R01-Regression choosing explanatory variables

R01a - Simple linear regression:
Choosing explanatory variables
STAT 587 (Engineering)
Iowa State University
March 30, 2021
(STAT587@ISU)
Simple linear regression
Let $Y_i \stackrel{ind}{\sim} N(\beta_0 + \beta_1 f(X_i), \sigma^2)$.
Possible choices for $f$:
quadratic: $f(x) = x^2$
logarithmic: $f(x) = \log(x)$
centered: $f(x) = x - m$
scaled: $f(x) = x/s$
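As a minimal sketch of how these choices could be written as `lm` formulas, the block below fits each one to simulated data. The data, the constants `m = 5` and `s = 2`, and the object names are assumptions made for illustration; none of them come from the slides.

```r
# Simulated data for illustration only (not from the slides).
# x is kept strictly positive so that log(x) is defined.
set.seed(587)
d <- data.frame(x = runif(50, min = 0.5, max = 10))
d$y <- 1 + 2 * log(d$x) + rnorm(50, sd = 0.3)

m <- 5  # hypothetical centering constant
s <- 2  # hypothetical scaling constant

# I() makes ^, - and / act as arithmetic rather than formula operators.
fits <- list(
  quadratic   = lm(y ~ I(x^2),   data = d),
  logarithmic = lm(y ~ log(x),   data = d),
  centered    = lm(y ~ I(x - m), data = d),
  scaled      = lm(y ~ I(x / s), data = d)
)

# Each fit has two mean parameters: an intercept and a slope on f(x).
sapply(fits, coef)
```

In every case the model remains a simple linear regression, because the mean is still linear in the transformed variable f(x).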
Quadratic relationship
[Figure: y plotted against x^2 (first panel) and against x (second panel).]
Logarithmic relationship
[Figure: y plotted against log(x) (first panel) and against x (second panel).]
Shifting the intercept
The intercept is the expected response when the explanatory variable is zero. If we use
$f(x) = x - m$,
then the new intercept $\beta_0$ is the expected response when the explanatory variable is $m$:
$E[Y \mid X = x] = \beta_0 + \beta_1 (x - m) = \tilde\beta_0 + \tilde\beta_1 x$,
so, written in terms of the uncentered $x$, the parameters for the mean are
slope $\tilde\beta_1 = \beta_1$ (unchanged) but
intercept $\tilde\beta_0 = \beta_0 - m\beta_1$.
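The relationship between the centered and uncentered parameters can be checked numerically. The sketch below uses simulated data and a hypothetical `m = 5`; the variable names are illustrative and not taken from the slides.

```r
# Simulated data for illustration only (not from the slides).
set.seed(587)
x <- runif(40, min = 0, max = 12)
y <- 1.4 - 0.03 * x + rnorm(40, sd = 0.15)
m <- 5  # hypothetical centering constant

fit_raw      <- lm(y ~ x)         # estimates (beta0-tilde, beta1-tilde)
fit_centered <- lm(y ~ I(x - m))  # estimates (beta0, beta1)

b_raw <- unname(coef(fit_raw))
b_cen <- unname(coef(fit_centered))

# Slope is unchanged by centering.
all.equal(b_raw[2], b_cen[2])

# Uncentered intercept equals centered intercept minus m times the slope.
all.equal(b_raw[1], b_cen[1] - m * b_cen[2])

# The centered intercept is the fitted mean response at x = m.
all.equal(b_cen[1], unname(predict(fit_raw, newdata = data.frame(x = m))))
```

Each `all.equal` call should return `TRUE` up to numerical tolerance, since the two formulas describe the same fitted line.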
Telomere data
[Figure: telomere.length plotted against years − 5 (first panel) and against years (second panel).]
Telomere data: shifting the intercept
m0 = lm(telomere.length ~ years, abd::Telomeres)
m4 = lm(telomere.length ~ I(years-5), abd::Telomeres)
coef(m0)
(Intercept)       years
 1.36768207 -0.02637431
coef(m4)
 (Intercept) I(years - 5)
  1.23581049  -0.02637431
confint(m0)
                  2.5 %       97.5 %
(Intercept)  1.25176134  1.483602799
years       -0.04478579 -0.007962836
confint(m4)
                   2.5 %       97.5 %
(Intercept)   1.18136856  1.290252429
I(years - 5) -0.04478579 -0.007962836
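As a check on the output above, the two intercepts should satisfy the centering identity with m = 5. Here the centered-model estimates are taken from `coef(m4)` and the result is compared with the intercept from `coef(m0)`; the small discrepancy in the last digit comes from rounding in the printed coefficients.

```latex
\[
\tilde\beta_0 = \beta_0 - m\beta_1
  = 1.23581049 - 5 \times (-0.02637431)
  = 1.23581049 + 0.13187155
  = 1.36768204 \approx 1.36768207 .
\]
```

The slope and its confidence interval are identical in both fits, while the interval for the intercept is narrower in the centered fit (1.181 to 1.290 versus 1.252 to 1.484).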
Rescaling the slope
The slope is the expected change in the response when the explanatory variable increases by 1. If we use
$f(x) = x/s$,
then the new slope $\beta_1$ is the expected change in the response when the explanatory variable increases by $s$:
$E[Y \mid X = x] = \beta_0 + \beta_1 (x/s) = \tilde\beta_0 + \tilde\beta_1 x$,
so, written in terms of the unscaled $x$, the parameters are
intercept $\tilde\beta_0 = \beta_0$ (unchanged) but
slope $\tilde\beta_1 = \beta_1 / s$.
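The same kind of numerical check applies to rescaling, and it extends to confidence intervals. The sketch below uses simulated data and a hypothetical `s = 2`; all names are illustrative and not taken from the slides.

```r
# Simulated data for illustration only (not from the slides).
set.seed(587)
x <- runif(40, min = 0, max = 12)
y <- 1.4 - 0.03 * x + rnorm(40, sd = 0.15)
s <- 2  # hypothetical (positive) scaling constant

fit_raw    <- lm(y ~ x)         # slope per 1 unit of x
fit_scaled <- lm(y ~ I(x / s))  # slope per s units of x

# Intercept is unchanged by rescaling.
all.equal(unname(coef(fit_raw)[1]), unname(coef(fit_scaled)[1]))

# The scaled slope is s times the unscaled slope.
all.equal(unname(coef(fit_scaled)[2]), s * unname(coef(fit_raw)[2]))

# The slope's confidence interval endpoints scale by the same factor.
ci_raw    <- unname(confint(fit_raw)[2, ])
ci_scaled <- unname(confint(fit_scaled)[2, ])
all.equal(ci_scaled, s * ci_raw)
```

Because the estimate and its standard error are multiplied by the same constant, rescaling changes the units of the slope but not the strength of evidence about it.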
Telomere data: rescaling the slope
[Figure: telomere.length plotted against years/2 (first panel) and against years (second panel).]
m0 = lm(telomere.length ~ years, abd::Telomeres)
m4 = lm(telomere.length ~ I(years/2), abd::Telomeres)
coef(m0)
(Intercept)       years
 1.36768207 -0.02637431
coef(m4)
(Intercept)  I(years/2)
 1.36768207 -0.05274863
confint(m0)
                  2.5 %       97.5 %
(Intercept)  1.25176134  1.483602799
years       -0.04478579 -0.007962836
confint(m4)
                  2.5 %      97.5 %
(Intercept)  1.25176134  1.48360280
I(years/2)  -0.08957159 -0.01592567
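As a check on the output above, dividing the slope from `coef(m4)` and its interval endpoints from `confint(m4)` by s = 2 should recover the values reported for `m0`, up to rounding in the printed digits.

```latex
\[
\tilde\beta_1 = \frac{\beta_1}{s} = \frac{-0.05274863}{2} = -0.026374315 \approx -0.02637431,
\]
\[
\left( \frac{-0.08957159}{2},\; \frac{-0.01592567}{2} \right)
  = (-0.044785795,\; -0.007962835)
  \approx (-0.04478579,\; -0.007962836).
\]
```

The intercept and its confidence interval are identical in both fits.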
Summary
Let $Y_i \stackrel{ind}{\sim} N(\beta_0 + \beta_1 f(X_i), \sigma^2)$.
Choose f based on
Scientific understanding
Interpretability
Diagnostics
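One way diagnostics might inform the choice of f is to compare residuals across candidate fits. The sketch below is an assumption-laden illustration: the data are simulated so that the mean is linear in log(x), and the specific diagnostics (residual standard deviation and residuals-versus-fitted plots) are one possible choice, not a procedure stated in the slides.

```r
# Simulated data for illustration only; the true mean is linear in log(x).
set.seed(587)
x <- runif(100, min = 0.5, max = 10)
y <- 1 + 2 * log(x) + rnorm(100, sd = 0.2)

fit_linear <- lm(y ~ x)       # f(x) = x
fit_log    <- lm(y ~ log(x))  # f(x) = log(x)

# Estimated residual standard deviation for each choice of f.
c(linear = sigma(fit_linear), log = sigma(fit_log))

# Residuals versus fitted values: curvature suggests a poor choice of f.
op <- par(mfrow = c(1, 2))
plot(fitted(fit_linear), resid(fit_linear), main = "f(x) = x")
abline(h = 0, lty = 2)
plot(fitted(fit_log), resid(fit_log), main = "f(x) = log(x)")
abline(h = 0, lty = 2)
par(op)
```

With data generated this way, the fit using f(x) = x would be expected to show a curved residual pattern, while the log fit should not. With real data, scientific understanding and interpretability still need to be weighed alongside such plots.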