R01a - Simple linear regression: Choosing explanatory variables
STAT 587 (Engineering)
Iowa State University
March 30, 2021

Simple linear regression

Let $Y_i \stackrel{ind}{\sim} N(\beta_0 + \beta_1 f(X_i), \sigma^2)$.

Possible choices for $f$:
- quadratic: $f(x) = x^2$
- logarithmic: $f(x) = \log(x)$
- centered: $f(x) = x - m$
- scaled: $f(x) = x/s$

Quadratic relationship

[Figure: two panels, y plotted against x^2 and y plotted against x.]

Logarithmic relationship

[Figure: two panels, y plotted against log(x) and y plotted against x.]

Shifting the intercept

The intercept is the expected response when the explanatory variable is zero. If we use $f(x) = x - m$, then the new intercept is the expected response when the explanatory variable is $m$.

$$E[Y \mid X = x] = \beta_0 + \beta_1 (x - m) = \tilde\beta_0 + \tilde\beta_1 x$$

so our new parameters for the mean are slope $\tilde\beta_1 = \beta_1$ (unchanged) but intercept $\tilde\beta_0 = \beta_0 - m\beta_1$.
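
The identity for the shifted intercept can be checked numerically. The following is a minimal sketch on simulated data, not part of the original slides: the true coefficients, the noise level, the shift m = 5, and the names fit_raw and fit_centered are all assumptions made for illustration.

```r
# Simulated data; the true values (intercept 2, slope 0.5, sd 0.3) and the
# shift m = 5 are arbitrary assumptions chosen for illustration.
set.seed(587)
n <- 50
x <- runif(n, 0, 10)
y <- rnorm(n, mean = 2 + 0.5 * x, sd = 0.3)
m <- 5

fit_raw      <- lm(y ~ x)         # intercept: expected response at x = 0
fit_centered <- lm(y ~ I(x - m))  # intercept: expected response at x = m

b_raw      <- unname(coef(fit_raw))
b_centered <- unname(coef(fit_centered))

# The slope is unchanged by centering.
all.equal(b_raw[2], b_centered[2])

# The intercept on the original scale equals the centered intercept minus m * slope.
all.equal(b_raw[1], b_centered[1] - m * b_centered[2])

# The centered intercept is the fitted mean at x = m from the uncentered fit.
all.equal(b_centered[1], unname(predict(fit_raw, data.frame(x = m))))
```

Each comparison should return TRUE up to numerical tolerance, since least squares estimates transform the same way as the parameters do.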

Telomere data

[Figure: two panels, telomere.length plotted against years - 5 and telomere.length plotted against years.]

Telomere data: shifting the intercept

    # m0: original explanatory variable; m4: years shifted by 5
    m0 = lm(telomere.length ~ years     , abd::Telomeres)
    m4 = lm(telomere.length ~ I(years-5), abd::Telomeres)

    coef(m0)
    (Intercept)       years
     1.36768207 -0.02637431

    coef(m4)
     (Intercept) I(years - 5)
      1.23581049  -0.02637431

    confint(m0)
                      2.5 %       97.5 %
    (Intercept)  1.25176134  1.483602799
    years       -0.04478579 -0.007962836

    confint(m4)
                       2.5 %       97.5 %
    (Intercept)   1.18136856  1.290252429
    I(years - 5) -0.04478579 -0.007962836

Rescaling the slope

The slope is the expected change in the response when the explanatory variable increases by 1. If we use $f(x) = x/s$, then the new slope is the expected change in the response when the explanatory variable increases by $s$.

$$E[Y \mid X = x] = \beta_0 + \beta_1 (x/s) = \tilde\beta_0 + \tilde\beta_1 x$$

so our new parameters are intercept $\tilde\beta_0 = \beta_0$ (unchanged) but slope $\tilde\beta_1 = \beta_1/s$.
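
The rescaling identity can be checked in the same way. This is a minimal sketch on simulated data, not part of the original slides: the true coefficients, the noise level, the scale s = 2, and the names fit_raw and fit_scaled are assumptions made for illustration.

```r
# Simulated data; the true values (intercept 2, slope 0.5, sd 0.3) and the
# scale s = 2 are arbitrary assumptions chosen for illustration.
set.seed(587)
n <- 50
x <- runif(n, 0, 10)
y <- rnorm(n, mean = 2 + 0.5 * x, sd = 0.3)
s <- 2

fit_raw    <- lm(y ~ x)         # slope: expected change per 1 unit of x
fit_scaled <- lm(y ~ I(x / s))  # slope: expected change per s units of x

b_raw    <- unname(coef(fit_raw))
b_scaled <- unname(coef(fit_scaled))

# The intercept is unchanged by rescaling.
all.equal(b_raw[1], b_scaled[1])

# The slope on the original scale is the rescaled slope divided by s.
all.equal(b_raw[2], b_scaled[2] / s)

# The interval endpoints for the slope rescale by the same factor.
ci_raw    <- unname(confint(fit_raw)[2, ])
ci_scaled <- unname(confint(fit_scaled)[2, ])
all.equal(ci_raw, ci_scaled / s)

# The t statistic for the slope is the same in both fits.
t_raw    <- summary(fit_raw)$coefficients[2, "t value"]
t_scaled <- summary(fit_scaled)$coefficients[2, "t value"]
all.equal(t_raw, t_scaled)
```

Rescaling changes the units in which the slope is reported, not the evidence about it, which is why the t statistic is identical.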

Telomere data: rescaling the slope

[Figure: two panels, telomere.length plotted against years/2 and telomere.length plotted against years.]

    # m0: original explanatory variable; m4: years divided by 2
    m0 = lm(telomere.length ~ years     , abd::Telomeres)
    m4 = lm(telomere.length ~ I(years/2), abd::Telomeres)

    coef(m0)
    (Intercept)       years
     1.36768207 -0.02637431

    coef(m4)
    (Intercept)  I(years/2)
     1.36768207 -0.05274863

    confint(m0)
                      2.5 %       97.5 %
    (Intercept)  1.25176134  1.483602799
    years       -0.04478579 -0.007962836

    confint(m4)
                      2.5 %      97.5 %
    (Intercept)  1.25176134  1.48360280
    I(years/2)  -0.08957159 -0.01592567

Summary

Let $Y_i \stackrel{ind}{\sim} N(\beta_0 + \beta_1 f(X_i), \sigma^2)$.

Choose $f$ based on
- Scientific understanding
- Interpretability
- Diagnostics
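
As an illustration of the diagnostics criterion, the sketch below compares f(x) = x with f(x) = log(x) on simulated data whose mean is logarithmic by construction. It is not part of the original slides: the data-generating values and the names fit_x and fit_log are assumptions, and residual-versus-fitted plots are only one of several possible diagnostics.

```r
# Simulated data with a logarithmic mean; all numeric values are arbitrary
# assumptions chosen for illustration.
set.seed(587)
n <- 100
x <- runif(n, 0.1, 5)
y <- rnorm(n, mean = 1 + 2 * log(x), sd = 0.3)

fit_x   <- lm(y ~ x)       # f(x) = x
fit_log <- lm(y ~ log(x))  # f(x) = log(x)

# Estimated residual standard deviations. They are comparable here because
# both models have the same response; a smaller value suggests a better mean.
sigma_x   <- summary(fit_x)$sigma
sigma_log <- summary(fit_log)$sigma
c(x = sigma_x, log = sigma_log)

# Residuals versus fitted values: systematic curvature suggests that the
# chosen f does not capture the mean.
op <- par(mfrow = c(1, 2))
plot(fitted(fit_x), resid(fit_x),
     main = "f(x) = x", xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)
plot(fitted(fit_log), resid(fit_log),
     main = "f(x) = log(x)", xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)
par(op)
```

With real data the true f is unknown, so diagnostics like these are weighed together with scientific understanding and interpretability rather than used alone.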