Stat 301 – Lecture 8

Regression model
$Y = \mu_{y|x} + \varepsilon$
• $Y$ represents a value of the response variable.
• $\mu_{y|x}$ represents the population mean response for a given value of the explanatory variable, $x$.
• $\varepsilon$ represents the random error.

Linear Regression Model
$Y = \mu_{y|x} + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$
• $\beta_0$: the Y-intercept parameter.
• $\beta_1$: the slope parameter.

Residual (Observed Y – Fitted Y)
Fit: $\hat{Y} = \hat{\mu}_{y|x} = \hat{\beta}_0 + \hat{\beta}_1 x$
Residual: $\hat{\varepsilon} = Y - \hat{Y} = Y - \hat{\beta}_0 - \hat{\beta}_1 x$

Conditions
• The relationship is linear.
• The random error term, $\varepsilon$, is
  • Independent
  • Identically distributed
  • Normally distributed with standard deviation, $\sigma$.

[Figure: Residual vs. Explanatory. Residuals plotted against CO2.]

[Figure: Residual vs. Predicted.]

Interpretation
• Random scatter around the zero line indicates that the linear model is adequate for the relationship between carbon dioxide and temperature.

Patterns
• Over/Under/Over or Under/Over/Under
  • The linear model may not be adequate.
  • We could do better by accounting for curvature with a different model.

[Figure: Speed and Stopping Distance.]

Patterns
• Two, or more, groups
  • May require separate regression models for each group.

[Figure: Gas used vs. Temperature.]

Checking Conditions
• Independence.
  • This is hard to check, but because the data were obtained through a random sample of years, the statistical methods should work.

Checking Conditions
• Identically distributed.
  • Check using an outlier box plot. Unusual points may come from a different distribution.
  • Check using a histogram. A bimodal shape could indicate two different distributions.

Checking Conditions
• Normally distributed.
  • Check with a histogram. It should be symmetric and mounded in the middle.
  • Check with a normal quantile plot. Points should fall close to a diagonal line.
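
The Fit and Residual definitions from earlier in this lecture can be sketched in a few lines of Python. This is only an illustration: it assumes the estimates come from ordinary least squares (the slides do not say how the fit was computed), the function names `fit_line` and `residuals` are hypothetical, and the data are made up rather than the carbon dioxide and temperature data from the lecture.

```python
from statistics import mean


def fit_line(x, y):
    """Return least squares estimates (b0, b1) for the fit Y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated Y-intercept
    return b0, b1


def residuals(x, y, b0, b1):
    """Residual = observed Y minus fitted Y, one value per observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Made-up illustrative values (assumption), NOT the lecture's data.
x = [310.0, 320.0, 330.0, 340.0, 350.0, 360.0, 370.0, 380.0]
y = [-0.05, 0.02, -0.01, 0.10, 0.12, 0.25, 0.30, 0.41]

b0, b1 = fit_line(x, y)
res = residuals(x, y, b0, b1)

print("intercept:", b0, "slope:", b1)
for xi, ri in zip(x, res):
    # These (x, residual) pairs are what a Residual vs. Explanatory plot shows.
    print(xi, round(ri, 4))
```

With a least squares fit that includes an intercept, the residuals sum to zero (up to rounding), which is why the residual plots are read relative to the zero line.
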
[Figure: Distributions, Residuals Temp. Normal quantile plot and histogram (Count) of the residuals.]

Residuals
• Histogram is skewed right and mounded to the left of zero.
• Box plot is skewed right with no unusual points.
• Normal quantile plot has points that do not follow the diagonal (normal model) line very well.

Checking Conditions
• Constant variance.
  • Check the plot of residuals versus the explanatory variable or the predicted values.
  • Points should show the same spread for all values of the explanatory variable.

[Figure: Non-constant variance. Residual vs. Explanatory or Predicted.]

[Figure: Residual vs. Explanatory. Residuals plotted against CO2.]

[Figure: Residual vs. Predicted.]

Constant Variance
• Points show about the same amount of spread for all values of the explanatory variable.

Conclusion
• The independence, identically distributed, and common variance conditions appear to be satisfied.
• The normal distribution condition may not be met for these data.

Consequences
• The P-values for tests may not be correct.
• However, the P-value was so small that there is still strong evidence for a linear relationship between carbon dioxide and temperature.

Consequences
• The stated confidence level may not give the true coverage rate.
• We still have confidence in the intervals, but the confidence level may not be exactly 95%.
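
The normal quantile plot and the constant variance check can also be given rough numeric companions. The sketch below is an assumption-laden illustration, not a formal test and not the software output used in the lecture: the plotting positions (i - 0.5)/n are one common convention, the helper names are hypothetical, and the residuals are made up.

```python
from statistics import NormalDist, mean, stdev


def normal_quantile_points(res):
    """Pair each sorted residual with a standard normal quantile.

    Uses plotting positions (i - 0.5) / n, one common convention (assumption).
    """
    n = len(res)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, sorted(res)))


def correlation(pairs):
    """Pearson correlation of the (quantile, residual) pairs."""
    xs, ys = zip(*pairs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in pairs)
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def spread_by_half(x, res):
    """Standard deviation of the residuals for the lower and upper halves of x."""
    ordered = [r for _, r in sorted(zip(x, res), key=lambda p: p[0])]
    mid = len(ordered) // 2
    return stdev(ordered[:mid]), stdev(ordered[mid:])


# Made-up residuals (assumption), NOT the residuals from the lecture.
x = [300, 310, 320, 330, 340, 350, 360, 370, 380, 390]
res = [-0.08, 0.22, -0.10, -0.03, 0.05, -0.12, 0.11, -0.06, 0.02, -0.01]

points = normal_quantile_points(res)
r = correlation(points)
low_sd, high_sd = spread_by_half(x, res)

print("quantile plot correlation:", round(r, 3))
print("residual SD, lower half of x:", round(low_sd, 3))
print("residual SD, upper half of x:", round(high_sd, 3))
```

A correlation near 1 means the points fall close to a straight line in the normal quantile plot, and similar standard deviations in the two halves are consistent with constant variance. Neither number replaces looking at the plots.
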