Statistics 111 - Lecture 17
Testing Relationships between Variables
July 1, 2008 Lecture 17 - Regression Testing

Administrative Notes
• Homework 5 due tomorrow
• Lecture on Wednesday will be review of entire course

Final Exam
• Thursday from 10:40-12:10
• It’ll be right here in this room
• Calculators are definitely needed!
• Single 8.5 x 11 cheat sheet (two-sided) allowed
• I’ve put a sample final on the course website

Outline
• Review of Regression coefficients
• Hypothesis Tests
• Confidence Intervals
• Examples

Two Continuous Variables
• Visually summarize the relationship between two continuous variables with a scatterplot
• Numerically, we focus on the best fit line (regression)
• Education and Mortality: Mortality = 1353.16 - 37.62 · Education
• Draft Order and Birthday: Draft Order = 224.9 - 0.226 · Birthday

Best values for Regression Parameters
(Slide from Stat 111 - Lecture 16 - Regression, June 30, 2008)
• The best fit line has these values for the regression coefficients:
  Best estimate of slope: b = r · (sy / sx)
  Best estimate of intercept: a = ȳ - b · x̄
  (r is the correlation, sx and sy are the standard deviations of X and Y, and x̄ and ȳ are their means)
• We can also estimate the average squared residual: s² = Σ (Yi - a - b·Xi)² / (n - 2)

Significance of Regression Line
• Does the regression line show a significant linear relationship between the two variables?
• If there is not a linear relationship, then we would expect zero correlation (r = 0)
• So the slope b should also be zero
• Therefore, our test for a significant relationship will focus on testing whether our slope is significantly different from zero:
  H0: β = 0 versus Ha: β ≠ 0

Linear Regression
• Best fit line is called the Simple Linear Regression Model: Yi = α + β·Xi + εi
• Coefficients: α is the intercept and β is the slope
• Other common notation: β0 for intercept, β1 for slope
• Our Y variable is a linear function of the X variable, but we allow for error (εi) in each prediction
• We approximate the error by using the residual:
  residual = Observed Yi - Predicted Yi, where Predicted Yi = a + b·Xi

Test Statistic for Slope
• Our test statistic for the slope is similar in form to all the test statistics we have seen so far:
  T = (b - 0) / SE(b) = b / SE(b)
• The standard error of the slope SE(b) has a complicated formula that requires some matrix algebra to calculate
• We will not be doing this calculation manually because the JMP software does this calculation for us!

Example: Education and Mortality

Confidence Intervals for Coefficients
• JMP output also gives the information needed to make confidence intervals for slope and intercept
• 100·C % confidence interval for slope β: b ± t* · SE(b)
• The multiple t* comes from a t distribution with n-2 degrees of freedom
• 100·C % confidence interval for intercept α: a ± t* · SE(a)
• Usually, we are less interested in the intercept, but it might be needed in some situations

Confidence Intervals for Example
• We have n = 60, so our multiple t* comes from a t distribution with d.f. = 58. For a 95% C.I., t* = 2.00
• 95 % confidence interval for slope: -37.6 ± 2.0*8.307 = (-54.2, -21.0)
  Note that this interval does not contain zero!
• 95 % confidence interval for intercept: 1353 ± 2.0*91.42 = (1170, 1536)

Another Example: Draft Lottery
• Is the negative linear association we see between birthday and draft order statistically significant?
[Figure: regression output with the p-value marked]

Another Example: Draft Lottery (continued)
• p-value < 0.0001, so we reject the null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order
• Statistical evidence that the randomization was not done properly!
• 95 % confidence interval for slope: -.23 ± 1.98*.05 = (-.33, -.13)
• Multiple t* = 1.98 from t distribution with n-2 = 363 d.f.
• Confidence interval does not contain zero, which we expected from our hypothesis test

Education Example
• Dataset of 78 seventh-graders: relationship between IQ and GPA
• Clear positive association between IQ and grade point average

Education Example (continued)
• Is the positive linear association we see between GPA and IQ statistically significant?
[Figure: regression output with the p-value marked]

Education Example (continued)
• p-value < 0.0001, so we reject the null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA
• 95 % confidence interval for slope: .101 ± 1.99*.014 = (.073, .129)
• Multiple t* = 1.99 from t distribution with n-2 = 76 d.f.
• Confidence interval does not contain zero, which we expected from our hypothesis test

Next Class - Lecture 18
• Review of course material
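
Appendix: checking the interval arithmetic
The confidence intervals in the three examples above all follow the same recipe, estimate ± t* · SE, and the slope test uses T = b / SE(b). The Python sketch below reruns that arithmetic. It is only an illustration: the class name CoefficientSummary and its methods are hypothetical, the estimates, standard errors, and t* multiples are the rounded values quoted on the slides, and the resulting T values are therefore approximations of what JMP would report from the full data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CoefficientSummary:
    """Hypothetical container for one coefficient as quoted on the slides."""
    label: str
    estimate: float   # a or b, rounded as on the slide
    std_error: float  # SE(a) or SE(b), as on the slide
    t_star: float     # t* multiple (n-2 d.f.), copied from the slide

    def t_statistic(self) -> float:
        # Test statistic for H0: coefficient = 0 is estimate / SE(estimate).
        return self.estimate / self.std_error

    def confidence_interval(self) -> Tuple[float, float]:
        # estimate +/- t* x SE(estimate)
        margin = self.t_star * self.std_error
        return (self.estimate - margin, self.estimate + margin)

    def excludes_zero(self) -> bool:
        # True when zero lies outside the interval.
        low, high = self.confidence_interval()
        return low > 0 or high < 0


# Values taken from the slides (rounded there, so results are approximate).
examples = [
    CoefficientSummary("Mortality vs. Education, slope", -37.6, 8.307, 2.00),
    CoefficientSummary("Mortality vs. Education, intercept", 1353.0, 91.42, 2.00),
    CoefficientSummary("Draft Order vs. Birthday, slope", -0.23, 0.05, 1.98),
    CoefficientSummary("GPA vs. IQ, slope", 0.101, 0.014, 1.99),
]

for ex in examples:
    low, high = ex.confidence_interval()
    print(f"{ex.label}: T = {ex.t_statistic():.2f}, "
          f"95% CI = ({low:.3f}, {high:.3f}), "
          f"excludes zero: {ex.excludes_zero()}")
```

The t* multiples are passed in rather than computed so that the sketch needs only the standard library. With SciPy installed, scipy.stats.t.ppf(0.975, df) would give the 95% multiple for a chosen number of degrees of freedom.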