Solution - Homework 2

Use the data from Homework 1 to complete this assignment. Regress Y on X and store the residuals.

1. Make a scatterplot with regression. Does there appear to be a linear relationship? What two points appear to be potential outliers?

Yes, there does appear to be a linear relationship. Two points, row 18 (X=9, Y=6) and row 19 (X=5, Y=9), are possible outliers.

[Figure: Scatterplot of Y vs X]

2. Create boxplots for both X and Y. Are there any outliers?

No outliers are identified. See the boxplot below.

[Figure: Boxplot of X, Y]

3. Do a check of normality by using a probability plot of the residuals. Include: a) the null and alternative hypotheses, b) the p-value of the test, c) your decision based on a 0.05 level of significance, and d) a Minitab copy of your plot.

a) Ho: The residuals come from a normal distribution.
   Ha: The residuals do not come from a normal distribution.
b) The p-value is 0.942.
c) Since the p-value is greater than 0.05, we fail to reject Ho and conclude that the assumption of normality is plausible.
d) [Figure: Probability Plot of RESI1, Normal - 95% CI. Legend: Mean = -2.13163E-15, StDev = 1.556, N = 20, AD = 0.158, P-Value = 0.942]

4. Do a check of equal variances by performing a Modified Levene Test. Include: a) the null and alternative hypotheses, b) the p-value of the test, c) your decision based on a 0.05 level of significance, and d) a Minitab copy of your plot.

a) Ho: The variances are equal.
   Ha: The variances are not equal.
b) The p-value is 0.533.
   NOTE: Remember that Levene's test is more robust against violations of normality than the F-test, which makes it a better overall test of equal variances. The only condition for the Levene test is that the variable being tested is continuous.
c) Since the p-value is greater than 0.05, we fail to reject Ho and conclude that the assumption of equal variances is plausible.
d) [Figure: Test for Equal Variances for RESI1, showing 95% Bonferroni confidence intervals for the standard deviations and boxplots of RESI1 by group (0, 1)]

   Test            Test Statistic   P-Value
   F-Test                    0.83     0.861
   Levene's Test             0.40     0.533

5. Perform a Lack of Fit Test to check if the linear regression function is appropriate. Include: a) the null and alternative hypotheses, b) the correct F-statistic and p-value of the test, c) your decision based on a 0.05 level of significance, and d) a Minitab copy of your ANOVA output.

a) Ho: The linear regression function is appropriate.
   Ha: The linear regression function is not appropriate.
b) The F-statistic is 0.95 and the p-value is 0.507.
c) Since the p-value is greater than 0.05, we fail to reject Ho and conclude that it is plausible that the linear regression function is appropriate.
d) Analysis of Variance

   Source           DF       SS       MS      F      P
   Regression        1   70.769   70.769  27.67  0.000
   Residual Error   18   46.031    2.557
     Lack of Fit     7   17.364    2.481   0.95  0.507
     Pure Error     11   28.667    2.606
   Total            19  116.800

6. Even though you may not have found any assumption violations, perform a Box-Cox analysis on Y to see if any transformation is suggested. Include a) the estimated and rounded lambda values, b) the interpretation of this value, and c) the Box-Cox plot. NOTE: This can only be done using Minitab Version 15 or higher, i.e. student version 14 does not contain the Box-Cox program.

a) The estimated lambda is 1.07 and the rounded lambda is 1.00.
b) The rounded value implies raising Y to the power 1.00, which means no transformation is necessary.
c) [Figure: Box-Cox Plot of Y, StDev vs Lambda, using 95.0% confidence]

   Lambda           Value
   Estimate          1.07
   Lower CL          0.34
   Upper CL          1.89
   Rounded Value     1.00

7. Find Bonferroni joint confidence intervals for Bo and B1 with a 90% family confidence level and include your interpretation of these intervals.

You can use the Minitab output to find s{bo} and s{b1}. With a sample size, n, of 20, the degrees of freedom are n - 2 = 18.
Since we are interested in two joint intervals, for Bo and B1, g is equal to 2 for our Bonferroni correction. The intervals are $b_0 \pm B\,s\{b_0\}$ and $b_1 \pm B\,s\{b_1\}$, where $B = t(1 - \alpha/4;\ n - 2)$. From the t-table, with 18 degrees of freedom and 1 - α/4 = 0.975 for an alpha of 0.10, the Bonferroni multiplier is B = 2.101. Plugging into the equations:

For Bo: 1.377 +/- 2.101*0.8442 = 1.377 +/- 1.774, so -0.397 <= Bo <= 3.151
For B1: 0.8652 +/- 2.101*0.1645 = 0.8652 +/- 0.3456, so 0.5196 <= B1 <= 1.2108

Interpretation: We are 90% confident that both intervals contain the true intercept and slope, respectively.

8. Use Minitab to find Bonferroni simultaneous confidence intervals for new X observations of 0 and 10 using a 95% family confidence level. Include the output and your interpretation of these intervals.
Follow-up question 1: What is the interpretation of the level of confidence for the confidence intervals in the output?
Follow-up question 2: Can you think of a reason why these new X values might not be reliable?
Follow-up question 3: Show mathematically how one would use the Minitab output to get the simultaneous level of confidence for new observations.

Interpretation: We are 95 percent confident that both of the following intervals are correct: the reading achievement stanine for a reading readiness stanine of 0 is between -0.687 and 3.441, and the reading achievement stanine for a reading readiness stanine of 10 is between 7.706 and 12.351.

Predicted Values for New Observations

   New Obs     Fit  SE Fit          97.5% CI          97.5% PI
         1   1.377   0.844  (-0.687,  3.441)  (-3.044,  5.798)
         2  10.029   0.950  ( 7.706, 12.351)  ( 5.481, 14.576) X

   X denotes a point that is an outlier in the predictors.

Values of Predictors for New Observations

   New Obs     X
         1   0.0
         2  10.0

Follow-up 1: The 97.5% level of confidence is how confident we are in any ONE of the intervals being correct.
Follow-up 2: The x-values used in this analysis ranged from 1 to 9, so applying the regression equation to X = 0 and X = 10 is extrapolation outside the observed range and may not be reliable.
Follow-up 3: The 97.5% level of confidence is found using 1 – α/g = 0.975. For this problem we are interested in two simultaneous intervals, so g = 2. Solving for alpha, α/g = 0.025 gives α = 0.05, or a 95% simultaneous level of confidence.
NOTE: Software by default already uses α/2 when constructing confidence intervals, which is why we solve this equation with α/g rather than α/2g. If one were to use α/2g based on the level of confidence in the output, one would “double divide” by 2.

9. What is the value and interpretation of the coefficient of determination? Using the output and correct values, show two ways this value can be calculated.

From the output, the coefficient of determination, or R-squared, is 60.6%, meaning that 60.6 percent of the variation in reading achievement stanines can be explained by reading readiness stanines.

S = 1.59914   R-Sq = 60.6%   R-Sq(adj) = 58.4%

Analysis of Variance

   Source           DF       SS       MS      F      P
   Regression        1   70.769   70.769  27.67  0.000
   Residual Error   18   46.031    2.557
   Total            19  116.800

Two possible methods for calculating R-squared are:
1) (SSR/SST)*100% = (70.769/116.8)*100% = 60.6%
2) [1 – (SSE/SST)]*100% = [1 – (46.031/116.8)]*100% = 60.6%

10. From our in-class Sales-Advertising example, the test results were as follows: the intercept had T = -0.16 and a p-value of 0.885; the slope test had T = 3.66 and a p-value of 0.035; and the ANOVA test had F = 13.66 and a p-value of 0.035. Use Minitab to find these p-values by going to Calc > Probability Distributions and selecting either T or F, as appropriate.
Then select the radio button for “Cumulative Probability”, enter the appropriate degrees of freedom for the test, click the radio button for “Input Constant”, and enter the value of the test statistic in the text box. Click OK. From the output, show how one gets to the p-value. Include a copy of the Minitab output for each test.

Test of Intercept: From the output we take 0.441524 and multiply by two to get 0.883, which is approximately 0.885 due to rounding.

Cumulative Distribution Function
Student's t distribution with 3 DF

       x   P( X <= x )
   -0.16      0.441524

Test of Slope: From the output we subtract 0.982377 from 1 and then double the result, getting 0.017623*2 = 0.035246, which is approximately 0.035.

Cumulative Distribution Function
Student's t distribution with 3 DF

       x   P( X <= x )
    3.66      0.982377

F-Test: From the output we simply subtract 0.965626 from 1 to get 0.034374, which matches the reported 0.035 up to rounding.

Cumulative Distribution Function
F distribution with 1 DF in numerator and 3 DF in denominator

       x   P( X <= x )
   13.66      0.965626

NOTE: When using T, we double the result because the hypothesis test is two-sided and the t distribution is symmetric. When the t-statistic is negative, the cumulative probability is already the tail area, so we do not need to subtract from 1. The F-test already corresponds to a two-sided test, so there is no need to double the result; but because the output is a cumulative probability and every F-statistic is positive, we must still subtract from 1.
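The conversion from cumulative probability to p-value in question 10 can be checked outside Minitab. The following is a minimal Python sketch, not part of the original solution. It assumes the closed-form CDF of Student's t with exactly 3 degrees of freedom, and the identity that an F(1, 3) variable is the square of a t variable with 3 DF. The function names are hypothetical and chosen only for illustration.

```python
import math


def t_cdf_3df(t):
    """P(T <= t) for Student's t with exactly 3 DF (closed form, 3 DF only)."""
    x = t / math.sqrt(3.0)
    return 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi


def f_cdf_1_3df(f):
    """P(F <= f) for F(1, 3), using F(1, 3) = T^2 with T ~ t(3)."""
    return 2.0 * t_cdf_3df(math.sqrt(f)) - 1.0


def two_sided_t_p(t):
    """Two-sided p-value: double the tail area beyond |t|."""
    return 2.0 * (1.0 - t_cdf_3df(abs(t)))


def upper_tail_f_p(f):
    """F-test p-value: upper tail only, no doubling."""
    return 1.0 - f_cdf_1_3df(f)


# Test statistics from the Sales-Advertising example in question 10.
p_intercept = two_sided_t_p(-0.16)
p_slope = two_sided_t_p(3.66)
p_anova = upper_tail_f_p(13.66)

print("intercept: cdf = %.6f, p = %.6f" % (t_cdf_3df(-0.16), p_intercept))
print("slope:     cdf = %.6f, p = %.6f" % (t_cdf_3df(3.66), p_slope))
print("ANOVA F:   cdf = %.6f, p = %.6f" % (f_cdf_1_3df(13.66), p_anova))
```

The cumulative probabilities printed by this sketch should agree with the Minitab output above to about the displayed precision. Taking the absolute value in two_sided_t_p handles negative and positive t-statistics with one rule, which is equivalent to the "do not subtract from 1 when t is negative" shortcut in the NOTE.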