Lectures #11 and #12 - Additional Practice Solution

(1) A corporation administers an aptitude test to all new sales representatives. Management is interested in the extent to which this test is able to predict their eventual success. The accompanying table records average weekly sales (in ten thousands of dollars) and aptitude test scores for a random sample of five representatives. Use a 5% significance level wherever appropriate.

Test Score (x)   Weekly Sales (y)   x − x̄   (x − x̄)²   y − ȳ   (y − ȳ)²   (x − x̄)(y − ȳ)
4                1                  -1      1          -3      9          3
6                5                  1       1          1       1          1
5                4                  0       0          0       0          0
6                7                  1       1          3       9          3
4                3                  -1      1          -1      1          1
Sum: 25          20                 0       4          0       20         8

\( \bar{x} = \frac{25}{5} = 5 \), \( \bar{y} = \frac{20}{5} = 4 \), \( s_x^2 = \frac{\sum (x_i-\bar{x})^2}{n-1} = \frac{4}{4} = 1 \), \( s_y^2 = \frac{\sum (y_i-\bar{y})^2}{n-1} = \frac{20}{4} = 5 \)

a. Calculate the covariance between test score and weekly sales and the correlation between test score and weekly sales.
\( \mathrm{Cov}(X,Y) = s_{xy} = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{n-1} = \frac{8}{4} = 2 \)
\( r = \frac{s_{xy}}{s_x s_y} = \frac{2}{(1)(\sqrt{5})} \approx 0.894 \)

b. Using the least squares method, find the simple linear regression equation to predict weekly sales from test score.
\( b_1 = \frac{s_{xy}}{s_x^2} = \frac{2}{1} = 2 \), \( b_0 = \bar{y} - b_1\bar{x} = 4 - (2)(5) = -6 \)
\( \widehat{\text{Sales}} = -6.00 + 2.00\,\text{Score} \)

c. Interpret the regression coefficients.
The sample slope tells us that for each additional point attained on the test, average weekly sales are estimated to increase by 2 units of $10,000, that is, by $20,000. The sample intercept tells us that when the test score is 0, average weekly sales are estimated to be -$60,000. This is an extrapolation, since the observed test scores range only from 4 to 6.

This study source was downloaded by 100000835462848 from CourseHero.com on 10-29-2021 08:06:42 GMT -05:00 https://www.coursehero.com/file/6961803/Lectures11and12AdditionalPracticeSolutions/

d. Find the coefficient of determination and explain its meaning.
\( R^2 = \frac{SSR}{SST} = \frac{16}{20} = 0.80 \)
\( SST = \sum (y_i-\bar{y})^2 = (n-1)s_y^2 = (4)(5) = 20 \)
\( SSR = b_1^2 \sum (x_i-\bar{x})^2 = b_1^2 (n-1)s_x^2 = (2)^2(4)(1) = 16 \)
We can explain 80% of the variation (differences) in weekly sales by its linear relationship with the test score.

e. Find the residuals from the regression.
\( e_i = y_i - \hat{y}_i \), where \( \hat{y} = -6.00 + 2.00x \).
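The hand calculations in parts a through e can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names (sxx, s_xy, b1, and so on) are illustrative choices, not notation from the solution. It recomputes the column sums of the table, the correlation, the least squares line, R², and the residuals.

```python
import math
from statistics import mean

# Problem (1) data: aptitude test score (x) and weekly sales (y, in $10,000s)
x = [4, 6, 5, 6, 4]
y = [1, 5, 4, 7, 3]
n = len(x)

x_bar, y_bar = mean(x), mean(y)

# Column totals of the table: squared deviations and cross-products
sxx = sum((xi - x_bar) ** 2 for xi in x)                        # 4
syy = sum((yi - y_bar) ** 2 for yi in y)                        # 20
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 8

# Sample variances and covariance use the n - 1 divisor
s_x2, s_y2, s_xy = sxx / (n - 1), syy / (n - 1), sxy / (n - 1)

r = s_xy / math.sqrt(s_x2 * s_y2)   # part a: correlation
b1 = s_xy / s_x2                    # part b: least squares slope
b0 = y_bar - b1 * x_bar             # part b: least squares intercept

sst = (n - 1) * s_y2                # part d: total sum of squares
ssr = b1 ** 2 * (n - 1) * s_x2      # part d: regression sum of squares
r_squared = ssr / sst

y_hat = [b0 + b1 * xi for xi in x]                 # fitted values
residuals = [yi - fi for yi, fi in zip(y, y_hat)]  # part e: e_i = y_i - y_hat_i

print(f"Cov = {s_xy}, r = {r:.4f}")
print(f"y_hat = {b0} + {b1} x, R^2 = {r_squared:.2f}")
print("residuals:", residuals)
```

Running it gives Cov = 2.0, r ≈ 0.8944, ŷ = -6 + 2x, R² = 0.80, and residuals -1, -1, 0, 1, 1, which agree with the hand calculations. Note that r² = 0.8 equals R², as it must in simple linear regression.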
Test Score (x)   Weekly Sales (y)   ŷ = -6 + 2x   e = y − ŷ
4                1                  2             1 − 2 = -1
6                5                  6             5 − 6 = -1
5                4                  4             4 − 4 = 0
6                7                  6             7 − 6 = 1
4                3                  2             3 − 2 = 1

f. Find a point estimate of σ_ε.
\( s = \sqrt{MSE} = \sqrt{1.333} = 1.15470 \)
\( SSE = SST - SSR = 20 - 16 = 4 \), or \( SSE = \sum e_i^2 = (-1)^2 + (-1)^2 + (0)^2 + (1)^2 + (1)^2 = 4 \)
\( MSE = \frac{SSE}{n-p-1} = \frac{4}{3} = 1.333 \)

g. What are the required data conditions for the inference procedures we discussed to be reliable?
\( \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma) \). The error terms are independent and identically distributed according to the Normal distribution with a mean of 0 and some constant standard deviation, σ.

h. Find a 95% confidence interval estimate of the population slope. Interpret it.
\( b_1 \pm t_{\alpha/2,\,n-k-1}\, s_{b_1} = 2 \pm 3.182(0.5774) = (0.16,\ 3.84) \)
\( s_{b_1} = \sqrt{\frac{MSE}{\sum (x_i-\bar{x})^2}} = \sqrt{\frac{MSE}{(n-1)s_x^2}} = \sqrt{\frac{1.333}{4}} = 0.5774 \)
\( t_c = \pm t_{0.025,3} = \pm 3.182 \)
We can be 95% confident that each additional point earned on the test is associated with between $1,600 and $38,400 additional weekly sales, on average.

i. What are the hypotheses for the test to determine if test score is a significant predictor of weekly sales?
H0: β1 = 0
Ha: β1 ≠ 0

j. Find the F test statistic and the F-critical point(s) for the test described in question #i. What is your conclusion?
\( F_{obs} = \frac{MSR}{MSE} = \frac{16/1}{4/3} = 12 \), \( F_c = F_{0.05,1,3} = 10.128 \) (α = 0.05)
[Figures: scatterplot of C5 vs C4; F distribution with the "Do Not Reject H0" region below Fc and the "Reject H0" region above it.]
Since Fobs > Fc, reject H0 and conclude that using the linear model that relates weekly sales to test score provides significantly more explanation of the variation in weekly sales than y-bar does.

k. Find the t-test statistic and the t-critical point(s) for the test described in question #i. What is your conclusion?
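The slope interval in part h, the F statistic in part j, and the t statistic asked for here can all be checked from quantities already found (b1 = 2, Σ(x_i − x̄)² = 4, SSR = 16, SSE = 4). This is a minimal Python sketch under one stated assumption: the critical values 3.182 and 10.128 are hardcoded as read from the t and F tables, exactly as in the solution, rather than computed, so no statistics library is needed. The variable names are illustrative.

```python
import math

# Quantities carried over from parts a-f of Problem (1)
n, p = 5, 1             # sample size and number of predictors
b1 = 2.0                # least squares slope
sum_sq_x = 4.0          # sum of (x_i - x_bar)^2 = (n - 1) * s_x^2
ssr, sse = 16.0, 4.0    # regression and error sums of squares

df_error = n - p - 1                 # 3
mse = sse / df_error                 # 4/3, about 1.333
msr = ssr / p                        # 16
s_b1 = math.sqrt(mse / sum_sq_x)     # standard error of the slope

t_obs = (b1 - 0) / s_b1              # t statistic for H0: beta1 = 0
f_obs = msr / mse                    # F statistic for the same H0

# Table values at alpha = 0.05 (hardcoded, as read from the t and F tables)
t_crit = 3.182                       # t_{0.025, 3}
f_crit = 10.128                      # F_{0.05, 1, 3}

ci_low, ci_high = b1 - t_crit * s_b1, b1 + t_crit * s_b1

print(f"s_b1 = {s_b1:.4f}")
print(f"t_obs = {t_obs:.3f}, reject H0: {abs(t_obs) > t_crit}")
print(f"F_obs = {f_obs:.3f}, reject H0: {f_obs > f_crit}")
print(f"95% CI for beta1: ({ci_low:.2f}, {ci_high:.2f})")
```

Observe that t_obs² = 3.464² ≈ 12 = F_obs: in simple linear regression the two-sided t test and the F test of H0: β1 = 0 are equivalent, which is why parts j and k reach the same conclusion.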
\( t_{obs} = \frac{b_1 - 0}{s_{b_1}} = \frac{2}{0.5774} = 3.464 \), \( t_c = \pm t_{0.025,3} = \pm 3.182 \)
[Figure: t distribution with rejection regions of area α/2 below -3.182 and above 3.182.]
Since tobs > 3.182, reject H0 and conclude that using the linear model that relates weekly sales to test score provides significantly more explanation of the variation in weekly sales than y-bar does.

l. Find a 95% confidence interval for E(y) at x = 5.
\( \hat{y} = -6 + 2(5) = 4 \)
\( \hat{y} \pm t_{\alpha/2,\,n-k-1}\sqrt{MSE\left(\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum (x_i-\bar{x})^2}\right)} = 4 \pm 3.182\sqrt{1.333\left(\frac{1}{5} + \frac{(5-5)^2}{4}\right)} = (2.36,\ 5.64) \), in units of $10,000

m. Find a 95% prediction interval for y at x = 5.
\( \hat{y} \pm t_{\alpha/2,\,n-k-1}\sqrt{MSE\left(1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum (x_i-\bar{x})^2}\right)} = 4 \pm 3.182\sqrt{1.333\left(1 + \frac{1}{5} + \frac{(5-5)^2}{4}\right)} = (-0.025,\ 8.025) \), in units of $10,000

n. Should the model be used to predict y when x = 10? Explain.
No. The sample data include values of x only in the range 4 to 6, so predicting y when x = 10 would be an extrapolation.

(2) The Pearson coefficient of correlation r equals 1 when there is no:
a. explained variation
b. unexplained variation
c. y-intercept in the model
d. outliers
Answer: b. The unexplained variation is based on the residuals. When r = 1 the relationship is deterministic (all points fall on a straight line with positive slope), so all residuals are 0.

(3) In a regression problem, if the coefficient of determination is 0.95, this means that:
a. 95% of the y values are positive
b. 95% of the variation in y can be explained by the variation in x
c. 95% of the x values are equal
d. 95% of the variation in x can be explained by the variation in y
Answer: b.

(4) In simple linear regression, which of the following statements indicates no linear relationship between the variables x and y?
a. Coefficient of determination is 1.0
b. Coefficient of correlation is 0.0
c. Sum of squares for error is 0.0
d. Sum of squares for regression is relatively large
Answer: b.

(5) A scatter diagram includes the following data points:
x   3   2   5    4    5
y   8   6   12   10   14
Two regression models are proposed: (1) \( \hat{y} = 1.2 + 2.5x \), and (2) \( \hat{y} = 3 + 2.0x \). Using the least squares method, which of these regression models provides the better fit to the data? Why?
The better equation is (1), because it results in the lower SSE. Find the residuals using both equations, square them, and sum the squared residuals: SSE = 4.95 for equation (1) versus SSE = 5.00 for equation (2).

(6) A simple linear regression between X and Y using 25 observations has SSR = 100. The sample variance for Y is 6.25. Construct the Regression ANOVA table.
\( SST = (n-1)s_y^2 = (24)(6.25) = 150 \), so \( SSE = SST - SSR = 150 - 100 = 50 \).

Source of Variation   SS    df               MS       F
Regression            100   p = 1            100      46
Error                 50    n - p - 1 = 23   2.1739
Total                 150   n - 1 = 24

(7) Below are the (partial) simple linear regression results for the 12 observations of monthly advertising (in $100's) and sales (in $1000's) values given below.

Sales          18   40   88   50   82   53   36   18   86   32   61   63
Advertising    8    20   30   20   32   21   18   10   32   13   23   28

Covariances: Sales, Advertising
               Sales      Advertising
Sales          606.3864
Advertising    196.3864   67.2955

a) Fill in the missing information. (Note: This should not require a great deal of calculation.)
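The missing entries can be recovered from the covariance table and the entries already shown, using the relationships noted after the output (MSE = s², SSE = df_E·MSE, SST = (n − 1)s_y², SSR = SST − SSE, t = b/s_b). This is a minimal Python sketch with the given values typed in by hand; the variable names are illustrative. It also computes Multiple R, R-Square, Adjusted R-Square (assuming the usual formula 1 − (1 − R²)(n − 1)/(n − p − 1)) and the F-Ratio, and checks that the slope equals cov(x, y)/s_x².

```python
import math

# Values given in Problem (7): covariance output and partial StatTools output
n, p = 12, 1                        # observations, predictors
var_sales = 606.3864                # sample variance of Sales (y)
var_adv = 67.2955                   # sample variance of Advertising (x)
cov_sales_adv = 196.3864            # sample covariance of Sales and Advertising
s = 6.050251124                     # StErr of Estimate
b0, t_b0 = -9.763255657, -1.9380    # constant and its t-value
b1, s_b1 = 2.918270854, 0.22237     # slope and its standard error

df_error = n - p - 1                # 10
mse = s ** 2                        # MSE = s^2
sse = df_error * mse                # SSE = (dfE)(MSE)
sst = (n - 1) * var_sales           # SST = (n - 1) s_y^2
ssr = sst - sse                     # SSR = SST - SSE
msr = ssr / p

r_squared = ssr / sst
multiple_r = math.sqrt(r_squared)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_error
f_ratio = msr / mse

# t = (b - 0) / s_b, rearranged for whichever piece is missing
s_b0 = b0 / t_b0
t_b1 = b1 / s_b1

# Consistency check: the slope also equals cov(x, y) / s_x^2
b1_from_cov = cov_sales_adv / var_adv

for name, value in [("MSE", mse), ("SSE", sse), ("SST", sst), ("SSR", ssr),
                    ("R-Square", r_squared), ("Multiple R", multiple_r),
                    ("Adjusted R-Square", adj_r_squared), ("F-Ratio", f_ratio),
                    ("s_b0", s_b0), ("t for b1", t_b1),
                    ("b1 from cov", b1_from_cov)]:
    print(f"{name:>18}: {value:.4f}")
```

Small differences in the last decimal place relative to the printed output come from the rounded inputs (for example, the t-value -1.9380 used to recover s_b0).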
Summary
Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
0.9722       0.9451     0.9396              6.050251124

ANOVA Table
              Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio    p-Value
Explained     p = 1                6304.195         6304.195          172.2197   < 0.0001
Unexplained   n - p - 1 = 10       366.055          36.6055

Regression Table
              Coefficient    Standard Error   t-Value   p-Value    95% CI Lower   95% CI Upper
Constant      -9.763255657   5.0378           -1.9380   0.0814     -20.9884       1.4619
Advertising   2.918270854    0.22237          13.123    < 0.0001   2.4228         3.4138

\( MSE = s^2 = (6.0502)^2 = 36.6055 \)
Use the relationship \( t_{obs} = \frac{b_j - \beta_j}{s_{b_j}} \) to find \( s_{b_0} \) and the t-value for \( b_1 \).
\( SSE = (df_E)(MSE) = (10)(36.6055) = 366.055 \)
\( s_y^2 = 606.3864 \Rightarrow SST = \sum (y_i-\bar{y})^2 = (12 - 1)(606.3864) = 6{,}670.25 \)
\( SSR = SST - SSE = 6670.25 - 366.055 = 6304.195 \)

b) What is the least squares equation to predict sales given advertising?
\( \widehat{\text{Sales}} = -9.763 + 2.918\,\text{AD} \)

c) Discuss how strong this model is.
The model is fairly strong. The coefficient of determination is R² = 0.9451, which tells us that 94.51% of the differences in (variability in) sales values can be explained by the amount of advertising. The standard error of the estimate is s = 6.05 ($1000's). This must be considered in the context of the sales values: sales range from 18 to 88 with an average of 52.25. While s = 6.05 is not excessively large in this context, it isn't small either.

d) What is the 95% confidence interval estimate of the population slope?
This can be read directly from the StatTools output: (2.4228, 3.4138). It is calculated as \( b_1 \pm t^* s_{b_1} = 2.91827 \pm (2.228)(0.22237) \), where \( t^* = t_{0.025,10} = 2.228 \).

e) What does this model predict will happen to sales if advertising is decreased by $1000?
A decrease in advertising of $1000 is a decrease of 10 units of advertising, since advertising is measured in $100's.
The model predicts that sales increase by 2.9183 units of y for each additional unit of advertising. Therefore, a 10-unit decrease in advertising is predicted to be accompanied by a (10)(2.9183) = 29.183 ($1000's) = $29,183 decrease in sales.

f) What does this model predict will happen to sales if advertising is increased by $2000?
Since the model is linear, every 1-unit change in x is accompanied by a 2.9183-unit change in y (in the same direction). An increase of $2000 is 20 units of advertising, so it is predicted to be accompanied by a (20)(2.9183) = 58.366 ($1000's) = $58,366 increase in sales.

(8) For the following set of residual plots, discuss whether or not all of the assumptions required for inference in regression have been satisfied.