KING FAHD UNIVERSITY OF PETROLEUM & MINERALS
DEPARTMENT OF MATHEMATICS AND STATISTICS
DHAHRAN, SAUDI ARABIA

STAT212: BUSINESS STATISTICS II (091)
Quiz #5, Jan 03, 2010. Time: two days
Instructor: Dr Abdulkadir Hussein

Student name:_________________________ ID:____________________________ Section:____

Q1: A publisher wanted to identify the variables that contribute to the number of books sold. Volumes sold was taken as the response Y, and other variables, such as the number of competing books, were considered as independent variables. Answer the following questions.

a. From the correlation matrix below, which variables seem to be significantly correlated with Y? Justify your answer.

Correlations: Volumes Sold Y, Pages X1, Competing Books X2, Advertising Budget X3, Age of Author X4

                    Volumes Sold Y   Pages X1   Competing Books   Advertising Budg
Pages X1                 0.622
                         0.013

Competing Books          0.355         0.501
                         0.194         0.057

Advertising Budg         0.620         0.091         0.384
                         0.014         0.746         0.158

Age of Author X4         0.485        -0.019        -0.113             0.265
                         0.067         0.947         0.687             0.340

Cell Contents: Pearson correlation
               P-Value

Q2: The publisher used the best subsets variable selection method and obtained the following MINITAB output. Use the output to decide the best set of variables to be used in a regression model for predicting the volumes sold. Justify your answer.

Best Subsets Regression: Volumes Sold versus Pages X1, Competing Bo, ...

Response is Volumes Sold Y

Vars   R-Sq   R-Sq(adj)   Mallows Cp        S
   1   38.6        33.9         28.5   41.593
   1   38.5        33.8         28.6   41.644
   2   70.7        65.8          9.9   29.930
   2   63.3        57.2         14.6   33.463
   3   83.5        79.0          3.6   23.436
   3   74.1        67.1          9.7   29.357
   4   84.5        78.3          5.0   23.850

[Predictor columns: Pages X1, Competing Books X2, Advertising Budget X3, Age of Author X4. The X marks showing which predictors enter each subset are not recoverable from the source layout.]

Q3: Finally, the publisher decided to use all the variables in the regression model. Use the MINITAB output to answer the questions.

Regression Analysis: Volumes Sold versus Pages X1, Competing Bo, ...
The regression equation is
Volumes Sold Y = - 125 + 0.176 Pages X1 - 1.57 Competing Books X2 + 1.59 Advertising Budget X3 + 1.61 Age of Author X4

Predictor                   Coef   SE Coef       T       P     VIF
Constant                 -125.31     31.08   -4.03   0.002
Pages X1                 0.17590   0.03977    4.42   0.001   1.369
Competing Books X2        -1.574     1.996   -0.79   0.449   1.687
Advertising Budget X3     1.5917    0.4445    3.58   0.005   1.353
Age of Author X4          1.6137    0.6250    2.58   0.027   1.152

S = 23.8497   R-Sq = 84.5%   R-Sq(adj) = 78.3%

Analysis of Variance

Source           DF        SS       MS       F       P
Regression        4   30960.3   7740.1   13.61   0.000
Residual Error   10    5688.1    568.8
Total            14   36648.4

[Figure: Residual Plots for Volumes Sold Y. Four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order, observations 1 to 15).]

a. Test the hypothesis that, overall, the regression model is significant.
   Hypotheses:
   Critical region:
   Test statistic:
   Decision and conclusion:

b. Test the hypothesis that the age of the author is significant.

c. Interpret the coefficient of the age of the author.

d. Using the residual plots above, check the regression assumptions of normality, independence, and constant variance of the regression errors.

Q4: In the following regression, the independent variables X1 and X2 are continuous, whereas X3 and X4 are dummy variables and X5 = X1*X4 (that is, an interaction term between X1 and X4). Answer the following questions.
The regression equation is
y = 9.45 + 2.98 x1 - 0.503 x2 + 1.54 x3 + 2.86 x4 + 2.98 x5

Predictor       Coef   SE Coef        T       P       VIF
Constant       9.454     5.136     1.84   0.069
x1            2.9842    0.2534    11.78   0.000     5.300
x2          -0.50282   0.03741   -13.44   0.000     1.042
x3            1.5412    0.4332     3.56   0.001     1.034
x4             2.855     5.676     0.50   0.616   152.238
x5            2.9770    0.2810    10.59   0.000   153.019

S = 2.01791   R-Sq = 99.5%   R-Sq(adj) = 99.5%

Analysis of Variance

Source           DF      SS      MS         F       P
Regression        5   80021   16004   3930.33   0.000
Residual Error   94     383       4
Total            99   80404

a. Interpret the coefficient of X4.

b. Identify the variables that exhibit multicollinearity.

c. In your opinion, would you suggest removing any of the variables that exhibit multicollinearity? Why or why not?
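
The quantities in a MINITAB regression table are linked by a few standard identities: MS = SS/DF, F = MS(Regression)/MS(Residual Error), T = Coef/SE Coef, and VIF = 1/(1 - R²), where R² comes from regressing one predictor on the others. The following Python sketch recomputes them from the printed figures of the last output. It is only an arithmetic check under the assumption that the printed values are rounded; the helper names (anova_f, t_ratio, r2_from_vif) are hypothetical and not part of MINITAB.

```python
# Arithmetic check of a printed regression table.
# All helper names are hypothetical; inputs are copied from the printed output.

def anova_f(ss_reg, df_reg, ss_err, df_err):
    """Return (MSR, MSE, F) computed from sums of squares and degrees of freedom."""
    msr = ss_reg / df_reg
    mse = ss_err / df_err
    return msr, mse, msr / mse


def t_ratio(coef, se_coef):
    """T statistic for a single coefficient: estimate divided by its standard error."""
    return coef / se_coef


def r2_from_vif(vif):
    """R-squared of one predictor regressed on the others, implied by VIF = 1/(1 - R^2)."""
    return 1.0 - 1.0 / vif


if __name__ == "__main__":
    # ANOVA row values: Regression (SS=80021, DF=5), Residual Error (SS=383, DF=94)
    msr, mse, f_stat = anova_f(80021, 5, 383, 94)
    print(f"MSR = {msr:.1f}, MSE = {mse:.3f}, F = {f_stat:.2f}")

    # Coefficient rows: (name, Coef, SE Coef, VIF)
    rows = [
        ("x1", 2.9842, 0.2534, 5.300),
        ("x2", -0.50282, 0.03741, 1.042),
        ("x3", 1.5412, 0.4332, 1.034),
        ("x4", 2.855, 5.676, 152.238),
        ("x5", 2.9770, 0.2810, 153.019),
    ]
    for name, coef, se, vif in rows:
        print(f"{name}: T = {t_ratio(coef, se):.2f}, implied R^2 = {r2_from_vif(vif):.4f}")
```

Because the printed sums of squares are rounded to whole numbers, the recomputed F is close to, but not exactly, the printed 3930.33.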