Lab Notes, Exam 2, MIS 131
Dr. Darrell Freeman
EXAM2, Spring 2006

Instructions: The exam has 180 total points possible and will be graded against a 150 base score. The exam has two parts. Each part is associated with a separate data file. There are a total of 20 questions, each worth 9 points.

Part 1

This part involves analysis of variance and you will need to evaluate multiple range tests to answer some of the questions. Analysis of variance (ANOVA) is found under the Compare menu in Statgraphics. Multifactor ANOVA will be appropriate for this problem since there are multiple factors to consider. Multiple range tests are an option under the tabular options (the yellow button).

The file X2SalesForce.sf contains sales data by region for the Home Security Co (HSC). It is the policy of the company to rotate its four regional sales managers through the four sales regions on a quarterly basis so that over a one year period each manager visits each region one time. HSC would like to know if manager, region and quarter are significant factors in determining sales.

Ignoring price for the moment, analyze whether manager, quarter and region are significant factors in determining sales.

1) The following are statistically significant factors in determining sales:
a) Manager only
b) Manager and Region only
c) Region and Quarter only
d) Quarter and Manager only
e) Manager, Region and Quarter <= Look at the Analysis of Variance table. All three factors have p-values less than 0.05.

2) Sales by quarter breaks out into which homogeneous groups:
a) 1, 2, 3, 4
b) 14, 3, 2
c) 12, 3, 4 <= Open the multifactor analysis pane and use pane options to select qtr. The X’s are in three columns. Also note that all the pair-wise comparisons except for 1 and 2 show significant differences in their least square means (LSmean).
d) 12, 34
e) 13, 2, 4

3) Sales by manager breaks out into which homogeneous groups:
a) ABC, D
b) AB, CD
c) BC, AD <= Switch the multifactor pane to mgr using pane options and note how the X’s line up. Also note that these two pairings do not show a significant difference in their LSmeans.
d) A, B, CD
e) A, DB, BC

4) Sales by region breaks out into which homogeneous groups:
a) 1, 2, 3, 4 <= Switch the pane to reg and note the X’s. Also note that all pairs of LSmeans have significant differences.
b) 12, 23, 34, 24
c) 1, 23, 34, 24
d) 123, 4
e) 1, 2, 34

It turns out that the company has adjusted base selling prices each quarter, and sales managers have some discretion about the actual selling price. Include price in the analysis as a covariate and answer the following questions.

Use the red button to bring up the model specification dialog box and add price as a covariate. Price is not entered as a factor in this problem because it is a continuous (quantitative) variable, not a categorical (qualitative) one. Note that regions, although indicated by number, are qualitative, not quantitative. We could just as easily have used NE, NW, SE and SW as region data.

5) Which factors are statistically significant for the determination of sales?
a) Manager and Price
b) Manager, Quarter and Price
c) Region and Quarter
d) Manager and Region <= With price included as a covariate we find that only mgr and reg are significant factors in sales. Note the p-values in the analysis of variance table.
e) All factors are statistically significant

6) Sales are not significantly different statistically for the following pairs of managers:
a) A and D only
b) B and C only
c) All pairs are significantly different statistically
d) No pairs are significantly different statistically
e) A and D, B and C only <= Again, use multifactor analysis with mgr selected from pane options.
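The way the X’s line up in a multiple range table can be illustrated with a small sketch. It sorts the level means in ascending order and collects the longest runs of levels whose means differ by less than a critical difference. The manager means and the critical difference below are invented for illustration, and the single shared critical difference is a simplifying assumption; this is not a description of how Statgraphics computes its intervals.

```python
def homogeneous_groups(means, critical_diff):
    """Return maximal runs of levels whose means differ by less than critical_diff.

    means: dict mapping a level label to its mean (hypothetical values here).
    critical_diff: assumed least significant difference, the same for every pair.
    """
    ordered = sorted(means, key=means.get)  # ascending mean, like the table
    groups = []
    for start in range(len(ordered)):
        end = start
        # Extend the run while the next level stays within critical_diff
        # of the lowest mean in the run.
        while (end + 1 < len(ordered)
               and means[ordered[end + 1]] - means[ordered[start]] < critical_diff):
            end += 1
        run = ordered[start:end + 1]
        # Skip runs that are already contained in the previous group.
        if not groups or not set(run) <= set(groups[-1]):
            groups.append(run)
    return groups


# Made-up LS means for the four managers (not taken from X2SalesForce.sf).
manager_means = {"A": 52.0, "B": 40.5, "C": 41.0, "D": 53.1}

groups = homogeneous_groups(manager_means, critical_diff=2.0)
descending = sorted(manager_means, key=manager_means.get, reverse=True)

print(groups)      # each inner list shares one column of X's
print(descending)  # the table read from the bottom up
```

Each inner list corresponds to one column of X’s, and reading the sorted table from the bottom up gives the levels in order of descending mean.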
7) Sales are not significantly different for the following pairs of regions:
a) 1 and 2, 3 and 4 only
b) 1 and 3, 2 and 4, 3 and 4 only <= Switch the multifactor analysis pane to reg using pane options and see how the X’s align. Also note that none of these pairs shows a significant difference.
c) All pairs are significantly different statistically
d) No pairs are significantly different statistically
e) None of the above

8) List Regions in order of descending mean sales
a) 1234
b) 4321
c) 2143
d) 2431 <= In the multifactor analysis pane the regions are listed in ascending order (increasing mean). You need to read the column from the bottom up.
e) 3214

9) List Managers in order of descending mean sales
a) ACDB
b) BDAC
c) DABC <= Switch the multifactor window to mgr and read the column from the bottom up.
d) CABD
e) DBCA

10) In general, when performing analysis of variance, the higher the F ratio the lower the p-value.
a) True <= This is true: for fixed degrees of freedom, a larger F ratio gives a smaller p-value. Note that the F ratio here is based on the residual mean squared error. Covariates are first fitted in a regression model, and then each factor is tested with an F test against the null hypothesis that it has no effect on the dependent variable.
b) False
c) Indeterminate

Part 2

Sam has a growing business in specialized business systems. He competes on price, service and delivery time. Sam would like to model his order flow using price, service response time and promised delivery time as explanatory variables. The file Sam.sf has monthly data on order flow, pricing, service call response time (hours) and delivery time (days). Sam’s order flow is a combination of new customers and repeat customers. Most of his repeat customers are on a six month reorder cycle.

Open the Sam.sf file and look at the order data in the Special/Time series/Descriptive Methods window.
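As a rough cross-check outside Statgraphics, a reorder cycle can be looked for by differencing the series to remove the trend and then computing sample autocorrelations. The sketch below is a minimal illustration on an invented series (a linear trend plus a bump every sixth month); the numbers are not taken from Sam.sf, and the function name is hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a list of numbers at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


# Hypothetical monthly orders: upward trend with a bump every sixth month.
orders = [100 + 2 * t + (15 if t % 6 == 0 else 0) for t in range(48)]

# First differences remove the linear trend so the cycle stands out.
diffs = [orders[t] - orders[t - 1] for t in range(1, len(orders))]

acf = {lag: autocorrelation(diffs, lag) for lag in range(1, 13)}
peak_lag = max(acf, key=acf.get)

for lag in sorted(acf):
    print(lag, round(acf[lag], 3))
print("largest autocorrelation at lag", peak_lag)
```

On real data the peak is usually far less clean than in this constructed example, so the plot of the series and the problem statement remain the primary guides.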
11) Characterize the data:
a) Trending up
b) Trending up with regular cycles <= The data is clearly trending up with an overlay of cycles with a six month period.
c) Stationary
d) Random
e) Random Walk

Build a multiple regression model for orders with the independent variables price, hours and days.

12) Looking at the analysis of variance table, can we reject the idea that the model is not statistically significant?
a) Yes <= The p-value says we can reject this idea at the 99% confidence level.
b) No
c) Impossible to tell

13) Looking at the multiple regression table, which variables are significant in the model?
a) Price, Hours and Days
b) Days only
c) Price only <= With this model only price is a significant variable, as it is the only one with a p-value less than 0.10 in the multiple regression table.
d) Hours only
e) None of the above

14) About what percent of the variation in orders is explained by this model?
a) 95%
b) 22% <= Look at the R-squared value.
c) 45%
d) 75%
e) 35%

Based on your viewing of the Orders data and on the information given in the problem statement include an appropriate lag for orders in the model.

The problem statement notes a six month reorder cycle. The data shows regular six month cycles on top of the trend. The autocorrelation for Orders is more subtle, having a blip above the trend at lag 6. You need to find the lag of six months.

15) Is the lag term statistically significant in the multiple regression analysis?
a) Yes <= Note that if lag(orders,1) and lag(orders,6) are both included in the model then only lag(orders,6) is found to be statistically significant.
b) No
c) Can’t tell

There is an obvious outlier. Create and use a dummy variable to remove its effect on the error.

With lag(orders,6) in the model the outlier is at row 30. It shows up clearly in the graph of residuals vs X, and in the table of unusual residuals.

16) With the dummy variable in the model about what percent of the variation in orders is explained by the model?
a) 90%
b) 95%
c) 98%
d) 99% <= R squared = 98.9%
e) 16%

Save the residuals using the save output button. Delete the blank cells in the first six rows and test the residuals.

17) Can we reject the idea that the residuals are random?
a) Yes
b) No <= Run tests for randomness from the Special/Time series/Descriptive Methods window. (Use the yellow button and select tests for randomness.) The p-values for all of the tests are greater than 0.10, indicating that the null hypothesis (the data is drawn at random from a probability distribution) should not be rejected.
c) Can’t tell

18) Can we reject the idea that the residuals are normal?
a) Yes
b) No <= Run tests for normality from the Describe/Distributions/Distributions Fitting (Uncensored Data) window (use the yellow button to select tests for normality). The p-values for all of the tests are greater than 0.10, indicating that the null hypothesis (the data is drawn from a normal distribution) should not be rejected.
c) Can’t tell

19) Is the residual error of the common cause variety?
a) Yes <= We cannot reject the idea that the error is random, and we cannot reject the idea that it is normal.
b) No
c) Can’t tell

20) Is the model complete?
a) Yes <= With an R squared of 99% and common cause error for the residuals, the modeling effort is complete (you may still need to simplify).
b) No
c) Can’t tell
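The randomness check in question 17 can be sketched with one standard test of this kind, the runs above and below the median test, using a normal approximation without a continuity correction. The residual values below are invented for illustration, the function name is hypothetical, and whether Statgraphics computes its version in exactly this way is not confirmed here.

```python
import math
from statistics import median


def runs_test_above_below_median(values):
    """Runs above/below the median test with a two-sided normal approximation.

    Returns (number of runs, z statistic, p-value).
    """
    med = median(values)
    # True if above the median, False if below; values equal to it are dropped.
    signs = [v > med for v in values if v != med]
    n1 = sum(signs)          # count above the median
    n2 = len(signs) - n1     # count below the median
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return runs, z, p_value


# Hypothetical saved residuals (not the actual residuals from the Sam.sf model).
residuals = [0.4, -1.2, 0.9, 0.3, -0.5, -0.8, 1.1, -0.2, 0.6, -0.9, -0.1, 0.7]

runs, z, p_value = runs_test_above_below_median(residuals)
print(runs, round(z, 3), round(p_value, 3))
if p_value > 0.10:
    print("Do not reject the hypothesis that the residuals are random.")
```

A p-value above 0.10 is read the same way as in the answer to question 17: the hypothesis of randomness is not rejected.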