Instructions: Open the exam file from your desktop

advertisement
Lab Notes, Exam 2, MIS 131
Dr. Darrell Freeman
EXAM2, Spring 2006
Instructions: The exam has 180 total points possible and will be graded against a 150
base score. The exam has two parts. Each part is associated with a separate data file.
There are a total of 20 questions, each worth 9 points.
Part 1
This part involves analysis of variance and you will need to evaluate multiple range tests
to answer some of the questions.
Analysis of variance (ANOVA) is found under the Compare menu in Statgraphics.
Multifactor ANOVA will be appropriate for this problem since there are multiple
factors to consider. Multiple range tests are an option under the tabular options
(the yellow button).
The file X2SalesForce.sf contains sales data by region for the Home Security Co (HSC).
It is the policy of the company to rotate its four regional sales managers through the four
sales regions on a quarterly basis so that over a one year period each manager visits each
region one time. HSC would like to know if manager, region and quarter are significant
factors in determining sales. Ignoring price for the moment, analyze whether manager,
quarter and region are significant factors in determining sales.
1) The following are statistically significant factors in determining sales:
a) Manager only
b) Manager and Region only
c) Region and Quarter only
d) Quarter and Manager only
e) Manager, Region and Quarter <= Look at the Analysis of Variance table. All
three factors have p-values less than 0.05.
2) Sales by quarter breaks out into which homogeneous groups:
a) 1, 2, 3, 4
b) 14, 3, 2
c) 12, 3, 4 <= Open the multifactor analysis pane and use pane options to
select qtr. The X’s are in three columns. Also note that all the pair-wise
comparisons except for 1 and 2 show significant differences in their least
square means (LSmean).
d) 12, 34
e) 13, 2, 4
3) Sales by manager breaks out into which homogeneous groups:
a) ABC, D
b) AB, CD
c) BC, AD <= Switch the multifactor pane to mgr using pane options and
note how the X’s line up. Also note that these two pairings do not show a
significant difference in their LSmeans.
d) A, B, CD
e) A, DB,BC
4) Sales by region breaks out into which homogeneous groups
a) 1, 2, 3, 4 <= Switch the pane to reg and note the X’s. Also note that all
pairs of LSmeans have significant differences.
b) 12, 23, 34, 24
c) 1, 23, 34, 24
d) 123, 4
e) 1, 2, 34
It turns out that the company has adjusted base selling prices each quarter, and sales
managers have some discretion about the actual selling price. Include price in the
analysis as a covariate and answer the following questions.
Use the red button to bring up the model specification dialog box and add price
as a covariate. Price is not entered as a factor in this problem because it is a
continuous variable (quantitative) not a categorical variable (qualitative). Note
that regions, although indicated by number, are not quantitative but qualitative.
We could have as easily used NE, NW, SE and SW as region data.
5) Which factors are statistically significant for the determination of sales?
a) Manager and Price
b) Manager, Quarter and Price
c) Region and Quarter
d) Manager and Region <= With price included as a covariate we find that
only mgr and reg are significant factors in sales. Note the p-values in the
analyisis of variance table.
e) All factors are statistically significant
6) Sales are not significantly different statistically for the following pairs of managers:
a) A and D only
b) B and C only
c) All pairs are significantly different statistically
d) No pairs are significantly different statistically
e) A and D, B and C only <= Again, use multifactor analysis with mgr selected
from pane options.
7) Sales are not significantly different for the following pairs of regions:
a) 1 and 2, 3 and 4 only
b) 1 and 3, 2 and 4, 3 and 4 only <= Switch the multifactor analysis pane to reg
using pane options and see how the X’s align. Also note that none of
these pairs shows a significant difference.
c) All pairs are significantly different statistically
d) No pairs are significantly different statistically
e) None of the above
8) List Regions in order of descending mean sales
a) 1234
b) 4321
c) 2143
d) 2431 <= In the multifactor analysis pane the regions are listed in
ascending order (increasing mean). You need to read the column from
the bottom up.
e) 3214
9) List Managers in order of descending mean sales
a) ACDB
b) BDAC
c) DABC <= Switch the multifactor window to mgr and read the column from
the bottom up.
d) CABD
e) DBCA
10) In general, when performing analysis of variance, the higher the F ratio the lower the
p-value.
a) True <= This is true. Note that the F ratio here is based on the residual
mean squared error. Covariates are used in a regression model and then
the residual error is tested against the null hypothesis that each individual
factor is insignificant in its effect on the dependent variable using chi
squared tests.
b) False
c) Indeterminate
Part 3
Sam has a growing business in specialized business systems. He competes on price,
service and delivery time. Sam would like to model his order flow using price, service
response time and promised delivery time as explanatory variables. The file Sam.sf has
monthly data on order flow, pricing, service call response time (hours) and delivery time
(days). Sam’s order flow is a combination of new customers and repeat customers. Most
of his repeat customers are on a six month reorder cycle.
Open the SAM.sf file and look at the order data in the special/time series/descriptive
window.
11) Characterize the data:
a) Trending up
b) Trending up with regular cycles <= The data is clearly trending up with an
overlay of cycles with a six month period.
c) Stationary
d) Random
e) Random Walk
Build a multiple regression model for orders with the independent variables price, hours
and days.
12) Looking at the analysis of variance table, can we reject the idea that the model is not
statistically significant?
a) Yes <= The p-value says we can reject this idea at the 99% confidence
level.
b) No
c) Impossible to tell
13) Looking at the multiple regression table, which variables are significant in the model?
a) Price, Hours and Days
b) Days only
c) Price only <= With this model only price is a significant variable, as it is
the only one with a p-value less than 0.10 in the multiple regression table.
d) Hours only
e) None of the above
14) About what percent of the variation in orders is explained by this model?
a) 95%
b) 22% <= Look at the R-squared value.
c) 45%
d) 75%
e) 35%
Based on your viewing of the Orders data and on the information given in the problem
statement include an appropriate lag for orders in the model.
The problem statement notes a six month reorder cycle. The data shows regular
six month cycles on top of the trend. The autocorrelation for Orders is more
subtle, having a blip above the trend at lag 6. You need to find the lag of six
months.
15) Is the lag term statistically significant in the multiple regression analysis?
a) Yes <= Note that if lag(orders,1) and lag(orders,6) are both included in
the model then only lag(orders,6) is found to be statistically significant.
b) No
c) Can’t tell
There is an obvious outlier. Create and use a dummy variable to remove its effect on the
error.
With lag(orders,6) in the model the outlier is at row 30. It shows up clearly in the
graph of residuals vs X, and in the table of unusual residuals.
16) With the dummy variable in the model about what percent of the variation in orders is
explained by the model?
a) 90%
b) 95%
c) 98%
d) 99% <= R squared =98.9
e) 16%
Save the residuals using the save output button. Delete the blank cells in the first six
rows and test the residuals.
17) Can we reject the idea that the residuals are random?
a) Yes
b) No <= Run tests for randomness from the Special/Time
series/Descriptive Methods window. (Use the yellow button and select
tests for randomness). The p-values for all of the tests are greater than
0.10 indicating that the null hypothesis should not be rejected (data is
drawn at random from a probability distribution).
c) Can’t tell
18) Can we reject the idea that the residuals are normal?
a) Yes
b) No <= Run tests for normality from the Describe/Distributions/Distributions
Fitting (Uncensored Data) window (use the yellow button to select tests for
normality). The p-values for all of the tests are greater than 0.10
indicating that the null hypothesis should not be rejected (data is drawn
from a normal distribution).
c) Can’t tell
19) Is the residual error of the common cause variety?
a) Yes <= Since we can’t reject the idea that the error is both random and
normal.
b) No
c) Can’t tell
20) Is the model complete?
a) Yes <= R squared of 99% and common cause error for residuals, the
model effort is complete (you may still need to simplify).
b) No
c) Can’t tell
Download