252x0333 11/18/03 ECO252 QBA2 Name

ECO252 QBA2
THIRD HOUR EXAM
Nov 25 2003
Name
Hour of Class Registered (Circle)
I. (30+ points) Do all the following (2 points each unless noted otherwise).
TABLE 11-0
Shiffler and Adams present the partially complete ANOVA table below that resulted from the analysis of a problem with 3 rows
and 3 columns.
ANOVA
Source of Variation    SS     df    MS    F    F
Columns                18
Rows                   40
Interaction
Within (Error)        208
Total                 296     62
1. Complete the table. Assume a 5% significance level. You may not be able to get exactly the
degrees of freedom you are looking for, but you should be able to come close. (4)
2. Is there significant interaction? Explain your answer.
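The bookkeeping behind completing a two-way ANOVA table with replication can be sketched as follows. This is only an illustration, not part of the exam: the function name and the dictionary layout are hypothetical, and the sketch assumes the total degrees of freedom given in Table 11-0 equal the number of observations minus one. It fills in the missing SS, df, MS and F entries but leaves the comparison with the table F values to the reader.

```python
def complete_two_way_anova(ss_rows, ss_cols, ss_within, ss_total,
                           n_rows, n_cols, df_total):
    """Fill in a two-way ANOVA table (hypothetical helper, for illustration)."""
    # The interaction sum of squares is whatever the other sources leave over.
    ss = {
        "Rows": ss_rows,
        "Columns": ss_cols,
        "Interaction": ss_total - ss_rows - ss_cols - ss_within,
        "Within": ss_within,
    }
    df = {
        "Rows": n_rows - 1,
        "Columns": n_cols - 1,
        "Interaction": (n_rows - 1) * (n_cols - 1),
    }
    # Within (error) df is the remainder of the total df.
    df["Within"] = df_total - sum(df.values())

    ms_within = ss["Within"] / df["Within"]
    table = {}
    for source in ("Rows", "Columns", "Interaction", "Within"):
        ms = ss[source] / df[source]
        # Each effect is tested against the within (error) mean square.
        f_ratio = None if source == "Within" else ms / ms_within
        table[source] = {"SS": ss[source], "df": df[source], "MS": ms, "F": f_ratio}
    return table


# Values taken from Table 11-0 (3 rows, 3 columns).
table = complete_two_way_anova(ss_rows=40, ss_cols=18, ss_within=208,
                               ss_total=296, n_rows=3, n_cols=3, df_total=62)
for source, entry in table.items():
    print(source, entry)
```

Each computed F would then be compared with the 5% table value for the matching numerator and denominator degrees of freedom.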
TABLE 13-6
The following Minitab table (with many parts deleted) was obtained when "Score received on an exam
(measured in percentage points)" (Y) is regressed on "percentage attendance" (X) for 22 students in a
Statistics for Business and Economics course.
Regression Analysis: Orders versus Weight
The regression equation is
Score = …… + ….. Attendance

Predictor     Coef       SE Coef    T         P
Constant      39.3927    37.2435    1.0576    0.3028
Attendance    0.34058    0.52852    0.6444    0.5266

S = 20.2598    R-Sq = 2.034%    R-Sq(adj) = -2.864%

Analysis of Variance
Source            DF    SS    MS    F    P
Regression         1                     0.523
Residual Error    20
Total             21

3. Referring to Table 13-6, which of the following statements is true?
a) -2.86% of the total variability in score received can be explained by percentage
attendance.
b) -2.86% of the total variability in percentage attendance can be explained by score
received.
c) 2% of the total variability in score received can be explained by percentage attendance.
d) 2% of the total variability in percentage attendance can be explained by score received.
4. Referring to Table 13-6, which of the following statements is true?
a) If attendance increases by 0.341%, the estimated average score received will increase by
1 percentage point.
b) If attendance increases by 1%, the estimated average score received will increase by
39.39 percentage points.
c) If attendance increases by 1%, the estimated average score received will increase by
0.341 percentage points.
d) If the score received increases by 39.39%, the estimated average attendance will go up by
1%.
5. (Text CD problem 12.51) The manager of a commercial mortgage department has collected data
over 104 weeks concerning the number of mortgages approved. The data are in the x and O columns
below (x is the number of mortgages approved and O is the number of weeks in which that happened;
for example, there were 32 weeks in which 2 mortgages were approved), and the problem asks whether
they follow a Poisson distribution.
Row     x      O       E
1       0     13    12.7355
2       1     25    26.7445
3       2     32    28.0817
4       3     17    19.6572
5       4      9    10.3200
6       5      6     4.3344
7       6      1     1.5170
8       7      1     0.4551
9       8      0     0.1195
10      9      0     0.0279
11     10      0     0.0059
12     11      0     0.0011
13     12      0     0.0002
Total        104   104.000
Since we have no guide as to what the parameter of the distribution is, the x and O columns
were multiplied together to tell us that there were 219 mortgages approved over 104 weeks,
giving us an average of 2.1 mortgages per week. The E above is the computer-generated Poisson
distribution multiplied by 104.
In a Kolmogorov-Smirnov procedure we make the O and E into cumulative distributions and
compare them, as is done below.
Row    Fo         Fe         D
1      0.12500    0.12246    0.0025435
2      0.36538    0.37962    0.0142304
3      0.67308    0.64963    0.0234453
4      0.83654    0.83864    0.0021047
5      0.92308    0.93787    0.0147973
6      0.98077    0.97955    0.0012180
7      0.99038    0.99414    0.0037536
8      1.00000    0.99851    0.0014857
9      1.00000    0.99966    0.0003369
10     1.00000    0.99993    0.0000689
11     1.00000    0.99999    0.0000126
12     1.00000    1.00000    0.0000019
13     1.00000    1.00000    0.0000000
Assume this is correct and explain how you would finish this analysis and why you would or
would not reject the null hypothesis. (4)
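How the E, Fo, Fe and D columns in Problem 5 arise can be sketched in a few lines. This is an illustrative reconstruction only: the function names are hypothetical, and the sketch assumes that Fe is the cumulative Poisson probability with mean 2.1 and that D is the absolute difference between Fo and Fe. It stops at the largest D and does not carry out the test decision.

```python
import math


def poisson_expected(mean, n, k_max):
    """Expected counts n * P(X = k) for k = 0..k_max under a Poisson(mean)."""
    return [n * math.exp(-mean) * mean ** k / math.factorial(k)
            for k in range(k_max + 1)]


def ks_max_difference(observed, expected):
    """Largest absolute gap between the observed and expected cumulative
    relative frequencies (both divided by the observed sample size)."""
    n = sum(observed)
    cum_o = cum_e = 0.0
    max_d = 0.0
    for o, e in zip(observed, expected):
        cum_o += o
        cum_e += e
        max_d = max(max_d, abs(cum_o / n - cum_e / n))
    return max_d


# O column from Problem 5 (x = 0, 1, ..., 12); mean 2.1 as stated in the text.
observed = [13, 25, 32, 17, 9, 6, 1, 1, 0, 0, 0, 0, 0]
expected = poisson_expected(mean=2.1, n=104, k_max=12)
d_max = ks_max_difference(observed, expected)
print(round(expected[0], 4), round(d_max, 7))
```

The largest D is the quantity that gets compared with a Kolmogorov-Smirnov table value for the chosen significance level and sample size.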
6. Referring to the previous problem, a more direct method of comparing the observed and expected
data is below. Answer the following questions.
a) What method is being used? (1)
b) How many degrees of freedom do we have? (1)
c) Why are the columns shorter here than in Problem 5? (1)
d) Do we reject our null hypothesis? Why? (3)
Row      O       E          O²/E
1       13    12.7355     13.2700
2       25    26.7445     23.3693
3       32    28.0817     36.4650
4       17    19.6572     14.7020
5        9    10.3200      7.8488
6        6     4.3344      8.3056
7        2     2.1267      1.8808
Total  104   104.0000    105.8415
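The O²/E column above is the usual computational shortcut for a statistic of the form sum of (O - E)²/E. A minimal sketch of that arithmetic follows, assuming the expected counts add up to the same total as the observed counts; the function names are hypothetical, and the degrees of freedom and the decision are deliberately left out.

```python
def statistic_by_definition(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def statistic_by_shortcut(observed, expected):
    """Sum of O^2 / E minus n; equals the definition when sum(E) == sum(O)."""
    n = sum(observed)
    return sum(o * o / e for o, e in zip(observed, expected)) - n


# O and E columns from the table in Problem 6.
observed = [13, 25, 32, 17, 9, 6, 2]
expected = [12.7355, 26.7445, 28.0817, 19.6572, 10.3200, 4.3344, 2.1267]

by_definition = statistic_by_definition(observed, expected)
by_shortcut = statistic_by_shortcut(observed, expected)
print(round(by_definition, 4), round(by_shortcut, 4))
```

Both routes give the same number because the E column was scaled to total 104, which is why the table only needs the O²/E column and its sum.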
7. In problems 5 and 6, one of the methods was used improperly. Which one? Why?
8. Random samples of salaries (in thousands) for lawyers in 3 cities are presented by Dummeldinger.
They are repeated in the three left columns.
Row    Atlanta    DC      LA      rank-At    rank-DC    rank-LA
1      45.5       41.5    52.0    12.0        7.0       17.5
2      47.9       40.1    72.0    13.0        5.0       21.0
3      43.1       39.0    41.0    11.0        3.5        6.0
4      42.0       56.5    54.0     8.5       20.0       19.0
5      49.0       37.0    33.0    14.5        2.0        1.0
6      52.0       49.0    42.0    17.5       14.5        8.5
7      39.0       43.0    50.0     3.5       10.0       16.0
Sum                               80.0       62.0       89.0
You are asked to analyze them, which you do using a Kruskal-Wallis procedure. You are aware
that the tables you have are only appropriate for columns with 5 or fewer items in them, so you
drop the last two items in each column and, after ranking the items from 1 to 15, get a
Kruskal-Wallis H of 1.82. If you use the tables, what did you test and what is the conclusion? (3)
9. You remember how to work with column sizes that are too large for the table. You rank the data as
appears in the three right columns above. Compute the Kruskal-Wallis H and use it to test your
null hypothesis at the 5% significance level. (3)
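The Kruskal-Wallis statistic in Problem 9 depends only on the rank sums and the column sizes. A minimal sketch of the standard formula H = 12/(n(n+1)) * sum(R_i²/n_i) - 3(n+1) is given below; the function name is hypothetical, and the sketch applies no correction for ties, although the ranks in the table contain a few tied values.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from rank sums and group sizes (no tie correction)."""
    n = sum(sizes)
    # The rank sums must account for every rank from 1 to n.
    assert abs(sum(rank_sums) - n * (n + 1) / 2) < 1e-9, "rank sums do not total n(n+1)/2"
    weighted = sum(r * r / m for r, m in zip(rank_sums, sizes))
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Rank sums from the three right-hand columns of the table (7 salaries per city).
h_stat = kruskal_wallis_h(rank_sums=[80.0, 62.0, 89.0], sizes=[7, 7, 7])
print(round(h_stat, 4))
```

With columns this large, H is compared with a chi-squared value whose degrees of freedom are the number of columns minus one.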
10. The Kruskal-Wallis test above was done on the assumption that the underlying data did not
follow the Normal distribution. Let’s assume that you found out that the underlying distributions
were Normal and had a common variance. The method to use would be:
a) Friedman Test
b) Chi – squared test.
c) One way ANOVA
d) Two – way ANOVA
TABLE 13-8
The regression equation is
GPA = 0.5681 + .1021 ACT
Predictor    Coef     SE Coef    T         P
Constant     .5681    0.9284     0.6119    0.5630
ACT          .1021    0.0356     2.8633    0.0286

S = 0.2691    R-Sq = 0.5774%    R-Sq(adj) = 0.5069%

Analysis of Variance
Source            DF    SS        MS        F         P
Regression         1    0.5940    0.5940    8.1986    .0287
Residual Error     6    0.4347    0.0724
Total              7    1.0287
It is believed that GPA (grade point average, based on a four point scale) should have a positive linear
relationship with ACT scores. Given above is the Minitab output from regressing GPA on ACT scores
using a data set of 8 randomly chosen students from a Big Ten university.
11. Referring to Table 13-8, the interpretation of the coefficient of determination in this regression is
that
a) 57.74% of the total variation of ACT scores can be explained by GPA.
b) ACT scores account for 57.74% of the total fluctuation in GPA.
c) GPA accounts for 57.74% of the variability of ACT scores.
d) none of the above
12. Referring to Table 13-8, the value of the measured test statistic to test whether there is any linear
relationship between GPA and ACT is
a) 0.0356.
b) 0.1021.
c) 0.7598.
d) 2.8633.
13. Referring to Table 13-8, what is the predicted average value of GPA when ACT = 20?
a) 2.61
b) 2.66
c) 2.80
d) 3.12
14. Referring to Table 13-8, what are the decision and conclusion on testing whether there is any
linear relationship at the 1% level of significance between GPA and ACT scores?
a) Do not reject the null hypothesis; hence, there is not sufficient evidence to show that ACT
scores and GPA are linearly related.
b) Reject the null hypothesis; hence, there is not sufficient evidence to show that ACT scores
and GPA are linearly related.
c) Do not reject the null hypothesis; hence, there is sufficient evidence to show that ACT
scores and GPA are linearly related.
d) Reject the null hypothesis; hence, there is sufficient evidence to show that ACT scores
and GPA are linearly related.
ECO252 QBA2
Third EXAM
Nov 25 2003
TAKE HOME SECTION
Name: _________________________
Social Security Number: _________________________
Please Note: computer problems 2 and 3 should be turned in with the exam. In problem 2, the 2 way
ANOVA table should be completed. The three F tests should be done with a 5% significance level and you
should note whether there was (i) a significant difference between drivers, (ii) a significant difference
between cars and (iii) significant interaction. In problem 3, you should show on your third graph where the
regression line is.
II. Do the following: (23+ points). Assume a 5% significance level. Show your work!
1. Assume that each column below represents a random sample of sales of the popular cereal brand,
‘Whee!’ as it was moved from shelf 1 (lowest) to shelf 4 (highest) of a group of supermarkets. Assume that
the underlying distribution is Normal and test the hypothesis $\mu_1 = \mu_2 = \mu_3 = \mu_4$.
a) Before you start add the second to last digit of your social security number to the 451 in column 4 and
find the sample variance of sales from shelf 4. For example, Seymour Butz’s SS number is 123456789 and
he will change 451 to 459. This should not change the results by much. (2)
b) Test the hypothesis (6) Show your work – it is legitimate to check your results by running these problems
on the computer, but I expect to see hand computations for every part of them.
c) Compare means two by two, using any one appropriate statistical method, to find out which shelves are
significantly better than others. (3)
d) (Extra Credit) What if you found out that each row represented one store? If this changes your analysis,
redo the analysis. (5)
e) (Extra Credit) What if you found out that each row represented one store and that the underlying
distribution was not Normal? If this changes your analysis, redo the analysis. (5)
f) I did some subsequent analysis on this problem. The output, in part, said
Levene's Test (any continuous distribution)
Test Statistic: 0.609
P-Value       : 0.613
What was I testing for and what should my conclusion be? (2)
Sales of ‘Whee’ Cereal
              Shelf
Row      1      2      3      4
1      336    440    464    354
2      417    277    479    423
3      208    374    492    321
4      420    421    456    424
5      366    481    338    518
6      227    349    413    451
7      357    328    383    311
8      353    449    554    462
9      518    462    497    339
10     388    373    510    202

Sum of shelf 1 = 3590.0
Sum of squares of shelf 1 = 1362860
Sum of shelf 2 = 3954.0
Sum of squares of shelf 2 = 1602366
Sum of shelf 3 = 4586.0
Sum of squares of shelf 3 = 2140264
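For checking hand computations in part b), the one-way ANOVA arithmetic can be sketched as below. This is an illustration under stated assumptions, not a substitute for the required hand work: the function name is hypothetical, and the shelf 4 column is entered with the unmodified 451, so each student's own numbers will differ slightly.

```python
def one_way_anova(groups):
    """Between/within sums of squares, degrees of freedom and F ratio."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {"SSB": ss_between, "SSW": ss_within,
            "df_between": df_between, "df_within": df_within, "F": f_ratio}


# Columns of the 'Whee' table; shelf 4 uses the unmodified 451 (assumption).
shelves = [
    [336, 417, 208, 420, 366, 227, 357, 353, 518, 388],
    [440, 277, 374, 421, 481, 349, 328, 449, 462, 373],
    [464, 479, 492, 456, 338, 413, 383, 554, 497, 510],
    [354, 423, 321, 424, 518, 451, 311, 462, 339, 202],
]
result = one_way_anova(shelves)
print(result)
```

The F ratio is then compared with the 5% table value for the between and within degrees of freedom reported in the result.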
2. A company, operating in 12 regions, gives us its advertising expenses as a percent of those of its leading
competitor, and its sales as a percent of those of its leading competitor.
Row     Ad    Sales
1       77      85
2      110     103
3      110     102
4       93     109
5       90      85
6       95     103
7      100     110
8       85      86
9       96      92
10      83      87
11     100      98
12      95     108

Sum of Ad = 1134.0
Sum of squares of Ad = 108258
Sum of Sales = 1168.0
Sum of squares of Sales = 114750

Note that the sum and sum of squares of sales can’t be used directly, but they should help you
to get the corrected numbers.
Change the 103 in the ‘sales’ column by adding the second-to-last digit of your Social Security number to it.
For example, Seymour Butz’s SS number is 123456789 and he will change 103 to 111. This should not
change the results by much. The question is whether our relative advertising expenses affect our relative
sales, so ‘Sales’ should be your dependent variable and ‘Ad’ should be your independent variable.
Show your work – it is legitimate to check your results by running the problem on the computer, but I
expect to see hand computations that show clearly where you got your numbers for every part of this
problem.
a. Compute the regression equation $Y = b_0 + b_1 x$ to predict the ‘Sales’ on the basis of ‘Ad’. (2)
b. Compute $R^2$. (2)
c. Compute $s_e$. (2)
d. Compute $s_{b_0}$ and do a significance test on $b_0$. (2)
e. Do an ANOVA table for the regression. What conclusion can you draw from this table about the
relationship between advertising expenditures and sales? Why? (2)
f. It is proposed to raise our expenditures to 110% of our competitors’ in every region. Use this
to find a predicted value for sales and to create a confidence interval for sales. Explain the
difference between this and a prediction interval and when the prediction interval would be more
useful. (3)
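As a way of checking the hand computations in parts a through c, the simple-regression formulas based on sums and sums of squares can be sketched as follows. This is an illustration only: the function name is hypothetical, and the ‘Sales’ column is entered without the Social Security digit adjustment, so individual results will differ slightly.

```python
import math


def simple_regression(x, y):
    """Least-squares slope and intercept, R-squared and standard error."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Corrected sums of squares and cross products.
    s_xx = sum(v * v for v in x) - sum_x ** 2 / n
    s_yy = sum(v * v for v in y) - sum_y ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n
    ss_error = s_yy - b1 * s_xy
    return {"b0": b0, "b1": b1,
            "R2": b1 * s_xy / s_yy,
            "se": math.sqrt(ss_error / (n - 2))}


# Unmodified data from the table (assumption: no digit added to Sales).
ad = [77, 110, 110, 93, 90, 95, 100, 85, 96, 83, 100, 95]
sales = [85, 103, 102, 109, 85, 103, 110, 86, 92, 87, 98, 108]
fit = simple_regression(ad, sales)
print(fit)
```

The same corrected sums (s_xx, s_yy, s_xy) also feed the standard error of the intercept in part d and the ANOVA table in part e.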