4/23/02 252x0231 c ECO252 QBA2 Name

ECO252 QBA2
THIRD HOUR EXAM
April 18, 2002
Name
Hour of Class Registered (Circle)
MWF TR 10 12 12:30 2:00
I. (10+ points) Do all of the following:
1. Hand in your computer printouts for problems 2 and 3 (5 points; 3-point penalty for not handing them in).
Remember that the ANOVA printout must be completed, using a 5% significance level, for full credit. I
should be able to tell what is tested and what the conclusions are.
2. a. In particular, is the interaction between car and driver significant? Which numbers made you think
that? (2)
b. Create two confidence intervals for the difference between the means for cars 3 and 4, one that is valid
alone, and one that is valid simultaneously with other similar intervals. Do these intervals show a significant
difference between these two means? Why? (4)
c. In your income and education regression,
(i) Explain which coefficients are significant and why. (2)
(ii) What income would you predict for someone with 2 years of education? (1)
(iii) Make a confidence interval for the income of someone with 2 years of education using some
of the information generated by Minitab below. (2)
Descriptive Statistics

Variable     N     Mean   Median   TrMean    StDev   SEMean
Educ        32   12.000   12.000   12.071    4.363    0.771

Variable     Min      Max      Q1       Q3
Educ       4.000   20.000   8.000   16.000

Column Sum of Squares
Sum of squares (uncorrected) of Educ = 5198.0
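For part (iii), the interval needs the corrected sum of squares $SS_x = \sum x^2 - n\bar{x}^2$, which can be recovered from the Minitab summary above. A minimal sketch, assuming the usual simple-regression formulas; the regression's own $s_e$ and the $t$ value are not shown in this section, so the code only computes the multipliers of $s_e$ (the function name `se_multipliers` is hypothetical):

```python
import math

# Summary values copied from the Minitab output above
n = 32
mean_educ = 12.000
sum_sq_uncorrected = 5198.0  # sum of x^2 for Educ

# Corrected sum of squares: SSx = sum(x^2) - n * xbar^2
ssx = sum_sq_uncorrected - n * mean_educ ** 2


def se_multipliers(x0):
    """Multipliers of s_e for the standard error of the mean response and
    of an individual prediction at x = x0 (simple linear regression)."""
    leverage = 1 / n + (x0 - mean_educ) ** 2 / ssx
    return math.sqrt(leverage), math.sqrt(1 + leverage)


mean_factor, individual_factor = se_multipliers(2.0)
print(f"SSx = {ssx:.1f}")
print(f"mean response at 2 years: s_e * {mean_factor:.4f}")
print(f"one person at 2 years:    s_e * {individual_factor:.4f}")
```

Multiplying the second factor by $s_e$ from your own printout and by $t$ with $n - 2 = 30$ degrees of freedom gives the half-width for one person's income; the first factor applies to the mean income of everyone with 2 years of education. As a check, $(n-1)s^2 = 31(4.363)^2 \approx 590$ agrees with $SS_x$.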
4/12/02 252x0231
II. Do at least 4 of the following 5 problems (at least 10 points each), or do sections adding to at least 40 points. Anything extra you do helps, and grades wrap around. Show your work! State $H_0$ and $H_1$ where
applicable. Never say 'yes' or 'no' without a statistical test.
1. On the following pages there are printouts from two computer problems.
a. The One-way ANOVA Problem (Albright, Winston, Zappe, abbreviated): An automobile parts producer
has instituted an employee empowerment program in five plants. Random samples of employees in
each plant are asked to rate the success of the program on a 1-to-10 scale, with 10 being the highest
rating. The company wants to know whether the program is being implemented with equal success at each plant,
and is thus looking for a significant difference between the mean ratings of the plants.
It assumes that the ratings follow Normal distributions with similar
variances.
(i) Indicate what hypothesis was tested and what the p-value was, and whether, using the p-value, you
would reject the null hypothesis if the significance level was 5% and if it was 1%.
Explain why. Does this mean that the success was equal in all plants? (3)
(ii) Do a 'normal' and a Scheffe confidence interval ($\alpha = .05$) for the difference between the means
of the two plants that were most successful. Do these intervals indicate a difference in the success
of the program between these two plants? Why? (4.5)
(iii) The printout gives 95% confidence intervals for the means for each plant. Find the numbers
for the confidence interval for 'South.' Why is this interval larger than the others? (2.5)
(iv) I would question whether ANOVA was appropriate for this problem because there is no
evidence that the underlying populations are Normally distributed. What method would I prefer for
this problem? (1)
b. The Regression Problem: This relates the number of shares in thousands to the age of board members of
a corporation.
(i) Looking at significance tests and the value of R-squared, how successful is this regression?
Why? Why shouldn't this surprise you? (3)
(ii) Note that c1 contains 'shares' and that c4 contains predicted values of 'shares.' Add a regression
line to the graph. (1)
(iii) What equation relates the number of shares owned to the age of the board member? How many
shares does it say that we should expect an 82-year-old board member to own? Would you take this
seriously? Why? (2)
One-way ANOVA problem
Worksheet size: 100000 cells
MTB > RETR 'C:\MINITAB\2X0231-1.MTW'.
Retrieving worksheet from file: C:\MINITAB\2X0231-1.MTW
Worksheet was saved on 4/ 9/2002
MTB > print c1-c5

Data Display

Row    south  midwest   n-east   s-west     west
  1        7        7        7        6        6
  2        1        6        5        4        6
  3        8       10        5        7        6
  4        7        3        4       10        6
  5        2        9        4        7        3
  6        9       10        2        6        4
  7        3        8        4        7        8
  8        8        4        5        7        6
  9        5        3        5        4        2
 10        7        2        3        4        4
 11        4        7        3        7        5
 12                 7        3        8        6
 13                 5        5        9        4
 14                10        5       10        7
 15                10                 4        4
 16                 6                10        3
 17                 3                 5        5
 18                 5                 6        4
 19                 2                          7
 20                 6                          6
 21                 4                          4
 22                 5
 23                 2
 24                 7
 25                 8
 26                 7
MTB > AOVOneway c1 c2 c3 c4 c5.

One-Way Analysis of Variance

Analysis of Variance
Source     DF        SS        MS        F        p
Factor      4     57.45     14.36     3.16    0.018
Error      85    386.15      4.54
Total      89    443.60

Level       N      Mean     StDev
south      11     5.545     2.697
midwest    26     6.000     2.623
n-east     14     4.286     1.267
s-west     18     6.722     2.081
west       21     5.048     1.532

Pooled StDev =    2.131

Individual 95% CIs For Mean Based on Pooled StDev (plot omitted)
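The arithmetic behind parts (i) to (iii) of problem 1a can be checked against the printout above. A minimal sketch, assuming the usual pooled-variance formulas; the $t$ and $F$ critical values are approximate table values for 85 error degrees of freedom, hard-coded rather than computed, and `diff_interval` is a hypothetical helper:

```python
import math

# Values copied from the one-way ANOVA printout above
ss_factor, df_factor = 57.45, 4
ss_error, df_error = 386.15, 85
levels = {  # plant: (sample size, mean rating)
    "south": (11, 5.545),
    "midwest": (26, 6.000),
    "n-east": (14, 4.286),
    "s-west": (18, 6.722),
    "west": (21, 5.048),
}

ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error  # its square root is the pooled StDev
f_stat = ms_factor / ms_error

# Approximate table values for 85 error df (assumed, not computed)
T_85_025 = 1.988   # t, upper .025 tail
F_4_85_05 = 2.48   # F with 4 and 85 df, upper .05 tail


def diff_interval(a, b, multiplier):
    """Interval for mean(a) - mean(b) using the pooled error mean square."""
    (na, ma), (nb, mb) = levels[a], levels[b]
    se = math.sqrt(ms_error * (1 / na + 1 / nb))
    d = ma - mb
    return d - multiplier * se, d + multiplier * se


# s-west and midwest have the two highest mean ratings
t_int = diff_interval("s-west", "midwest", T_85_025)
scheffe_mult = math.sqrt(df_factor * F_4_85_05)  # sqrt((k - 1) * F)
scheffe_int = diff_interval("s-west", "midwest", scheffe_mult)

# Individual 95% interval for the 'south' mean with the pooled StDev
n_s, m_s = levels["south"]
half = T_85_025 * math.sqrt(ms_error / n_s)
south_ci = (m_s - half, m_s + half)

print(f"F = {f_stat:.2f}")
print("t interval:      ", [round(v, 3) for v in t_int])
print("Scheffe interval:", [round(v, 3) for v in scheffe_int])
print("south 95% CI:    ", [round(v, 3) for v in south_ci])
```

Both intervals for s-west minus midwest contain zero, and the south interval is the widest because south has the smallest sample (11 employees).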
Regression Problem
Worksheet size: 100000 cells
MTB > RETR 'C:\MINITAB\2X0231-5.MTW'.
Retrieving worksheet from file: C:\MINITAB\2X0231-5.MTW
Worksheet was saved on 4/10/2002
MTB > #252sols3
MTB > print c1 c2

Data Display

Row   shares   age
  1      7.9    53
  2     66.4    60
  3     29.7    69
  4     60.5    49
  5     10.4    67
  6     28.7    68
  7     86.9    46
  8    121.1    62
  9     35.3    63
 10      2.8    55
 11     74.4    57
 12     11.1    71
 13      9.1    66
 14     19.1    70
 15     18.8    66
 16      3.1    57
 17     96.5    54
 18     47.0    64
 19     31.1    56
MTB > plot c1*c2 (plot omitted)
MTB > regress c1 on 1 c2 c3 c4

Regression Analysis

The regression equation is
shares = 154 - 1.88 age

Predictor       Coef      Stdev    t-ratio        p
Constant      154.14      64.88       2.38    0.030
age           -1.881      1.062      -1.77    0.094

s = 33.04       R-sq = 15.6%       R-sq(adj) = 10.6%

Analysis of Variance

SOURCE       DF        SS        MS        F        p
Regression    1      3425      3425     3.14    0.094
Error        17     18557      1092
Total        18     21982

Unusual Observations
Obs.    age    shares      Fit   Stdev.Fit   Residual   St.Resid
  8    62.0    121.10    37.52        7.71      83.58      2.60R

R denotes an obs. with a large st. resid.

MTB > plot c4*c2 (plot omitted)
MTB > plot c4*c2 c1*c2;
SUBC> symbol;
SUBC> type 3 1;
SUBC> color 8 9;
SUBC> overlay.
MTB > end
(plot omitted: C4 and shares overlaid against age)
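A sketch of the arithmetic for problem 1b, using only the coefficients and sums of squares printed above (the helper `predicted_shares` is hypothetical; nothing here is Minitab output):

```python
# Coefficients, standard errors and sums of squares copied from the printout above
b0, s_b0 = 154.14, 64.88
b1, s_b1 = -1.881, 1.062
ss_regression, ss_total = 3425, 21982


def predicted_shares(age):
    """Fitted value (thousands of shares) from shares = b0 + b1 * age."""
    return b0 + b1 * age


t_b0 = b0 / s_b0  # printout: 2.38
t_b1 = b1 / s_b1  # printout: -1.77
r_sq = ss_regression / ss_total

print(f"t(b0) = {t_b0:.2f}, t(b1) = {t_b1:.2f}, R-sq = {r_sq:.1%}")
print(f"fit at age 62: {predicted_shares(62):.2f}")
print(f"fit at age 82: {predicted_shares(82):.2f}")
```

The fit at age 62 reproduces the 37.52 in the Unusual Observations line. The prediction for an 82-year-old comes out slightly negative, and 82 lies outside the observed ages (46 to 71), so it is an extrapolation.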
2. A researcher believes that the data below have a Normal distribution with a mean of 80 and a standard
deviation of 5. For your convenience the values of $z = \frac{x - \mu}{\sigma} = \frac{x - 80}{5}$ are computed for you.
a. Use a chi-squared test to find out if the distribution is correct. (9)
b. Is there a better way to do this problem than chi-squared? Why? Do it. (5)
c. Assume that, instead of using the population parameters given above, we actually checked the data and
found that $\bar{x} = 80$ and $s = 5$. How would this change what we did in a)? (1)
d. Assume that, instead of using the population parameters given above, we actually checked the data and
found that $\bar{x} = 80$ and $s = 5$. How would this change what we did in b)? (1)
x interval    z interval      Observed Frequency
below 70      below -2.0                3
70-74         -2.0 to -1.2             20
74-78         -1.2 to -0.4             53
78-82         -0.4 to 0.4              52
82-86          0.4 to 1.2              46
above 86      above 1.2                26
Total                                 200
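A minimal sketch of the chi-squared computation for part a, plus a grouped-data Kolmogorov-Smirnov comparison of cumulative proportions for part b, using Python's `statistics.NormalDist` for the standard Normal CDF. Applying Kolmogorov-Smirnov to grouped cells this way is an assumption about the intended method:

```python
from statistics import NormalDist

# z cut points and observed frequencies from the table above
z_cuts = [-2.0, -1.2, -0.4, 0.4, 1.2]
observed = [3, 20, 53, 52, 46, 26]
n = sum(observed)  # 200

std_normal = NormalDist()  # mean 0, standard deviation 1
cum_expected = [std_normal.cdf(z) for z in z_cuts] + [1.0]

# Cell probabilities are differences of successive cumulative probabilities
probs = [cum_expected[0]] + [
    cum_expected[i] - cum_expected[i - 1] for i in range(1, len(cum_expected))
]
expected = [n * p for p in probs]

# (a) Chi-squared: no parameters estimated, so df = cells - 1
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# (b) Grouped Kolmogorov-Smirnov: largest gap between cumulative proportions
running = 0
d_max = 0.0
for o, f_e in zip(observed, cum_expected):
    running += o
    d_max = max(d_max, abs(running / n - f_e))

print("expected:", [round(e, 2) for e in expected])
print(f"chi-squared = {chi_sq:.3f} with {df} df")
print(f"D = {d_max:.4f}")
```

The chi-squared statistic comes to about 3.80 with 5 degrees of freedom, well below the 5% critical value of 11.07; note that the expected frequency in the first cell (about 4.55) is below the common rule of thumb of 5. For the Kolmogorov-Smirnov version, a common large-sample 5% critical value is $1.36/\sqrt{n} \approx 0.096$.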
3. (Weirs) A maker of stain removers is testing the effectiveness of four different formulations of a new
product. Columns represent formulations 1-4 of the product and the 6 rows represent different stains
(Creosote, crayon, motor oil, grape juice, ink, coffee). Each formulation is rated on a 1-10 scale for its
effectiveness.
Stain            Form 1  Form 2  Form 3  Form 4     Sum   Count   Sum of Squares
1                     2       7       3       6      18       4               98
2                     9      10       7       5      31       4              255
3                     4       6       1       4      15       4               69
4                     9       7       4       5      25       4              171
5                     6       8       4       4      22       4              132
6                     9       4       2       6      21       4              137
Sum                  39      42      21      30     132      24              862
Count                 6       6       6       6      24
Sum of Squares      299     314      95     154
a. Assume that the parent distribution is Normal and compare the mean ratings for the four formulations,
noting that the data are cross-classified. Use $\alpha = .10$. (14) Note: If you wish to ignore the fact that the
data are classified by stain type, indicate this now and compare the column means assuming that the data are
four independent random samples from Normal distributions. (10) ($\alpha = .10$)
b. Using the same significance level, assume that Formulation 1 is the current formula and use Scheffe
intervals to see which formulations have mean ratings that differ significantly from the current formulation.
(4)
c. Using a significance level of 15%, repeat the analysis in b) using Bonferroni intervals. (4)
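A minimal sketch of the two-way (cross-classified) ANOVA for part a and the interval half-widths for parts b and c, assuming one rating per cell and no interaction term. The $F$ and $t$ critical values are hard-coded table values, and the Bonferroni multiplier assumes three comparisons against Formulation 1:

```python
import math

# Ratings from the table above: rows are stains 1-6, columns are Forms 1-4
ratings = [
    [2, 7, 3, 6],
    [9, 10, 7, 5],
    [4, 6, 1, 4],
    [9, 7, 4, 5],
    [6, 8, 4, 4],
    [9, 4, 2, 6],
]
r, c = len(ratings), len(ratings[0])
n = r * c
grand_total = sum(sum(row) for row in ratings)
sum_sq = sum(x * x for row in ratings for x in row)
col_sums = [sum(row[j] for row in ratings) for j in range(c)]
row_sums = [sum(row) for row in ratings]

# Two-way ANOVA without interaction (one rating per cell)
correction = grand_total ** 2 / n
sst = sum_sq - correction
ss_cols = sum(s * s for s in col_sums) / r - correction
ss_rows = sum(s * s for s in row_sums) / c - correction
sse = sst - ss_cols - ss_rows
df_cols, df_rows = c - 1, r - 1
df_err = df_cols * df_rows
mse = sse / df_err
f_cols = (ss_cols / df_cols) / mse
f_rows = (ss_rows / df_rows) / mse

# Table values (assumed): F(3,15) and F(5,15) upper .10; t(15) upper .025
F_3_15_10, F_5_15_10, T_15_025 = 2.49, 2.27, 2.131
print(f"F columns = {f_cols:.3f} vs {F_3_15_10}; F rows = {f_rows:.3f} vs {F_5_15_10}")

# Parts b and c: Form 1 against each other formulation
col_means = [s / r for s in col_sums]
se_diff = math.sqrt(mse * 2 / r)
scheffe_half = math.sqrt(df_cols * F_3_15_10) * se_diff
bonferroni_half = T_15_025 * se_diff  # 3 comparisons at 15%: .15 / (2 * 3) = .025

for j in range(1, c):
    d = col_means[0] - col_means[j]
    print(f"Form 1 - Form {j + 1}: {d:+.2f}  "
          f"Scheffe +/- {scheffe_half:.3f}  Bonferroni +/- {bonferroni_half:.3f}")
```

With these numbers SST = 136, SS(columns) = 45, SS(rows) = 39 and SSE = 52, so the column F of about 4.33 exceeds 2.49. Under the assumed table values, Formulation 3 is the only one whose difference from Formulation 1 exceeds either half-width.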
3(ctd.). d. Actually, when Weirs presented the data in the previous problem, repeated below, he assumed
that the underlying distribution was not Normal. So compare the median ratings using a 10% significance
level. (6)
Stain            Form 1  Form 2  Form 3  Form 4     Sum   Count   Sum of Squares
1                     2       7       3       6      18       4               98
2                     9      10       7       5      31       4              255
3                     4       6       1       4      15       4               69
4                     9       7       4       5      25       4              171
5                     6       8       4       4      22       4              132
6                     9       4       2       6      21       4              137
Sum                  39      42      21      30     132      24              862
Count                 6       6       6       6      24
Sum of Squares      299     314      95     154
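A minimal sketch of the Friedman test for part d, which ranks the four formulations within each stain (block) and gives tied ratings their average rank. No correction for ties is applied, which is an assumption, and `ranks_within_block` is a hypothetical helper:

```python
# Ratings again: rows are stains (blocks), columns are Forms 1-4
ratings = [
    [2, 7, 3, 6],
    [9, 10, 7, 5],
    [4, 6, 1, 4],
    [9, 7, 4, 5],
    [6, 8, 4, 4],
    [9, 4, 2, 6],
]
r, c = len(ratings), len(ratings[0])


def ranks_within_block(row):
    """Rank values within one block; tied values share their average rank."""
    return [sum(v < x for v in row) + (sum(v == x for v in row) + 1) / 2 for x in row]


rank_sums = [0.0] * c
for row in ratings:
    for j, rank in enumerate(ranks_within_block(row)):
        rank_sums[j] += rank

# Friedman statistic (no tie correction), compared with chi-squared on c - 1 df:
# 12 / (r c (c + 1)) * sum of squared rank sums - 3 r (c + 1)
friedman = 12 / (r * c * (c + 1)) * sum(s * s for s in rank_sums) - 3 * r * (c + 1)

print("rank sums:", rank_sums)
print(f"Friedman statistic = {friedman:.2f} with {c - 1} df")
```

The rank sums are 17.5, 21, 8.5 and 13, and the statistic is 8.85 with 3 degrees of freedom; the 10% critical value of chi-squared with 3 degrees of freedom is 6.251.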
4. Use methods appropriate to testing goodness of fit.
a. Test the hypothesis that the numbers below came from a Normal distribution. Use a 10%
significance level. (6) Note that Minitab says the following:
mean      294.444
stdev     52.6548
n         9.00000
b. Test the hypothesis that the numbers below came from a Normal distribution with a mean of
230 and a standard deviation of 50. (6)
235 219 269 277 289 298 330 354 379
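A minimal sketch of the two statistics: part a uses Lilliefors (mean and standard deviation estimated from the sample) and part b uses Kolmogorov-Smirnov (parameters specified). Both compute the same maximum gap between the empirical and Normal CDFs; only the table of critical values differs. The helper name `ks_statistic` is hypothetical:

```python
from statistics import NormalDist, mean, stdev

data = sorted([235, 219, 269, 277, 289, 298, 330, 354, 379])


def ks_statistic(sample, dist):
    """Two-sided KS distance between the empirical CDF of a sorted sample and dist."""
    n = len(sample)
    d = 0.0
    for i, x in enumerate(sample, start=1):
        f = dist.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# (a) Lilliefors: mean and standard deviation estimated from the data
x_bar, s = mean(data), stdev(data)
d_a = ks_statistic(data, NormalDist(x_bar, s))

# (b) Kolmogorov-Smirnov: Normal with mean 230 and standard deviation 50 specified
d_b = ks_statistic(data, NormalDist(230, 50))

print(f"mean = {x_bar:.3f}, stdev = {s:.4f}")
print(f"(a) D = {d_a:.4f}   (b) D = {d_b:.4f}")
```

This gives D of about 0.14 for part a and about 0.56 for part b; compare each with the appropriate tabled critical value for n = 9. The sketch checks both $i/n$ and $(i-1)/n$ at each point, so a hand calculation that checks only $i/n$ will give a smaller D for part b.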
5. (Weirs) The following data give years of membership and numbers of shares (in thousands) owned by 8
board members of our corporation. Number of shares is the dependent variable and years of membership is the independent
variable.
Data Display

Row     shares   years   years squared   shares squared
1          300       6              36            90000
2          408      12             144           166464
3          560      14             196           313600
4          252       6              36            63504
5          288       9              81            82944
6          650      13             169           422500
7          630      15             225           396900
8          522       9              81           272484
Total     3610      84             968          1808396
Note that $n = 8$ and that you will have to compute $\sum xy$.
a. Compute the regression equation $\hat{Y} = b_0 + b_1 x$ to predict thousands of shares owned on the basis
of years on the board. (6)
b. On the basis of your regression, how many thousands of shares do you expect to be owned by
someone who has been on the board for 20 years? (1)
c. Compute $R^2$. (4)
d. Compute $s_e$. (3)
e. Compute $s_{b_0}$ and do a significance test on $b_0$. (4)
f. Construct an interval for the average number of shares that would be owned by someone who
has been on the board for 20 years. (3)
g. Using your SST etc., put together the ANOVA table (6)
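A minimal sketch of the computations for parts a to g, starting from the column totals in the table above and the one missing total, $\sum xy$. The $t$ value for 6 degrees of freedom is a hard-coded table value assuming a 95% interval:

```python
import math

# Data and column totals from the table above (x = years, y = thousands of shares)
years = [6, 12, 14, 6, 9, 13, 15, 9]
shares = [300, 408, 560, 252, 288, 650, 630, 522]
n = len(years)
sum_x, sum_y = 84, 3610
sum_x2, sum_y2 = 968, 1808396
sum_xy = sum(x * y for x, y in zip(years, shares))  # the total not given in the table

x_bar, y_bar = sum_x / n, sum_y / n
sxx = sum_x2 - n * x_bar ** 2
sxy = sum_xy - n * x_bar * y_bar
sst = sum_y2 - n * y_bar ** 2

# (a) least-squares slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
# (c), (d), (g) sums of squares, R^2 and standard error of the estimate
ssr = b1 * sxy
sse = sst - ssr
r_sq = ssr / sst
s_e = math.sqrt(sse / (n - 2))
# (e) standard error of b0 and its t ratio
s_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)
t_b0 = b0 / s_b0
# (b), (f) fit and interval for the mean response at 20 years
x0 = 20
y_hat = b0 + b1 * x0
s_y_hat = s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
T_6_025 = 2.447  # t with n - 2 = 6 df, upper .025 tail (table value, assumed)
interval = (y_hat - T_6_025 * s_y_hat, y_hat + T_6_025 * s_y_hat)

print(f"sum xy = {sum_xy}, Y-hat = {b0:.2f} + {b1:.4f} x")
print(f"R^2 = {r_sq:.4f}, s_e = {s_e:.2f}, s_b0 = {s_b0:.2f}, t(b0) = {t_b0:.3f}")
print(f"fit at 20 years = {y_hat:.1f}, interval = ({interval[0]:.1f}, {interval[1]:.1f})")
print(f"ANOVA: SSR = {ssr:.1f} (1 df), SSE = {sse:.1f} ({n - 2} df), SST = {sst:.1f}")
```

With these totals $\sum xy = 41238$, the slope is about 38.76 and the intercept about 44.31; the small $t$ ratio for $b_0$ (about 0.41) is consistent with an intercept that is not significantly different from zero. Note that 20 years lies outside the observed range of 6 to 15 years.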
(Intentionally left blank for calculations)