252x0641 12/15/05 Name and Class hour:_________________________

ECO252 QBA2
Final EXAM
December , 2006
Version 1
I. (18+ points) Do all the following. Note that answers without reasons and citation of appropriate
statistical tests receive no credit. Most answers require a statistical test, that is, stating or implying a
hypothesis and showing why it is true or false by citing a table value or a p-value. If you haven’t done it
lately, take a fast look at ECO 252 - Things That You Should Never Do on a Statistics Exam (or Anywhere
Else)
Regression A seeks to explain the selling price of a home in terms of a group of variables explained on the
output sheet. Note that regressions 1 and 7 are identical. Look at the definitions of the variables carefully
and, in particular, notice which are interaction variables.
a) The homes in this regression are in three different areas. There are dummy variables to indicate that the
homes are in Area 1 or Area 2. Why isn’t there a dummy variable for Area 3? (1)
b) In Regression 1, what coefficients are significant at the 5% level? (2)
c) What independent variables did I remove from the problem to get to Regression 2 from Regression 1?
Why? (2)
d) Following the same process, I went on to remove one or more variables each time until I got to
Regression 5. When I got to Regression 5 I ran the ‘best subsets regression’ (Regression 6) and concluded
that it was time to quit removing variables. Between the best subsets regression and the characteristics of
the coefficients in Regression 5, I felt that I had gone as far as was reasonable in removing independent
variables. What are the three things that led me to think that Regression 5 was the best that I could do? (3)
e) Using Regression 5 and assuming that all homes have two baths, Regression 5 effectively becomes 3
regressions relating price to living area. Take the coefficient of bath, multiply it by two and add it to the
constant to get the effective intercept for homes with two baths. Using L or any other symbol that you find
convenient for living area, what are the equations relating living area to price in (3 points)
Area 1?
Area 2?
Area 3?
[11]
f) Continuing with Regression 5 and assuming that a home has 2(thousand) square feet of living area and 2
baths, what would it sell for in
Area 1?
Area 2?
Area 3?
What is the percent difference between the lowest and highest price? (2)
g) We have not yet dealt with the question of whether the coefficients in Regression 5 are reasonable. In
order to do this, look at two homes in Area 1 that have two baths. If one has 2 (thousand) square feet of
living area and the other 3, how would their prices differ? Does that seem reasonable? Try the same for a
home in Area 3. (3)
[16]
h) As I warned you, I now repeated Regression 1 as Regression 7, without using the VIFs. Much to my
surprise, I ended up dropping the same variables as I did after Regression 1. Why? (1)
i) Continuing in the same way, I worked myself to Regression 9. Looking at the things I usually check, this
looked pretty good. Then I tried to check the coefficients in the same way that I did in g). Why was I very
unhappy? What is there in Regression 8 that could explain these results? (4)
j) Regression 11 is a stepwise regression. The printout, which continues on page 7, presents four different
possible regressions in column form. Note that in each case a coefficient has a t-value under it and a p-value
for a significance test. After the fourth try, the computer refused to add any more independent variables.
The only regression here that I thought was worth looking at was the one with four independent variables.
What can you tell me about its acceptability? (3)
[24]
k) Do an F test to compare regressions 2 and 3 and to find out if lot 1 and lot 2 have any explanatory power.
(3)
II. Hand in your third computer problem. (2 to 7 points)
III. Do at least 4 of the following 7 problems (at least 12 each), or do sections adding to at least 50 points.
(Anything extra you do helps, and grades wrap around.) You must do parts a) and b) of problem 1. Show
your work! State H0 and H1 where applicable. Use a significance level of 5% unless noted otherwise.
Do not answer questions without citing appropriate statistical tests. That is, explain your hypotheses
and what values from what table were used to test them. Clearly label what section of each problem
you are doing! The entire test has about 151 points, but 70 is considered a perfect score. Don’t waste
our time by telling me that two means, proportions, variances or medians don’t look the same to you.
You need statistical tests! There are two blank pages below.
1. a) If I want to test to see if the mean of $x_2$ is larger than the mean of $x_1$ my null hypotheses are:
(Note: $D = \mu_1 - \mu_2$)
i) 1   2 and D  0
ii) 1   2 and D  0
v) 1   2 and D  0
vi) 1   2 and D  0
iii) 1   2 and D  0
vii) 1   2 and D  0
iv) 1   2 and D  0
viii) 1   2 and D  0 (2)
The first two columns below represent times for 25 workers on an industrial task. The third column is the
difference between them.

Row     x1     x2      d
  1   6.11   4.81   1.30
  2   5.13   4.19   0.94
  3   6.42   5.17   1.25
  4   4.65   4.07   0.58
  5   5.82   4.58   1.24
  6   4.08   2.97   1.11
  7   4.01   3.39   0.62
  8   5.26   4.14   1.12
  9   5.25   4.31   0.94
 10   7.66   6.68   0.98
 11   6.29   5.37   0.92
 12   5.41   3.95   1.46
 13   6.17   4.93   1.24
 14   5.50   4.04   1.46
 15   4.06   2.40   1.66
 16   6.19   4.71   1.48
 17   6.71   5.93   0.78
 18   4.41   2.93   1.48
 19   5.25   4.25   1.00
 20   4.85   4.41   0.44
 21   6.50   4.68   1.82
 22   5.24   3.50   1.74
 23   7.29   6.09   1.20
 24   4.99   2.87   2.12
 25   4.26   3.06   1.20
Assume that $\alpha = .05$. Minitab gives us the following summary (edited).

Descriptive Statistics: x1, x2, d

Variable   N  N*  Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
x1        25   0  5.50    0.200   1.00    4.010   4.750   5.260   6.240    7.660
x2        25   0  4.30    0.212   1.06    2.400   3.445   4.250   4.870    6.680
d         25   0  1.20        …      …   0.4400  0.9400   1.200  1.4700    2.120
In the d column, the column sum is 30.08 and the sum of the first 24 numbers squared is 38.585. Do not
recompute things that have been done for you if you want to ever get much done on this exam.
Clearly label parts b, c, d etc. The null hypothesis is the same for parts c, d and e, so state it clearly.
b). Find the sample variance for the d column. (2)
c) On the assumption that the underlying distributions are Normal and that the first two columns represent
independent samples from populations that represent plants 1 and 2 and come from populations with similar
variances, can we conclude that average workers in plant 2 complete the task faster than those in plant 1?
(4)
d) (Extra credit) Repeat part c) after dropping the assumption that the variances are similar. (5)
e) Actually, these data supposedly represent performance of a single sample of 25 workers on two
administrations of a standard test of manual dexterity. The question was ‘Did the time for the test improve
between the first and second administration?’ (3)
[11]
f) Assume that the means above come from independent samples, but that the data represent samples for
populations with known population variances of 1.00 and 1.06. Test the null hypothesis that you used in
part c) and find an exact p-value. (3)
[14]
g) Using the value of $s_d$ that you used in e), make a confidence interval with a confidence level of 92%.
You must find the value of $z_{\alpha/2}$ needed to do this first. Of course, it is not on the t-table. (2) [16]
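The arithmetic behind parts b) and g) can be sketched in a few lines of Python. This is only an illustration, not part of the exam: the variable names are made up here, and the z-value comes from the standard library's Normal distribution rather than from a printed table.

```python
from statistics import NormalDist, mean, variance

# Differences d = x1 - x2 for the 25 workers, copied from the d column.
d = [1.30, 0.94, 1.25, 0.58, 1.24, 1.11, 0.62, 1.12, 0.94, 0.98,
     0.92, 1.46, 1.24, 1.46, 1.66, 1.48, 0.78, 1.48, 1.00, 0.44,
     1.82, 1.74, 1.20, 2.12, 1.20]

n = len(d)
d_bar = mean(d)

# Computational (shortcut) formula: s^2 = (sum of d^2 - n * dbar^2) / (n - 1).
sum_sq = sum(v * v for v in d)
s2_shortcut = (sum_sq - n * d_bar ** 2) / (n - 1)

# The library's sample variance serves as a cross-check on the shortcut formula.
s2_library = variance(d)

# A two-sided 92% interval has alpha = .08, so z_{alpha/2} leaves .04 in the upper tail.
confidence = 0.92
z_half_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

print(f"sum d = {sum(d):.2f}, sum d^2 = {sum_sq:.3f}")
print(f"s_d^2 = {s2_shortcut:.4f} (library: {s2_library:.4f})")
print(f"z for 92% confidence = {z_half_alpha:.4f}")
```

On an exam the same z-value would be found by looking up the probability .46 in the body of the Normal table.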
2. Let us expand the problem of question 1 by adding another column. The full data set, with lots done for
you, looks like this. The first three columns represent the given data. In the next three columns I have taken
the first three columns and squared them. I have added the first three columns to get the seventh column. I
have computed row means in the 8th column and squared them in the 9th. The tenth column is a row sum of
squares. In the 11th to the 13th columns the numbers in the first three columns are ranked from 1 to 75.
Sums are provided for all 13 columns.
       (1)    (2)    (3)      (4)      (5)      (6)     (7)    (8)      (9)     (10)   (11)   (12)   (13)
Row     x1     x2     x3     x1sq     x2sq     x3sq    rsum  rmean     rmsq     rssq  rank1  rank2  rank3
  1   6.11   4.81   5.46  37.3321  23.1361  29.8116   16.38   5.46  29.8116   90.280   63.0   40.0   54.0
  2   5.13   4.19   4.66  26.3169  17.5561  21.7156   13.98   4.66  21.7156   65.589   44.0   21.0   32.0
  3   6.42   5.17   5.80  41.2164  26.7289  33.6400   17.39   5.80  33.6013  101.585   68.0   45.0   58.0
  4   4.65   4.07   4.36  21.6225  16.5649  19.0096   13.08   4.36  19.0096   57.197   31.0   18.0   25.0
  5   5.82   4.58   5.20  33.8724  20.9764  27.0400   15.60   5.20  27.0400   81.889   59.0   29.0   46.0
  6   4.08   2.97   3.53  16.6464   8.8209  12.4609   10.58   3.53  12.4374   37.928   19.0    4.0    9.0
  7   4.01   3.39   3.70  16.0801  11.4921  13.6900   11.10   3.70  13.6900   41.262   15.0    7.0   12.0
  8   5.26   4.14   4.70  27.6676  17.1396  22.0900   14.10   4.70  22.0900   66.897   50.0   20.0   35.0
  9   5.25   4.31   4.78  27.5625  18.5761  22.8484   14.34   4.78  22.8484   68.987   48.5   24.0   39.0
 10   7.66   6.68   7.17  58.6756  44.6224  51.4089   21.51   7.17  51.4089  154.707   75.0   70.0   73.0
 11   6.29   5.37   5.83  39.5641  28.8369  33.9889   17.49   5.83  33.9889  102.390   66.0   51.0   60.0
 12   5.41   3.95   4.68  29.2681  15.6025  21.9024   14.04   4.68  21.9024   66.773   52.0   14.0   33.5
 13   6.17   4.93   5.55  38.0689  24.3049  30.8025   16.65   5.55  30.8025   93.176   64.0   42.0   56.0
 14   5.50   4.04   4.77  30.2500  16.3216  22.7529   14.31   4.77  22.7529   69.325   55.0   16.0   38.0
 15   4.06   2.40   3.23  16.4836   5.7600  10.4329    9.69   3.23  10.4329   32.676   17.0    1.0    6.0
 16   6.19   4.71   5.45  38.3161  22.1841  29.7025   16.35   5.45  29.7025   90.203   65.0   36.0   53.0
 17   6.71   5.93   6.32  45.0241  35.1649  39.9424   18.96   6.32  39.9424  120.131   72.0   61.0   67.0
 18   4.41   2.93   3.67  19.4481   8.5849  13.4689   11.01   3.67  13.4689   41.502   27.5    3.0   11.0
 19   5.25   4.25   4.75  27.5625  18.0625  22.5625   14.25   4.75  22.5625   68.188   48.5   22.0   37.0
 20   4.85   4.41   4.63  23.5225  19.4481  21.4369   13.89   4.63  21.4369   64.408   41.0   27.5   30.0
 21   6.50   4.68   5.59  42.2500  21.9024  31.2481   16.77   5.59  31.2481   95.401   69.0   33.5   57.0
 22   5.24   3.50   4.37  27.4576  12.2500  19.0969   13.11   4.37  19.0969   58.805   47.0    8.0   26.0
 23   7.29   6.09   6.69  53.1441  37.0881  44.7561   20.07   6.69  44.7561  134.988   74.0   62.0   71.0
 24   4.99   2.87   3.93  24.9001   8.2369  15.4449   11.79   3.93  15.4449   48.582   43.0    2.0   13.0
 25   4.26   3.06   3.66  18.1476   9.3636  13.3956   10.98   3.66  13.3956   40.907   23.0    5.0   10.0
The sums of the columns will not fit on the table so they are printed here.
(1) Sum of x1 = 137.51; (2) Sum of x2 = 107.43; (3) Sum of x3 = 122.48;
(4) Sum of x1sq = 780.400; (5) Sum of x2sq = 488.725; (6) Sum of x3sq = 624.649;
(7) Sum of rsum = 367.42; (8) Sum of rmean = 122.473; (9) Sum of rmsq = 624.587; (10) Sum of rssq = 1893.77;
(11) Sum of rank 1 = 1236.5; (12) Sum of rank 2 = 662; (13) Sum of rank 3 = 951.5.
You are left to find column means and the grand mean. Almost everything else is done for you. Please avoid
recomputing stuff that I have done for you. Life is not that long.
a) Consider the first three columns to be three independent random samples from Normal distributions with
similar variances. Compare the means using an appropriate statistical test or tests. (6)
b) Actually as in 1e) these data represent three tests of a single random sample of 25 workers. Consider the
data blocked by worker and compare means. (4)
c) Consider the first three columns to be three independent random samples from a distribution that is not
Normal. Compare the medians using an appropriate statistical test or tests. (5) [31]
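As an illustration of how far the printed column sums go, the sketch below computes a one-way ANOVA F ratio and a Kruskal-Wallis H statistic from the sums alone. It is a sketch under stated assumptions, not an official solution: the variable names are invented here, and the H statistic is computed without a correction for the few tied ranks.

```python
# Column totals copied from the printed sums; each column has n = 25 observations.
n = 25
col_sums = [137.51, 107.43, 122.48]       # sums of x1, x2, x3
col_sumsq = [780.400, 488.725, 624.649]   # sums of x1sq, x2sq, x3sq
rank_sums = [1236.5, 662.0, 951.5]        # sums of rank1, rank2, rank3

k = len(col_sums)          # number of columns (treatments)
N = n * k                  # total number of observations
grand_sum = sum(col_sums)

# One-way ANOVA from sums: SST = sum x^2 - T^2/N, SSB = sum(T_j^2 / n) - T^2/N.
correction = grand_sum ** 2 / N
sst = sum(col_sumsq) - correction
ssb = sum(t * t / n for t in col_sums) - correction
ssw = sst - ssb
f_stat = (ssb / (k - 1)) / (ssw / (N - k))   # compare with F(k-1, N-k)

# Kruskal-Wallis: H = 12 / (N (N+1)) * sum(R_j^2 / n_j) - 3 (N+1), no tie correction.
h_stat = 12.0 / (N * (N + 1)) * sum(r * r / n for r in rank_sums) - 3 * (N + 1)

print(f"SSB = {ssb:.4f}, SSW = {ssw:.4f}, SST = {sst:.4f}, F = {f_stat:.3f}")
print(f"Kruskal-Wallis H = {h_stat:.3f} (chi-square with {k - 1} df)")
```

A quick sanity check on the ranks is that the three rank sums add to N(N+1)/2 = 2850.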
3. A sales manager wishes to predict newspaper circulation on Sunday ($y$) on the basis of weekday
morning circulation ($x_1$), weekday evening circulation ($x_2$) and time ($x_3$). The data is below. (Use
$\alpha = .01$.) All circulation data is in millions sold.
       y    x1    x2    x3
Row    S    AM    PM     T
  1   54    27    34     1
  2   55    29    33     2
  3   55    30    31     3
  4   56    33    29     4
  5   56    34    29     5
  6   57    35    28     6
  7   58    36    26     7
  8   59    37    25     8
  9   60    39    24     9
The quantities below are given:
$n = 9$, $\sum y = 510$, $\sum x_1 = 300$, $\sum x_2 = 259$,
$\sum y^2 = 28932$, $\sum x_1^2 = 10126$, $\sum x_2^2 = 7549$,
$\sum x_1 y = ?$, $\sum x_2 y = 14623$ and $\sum x_1 x_2 = 8525$.
Yes, you will have to compute $\sum x_1 y$ as part of a). You do not need all of these.
a) Compute a simple regression of Sunday circulation against morning circulation. (8)
b) Compute $R^2$. (4)
c) Compute $s_e$. (3)
d) Compute $s_{b_1}$ (the standard deviation of the slope) and do a confidence interval for $\beta_1$. (3)
e) Do a prediction interval for Sunday circulation when morning circulation rises to 45 million. (3) Why is
this interval likely to be larger than other prediction intervals we might compute for values of morning
circulation that we have actually observed? (1)
[53]
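The spare-parts computations asked for in a) through d) can be sketched as follows. This is an illustrative outline only: the variable names are invented here, the sums are rebuilt from the data table rather than taken from the given quantities, and no table values or conclusions are supplied.

```python
# Sunday (y) and weekday morning (x1) circulation, copied from the data table.
y = [54, 55, 55, 56, 56, 57, 58, 59, 60]
x1 = [27, 29, 30, 33, 34, 35, 36, 37, 39]
n = len(y)

# The cross-product sum that the problem leaves for the student to compute.
sum_x1y = sum(a * b for a, b in zip(x1, y))

x_bar = sum(x1) / n
y_bar = sum(y) / n

# "Spare parts": corrected sums of squares and cross products.
sxy = sum_x1y - n * x_bar * y_bar
sxx = sum(a * a for a in x1) - n * x_bar ** 2
syy = sum(b * b for b in y) - n * y_bar ** 2

b1 = sxy / sxx                 # slope
b0 = y_bar - b1 * x_bar        # intercept
r_squared = b1 * sxy / syy     # regression SS divided by total SS
sse = syy - b1 * sxy           # error sum of squares
s_e = (sse / (n - 2)) ** 0.5   # standard error of the estimate
s_b1 = s_e / sxx ** 0.5        # standard error of the slope

print(f"sum x1*y = {sum_x1y}")
print(f"Y-hat = {b0:.4f} + {b1:.4f} x1, R^2 = {r_squared:.4f}")
print(f"s_e = {s_e:.4f}, s_b1 = {s_b1:.4f}")
```

A confidence interval for the slope would then be b1 plus or minus the appropriate t-value with n - 2 degrees of freedom times s_b1.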
4. Data from problem 3 is repeated. (Use $\alpha = .01$.)
A sales manager wishes to predict newspaper circulation on Sunday ($y$) on the basis of weekday morning
circulation ($x_1$), weekday evening circulation ($x_2$) and time ($x_3$). The data is below. All
circulation data is in millions sold.
       y    x1    x2    x3
Row    S    AM    PM     T
  1   54    27    34     1
  2   55    29    33     2
  3   55    30    31     3
  4   56    33    29     4
  5   56    34    29     5
  6   57    35    28     6
  7   58    36    26     7
  8   59    37    25     8
  9   60    39    24     9
The quantities below are given:
$n = 9$, $\sum y = 510$, $\sum x_1 = 300$, $\sum x_2 = 259$,
$\sum y^2 = 28932$, $\sum x_1^2 = 10126$, $\sum x_2^2 = 7549$,
$\sum x_1 y = ?$, $\sum x_2 y = 14623$ and $\sum x_1 x_2 = 8525$.
a) Do a multiple regression of Sunday circulation against morning and evening circulation. (12)
b) Compute $R^2$ and $R^2$ adjusted for degrees of freedom for both this and the previous problem. Compare
the values of $R^2$ adjusted between this and the previous problem. Use an F test to compare $R^2$ here with
the $R^2$ from the previous problem. (6)
c) Compute the regression sum of squares and use it in an F test to test the usefulness of this regression. (5)
d) Use your regression to predict Sunday circulation when AM circulation is 40 and PM circulation is 23.
(2)
e) Use the directions in the outline to make this estimate into a confidence interval and a prediction interval.
(4)
[82]
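A two-variable regression can be solved by hand from the simplified normal equations, and the sketch below mirrors that arithmetic. It is a hedged illustration rather than the outline's official procedure: the names are invented here, the 2-by-2 system is solved with Cramer's rule, and the cross-product sum not printed above is computed from the data table.

```python
# Data copied from the table (S = y, AM = x1, PM = x2).
y = [54, 55, 55, 56, 56, 57, 58, 59, 60]
x1 = [27, 29, 30, 33, 34, 35, 36, 37, 39]
x2 = [34, 33, 31, 29, 29, 28, 26, 25, 24]
n = len(y)

def corrected_sum(u, v):
    """Return sum(u*v) - n * mean(u) * mean(v), a corrected cross product."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)

s11 = corrected_sum(x1, x1)
s22 = corrected_sum(x2, x2)
s12 = corrected_sum(x1, x2)
s1y = corrected_sum(x1, y)
s2y = corrected_sum(x2, y)
syy = corrected_sum(y, y)

# Simplified normal equations:
#   s11*b1 + s12*b2 = s1y
#   s12*b1 + s22*b2 = s2y
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n

ssr = b1 * s1y + b2 * s2y   # regression sum of squares
r_squared = ssr / syy

print(f"Y-hat = {b0:.4f} + {b1:.4f} x1 + {b2:.4f} x2")
print(f"SSR = {ssr:.4f}, SST = {syy:.4f}, R^2 = {r_squared:.4f}")
```

The small determinant relative to s11*s22 is a numerical hint that AM and PM circulation are nearly collinear.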
5. Data from problem 3 is repeated. (Use $\alpha = .01$.)
A sales manager wishes to predict newspaper circulation on Sunday ($y$) on the basis of weekday morning
circulation ($x_1$), weekday evening circulation ($x_2$) and time ($x_3$). The data is below. All
circulation data is in millions sold.
The time variable is now added with the following results.
MTB > Regress c1 3 c2 c3 c10;
SUBC> VIF;
SUBC> DW.

Regression Analysis: S versus AM, PM, T

The regression equation is
S = 62.2 - 0.253 AM - 0.071 PM + 0.991 T

Predictor     Coef  SE Coef      T      P   VIF
Constant     62.19    17.23   3.61  0.015
AM         -0.2533   0.2960  -0.86  0.431  53.6
PM         -0.0707   0.3642  -0.19  0.854  61.6
T           0.9913   0.4957   2.00  0.102  71.7

S = 0.453612   R-Sq = 96.8%   R-Sq(adj) = 94.9%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       3  30.971  10.324  50.17  0.000
Residual Error   5   1.029   0.206
Total            8  32.000

Durbin-Watson statistic = 2.13279
a) What do the significance tests on the coefficients reveal? Give reasons. (2)
b) Can you explain why the coefficients of AM and PM seem unreasonable? What is the apparent reason for
this? (2)
c) Do a 2% two-sided Durbin-Watson test on the result as suggested in class. What is the hypothesis tested
and what is the result? (3)
d) Reuse your spare parts from the previous regression if possible to compute the correlation between AM
and PM circulation and test it for significance. (4)
e) Compute a rank correlation between AM and PM circulation and test it for significance. Can you explain
why it is larger than the correlation in d)? (4)
f) Test the hypothesis that the correlation that you computed in d) is -.99. (4) [101]
g) (Extra credit) If AM, PM and T are $x_1$, $x_2$ and $x_3$, find the partial correlation coefficient (square root
of the coefficient of partial determination) $r_{Y3.12}$. (2)
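For parts d) and f), the correlation between AM and PM circulation follows directly from the given sums. The sketch below is a hedged illustration with invented variable names. It uses the usual t ratio for a zero-correlation null and the Fisher z transformation for the null of -.99, which is one common approach and may differ in detail from the course outline.

```python
from math import atanh, sqrt

# Given sums for AM (x1) and PM (x2) circulation.
n = 9
sum_x1, sum_x2 = 300, 259
sum_x1sq, sum_x2sq, sum_x1x2 = 10126, 7549, 8525

# Corrected sums of squares and cross products ("spare parts").
s11 = sum_x1sq - sum_x1 ** 2 / n
s22 = sum_x2sq - sum_x2 ** 2 / n
s12 = sum_x1x2 - sum_x1 * sum_x2 / n

# Pearson correlation between AM and PM.
r = s12 / sqrt(s11 * s22)

# Test of H0: rho = 0, a t ratio with n - 2 degrees of freedom.
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Test of H0: rho = rho0 with rho0 nonzero, via Fisher's z = atanh(r),
# whose standard deviation is approximately 1 / sqrt(n - 3).
rho0 = -0.99
z_stat = (atanh(r) - atanh(rho0)) * sqrt(n - 3)

print(f"r = {r:.4f}, t = {t_stat:.3f}, Fisher statistic = {z_stat:.3f}")
```

Each statistic still has to be compared with the appropriate table value at the stated significance level.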
6. The following times were recorded for 6 skiers on 3 slopes. In order to assess their difficulty we look at
the median time for each slope. We do not assume a Normal distribution. Do not compute the median or
mean time for any slope.
Skier   Slope 1   Slope 2   Slope 3
  1       4.9       6.1       5.2
  2       4.5       6.0       5.1
  3       4.1       5.4       4.9
  4       4.4       4.7       5.1
  5       4.5       4.9       4.5
  6       3.3       3.8       3.9
a) Test the hypothesis that the median time on slope 1 is 4 minutes (3 or 2 depending on method) (3)
b) Test the hypothesis that slope 1 and slope 2 have the same median times. (4)
c) Test the hypothesis that the slopes all have the same median time. (4)
d) Explain what methods you would use in b) and c) if the columns were independent random samples. (1)
e) Rank the skiers’ times on each slope from 1 (fastest) to 6. Use these as rankings of the skiers and test to
see if the ranks agree between slopes. (4)
[117]
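For blocked data like these, one standard choice for comparing medians across the three slopes is the Friedman test, which ranks within each skier. The sketch below is illustrative only: the helper name `average_ranks` is invented here, tied times get the average of their ranks, and no tie correction is applied to the statistic.

```python
# Times for 6 skiers (rows) on 3 slopes (columns), copied from the table.
times = [
    [4.9, 6.1, 5.2],
    [4.5, 6.0, 5.1],
    [4.1, 5.4, 4.9],
    [4.4, 4.7, 5.1],
    [4.5, 4.9, 4.5],
    [3.3, 3.8, 3.9],
]

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # lowest rank held by this value
        count = ordered.count(v)       # number of tied copies
        ranks.append(first + (count - 1) / 2)
    return ranks

rows = len(times)       # blocks (skiers)
cols = len(times[0])    # treatments (slopes)

# Rank within each skier, then total the ranks for each slope.
ranked = [average_ranks(row) for row in times]
rank_sums = [sum(r[j] for r in ranked) for j in range(cols)]

# Friedman statistic: 12 / (r c (c+1)) * sum(SR_j^2) - 3 r (c+1).
friedman = (12.0 / (rows * cols * (cols + 1)) * sum(s * s for s in rank_sums)
            - 3 * rows * (cols + 1))

print("rank sums by slope:", rank_sums)
print(f"Friedman statistic = {friedman:.4f}")
```

With this few rows and columns the statistic would normally be checked against a Friedman table rather than the chi-square approximation.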
7. Clarence Sales is a marketing major and knows that national soft drink market shares are as below.
Classic Coke    15.6%
Pepsi           13.2%
Diet Coke        5.1%
Diet Pepsi       3.5%
Other brands    62.6%
He gets in a bit of trouble here and is sentenced to 20 hours of public service. After he finishes his public
service he takes off for Maine, gets caught littering and is sentenced to another 20 hours of public service.
During his public service, he picks up 100 cans in each state. The cans are as below.
Brand           PA    ME
Classic Coke    22    17
Pepsi           15    11
Diet Coke       13    10
Diet Pepsi       6     5
Other brands    44    57
Use a 1% significance level throughout this problem. Don’t waste our time by just computing percents and
saying that they are different. Each problem requires a statistical test or the equivalent. State your null and
alternative hypotheses in each problem.
a) Regard the cans picked up as a random sample of sales in the two states. Can we say that the proportions
of soft drink cans discarded in Pennsylvania are the same as the national market shares? (5)
b) Clarence knows that Maine is Moxie country, so he believes that the proportion of other brands sold
is higher in Maine than in Pennsylvania. Is this true? (4)
c) Create a 0.2% 2-sided confidence interval for the difference between the proportions of other brands sold
in Maine and in Pennsylvania. Using your Normal table, make this into a 0.1% 2-sided interval. (3)
d) Actually Clarence’s mother owns the Coke franchise for Maine, and last year her sales of Classic
Coke and Diet Coke together accounted for 25% of the soft drink market in Maine. She tells Clarence that
her sales are now above 25%. On the basis of Clarence’s Maine sample, is that true? (2) [131]
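Part a) compares observed counts with a fully specified set of proportions, which is the setting for a chi-square goodness-of-fit test. The sketch below shows the arithmetic for the Pennsylvania sample. It is an illustration with invented variable names, and it does not address the small expected count for Diet Pepsi, which would deserve a comment in a full answer.

```python
# National market shares and the Pennsylvania can counts, in the same brand order:
# Classic Coke, Pepsi, Diet Coke, Diet Pepsi, Other brands.
shares = [0.156, 0.132, 0.051, 0.035, 0.626]
observed_pa = [22, 15, 13, 6, 44]

n = sum(observed_pa)

# Expected counts under H0: Pennsylvania proportions equal the national shares.
expected = [p * n for p in shares]

# Chi-square statistic: sum of (O - E)^2 / E over the five categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed_pa, expected))
df = len(shares) - 1   # no parameters are estimated from the sample

for o, e in zip(observed_pa, expected):
    print(f"O = {o:3d}  E = {e:6.2f}  contribution = {(o - e) ** 2 / e:.4f}")
print(f"chi-square = {chi_sq:.4f} with {df} degrees of freedom")
```

The statistic is then compared with the chi-square table value for 4 degrees of freedom at the 1% level.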