4/25/03 252x0342
ECO252 QBA2
FINAL EXAM
May 7, 2003
Name
Hour of Class Registered (Circle)
I. (18 points) Do all the following. Note that answers without reasons receive no credit.
A researcher wishes to explain the selling price of a house (in thousands) on the basis of its assessed
valuation, whether it is new, and the time period. New is 1 if the house is new construction and 0
otherwise. The researcher assembles the following data for a random sample of 30 home sales. Use $\alpha = .10$
in this problem.
-----   4/25/2003 9:58:00 PM   --------------------
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Documents and Settings\RBOVE\My Documents\Drive D\MINITAB\2x03421.MTW".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE\My Documents\Drive D\MINITAB\2x0342-1.MTW
# Worksheet was saved on Fri Apr 25 2003
Results for: 2x0342-1.MTW
MTB > print c1 - c4
Data Display

Row   Price   Value  New  Time
  1   69.00   66.28    0     1
  2  115.50   86.31    0     2
  3  100.80   84.78    1     2
  4   96.90   79.74    1     3
  5   72.00   65.54    0     4
  6   61.90   59.93    0     4
  7   97.00   79.98    1     4
  8   87.50   75.22    0     5
  9   96.90   81.88    1     5
 10   81.50   72.94    0     5
 11   69.34   60.80    0     6
 12   97.90   81.61    1     6
 13   96.00   79.11    0     7
 14   92.00   77.96    0     9
 15   94.10   78.17    1    10
 16  101.90   80.24    1    10
 17  109.50   85.88    1    10
 18   88.65   74.03    0    11
 19   93.00   75.27    0    11
 20   83.00   74.31    0    11
 21  106.70   84.36    0    12
 22   97.90   77.90    1    12
 23   97.30   79.85    1    12
 24   90.50   74.92    0    12
 25   95.90   79.07    1    12
 26  113.90   85.61    0    13
 27   94.50   76.50    1    14
 28   86.50   72.78    0    14
 29   91.50   72.43    0    17
 30   93.75   76.64    0    17
1. Looking for a place to start, the researcher does individual regressions of price against each of the
independent variables.
a. Explain why the researcher concludes from the regressions that valuation (‘value’) is the most
important independent variable. Consider the values of $R^2$ and the significance tests on the
slope of the equation. (2)
b. What kind of variable is ‘new’? Explain why the regression of ‘price’ against ‘new’ is
equivalent to a test of the equality of 2 sample means, and what the conclusion would be. (2)
MTB > regress c1 1 c2

Regression Analysis: Price versus Value

The regression equation is
Price = - 44.2 + 1.78 Value

Predictor      Coef  SE Coef      T      P
Constant    -44.172    7.346  -6.01  0.000
Value       1.78171  0.09546  18.66  0.000

S = 3.475   R-Sq = 92.6%   R-Sq(adj) = 92.3%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       1  4206.7  4206.7  348.37  0.000
Residual Error  28   338.1    12.1
Total           29  4544.8

Unusual Observations
Obs  Value   Price     Fit  SE Fit  Residual  St Resid
  6   59.9  61.900  62.606   1.719    -0.706   -0.23 X
 11   60.8  69.340  64.156   1.642     5.184    1.69 X

X denotes an observation whose X value gives it large influence.
MTB > regress c1 1 c3

Regression Analysis: Price versus New

The regression equation is
Price = 88.5 + 9.93 New

Predictor    Coef  SE Coef      T      P
Constant   88.458    2.759  32.07  0.000
New         9.926    4.362   2.28  0.031

S = 11.70   R-Sq = 15.6%   R-Sq(adj) = 12.6%

Analysis of Variance

Source          DF      SS     MS     F      P
Regression       1   709.3  709.3  5.18  0.031
Residual Error  28  3835.5  137.0
Total           29  4544.8

Unusual Observations
Obs   New   Price    Fit  SE Fit  Residual  St Resid
  2  0.00  115.50  88.46    2.76     27.04     2.38R
  6  0.00   61.90  88.46    2.76    -26.56    -2.33R
 26  0.00  113.90  88.46    2.76     25.44     2.24R

R denotes an observation with a large standardized residual
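Question 1b can be checked numerically. Below is a minimal Python sketch (standard library only; the helper `pooled_t` is a hypothetical name, not part of Minitab or the course materials). It splits the 30 prices by the 0/1 dummy ‘new’ and runs an ordinary pooled-variance two-sample t test. If the dummy regression really is a difference-of-means test, three things should hold: the Minitab constant equals the mean price of old houses, the coefficient on New equals the difference between the two group means, and its t ratio equals the pooled t statistic on 28 degrees of freedom.

```python
import math

# Price (thousands) and the New dummy for rows 1-30 of the Data Display.
price = [69.00, 115.50, 100.80, 96.90, 72.00, 61.90, 97.00, 87.50, 96.90, 81.50,
         69.34, 97.90, 96.00, 92.00, 94.10, 101.90, 109.50, 88.65, 93.00, 83.00,
         106.70, 97.90, 97.30, 90.50, 95.90, 113.90, 94.50, 86.50, 91.50, 93.75]
new = [0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0,
       0, 1, 1, 0, 1, 0, 1, 0, 0, 0]


def pooled_t(y, dummy):
    """Pooled-variance two-sample t test, with groups defined by a 0/1 dummy."""
    g1 = [v for v, d in zip(y, dummy) if d == 1]
    g0 = [v for v, d in zip(y, dummy) if d == 0]
    n1, n0 = len(g1), len(g0)
    m1, m0 = sum(g1) / n1, sum(g0) / n0
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    sp2 = ss / (n1 + n0 - 2)                  # pooled variance (= regression MS error)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n0))   # standard error of the difference
    return m0, m1 - m0, se, (m1 - m0) / se


mean_old, diff, se_diff, t_stat = pooled_t(price, new)
print(f"mean of old houses    = {mean_old:.3f}")
print(f"mean(new) - mean(old) = {diff:.3f}  (SE {se_diff:.3f})")
print(f"pooled t on {len(price) - 2} df  = {t_stat:.2f}")
```

Compare the printed values with the Constant, the New coefficient, its SE Coef and its T in the Minitab output for Price versus New.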
MTB > regress c1 1 c4
Regression Analysis: Price versus Time

The regression equation is
Price = 86.4 + 0.698 Time

Predictor    Coef  SE Coef      T      P
Constant   86.355    4.942  17.47  0.000
Time       0.6980   0.5057   1.38  0.178

S = 12.33   R-Sq = 6.4%   R-Sq(adj) = 3.0%

Analysis of Variance

Source          DF      SS     MS     F      P
Regression       1   289.6  289.6  1.91  0.178
Residual Error  28  4255.2  152.0
Total           29  4544.8

Unusual Observations
Obs  Time   Price    Fit  SE Fit  Residual  St Resid
  2   2.0  115.50  87.75    4.07     27.75     2.38R
  6   4.0   61.90  89.15    3.27    -27.25    -2.29R

R denotes an observation with a large standardized residual
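For question 1a, the three simple regressions can be compared side by side. Below is a minimal Python sketch (standard library only; `simple_fit` is a hypothetical helper). It refits each regression from the Data Display and reports the slope, $R^2$ and the slope's t ratio. These should reproduce the R-Sq and T figures Minitab printed for Value, New and Time.

```python
import math

# Columns c1-c4 of the Data Display (rows 1-30).
price = [69.00, 115.50, 100.80, 96.90, 72.00, 61.90, 97.00, 87.50, 96.90, 81.50,
         69.34, 97.90, 96.00, 92.00, 94.10, 101.90, 109.50, 88.65, 93.00, 83.00,
         106.70, 97.90, 97.30, 90.50, 95.90, 113.90, 94.50, 86.50, 91.50, 93.75]
value = [66.28, 86.31, 84.78, 79.74, 65.54, 59.93, 79.98, 75.22, 81.88, 72.94,
         60.80, 81.61, 79.11, 77.96, 78.17, 80.24, 85.88, 74.03, 75.27, 74.31,
         84.36, 77.90, 79.85, 74.92, 79.07, 85.61, 76.50, 72.78, 72.43, 76.64]
new = [0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0,
       0, 1, 1, 0, 1, 0, 1, 0, 0, 0]
time = [1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 9, 10, 10, 10, 11, 11, 11,
        12, 12, 12, 12, 12, 13, 14, 14, 17, 17]


def simple_fit(x, y):
    """Least-squares fit of y on one x; returns (slope, R-squared, t for the slope)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    syy = sum((b - ybar) ** 2 for b in y)
    b1 = sxy / sxx
    ssr = b1 * sxy                 # regression sum of squares
    mse = (syy - ssr) / (n - 2)    # residual mean square
    return b1, ssr / syy, b1 / math.sqrt(mse / sxx)


results = {name: simple_fit(x, price)
           for name, x in (("Value", value), ("New", new), ("Time", time))}
for name, (b1, r2, t) in results.items():
    print(f"{name:5s}  slope = {b1:8.4f}  R-sq = {100 * r2:5.1f}%  t = {t:6.2f}")
```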
MTB > regress c1 2 c2 c4;
SUBC> dw;
SUBC> vif.
2. The researcher now adds time. Compare this regression with the regression with Value alone. Are the
coefficients significant? Does this regression explain the variation in Y better than the regression with value alone?
What would the predicted selling price be for an old house with a valuation of 80 in time 17? (3)
Regression Analysis: Price versus Value, Time

The regression equation is
Price = - 45.0 + 1.75 Value + 0.368 Time

Predictor      Coef  SE Coef      T      P  VIF
Constant    -44.988    6.553  -6.87  0.000
Value       1.75060  0.08576  20.41  0.000  1.0
Time         0.3680   0.1281   2.87  0.008  1.0

S = 3.097   R-Sq = 94.3%   R-Sq(adj) = 93.9%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       2  4285.8  2142.9  223.46  0.000
Residual Error  27   258.9     9.6
Total           29  4544.8

Source  DF  Seq SS
Value    1  4206.7
Time     1    79.2

Unusual Observations
Obs  Value    Price      Fit  SE Fit  Residual  St Resid
  2   86.3  115.500  106.842   1.385     8.658     3.13R
 11   60.8   69.340   63.656   1.474     5.684     2.09R
 20   74.3   83.000   89.146   0.680    -6.146    -2.03R

R denotes an observation with a large standardized residual

Durbin-Watson statistic = 2.73
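The Durbin-Watson statistic printed above can be reproduced by hand. Below is a minimal Python sketch (standard library only). It forms the residuals from the printed Value + Time equation, in the worksheet's row order, and applies $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. Because it uses Minitab's rounded coefficients rather than refitting, small rounding differences are expected. The same equation also gives the point prediction asked for in question 2.

```python
# Columns c1, c2 and c4 of the Data Display, in worksheet (row) order.
price = [69.00, 115.50, 100.80, 96.90, 72.00, 61.90, 97.00, 87.50, 96.90, 81.50,
         69.34, 97.90, 96.00, 92.00, 94.10, 101.90, 109.50, 88.65, 93.00, 83.00,
         106.70, 97.90, 97.30, 90.50, 95.90, 113.90, 94.50, 86.50, 91.50, 93.75]
value = [66.28, 86.31, 84.78, 79.74, 65.54, 59.93, 79.98, 75.22, 81.88, 72.94,
         60.80, 81.61, 79.11, 77.96, 78.17, 80.24, 85.88, 74.03, 75.27, 74.31,
         84.36, 77.90, 79.85, 74.92, 79.07, 85.61, 76.50, 72.78, 72.43, 76.64]
time = [1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 9, 10, 10, 10, 11, 11, 11,
        12, 12, 12, 12, 12, 13, 14, 14, 17, 17]

# Coefficients as printed in "Regression Analysis: Price versus Value, Time".
b0, b_value, b_time = -44.988, 1.75060, 0.3680


def fitted(v, t):
    return b0 + b_value * v + b_time * t


residuals = [p - fitted(v, t) for p, v, t in zip(price, value, time)]
sse = sum(e * e for e in residuals)
dw = sum((residuals[i] - residuals[i - 1]) ** 2
         for i in range(1, len(residuals))) / sse

# Question 2: an old house (New is not in this model) valued at 80 in time 17.
prediction = fitted(80, 17)
print(f"SSE = {sse:.1f}, Durbin-Watson d = {dw:.2f}, predicted price = {prediction:.3f}")
```

A value of d near 2 is consistent with no first-order autocorrelation in the residuals. Values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation; either is judged against the $d_L$, $d_U$ bounds of a Durbin-Watson table.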
3. The researcher now adds the variable ‘new’. Remember that there is nothing wrong with a negative
coefficient unless there is some reason why it should not be negative.
a. Looking only at the t tests and the signs of the coefficients, what two reasons would I find to doubt
that this regression is an improvement on the regression with just value and time?
What does the change in adjusted $R^2$ tell me about this regression? (3)
b. We have done 5 ANOVAs so far. What was the null hypothesis in these ANOVAs, and what
does the one where the null hypothesis was accepted tell us? (2)
c. What selling price does this equation predict for an old home with a valuation of 80 in time 17?
What percentage difference is this from the selling price predicted in the regression with just
time and value? (2)
d. The last two regressions have a Durbin-Watson statistic computed. What does this statistic test for,
what should our conclusion be, and why is it important? (3)
e. The column marked VIF (variance inflation factor) is a test for (multi)collinearity. The rule of
thumb is that if any of these exceeds 5, we have a multicollinearity problem. None does. What is
multicollinearity, and why am I worried about it? (2)
f. Do an F test to show whether the regression with ‘value’, ‘time’ and ‘new’ is an improvement
over the regression with ‘value’ alone. (3)
MTB > regress c1 3 c2 c4 c3;
SUBC> dw;
SUBC> vif.
Regression Analysis: Price versus Value, Time, New

The regression equation is
Price = - 47.7 + 1.79 Value + 0.351 Time - 1.22 New

Predictor      Coef  SE Coef      T      P  VIF
Constant    -47.675    7.190  -6.63  0.000
Value       1.79394  0.09804  18.30  0.000  1.3
Time         0.3508   0.1298   2.70  0.012  1.0
New          -1.218    1.322  -0.92  0.366  1.3

S = 3.105   R-Sq = 94.5%   R-Sq(adj) = 93.8%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       3  4294.0  1431.3  148.42  0.000
Residual Error  26   250.7     9.6
Total           29  4544.8
Source  DF  Seq SS
Value    1  4206.7
Time     1    79.2
New      1     8.2

Unusual Observations
Obs  Value    Price      Fit  SE Fit  Residual  St Resid
  2   86.3  115.500  107.862   1.777     7.638     3.00R
 11   60.8   69.340   63.502   1.487     5.838     2.14R
 20   74.3   83.000   89.492   0.778    -6.492    -2.16R

R denotes an observation with a large standardized residual

Durbin-Watson statistic = 2.60
MTB >
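Questions 3a, 3c and 3f can all be answered from figures already printed. Below is a minimal Python sketch (standard library only). It takes the error sums of squares from the three ANOVA tables and computes three things:

- adjusted $R^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$;
- the partial F statistic $F = \frac{(SSE_R - SSE_F)/(k_F - k_R)}{SSE_F/(n - k_F - 1)}$;
- the two point predictions.

The critical value F(.10; 2, 26) = 2.52 is taken from a standard F table and is an assumption to be checked against the table used in class.

```python
# Error and total sums of squares copied from the Minitab ANOVA tables above.
n, sst = 30, 4544.8
sse = {"Value": 338.1, "Value+Time": 258.9, "Value+Time+New": 250.7}
k = {"Value": 1, "Value+Time": 2, "Value+Time+New": 3}   # number of predictors
F_CRIT = 2.52   # F(.10; 2, 26) from a standard table; alpha = .10 in Part I


def adj_r2(model):
    """Adjusted R-squared = 1 - [SSE/(n-k-1)] / [SST/(n-1)]."""
    return 1 - (sse[model] / (n - k[model] - 1)) / (sst / (n - 1))


def partial_f(reduced, full):
    """F test of whether the extra predictors in `full` add to `reduced`."""
    extra = k[full] - k[reduced]
    return ((sse[reduced] - sse[full]) / extra) / (sse[full] / (n - k[full] - 1))


# Question 3a: adjusted R-squared before and after adding New.
before, after = adj_r2("Value+Time"), adj_r2("Value+Time+New")

# Question 3f: does Value + Time + New improve on Value alone?
f_stat = partial_f("Value", "Value+Time+New")

# Question 3c: predictions for an old house (New = 0), valuation 80, time 17.
pred_2 = -44.988 + 1.75060 * 80 + 0.3680 * 17
pred_3 = -47.675 + 1.79394 * 80 + 0.3508 * 17 - 1.218 * 0
pct_diff = 100 * (pred_3 - pred_2) / pred_2

print(f"adjusted R-sq: {before:.4f} -> {after:.4f}")
print(f"partial F = {f_stat:.2f} vs F(.10; 2, 26) = {F_CRIT}")
print(f"predictions: {pred_2:.3f} vs {pred_3:.3f} ({pct_diff:.2f}% difference)")
```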
II. Do at least 4 of the following 7 problems (at least 15 points each), or do sections adding to at least 60 points. (Anything extra you do helps, and grades wrap around.) Show your work! State $H_0$ and $H_1$ where
applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing
appropriate statistical tests.
1. (Berenson et al. 1220) A firm believes that less than 15% of people remember its ads. A survey is
taken to see what recall occurs, with the following results (in these problems, calculating proportions won’t
help you unless you do a statistical test):
Medium       Mag    TV  Radio  Total
Remembered    25    10      8     43
Forgot        73    93    107    273
Total         98   103    115    316
a. Test the hypothesis that the recall rate is less than 15% by using proportions calculated from the
‘Total’ column. Find a p-value for this result. (5)
b. Test the hypothesis that the proportion recalling was lower for Radio than TV. (4)
c. Test to see if there is a significant difference in the proportion that remembered according to the
medium. (6)
d. The Marascuilo procedure says that if (i) equality is rejected in c) and
(ii) $\left| p_2 - p_3 \right| > \sqrt{\chi^2}\, s_{\Delta p}$, where the chi-squared is what you used in c) and the standard deviation is
what you would use in a confidence interval solution to b), you can say that you have a significant
difference between TV and Radio. Try it! (5)
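The four parts of problem 1 can be sketched in Python with the standard library (`statistics.NormalDist` supplies the normal CDF). This is one standard way to set up each test, not necessarily the exact layout expected on the exam:

- a) a one-sample z test for a proportion;
- b) a pooled two-sample z test;
- c) a chi-square test of homogeneity on the $2 \times 3$ table;
- d) the Marascuilo comparison described above, using the unpooled standard deviation of a confidence interval for $p_2 - p_3$.

The chi-square critical value 5.991 (2 df, $\alpha = .05$) is a standard table value.

```python
import math
from statistics import NormalDist

remembered = {"Mag": 25, "TV": 10, "Radio": 8}
totals = {"Mag": 98, "TV": 103, "Radio": 115}
CHI2_CRIT = 5.991   # chi-square(.05) with (2-1)(3-1) = 2 df, standard table value
std_normal = NormalDist()

# a. H0: p >= .15 vs H1: p < .15, using the Total column (43 of 316).
x, n, p0 = sum(remembered.values()), sum(totals.values()), 0.15
z_a = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value_a = std_normal.cdf(z_a)                 # left-tailed p-value

# b. H0: p_Radio >= p_TV vs H1: p_Radio < p_TV, pooled proportion.
p_tv = remembered["TV"] / totals["TV"]
p_radio = remembered["Radio"] / totals["Radio"]
p_pool = (remembered["TV"] + remembered["Radio"]) / (totals["TV"] + totals["Radio"])
z_b = (p_radio - p_tv) / math.sqrt(
    p_pool * (1 - p_pool) * (1 / totals["TV"] + 1 / totals["Radio"]))

# c. Chi-square test that the recall proportion is the same for all three media.
chi2 = 0.0
for medium, col_total in totals.items():
    for observed, row_total in ((remembered[medium], x),
                                (col_total - remembered[medium], n - x)):
        expected = row_total * col_total / n
        chi2 += (observed - expected) ** 2 / expected

# d. Marascuilo: compare |p_TV - p_Radio| with sqrt(chi-square) * s_dp.
s_dp = math.sqrt(p_tv * (1 - p_tv) / totals["TV"]
                 + p_radio * (1 - p_radio) / totals["Radio"])
critical_range = math.sqrt(CHI2_CRIT) * s_dp
tv_radio_differ = chi2 > CHI2_CRIT and abs(p_tv - p_radio) > critical_range

print(f"a. z = {z_a:.3f}, p-value = {p_value_a:.4f}")
print(f"b. z = {z_b:.3f}")
print(f"c. chi-square = {chi2:.2f} vs {CHI2_CRIT}")
print(f"d. |diff| = {abs(p_tv - p_radio):.4f} vs critical range {critical_range:.4f}")
```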
2. (Berenson et al. 1142) A manager is inspecting a new type of battery. These are subjected to 4 different
pressure levels and their time to failure is recorded. The manager knows from experience that such data is
not normally distributed. Ranks are provided.
                              PRESSURE
Use    low  rank   normal  rank   high  rank   whee!  rank
1      8.2    11      7.9     9    6.2     4     5.3     1
2      8.3    12      8.4    13    6.5     5     5.8     2
3      9.4    15     10.0    17    7.3     7     6.1     3
4      9.6    16     11.1    18    7.8     8     6.9     6
5     11.9    19     12.5    20    9.1    14     8.0    10
a. At the 5% level analyze the data on the assumption that each column represents a random
sample. Do the column medians differ? (5)
b. Rerank the data appropriately and repeat a) on the assumption that the data is non-normal but
cross classified by use. (5)
c. This time I want to compare high pressure (H) against low-to-moderate pressure (L). I will write
out the numbers 1-20 and label them according to pressure.
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 H  H  H  H  H  H  H  H  L  H  L  L  L  H  L  L  L  L  L  L
Do a runs test to see if the H’s and L’s appear in random order. This is called a Wald-Wolfowitz test for the
equality of means in two nonnormal samples. The null hypothesis is that the sequence is random and the means
are equal. What is your conclusion? (5)
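Problem 2's three tests can be sketched in Python (standard library only):

- part a uses the Kruskal-Wallis H statistic on the ranks given in the table;
- part b uses the Friedman statistic after reranking within each ‘Use’ row;
- part c uses the large-sample normal approximation to the runs test. With $n_1 = n_2 = 10$, a small-sample runs table could be used instead.

The critical value 7.815 is the standard chi-square table value for 3 df at $\alpha = .05$.

```python
import math

# Times to failure by pressure level; list position i is Use i + 1.
data = {
    "low":    [8.2, 8.3, 9.4, 9.6, 11.9],
    "normal": [7.9, 8.4, 10.0, 11.1, 12.5],
    "high":   [6.2, 6.5, 7.3, 7.8, 9.1],
    "whee!":  [5.3, 5.8, 6.1, 6.9, 8.0],
}
CHI2_CRIT_3DF = 7.815   # chi-square(.05) with 4 - 1 = 3 df

# a. Kruskal-Wallis: rank all 20 times together (no ties in these data).
pooled = sorted(v for column in data.values() for v in column)
rank_of = {v: i + 1 for i, v in enumerate(pooled)}
n = len(pooled)
H = 12 / (n * (n + 1)) * sum(
    sum(rank_of[v] for v in column) ** 2 / len(column)
    for column in data.values()) - 3 * (n + 1)

# b. Friedman: rerank the 4 pressures within each Use (block), then sum by column.
names = list(data)
r, c = len(data["low"]), len(names)
rank_sums = dict.fromkeys(names, 0)
for i in range(r):
    for j, name in enumerate(sorted(names, key=lambda nm: data[nm][i])):
        rank_sums[name] += j + 1
chi2_F = 12 / (r * c * (c + 1)) * sum(s ** 2 for s in rank_sums.values()) - 3 * r * (c + 1)

# c. Runs test on the H/L sequence written out in the problem (normal approximation).
seq = "HHHHHHHHLHLLLHLLLLLL"
runs = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
n1, n2 = seq.count("H"), seq.count("L")
mean_runs = 2 * n1 * n2 / (n1 + n2) + 1
var_runs = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
            / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
z_runs = (runs - mean_runs) / math.sqrt(var_runs)

print(f"a. H = {H:.3f}   b. Friedman = {chi2_F:.2f}   (critical {CHI2_CRIT_3DF})")
print(f"c. runs = {runs}, expected {mean_runs:.0f}, z = {z_runs:.3f}")
```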
3. A researcher studies the relationship between the number of subsidiaries and the number of parent companies in 11
metropolitan areas and finds the following:
Area  parents x  subsidiaries y       x²        xy          y²
   1        658            2602   432964   1712116     6770404
   2        396            1709   156816    676764     2920681
   3        357            1852   127449    661164     3429904
   4        266            1223    70756    325318     1495729
   5        231             875    53361    202125      765625
   6        223             666    49729    148518      443556
   7        207            1519    42849    314433     2307361
   8        156             884    24336    137904      781456
   9        146             477    21316     69642      227529
  10        143             564    20449     80652      318096
  11        139             657    19321     91323      431649
Total      2922           13028  1019346   4419959    19891990
a. Compute Spearman’s rank correlation between x and y and test it for significance. (6)
b. Compute the sample correlation between x and y and test it for significance. (6)
c. Compute the sample standard deviation of x and test whether the population standard deviation equals 200. (4)
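Below is a minimal Python sketch of problem 3 (standard library only), working from the raw x and y columns of the table. Two assumptions should be flagged:

- the significance of Spearman's $r_s$ is judged here with the large-sample approximation $z = r_s\sqrt{n-1}$ rather than a small-sample $r_s$ table;
- the critical values (t = 2.262 for 9 df; chi-square 3.247 and 20.483 for 10 df) are standard two-tailed 5% table values.

```python
import math

x = [658, 396, 357, 266, 231, 223, 207, 156, 146, 143, 139]         # parents
y = [2602, 1709, 1852, 1223, 875, 666, 1519, 884, 477, 564, 657]    # subsidiaries
n = len(x)
T_CRIT_9 = 2.262                  # t(.025) with n - 2 = 9 df
CHI2_LO, CHI2_HI = 3.247, 20.483  # chi-square(.975) and chi-square(.025), 10 df


def ranks(values):
    """Rank from 1 (smallest) to n; valid here because there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


# a. Spearman's rank correlation and a large-sample z test of rho_s = 0.
d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
r_s = 1 - 6 * d2 / (n * (n ** 2 - 1))
z_s = r_s * math.sqrt(n - 1)

# b. Pearson r from the column sums, and t = r * sqrt((n - 2) / (1 - r^2)).
sum_x, sum_y = sum(x), sum(y)
ssx = sum(v * v for v in x) - sum_x ** 2 / n
ssy = sum(v * v for v in y) - sum_y ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
r = sxy / math.sqrt(ssx * ssy)
t_r = r * math.sqrt((n - 2) / (1 - r ** 2))

# c. chi-square = (n - 1) s^2 / sigma0^2 for H0: sigma = 200.
s_x = math.sqrt(ssx / (n - 1))
chi2 = (n - 1) * s_x ** 2 / 200 ** 2

print(f"a. r_s = {r_s:.4f}, z = {z_s:.2f}")
print(f"b. r = {r:.4f}, t = {t_r:.2f} vs {T_CRIT_9}")
print(f"c. s_x = {s_x:.2f}, chi-square = {chi2:.3f} vs ({CHI2_LO}, {CHI2_HI})")
```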
4. Data from the previous page is repeated:
Area  parents x  subsidiaries y       x²        xy          y²
   1        658            2602   432964   1712116     6770404
   2        396            1709   156816    676764     2920681
   3        357            1852   127449    661164     3429904
   4        266            1223    70756    325318     1495729
   5        231             875    53361    202125      765625
   6        223             666    49729    148518      443556
   7        207            1519    42849    314433     2307361
   8        156             884    24336    137904      781456
   9        146             477    21316     69642      227529
  10        143             564    20449     80652      318096
  11        139             657    19321     91323      431649
Total      2922           13028  1019346   4419959    19891990
a. Test the hypothesis that the correlation between x and y is .7 (5)
b. Test the hypothesis that x has the Normal distribution. (9)
c. Test the hypothesis that x and y have equal variances. (4)
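Problem 4 can be sketched the same way (standard library only):

- part a uses Fisher's transformation $z = (\tanh^{-1} r - \tanh^{-1}\rho_0)\sqrt{n-3}$;
- part b is written as a Lilliefors test (Kolmogorov-Smirnov with the mean and standard deviation estimated from the data). This is one reasonable choice for $n = 11$, but it is an assumption, since the problem does not name a test;
- part c is the ratio of the two sample variances.

The critical values 0.249 (Lilliefors, $n = 11$, $\alpha = .05$) and 3.72 (F(.025; 10, 10)) are standard table values.

```python
import math
from statistics import NormalDist, mean, stdev

x = [658, 396, 357, 266, 231, 223, 207, 156, 146, 143, 139]         # parents
y = [2602, 1709, 1852, 1223, 875, 666, 1519, 884, 477, 564, 657]    # subsidiaries
n = len(x)
LILLIEFORS_CRIT = 0.249   # n = 11, alpha = .05
F_CRIT = 3.72             # F(.025; 10, 10), two-tailed 5% test

# a. H0: rho = .7 using Fisher's z transformation.
mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
z_fisher = (math.atanh(r) - math.atanh(0.7)) * math.sqrt(n - 3)

# b. Lilliefors: largest gap between the sample CDF and a fitted normal CDF.
fitted = NormalDist(mx, stdev(x)).cdf
D = max(max(abs((i + 1) / n - fitted(v)), abs(i / n - fitted(v)))
        for i, v in enumerate(sorted(x)))

# c. H0: sigma_x^2 = sigma_y^2, larger sample variance on top.
F = max(stdev(x), stdev(y)) ** 2 / min(stdev(x), stdev(y)) ** 2

print(f"a. r = {r:.4f}, z = {z_fisher:.2f}")
print(f"b. D = {D:.4f} vs {LILLIEFORS_CRIT}")
print(f"c. F = {F:.2f} vs {F_CRIT}")
```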
5. Data from the previous page is repeated:
Area  parents x  subsidiaries y       x²        xy          y²
   1        658            2602   432964   1712116     6770404
   2        396            1709   156816    676764     2920681
   3        357            1852   127449    661164     3429904
   4        266            1223    70756    325318     1495729
   5        231             875    53361    202125      765625
   6        223             666    49729    148518      443556
   7        207            1519    42849    314433     2307361
   8        156             884    24336    137904      781456
   9        146             477    21316     69642      227529
  10        143             564    20449     80652      318096
  11        139             657    19321     91323      431649
Total      2922           13028  1019346   4419959    19891990
a. Compute a simple regression of subsidiaries against parents as the independent variable. (5)
b. Compute $s_e$. (3)
c. Predict how many subsidiaries will appear in a city with 60 parent corporations. (1)
d. Make your prediction in c) into a confidence interval. (3)
e. Compute $s_{b_0}$ and make it into a confidence interval for $\beta_0$. (3)
f. Do an ANOVA for this regression and explain what it says about $\beta_1$. (3)
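Below is a minimal Python sketch of problem 5 (standard library only). It uses the usual simple-regression formulas:

- $b_1 = S_{xy}/SS_x$ and $b_0 = \bar y - b_1 \bar x$;
- $s_e = \sqrt{SSE/(n-2)}$;
- $s_{\hat y} = s_e\sqrt{1/n + (x_0-\bar x)^2/SS_x}$;
- $s_{b_0} = s_e\sqrt{1/n + \bar x^2/SS_x}$.

Part d is read here as a confidence interval for the mean response at $x_0 = 60$, which is an assumption; the prediction-interval variant for a single city, which adds 1 under the square root, is shown alongside. The t and F critical values are standard 5% table values.

```python
import math

x = [658, 396, 357, 266, 231, 223, 207, 156, 146, 143, 139]         # parents
y = [2602, 1709, 1852, 1223, 875, 666, 1519, 884, 477, 564, 657]    # subsidiaries
n = len(x)
T_CRIT_9 = 2.262    # t(.025) with n - 2 = 9 df
F_CRIT_1_9 = 5.12   # F(.05; 1, 9)

xbar, ybar = sum(x) / n, sum(y) / n
ssx = sum(v * v for v in x) - n * xbar ** 2
ssy = sum(v * v for v in y) - n * ybar ** 2
sxy = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar

# a. Least-squares line y-hat = b0 + b1 x
b1 = sxy / ssx
b0 = ybar - b1 * xbar

# b. Standard error of estimate
ssr = b1 * sxy
sse = ssy - ssr
s_e = math.sqrt(sse / (n - 2))

# c, d. Prediction at x0 = 60 with a confidence interval for the mean response
#       and, for comparison, a prediction interval for one city.
x0 = 60
y_hat = b0 + b1 * x0
s_yhat = s_e * math.sqrt(1 / n + (x0 - xbar) ** 2 / ssx)
s_pred = s_e * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / ssx)
ci = (y_hat - T_CRIT_9 * s_yhat, y_hat + T_CRIT_9 * s_yhat)
pred_int = (y_hat - T_CRIT_9 * s_pred, y_hat + T_CRIT_9 * s_pred)

# e. s_b0 and a confidence interval for beta_0
s_b0 = s_e * math.sqrt(1 / n + xbar ** 2 / ssx)
ci_b0 = (b0 - T_CRIT_9 * s_b0, b0 + T_CRIT_9 * s_b0)

# f. ANOVA: F = MSR / MSE tests H0: beta_1 = 0
F = (ssr / 1) / (sse / (n - 2))

print(f"a. y-hat = {b0:.2f} + {b1:.4f} x")
print(f"b. s_e = {s_e:.2f}")
print(f"c, d. y-hat(60) = {y_hat:.2f}, CI = ({ci[0]:.1f}, {ci[1]:.1f}), "
      f"PI = ({pred_int[0]:.1f}, {pred_int[1]:.1f})")
print(f"e. s_b0 = {s_b0:.2f}, CI for beta_0 = ({ci_b0[0]:.1f}, {ci_b0[1]:.1f})")
print(f"f. F = {F:.2f} vs F(.05; 1, 9) = {F_CRIT_1_9}")
```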
6. A chain has the following data on prices, promotion expenses and sales of one product. (You can do
$\sum x_1 x_2$.)
Store  sales y  price x1  promotion x2    x1²      x2²          y²      x1y       x2y
    1     3842        59           200   3481    40000    14760964   226678    768400
    2     3754        59           400   3481   160000    14092516   221486   1501600
    3     5000        59           600   3481   360000    25000000   295000   3000000
    4     1916        79           200   6241    40000     3671056   151364    383200
    5     3224        79           200   6241    40000    10394176   254696    644800
    6     2618        79           400   6241   160000     6853924   206822   1047200
    7     3746        79           600   6241   360000    14032516   295934   2247600
    8     3825        79           600   6241   360000    14630625   302175   2295000
    9     1096        99           200   9801    40000     1201216   108504    219200
   10     1882        99           400   9801   160000     3541924   186318    752800
   11     2159        99           400   9801   160000     4661281   213741    863600
   12     2927        99           600   9801   360000     8567329   289773   1756200
Total    35989       968          4800  80852  2240000   121407527  2752491  15479600

$\bar y = 2999.08$, $\bar x_1 = 80.6667$ and $\bar x_2 = 400.000$.
a. Do a multiple regression of sales against $x_1$ and $x_2$. (10)
b. Compute $R^2$ and $R^2$ adjusted for degrees of freedom. Use a regression ANOVA to test the usefulness
of this regression. (6)
d. Use your regression to predict sales when price is 79 cents and promotion expenses are $200. (2)
e. Use the directions in the outline to make this estimate into a confidence interval and a prediction interval. (4)
f. If the regression of Price alone had the following output:

The regression equation is
sales = 7391 - 54.4 price

Predictor    Coef  SE Coef      T      P
Constant     7391     1133   6.52  0.000
price      -54.44    13.81  -3.94  0.003

S = 726.2   R-Sq = 60.9%   R-Sq(adj) = 56.9%

Analysis of Variance

Source          DF        SS       MS      F      P
Regression       1   8200079  8200079  15.55  0.003
Residual Error  10   5273437   527344
Total           11  13473517

Do an F-test to see if adding $x_2$ helped. (4) The next page is blank – please show your work.
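Below is a minimal Python sketch of problem 6 (standard library only). It solves the two normal equations in deviation form, $b_1 S_{11} + b_2 S_{12} = S_{1y}$ and $b_1 S_{12} + b_2 S_{22} = S_{2y}$, by Cramer's rule. It then builds $R^2$, adjusted $R^2$, the regression F, and the partial F for adding $x_2$, using SSE = 5273437 from the price-only output. Part e is left out because it depends on the outline's approximation formulas, which are not reproduced here.

```python
sales = [3842, 3754, 5000, 1916, 3224, 2618, 3746, 3825, 1096, 1882, 2159, 2927]
price = [59, 59, 59, 79, 79, 79, 79, 79, 99, 99, 99, 99]          # x1
promo = [200, 400, 600, 200, 200, 400, 600, 600, 200, 400, 400, 600]  # x2
n, k = len(sales), 2
SSE_PRICE_ONLY = 5273437   # Residual Error SS from the price-only regression


def dev_cross(a, b):
    """Sum of cross-products of deviations: sum(ab) - (sum a)(sum b)/n."""
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / n


s11, s22, s12 = dev_cross(price, price), dev_cross(promo, promo), dev_cross(price, promo)
s1y, s2y, syy = dev_cross(price, sales), dev_cross(promo, sales), dev_cross(sales, sales)

# a. Solve the 2 x 2 normal equations by Cramer's rule.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
b0 = (sum(sales) - b1 * sum(price) - b2 * sum(promo)) / n

# b. R-squared, adjusted R-squared and the regression ANOVA F.
ssr = b1 * s1y + b2 * s2y
sse = syy - ssr
r2 = ssr / syy
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
F = (ssr / k) / (sse / (n - k - 1))

# d. Point prediction at price 79 cents and promotion $200.
y_hat = b0 + b1 * 79 + b2 * 200

# f. Partial F for adding promotion (one extra predictor).
F_add = (SSE_PRICE_ONLY - sse) / (sse / (n - k - 1))

print(f"a. sales-hat = {b0:.2f} + ({b1:.4f}) x1 + {b2:.4f} x2")
print(f"b. R-sq = {r2:.4f}, adj R-sq = {r2_adj:.4f}, F = {F:.2f}")
print(f"d. predicted sales = {y_hat:.2f}")
print(f"f. partial F = {F_add:.2f}")
```

With these data $\sum x_1 x_2 = 387200 = n\bar x_1\bar x_2$, so $S_{12} = 0$. Price and promotion are therefore uncorrelated in this design, and the price coefficient equals its simple-regression value in the price-only output.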
7. The Lees present the following data on college students’ summer wages vs. years of work experience,
blocked by location.
             Years of Work Experience
Region        1     2     3
  1          16    19    24
  2          21    20    21
  3          18    21    22
  4          14    21    25
a. Do a 2-way ANOVA on these data and explain what hypotheses you test and what the
conclusions are. (9) (Or do a 1-way ANOVA for 6 points.) The following column sums are done for you:
$\sum x_1 = 69$, $\sum x_2 = 81$, $n_1 = 4$, $n_2 = 4$, $\sum x_1^2 = 1217$ and $\sum x_2^2 = 1643$. So $\bar x_1 = 17.25$ and $\bar x_2 = 20.25$.
b. Do a test of the equality of the means in columns 2 and 3 assuming that the columns are random
samples from Normal populations with equal variances (4).
c. Assume that columns 2 and 3 do not come from a Normal distribution and are not paired data
and do a test for equal medians. (4)
d. Test the following data for uniformity. $n = 20$.
Category   1   2   3   4   5
Numbers    0   2   0  10   8
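Below is a minimal Python sketch of problem 7 (standard library only):

- part a is the two-way ANOVA without replication, with regions as blocks;
- part b assumes that ‘columns 2 and 3’ are the two experience columns whose sums are given in part a, and uses those sums directly;
- part d uses a Kolmogorov-Smirnov test of uniformity. A chi-square goodness-of-fit test is the other common choice, but here each expected count would be only 4.

The critical values are standard 5% table values.

```python
import math

# Summer wages: rows are regions (blocks), columns are years of experience 1-3.
table = [
    [16, 19, 24],
    [21, 20, 21],
    [18, 21, 22],
    [14, 21, 25],
]
r, c = len(table), len(table[0])
correction = sum(map(sum, table)) ** 2 / (r * c)
F_CRIT_COLS, F_CRIT_ROWS = 5.14, 4.76   # F(.05; 2, 6) and F(.05; 3, 6)
T_CRIT_6 = 2.447                        # t(.025) with 6 df
KS_CRIT_20 = 0.294                      # one-sample K-S, n = 20, alpha = .05

# a. Two-way ANOVA with one observation per cell.
sst = sum(v * v for row in table for v in row) - correction
ssc = sum(sum(row[j] for row in table) ** 2 for j in range(c)) / r - correction
ssr = sum(sum(row) ** 2 for row in table) / c - correction
sse = sst - ssc - ssr
mse = sse / ((r - 1) * (c - 1))
F_cols = (ssc / (c - 1)) / mse    # H0: equal mean wages across experience levels
F_rows = (ssr / (r - 1)) / mse    # H0: equal mean wages across regions

# b. Pooled t test from the sums given in the problem.
n1 = n2 = 4
m1, m2 = 69 / n1, 81 / n2
ss1, ss2 = 1217 - n1 * m1 ** 2, 1643 - n2 * m2 ** 2
sp2 = (ss1 + ss2) / (n1 + n2 - 2)
t_stat = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# d. K-S test of uniformity: compare cumulative observed and uniform proportions.
counts = [0, 2, 0, 10, 8]
n = sum(counts)
cum, D = 0, 0.0
for i, f in enumerate(counts, start=1):
    cum += f
    D = max(D, abs(cum / n - i / len(counts)))

print(f"a. F(columns) = {F_cols:.3f} vs {F_CRIT_COLS}; F(rows) = {F_rows:.3f} vs {F_CRIT_ROWS}")
print(f"b. t = {t_stat:.3f} vs +/-{T_CRIT_6}")
print(f"d. D = {D:.3f} vs {KS_CRIT_20}")
```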