4/30/02 252x0242c ECO252 QBA2 Name

4/30/02 252x0242c
ECO252 QBA2
FINAL EXAM
May 6, 2002
Name
Hour of Class Registered (Circle)
I. (16+ points) Do all the following.
1. Hand in your fourth regression problem. (2 points)
Remember: Y = 'Vol' = volatility (std. deviation of return), X1 = 'CR' = credit rating on a zero-to-100
(per cent) scale, X2 = 'emd' = a dummy variable that is 1 if a country has an emerging market and 0 if the
country has a developed market, X3 = 'ecr' = the product of 'CR' and 'emd', X4 = 'gdp' = per capita
income in thousands of US dollars in the late 1990s, X5 = 'gd-cr' = the product of 'CR' and 'gdp'. We
would expect foreign exchange rates to become less volatile as i) credit rating improves, ii) markets
become developed, and iii) per capita income rises. Remember that saying 'yes' or 'no' to a question is
useless unless you cite statistical tests.
Use a significance level of 1% in this problem except when you are told otherwise.
2. Answer the following questions:
a. For the regression of 'Vol' against 'CR', 'emd', 'ecr', 'gdp' and 'gd-cr', which coefficients are significant
at the 5% level? Why? What about the 1% level? (3)
b. Given the comments at the beginning of this page, what signs would you expect the coefficients to
have? Do they have the expected signs? (4)
c. For the same regression, what does the ANOVA tell us? Why? (2)
d. In view of the analysis above, is there a regression that seems to work better than the one mentioned
in a) above? Why? (2)
3. The problem in the text says "Write a model that describes the relationship between volatility (Y) and
credit rating as two nonparallel lines, one for each type of market ……. Is there evidence to conclude
that the slope of the linear relationship between volatility (Y) and credit rating (X1) depends on market
type?"
a. What equation did you fit that answers the questions in the text? Given the coefficients that you
found, what are the two equations (and coefficients) that your equation implies for these two market
types? (3)
b. Using the 1% significance level, what evidence can you present as to whether the slope depends on
market type? (2)
4. What equation was suggested by your stepwise regression? Does this seem to work as well as the one
suggested by the textbook authors? Why? If you compare the slope of the regression line relating
volatility to the credit rating for countries with gdps of 2 (thousand) and 20 (thousand), what seems to be
happening to the slope as per capita gdp rises? (5)
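A sketch of the algebra behind questions 3 and 4, assuming the fitted models use the variables defined at the start of this problem (the b's stand for whatever coefficients your own regressions produced; this is not an answer key). With the interaction term 'ecr' = X3 = X1X2, a single fitted equation implies two nonparallel lines:

$$\widehat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3, \qquad X_3 = X_1 X_2$$
$$X_2 = 0 \text{ (developed)}: \quad \widehat{Y} = b_0 + b_1 X_1$$
$$X_2 = 1 \text{ (emerging)}: \quad \widehat{Y} = (b_0 + b_2) + (b_1 + b_3) X_1$$

so whether the slope depends on market type comes down to the t test of $H_0: \beta_3 = 0$. In the full five-variable regression, where 'gd-cr' = X5 = X1X4, the slope with respect to credit rating is

$$\frac{\partial \widehat{Y}}{\partial X_1} = b_1 + b_3 X_2 + b_5 X_4,$$

which can be evaluated at X4 = 2 and X4 = 20; drop any term whose variable is not in the equation you actually fitted.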
II. Do at least 4 of the following 7 problems (at least 15 points each), or do sections adding to at least 60 points. Anything extra you do helps, and grades wrap around. Show your work! State H0 and H1 where
applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing
appropriate statistical tests.
1. A researcher is investigating the behavior of the Dow-Jones Transportation, Industrial and Utility
averages. Data is presented below for closing numbers for 14 days in May 2001. Because the researcher
believes that the underlying distributions are not Normal, she computes rank correlations instead of standard
correlations.
For your convenience, ranks have been computed for Transportation and Industry.
a. Check the utilities for rises and falls in value, marking rises with + and falls with -. Using a statistical test,
find out if the pattern of rises and falls is random. (5)
b. Compute a rank correlation between industry and utility prices and test it for significance. (5)
c. Compute a measurement of concordance between the three series and test it for significance. Express it
on a zero to one scale. (6)
Row  Date  Trans x1  r1  Indust x2  r2  Util x3
1    5/07  2850.64   1   10935.2    7   383.93
2    5/08  2865.54   2   10888.5    5   378.74
3    5/09  2865.73   3   10867.0    2   383.74
4    5/10  2899.76   7   10910.4    6   383.52
5    5/11  2879.56   5   10821.3    1   386.64
6    5/14  2874.59   4   10877.3    4   391.04
7    5/15  2880.24   6   10873.0    3   385.70
8    5/16  2925.50   8   11215.9    10  387.52
9    5/17  2957.58   10  11248.6    11  387.84
10   5/18  2978.95   12  11301.7    13  391.54
11   5/21  3004.35   14  11337.9    14  394.43
12   5/22  2990.97   13  11257.2    12  394.67
13   5/23  2969.16   11  11105.5    8   398.31
14   5/24  2951.01   9   11122.4    9   397.68
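A minimal Python sketch of parts a to c, assuming the usual textbook formulas: a runs test on the signs of the day-to-day changes (normal approximation), Spearman's rank correlation with an approximate t statistic, and Kendall's coefficient of concordance W with its chi-square approximation. With only 13 signs and 14 days, the small-sample tables in the text are the better reference for the final decision; the statistics here are a check on the arithmetic. The helper name average_ranks is illustrative.

```python
import math

# Closing values from the table above (May 2001).
trans = [2850.64, 2865.54, 2865.73, 2899.76, 2879.56, 2874.59, 2880.24,
         2925.50, 2957.58, 2978.95, 3004.35, 2990.97, 2969.16, 2951.01]
indust = [10935.2, 10888.5, 10867.0, 10910.4, 10821.3, 10877.3, 10873.0,
          11215.9, 11248.6, 11301.7, 11337.9, 11257.2, 11105.5, 11122.4]
util = [383.93, 378.74, 383.74, 383.52, 386.64, 391.04, 385.70,
        387.52, 387.84, 391.54, 394.43, 394.67, 398.31, 397.68]


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


# a. Runs test on the signs of the 13 day-to-day changes in the utility average.
signs = ["+" if b > a else "-" for a, b in zip(util, util[1:])]
runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
n_plus, n_minus = signs.count("+"), signs.count("-")
n = n_plus + n_minus
mean_runs = 2 * n_plus * n_minus / n + 1
var_runs = 2 * n_plus * n_minus * (2 * n_plus * n_minus - n) / (n ** 2 * (n - 1))
z_runs = (runs - mean_runs) / math.sqrt(var_runs)

# b. Spearman rank correlation between the industrial and utility averages.
r_ind, r_util = average_ranks(indust), average_ranks(util)
m = len(util)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r_ind, r_util))
r_s = 1 - 6 * sum_d2 / (m * (m ** 2 - 1))
t_s = r_s * math.sqrt((m - 2) / (1 - r_s ** 2))  # approximate t, m - 2 df

# c. Kendall's coefficient of concordance W for the three series.
rank_sets = [average_ranks(trans), r_ind, r_util]
k = len(rank_sets)
row_sums = [sum(day) for day in zip(*rank_sets)]  # SR for each day
mean_sum = sum(row_sums) / m
S = sum((sr - mean_sum) ** 2 for sr in row_sums)
W = 12 * S / (k ** 2 * (m ** 3 - m))
chi2_W = k * (m - 1) * W  # chi-square approximation, m - 1 df

print("".join(signs), runs, round(z_runs, 3))
print(sum_d2, round(r_s, 4), round(t_s, 3))
print(S, round(W, 4), round(chi2_W, 2))
```

W is already on the zero-to-one scale that part c asks for.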
2. (Pelosi and Sandifer) A diaper company is testing three filler materials for diapers. Eight diapers were
tested with each of the three filler materials, making a total of 24 diapers put on 24 toddlers. Each column
(x1, x2, and x3) can be considered a random sample of eight taken from a Normally distributed
population. As each toddler played, fluid was injected into the diaper until the product leaked. Each number
below in x1, x2, and x3 represents the capacity of the diaper. The remaining columns (r1, r2, and r3) are
a ranking of the 24 numbers. In this entire problem we assume that the underlying distributions are Normal.
Row  x1   r1    x2   r2    x3   r3
1    792  5.0   809  12.0  826  18.0
2    790  2.0   818  17.0  813  15.5
3    797  6.0   803  8.5   854  23.0
4    803  8.5   781  1.0   843  20.0
5    811  13.5  813  15.5  846  21.0
6    791  3.5   808  11.0  847  22.0
7    801  7.0   805  10.0  835  19.0
8    791  3.5   811  13.5  872  24.0
The following are computed for you:
$\sum x_1 = 6376.00$, $\sum x_2 = 6448.00$, $\sum x_3 = 6736.00$,
$\sum x_1^2 = 5082066$, $\sum x_2^2 = 5197954$, $\sum x_3^2 = 5673944$ and $n_1 = n_2 = n_3 = 8$.
a. Compute the sample variances of x1 and x3 and test the hypothesis that the population variances for
these two columns are equal. (4)
b. Assume that the variances of the populations from which x2 and x3 come are equal and test the
hypothesis that $\mu_3$ is greater than $\mu_2$. i) First state your null and alternative hypotheses (2) and then test the
hypotheses using (ii) a test ratio, (iii) a critical value and (iv) a confidence interval. (6)
c. Test whether the hypothesis that the means of all three populations are equal holds water. (7)
d. Use a test of goodness of fit to see if x2 has the Normal distribution. (5)
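A sketch of parts a to d in Python, working from the raw columns (the sums above come out the same). It assumes a two-sided variance-ratio F with the larger variance on top, a pooled-variance t for part b, the usual one-way ANOVA decomposition for part c, and, for part d, a Lilliefors-style D statistic as the goodness-of-fit test; a chi-square test on grouped data is an alternative the text may prefer. Critical values (F with 7 and 7 df, t with 14 df, F with 2 and 21 df, the Lilliefors table) still come from your tables; the script only computes the test statistics.

```python
import math

x1 = [792, 790, 797, 803, 811, 791, 801, 791]
x2 = [809, 818, 803, 781, 813, 808, 805, 811]
x3 = [826, 813, 854, 843, 846, 847, 835, 872]
n = len(x1)  # 8 observations per column


def mean_and_ss(xs):
    """Return the mean and the corrected sum of squares, sum(x^2) - n * mean^2."""
    xbar = sum(xs) / len(xs)
    return xbar, sum(x * x for x in xs) - len(xs) * xbar ** 2


(m1, ss1), (m2, ss2), (m3, ss3) = (mean_and_ss(col) for col in (x1, x2, x3))

# a. Sample variances; F = larger / smaller with (7, 7) df.
var1, var3 = ss1 / (n - 1), ss3 / (n - 1)
F_var = max(var1, var3) / min(var1, var3)

# b. Pooled-variance t for H0: mu3 <= mu2 vs H1: mu3 > mu2, 14 df.
pooled = (ss2 + ss3) / (2 * n - 2)
se_diff = math.sqrt(pooled * (2 / n))
t_diff = (m3 - m2) / se_diff

# c. One-way ANOVA (equal group sizes, so the grand mean is the mean of the means).
grand = (m1 + m2 + m3) / 3
ss_between = n * sum((m - grand) ** 2 for m in (m1, m2, m3))
ss_within = ss1 + ss2 + ss3
F_anova = (ss_between / 2) / (ss_within / (3 * n - 3))

# d. Lilliefors-style D for x2, with the Normal fitted by the sample mean and sd.
s2 = math.sqrt(ss2 / (n - 1))
D = 0.0
for i, x in enumerate(sorted(x2), start=1):
    fe = 0.5 * (1 + math.erf((x - m2) / (s2 * math.sqrt(2))))  # Normal CDF at x
    D = max(D, i / n - fe, fe - (i - 1) / n)

print(var1, var3, round(F_var, 3), round(t_diff, 3))
print(ss_between, ss_within, round(F_anova, 2), round(D, 4))
```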
3. Data from the previous problem is repeated. In this problem assume that the underlying distributions are
not Normal. Remember that each column is an independent sample.
Row  x1   r1    x2   r2    x3   r3
1    792  5.0   809  12.0  826  18.0
2    790  2.0   818  17.0  813  15.5
3    797  6.0   803  8.5   854  23.0
4    803  8.5   781  1.0   843  20.0
5    811  13.5  813  15.5  846  21.0
6    791  3.5   808  11.0  847  22.0
7    801  7.0   805  10.0  835  19.0
8    791  3.5   811  13.5  872  24.0
The following are computed for you:
$\sum x_1 = 6376.00$, $\sum x_2 = 6448.00$, $\sum x_3 = 6736.00$,
$\sum x_1^2 = 5082066$, $\sum x_2^2 = 5197954$, $\sum x_3^2 = 5673944$ and $n_1 = n_2 = n_3 = 8$.
a. Test the hypothesis that the median of the population underlying x3 is larger than the median of the
population underlying x2. (6)
b. Test the hypothesis that all three columns come from populations with equal medians. (7)
c. Test the hypothesis that x2 comes from a population with a median of 804 using either a sign test (4) or a
Wilcoxon signed rank test (5).
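A Python sketch of the nonparametric versions, assuming part a uses the Wilcoxon rank sum (Mann-Whitney) test with x2 and x3 re-ranked together (the 24-value ranks in the table belong to part b), part b uses the Kruskal-Wallis H, and part c computes both the sign-test p-value and the Wilcoxon signed-rank sums. The normal approximation in part a is only a rough check for samples of eight; compare the rank sums with the small-sample tables in the text. The helper name average_ranks is illustrative.

```python
import math

x1 = [792, 790, 797, 803, 811, 791, 801, 791]
x2 = [809, 818, 803, 781, 813, 808, 805, 811]
x3 = [826, 813, 854, 843, 846, 847, 835, 872]


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


# a. Wilcoxon rank sum: re-rank the 16 values of x2 and x3 together.
pair_ranks = average_ranks(x2 + x3)
n2, n3 = len(x2), len(x3)
R2, R3 = sum(pair_ranks[:n2]), sum(pair_ranks[n2:])
mu_R3 = n3 * (n2 + n3 + 1) / 2
sigma_R = math.sqrt(n2 * n3 * (n2 + n3 + 1) / 12)
z_rank_sum = (R3 - mu_R3) / sigma_R  # large positive z favors median(x3) > median(x2)

# b. Kruskal-Wallis H from the ranks of all 24 observations.
all_ranks = average_ranks(x1 + x2 + x3)
N = len(all_ranks)
size = len(x1)  # every column has 8 observations
rank_sums = [sum(all_ranks[g * size:(g + 1) * size]) for g in range(3)]
H = 12 / (N * (N + 1)) * sum(R ** 2 / size for R in rank_sums) - 3 * (N + 1)

# c. Sign test and Wilcoxon signed-rank sums for H0: median of x2 = 804.
d = [x - 804 for x in x2 if x != 804]  # drop zero differences
n_d = len(d)
n_pos = sum(1 for v in d if v > 0)
smaller = min(n_pos, n_d - n_pos)
p_sign = min(1.0, 2 * sum(math.comb(n_d, j) for j in range(smaller + 1)) / 2 ** n_d)
abs_ranks = average_ranks([abs(v) for v in d])
T_plus = sum(r for r, v in zip(abs_ranks, d) if v > 0)
T_minus = sum(r for r, v in zip(abs_ranks, d) if v < 0)

print(R2, R3, round(z_rank_sum, 3), rank_sums, round(H, 3))
print(n_pos, round(p_sign, 4), T_plus, T_minus)
```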
4. (Pelosi and Sandifer) A survey on student drinking revealed the following:
Residence    Nonbinge   Infrequent      Frequent        Total
             Drinker    Binge Drinker   Binge Drinker
On Campus    35         29              47              111
Off Campus   49         31              24              104
Total        84         60              71              215
a. Test the hypothesis that the proportion in each of the three drinking categories is the same regardless of
where a student lives. (7)
b. Test the hypothesis that the proportion of infrequent binge drinkers is higher off campus than on campus.
(4)
c. The researcher believes that, nationwide, the proportion of frequent binge drinkers is 30%. Test to see if
the proportion on the campus profiled above is higher. (3)
d. Find a p-value for the result in c (2)
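A Python sketch of the four parts, assuming a chi-square test of homogeneity for a, a pooled two-proportion z test for b, and a one-sample proportion z test with its upper-tail p-value for c and d. Critical values are left to the tables; the helper name upper_tail is illustrative.

```python
import math

observed = [[35, 29, 47],   # on campus: nonbinge, infrequent, frequent
            [49, 31, 24]]   # off campus
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)

# a. Chi-square test of homogeneity: E = row total * column total / n.
chi2 = sum((observed[i][j] - row_tot[i] * col_tot[j] / n) ** 2
           / (row_tot[i] * col_tot[j] / n)
           for i in range(2) for j in range(3))
df = (2 - 1) * (3 - 1)


def upper_tail(z):
    """P(Z > z) for a standard Normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# b. H0: p_off <= p_on for infrequent binge drinkers, pooled-proportion z.
p_on, p_off = 29 / 111, 31 / 104
p_pool = (29 + 31) / (111 + 104)
z_b = (p_off - p_on) / math.sqrt(p_pool * (1 - p_pool) * (1 / 111 + 1 / 104))

# c, d. H0: p <= .30 for frequent binge drinkers on campus.
p_hat = 47 / 111
z_c = (p_hat - 0.30) / math.sqrt(0.30 * 0.70 / 111)
p_value_c = upper_tail(z_c)

print(round(chi2, 3), df, round(z_b, 3), round(z_c, 3), round(p_value_c, 4))
```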
5. A fast food corporation wishes to predict its mean weekly sales as a function of weekly traffic flow on the
street where the restaurant is and the city in which it is located. In the first version of the study, the data is as
below. y is 'sales' in thousands, x1 is 'flow', traffic flow in thousands of cars per week, and x2 is 1 if the store
is in city 2, zero otherwise. (Use α = .01.)
Row  y    x1    x2
1    6.4  59.3  0
2    6.7  60.3  0
3    7.7  82.1  0
4    2.9  32.3  0
5    9.5  98.0  0
6    6.0  54.1  0
7    6.2  54.4  0
8    5.0  51.4  0
9    3.5  36.7  0
10   8.4  75.9  1
11   5.2  48.4  1
12   3.9  41.5  1
13   5.5  52.6  1
14   4.1  41.1  1
15   3.2  29.6  1
16   5.4  49.5  1
The following data is computed for you
$\sum y = 89.6000$, $\sum x_1 = 867.200$, $\sum x_2 = 7.0$, $n = 16$, $\sum x_1^2 = 52023.3$,
$\sum x_2^2 = ?$, $\sum y^2 = 554.760$, $\sum x_1 y = 5358.62$, $\sum x_2 y = ?$,
$\sum x_1 x_2 = 338.60$.
You do not need all of these on this page.
a. Compute a simple regression of sales against flow. (7)
b. Given your equation, what sales do you expect when the flow is 60.00? (1)
c. Compute $R^2$. (4)
d. Compute $s_e$. (3)
e. Compute $s_{b_1}$ (the standard deviation of the slope) and do a significance test for $\beta_1$. (3)
f. Do a prediction interval for sales when the flow is 60. (3)
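A sketch of the simple-regression arithmetic in Python, using only the sums given above. It assumes the usual computing formulas ($S_{xx} = \sum x^2 - n\bar{x}^2$, $S_{xy} = \sum xy - n\bar{x}\bar{y}$, and so on) and hard-codes $t_{.005}(14) = 2.977$, read from a t table, for the α = .01 prediction interval.

```python
import math

# Column sums given in the problem (x = flow, y = sales).
n, sum_y, sum_x, sum_x2, sum_y2, sum_xy = 16, 89.6, 867.2, 52023.3, 554.76, 5358.62

xbar, ybar = sum_x / n, sum_y / n
Sxx = sum_x2 - n * xbar ** 2   # corrected sum of squares of flow
Sxy = sum_xy - n * xbar * ybar
SST = sum_y2 - n * ybar ** 2   # total sum of squares of sales

# a. Least-squares slope and intercept.
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

# b. Point prediction at flow = 60.
x0 = 60.0
y_hat = b0 + b1 * x0

# c, d. R-squared and the standard error of estimate (n - 2 df).
SSR = b1 * Sxy
R2 = SSR / SST
s_e = math.sqrt((SST - SSR) / (n - 2))

# e. Standard error of the slope and the t ratio for H0: beta1 = 0.
s_b1 = s_e / math.sqrt(Sxx)
t_b1 = b1 / s_b1

# f. Prediction interval for one new store at flow = 60.
t_crit = 2.977  # t(.005, 14) from a t table
s_pred = s_e * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / Sxx)
interval = (y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)

print(round(b0, 4), round(b1, 5), round(y_hat, 3), round(R2, 4))
print(round(s_e, 4), round(s_b1, 5), round(t_b1, 2), [round(v, 3) for v in interval])
```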
6. Data from the previous problem are repeated below. (Use α = .05.)
y is 'sales' in thousands, x1 is 'flow', traffic flow in thousands of cars per week, and x2 is 1 if the store is in
city 2, zero otherwise. (Use α = .01.)
Row  y    x1    x2
1    6.4  59.3  0
2    6.7  60.3  0
3    7.7  82.1  0
4    2.9  32.3  0
5    9.5  98.0  0
6    6.0  54.1  0
7    6.2  54.4  0
8    5.0  51.4  0
9    3.5  36.7  0
10   8.4  75.9  1
11   5.2  48.4  1
12   3.9  41.5  1
13   5.5  52.6  1
14   4.1  41.1  1
15   3.2  29.6  1
16   5.4  49.5  1
The following data is computed for you
$\sum y = 89.6000$, $\sum x_1 = 867.200$, $\sum x_2 = 7.0$, $n = 16$, $\sum x_1^2 = 52023.3$,
$\sum x_2^2 = ?$, $\sum y^2 = 554.760$, $\sum x_1 y = 5358.62$, $\sum x_2 y = ?$,
$\sum x_1 x_2 = 338.60$.
a. Do a multiple regression of sales against x1 and x2. (12)
b. Compute $R^2$ and $R^2$ adjusted for degrees of freedom for both this and the previous problem. Compare
the values of $R^2$ adjusted between this and the previous problem. Use an F test to compare $R^2$ here with
the $R^2$ from the previous problem. What does your F test suggest about the significance of the coefficient
of x2? (5)
c. Compute the regression sum of squares and use it in an F test to test the usefulness of this regression. (5)
d. Use your regression to predict sales in city 2 when flow is 60.00. (2)
e. Use the directions in the outline to make this estimate into a confidence interval and a prediction interval.
(4)
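A Python sketch of parts a to d, assuming the two-predictor normal equations are solved in deviation form by Cramer's rule. The two sums shown as '?' are filled in from the data table (x2 is a 0/1 dummy, so its sum of squares equals its sum), and the R-squared of the previous problem is recomputed from the same sums for the F test in part b. Critical F values are not computed.

```python
n = 16
sum_y, sum_x1, sum_x2 = 89.6, 867.2, 7.0
sum_y2, sum_x1sq, sum_x1y, sum_x1x2 = 554.76, 52023.3, 5358.62, 338.60
# The two "?" sums, computed from the data table.
y = [6.4, 6.7, 7.7, 2.9, 9.5, 6.0, 6.2, 5.0, 3.5, 8.4, 5.2, 3.9, 5.5, 4.1, 3.2, 5.4]
x2 = [0] * 9 + [1] * 7
sum_x2sq = sum(v * v for v in x2)
sum_x2y = sum(a * b for a, b in zip(x2, y))

# Corrected sums of squares and cross products.
m_y, m1, m2 = sum_y / n, sum_x1 / n, sum_x2 / n
S11 = sum_x1sq - n * m1 ** 2
S22 = sum_x2sq - n * m2 ** 2
S12 = sum_x1x2 - n * m1 * m2
S1y = sum_x1y - n * m1 * m_y
S2y = sum_x2y - n * m2 * m_y
SST = sum_y2 - n * m_y ** 2

# a. Solve [S11 S12; S12 S22][b1; b2] = [S1y; S2y] by Cramer's rule.
det = S11 * S22 - S12 ** 2
b1 = (S1y * S22 - S12 * S2y) / det
b2 = (S11 * S2y - S12 * S1y) / det
b0 = m_y - b1 * m1 - b2 * m2

# b, c. R-squared, adjusted R-squared and the overall F (k = 2 predictors).
k = 2
SSR = b1 * S1y + b2 * S2y
SSE = SST - SSR
R2 = SSR / SST
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)
F_overall = (SSR / k) / (SSE / (n - k - 1))

# b. F test of the gain over the simple regression (one added predictor).
R2_simple = S1y ** 2 / (S11 * SST)
F_gain = (R2 - R2_simple) / ((1 - R2) / (n - k - 1))

# d. Point prediction for city 2 (x2 = 1) at a flow of 60.
y_hat = b0 + b1 * 60 + b2 * 1

print(round(b0, 4), round(b1, 5), round(b2, 4), round(R2, 4), round(R2_adj, 4))
print(round(F_overall, 2), round(R2_simple, 4), round(F_gain, 3), round(y_hat, 3))
```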
7. The regression in the previous problem was run again, using data from four cities. Remember, y is 'sales'
in thousands, x1 is 'flow', traffic flow in thousands of cars per week. (Use α = .05.)
First it was run in the form $Y = b_0 + b_1 X_1$ with the following results.
The regression equation is
sales = 0.010 + 0.109 flow
Predictor       Coef      Stdev   t-ratio      p
Constant      0.0104     0.3583      0.03  0.977
flow        0.108570   0.006077     17.87  0.000
s = 0.5947      R-sq = 93.6%     R-sq(adj) = 93.3%
Analysis of Variance
SOURCE       DF        SS        MS         F      p
Regression    1    112.87    112.87    319.17  0.000
Error        22      7.78      0.35
Total        23    120.65
Then it was run again in the form $Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4$ with the following results:
The regression equation is
sales = - 0.178 + 0.105 flow + 0.199 city2 + 0.675 city3 + 1.17 city4
Predictor       Coef      Stdev   t-ratio      p
Constant     -0.1782     0.2941     -0.61  0.552
flow        0.105002   0.004475     23.47  0.000
city2         0.1991     0.2049      0.97  0.343
city3         0.6751     0.2745      2.46  0.024
city4         1.1717     0.2245      5.22  0.000
s = 0.3960      R-sq = 97.5%     R-sq(adj) = 97.0%
Analysis of Variance
SOURCE       DF        SS        MS        F      p
Regression    4   117.674    29.418   187.61  0.000
Error        19     2.979     0.157
Total        23   120.653
SOURCE       DF    SEQ SS
flow          1   112.873
city2         1     0.274
city3         1     0.254
city4         1     4.272
a) What does the ANOVA show? (2)
b) Do an F test to show whether location (adding x2 'city 2', x3 'city 3', and x4 'city 4' all at once) improves our
explanation of weekly sales. (4)
c) We have added dummy variables for cities 2, 3 and 4. Why didn't we add one for city 1? (1)
d) What are the predicted sales for a flow of 60 in city 3? What does it mean to say that the coefficient of
'city3' is .6751? (2)
e) Explain how the model would be modified to show interaction between city and traffic flow. (2)
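For parts b and d, a short Python check. It assumes the partial F test compares the two printouts by the extra sum of squares the three dummies add (their SEQ SS, which matches the drop in the error SS from 7.78 to 2.979 up to rounding), and that city 3 is coded city2 = 0, city3 = 1, city4 = 0.

```python
# Figures copied from the two Minitab printouts above.
sse_reduced, df_reduced = 7.78, 22     # sales on flow only
sse_full, df_full = 2.979, 19          # sales on flow, city2, city3 and city4
ss_added = 0.274 + 0.254 + 4.272       # SEQ SS of the three city dummies

q = df_reduced - df_full               # number of dummies added at once (3)
F_cities = (ss_added / q) / (sse_full / df_full)
# Compare F_cities with F(.05; 3, 19) from an F table.

# d. Predicted sales for city 3 (city2 = 0, city3 = 1, city4 = 0) at a flow of 60.
b0, b_flow, b_city3 = -0.1782, 0.105002, 0.6751
pred_city3 = b0 + b_flow * 60 + b_city3

print(round(sse_reduced - sse_full, 3), round(F_cities, 2), round(pred_city3, 3))
```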
f) An ANOVA was run to determine if management style affected the number of sick days taken by
employees. The research was done using 3 different management styles in five separate departments. The
dependent variable was the number of sick days taken by each employee. The Minitab output follows
(Pelosi and Sandifer):
Source        DF        SS        MS
Department     4   208.187    52.047
Mgt. Style     2   101.440    50.720
Interaction    8    44.293     5.537
Error         60    42.000     0.700
Total         74   395.920
Finish the Minitab table and explain what it shows. In particular, citing numbers in the table or from the F
table, does management style make a difference in the number of sick days that employees take, and does
the department in which management style is changed seem to have an effect? (5)
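Finishing the table amounts to dividing each mean square by the error mean square. A small Python sketch, assuming the usual fixed-effects F ratios for a two-factor design with interaction; the F-table critical values for (4, 60), (2, 60) and (8, 60) df are not computed here.

```python
# Degrees of freedom and sums of squares from the Minitab table above.
effects = {"Department": (4, 208.187), "Mgt. Style": (2, 101.440),
           "Interaction": (8, 44.293)}
df_error, ss_error = 60, 42.000
ss_total = 395.920

# The component sums of squares should add up to the total.
assert abs(sum(ss for _, ss in effects.values()) + ss_error - ss_total) < 1e-6

ms_error = ss_error / df_error
f_ratios = {name: (ss / df) / ms_error for name, (df, ss) in effects.items()}
for name, f in f_ratios.items():
    df, ss = effects[name]
    print(f"{name:12s} MS = {ss / df:8.3f}  F = {f:6.2f}")
```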
Copy Number
Name
8. Extra Credit - Questions on correlation.
Go back to problem 7. Use the R-sq in the first regression to find the correlation between sales and
traffic flow (0). Use the same significance level that you used on that problem.
a. Test the correlation between sales and flow for significance. (3)
b. Test the hypothesis that the correlation between sales and traffic flow is .9. (4)
c. Compute the partial correlation between sales and 'city4', $r_{Y4.123}$. (2)
d. It's no secret that not all the coefficients of the second regression in problem 7 were very
significant. I checked for (multi)collinearity by doing the following Minitab command:
MTB > corr c2 c3 c4 c5
Correlations (Pearson)
           flow    city2    city3
city2    -0.228
city3    -0.256   -0.243
city4     0.313   -0.329   -0.194
These results were also printed out as:
Matrix CORR1
            flow      city2      city3      city4
flow     1.00000   -0.22820   -0.25624    0.31345
city2   -0.22820    1.00000   -0.24254   -0.32918
city3   -0.25624   -0.24254    1.00000   -0.19389
city4    0.31345   -0.32918   -0.19389    1.00000
Explain what collinearity is and whether it is likely that collinearity influenced my results. (3)
e. Aczel reports the following regression results:
MTB > REGRESS 'export' on 4 'm1' 'lend' 'price' 'exch';
SUBC > DW.
………… (Most of output omitted)
Durbin-Watson statistic = 2.58
If $n = 67$, explain, telling your significance level, what we ought to conclude from this printout.
(3)
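A Python sketch of parts a to d, under these assumptions: n = 24 (the total df in the first Problem 7 printout is 23); r is the positive square root of R-sq because the slope on flow is positive; part b uses Fisher's z transformation; and part c uses the identity $r^2 = t^2/(t^2 + df)$ for a partial correlation, with the t ratio of 'city4' and 19 error df. For part d it computes variance inflation factors as the diagonal of the inverse of the predictor correlation matrix; the helper name invert is illustrative. The Durbin-Watson question in part e needs the $d_L$ and $d_U$ table and is not coded.

```python
import math

n = 24                 # total df of 23 in the first Problem 7 regression
r = math.sqrt(0.936)   # R-sq = 93.6%; positive slope, so r > 0

# a. t test of H0: rho = 0 with n - 2 df.
t_r = r * math.sqrt((n - 2) / (1 - r ** 2))


# b. Fisher z test of H0: rho = .9.
def fisher_z(v):
    return 0.5 * math.log((1 + v) / (1 - v))


z_rho = (fisher_z(r) - fisher_z(0.9)) * math.sqrt(n - 3)

# c. Partial correlation of sales with city4 from its t ratio (19 error df).
t4, df_err = 5.22, 19
r_y4 = math.copysign(math.sqrt(t4 ** 2 / (t4 ** 2 + df_err)), t4)

# d. Variance inflation factors from the predictor correlation matrix above.
names = ["flow", "city2", "city3", "city4"]
R = [[1.00000, -0.22820, -0.25624, 0.31345],
     [-0.22820, 1.00000, -0.24254, -0.32918],
     [-0.25624, -0.24254, 1.00000, -0.19389],
     [0.31345, -0.32918, -0.19389, 1.00000]]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fine for a small, well-conditioned matrix)."""
    size = len(m)
    a = [row[:] + [float(i == j) for j in range(size)] for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda row_i: abs(a[row_i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for i in range(size):
            if i != col:
                factor = a[i][col]
                a[i] = [vi - factor * vc for vi, vc in zip(a[i], a[col])]
    return [row[size:] for row in a]


R_inv = invert(R)
vif = {name: R_inv[i][i] for i, name in enumerate(names)}

print(round(t_r, 2), round(z_rho, 3), round(r_y4, 4))
print({name: round(v, 3) for name, v in vif.items()})
```

VIFs close to 1 mean each predictor shares little variance with the others; a common rule of thumb treats values well above 5 or 10 as a sign of troublesome collinearity.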