4/30/02 252x0242c
ECO252 QBA2
FINAL EXAM
May 6, 2002
Name ____________________    Hour of Class Registered (Circle)

I. (16+ points) Do all of the following.

1. Hand in your fourth regression problem. (2 points)

Remember: $Y$ = 'Vol' = volatility (standard deviation of return); $X_1$ = 'CR' = credit rating on a zero-to-100 (percent) scale; $X_2$ = 'emd' = a dummy variable that is 1 if a country has an emerging market and 0 if the country has a developed market; $X_3$ = 'ecr' = the product of 'CR' and 'emd'; $X_4$ = 'gdp' = per capita income in thousands of US dollars in the late '90s; $X_5$ = 'gd-cr' = the product of 'CR' and 'gdp'.

We would expect foreign exchange rates to become less volatile as (i) credit rating improves, (ii) markets become developed, and (iii) per capita income rises. Remember that saying 'yes' or 'no' to a question is useless unless you cite statistical tests. Use a significance level of 1% in this problem except when you are told otherwise.

2. Answer the following questions:
   a. For the regression of 'Vol' against 'CR', 'emd', 'ecr', 'gdp' and 'gd-cr', which coefficients are significant at the 5% level? Why? What about the 1% level? (3)
   b. Given the comments at the beginning of this page, what signs would you expect the coefficients to have? Do they have the expected signs? (4)
   c. For the same regression, what does the ANOVA tell us? Why? (2)
   d. In view of the analysis above, is there a regression that seems to work better than the one mentioned in a) above? Why? (2)

3. The problem in the text says "Write a model that describes the relationship between volatility (Y) and credit rating as two nonparallel lines, one for each type of market ... . Is there evidence to conclude that the slope of the linear relationship between volatility (Y) and credit rating (X1) depends on market type?"
   a. What equation did you fit that answers the questions in the text? Given the coefficients that you found, what are the two equations (and coefficients) that your equation implies for these two market types? (3)
   b. Using the 1% significance level, what evidence can you present as to whether the slope depends on market type? (2)

4. What equation was suggested by your stepwise regression? Does it seem to work as well as the one suggested by the textbook authors? Why? If you compare the slope of the regression line relating volatility to the credit rating for countries with gdps of 2 (thousand) and 20 (thousand), what seems to be happening to the slope as per capita gdp rises? (5)

II. Do at least 4 of the following 7 problems (at least 15 points each), or do sections adding to at least 60 points. Anything extra you do helps, and grades wrap around. Show your work! State $H_0$ and $H_1$ where applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing appropriate statistical tests.

1. A researcher is investigating the behavior of the Dow-Jones Transportation, Industrial and Utility averages. Data is presented below for closing numbers for 14 days in May 2001. Because the researcher believes that the underlying distributions are not Normal, she computes rank correlations instead of standard correlations. For your convenience, ranks have been computed for Transportation and Industry.
   a. Check the utilities for rises and falls in value, marking rises with + and falls with -. Using a statistical test, find out if the pattern of rises and falls is random. (5)
   b. Compute a rank correlation between industry and utility prices and test it for significance. (5)
   c. Compute a measurement of concordance between the three series and test it for significance. Express it on a zero to one scale. (6)

   Row  Date   Trans x1   r1   Indust x2   r2   Util x3
     1  5/07    2850.64    1     10935.2    7    383.93
     2  5/08    2865.54    2     10888.5    5    378.74
     3  5/09    2865.73    3     10867.0    2    383.74
     4  5/10    2899.76    7     10910.4    6    383.52
     5  5/11    2879.56    5     10821.3    1    386.64
     6  5/14    2874.59    4     10877.3    4    391.04
     7  5/15    2880.24    6     10873.0    3    385.70
     8  5/16    2925.50    8     11215.9   10    387.52
     9  5/17    2957.58   10     11248.6   11    387.84
    10  5/18    2978.95   12     11301.7   13    391.54
    11  5/21    3004.35   14     11337.9   14    394.43
    12  5/22    2990.97   13     11257.2   12    394.67
    13  5/23    2969.16   11     11105.5    8    398.31
    14  5/24    2951.01    9     11122.4    9    397.68

2. (Pelosi and Sandifer) A diaper company is testing three filler materials for diapers. Eight diapers were tested with each of the three filler materials, making a total of 24 diapers put on 24 toddlers. Each column ($x_1$, $x_2$ and $x_3$) can be considered a random sample of eight taken from a Normally distributed population. As each toddler played, fluid was injected into the diaper until the product leaked. Each number below in $x_1$, $x_2$ and $x_3$ represents the capacity of the diaper. The remaining columns ($r_1$, $r_2$ and $r_3$) are a ranking of the 24 numbers. In this entire problem we assume that the underlying distributions are Normal.

   Row   x1     r1    x2     r2    x3     r3
     1   792    5.0   809   12.0   826   18.0
     2   790    2.0   818   17.0   813   15.5
     3   797    6.0   803    8.5   854   23.0
     4   803    8.5   781    1.0   843   20.0
     5   811   13.5   813   15.5   846   21.0
     6   791    3.5   808   11.0   847   22.0
     7   801    7.0   805   10.0   835   19.0
     8   791    3.5   811   13.5   872   24.0

The following are computed for you: $\sum x_1 = 6376.00$, $\sum x_2 = 6448.00$, $\sum x_3 = 6736.00$, $\sum x_1^2 = 5082066$, $\sum x_2^2 = 5197954$, $\sum x_3^2 = 5673944$ and $n_1 = n_2 = n_3 = 8$.

   a. Compute the sample variances of $x_1$ and $x_3$ and test the hypothesis that the population variances for these two columns are equal. (4)
   b. Assume that the variances of the populations from which $x_2$ and $x_3$ come are equal and test the hypothesis that $\mu_3$ is greater than $\mu_2$. (i) First state your null and alternate hypotheses (2), and then test the hypotheses using (ii) a test ratio, (iii) a critical value and (iv) a confidence interval. (6)
   c. Test whether the hypothesis that the means of all three populations are equal holds water. (7)
   d. Use a test of goodness of fit to see if $x_2$ has the Normal distribution. (5)

3. Data from the previous problem is repeated. In this problem assume that the underlying distributions are not Normal. Remember that each column is an independent sample.

   Row   x1     r1    x2     r2    x3     r3
     1   792    5.0   809   12.0   826   18.0
     2   790    2.0   818   17.0   813   15.5
     3   797    6.0   803    8.5   854   23.0
     4   803    8.5   781    1.0   843   20.0
     5   811   13.5   813   15.5   846   21.0
     6   791    3.5   808   11.0   847   22.0
     7   801    7.0   805   10.0   835   19.0
     8   791    3.5   811   13.5   872   24.0

The following are computed for you: $\sum x_1 = 6376.00$, $\sum x_2 = 6448.00$, $\sum x_3 = 6736.00$, $\sum x_1^2 = 5082066$, $\sum x_2^2 = 5197954$, $\sum x_3^2 = 5673944$ and $n_1 = n_2 = n_3 = 8$.

   a. Test the hypothesis that the median of the population underlying $x_3$ is larger than the median of the population underlying $x_2$. (6)
   b. Test the hypothesis that all three columns come from populations with equal medians. (7)
   c. Test the hypothesis that $x_2$ comes from a population with a median of 804, using either a sign test (4) or a Wilcoxon signed rank test (5).

4. (Pelosi and Sandifer) A survey on student drinking revealed the following:

   Residence     Nonbinge   Infrequent      Frequent        Total
                 Drinker    Binge Drinker   Binge Drinker
   On Campus       35          29              47            111
   Off Campus      49          31              24            104
   Total           84          60              71            215

   a. Test the hypothesis that the proportion in each of the three drinking categories is the same regardless of where a student lives. (7)
   b. Test the hypothesis that the proportion of infrequent binge drinkers is higher off campus than on campus. (4)
   c. The researcher believes that, nationwide, the proportion of frequent binge drinkers is 30%. Test to see if the proportion on the campus profiled above is higher. (3)
   d. Find a p-value for the result in c. (2)

5. A fast food corporation wishes to predict its mean weekly sales as a function of weekly traffic flow on the street where the restaurant is and the city in which it is located. In the first version of the study, the data are as below: $y$ is 'sales' in thousands, $x_1$ is 'flow', traffic flow in thousands of cars per week, and $x_2$ is 1 if the store is in city 2, zero otherwise. (Use $\alpha = .01$.)

   Row    y     x1    x2
     1   6.4   59.3    0
     2   6.7   60.3    0
     3   7.7   82.1    0
     4   2.9   32.3    0
     5   9.5   98.0    0
     6   6.0   54.1    0
     7   6.2   54.4    0
     8   5.0   51.4    0
     9   3.5   36.7    0
    10   8.4   75.9    1
    11   5.2   48.4    1
    12   3.9   41.5    1
    13   5.5   52.6    1
    14   4.1   41.1    1
    15   3.2   29.6    1
    16   5.4   49.5    1

The following data are computed for you: $\sum y = 89.6000$, $\sum x_1 = 867.200$, $\sum x_2 = 7.0$, $n = 16$, $\sum x_1^2 = 52023.3$, $\sum x_2^2 = ?$, $\sum y^2 = 554.760$, $\sum x_1 y = 5358.62$, $\sum x_2 y = ?$, $\sum x_1 x_2 = 338.60$. You do not need all of these on this page.

   a. Compute a simple regression of sales against flow. (7)
   b. Given your equation, what sales do you expect when the flow is 60.00? (1)
   c. Compute $R^2$. (4)
   d. Compute $s_e$. (3)
   e. Compute $s_{b_1}$ (the standard deviation of the slope) and do a significance test for $\beta_1$. (3)
   f. Do a prediction interval for sales when the flow is 60. (3)

6. Data from the previous problem is repeated below. (Use $\alpha = .05$.) $y$ is 'sales' in thousands, $x_1$ is 'flow', traffic flow in thousands of cars per week, and $x_2$ is 1 if the store is in city 2, zero otherwise. (Use $\alpha = .01$.)

   Row    y     x1    x2
     1   6.4   59.3    0
     2   6.7   60.3    0
     3   7.7   82.1    0
     4   2.9   32.3    0
     5   9.5   98.0    0
     6   6.0   54.1    0
     7   6.2   54.4    0
     8   5.0   51.4    0
     9   3.5   36.7    0
    10   8.4   75.9    1
    11   5.2   48.4    1
    12   3.9   41.5    1
    13   5.5   52.6    1
    14   4.1   41.1    1
    15   3.2   29.6    1
    16   5.4   49.5    1

The following data are computed for you: $\sum y = 89.6000$, $\sum x_1 = 867.200$, $\sum x_2 = 7.0$, $n = 16$, $\sum x_1^2 = 52023.3$, $\sum x_2^2 = ?$, $\sum y^2 = 554.760$, $\sum x_1 y = 5358.62$, $\sum x_2 y = ?$, $\sum x_1 x_2 = 338.60$.

   a. Do a multiple regression of sales against $x_1$ and $x_2$. (12)
   b. Compute $R^2$ and $R^2$ adjusted for degrees of freedom for both this and the previous problem. Compare the values of $R^2$ adjusted between this and the previous problem. Use an F test to compare $R^2$ here with the $R^2$ from the previous problem. What does your F test suggest about the significance of the coefficient of $x_2$? (5)
   c. Compute the regression sum of squares and use it in an F test to test the usefulness of this regression. (5)
   d. Use your regression to predict sales in city 2 when flow is 60.00. (2)
   e. Use the directions in the outline to make this estimate into a confidence interval and a prediction interval. (4)

7. The regression in the previous problem was run again, using data from four cities. Remember, $y$ is 'sales' in thousands and $x_1$ is 'flow', traffic flow in thousands of cars per week. (Use $\alpha = .05$.)

First it was run in the form $Y = b_0 + b_1 X_1$ with the following results:

   The regression equation is
   sales = 0.010 + 0.109 flow

   Predictor      Coef      Stdev   t-ratio       p
   Constant     0.0104     0.3583      0.03   0.977
   flow       0.108570   0.006077     17.87   0.000

   s = 0.5947      R-sq = 93.6%      R-sq(adj) = 93.3%

   Analysis of Variance

   SOURCE       DF       SS       MS        F       p
   Regression    1   112.87   112.87   319.17   0.000
   Error        22     7.78     0.35
   Total        23   120.65

Then it was run again in the form $Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4$ with the following results:

   The regression equation is
   sales = - 0.178 + 0.105 flow + 0.199 city2 + 0.675 city3 + 1.17 city4

   Predictor      Coef      Stdev   t-ratio       p
   Constant    -0.1782     0.2941     -0.61   0.552
   flow       0.105002   0.004475     23.47   0.000
   city2        0.1991     0.2049      0.97   0.343
   city3        0.6751     0.2745      2.46   0.024
   city4        1.1717     0.2245      5.22   0.000

   s = 0.3960      R-sq = 97.5%      R-sq(adj) = 97.0%

   Analysis of Variance

   SOURCE       DF        SS       MS        F       p
   Regression    4   117.674   29.418   187.61   0.000
   Error        19     2.979    0.157
   Total        23   120.653

   SOURCE       DF    SEQ SS
   flow          1   112.873
   city2         1     0.274
   city3         1     0.254
   city4         1     4.272

   a) What does the ANOVA show? (2)
   b) Do an F test to show if location (adding $x_2$ 'city2', $x_3$ 'city3' and $x_4$ 'city4' all at once) improves our explanation of weekly sales. (4)
   c) We have added dummy variables for cities 2, 3 and 4. Why didn't we add one for city 1? (1)
   d) What sales are predicted for a flow of 60 in city 3? What does it mean to say that the coefficient of 'city3' is .6751? (2)
   e) Explain how the model would be modified to show interaction between city and traffic flow. (2)
   f) An ANOVA was run to determine if management style affected the number of sick days taken by employees. The research was done using 3 different management styles in five separate departments. The dependent variable was the number of sick days taken by each employee. The Minitab output follows (Pelosi and Sandifer):

      Source         DF        SS        MS
      Department      4   208.187    52.047
      Mgt. Style      2   101.440    50.720
      Interaction     8    44.293     5.537
      Error          60    42.000     0.700
      Total          74   395.920

      Finish the Minitab table and explain what it shows. In particular, citing numbers in the table or from the F table, does management style make a difference in the number of sick days that employees take, and does the department in which management style is changed seem to have an effect? (5)

8. Extra Credit: Questions on correlation. Go back to problem 7. Use the R-sq in the first regression to find the correlation between sales and traffic flow. (0) Use the same significance level that you used on that problem.
   a. Test the correlation between sales and flow for significance. (3)
   b. Test the hypothesis that the correlation between sales and traffic flow is .9. (4)
   c. Compute the partial correlation between sales and 'city4', $r_{Y4.123}$. (2)
   d. It's no secret that not all the coefficients of the second regression in problem 7 were very significant. I checked for (multi)collinearity with the following Minitab command:

      MTB > corr c2 c3 c4 c5

      Correlations (Pearson)
                flow    city2    city3
      city2   -0.228
      city3   -0.256   -0.243
      city4    0.313   -0.329   -0.194

      These results were also printed out as:

      Matrix CORR1
                 flow      city2      city3      city4
      flow    1.00000   -0.22820   -0.25624    0.31345
      city2  -0.22820    1.00000   -0.24254   -0.32918
      city3  -0.25624   -0.24254    1.00000   -0.19389
      city4   0.31345   -0.32918   -0.19389    1.00000

      Explain what collinearity is and whether it is likely that collinearity influenced my results. (3)
   e. Aczel reports the following regression results:

      MTB > REGRESS 'export' on 4 'm1' 'lend' 'price' 'exch';
      SUBC> DW.
      ...
      (Most of output omitted)
      Durbin-Watson statistic = 2.58

      If $n = 67$, explain, telling your significance level, what we ought to conclude from this printout. (3)
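For checking the arithmetic in the correlation questions of Problem 8 (parts a to d), a minimal Python sketch using only the standard library follows. It copies the figures from the Problem 7 printouts and the CORR1 matrix, and it assumes three standard techniques that may differ from the course outline: the t test $t = r\sqrt{n-2}/\sqrt{1-r^2}$ for part a, Fisher's z transformation for part b, the identity $r^2 = t^2/(t^2 + df_{error})$ for the partial correlation in part c, and variance inflation factors (the diagonal of the inverse correlation matrix) as a collinearity check for part d. The function names are hypothetical, and the t critical value 2.074 is taken from a standard t table rather than computed.

```python
import math
from statistics import NormalDist

# Figures copied from the Problem 7 printouts.
N = 24                 # Total DF = 23, so n = 24 stores
R_SQ_SIMPLE = 0.936    # R-sq of sales = b0 + b1 flow
T_CITY4 = 5.22         # t-ratio of city4 in the four-variable regression
DF_ERROR_FULL = 19     # Error DF of the four-variable regression

def corr_t_stat(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def fisher_z_stat(r: float, rho0: float, n: int) -> float:
    """Approximately standard Normal statistic for H0: rho = rho0 (Fisher's z)."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)

def partial_corr_from_t(t: float, df_error: int) -> float:
    """Partial correlation of Y with one regressor, given its t-ratio (sign follows t)."""
    return math.copysign(math.sqrt(t * t / (t * t + df_error)), t)

def vifs(corr):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    k = len(corr)
    # Augment with the identity and run Gauss-Jordan elimination with partial pivoting.
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(corr)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for i in range(k):
            if i != col:
                f = a[i][col]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[col])]
    # The right half of the augmented matrix is now the inverse.
    return [a[i][k + i] for i in range(k)]

# Correlation matrix of flow, city2, city3, city4 (Matrix CORR1).
CORR1 = [
    [1.00000, -0.22820, -0.25624, 0.31345],
    [-0.22820, 1.00000, -0.24254, -0.32918],
    [-0.25624, -0.24254, 1.00000, -0.19389],
    [0.31345, -0.32918, -0.19389, 1.00000],
]

r = math.sqrt(R_SQ_SIMPLE)            # positive, because the flow slope is positive
t = corr_t_stat(r, N)
z = fisher_z_stat(r, 0.9, N)
z_crit = NormalDist().inv_cdf(0.975)  # two-sided test, alpha = .05
t_crit = 2.074                        # t(.025, 22) from a t table

print(f"r = {r:.4f}, t = {t:.2f} vs critical {t_crit}")
print(f"Fisher z = {z:.2f} vs critical {z_crit:.2f}")
print(f"r_Y4.123 = {partial_corr_from_t(T_CITY4, DF_ERROR_FULL):.4f}")
print("VIFs (flow, city2, city3, city4):", [round(v, 3) for v in vifs(CORR1)])
```

Gauss-Jordan elimination is written out only to keep the sketch free of third-party packages; `numpy.linalg.inv` applied to the same matrix would give the same diagonal. Because R-sq is printed to only one decimal place, the t computed here differs slightly from the 17.87 in the printout.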