4/27/01 252x0143

ECO252 QBA2 FINAL EXAM
May 2, 2001

Name
Hour of Class Registered (Circle)

I. (16+ points) Do all the following.

1. Hand in your fourth regression problem. (2 points)
   Remember: Y = Company profit in millions of dollars, X1 = CEO's yearly income in thousands of dollars (X1 = 1000 means a million-dollar annual income), X2 = Percentage of stock owned by CEO (X2 = 3 means the CEO owns 3.0% of the stock). Use a significance level of 10% in this problem.

2. Answer the following questions.
   a. For the regression of Y against X1 and X2 only, what does the ANOVA tell us? Which of the coefficients are significant? What tells you this? (3)
   b. Do an F test to show if the addition of X1 and X3 improves the regression over your results with X2 alone. (4)
   c. Based on your regression of Y against X1, X2, and X3,
      (i) What evidence is there that CEO income and stock percentage interact? (1)
      (ii) What change does this equation predict for every one thousand dollars of CEO income when the CEO owns 5% of the company's stock? (3)
      (iii) What profit does the equation predict for a firm where the CEO earns $1.2 million and owns 40% of the stock? What might this lead you to suspect about this equation? (2)
      (iv) Based only on the adjusted R-squared and the significance of the coefficients, is there an equation that seems to work better than the equation with three independent variables? Why? (3)

II. Do at least 4 of the following 7 problems (at least 15 each), or do sections adding to at least 60 points. Anything extra you do helps, and grades wrap around. Show your work! State $H_0$ and $H_1$ where applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing appropriate statistical tests.

1.
(Black, p532) A researcher wishes to predict the price of a meal in New Orleans ($y$) on the basis of location ($x_1$, a dummy variable: 1 if the restaurant is in the French Quarter, 0 otherwise) and the probability of being seated on arrival ($x_2$). The data are below. (Use $\alpha = .10$.)

   Row   price   FQ   prob
           y     x1    x2
     1    8.52    0   0.62
     2   21.45    1   0.43
     3   16.18    1   0.58
     4    6.21    0   0.74
     5   12.19    1   0.19
     6   25.62    1   0.49
     7   13.90    0   0.80
     8   18.66    1   0.75
     9    5.25    0   0.37
    10   14.85    1   0.32

   The following are given to help you: $\sum y = 142.83$, $\sum y^2 = 2427.52$, $\sum x_1 = 6$, $\sum x_1^2 = 6$, $\sum x_2 = 5.29$, $\sum x_2^2 = 3.1713$, $\sum x_1 y = ?$, $\sum x_2 y = 75.1651$, $\sum x_1 x_2 = ?$ and $n = 10$. You do not need all of these.

   a. Compute a simple regression of price against $x_1$. (7)
   b. On the basis of this regression, what price do you expect to pay for a meal in the French Quarter? Outside the French Quarter? (2)
   c. Compute $R^2$. (4)
   d. Compute $s_e$. (3)
   e. Compute $s_{b_0}$ (the standard deviation of the intercept) and do a confidence interval for $\beta_0$. (3)
   f. Do a confidence interval for the price of a meal in the French Quarter. (3)

2. Data from the previous problem is repeated below. (Use $\alpha = .10$.)

   Row   price   FQ   prob
           y     x1    x2
     1    8.52    0   0.62
     2   21.45    1   0.43
     3   16.18    1   0.58
     4    6.21    0   0.74
     5   12.19    1   0.19
     6   25.62    1   0.49
     7   13.90    0   0.80
     8   18.66    1   0.75
     9    5.25    0   0.37
    10   14.85    1   0.32

   The following are given to help you: $\sum y = 142.83$, $\sum y^2 = 2427.52$, $\sum x_1 = 6$, $\sum x_1^2 = 6$, $\sum x_2 = 5.29$, $\sum x_2^2 = 3.1713$, $\sum x_1 y = ?$, $\sum x_2 y = 75.1651$, $\sum x_1 x_2 = ?$ and $n = 10$.

   a. Do a multiple regression of price against $x_1$ and $x_2$. (12)
   b. Compute $R^2$ and $R^2$ adjusted for degrees of freedom for both this and the previous problem. Compare the values of $R^2$ adjusted between this and the previous problem. Use an F test to compare $R^2$ here with the $R^2$ from the previous problem. (4)
   c. Compute the regression sum of squares and use it in an F test to test the usefulness of this regression. (5)
   d. Use your regression to predict the price of a meal in the French Quarter sold when the probability of being seated on arrival is 30%. (2)
   e.
   Use the directions in the outline to make this estimate into a confidence interval and a prediction interval. (4)

3. An airline wants to select a computer package for its reservation system. Over 20 weeks it tries the four commercially available reservation system packages and records as $x_1$, $x_2$, $x_3$, and $x_4$ the number of passengers bumped by each system. It will choose the package with the smallest average bumps, assuming that there is a significant difference between the median or average number of bumps. The data below are in the columns labeled x, the original numbers, and, in the r columns, their ranks on a 1 to 20 scale. Below this I have given you the sums of the columns, the number of items in each column, the mean for each column and the sum of the squared numbers (ssq) in each column. The columns are independent samples. Use a 5% significance level.

              P1             P2             P3             P4
   Row      x1     r1      x2     r2      x3     r3      x4     r4
   1        12   15.0       2    2.0      20   20.0       7    7.5
   2        14   17.0       4    4.0       9    9.5       6    6.0
   3         9    9.5       7    7.5       5    5.0      15   18.0
   4        11   13.0       3    3.0      10   11.5      12   15.0
   5        16   19.0       1    1.0      12   15.0
   6                                      10   11.5

   sum      62.0           17.0           66             40
   count     5              5              6              4
   mean     12.4            3.4           11             10
   ssq     798.0           79.0          850            454

   a. Assume that the underlying distribution is Normal and test for a significant difference between the means. (7)
   b. Assume that the underlying distribution is not Normal and test for a significant difference between the medians. (5)
   c. Find the mean and standard deviation for column P3 and test column P3 for a Normal distribution. (5)

4. The data from the previous page is repeated. Use a 5% significance level.

              P1             P2             P3             P4
   Row      x1     r1      x2     r2      x3     r3      x4     r4
   1        12   15.0       2    2.0      20   20.0       7    7.5
   2        14   17.0       4    4.0       9    9.5       6    6.0
   3         9    9.5       7    7.5       5    5.0      15   18.0
   4        11   13.0       3    3.0      10   11.5      12   15.0
   5        16   19.0       1    1.0      12   15.0
   6                                      10   11.5

   sum      62.0           17.0           66             40
   count     5              5              6              4
   mean     12.4            3.4           11             10
   ssq     798.0           79.0          850            454

   a. Assume that the underlying distribution is Normal and test columns 1 and 3 for differences in means.
   Assume identical variances. Use (i) a test ratio, (ii) a critical value and (iii) a confidence interval. (6)
   b. Assume that the underlying distribution is not Normal and test for a significant difference between the medians of columns 1 and 3. (4)
   c. Assume again that the distributions are Normal and test that the variances are the same. (3)
   d. Test column P3 to see if its standard deviation is 6. (3)

5. a. A machine fills a sample of 100 one-pound boxes of a product and they are later tested to see how many are over or under the desired one-pound size. The manufacturer wishes to test whether exactly half of the population of boxes is over the one-pound mark and that the occurrence of boxes that are 'over' and 'under' is random. In the sample there are 53 boxes that are 'over' and 47 that are 'under', and there are 45 runs of 'overs' or 'unders'.
      (i) Test that the proportion of 'overs' is 50%. (2)
      (ii) Test that the sequence of 'overs' and 'unders' is random. (5)
   b. A series of 24 observations are used to calculate a simple regression with three independent variables. We calculate a Durbin-Watson statistic of 0.471. Is autocorrelation present? Is it positive or negative? (3)
   c. We are testing to see if the mean of a Normally distributed population with a known variance of 20 is 5. We take a sample of 100 and find that the sample mean is 11. Given these results, what is the p-value of our result if (i) the null hypothesis is H 0 : 5 , (ii) the null hypothesis is H 0 : 5 , (iii) the null hypothesis is H 0 : 5 (6)

6. An electronics chain reports the following data on number of households, sales volume and number of customers for 10 stores.

   Row   hshlds   sales   cust
           x1       x2     x3
     1     149     308    302
     2      80      98     51
     3     123     230    202
     4     108     144    102
     5     152     270    252
     6     221     440    507
     7     167     344    352
     8     192     378    452
     9     220     410    402
    10      89     188    153

   $\sum x_1 = 1501.0$, $\sum x_1^2 = 248413$, $\sum x_2 = 2810.0$, $\sum x_2^2 = 909268$, $\sum x_1 x_2 = 472810$

   a) Compute the correlation between households and sales and test it for significance. (5)
   b) Test the same correlation to see if it is .86. (5)
   c) Compute the rank correlation between households and sales and test it for significance. (5)
   d) Compute Kendall's W for households, sales and customers and test it for significance. (6)

7. A producer of filters is getting complaints about the quality of the filters it is producing. It thus examines 1000 filters from each of its three shifts and discovers for shift 1 39 defects, for shift 2 43 defects and for shift 3 58 defects.
   a) Test the hypothesis that the proportion of defective filters is the same for all three shifts at the 95% level. (7)
   b) Test the hypothesis that the defect rate is higher for the third shift than the first. (3)
   c) Find a p-value for your result in b). (2)
   d) Do a confidence interval for the difference between the proportion defective for shifts 1 and shift 2. (4)
   e) Extra credit: If I do a regression with sales as the dependent variable and households and customers as independent variables, what sort of results would I be likely to get? Why? (3)
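
The arithmetic behind a test like the one asked for in problem 7 a) can be sketched in Python. This is only an illustration under stated assumptions, not the solution from the course outline: it assumes a chi-squared test of homogeneity on the 3 x 2 table of defective and non-defective filters, and it uses the fact that for 2 degrees of freedom the chi-squared upper-tail probability is exp(-x/2). The function name `chi2_homogeneity` is hypothetical.

```python
import math


def chi2_homogeneity(defects, n_per_shift):
    """Chi-squared statistic for equal defect proportions across shifts.

    Assumes every shift has the same sample size n_per_shift.
    """
    total_defects = sum(defects)
    total_items = n_per_shift * len(defects)
    p_pooled = total_defects / total_items      # pooled defect rate under H0
    exp_def = n_per_shift * p_pooled            # expected defects per shift
    exp_good = n_per_shift * (1 - p_pooled)     # expected non-defects per shift
    stat = 0.0
    for d in defects:
        stat += (d - exp_def) ** 2 / exp_def
        stat += ((n_per_shift - d) - exp_good) ** 2 / exp_good
    return stat


defects = [39, 43, 58]                  # counts from the problem statement
stat = chi2_homogeneity(defects, 1000)
df = len(defects) - 1                   # (rows - 1) * (columns - 1) = 2

# Closed form valid only for df = 2: P(chi2 > x) = exp(-x / 2)
p_value = math.exp(-stat / 2)
critical_05 = -2 * math.log(0.05)       # 5% critical value for df = 2
reject = stat > critical_05

print(f"chi2 = {stat:.4f}, df = {df}, critical = {critical_05:.4f}")
print(f"p-value = {p_value:.4f}, reject H0 at 5%: {reject}")
```

With these counts the statistic comes to about 4.51 against a 5% critical value of about 5.99, so under these assumptions the hypothesis of equal proportions would not be rejected. For other degrees of freedom the closed-form tail probability does not apply and a chi-squared table would be needed.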