252x0541 4/22/05
ECO252 QBA2 Final EXAM May 2-6, 2005
TAKE HOME SECTION

Name: _________________________
Student Number: _________________________
Class days and time: _________________________

1) Please Note: computer problems 2, 3 and 4 should be turned in with the exam (2). In problem 2, the 2-way ANOVA table should be checked. The three F tests should be done with a 5% significance level, and you should note whether there was (i) a significant difference between drivers, (ii) a significant difference between cars and (iii) significant interaction. In problem 3, you should show on your third graph where the regression line is. Check what your text says about normal probability plots and analyze the plot you did. Explain the results of the t and F tests using a 5% significance level. (2)

2) 4th computer problem (4+)
This is an internet project. You are trying to answer the question, ‘how well does manufacturing explain differences in income?’ You should use some measure of income per person or family in each state as your dependent variable and try to explain it as a function of (to start with) percent of output or labor force in manufacturing. This should start out as a simple regression. Then you should try to see whether there are other variables that explain the differences as well. One possibility is the per cent of the adult population with college or high school diplomas.

Possible sources of data are below, but think about what you use, and try to find some other sources. Total income of a state, for example, is a very poor choice compared with some per capita measure, because it is simply going to be high for places with a lot of people without indicating how well off they are. Similarly, the fraction of the workforce with a certain education level is far better than the number. For instructions on how to do a regression, try the material in Doing a Regression.

http://www.nam.org/s_nam/sec.asp?CID=5&DID=3
    Manufacturing share in state economies
    (http://www.nam.org/Docs/IEA/26767_20002001ManufacturingShareandChangeinStateEconomies.pdf?DocTypeID=9&TrackID=&Param=@CategoryID=1156@TPT=2002-2001+Manufacturing+Share+and+Change+in+State+Economics)
http://www.nemw.org/data.htm
    Per capita income by state.
http://www.nemw.org/data.htm
    State personal income per capita.
http://www.bea.doc.gov/bea/regional/data.htm
    Personal income per capita by state.
http://www.census.gov/statab/www/
    Many state statistics, including persons with bachelor’s degrees.
http://www.epinet.org/content.cfm/datazone_index
    Income inequality, median income, unemployment rates.

Anyway, your job is to add whatever variable you think ought to explain your income measure. Consider all 50 states your sample. Your report should tell what numbers you used, from where and from what years. What coefficients were significant, and do you think on the basis of your results that manufacturing is an important predictor of a state’s prosperity? Mark all significant F and t coefficients using a 5% significance level. Explain VIFs.

Of course, if you don’t like this assignment, get approval to research something else on the internet. For example, does the per cent of the population in prison affect the crime rate (maybe with a few years’ lag)? Or are there better predictors? And get out the Durbin-Watson; prison vs. crime rate is a time series project. [8]

3) Hotshot Associates is afraid of sex discrimination charges and collects the data below. The dependent variable is income in thousands of dollars, and the two independent variables are education in years and a dummy variable indicating sex (1 means a female). The lines in the middle are missing because the totals are reliable and you don’t need them. The only thing that is missing is you.
Add yourself to the sample as a 21st observation with 12 years of education and an income of 100.0 (thousand) plus the last two digits of your student number as hundreds. For example, Roland Dough’s student number is 123689, so he adds $8900 to $100000 to get 108900, which he records as 108.9.

Row      $y$    $x_1$  $x_2$  $x_1^2$  $x_2^2$     $y^2$   $x_1 y$  $x_2 y$  $x_1 x_2$
         INC    EDUC    SEX
  1     39.0      2      0        4        0     1521.00     78.0      0.0         0
  2     43.7      4      1       16        1     1909.69    174.8     43.7         4
  3     62.6      8      0       64        0     3918.76    500.8      0.0         0
  4     42.8      8      1       64        1     1831.84    342.4     42.8         8
  5     55.0      8      0       64        0     3025.00    440.0      0.0         0
 (rows 6 to 16 are not shown)
 17     72.9     16      0      256        0     5314.41   1166.4      0.0         0
 18     56.1     16      1      256        1     3147.21    897.6     56.1        16
 19     67.1     17      0      289        0     4502.41   1140.7      0.0         0
 20     82.3     21      0      441        0     6773.29   1728.3      0.0         0
Total 1168.5    241      7     3285        7    70091.67  14783.9    370.6        81

a. Compute the regression equation $\hat{Y} = b_0 + b_1 x_1$ to predict salaries on the basis of education only. (2)
b. Compute $R^2$. (2)
c. Compute $s_e$. (2)
d. Compute $s_{b_1}$ and do a significance test on $b_1$. (1.5)
e. Compute $s_{b_0}$ and do a confidence interval for $b_0$. (1.5)
f. You are about to hire your nephew for the summer and want to know how much to pay him. He has 14 years of education. Using this, create a prediction interval for his salary. Explain why a confidence interval for the salary is inappropriate. (3)
g. Do an ANOVA table for the regression. What conclusion can you draw from the hypothesis test in the ANOVA? (2) [22]

Extra credit from here on.

h. Do a multiple regression of income against education and sex. (5)
i. Compute R-squared and R-squared adjusted for degrees of freedom for this regression and compare them with the values for the previous problem. (2)
j. Using either the R-squared values or SST, SSR and SSE, do F tests (ANOVA). First check the usefulness of the simple regression and then the value of ‘sex’ as an improvement to the regression. How should this impact Hotshot Associates’ discrimination problem? (Don’t say a word without referring to a statistical test.) (3)
k. Predict what you will pay your nephew now.
How much change is there from your last prediction? (2)

4) An airport authority wants to compare training of air traffic controllers at three locations. The data are given below. To personalize these data, add the last two digits of your student number as a 9th number to column C.

a. Compare the performance of locations A, B, and C assuming that the underlying distribution is non-Normal. (4) [26]
b. Use a one-way ANOVA to test the hypothesis of equal means. (5) It is legitimate to check your results by computer, but I expect to see hand computations every step of the way. [31]
c. (Extra Credit) Decide between the methods that you used in a) and b). To do this, test for equal variances and for Normality on the computer. What is your decision? Why? [4]

You can do most of this with the following commands in Minitab if you put your data in 3 columns of Minitab with A, B, and C above them.

    MTB > AOVOneway A B C         #Does a 1-way ANOVA
    MTB > stack A B C c11;        #Stacks the data in c11, col.no. in c12.
    SUBC> subscripts c12;
    SUBC> UseNames.
    MTB > rank c11 c13            #Puts the ranks of the stacked data in c13
    MTB > vartest c11 c12         #Does a bunch of tests, including Levene’s,
                                  # on stacked data in c11 with IDs in c12.
    MTB > Unstack (c13);          #Unstacks the ranks in the next 5 available
    SUBC> Subscripts c12;         # columns. Uses IDs in c12.
    SUBC> After;
    SUBC> VarNames.
    MTB > NormTest 'A';           #Does a test (apparently Lilliefors) for Normality
    SUBC> KSTest.                 # on column A.

Data for Problem 4

Row    A    B    C
  1   96   65   60
  2   82   74   73
  3   88   72   85
  4   70   66   61
  5   90   79   79
  6   91   82   85
  7   87   73   88
  8   88        79

This might help.

    MTB > sum c1
    Sum of A
    Sum of A = 692
    MTB > ssq c1
    Sum of Squares of A
    Sum of squares (uncorrected) of A = 60278
    MTB > sum c2
    Sum of B
    Sum of B = 511
    MTB > ssq c2
    Sum of Squares of B
    Sum of squares (uncorrected) of B = 37535
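
The sums and uncorrected sums of squares above are enough to recover each column's mean and sample variance, using s^2 = (sum of x^2 - n * mean^2) / (n - 1). The following is a minimal Python sketch for checking hand computations on columns A and B. It is not part of the exam, and the function name `mean_and_variance` is a hypothetical helper chosen for illustration.

```python
def mean_and_variance(total, sum_sq, n):
    """Return (mean, sample variance) from a column's sum and uncorrected sum of squares."""
    mean = total / n
    # Corrected sum of squares: sum(x^2) - n * mean^2
    corrected_ss = sum_sq - n * mean ** 2
    return mean, corrected_ss / (n - 1)


# Values taken from the Minitab output for Problem 4 (A has 8 rows, B has 7).
mean_a, var_a = mean_and_variance(692, 60278, 8)
mean_b, var_b = mean_and_variance(511, 37535, 7)

print("A: mean =", mean_a, "variance =", var_a)
print("B: mean =", mean_b, "variance =", round(var_b, 3))
```

Column C is left out because its sum and sum of squares depend on the 9th value taken from your student number; the same function applies once you have those two totals.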