252x0771 11/26/07 (Page layout view!)
ECO252 QBA2 THIRD EXAM November 29, 2007 Version 1
Name ______________________ Student number_______________ Class Day and hour____________

I. (8 points) Do all the following (2 points each unless noted otherwise). Make Diagrams! Show your work! $x \sim N(26, 14)$
1. $P(20 \le x \le 38)$
2. Px 0
3. $P(32 \le x \le 76)$
4. $x_{.075}$

II. (22+ points) Do all the following (2 points each unless noted otherwise). Do not answer a question ‘yes’ or ‘no’ without giving reasons. Show your work when appropriate. Use a 5% significance level except where indicated otherwise. Note that this is extremely long and that no one will do all the problems, so look them over!

1. Turn in your computer problems 2 and 3 marked as requested in the Take-home. (5 points, 2 point penalty for not doing.)

2. In an ordinary 1-way ANOVA, if the computed F statistic is below the value from the F table at the given significance level, we can
a. Reject the null hypothesis because the difference between the means is not significant.
b. Reject the null hypothesis because there is evidence of a significant difference between some of the means.
c. Not reject the null hypothesis because the difference between the means is not significant.
d. Not reject the null hypothesis because the difference between the means is significant.
e. Not reject the null hypothesis because the difference between the variances is not significant.
f. Not reject the null hypothesis because the difference between the variances is significant.
g. None of the above. [7]

3. After an analysis of variance, you would use the Tukey-Kramer procedure or similar confidence intervals to check
a. For Normality
b. For equality of variances
c. For independence of error terms
d. For pairwise differences in means
e. For all of the above
f. For none of the above

4. If an ordinary one-way ANOVA has 25 columns, 17 rows and $17 \times 25 = 425$ observations, the degrees of freedom for the F test are
a. 400 and 24 b. 408 and 16 c.
24 and 400 d. 16 and 408 e. 400 and 424 f. 408 and 424 g. 424 and 400 h. 424 and 408 i. 16 and 24 j. None of the above.
The correct answer is _______.

5. Assuming that your answer to 4 is correct and that the significance level is 5%, the correct value of F from the table is _______. (This may have to be approximate. If so, what did you use?) (1) [12]

Exhibit 1
A realtor believes that the selling price of a home (in $ thousands) is related to the condition of the home (on a 1 to 10 scale) and the size of the home (in hundreds of square feet). He runs the data below on Minitab and gets the following.

Row  Price  Size  Condition
  1    360    23          5
  2    200    11          2
  3    340    20          9
  4    280    17          3
  5    280    15          8
  6    330    21          4
  7    380    24          7
  8    250    13          6

MTB > regress c1 2 c2 c3

Regression Analysis: Price versus Size, Condition

The regression equation is
Price = 64.5 + 11.7 Size + 4.88 Condition

Predictor     Coef  SE Coef      T      P
Constant    64.539    4.228  15.27  0.000
Size       11.7282   0.2317  50.62  0.000
Condition   4.8826   0.4494  _____  _____

S = 2.75997   R-Sq = 99.9%   R-Sq(adj) = 99.8%

Analysis of Variance
Source          DF     SS     MS        F      P
Regression       2  25712  12856  1687.70  0.000
Residual Error   5     38      8
Total            7  25750

Source     DF  Seq SS
Size        1   24813
Condition   1     899

The sum of the Price column is 2420 and the sum of the squared numbers in the Price column is not needed. The sum of the ‘Size’ column is 144 and the sum of the squared numbers in the Size column is 2750. The sum of the ‘Condition’ column is 44 and the sum of the squared numbers in the Condition column is 284. If Price is the dependent variable and Size and Condition are the independent variables we have found that the sum of $x_1 y$ is 45540 and the sum of $x_1 x_2$ is 818. The sum of $x_2 y$ has not been computed.

6 and 7. In the multiple regression, are the coefficients of size and condition significant at the 5% significance level? Give reasons. Do not do unneeded computations. (2) [15]

8.
Assuming that the coefficients in the multiple regression are correct, what price would we predict for a home with 20 (hundred) square feet and a condition score of 9? (1)

9. Using the information in the multiple regression printout, make your result in 8) into a rough prediction interval. (2)

10. Using the information in the printout, what is the value of R-squared for a regression of ‘Price’ against ‘Size’ alone? (2) [20]

11. Do a simple regression of ‘Price’ against ‘Condition’ alone. Show your work!
a) Compute the sum $\sum xy$ that you will need for this regression. Don’t compute stuff that has already been done for you! (2)
b) It says that you do not need to know the sum of squares in the Price column. You do however need the spare part $SS_y = \sum Y^2 - n\bar{Y}^2$. Without doing any computing, tell what its value is. (1)
c) Compute the coefficients of the equation $\hat{Y} = b_0 + b_2 x$ to predict the value of ‘Price’ on the basis of ‘Condition.’ (4) [27]
d) Compute $R^2$. (3)
e) Is the slope of the simple regression significant at the 5% level? Do not answer this question without appropriate calculations!
(4)
f) Predict the price of an average home with a condition of 9 and make your estimate into an appropriate 99% interval. (4)
g) Do an analysis of variance using your SST, SSE and SSR for this equation or using 1, $R^2$ and $1 - R^2$. What have you already done that makes this table redundant? If you don’t know what redundant means, ask! (3) [43]
h) Using the information on Regression Sums of squares or $R^2$ and $1 - R^2$ in the ANOVA that you just did and from the multiple regression, do an F test to see if adding ‘Size’ to the regression of ‘Price’ against ‘Condition’ is worthwhile. Do not waste our time by repeating stuff that has already been done. (3) [46]

Exhibit 2 (Groebner)
A product is being produced on 3 different lines using 3 different layouts for the lines. A sample of 36 observations are taken on various days over a period of four weeks so that there are 12 observations for the daily output for each line evenly divided between the three possible layouts. Assume $\alpha = .05$.

MTB > Twoway c4 c2 c3;
SUBC> Means c2 c3.

Two-way ANOVA: output 1 versus line, layout

Source       DF       SS       MS      F      P
line          2    187.1     93.5   0.22  0.804
layout        2  28263.4  14131.7  33.21  0.000
Interaction  __  _______    _____   ____  _____
Error        __  11489.0    425.5
Total        35  41874.6

S = 20.63   R-Sq = 72.56%   R-Sq(adj) = 64.43%

               Individual 95% CIs For Mean Based on Pooled StDev
line     Mean  ------+---------+---------+---------+--
1     132.583        (---------------*--------------)
2     128.167   (--------------*--------------)
3     127.417  (--------------*---------------)
               ------+---------+---------+---------+--
                   120.0     128.0     136.0     144.0

                 Individual 95% CIs For Mean Based on Pooled StDev
layout     Mean  ----+---------+---------+---------+-----
1       116.667        (----*----)
2       168.250                            (----*----)
3       103.250  (----*----)
                 ----+---------+---------+---------+-----
                    100       125       150       175

12. Fill in the missing degrees of freedom, the missing sum of squares and the missing mean square. (2) [48]

13.
Is there significant interaction between ‘line’ and ‘layout’? Don’t answer unless you can tell me what the evidence is. (2)

14. Is the difference between lines significant? Why? (1)

15. Do a confidence interval of your choice for the difference between layout 1 and layout 3. Tell what kind of interval you are using, what its characteristics are and whether it shows a significant difference. (4) [55]

16. (Groebner) An industrial firm analyses the amount of breakage (in dollar cost) that occurs using 3 different shipping methods. There is a strong likelihood that the data does not come from the Normal distribution. The purpose of the test is to see if the three shipping methods differ in breakage. The columns can be considered random samples.
Rail Plane Truck 7960 8053 8818 8399 7764 9432 9429 9196 9260 6022 5821 5676
The most appropriate method for doing this test is:
a) The Friedman Test
b) The Kruskal-Wallis Test
c) One-way ANOVA
d) Two-way ANOVA
e) The sign test
f) Another test (Name it!) [57]

17. Assume that your decision is correct in 16. What is your null hypothesis or hypotheses? Be specific! Are you talking about rows or columns or both? Are you comparing means, medians, proportions or variances?

18. OK. Let’s see you do the test. (4) [63]

ECO252 QBA2 THIRD EXAM Nov 26-29, 2007
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
Class days and time : _________________________

Please Note: Computer problems 2 and 3 should be turned in with the exam (2). In problem 2, the 2 way ANOVA table should be checked. The three F tests should be done with a 1% significance level and you should note whether there was (i) a significant difference between drivers, (ii) a significant difference between cars and (iii) significant interaction.
In problem 3, you should show on your third graph where the regression line is. You should explain whether the coefficients are significant at the 1% level. Check what your text says about normal probability plots and analyze the plot you did. Explain the results of the t and F tests using a 5% significance level. (3)

III Do the following. (22+ points) Note: Look at 252thngs on the syllabus supplement part of the website before you start (and before you take exams). Show your work! State $H_0$ and $H_1$ where appropriate. You have not done a hypothesis test unless you have stated your hypotheses, run the numbers and stated your conclusion. (Use a 95% confidence level unless another level is specified.) Answers without reasons or accompanying calculations usually are not acceptable. Neatness and clarity of explanation are expected. This must be turned in when you take the in-class exam. Note that from now on neatness means paper neatly trimmed on the left side if it has been torn, multiple pages stapled and paper written on only one side. Show your work!

1) The Lees, in their book on statistics for Finance majors, ask about the relationship of gasoline prices $y$ in cents per gallon to crude oil prices $x_1$ in dollars per barrel and present the data for the years 1975 – 1988. I have obtained most of the data for the years 1980 – 2007. It is presented below.
Row  Year  GasPrice  CrudePrice  Yr-1979
  1  1980      1.25       26.07        1
  2  1981      1.38       35.24        2
  3  1982      1.30       31.87        3
  4  1983      1.24       26.99        4
  5  1984      1.21       28.63        5
  6  1985      1.20       26.25        6
  7  1986      0.93       14.55        7
  8  1987      0.95       17.90        8
  9  1988      0.96       14.67        9
 10  1989      1.02       17.97       10
 11  1990      1.16       22.22       11
 12  1991      1.14       19.06       12
 13  1992      1.13       18.43       13
 14  1993      1.11       16.41       14
 15  1994      1.11       15.59       15
 16  1995      1.15       17.23       16
 17  1996      1.23       20.71       17
 18  1997      1.23       19.04       18
 19  1998      1.06       12.52       19
 20  1999      1.17       17.51       20
 21  2000      1.51       28.26       21
 22  2001      1.46       22.95       22
 23  2002      1.36       24.10       23
 24  2003      1.59       28.53       24
 25  2004      1.88       36.98       25
 26  2005      2.30       50.23       26
 27  2006         *           *
 28  2007      3.10       90.00

This data set also contains the year with 1979 subtracted from it ($x_2$). You may need to use this later. Ignore it in Problem 1. Note that the numbers for 2006 have not yet been published in my source, Statistical Abstract of the United States, and that the numbers for 2007 are my estimates for third quarter prices. These are unleaded prices, which the Lees did not use. You are supposed to use only the numbers for 1990 through 2006 and one other observation for your data. You will thus have $n = 17$ observations. The other observation is the value for the year $1980 + a$, where $a$ is the second to last digit of your student number. If you are unsure of the data that you are using or if you want help with the sums that you need to do the regression go to 3takehome072a. Show your work – it is legitimate to check your results by running the problem on the computer. (In fact, I will give you 2 points extra credit for checking it and annotating the output for significance tests etc.) But I expect to see hand computations for every part of this problem.
a. Compute the regression equation $Y = b_0 + b_1 x$ to predict the price of gasoline on the basis of crude oil prices. (3)
b. Compute $R^2$. (2)
c. Compute $s_e$. (2)
d. Compute $s_{b_1}$ and do a significance test on $b_1$. (2)
e. Compute a confidence interval for $b_0$. (2)
f.
You have a crude price for 2007. Using this, predict the gasoline price for 2007 and create a prediction interval for the price of gasoline for that year. Explain why a confidence interval for the price is inappropriate and check to see if my estimated price is in the interval. (3)
g. Do an ANOVA for this regression. (3)
h. Make a graph of the data. Show the trend line and the data points clearly. If you are not willing to do this neatly and accurately, don’t bother. (2) [19]

2) Now we can use the date to see if there is a trend line in addition to the effect of crude oil.
a. Do a multiple regression of the price of gasoline against crude prices and the date variable, which has been massaged to make 1980 year 1. This involves a simultaneous equation solution. Attempting to recycle $b_1$ from the previous page won’t work. (7)
b. Compute the regression sum of squares and use it in an ANOVA F test to test the usefulness of this regression. (4)
c. Compute $R^2$ and $R^2$ adjusted for degrees of freedom for both this and the previous problem. Compare the values of $R^2$ adjusted between this and the previous problem. Use an F test to compare $R^2$ here with the $R^2$ from the previous problem. The F test here is one to see if adding a new independent variable improves the regression. This can also be done by modifying the ANOVAs in b. (4)
d. Use your regression to predict the price of gasoline in 2007. Is this closer to the estimated gasoline price? Do a confidence interval and a prediction interval. (3) [37]
e. Again there is extra credit for checking your results on the computer. Use the pull-down menu or try Regress GasPrice on 2 CrudePrice Yr-1979 (2)

3) According to Russell Langley, three sopranos were discussing their recent performances. Fifi noted that she got 36 curtain calls at La Scala last week, but Adelina put her down with the fact that she got 39.
Could one of the singers really say that she had more curtain calls than another or could the differences just be due to chance? Personalize the data below by adding the last digit of your student number to each number in the first row. Use a 10% significance level throughout this question.

Row  Fifi  Adelina  Maria
  1    36       39     21
  2    22       14     32
  3    19       20     28
  4    16       18     22

a) State your hypothesis and use a method to compare means assuming that each column represents a random sample of curtain calls at La Scala. (4)
b) Still assuming that these are random samples, use a method that compares medians instead. (3)
c) Actually, these were not random samples. Though row 1 represents curtain calls at La Scala (Milan), row 2 was in Venice, row 3 in Naples and row 4 in Rome. Will this affect our results? Does this show anything about audiences in the four cities? Use an appropriate method to compare medians. (5)
d) Do two different types of confidence intervals between Milan and the least enthusiastic opera house. Explain the difference between the intervals. (2)
e) Assume that we want to compare medians instead. How does the fact that these data were collected at four opera houses affect the results? (3)
f) Do you prefer the methods that compare medians or means? Don’t answer this unless you can demonstrate an informed opinion. (1)
g) (Extra credit) Do a Levene test on these data and explain what it tests and shows. (3)
h) (Extra credit) Check your work on the computer. This is pretty easy to do. Use the same format as in Computer Problem 2, but instead of car and driver numbers use the singers’ and cities’ names. You can use the stat and ANOVA pull-down menus for One-way ANOVA, two-way ANOVA and comparison of variances of the columns. You can use the stat and the non-parametrics pull-down menu for Friedman and Kruskal-Wallis. You also probably ought to test columns for Normality.
Use the Statistics pull-down menu and basic statistics to find the normality tests. The Kolmogorov-Smirnov option is actually Lilliefors. The ANOVA menu can check for equality of variances. In light of these tests was ANOVA appropriate? You can get descriptions of unfamiliar tests by using the Help menu and the alphabetic command list or the Stat guide. (Up to 7) [58] You should note conclusions on the printout – tell what was tested and what your conclusions are using a 10% significance level.
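
For students without Minitab at hand, the computer check in part h) could be roughed out in plain Python. The sketch below is only an illustration under stated assumptions: it uses the unpersonalized curtain-call data (last digit of the student number taken as 0), every function and variable name is hypothetical, and the Kruskal-Wallis statistic is computed without a tie correction. It produces the one-way ANOVA F, the Kruskal-Wallis H and the Friedman statistic only; critical values still have to come from the tables.

```python
# Hedged sketch: test statistics for the soprano data, unpersonalized
# (assumes the last digit of the student number is 0). All names are illustrative.

calls = {
    "Fifi":    [36, 22, 19, 16],
    "Adelina": [39, 14, 20, 18],
    "Maria":   [21, 32, 28, 22],
}


def midranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def one_way_f(columns):
    """One-way ANOVA F statistic: (SSB / (c - 1)) / (SSW / (n - c))."""
    n = sum(len(col) for col in columns)
    c = len(columns)
    grand_mean = sum(sum(col) for col in columns) / n
    means = [sum(col) / len(col) for col in columns]
    ssb = sum(len(col) * (m - grand_mean) ** 2 for col, m in zip(columns, means))
    ssw = sum(sum((x - m) ** 2 for x in col) for col, m in zip(columns, means))
    return (ssb / (c - 1)) / (ssw / (n - c))


def kruskal_wallis_h(columns):
    """Kruskal-Wallis H on pooled ranks, without tie correction."""
    pooled = [x for col in columns for x in col]
    ranks = midranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for col in columns:
        rank_sum = sum(ranks[start:start + len(col)])
        total += rank_sum ** 2 / len(col)
        start += len(col)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def friedman_chi2(columns):
    """Friedman statistic; rows (cities) are blocks, columns (singers) are treatments."""
    r, c = len(columns[0]), len(columns)
    rank_sums = [0.0] * c
    for row in zip(*columns):
        for j, rk in enumerate(midranks(list(row))):
            rank_sums[j] += rk
    return 12 / (r * c * (c + 1)) * sum(s ** 2 for s in rank_sums) - 3 * r * (c + 1)


columns = list(calls.values())
f_stat = one_way_f(columns)
h_stat = kruskal_wallis_h(columns)
fr_stat = friedman_chi2(columns)
print(f"F = {f_stat:.4f}, H = {h_stat:.4f}, Friedman = {fr_stat:.4f}")
```

To personalize the data, add the last digit of the student number to the first entry of each singer's list before running. The statistics here are no substitute for the hand computations and stated hypotheses that the exam asks for.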