252y0431 4/30/04
ECO252 QBA2 THIRD HOUR EXAM
Apr 16, 2004

Name: KEY    Hour of Class Registered: ____

I. (30+ points) Do all the following (2 points each unless noted otherwise). Do not answer a question 'yes' or 'no' without giving reasons.

1. Turn in your computer problems 2 and 3 marked as requested in the take-home. (5 points, 2-point penalty for not doing.)

2. (Dummeldinger) As part of a study to investigate the effect of helmet design on football injuries, head width measurements were taken for 30 subjects randomly selected from each of 3 groups (high school football players, college football players, and college students who do not play football, so that there are a total of 90 observations) with the object of comparing the typical head widths of the three groups. If the researchers are reluctant to assume that the data in each of these three groups comes from a Normally distributed population, they should use the following method.
a. *The Kruskal-Wallis test
b. One-way ANOVA
c. The Friedman test
d. Two-way ANOVA

3. Assume that the researchers ignore your advice, whether right or wrong, in Problem 2. If one-way ANOVA is used, how many degrees of freedom apply to the Within sum of squares? [9]
Solution: Because $n = 90$, there are a total of $n - 1 = 89$ degrees of freedom. Three columns use up $m - 1 = 2$ degrees of freedom, leaving $n - m = 87$.

4. (Berenson et al.) In a study of drive-through times at fast-food chains, the following was recorded (in seconds): $n_1 = n_2 = n_3 = n_4 = n_5 = 20$, $\bar{x}_{.1} = 150$, $\bar{x}_{.2} = 167$, $\bar{x}_{.3} = 169$, $\bar{x}_{.4} = 171$, $\bar{x}_{.5} = 172$, where 1 = Wendy's, 2 = McDonald's, 3 = Checkers, 4 = Burger King, 5 = Long John Silver's. $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$.

One-Way ANOVA
Source     DF    SS      MS      F Statistic   p Value
Between     4    ????    ????    ????          3.24067E-08
Within     95    12407   130.6
Total      99

You do not need to fill in any of the omitted data. Does the ANOVA show a significant difference between drive-through times? Why? (2) [11]
Solution: Because the p-value is .0000000324067, which is less than any value of $\alpha$ that we might use, reject the null hypothesis of equal mean drive-through times for the five chains.

5. From the above ANOVA and the means given above, do the mean times for McDonald's and Long John Silver's differ significantly? Use a Tukey method. (5)
All right, you Tukeys, here's the answer. The formula given in the last graded assignment was
$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm q_{\alpha}(m, n-m)\,\frac{s}{\sqrt{2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.
This gives rise to Tukey's HSD (Honestly Significant Difference) procedure: two sample means $\bar{x}_{.1}$ and $\bar{x}_{.2}$ are significantly different if $|\bar{x}_{.1} - \bar{x}_{.2}|$ is greater than $q_{\alpha}(m, n-m)\,\frac{s}{\sqrt{2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$. We have for McDonald's $\bar{x}_{.2} = 167$ and for Silver's $\bar{x}_{.5} = 172$. We can read $n_1 = n_2 = n_3 = n_4 = n_5 = 20$, $s = \sqrt{MSW} = \sqrt{130.6} = 11.428$, and $n - m = 95$ from the printout. From the Tukey table, $q_{.05}(m, n-m) = q_{.05}(5, 95) = 3.93$. The confidence interval is thus
$\mu_2 - \mu_5 = (\bar{x}_{.2} - \bar{x}_{.5}) \pm q_{.05}(5, 95)\,\frac{s}{\sqrt{2}}\sqrt{\frac{1}{n_2} + \frac{1}{n_5}} = (167 - 172) \pm 3.93\,(11.428)\sqrt{\frac{1}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = -5 \pm 44.912(0.2236) = -5 \pm 10.04$.
Since 10.04 is larger in absolute value than -5, this confidence interval includes zero and the difference is not significant.
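(Not part of the original exam: a minimal sketch cross-checking the HSD allowance in Problem 5. It assumes SciPy 1.7 or later, which provides the studentized range distribution used for the Tukey table value.)

```python
# Cross-check of the Tukey HSD allowance in Problem 5 (illustrative sketch).
from math import sqrt
from scipy.stats import studentized_range

m, n = 5, 100                 # 5 chains, 20 observations each
msw, df_within = 130.6, n - m
q = studentized_range.ppf(0.95, m, df_within)       # ~3.93, as in the table
s = sqrt(msw)                                       # ~11.428
allowance = q * (s / sqrt(2)) * sqrt(1/20 + 1/20)
print(q, allowance)                 # allowance ~ 10.04
print(abs(167 - 172) > allowance)   # False -> difference not significant
```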
Exhibit 1: A large national bank charges local companies for using its services. A bank official reported the results of a regression analysis designed to predict the bank's charges (Y) -- measured in dollars per month -- for services rendered to local companies. One independent variable used to predict the service charge to a company is the company's sales revenue (X) -- measured in millions of dollars. Data for 21 companies who use the bank's services were used to fit the model. The analyst took the Minitab output home to check out, but it fell into a puddle and all that he (or I) can read is below.

The regression equation is
Y = -2700 + 20.00 X

Predictor      Coef     Stdev     t-ratio       p
Constant    -2700.0   -------     -------   0.600
X            20.000   -------     -------   0.034

s = 65.00    R-sq = ----    R-sq(adj) = ----

6. Referring to Exhibit 1, interpret the estimate of $\beta_0$, the Y-intercept of the line.
a) All companies will be charged at least $2,700 by the bank.
b) *There is no practical interpretation, since a sales revenue of $0 is a nonsensical value.
c) About 95% of the observed service charges fall within $2,700 of the least squares line.
d) For every $1 million increase in sales revenue, we expect a service charge to decrease $2,700.

7. Referring to Exhibit 1, interpret the p-value for testing whether $\beta_1$ exceeds 0.
a) *There is sufficient evidence (at the $\alpha$ = 0.05 level) to conclude that sales revenue (X) is a useful linear predictor of service charge (Y).
b) There is insufficient evidence (at the $\alpha$ = 0.10 level) to conclude that sales revenue (X) is a useful linear predictor of service charge (Y).
c) Sales revenue (X) is a poor predictor of service charge (Y).
d) For every $1 million increase in sales revenue, we expect a service charge to increase $0.034.

8. Referring to Exhibit 1, a 95% confidence interval for $\beta_1$ is (15, 30). Interpret the interval.
a) We are 95% confident that the mean service charge will fall between $15 and $30 per month.
b) We are 95% confident that the sales revenue (X) will increase between $15 and $30 million for every $1 increase in service charge (Y).
c) *We are 95% confident that the average service charge (Y) will increase between $15 and $30 for every $1 million increase in sales revenue (X).
d) At the $\alpha$ = 0.05 level, there is no evidence of a linear relationship between service charge (Y) and sales revenue (X). [22]

Exhibit 2: The marketing manager of a company producing a new cereal aimed at children wants to examine the effect of the color and shape of the box's logo on the approval rating of the cereal. He combined 4 colors and 3 shapes to produce a total of 12 designs. Each logo was presented to 2 different groups (a total of 24 groups) and the approval rating for each was recorded and is shown below. The manager analyzed these data using the $\alpha$ = 0.05 level of significance for all inferences.

                         COLORS
SHAPES       Red      Green     Blue     Yellow
Circle      54 44     67 61    36 44     45 41
Square      34 36     56 58    36 30     21 25
Diamond     46 48     60 60    34 38     31 33

Analysis of Variance
Source         df    SS        MS        F       p
Colors          3   2711.17    903.72   72.30   0.000
Shapes          2    579.00    289.50   23.16   0.000
Interaction     6    150.33     25.055   2.044
Error          12    150.00     12.500
Total          23   3590.50

9. Referring to Exhibit 2, fill in the first 5 missing numbers (not the missing p-value). (3)
Answer: The filled-in values are the Interaction df and Error df, the Interaction MS and F, and the Error MS. Note that 6 is the product of 3 and 2 and that 12 makes the df column add up. The MS column can be found by dividing SS by df. The F column is the MS values divided by MSW = 12.5, so MSW can be gotten or checked from the two values of F that are supplied.

10. Referring to Exhibit 2, assume that your degrees of freedom are correct and find the 5% value of F on the table that would be used to test if the interaction is significant. What is your conclusion and why? (3) [28]
Answer: $F_{.05}(6, 12) = 3.00$. Since the computed F (2.044) is below the table F, we do not reject the null hypothesis and conclude that the interaction is not significant.
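(Not part of the original exam: a minimal sketch, assuming NumPy is available, that rebuilds the Exhibit 2 sums of squares from the raw cell data; it reproduces the SS column of the table above.)

```python
# Rebuild the Exhibit 2 two-way ANOVA sums of squares from the raw cells.
import numpy as np

# data[shape][color] = the two replicate ratings per cell
data = np.array([
    [[54, 44], [67, 61], [36, 44], [45, 41]],   # Circle
    [[34, 36], [56, 58], [36, 30], [21, 25]],   # Square
    [[46, 48], [60, 60], [34, 38], [31, 33]],   # Diamond
])  # shape: (3 shapes, 4 colors, 2 replicates)

grand = data.mean()
ss_total = ((data - grand) ** 2).sum()                          # 3590.50
ss_colors = 6 * ((data.mean(axis=(0, 2)) - grand) ** 2).sum()   # 2711.17
ss_shapes = 8 * ((data.mean(axis=(1, 2)) - grand) ** 2).sum()   #  579.00
ss_error = ((data - data.mean(axis=2, keepdims=True)) ** 2).sum()  # 150.00
ss_inter = ss_total - ss_colors - ss_shapes - ss_error          #  150.33
print(ss_total, ss_colors, ss_shapes, ss_inter, ss_error)
```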
Exhibit 3 (Mendenhall et al.): The president of a local company has asked the vice presidents of the firm to provide an analysis of the business climate of 4 states that may be considered for the location of a manufacturing facility. Each VP rates the state's business climate on a 1-10 scale with 10 as outstanding and 1 as unacceptable.

                              State
Vice President   Arkansas   Colorado   Illinois   Iowa
Abel                8.5        8.0        3.5      6.0
Baker               7.5        8.0        6.0      5.5
Charley             9.0        6.0        4.0      7.0
Dogg                8.0        6.0        7.0      4.0
Easy                7.0        5.5        4.5      7.5

11. Referring to Exhibit 3, assume that the underlying distribution is not Normal. Do an appropriate analysis. a) Tell what test you are going to use. (1) b) State your null hypothesis. (1) c) Perform the test and state your conclusion. (4) d) On the basis of your results, should business climate be considered in locating the facility? Why? (1) [35]

Solution:
a) We use the Friedman test. The following is edited from the outline. The Friedman test is equivalent to two-way ANOVA with one observation per cell when the underlying distribution is non-normal. The null hypothesis is $H_0$: the columns come from the same distribution, or the medians are equal. Note that the only difference between this and the Kruskal-Wallis test is that the data is cross-classified in the Friedman test. There are $c$ columns and $r$ rows. In each row the numbers are ranked from 1 to $c$. For each column, compute $SR_i$, the rank sum of column $i$. To check the ranking, note that the sum of the rank sums is $\frac{rc(c+1)}{2}$. Now compute the Friedman statistic $F_2 = \frac{12}{rc(c+1)}\sum_i SR_i^2 - 3r(c+1)$. If the size of the problem is larger than those shown in the Friedman table, use the $\chi^2$ distribution, with $df = c - 1$, where $c$ is the number of columns. If $F_2$ is not larger than $\chi^2_{.05}$, do not reject the null hypothesis.
b) The null hypothesis is equal medians.
c) The table is shown below with names replaced by numbers and rankings shown.

         x1    r1    x2    r2    x3    r3    x4    r4
Row 1    8.5    4    8.0    3    3.5    1    6.0    2
Row 2    7.5    3    8.0    4    6.0    2    5.5    1
Row 3    9.0    4    6.0    2    4.0    1    7.0    3
Row 4    8.0    4    6.0    2    7.0    3    4.0    1
Row 5    7.0    3    5.5    2    4.5    1    7.5    4
SR             18          13           8          11

The sum of the rank sums is 18 + 13 + 8 + 11 = 50, and this checks against $\frac{rc(c+1)}{2} = \frac{5(4)(5)}{2} = 50$. The Friedman statistic is
$F_2 = \frac{12}{rc(c+1)}\sum_i SR_i^2 - 3r(c+1) = \frac{12}{5(4)(5)}\left(18^2 + 13^2 + 8^2 + 11^2\right) - 3(5)(5) = 0.12(324 + 169 + 64 + 121) - 75 = 81.36 - 75 = 6.36$.
The dimensions ($r = 5$, $c = 4$) are too large for the Friedman table, so compare this with $\chi^2_{.05}(3) = 7.8147$. Since the computed chi-squared is smaller than the table value, do not reject the null hypothesis.
d) So, since there is no significant difference in business climate, do not consider it.
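(Not part of the original key: a quick SciPy cross-check of the Friedman statistic. scipy.stats.friedmanchisquare takes the columns -- here, the states -- as separate arguments.)

```python
# Cross-check of Problem 11's Friedman test (illustrative sketch).
from scipy.stats import friedmanchisquare

arkansas = [8.5, 7.5, 9.0, 8.0, 7.0]
colorado = [8.0, 8.0, 6.0, 6.0, 5.5]
illinois = [3.5, 6.0, 4.0, 7.0, 4.5]
iowa     = [6.0, 5.5, 7.0, 4.0, 7.5]

stat, p = friedmanchisquare(arkansas, colorado, illinois, iowa)
print(stat, p)   # stat = 6.36; p > .05, so do not reject equal medians
```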
ECO252 QBA2 THIRD EXAM
Apr 16, 2004
TAKE-HOME SECTION

Name: _________________________ Student Number: _________________________
Class days and time: _________________________

Please note: computer problems 2 and 3 should be turned in with the exam. In Problem 2, the 2-way ANOVA table should be completed. The three F tests should be done with a 5% significance level and you should note whether there was (i) a significant difference between drivers, (ii) a significant difference between cars and (iii) significant interaction. In Problem 3, you should show on your third graph where the regression line is.

II. Do the following: (23+ points). Assume a 5% significance level. Show your work!

1. (Albright, Winston, Zappe) Boa Constructors, an international construction company with offices in Texas, the Cayman Islands, Belarus, Bosnia and Iraq, conducts an employee empowerment program and after a few months asks random samples of its employees in each office to rate the program on a 1-10 scale. Assume that each column below represents a random sample taken in one office. Assume that the underlying distribution is Normal and test the hypothesis $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$. Data is on the next page.
a) Note that office 2, the 'head' office in the Cayman Islands, has a smaller sample than the rest. You can help by adding a seventh measurement, the third-to-last digit of your student number (if it's a zero, use 10). For example, Seymour Butz's student number is 976500, so he will have a second column that reads 7, 6, 10, 3, 9, 10, 5. This should not change the results by much. Find the sample variance of this column. (2)
b) Test the hypothesis. (6) Show your work -- it is legitimate to check your results by running these problems on the computer, but I expect to see hand computations for every part of them.
c) Compare means two by two, using any one appropriate statistical method, to find out which were happiest. Actually, we really want to test if the program worked significantly better in the first two offices, which are English-speaking, than in the other three, which are not. Citing numbers from your comparison results, is this correct? (3)
d) (Extra credit) Now we find out that this was not a random sample and that each row represents a separate job description. If this changes your analysis, redo the analysis. In order to fill out the data from the Cayman Islands, use the last two digits of your student number. For example, Seymour Butz's student number is 976500, so he will have a second column that reads 7, 6, 10, 3, 9, 10, 5, 10, 10. (5)
e) (Extra credit) What if you found out that each column in the data in b) was a random sample from a non-Normal distribution? If this changes your analysis, redo the analysis. (5)
f) Run Levene's test on the data in b). You may do this by computer. There will be lots of output, but just look at the 2 or 3 lines from Levene's test. What does it test for and what is the conclusion? (2)

Hint: If you put your data in the first 5 columns of Minitab with a column number above them, the following should be of interest. (A rough Python equivalent of this workflow appears after the data below.)
MTB > AOVOneway c1-c5.    # Does a 1-way ANOVA
MTB > Stack c1-c5 c11;    # Stacks the data in c11, col. no. in c12.
SUBC>   Subscripts c12;
SUBC>   UseNames.
MTB > rank c11 c13        # Puts the ranks of the stacked data in c13
MTB > Unstack (c13);      # Unstacks the ranks into the next 5 available
SUBC>   Subscripts c12;   # columns. Uses IDs in c12.
SUBC>   After;
SUBC>   VarNames.
MTB > %Vartest c11 c12    # Does a bunch of tests, including Levene's, on
                          # stacked data in c11 with IDs in c12.
If you remember what you did in Computer Problem 2, you should be able to add row numbers in an unused column and run a 2-way ANOVA.

       Ratings of Program
             Office
Row     1    2    3    4    5
 1      8    7    7    5    6
 2      2    6    5    3    6
 3      9   10    5    6    6
 4      8    3    5    9    6
 5      3    9    4    6    3
 6     10   10    3    5    4
 7      9         5    5    8
 8      6         5    6    6
 9      8         3    3    2

Sum of column 1 = 63.000    Sum of squares of column 1 = 503.00
Sum of column 3 = 42.000    Sum of squares of column 3 = 208.00
Sum of column 4 = 48.000    Sum of squares of column 4 = 282.00
Sum of column 5 = 47.000    Sum of squares of column 5 = 273.00
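(Not part of the original exam: for readers without Minitab, a minimal SciPy sketch of the hinted workflow, covering the one-way ANOVA of b), the rank-based test of e), and the Levene test of f). It assumes SciPy is available and uses Seymour's seventh value, 5, in office 2.)

```python
# Rough Python equivalent of the hinted Minitab workflow.
from scipy import stats

offices = [
    [8, 2, 9, 8, 3, 10, 9, 6, 8],
    [7, 6, 10, 3, 9, 10, 5],
    [7, 5, 5, 5, 4, 3, 5, 5, 3],
    [5, 3, 6, 9, 6, 5, 5, 6, 3],
    [6, 6, 6, 6, 3, 4, 8, 6, 2],
]

print(stats.f_oneway(*offices))   # one-way ANOVA, like AOVOneway c1-c5
print(stats.kruskal(*offices))    # rank-based alternative for part e)
print(stats.levene(*offices))     # Levene's test, as in %Vartest
```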
Solution: Seymour's data is as below.

       Ratings of Program
             Office
Row     1    2    3    4    5
 1      8    7    7    5    6
 2      2    6    5    3    6
 3      9   10    5    6    6
 4      8    3    5    9    6
 5      3    9    4    6    3
 6     10   10    3    5    4
 7      9    5    5    5    8
 8      6         5    6    6
 9      8         3    3    2

a) $s_2^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{400 - 7(7.143)^2}{6} = 7.1405$

b) The column sums are now:

Sum of column 1 = 63.000    Sum of squares of column 1 = 503.00
Sum of column 2 = 50.000    Sum of squares of column 2 = 400.00
Sum of column 3 = 42.000    Sum of squares of column 3 = 208.00
Sum of column 4 = 48.000    Sum of squares of column 4 = 282.00
Sum of column 5 = 47.000    Sum of squares of column 5 = 273.00

Column                  1         2         3         4         5       Total
Sum                  63.000    50.000    42.000    48.000    47.000    250.00  = Σx
n_j (count)               9         7         9         9         9        43  = n
Mean (x̄.j)            7.000     7.143     4.667     5.333     5.222     5.814  = x̄
Sum of squares      503.000   400.000   208.000   282.000   273.000   1666.00  = Σx²
Mean squared         49.000    51.020    21.778    28.444    27.272

$\bar{x} = \frac{\sum x}{n} = \frac{250}{43} = 5.814$
$SST = \sum x^2 - n\bar{x}^2 = 1666 - 43(5.814)^2 = 1666 - 1453.512 = 212.488$
$SSB = \sum n_j \bar{x}_{.j}^2 - n\bar{x}^2 = 9(49) + 7(51.020) + 9(21.778) + 9(28.444) + 9(27.272) - 1453.512 = 1495.136 - 1453.512 = 41.624$
$SSW = SST - SSB = 212.488 - 41.624 = 170.864$

Source      SS        DF    MS       F
Between     41.624     4    10.406   2.31
Within     170.864    38     4.496
Total      212.488    42

Since $F_{.05}(4, 38) = 2.62$, we do not reject $H_0$, which is 'no difference between mean satisfaction by offices.' I'm really surprised. There isn't too much reason to compare offices at this point.

c) The two useful intervals would be Scheffé,
$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm s\sqrt{(m-1)F_{\alpha}(m-1, n-m)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$,
and Tukey,
$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm q_{\alpha}(m, n-m)\sqrt{\frac{s^2}{2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.
For Scheffé the error part of the interval is $\sqrt{(m-1)F_{.05}(4,38)\,s^2}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{4(2.62)(4.496)}\sqrt{\frac{2}{9}} = 6.864(0.4714) = 3.236$ for two columns of size 9, or $6.864\sqrt{\frac{1}{9} + \frac{1}{7}} = 6.864(0.5040) = 3.459$ for intervals with $\bar{x}_2$ in them. For Tukey the error part of the interval works out to about $6.07\sqrt{\frac{1}{2}\left(\frac{1}{9} + \frac{1}{9}\right)} = 6.07(0.3333) = 2.02$ for two columns of size 9, where $6.07 = 4.05\sqrt{4.496/2}$ and $q_{.05}(5, 38) \approx 4.05$, or $6.07\sqrt{\frac{1}{2}\left(\frac{1}{9} + \frac{1}{7}\right)} = 6.07(0.3564) = 2.16$ for intervals with $\bar{x}_2$ in them. So let's look at the differences.

1-2   7     - 7.143 = -0.143      2-4   7.143 - 5.333 =  1.810
1-3   7     - 4.667 =  2.333 s    2-5   7.143 - 5.222 =  1.921
1-4   7     - 5.333 =  1.667      3-4   4.667 - 5.333 = -0.666
1-5   7     - 5.222 =  1.778      3-5   4.667 - 5.222 = -0.555
2-3   7.143 - 4.667 =  2.476 s    4-5   5.333 - 5.222 =  0.111

(An 's' marks differences significant by the Tukey criterion.) Obviously, none of the differences are as large as the error terms by the Scheffé criterion, so none of them are significant by this criterion and this small sample gives us no evidence of differences between English- and non-English-speaking offices. The less conservative Tukey differences show the third office to be less happy with the program than the first or second.
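(Not in the original key: SciPy 1.8 and later can run the Tukey comparisons of part c) directly. Note that scipy.stats.tukey_hsd uses the standard HSD allowance, so its borderline significance calls may differ slightly from the hand values above.)

```python
# Tukey HSD comparisons for part c) (illustrative cross-check).
from scipy.stats import tukey_hsd

offices = [
    [8, 2, 9, 8, 3, 10, 9, 6, 8],
    [7, 6, 10, 3, 9, 10, 5],
    [7, 5, 5, 5, 4, 3, 5, 5, 3],
    [5, 3, 6, 9, 6, 5, 5, 6, 3],
    [6, 6, 6, 6, 3, 4, 8, 6, 2],
]
res = tukey_hsd(*offices)
print(res)   # pairwise differences, p-values, and confidence intervals
```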
d) This is real work. Seymour has the following.

Row     1    2    3    4    5    Sum   Count   Mean   Mean²    Σx² (row)
 1      8    7    7    5    6     33     5      6.6   43.56      223
 2      2    6    5    3    6     22     5      4.4   19.36      110
 3      9   10    5    6    6     36     5      7.2   51.84      278
 4      8    3    5    9    6     31     5      6.2   38.44      215
 5      3    9    4    6    3     25     5      5.0   25.00      151
 6     10   10    3    5    4     32     5      6.4   40.96      250
 7      9    5    5    5    8     32     5      6.4   40.96      220
 8      6   10    5    6    6     33     5      6.6   43.56      233
 9      8   10    3    3    2     26     5      5.2   27.04      186
Sum    63   70   42   48   47    270    45             330.72    1866

Column sums of squares: 503, 600, 208, 282, 273 (total 1866). Column means $\bar{x}_{.j}$: 7.000, 7.778, 4.667, 5.333, 5.222; squared: 49.000, 60.494, 21.778, 28.444, 27.272, so $\sum \bar{x}_{.j}^2 = 186.99$.

From the above, $n = 45$, $\bar{x} = \frac{270}{45} = 6.00$, $\sum x^2 = 1866$, $\sum \bar{x}_{i.}^2 = 330.72$, and $\sum \bar{x}_{.j}^2 = 186.99$.
$SST = \sum x^2 - n\bar{x}^2 = 1866 - 45(6.00)^2 = 1866 - 1620 = 246$
$SSC = R\sum \bar{x}_{.j}^2 - n\bar{x}^2 = 9(186.99) - 1620 = 1682.91 - 1620 = 62.91$
$SSR = C\sum \bar{x}_{i.}^2 - n\bar{x}^2 = 5(330.72) - 1620 = 1653.60 - 1620 = 33.60$
$SSW = SST - SSC - SSR = 246 - 62.91 - 33.60 = 149.49$

Source      SS       DF    MS        F
Rows        33.60     8     4.2000   0.90 ns
Columns     62.91     4    15.7275   3.37 s
Within     149.49    32     4.6716
Total      246.00    44

Since the computed F for rows (0.90) is below $F_{.05}(8, 32) = 2.29$, we do not reject $H_{01}$, which is 'no difference between individual (row) means.' Since the computed F for columns (3.37) is above $F_{.05}(4, 32) \approx 2.67$, we reject $H_{02}$, which is 'no difference between office (column) means.'

e) If each column is a random sample from a non-Normal distribution, use a Kruskal-Wallis test. We only need a Friedman test if the data is cross-classified. The original data and its ranks are shown below. The ranking should go from 1 to 43, but because there are so many ties, each number represents an average rank.

Row    x1    r1     x2    r2     x3    r3     x4    r4     x5    r5
 1      8   34.5     7   31.5     7   31.5     5   16.0     6   25.5
 2      2    1.5     6   25.5     5   16.0     3    6.0     6   25.5
 3      9   38.5    10   42.0     5   16.0     6   25.5     6   25.5
 4      8   34.5     3    6.0     5   16.0     9   38.5     6   25.5
 5      3    6.0     9   38.5     4   10.5     6   25.5     3    6.0
 6     10   42.0    10   42.0     3    6.0     5   16.0     4   10.5
 7      9   38.5     5   16.0     5   16.0     5   16.0     8   34.5
 8      6   25.5                  5   16.0     6   25.5     6   25.5
 9      8   34.5                  3    6.0     3    6.0     2    1.5
Sum        255.5         201.5         134.0         175.0         180.0

To check the ranking, note that the sum of the five rank sums is 255.5 + 201.5 + 134.0 + 175.0 + 180.0 = 946.0, and that the sum of the first $n$ numbers is $\frac{n(n+1)}{2} = \frac{43(44)}{2} = 946$. Now compute the Kruskal-Wallis statistic
$H = \frac{12}{n(n+1)}\sum_i \frac{SR_i^2}{n_i} - 3(n+1) = \frac{12}{43(44)}\left(\frac{255.5^2}{9} + \frac{201.5^2}{7} + \frac{134.0^2}{9} + \frac{175.0^2}{9} + \frac{180.0^2}{9}\right) - 3(44) = 0.006342(22051.57) - 132 = 7.862$.
Since the size of the problem is larger than those shown in the Kruskal-Wallis table, use the $\chi^2$ distribution with $df = m - 1 = 4$, where $m$ is the number of columns. Compare $H$ with $\chi^2_{.05}(4) = 9.4877$. Since $H$ is smaller than $\chi^2_{.05}$, do not reject the null hypothesis.

f) As I threatened, I ran this on the computer. The data in c1-c5 was stacked into c11 with office IDs in c12. The 'Stat' pulldown menu was chosen and ANOVA picked, followed by 'Test for equal variances.' Output follows.

MTB > %Vartest c11 c12;
SUBC>   Confidence 95.0.
Executing from file: W:\wminitab13\MACROS\Vartest.MAC
Macro is running ... please wait

Test for Equal Variances
Response    C11
Factors     C12
ConfLvl     95.0000

Bonferroni confidence intervals for standard deviations

     Lower     Sigma     Upper   N   Factor Levels
   1.68047   2.78388   6.79093   9   1
   1.52009   2.67261   7.96390   7   2
   0.73931   1.22474   2.98761   9   3
   1.08823   1.80278   4.39765   9   4
   1.12031   1.85592   4.52729   9   5

(Comment: These are apparently intervals of the type $\frac{(n-1)s^2}{\chi^2_{\alpha/2k}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2k}}$. Note, for example, that there are $k = 5$ intervals and that $(1.68047)^2 = \frac{8(2.78388)^2}{\chi^2_{.005}(8)}$, where $\frac{\alpha}{2k} = .005$.)

Bartlett's Test (normal distribution)
Test Statistic: 5.961
P-Value       : 0.202

(Comment: This was explained in the new outline pages and has a null hypothesis of equal variances, which, because of the high p-value, we do not reject.)

Levene's Test (any continuous distribution)
Test Statistic: 1.056
P-Value       : 0.391

(Comment: This was the only part that you were supposed to worry about. It was explained in the new outline pages and has a null hypothesis of equal variances. Since both p-values are above .05, we do not reject $H_0$.)

[The macro also produces a chart, "Test for Equal Variances for C11," plotting the 95% confidence intervals for sigmas by factor level and repeating the Bartlett and Levene statistics above.]
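(Not part of the original key: the chi-squared form of the Bonferroni intervals in the comment above is easy to verify. A minimal sketch, assuming SciPy, for factor level 1:)

```python
# Check one Bonferroni interval for sigma from the %Vartest output.
from math import sqrt
from scipy.stats import chi2

n, s, k, alpha = 9, 2.78388, 5, 0.05
lo = sqrt((n - 1) * s**2 / chi2.ppf(1 - alpha / (2 * k), n - 1))  # 1.68047
hi = sqrt((n - 1) * s**2 / chi2.ppf(alpha / (2 * k), n - 1))      # 6.79093
print(lo, hi)
```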
2. (Keller, Warrack) A dealer records the odometer reading and selling price, in thousands, of a sample of 100 three-year-old Ford Tauruses (well equipped and in excellent condition) sold at auction. Unfortunately, he missed one car in his initial computations. The 101st car has an odometer reading of 16.000 (in thousands) and sold at 14.800 plus the last three digits of your student number divided by 1000. For example, Seymour Butz's student number is 976500, so he thinks the car sold at $14.800 + $0.500 = $15.300 (thousands). The column sums are given below without the 101st car, so you should find it easy to adjust these sums for the 101st car.

Row   Odometer    Price
  1     37.388    14.646
  2     44.758    14.122
  3     45.833    14.016
  .         ..        ..
 98     33.190    14.518
 99     36.196    14.712
100     36.392    14.266

sumy = 1482.28   sumx = 3600.95   smxsq = 133977   smxy = 53107.6   smysq = 21997.3

Note that these sums can't be used directly, but they should help you to get the corrected numbers. 'Price' is the dependent variable and 'Odometer' is the independent variable. If you don't know what that means, don't do the problem until you find out. Show your work -- it is legitimate to check your results by running the problem on the computer, but I expect to see hand computations that show clearly where you got your numbers for every part of this problem.
a. Compute the regression equation $\hat{Y} = b_0 + b_1 x$ to predict the 'Price' on the basis of 'Odometer'. (2)
b. How much does the equation predict that a car with an odometer reading of 35000 miles will sell for? If the answer isn't reasonable compared to the prices shown above, find your mistake and fix it. (1)
c. Compute $R^2$. (2)
d. Compute $s_e$. (2)
e. Compute $s_{b_0}$ and do a significance test on $b_0$. (1.5)
f. Compute $s_{b_1}$ and do a confidence interval for $\beta_1$. (1.5)
g. Do an ANOVA table for the regression. What conclusion can you draw from this table about the relationship between odometer reading and price? Why? (2)
h. Do a prediction interval for the selling price of the car in b. Explain the difference between this and a confidence interval and why this is the appropriate interval to use here. (3) [73]

Solution: Seymour's data is modified as below.

        sumy        sumx       smxsq      smxy
        1482.28     3600.95    133977     53107.6
      +   15.3    +   16     +    256   +   244.8
      = 1497.58   = 3616.95  = 134233   = 53352.4

I find it hard to believe that so many people got this wrong. If smxy $= \sum xy$, then you should have added $x_{101} y_{101}$ to it, not $x_{101} + y_{101}$. Even if you didn't know this, and you only had 20 or 30 examples, you should have realized something was wrong when $R^2$ came out above 1 or SSE came out negative. These are 'unreasonable' answers and it would at least have been wise to cover your tails by admitting it.

So $n = 100 + 1 = 101$, $\sum x = 3616.95$, $\sum y = 1497.58$, $\sum x^2 = 134233$, $\sum xy = 53352$, and $\sum y^2 = 21997.3 + 15.3^2 = 21997.3 + 234.09 = 22231$.

Spare Parts Computation:
$\bar{x} = \frac{\sum x}{n} = \frac{3616.95}{101} = 35.8113$   $\bar{y} = \frac{\sum y}{n} = \frac{1497.58}{101} = 14.8275$
$SS_x = \sum x^2 - n\bar{x}^2 = 134233 - 101(35.8113)^2 = 4705.75$*
$S_{xy} = \sum xy - n\bar{x}\bar{y} = 53352 - 101(35.8113)(14.8275) = -277.992$
$SS_y = \sum y^2 - n\bar{y}^2 = 22231 - 101(14.8275)^2 = 25.9650 = SST$*

a) $b_1 = \frac{S_{xy}}{SS_x} = \frac{-277.992}{4705.75} = -0.05907$. Since price falls as odometer rises, $b_1 < 0$.
$b_0 = \bar{y} - b_1\bar{x} = 14.8275 - (-0.05907)(35.8113) = 16.9431$
So $\hat{Y} = b_0 + b_1 x$ becomes $\hat{Y} = 16.9431 - 0.05907x$.

b) $\hat{Y} = 16.9431 - 0.05907(35) = 14.876$, a price that looks very much like the ones in the original data.

c) $SSR = b_1 S_{xy} = (-0.05907)(-277.992) = 16.4224$*
$R^2 = \frac{SSR}{SST} = \frac{16.4224}{25.9650} = .6324$, or equivalently $R^2 = \frac{S_{xy}^2}{SS_x SS_y} = \frac{(-277.992)^2}{4705.75(25.9650)}$. This must be between zero and one.

d) $SSE = SST - SSR = 25.9650 - 16.4224 = 9.5426$*
$s_e^2 = \frac{SSE}{n-2} = \frac{9.5426}{99} = 0.09640$ ($s_e^2$ is always positive!)   $s_e = \sqrt{0.09640} = 0.3105$
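(Not part of the original key: the 'spare parts' computation lends itself to a quick script. This sketch just re-runs the arithmetic above from the corrected sums; slight differences from the hand values are rounding.)

```python
# Regression "spare parts" from the corrected sums (Seymour's numbers).
n = 101
sum_x, sum_y = 3616.95, 1497.58
sum_x2, sum_y2, sum_xy = 134233.0, 22231.0, 53352.0

xbar, ybar = sum_x / n, sum_y / n
ss_x = sum_x2 - n * xbar**2          # ~4705
s_xy = sum_xy - n * xbar * ybar      # ~-278.0
ss_y = sum_y2 - n * ybar**2          # ~25.97 = SST

b1 = s_xy / ss_x                     # ~-0.0591
b0 = ybar - b1 * xbar                # ~16.94
ssr = b1 * s_xy                      # ~16.4
r2 = ssr / ss_y                      # ~0.632
se2 = (ss_y - ssr) / (n - 2)         # ~0.0965
print(b0, b1, b0 + b1 * 35, r2, se2)
```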
e) For the remainder of this problem, $t_{.025}^{(n-2)} = t_{.025}^{(99)} = 1.984$. We are testing $H_0: \beta_0 = 0$ against $H_1: \beta_0 \ne 0$.
$s_{b_0}^2 = s_e^2\left(\frac{1}{n} + \frac{\bar{X}^2}{SS_x}\right) = 0.09640\left(\frac{1}{101} + \frac{(35.8113)^2}{4705.75}\right) = 0.09640(0.009901 + 0.272528) = 0.02723$*
$s_{b_0} = \sqrt{0.02723} = 0.16500$*
$t = \frac{b_0 - \beta_0^{(0)}}{s_{b_0}} = \frac{16.9431 - 0}{0.16500} = 102.69$
Make a diagram of an almost Normal curve with zero in the middle; if $\alpha = .05$, the rejection zones are above $t_{.025}^{(n-2)} = 1.984$ and below $-t_{.025}^{(n-2)} = -1.984$. Since our computed t-ratio, at 102.69, is not between the two critical values, we reject the null hypothesis that the coefficient is zero and can say that $b_0$ is significant.

f) $s_{b_1}^2 = \frac{s_e^2}{SS_x} = \frac{0.09640}{4705.75} = 0.00002049$*   $s_{b_1} = \sqrt{0.00002049} = 0.004526$*
$\beta_1 = b_1 \pm t_{.025}^{(n-2)} s_{b_1} = -0.05907 \pm 1.984(0.004526) = -0.059 \pm 0.009$. Note that the error part of the interval is smaller in absolute value than the slope, so the slope is significant.

g) Note that the F test shown next gives the same information as f). In fact, the F in the table is just the square of our t. Because the computed F is larger than the table F, we reject the null hypothesis of no relationship between odometer reading and price.

Source             SS          DF    MS        F          F.05
Regression         16.4224*     1    16.4224   170.375 s  F.05(1, 99) = 3.94
Error (Within)      9.5426*    99     0.09639
Total              25.9650*   100

* These quantities must be positive.

h) $s_{\hat{Y}_0}^2 = s_e^2\left(1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right) = 0.09640\left(1 + \frac{1}{101} + \frac{(35 - 35.8113)^2}{4705.75}\right) = 0.09640(1 + 0.009901 + 0.000140) = 0.09640(1.010041) = 0.0974$*
$s_{\hat{Y}_0} = \sqrt{0.0974} = 0.3120$*
$Y_0 = \hat{Y}_0 \pm t\, s_{\hat{Y}_0} = 14.876 \pm 1.984(0.3120) = 14.876 \pm 0.619$
A prediction interval is appropriate when we are concerned about the price of one car, rather than the average price of cars with the same odometer reading.
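(Again not in the original key: a sketch of parts e), f) and h), reusing the key's rounded quantities; it assumes SciPy for the t critical value.)

```python
# Standard errors, t-ratio for b0, CI for beta1, and prediction interval.
from math import sqrt
from scipy.stats import t as t_dist

n, se2, ss_x, xbar = 101, 0.09640, 4705.75, 35.8113
b0, b1 = 16.9431, -0.05907
tcrit = t_dist.ppf(0.975, n - 2)                      # ~1.984

s_b0 = sqrt(se2 * (1/n + xbar**2 / ss_x))             # ~0.165
s_b1 = sqrt(se2 / ss_x)                               # ~0.004526
print(b0 / s_b0)                                      # ~102.7
print(b1 - tcrit * s_b1, b1 + tcrit * s_b1)           # CI for beta1

x0 = 35
s_pred = sqrt(se2 * (1 + 1/n + (x0 - xbar)**2 / ss_x))  # ~0.312
yhat = b0 + b1 * x0                                     # ~14.876
print(yhat - tcrit * s_pred, yhat + tcrit * s_pred)     # ~14.26 to 15.50
```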