4/24/01 252y0132
ECO252 QBA2
THIRD HOUR EXAM
April 17, 2001
Name: KEY
Hour of Class Registered (Circle) MWF TR 10 12 12:30 2:00

I. (10+ points) Do all the following:

1. Hand in your computer printouts for problems 2 and 3 (5 points; 3 point penalty for not handing in). Remember that the ANOVA printout must be completed, using a 5% significance level, for full credit. I should be able to tell what is tested and what the conclusions are.

2. Do not do the following unless you handed in at least two outputs. On the next few pages there are problems very much like the ones you did.

a. A random survey of CEOs (Don Black) asks the question "Do you agree that an increase of market share is a reason to consider a merger?" Responses (Agree?) were between 5 and 1, with 5 indicating strong agreement. Responses were classified in 2 ways, with 'years' dividing respondents according to the number of years they had been with the company and 'size' dividing firms according to the companies' sales in millions of dollars. The output follows.

Tabulated Statistics

  ROWS: Size     COLUMNS: Years

             1        2        3      ALL
    1        4        4        4       12
    2        4        4        4       12
    3        4        4        4       12
    4        4        4        4       12
  ALL       16       16       16       48

  CELL CONTENTS -
     COUNT

Tabulated Statistics

  ROWS: Size     COLUMNS: Years

             1        2        3
    1   2.0000   2.0000   2.0000
        3.0000   1.0000   1.0000
        2.0000   2.0000   1.0000
        2.0000   3.0000   2.0000

    2   2.0000   2.0000   2.0000
        1.0000   3.0000   3.0000
        2.0000   2.0000   1.0000
        3.0000   3.0000   2.0000

    3   3.0000   3.0000   3.0000
        4.0000   2.0000   2.0000
        4.0000   4.0000   3.0000
        5.0000   4.0000   3.0000

    4   3.0000   3.0000   2.0000
        4.0000   3.0000   3.0000
        4.0000   3.0000   2.0000
        3.0000   4.0000   3.0000

  CELL CONTENTS -
     Agree?:DATA

Tabulated Statistics

  ROWS: Size     COLUMNS: Years

             1        2        3      ALL
    1   2.2500   2.0000   1.5000   1.9167
    2   2.0000   2.5000   2.0000   2.1667
    3   4.0000   3.2500   2.7500   3.3333
    4   3.5000   3.2500   2.5000   3.0833
  ALL   2.9375   2.7500   2.1875   2.6250

  CELL CONTENTS -
     Agree?:MEAN

MTB > Twoway c1 c2 c3;
SUBC> Means c2 c3.

Two-way Analysis of Variance

Analysis of Variance for Agree?
  Source         DF       SS       MS
  Size            3   17.083    5.694
  Years           2    4.875    2.437
  Interaction     6    2.292    0.382
  Error          36   17.000    0.472
  Total          47   41.250

  Size     Mean
     1     1.92
     2     2.17
     3     3.33
     4     3.08
  [Character plot: individual 95% CIs for the four Size means; axis ticks at 1.80, 2.40, 3.00, 3.60]

  Years    Mean
     1     2.94
     2     2.75
     3     2.19
  [Character plot: individual 95% CIs for the three Years means; axis ticks at 2.10, 2.45, 2.80, 3.15]

(i) Complete the ANOVA table that compares the effect of size and years on the response of the CEOs. (2)

Solution: Most of this table is copied from above. F is obtained by dividing each MS by the error (within) mean square. The $F_{.05}$ column contains values found in the F table. If the F we computed exceeds the table F we put an 's' (significant) in the F column; otherwise we put in an 'ns' (not significant).

  Source         DF       SS       MS        F         F.05
  Size            3   17.083    5.694   12.064 s    F(3,36) = 2.87
  Years           2    4.875    2.437    5.163 s    F(2,36) = 3.26
  Interaction     6    2.292    0.382    0.809 ns   F(6,36) = 2.36
  Error          36   17.000    0.472
  Total          47   41.250

(ii) Is there a statistically significant difference between the means for responses of CEOs of firms of different sizes? Show what numbers brought you to your conclusion. (2)

Solution: Yes. The 'Size' line is marked significant as explained in (i): the computed F of 12.064 exceeds $F_{.05}^{(3,36)} = 2.87$. This means that we reject the null hypothesis that the means of the 4 'size' categories are equal.

b. Ken Black says that the Delta Wire Corporation believes that the more positive outlook created by employee education sessions results in greater interest in the job, and thus fewer sick days per worker. A random sample of workers is taken, and a regression is run with 'sick' (number of sick days per worker) as the dependent variable and 'educ' (number of hours of employee education) as the independent variable. Part of the output appears below.
Regression Analysis

The regression equation is
Sick = 7.75 - 0.0795 Educ

  Predictor       Coef      Stdev    t-ratio        p
  Constant      7.7455     0.7980       9.71    0.000
  Educ        -0.07946    0.01536      -5.17    0.000

  s = 2.455     R-sq = 59.8%     R-sq(adj) = 57.5%

Analysis of Variance

  SOURCE        DF        SS        MS        F        p
  Regression     1    161.26    161.26    26.75    0.000
  Error         18    108.49      6.03
  Total         19    269.75

Unusual Observations
  Obs.   Educ    Sick      Fit   Stdev.Fit   Residual   St.Resid
    4     120   1.000   -1.789       1.378      2.789     1.37 X
   17     120   1.000   -1.789       1.378      2.789     1.37 X

X denotes an obs. whose X value gives it large influence.

[Plot: pred (vertical axis, 0 to 10) against Educ (horizontal axis, 0 to 100)]

(i) What equation does it give to relate education hours to number of sick days? How many sick days does it predict that someone with 12 hours of education will take? (2)

Solution: The equation can be written $\hat{Y} = 7.75 - 0.0795x$ or $\hat{Y} = 7.7455 - 0.07946x$. Depending on which version we substitute 12 into, we get $\hat{Y} = 7.75 - 0.0795(12) = 6.796$ or $\hat{Y} = 7.7455 - 0.07946(12) = 6.792$.

(ii) What test or tests would lead you to believe that the company is correct? Cite evidence using a 5% significance level. (2)

Solution: The company has asserted that more education will cut down sick days. They are right if 'educ' has a significant negative coefficient. The coefficient (-0.07946) is negative and the p-value for the t test is reported as 0.000, so we would reject the null hypothesis that the coefficient is zero at any conventional significance level. Alternatively, compare the t of -5.17 with the 5% value of t with 18 degrees of freedom (from the 'Error' line in the ANOVA), $t_{.025}^{(18)} = 2.101$; since 5.17 exceeds 2.101 in absolute value, the coefficient is significant.

II. Do at least 4 of the following 5 problems (at least 10 each) (or do sections adding to at least 40 points. Anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where applicable. Never say 'yes' or 'no' without a statistical test.

1. a. In the ANOVA problem beginning on page 1, there is a plot for an individual 95% confidence interval for the mean for size 3.
(i) Figure out, using your formulas and the data on the pages, what this interval actually is. (2)
(ii) Is there a significant difference between the means for level 1 and level 2 of 'years'? To answer this question, do 95% confidence intervals for the difference between these two means that are
  (α) Valid when used alone. (1)
  (β) Valid when used with other possible differences between means. (2)
and state a conclusion. (1)

b. In the regression problem on page 3:
(i) Add a regression line to the graph. (1)
(ii) Do a 99% confidence interval for the slope of the equation. (2)
(iii) Find $s_e^2$ from the printout and (if the sample mean for x is 37.70 and the sample standard deviation for x is 36.67) do a confidence interval for the mean number of sick days taken by someone with 12 hours of education. (4)

Solution:

a) (i) From the printout, in the COUNT table, we can find out that there are $R = 4$ rows, $C = 3$ columns and $P = 4$ measurements per cell. We can also see that $MSW = 0.472$ and has 36 degrees of freedom associated with it, so $t_{.025}^{(36)} = 2.028$. There are $PC = 12$ numbers in a row. A source for this confidence interval could be the formula in Exercise 14.42 for row means:

$$\mu_{1} = \bar{x}_{1\cdot} \pm t_{\alpha/(2m)}^{RC(P-1)} \sqrt{\frac{MSW}{PC}}$$

The table of means says that the mean of row 3 is 3.3333, so

$$\mu_{3} = 3.3333 \pm 2.028\sqrt{\frac{0.472}{12}} = 3.333 \pm 0.402.$$

(ii) The printout says that the mean for column 1 is 2.9375, the mean for column 2 is 2.7500, that there are $PR = 16$ numbers in a column and that the F for columns has 2 and 36 degrees of freedom. The outline says:

i. A Single Confidence Interval. If we desire a single interval we use the formula for a Bonferroni Confidence Interval below with $m = 1$.

ii. Scheffé Confidence Interval. For column means, use

$$\mu_1 - \mu_2 = (\bar{x}_{\cdot 1} - \bar{x}_{\cdot 2}) \pm \sqrt{(C-1)\,F_{\alpha}^{(C-1,\;RC(P-1))}}\;\sqrt{\frac{2\,MSW}{PR}}.$$

iii. Bonferroni Confidence Interval. For column means, use

$$\mu_1 - \mu_2 = (\bar{x}_{\cdot 1} - \bar{x}_{\cdot 2}) \pm t_{\alpha/(2m)}^{RC(P-1)} \sqrt{\frac{2\,MSW}{PR}}.$$

(α) A 95% confidence interval for the difference between these two means that is valid when used alone is

$$\mu_1 - \mu_2 = (\bar{x}_{\cdot 1} - \bar{x}_{\cdot 2}) \pm t_{\alpha/2}^{RC(P-1)} \sqrt{\frac{2\,MSW}{PR}} = (2.9375 - 2.7500) \pm 2.028\sqrt{\frac{2(0.472)}{16}} = 0.1875 \pm 2.028(0.243) = 0.19 \pm 0.49.$$

(β) A 95% confidence interval for the difference between these two means that is valid when used with other possible differences between means is

$$\mu_1 - \mu_2 = (\bar{x}_{\cdot 1} - \bar{x}_{\cdot 2}) \pm \sqrt{(C-1)\,F_{\alpha}^{(C-1,\;RC(P-1))}}\;\sqrt{\frac{2\,MSW}{PR}} = 0.1875 \pm \sqrt{2F_{.05}^{(2,36)}}\sqrt{\frac{2(0.472)}{16}} = 0.1875 \pm \sqrt{2(3.26)}\sqrt{\frac{2(0.472)}{16}} = 0.19 \pm 0.62.$$

Conclusion: Since both these intervals include zero, the difference between these two means is not significant.

b) In the regression problem on page 3:

(i) Just connect the x's.

(ii) From the printout, there are 18 degrees of freedom, the coefficient of 'educ' is -0.07946 and the standard deviation of that coefficient is 0.01536. $t_{.005}^{(18)} = 2.878$, so

$$\beta_1 = b_1 \pm t_{\alpha/2}\, s_{b_1} = -0.0795 \pm 2.878(0.01536) = -0.079 \pm 0.044.$$

(iii) From the printout $s_e = 2.455$. Either square this, or copy the Mean Square Error, which is 6.03. Because the total degrees of freedom are 19, $n = 20$. $\bar{x} = 37.70$ and, since $s_x = 36.67$, $s_x^2 = 1344.6889$ and $SS_x = (n-1)s_x^2 = 19(1344.6889) = 25549.089$. Since $SS_x$ appears in so many formulas, there are many other ways to get it. If $X_0 = 12$, $\hat{Y}_0 = 7.7455 - 0.07946(12) = 6.792$.

The confidence interval is $\mu_{Y_0} = \hat{Y}_0 \pm t\, s_{\hat{Y}}$, where

$$s_{\hat{Y}}^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right] = 2.455^2\left[\frac{1}{20} + \frac{(12 - 37.70)^2}{25549.089}\right] = 0.45753.$$

So $\mu_{Y_0} = \hat{Y}_0 \pm t\, s_{\hat{Y}} = 6.792 \pm 2.101\sqrt{0.45753} = 6.79 \pm 1.42$.

2. According to Ken Black, an agricultural researcher planted parts of six blocks of land with peanuts, using each of three different methods. Part of the results are given below. Data is yield per acre in thousands. Assume that the parent distribution is Normal and compare the mean yields for the three methods, noting the fact that it is cross-classified. Use $\alpha = .01$. (14)

Note: If you wish to ignore the fact that the data is blocked, indicate this now and compare the column means assuming that the data is three independent random samples from a normal distribution. (10)
($\alpha = .01$)

  BLOCK            Method 1   Method 2   Method 3    Sum    Sum of Squares
  1                  1.31       1.08       0.85      3.24       3.6050
  2                  1.27       1.10       1.02      3.39       3.8633
  3                  1.28       1.05       0.78      3.11       3.3493
  4                  1.22       1.02       0.87      3.11       3.2857
  5                  1.19       0.99       0.80      2.98       3.0362
  6                  1.30       1.03       0.91      3.24         ?
  Sum                7.57       6.27        ?
  Sum of squares   9.5619     6.5603        ?

Solution: As we said many times: if the parent distribution is Normal use ANOVA; if it's not Normal, use Friedman or Kruskal-Wallis. If the samples are independent random samples use 1-way ANOVA or Kruskal-Wallis. If they are cross-classified, use Friedman or 2-way ANOVA.

a) 2-way ANOVA (Blocked by block). 's' indicates that the null hypothesis is rejected.

  BLOCK    Method 1   Method 2   Method 3     Sum    n_i   mean       SS      mean^2
             (x1)       (x2)       (x3)
  1          1.31       1.08       0.85       3.24     3   1.0800   3.6050   1.1664
  2          1.27       1.10       1.02       3.39     3   1.1300   3.8633   1.2769
  3          1.28       1.05       0.78       3.11     3   1.0367   3.3493   1.0747
  4          1.22       1.02       0.87       3.11     3   1.0367   3.2857   1.0747
  5          1.19       0.99       0.80       2.98     3   0.9933   3.0362   0.9866
  6          1.30       1.03       0.91       3.24     3   1.0800   3.5790   1.1664
  Sum        7.57   +   6.27   +   5.23   =  19.07    18   1.0594  20.7185   6.7457
  n_j           6   +      6   +      6   =     18
  mean     1.2617     1.0450     0.8717      1.0594
  SS       9.5619 +   6.5603 +   4.5963   = 20.7185
  mean^2   1.5919 +   1.0920 +   0.7599   =  3.4438

Note that $\bar{x}$ is not a sum, but is $\sum x_{ij}/n = 19.07/18 = 1.0594$.

$$SST = \sum x_{ij}^2 - n\bar{x}^2 = 20.7185 - 18(1.0594)^2 = 20.7185 - 20.2019 = 0.5166.$$

$$SSC = \sum n_j \bar{x}_{\cdot j}^2 - n\bar{x}^2 = 6(3.4438) - 18(1.0594)^2 = 20.6628 - 20.2019 = 0.4609.$$

This is SSB in a one-way ANOVA.

$$SSR = \sum n_i \bar{x}_{i\cdot}^2 - n\bar{x}^2 = 3(6.7457) - 18(1.0594)^2 = 20.2371 - 20.2019 = 0.0352.$$

($SSW = SST - SSC - SSR = 0.0205$)

  Source               SS      DF     MS         F       F.01                H0
  Rows (Blocks)      0.0352     5   0.00704     3.434    F(5,10) = 5.64 ns   Row means equal
  Columns (Methods)  0.4609     2   0.23045   112.41     F(2,10) = 7.56 s    Column means equal
  Within (Error)     0.0205    10   0.00205
  Total              0.5166    17

So the yields (column means) are significantly different.
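The hand computation in part a) can be cross-checked with a short script. What follows is a minimal sketch, not part of the original key: the function name `blocked_anova` and the dictionary keys are illustrative assumptions, and the critical values 5.64 and 7.56 are simply the table values quoted above. Because the script carries full precision while the key rounds the means to four decimals, its sums of squares can differ slightly from those in the table; the conclusions are the same.

```python
# Yields from problem 2: one row per block, one column per method.
yields = [
    [1.31, 1.08, 0.85],
    [1.27, 1.10, 1.02],
    [1.28, 1.05, 0.78],
    [1.22, 1.02, 0.87],
    [1.19, 0.99, 0.80],
    [1.30, 1.03, 0.91],
]


def blocked_anova(data):
    """Two-way ANOVA with one observation per cell (rows are blocks).

    Hypothetical helper written for this illustration only.
    """
    r, c = len(data), len(data[0])
    n = r * c
    grand_mean = sum(sum(row) for row in data) / n
    row_means = [sum(row) / c for row in data]
    col_means = [sum(row[j] for row in data) / r for j in range(c)]

    # Sums of squares as squared deviations from the grand mean.
    sst = sum((x - grand_mean) ** 2 for row in data for x in row)
    ssr = c * sum((m - grand_mean) ** 2 for m in row_means)
    ssc = r * sum((m - grand_mean) ** 2 for m in col_means)
    ssw = sst - ssr - ssc

    msw = ssw / ((r - 1) * (c - 1))
    return {
        "SST": sst, "SSR": ssr, "SSC": ssc, "SSW": ssw,
        "F_rows": (ssr / (r - 1)) / msw,
        "F_cols": (ssc / (c - 1)) / msw,
    }


result = blocked_anova(yields)
F01_ROWS, F01_COLS = 5.64, 7.56  # F(5,10) and F(2,10) at the 1% level, from the key

for name, value in result.items():
    print(f"{name}: {value:.4f}")
print("methods differ:", result["F_cols"] > F01_COLS)
print("blocks differ:", result["F_rows"] > F01_ROWS)
```

Dropping the `ssr` term (so that the error sum of squares is SST - SSC with 15 degrees of freedom) gives the unblocked one-way comparison of the same column means.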
b) One-way ANOVA (Not blocked by block)

($SSW = SST - SSB = .0557$)

  Source               SS      DF     MS          F       F.01                H0
  Columns (Methods)  0.4609     2   0.23045     62.065    F(2,15) = 6.36 s    Column means equal
  Within (Error)     0.0557    15   0.003713
  Total              0.5166    17

Once again, we reject the hypothesis that column means are equal. Yields differ!

3. a. Data from problem 2 is repeated below. Assume that the distribution is not normal, but that it is blocked (cross-classified), and again compare the distributions represented by the columns. (5) ($\alpha = .01$)

  BLOCK            Method 1   Method 2   Method 3    Sum    Sum of Squares
  1                  1.31       1.08       0.85      3.24       3.6050
  2                  1.27       1.10       1.02      3.39       3.8633
  3                  1.28       1.05       0.78      3.11       3.3493
  4                  1.22       1.02       0.87      3.11       3.2857
  5                  1.19       0.99       0.80      2.98       3.0362
  6                  1.30       1.03       0.91      3.24         ?
  Sum                7.57       6.27        ?
  Sum of squares   9.5619     6.5603        ?

b. A researcher looks at the effect of industry (Factor A) and size (measured by $millions in sales) (Factor B) on Research and Development expenditures as a per cent of sales. A sample is taken of 4 firms in each industry-size category. The sample is repeated over three different years (Factor C). If the researcher looks at 3 size categories within 5 industries over the three years, generate an ANOVA table showing all possible interactions, using the following data: SSA = 24.021, SSB = 4.829, SSC = 4.029, SSAB = 9.059, SSAC = 14.976, SSBC = 2.528, SSABC = 9.615, SST = 154.047. Using a 5% significance level, explain whether the size of the firm and the industry seem to make a difference in R&D expenditures. Which of the other differences and interactions are significant? (7)

Solution:

a) Friedman Test. $H_0$: Columns from same distribution. Rank within rows.

  BLOCK   Method 1   r1    Method 2   r2    Method 3   r3
  1         1.31      3      1.08      2      0.85      1
  2         1.27      3      1.10      2      1.02      1
  3         1.28      3      1.05      2      0.78      1
  4         1.22      3      1.02      2      0.87      1
  5         1.19      3      0.99      2      0.80      1
  6         1.30      3      1.03      2      0.91      1
  Sum                18                12                6

There are $r = 6$ rows and $c = 3$ columns. Check: the rank sums must add to $\frac{rc(c+1)}{2} = \frac{6(3)(4)}{2} = 36$. Since 18 + 12 + 6 = 36, we are all right.

The Friedman statistic is

$$\chi_F^2 = \frac{12}{rc(c+1)}\sum_i SR_i^2 - 3r(c+1) = \frac{12}{6(3)(4)}\left(18^2 + 12^2 + 6^2\right) - 3(6)(4) = \frac{1}{6}(504) - 72 = 12.$$

According to the Friedman Table ($c = 3$, $r = 6$), 12 has a p-value of .000. Since $\alpha = .01$, and the p-value for our null hypothesis is below this significance level, we reject $H_0$.

b) There are 5 industries, 3 sizes and 3 years, or 5 × 3 × 3 = 45 groups with 4 observations in each group, so n = 45 × 4 = 180. 's' means 'significant difference' (H0 rejected), 'ns' means 'no significant difference' (H0 not rejected). It seems that both industry (Factor A) and size (measured by $millions in sales) (Factor B) make a difference in Research and Development expenditures, since their Fs are significant. Year (Factor C) and Interaction AC also have an effect.

  Source              SS       DF      MS         F         F.05
  Factor A          24.021      4    6.00525    9.539 s     F(4,135)  = 2.44
  Factor B           4.829      2    2.41450    3.835 s     F(2,135)  = 3.07
  Factor C           4.029      2    2.01450    3.200 s     F(2,135)  = 3.07
  Interaction AB     9.059      8    1.13238    1.799 ns    F(8,135)  = 2.01
  Interaction AC    14.976      8    1.87200    2.974 s     F(8,135)  = 2.01
  Interaction BC     2.528      4    0.63200    1.004 ns    F(4,135)  = 2.44
  Interaction ABC    9.615     16    0.60094    0.955 ns    F(16,135) = 1.73
  Error (Within)    84.990    135    0.62956
  Total            154.047    179

Note: A = Industry, B = Size, C = Year. F(·,125) is used in place of F(·,135) because the table shows that there is very little change in this area.

4. (Levine et al., p. 839) The following data are charges in dollars per minute and billions of minutes of calls made to 9 countries from the US in 1996.

  Country    minutes   charge
  Canada       3.05     0.34
  Mexico       2.01     0.85
  Britain      1.02     0.73
  Germany      0.66     0.88
  Japan        0.57     1.00
  Dom.Rep.     0.41     0.84
  France       0.36     0.81
  India        0.29     1.38
  Brazil       0.28     0.96

For your convenience the following values are given: Σx = 7.79, Σx² = 7.3331, Σy = 8.65, Σy² = 15.6037 and n = 9.

a. Compute the regression equation Y = b0 + b1 x to predict billions of minutes of calls. (6)
b.
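On the basis of your regression, how many billions of minutes of calls do you expect when the charge is $.90? (1)
c. Compute R². (4)
d. Compute s_e. (3)
e. Compute s_b1 and do a significance test on b1. (4)
f. Do a prediction interval for billions of minutes of calls when the charge is $.90. (3)
g. Using your SST etc., put together the ANOVA table. (6)

Solution: We compute $\sum xy = 5.9459$ (see the Appendix at the end of this problem).

Spare Parts Computation:

$$\bar{x} = \frac{\sum x}{n} = \frac{7.79}{9} = 0.8656 \qquad \bar{y} = \frac{\sum y}{n} = \frac{8.65}{9} = 0.9611$$

$$SS_x = \sum x^2 - n\bar{x}^2 = 7.3331 - 9(0.8656)^2 = 0.590422$$

$$S_{xy} = \sum xy - n\bar{x}\bar{y} = 5.9459 - 9(0.8656)(0.9611) = -1.54116$$

$$SS_y = \sum y^2 - n\bar{y}^2 = 15.6037 - 9(0.9611)^2 = 7.29009 = SST$$

a) $b_1 = \dfrac{S_{xy}}{SS_x} = \dfrac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \dfrac{-1.54116}{0.590422} = -2.6103$

$b_0 = \bar{y} - b_1\bar{x} = 0.9611 - (-2.6103)(0.8656) = 3.2206$

b) $\hat{Y} = b_0 + b_1 x$ becomes $\hat{Y} = 3.2206 - 2.6103x$, and $\hat{Y} = 3.2206 - 2.6103(0.90) = 0.8713$ is the number of billions of minutes that we forecast.

c) $SSR = b_1 S_{xy} = b_1\left(\sum xy - n\bar{x}\bar{y}\right) = (-2.6103)(-1.54116) = 4.02289$, so

$$R^2 = \frac{SSR}{SST} = \frac{4.02289}{7.29009} = 0.5518$$

or

$$R^2 = \frac{S_{xy}^2}{SS_x\,SS_y} = \frac{\left(\sum xy - n\bar{x}\bar{y}\right)^2}{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)} = \frac{(-1.54116)^2}{(0.590422)(7.29009)} = .5518.$$

d) $SSE = SST - SSR = 7.29009 - 4.02289 = 3.2672$, so

$$s_e^2 = \frac{SSE}{n-2} = \frac{3.2672}{7} = 0.46674$$

or

$$s_e^2 = \frac{SS_y - b_1 S_{xy}}{n-2} = \frac{7.29009 - (-2.6103)(-1.54116)}{7} = 0.46674$$

or

$$s_e^2 = \frac{(1 - R^2)\,SST}{n-2} = \frac{(1 - R^2)\left(\sum y^2 - n\bar{y}^2\right)}{n-2} \qquad (0 \le R^2 \le 1 \text{ always!})$$

or

$$s_e^2 = \frac{\sum y^2 - n\bar{y}^2 - b_1^2\left(\sum x^2 - n\bar{x}^2\right)}{n-2}.$$

$s_e = \sqrt{0.46674} = 0.6832$ ($s_e^2$ is always positive!)

e) $H_0: \beta_1 = 0$, $H_1: \beta_1 \ne 0$.

$$s_{b_1}^2 = \frac{s_e^2}{SS_x} = \frac{s_e^2}{\sum x^2 - n\bar{x}^2} = \frac{0.46674}{0.590422} = 0.7906 \qquad s_{b_1} = \sqrt{0.7906} = 0.8891$$

$$t = \frac{b_1 - \beta_{10}}{s_{b_1}} = \frac{b_1 - 0}{s_{b_1}} = \frac{-2.6103}{0.8891} = -2.9358$$

Assume that $\alpha = .05$ and make a diagram. Show an almost normal curve and that the 'reject' region is below $-t_{\alpha/2}^{(n-2)} = -t_{.025}^{(7)} = -2.365$ or above $t_{\alpha/2}^{(n-2)} = t_{.025}^{(7)} = 2.365$. Since -2.9358 is in the lower 'reject' region, reject $H_0$ and conclude that $\beta_1$ is significant.

f) We found in b) that if $x_0 = 0.90$, $\hat{Y}_0 = 0.8713$.

$$s_{Y_0}^2 = s_e^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x} + 1\right] = 0.46674\left[\frac{1}{9} + \frac{(0.90 - 0.8656)^2}{0.590422} + 1\right] = 0.5195$$

$s_{Y_0} = \sqrt{0.5195} = 0.7208$. So $Y_0 = \hat{Y}_0 \pm t_{\alpha/2}\, s_{Y_0} = 0.8713 \pm 2.365(0.7208) = 0.87 \pm 1.70$.

g) From the computations above, $SSR = 4.02289$, $SST = 7.29009$ and $SSE = 3.2672$.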
$H_0$ is that there is no relation between Y and X.

  Source             SS      DF     MS         F       F.05
  Regression       4.0229     1   4.0229     8.619     F(1,7) = 5.59 s
  Error (Within)   3.2672     7   0.46674
  Total            7.2901     8

Since the table F is less than the computed F, reject $H_0$.

Appendix: Computation of column sums.

  Row i   min (y)   charg (x)     x^2        xy        y^2
  1        3.05       0.34      0.1156    1.0370    9.3025
  2        2.01       0.85      0.7225    1.7085    4.0401
  3        1.02       0.73      0.5329    0.7446    1.0404
  4        0.66       0.88      0.7744    0.5808    0.4356
  5        0.57       1.00      1.0000    0.5700    0.3249
  6        0.41       0.84      0.7056    0.3444    0.1681
  7        0.36       0.81      0.6561    0.2916    0.1296
  8        0.29       1.38      1.9044    0.4002    0.0841
  9        0.28       0.96      0.9216    0.2688    0.0784
  Sum      8.65       7.79      7.3331    5.9459   15.6037

5. The failure times in thousands of hours are given for a random sample of 7 components.

  0.5   8.2   7.8   8.2   4.7   3.5   4.4

Minitab says that the sample mean is 5.33 and the sample standard deviation is 2.90. Use methods appropriate to testing goodness of fit.

a. Test the hypothesis that these numbers came from a normal distribution. Use a 5% significance level. (5)
b. Test the hypothesis that the above data came from a normal distribution with a mean of 4.5 and a standard deviation of 2. (5)
c. A television set distributorship believes that television ownership in the local area is distributed according to a Poisson distribution with a mean of 4. A sample of 100 is taken. Is this true? Use a 5% significance level. (5)

  Number of TV sets:       0    1    2    3    4    5   6 or more
  Number of Households:    2   30   30   18   10    2   8

Solution:

a) $H_0: N(?, ?)$, $H_1$: Not Normal. Because the mean and standard deviation are unknown, this is a Lilliefors problem. From the data we found that $\bar{x} = 5.33$ and $s = 2.90$. $t = \dfrac{x - \bar{x}}{s}$. $F(t)$ actually is computed from the Normal table. For example, $F(-1.67) = P(z \le -1.67) = P(z \le 0) - P(-1.67 \le z \le 0) = .5 - .4525 = .0475$.
  x        0.5      3.5      4.4      4.7      7.8      8.2      8.2
  t      -1.67    -0.63    -0.32    -0.22     0.85     0.99     0.99
  F(t)   .0475    .2643    .3745    .4129    .8023    .8389    .8389
  O          1        1        1        1        1        1        1     (n = 7)
  O/n   0.1429   0.1429   0.1429   0.1429   0.1429   0.1429   0.1429
  Fo    0.1429   0.2857   0.4286   0.5714   0.7143   0.8571   1.0000
  D      .0954    .0214    .0541    .1585    .0880    .0182    .1611

MaxD = .1611. Since the critical value for $\alpha = .05$ and $n = 7$ is .300, do not reject $H_0$.

b) $H_0: N(4.5, 2)$, $H_1$: Not $N(4.5, 2)$. Because the population mean and standard deviation are known, this is a Kolmogorov-Smirnov problem. $z = \dfrac{x - \mu}{\sigma}$.

  x        0.5      3.5      4.4      4.7      7.8      8.2      8.2
  z      -2.00    -0.50    -0.05     0.10     1.65     1.85     1.85
  F(z)   .0228    .3085    .4801    .5398    .9505    .9678    .9678
  O          1        1        1        1        1        1        1     (n = 7)
  O/n   0.1429   0.1429   0.1429   0.1429   0.1429   0.1429   0.1429
  Fo    0.1429   0.2857   0.4286   0.5714   0.7143   0.8571   1.0000
  D      .1201    .0228    .0515    .0316    .2362    .1107    .0322

MaxD = .2362. Since the critical value for $\alpha = .05$ and $n = 7$ is .483, do not reject $H_0$.

c) $H_0$: Poisson(4), $H_1$: Not Poisson(4). This can be done as a chi-square or Kolmogorov-Smirnov problem. $f_e$ and $F_e$ come from the Poisson table.

  x        O     O/n     Fo       f_e         E        Fe        D
  0        2     .02     .02    .01832      1.832    .01832    .00168
  1       30     .30     .32    .07326      7.326    .09158    .22842
  2       30     .30     .62    .14652     14.652    .23810    .38190
  3       18     .18     .80    .19537     19.537    .43347    .36653
  4       10     .10     .90    .19537     19.537    .62884    .27116
  5        2     .02     .92    .15629     15.629    .78513    .13487
  6+       8     .08    1.00    .21487     21.487   1.00000    0
  Sum    100    1.00           1.00000    100.000

For the Kolmogorov-Smirnov Method the 5% critical value is $\dfrac{1.36}{\sqrt{n}} = \dfrac{1.36}{\sqrt{100}} = 0.136$. This is less than the maximum value of D, which is .38190, so reject $H_0$.

For the Chi-Squared Method, we have had to merge two cells, because the first E was below 5. We thus have 6 - 1 = 5 degrees of freedom.

  O         E         O^2/E
   32      9.158    111.8148
   30     14.652     61.4251
   18     19.537     16.5839
   10     19.537      5.1185
    2     15.629      0.2559
    8     21.487      2.9785
  100    100.000    198.1767

The value of Chi-squared that we compute is $\chi^2 = \sum \dfrac{O^2}{E} - n = 198.1767 - 100 = 98.1767$. From the Chi-squared table $\chi^{2\,(5)}_{.05} = 11.0705$. This is less than our computed $\chi^2$, so reject $H_0$.
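Both versions of part c) can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions, not part of the original key: the variable names and the helper `poisson_pmf` are illustrative, the Poisson probabilities are computed directly instead of being read from a table, and the critical values (1.36/√n for Kolmogorov-Smirnov and 11.0705 for chi-squared with 5 degrees of freedom) are the ones quoted in the solution.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of exactly k events (hypothetical helper)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


observed = [2, 30, 30, 18, 10, 2, 8]  # households with 0, 1, ..., 5, 6+ sets
n = sum(observed)
mean = 4.0

# Cell probabilities under H0: Poisson(4); the last cell is "6 or more".
probs = [poisson_pmf(k, mean) for k in range(6)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

# Kolmogorov-Smirnov: largest gap between observed and expected cumulative shares.
cum_obs = cum_exp = max_d = 0.0
for o, p in zip(observed, probs):
    cum_obs += o / n
    cum_exp += p
    max_d = max(max_d, abs(cum_obs - cum_exp))
ks_critical = 1.36 / math.sqrt(n)  # 5% value used in the key

# Chi-squared: merge the first two cells because the first E is below 5.
obs_merged = [observed[0] + observed[1]] + observed[2:]
exp_merged = [expected[0] + expected[1]] + expected[2:]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs_merged, exp_merged))
chi_critical = 11.0705  # 5% value for 5 degrees of freedom, from the key

print(f"max D = {max_d:.4f}, critical = {ks_critical:.3f}")
print(f"chi-squared = {chi_sq:.4f}, critical = {chi_critical}")
print("reject H0 (K-S):", max_d > ks_critical)
print("reject H0 (chi-squared):", chi_sq > chi_critical)
```

The sum of (O - E)²/E used here is algebraically the same as the key's shortcut ΣO²/E - n whenever the O and E columns have the same total.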