Stat 402B (Spring 2016): Notes Set #4
Last update: January 27, 2016

Comparisons or Contrasts

• The difference $\mu_p - \mu_q$ is just one of many possible comparisons among the means. The important comparisons may not be of the form $\mu_p - \mu_q$.
• Suppose there are $a$ treatments and each is replicated $n$ times. The treatment sum of squares from the analysis of variance can be subdivided into sums of squares, each with one degree of freedom, and each can be used to test a particular hypothesis about the means $\mu_1, \mu_2, \ldots, \mu_a$.
• For example, the sum of squares needed to test the hypothesis
\[ H_0: \mu_1 - \tfrac{1}{2}(\mu_2 + \mu_3) = 0 \quad \text{vs.} \quad H_a: \mu_1 - \tfrac{1}{2}(\mu_2 + \mu_3) \neq 0 \]
will have 1 d.f.
• A linear combination of the $\mu$'s of the type $\mu_1 - \tfrac{1}{2}\mu_2 - \tfrac{1}{2}\mu_3$ is called a comparison or a contrast of the means.
• If $c_1, c_2, \ldots, c_a$ are constants such that $\sum_{i=1}^{a} c_i = 0$, then $\Gamma = \sum_{i=1}^{a} c_i \mu_i$ is called a contrast or a comparison in the $\mu_i$'s.
• We want to test the hypotheses
\[ H_0: \sum_{i=1}^{a} c_i \mu_i = 0 \quad \text{vs.} \quad H_1: \sum_{i=1}^{a} c_i \mu_i \neq 0 \]
• The contrast or comparison in the sample means $\bar{y}_{i\cdot}$ is
\[ C = \sum_{i=1}^{a} c_i \bar{y}_{i\cdot} \]
and is the estimate of the contrast $\Gamma = \sum_{i=1}^{a} c_i \mu_i$.
• The variance of $C$ is
\[ V(C) = \frac{\sigma^2}{n} \sum_{i=1}^{a} c_i^2 \]
• Replacing $\sigma^2$ by its estimate $s_E^2$ (the error mean square $MS_E$) gives the t-statistic
\[ t_0 = \frac{\sum_{i=1}^{a} c_i \bar{y}_{i\cdot}}{\sqrt{\dfrac{s_E^2}{n} \sum_{i=1}^{a} c_i^2}} \]
Reject the null hypothesis above if $|t_0|$ exceeds $t_{\alpha/2,\,N-a}$ (or calculate a p-value).
• We could also use an F statistic instead. A single-degree-of-freedom sum of squares for testing the above hypothesis is
\[ SS_c = \frac{\left( \sum_{i=1}^{a} c_i \bar{y}_{i\cdot} \right)^2}{\dfrac{1}{n} \sum_{i=1}^{a} c_i^2} \]
• This gives the F-statistic
\[ F_0 = \frac{MS_c}{MS_E} = \frac{SS_c / 1}{MS_E} \]
which turns out to be computationally equal to $t_0^2$.
• For a significance level of $\alpha$, the critical point is the upper $100\alpha\%$ point of the $F(1, N - a)$ distribution, where $N = \sum_{i=1}^{a} n_i$.
Thus, we reject $H_0$ if $F_0 > F_{\alpha,\,1,\,N-a}$.
• If the sample sizes are unequal, i.e., the $a$ treatments are replicated $n_1, n_2, \ldots, n_a$ times, respectively, the t statistic and $SS_c$ are modified as follows:
\[ t_0 = \frac{\sum_{i=1}^{a} c_i \bar{y}_{i\cdot}}{\sqrt{s_E^2 \sum_{i=1}^{a} \dfrac{c_i^2}{n_i}}} \qquad SS_c = \frac{\left( \sum_{i=1}^{a} c_i \bar{y}_{i\cdot} \right)^2}{\sum_{i=1}^{a} \dfrac{c_i^2}{n_i}} \]

Orthogonal Contrasts

• The contrasts usually chosen to be tested are those that are of interest to the experimenter or those suggested by the treatment structure.
• Such contrasts must be determined at the design stage, before the experiment begins, and are thus called pre-planned comparisons.
• In an experiment with equal sample sizes, it is possible to find a set of comparisons such that their sums of squares, each with one degree of freedom, form a subdivision of $SS_{Trt}$.
• If $c_1, \ldots, c_a$ and $d_1, \ldots, d_a$ are constants such that $\sum_{i=1}^{a} c_i = 0$, $\sum_{i=1}^{a} d_i = 0$, and $\sum_{i=1}^{a} c_i d_i = 0$, then $\sum_{i=1}^{a} c_i \mu_i$ and $\sum_{i=1}^{a} d_i \mu_i$ are called orthogonal contrasts in the $\mu_i$'s.
• The corresponding contrasts of the sample means are statistically independent of each other when the sample sizes are equal, i.e., $n_1 = \cdots = n_a$.
• In that case, the contrast sums of squares of $a - 1$ mutually orthogonal contrasts form a complete partitioning of the treatment sum of squares $SS_{Trt}$, i.e.,
\[ SS_{Trt} = SS_{C_1} + SS_{C_2} + \cdots + SS_{C_{a-1}} \]

Plasma Etching Example (continued)

ANOVA table incorporating contrasts (Table 3.11 in the text)

Source of Variation            d.f.   SS             MS           F        p-value
RF Power                       3      66,870.55      22,290.18    66.8     < .0001
Orthogonal Contrasts
  C1: µ1 = µ2                  1      (3276.10)      3276.10      9.82     < 0.01
  C2: µ1 + µ2 = µ3 + µ4        1      (46,948.05)    46,948.05    140.69   < .001
  C3: µ3 = µ4                  1      (16,646.40)    16,646.40    49.88    < .001
Error                          16     5339.20        333.70
Total                          19     72,209.75

Note: In actual situations, the contrasts are selected at the planning stage so that they lead to meaningful conclusions about the treatment means or effects. Thus they are very much related to the structure of the factor levels.
Ask whether the contrasts here are meaningful in this experiment.

Example (Cont'd)

Compute the values of the contrasts and their sums of squares as follows (with $n = 5$, the divisor of each sum of squares is $\frac{1}{n}\sum_{i=1}^{a} c_i^2$):

Contrast   C = Σ c_i ȳ_i.                                   Value     SS_C
C1         +1(551.2) − 1(587.4)                             −36.2     (−36.2)²/(2/5)  = 3276.10
C2         +1(551.2) + 1(587.4) − 1(625.4) − 1(707.0)       −193.8    (−193.8)²/(4/5) = 46,948.05
C3         +1(625.4) − 1(707.0)                             −81.6     (−81.6)²/(2/5)  = 16,646.40

These contrast sums of squares completely partition the treatment sum of squares. The F-tests on the contrasts are usually incorporated in the analysis of variance table, as above. We see that
\[ SS_{Trt} = 66{,}870.55 = 3276.10 + 46{,}948.05 + 16{,}646.40 \]
since the three contrasts considered are orthogonal to each other and thus partition the treatment sum of squares into three single-degree-of-freedom sums of squares.

Diagnostic Plots of Residuals

1. Probability Plot
   Used to detect possible deviations from normality of the error distribution. Also helps to locate possible outliers.
2. Residuals vs. Time
   Used to reveal possible changes in the experimental technique as the experiment proceeds. The data may display more or less variability as time goes on.
3. Residuals vs. Predicted Values (fitted values)
   May help show whether the absolute values of the residuals increase (or decrease) as the size of the response increases, indicating that the model is suspect. Ordinarily, if the model is correct, the residuals should not be related to the size of the response.
4. Residuals vs. Extraneous Variables of Interest
   • increase basic knowledge about the subject
   • suggest variables that must be controlled
   • lead one to consider these variables as new factors in the experiment

Choice of Sample Size for One-way Classification

To simplify matters, consider the equal-sample-size case, i.e., $n_1 = n_2 = \cdots = n_a = n$. Prob.
of Type II error:
\[ \beta = P(\text{fail to reject } H_0 \mid H_0 \text{ is false}) = 1 - P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - P(F_0 > F_{\alpha,\,a-1,\,N-a} \mid H_0 \text{ is false}) \]

The relevant OC curves, which are in Table V of the Appendix, plot $\beta$ against a parameter $\phi$, where
\[ \phi^2 = \frac{n \sum_{i=1}^{a} \tau_i^2}{a\sigma^2} = \frac{n \sum_{i=1}^{a} (\mu_i - \bar{\mu})^2}{a\sigma^2} \]
Separate curves are available for $\alpha = .05$ and $\alpha = .01$ and for a range of values of $\nu_1 = a - 1$ and $\nu_2 = N - a$.

(Note: We want to use the OC curves to determine $n$ for a specified power to reject a specified $H_a$ at a chosen $\alpha$. $\sigma^2$ is assumed known.)

How to use OC Curves

See Example 3.10 in the text for the plasma etching experiment example.

A practical approach is to specify the problem as finding the sample size needed to reject the null hypothesis if any pair of treatment means differ by at least $D$ units. It can be shown that the minimum value of $\phi^2$ over all configurations of the $\mu$'s that satisfy this condition (see p. 107 of the text) is
\[ \phi^2 = \frac{nD^2}{2a\sigma^2} \]
Since the power increases (or $\beta$ decreases) as $\phi^2$ increases, this value gives us a way to find the minimum $n$ that provides a test meeting the specified power.

Here we give a simple example with $a = 4$ treatments, where $\sigma = 2$ is known from past data and the experimenter plans to use $\alpha = .05$. Suppose that the experimenter wants to be able to reject the null hypothesis if any pair of treatment means differ by at least 5 units. The minimum value of $\phi^2$ under these conditions is
\[ \phi^2 = \frac{n(5)^2}{2(4)(2)^2} = 0.78125\,n \]
From the OC curve for $\alpha = .05$ and $\nu_1 = 3$, using $\nu_2 = 4(n - 1)$ and $\phi^2 = 0.78125\,n$, we can construct the following table for different guesses of $n$:

n    φ²     φ      ν2 = 4(n − 1)   β     Power
5    3.90   1.98   16              .18   .82
6    4.69   2.17   20              .11   .89
7    5.47   2.34   24              .07   .93
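
The $\phi^2$, $\phi$, and $\nu_2$ columns of the table above are plain arithmetic, and the power column can be cross-checked numerically instead of being read off a chart. The following is a minimal sketch, not part of the original notes. The function names (`phi_squared`, `exact_power`, `sample_size_table`) are hypothetical, and the power calculation assumes SciPy is installed and uses the standard noncentral F distribution with noncentrality $\lambda = a\phi^2 = n\sum \tau_i^2/\sigma^2$. Values computed this way may differ slightly from the chart readings in the table.

```python
import math

try:
    # SciPy is optional here; without it only the phi columns are computed.
    from scipy import stats
except ImportError:
    stats = None


def phi_squared(n, a, D, sigma):
    """Minimum phi^2 when some pair of treatment means differs by at least D."""
    return n * D**2 / (2 * a * sigma**2)


def exact_power(n, a, D, sigma, alpha=0.05):
    """Power of the one-way ANOVA F test at the minimum-phi^2 configuration.

    Assumption (not from the notes): under the alternative, F0 follows a
    noncentral F distribution with noncentrality lambda = a * phi^2.
    Returns None when SciPy is not available.
    """
    if stats is None:
        return None
    df1, df2 = a - 1, a * (n - 1)
    lam = a * phi_squared(n, a, D, sigma)
    f_crit = stats.f.ppf(1 - alpha, df1, df2)   # upper-alpha critical point
    return float(stats.ncf.sf(f_crit, df1, df2, lam))


def sample_size_table(a=4, D=5.0, sigma=2.0, alpha=0.05, n_values=(5, 6, 7)):
    """One row per candidate n, mirroring the columns of the table in the notes."""
    rows = []
    for n in n_values:
        phi2 = phi_squared(n, a, D, sigma)
        rows.append({
            "n": n,
            "phi2": phi2,
            "phi": math.sqrt(phi2),
            "nu2": a * (n - 1),
            "power": exact_power(n, a, D, sigma, alpha),
        })
    return rows


if __name__ == "__main__":
    for row in sample_size_table():
        print(row)
```

To search for the smallest adequate $n$, one could call `exact_power` for increasing $n$ and stop at the first value that reaches the target power; the OC-curve table does the same thing by hand.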