116101773 Page 1 of 20

SP801 Session 8

1. Review: t-Tests and Basic ANOVA / Discussion of Computing Assignment ANOVA 1 (Part 2)
2. Computational Issues in Oneway ANOVA
   2.1 Computation from group means, standard deviations and n's
   2.2 Unequal n's and Multiple Comparisons
   2.3 The F-Test, Directional Hypotheses and Effect Sizes
3. Multi-Factor ANOVA
   3.1 The Basic Model
   3.2 Interpreting Interactions
   3.3 Multiple Comparisons
   3.4 Odds and Ends: Unequal n; "Types" of Sums of Squares; Fixed vs. Random Factors; Nested Designs

1. Review: t-Tests and Basic ANOVA / Discussion of Computing Assignment ANOVA 1 (Part 2)

After the first two sessions on experimental design and statistical methods of comparing means, you should be able to:

(Experiment)
- define experimental research, understand and discuss its purpose and rationale
- differentiate random assignment from random sampling
- describe within-subjects and between-subjects designs
- understand and use basic experimental terminology (random vs. systematic error, control techniques, field and quasi experiments etc.)

(t-Test)
- understand the rationale of the two types of t-test as well as their underlying assumptions
- apply the appropriate t-test to problems of data analysis
- compute and report effect size measures related to t-tests

(ANOVA)
- understand the rationale and purpose of one-factor ANOVA for both independent groups and repeated measures as well as their underlying assumptions
- use ANOVA terminology (effect, error, sum of squares etc.)
- apply the appropriate ANOVA procedures to problems of data analysis
- discuss the limitations of one-factor ANOVA

(Multiple Comparisons)
- discuss the problem of multiple testing
- differentiate between per-comparison and familywise error rates
- explain and apply the concept of alpha adjustment, in particular the Bonferroni approach
- differentiate between a-priori and post-hoc comparisons
- compute these comparisons using SPSS
- formulate and identify orthogonal sets of contrasts

2. Computational Issues in Oneway ANOVA

2.1 Computing a Oneway ANOVA from group means, standard deviations and n's

A Oneway ANOVA can be computed by hand from a table of group means, standard deviations and n's per condition, even if the raw data are not known (e.g. from published data).

Computational example:

Group     n     Mean    Std. Dev.
_________________________________
1        10     4.50      1.08
2         8     6.50      1.20
3        12     5.00      0.85

SS_within is obtained by squaring each standard deviation, multiplying the result by (n_j - 1) and adding up these three values:

SS_within = (1.08^2 x 9) + (1.20^2 x 7) + (0.85^2 x 11) = 28.525

SS_between can be found by (a) first computing the grand mean, (b) then the squared deviations of each group mean from the grand mean, and (c) then adding these up, each squared deviation weighted by the appropriate n_j:

(a) Grand Mean = (4.5 x 10 + 6.5 x 8 + 5.0 x 12) / (10 + 8 + 12) = 5.2333

(b) (M1 - Grand Mean)^2 = (4.5 - 5.2333)^2 = 0.5377
    (M2 - Grand Mean)^2 = (6.5 - 5.2333)^2 = 1.6045
    (M3 - Grand Mean)^2 = (5.0 - 5.2333)^2 = 0.0544

(c) SS_treatment = 0.5377 x 10 + 1.6045 x 8 + 0.0544 x 12 = 18.8667

As df_treatment = 2 and df_error = 27, we find:

F(2,27) = (18.8667 / 2) / (28.525 / 27) = 8.929

The raw data on which this example is based can be found in the file v:\courses\sp801\oneway.sav. Running a Oneway ANOVA with this dataset will confirm that the F above is correct within rounding precision.

2.2 How do unequal n's affect computations in Oneway ANOVA and Multiple Comparisons?
Oneway ANOVA does not require equal sample sizes per condition. The computation of sums of squares and degrees of freedom is unaffected by inequality of sample sizes. However, the computation of Multiple Comparisons needs to take inequality of sample sizes into account.

To compute a-priori contrasts with unequal n's, use the harmonic mean across samples (n_h) to replace n in the equation for MS_contrast:

MS_contrast = SS_contrast = (n_h x L^2) / sum(a_j^2)

where L is the value of the contrast and, for k groups, n_h = k / (1/n_1 + 1/n_2 + ... + 1/n_k).

Note that n_h increasingly deviates from the arithmetic mean of n's (and thus from n in the case of equal sample sizes), the greater the differences among the sample n's are. For example, with three groups and overall N = 60, we obtain:

n_h = 20.00 if n1 = n2 = n3 = 20
n_h = 18.00 if n1 = 15, n2 = 15, n3 = 30
n_h =  7.14 if n1 = 5,  n2 = 5,  n3 = 50

Unequal n's mean a loss of power.

SPSS post-hoc tests use the harmonic mean as well when sample sizes are unequal. This can lead to a disparity in conclusions based on pairwise comparisons versus homogeneous subsets of means. Consider the following example using the dataset v:\courses\sp801\gssft.dat (a General Social Survey dataset containing only full-time employed respondents; see Norusis' SPSS 8.0 Guide to Data Analysis, Chapter 14). These post-hoc tests are based on a Oneway ANOVA examining the number of hours worked per week (HRS1) as a function of education level (DEGREE).

Table 1 shows the descriptive statistics for the five levels of DEGREE. Note the greatly unequal sample sizes:

[Table 1: Descriptive statistics (n, mean, std. deviation) for HRS1 at each of the five levels of DEGREE.]

Table 2 shows pairwise post-hoc tests using the Tukey procedure. This table indicates that people with a graduate degree worked more hours than (a) people with less than a high school degree and (b) people with a high school degree. Table 3 shows homogeneous subsets, also computed using the Tukey procedure.
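The harmonic means quoted above can be reproduced with Python's statistics module (a sketch for checking the numbers; the handout's own analyses are run in SPSS):

```python
# Sketch: harmonic means for the three sample-size scenarios in section 2.2
# (plain Python; illustrates how unequal n's pull n_h below the arithmetic mean).
from statistics import harmonic_mean

print(round(harmonic_mean([20, 20, 20]), 2))  # 20.0 - equal n's: n_h equals n
print(round(harmonic_mean([15, 15, 30]), 2))  # 18.0
print(round(harmonic_mean([5, 5, 50]), 2))    # 7.14 - strongly unequal n's
```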
Note that only two groups are different when the overall harmonic mean of n's is used.

[Table 2: Tukey-HSD pairwise comparisons of mean HRS1 between all pairs of DEGREE groups (mean differences, standard errors, significance levels and confidence intervals).]

[Table 3: Tukey-HSD homogeneous subsets of DEGREE group means for HRS1, based on the harmonic mean of the group n's.]

2.3 The F-Test, Directional Hypotheses and Effect Sizes

F-distributions are defined for values from 0 to positive infinity, and larger F's generally indicate a larger effect. Thus, only F values in the right-hand tail of the F-distribution lead to rejection of H0. Does this mean that F-tests are generally one-tailed? No! In fact, directional hypotheses can only be tested via the F-distribution when numerator df = 1, i.e. when only two groups are compared. Recall that this test is equivalent to a t-test, with t^2 = F. Thus, the squared values of those t's found in the leftmost and rightmost 2.5% of a t-distribution, respectively, correspond to the F's found in the rightmost 5% of an F-distribution with 1 numerator df.

To do a one-tailed test based on F, one would simply look up the critical F-value in a table at p = 2α (e.g. at p = .10 for a test at α = .05). Or, when interpreting computer output, one would divide the reported 2-tailed probability by 2 (provided the difference is in the predicted direction).

Any F-test with numerator df > 1 is necessarily non-directional, as differences among more than 2 means are involved, and a significant result simply indicates that some non-zero difference among the means exists.
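The equivalence t^2 = F for 1 numerator df can be verified on any two-group dataset. A minimal sketch in plain Python; the scores below are made up for illustration and are not from the course datasets:

```python
# Sketch: for two groups, the Oneway-ANOVA F equals the squared pooled-variance t.
g1 = [4, 5, 6, 5, 4, 6]   # illustrative scores, group 1
g2 = [6, 7, 8, 6, 7, 8]   # illustrative scores, group 2

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    # sum of squared deviations from the group mean
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(g1), len(g2)
m1, m2 = mean(g1), mean(g2)

# Pooled-variance t-test
pooled_var = (ss(g1) + ss(g2)) / (n1 + n2 - 2)
t = (m1 - m2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

# Oneway ANOVA with two groups (numerator df = 1)
grand = mean(g1 + g2)
ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
ss_within = ss(g1) + ss(g2)
f = (ss_between / 1) / (ss_within / (n1 + n2 - 2))

print(round(t ** 2, 6), round(f, 6))  # 15.0 15.0 - identical
```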
A common effect size measure associated with F is η² ("eta squared"), which can be defined as:

η² = SS_treatment / (SS_treatment + SS_error)

or, equivalently, computed from F and its degrees of freedom:

η² = (F x df_treatment) / (F x df_treatment + df_error)

When df_treatment = 1, η = r, i.e. the product-moment or point-biserial correlation coefficient between the grouping variable and the dependent variable. Just as r², η² is a measure of DV variance accounted for by the IV; it is more general than r, however, as it can be used to express variance accounted for by effects of a nominal IV (or by interactions of nominal IV's). SPSS optionally provides η² as part of the ANOVA table in procedure GLM ("Options" - "Effect Size") but not as part of procedure ONEWAY.

3. Multifactor ANOVA

3.1 The Basic Model

A multifactor ANOVA considers the effects of two or more nominal IV's and their interactions on a DV. The levels of these IV's (or "factors") can represent independent groups, repeated measurements, or a mixture of both. As a generic case, consider a two-factor ANOVA applied to a randomised experiment with two IV's whose levels are crossed (also called an orthogonal design):

                           Communicator Expertise
                             high        low
Argument Quality  strong
                  mixed
                  weak

Rationale: The multi-factor approach has various advantages:
- allows us to study the generality of effects (e.g., do different types of feedback have similar effects on performance for males and females, or across various age levels?)
- allows examination of interaction effects
- allows testing of complex theories that specify distinctive combinations of independent variables (e.g. in the above example, Chaiken's HSM would predict that high [versus low] expertise leads to clearly more positive attitudes in the mixed arguments condition, does not have much of an effect in the strong arguments condition, and leads to less positive attitudes in the weak arguments condition)
- increases efficiency - several factors can be examined in a single study.
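The two forms of the η² formula from section 2.3 give identical values. A plain-Python check, using the SS values of the Oneway example from section 2.1:

```python
# Sketch: both definitions of eta^2 agree; SS values taken from the
# Oneway example in section 2.1 (SS_treatment = 18.8667, SS_error = 28.525).
ss_treatment, ss_error = 18.8667, 28.525
df_treatment, df_error = 2, 27

# Definition via sums of squares
eta_sq_from_ss = ss_treatment / (ss_treatment + ss_error)

# Definition via F and its degrees of freedom
f = (ss_treatment / df_treatment) / (ss_error / df_error)
eta_sq_from_f = (f * df_treatment) / (f * df_treatment + df_error)

print(round(eta_sq_from_ss, 4), round(eta_sq_from_f, 4))  # 0.3981 0.3981
```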
Some Terminology

Main Effect: Comparison of the means from the various levels of one particular factor, ignoring (or collapsing over) the other factors in the research design.

Main Effect Marginal Means: Means for the various levels of a particular factor, ignoring (or collapsing over) other factors in the design. A main effect can be thought of in terms of whether or not the main effect marginal means differ from one another statistically.

Interaction Effect: Pattern of cell means indicating that the effects of one factor differ as a function of levels of another factor (or combination of factors). Depending on how many factors are involved, we speak of two-way, three-way etc. interactions.

Interaction Effect Marginal Means: Means for the various combinations of levels of the particular factors comprising an interaction effect, ignoring (or collapsing over) other factors in the design.

Simple Main Effects: Main effects examined within a specific level of one of the other factors (or within specific factorial combinations of levels of factors in a more complex research design).

Simple Interaction Effects: Interaction effects examined within a specific level of one of the other factors (or within specific factorial combinations of levels of factors in a more complex research design).

Computational Example: In a persuasion experiment, students read a message that ostensibly came from a communicator of high or low expertise (Factor 1); the message contained strong, mixed or weak arguments (Factor 2). Table entries are DV scores of participants on a measure of post-message attitudes that can vary between 1 and 7. For simplicity, only 2 observations per condition are included.
                               Argument Quality
                        strong     mixed     weak     Marginal Means
Communicator    high     7  8       6  7      1  3         5.33
Expertise       low      7  6       3  4      2  4         4.33
Marginal Means           7.00       5.00      2.50        (4.83)

The ANOVA model assumes that each score (y_ijk) is composed of the grand mean (μ), a main effect for the column factor (α_j), a main effect for the row factor (β_k), an interaction effect (αβ_jk) and an error term (e_ijk):

y_ijk = μ + α_j + β_k + αβ_jk + e_ijk

Accordingly, the cell means can be thought of as representing the grand mean plus a column effect plus a row effect plus an interaction effect:

μ_jk = μ + α_j + β_k + αβ_jk

The column and row effects can be estimated from the difference between the appropriate marginal mean and the sample grand mean:

α_j = M_j - M
β_k = M_k - M

The interaction effect can be estimated indirectly:

αβ_jk = M_jk - M - (M_j - M) - (M_k - M)
αβ_jk = M_jk - M_j - M_k + M

In our example, we find the following interaction effects αβ_jk:

Table of Interaction Effects

                               Argument Quality
                        strong    mixed    weak    Residual Marginal Means
Communicator    high       0       +1       -1              0
Expertise       low        0       -1       +1              0
Resid. Marg. Means         0        0        0              0

The information in this table can be read as: The mean for high expertise / mixed arguments is 1 scale unit higher than one would expect if only the two main effects were present; the mean for low expertise / mixed arguments is 1 scale unit lower than one would expect if only the two main effects were present ... etc.

[Plot: observed cell means of ATTITUDE by ARGU (x-axis) and COMM_EXP (separate lines), based on the full model.]

[Plot: estimated cell means of ATTITUDE by ARGU and COMM_EXP, based on a main-effects-only model (parallel lines).]

To test for significance, mean squares and F-tests are computed.
The total sum of squares consists of four components:

SS_total = SS_columns + SS_rows + SS_IA + SS_error

Computation of these components proceeds as in Oneway ANOVA, with the exception of SS_IA, which is determined indirectly:

SS_IA = SS_total - SS_columns - SS_rows - SS_error

Mean squares are obtained by dividing the sums of squares for each effect by its associated df. For

C = number of levels of the column factor and
R = number of levels of the row factor,

df_columns = C - 1
df_rows    = R - 1
df_IA      = (R - 1) x (C - 1)
df_error   = N - (R x C)
df_total   = N - 1

F-tests are computed as usual, dividing the MS of each effect by MS_error.

An SPSS analysis (GLM General Factorial; defaults plus descriptive statistics and estimates of effect size) of our example data yields the following output:

[SPSS output: descriptive statistics for ATTITUDE per cell, and the ANOVA table showing SS, df, MS, F, significance and effect-size (η²) estimates for COMM_EXP, ARGU, their interaction, and error.]

[Discuss the meaning of each entry in the ANOVA table.]

3.2 Interpreting Interactions

The presence of an interaction effect in a two-way ANOVA means that the size of the effects of one factor on the DV varies with the levels of the other factor. In our example, communicator expertise seems to have a stronger positive effect on attitudes (i.e. higher expertise produces more positive attitudes) in the mixed-arguments condition than in the two other argument conditions. (In fact, in the weak-arguments condition, the effect of expertise seems to be negative.)

In graphical representations of the condition means, where levels of one factor define the partitioning of the x-axis and levels of the other factor define separate lines (see example above), non-parallel lines indicate an interaction effect. If the lines cross, the interaction is called disordinal; if they do not cross, it is called ordinal.
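The sums of squares and F-tests for the 2 x 3 persuasion example can be reproduced without SPSS. A minimal sketch in plain Python, following the hand-computation formulas of this section (the 12 scores are those of the example):

```python
# Sketch: two-way ANOVA components for the 2 x 3 persuasion example,
# computed from first principles (the handout itself uses SPSS GLM).
data = {  # (expertise, arguments) -> the two observations per cell
    ("high", "strong"): [7, 8], ("high", "mixed"): [6, 7], ("high", "weak"): [1, 3],
    ("low", "strong"): [7, 6], ("low", "mixed"): [3, 4], ("low", "weak"): [2, 4],
}
rows = ["high", "low"]               # Communicator Expertise (row factor)
cols = ["strong", "mixed", "weak"]   # Argument Quality (column factor)

def mean(xs):
    return sum(xs) / len(xs)

all_scores = [y for ys in data.values() for y in ys]
n = len(all_scores)                  # N = 12
grand = mean(all_scores)

row_means = {r: mean([y for c in cols for y in data[(r, c)]]) for r in rows}
col_means = {c: mean([y for r in rows for y in data[(r, c)]]) for c in cols}
cell_means = {rc: mean(ys) for rc, ys in data.items()}

ss_total = sum((y - grand) ** 2 for y in all_scores)
ss_rows = sum(6 * (row_means[r] - grand) ** 2 for r in rows)   # 6 scores per row
ss_cols = sum(4 * (col_means[c] - grand) ** 2 for c in cols)   # 4 scores per column
ss_error = sum((y - cell_means[rc]) ** 2 for rc, ys in data.items() for y in ys)
ss_ia = ss_total - ss_cols - ss_rows - ss_error                # indirect, as in the text

ms_error = ss_error / (n - len(rows) * len(cols))              # df_error = N - R*C = 6
f_rows = (ss_rows / 1) / ms_error                              # df_rows = R - 1 = 1
f_cols = (ss_cols / 2) / ms_error                              # df_columns = C - 1 = 2
f_ia = (ss_ia / 2) / ms_error                                  # df_IA = (R-1)(C-1) = 2

print(round(ss_rows, 2), round(ss_cols, 2), round(ss_ia, 2), round(ss_error, 2))
print(round(f_rows, 2), round(f_cols, 2), round(f_ia, 2))
```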
[NB: Of course, the lines in such a display rarely show a perfectly parallel pattern. Deviations from parallel may be due to chance. Therefore, the graphical display should only be interpreted in combination with the F-test of the interaction.]

Higher-order interactions

The logic of the two-factor model can easily be generalised to designs with more than two factors. With three factors, we can test the three main effects, three interactions among two factors, and one interaction involving all three factors. [Give examples for higher-order interactions.]

In general, when the number of factors is k, the number of F-tests in the ANOVA is 2^k - 1. The problem of multiple testing applies to the F-tests in an ANOVA as it does to multiple t-tests! So if you run a three-factor ANOVA, the chance probability that at least one of the seven F-tests will yield a "significant" result at .05 is a little over .30 (1 - .95^7 = .30). In deciding if alpha adjustment is necessary, the same considerations as with multiple t-tests apply.

What the multi-factor ANOVA does not tell us

Even though a multi-factor ANOVA yields more information than a oneway ANOVA performed on the same data, any F-test with numerator df > 1 is still ambiguous as to which means exactly differ from each other. Even an analysis whose effects all have numerator df = 1 does not always tell us what we want to know. Consider a clinical three-factor design with the IV's medication (placebo vs. drug), psychotherapy (not given vs. given) and accommodation (in-patient vs. out-patient). Hypothesis: The combination of drug treatment, psychotherapy and in-patient accommodation leads to a reduction in symptoms compared to the other seven conditions. Results are shown in the Table below. Table entries are means of a symptom change index (range -1 to +1; positive values = symptom reduction).
Table of Means:

                          In-patient              Out-patient
                       Drug     Placebo        Drug     Placebo
No Psychotherapy         0         0             0         0
Psychotherapy         +0.8         0             0         0

So the pattern of means is exactly as predicted. However, assuming n = 10 per condition and MS_error = 0.8, a standard three-factor ANOVA yields:

Effect            SS     df     MS      F      p
Accommodation     0.8     1     0.8    1.0    .32
Psychotherapy     0.8     1     0.8    1.0    .32
Medication        0.8     1     0.8    1.0    .32
A x P             0.8     1     0.8    1.0    .32
A x M             0.8     1     0.8    1.0    .32
P x M             0.8     1     0.8    1.0    .32
A x P x M         0.8     1     0.8    1.0    .32
Error            57.6    72     0.8

None of these seven tests, although focused, really tests our hypothesis! In this case, one a-priori contrast, pinpointing the critical cell against the seven others, would have yielded:

MS_contrast = SS_contrast = 5.6
F_contrast(1,72) = 7.0, p = .01

3.3 Multiple Comparisons

As the above example shows, multiple comparisons are as useful in multi-factor ANOVA as they are in oneway analyses. But within multi-factor designs, SPSS offers a-priori and post-hoc comparisons only with respect to one factor at a time. Thus, we might compare levels of the argument quality factor in our initial example as part of a GLM General Factorial analysis. However, a comparison of, say, the strong arguments / expert communicator condition with the remaining five cells would not be easily accomplished. (Similarly, the contrast relevant to the above clinical example would not be possible.)

To run contrasts across factors anyway, the easiest way is to create a variable that represents all the combinations of factor levels and to run a Oneway ANOVA to test the contrasts one is interested in, using this new variable as the IV (see computing assignment ANOVA 2 for details and an example).

3.4 Odds and Ends: Unequal n; "Types" of Sums of Squares; Fixed vs. Random Factors; Nested Designs

Unequal n

So far, we discussed equal-n designs.
In the case of unequal n's, the factors in a multi-factor design are no longer orthogonal, and there are various possibilities of computing the sums of squares (just as there are various ways in MRA of treating variability that is commonly explained by correlated predictors).

Types of sums of squares

In SPSS, three basic types of sum-of-squares computation are available:

Type 3, the default, corresponds to a standard regression model with effects-coding of IV's. Each effect is adjusted for all other effects in the model. The sums of squares of all effects plus error do not add up to the "corrected total" sum of squares if n's are unequal.

Type 2 adjusts higher-order interactions for all lower-order effects, but is invariant to the ordering of effects within the same level of interaction (comparable to a hierarchical regression with successive entry of sets of IV's: main effects first, then two-way IA's, then three-way IA's etc.).

Type 1 adjusts each effect for all effects that were entered previously (comparable to a hierarchical regression with entry of one IV at a time).

With equal sample sizes per condition, the three approaches are identical. To avoid any ambiguity in interpreting ANOVA effects, try to always aim for equal n's per condition when designing a study, and do not change the default sum-of-squares setting in SPSS.

Fixed versus Random Factors

There are two ways to conceptualise factors in an ANOVA:

Fixed factors are IV's whose levels have been deliberately selected by the researcher to operationalise a hypothetical variable, or IV's whose levels exhaust all naturally occurring levels (e.g. male vs. female).

Random factors are IV's whose levels have been randomly drawn to represent a larger population of levels (e.g. 5 randomly drawn presentation orders of 10 items in a questionnaire, to control effects of order in general).
Fixed factors are the default in SPSS, and again, leave it at that as long as your design is a standard experimental or quasi-experimental one. The major difference between the two approaches is that different error terms are used for the factors designated as random, which makes the corresponding tests more conservative. So in the above example, controlling for order effects (which are usually unwanted) would be even more rigorous if the orders used were designated as random factors.

Nested Designs

In a nested design, factors are not completely crossed with each other, i.e. not every level of factor A is paired with every level of factor B to form the experimental conditions. An example would be a clinical study in which each of ten therapists (T1 to T10) treats patients using one of two methods (M1 and M2), each method being practiced by five therapists.

Design:

                             Methods
               M1                             M2
           Therapists                     Therapists
   T1   T2   T3   T4   T5         T6   T7   T8   T9   T10
                            Patients

In this design, the Therapist factor is nested under Methods. This means that an interaction of Therapist x Method cannot be tested. Effects of Therapist can only be tested within Method, and Method effects would be tested against a "therapists within methods" error term. Therapists should be thought of as a random factor if method effects are of interest. Note that additional factors may be crossed with both Methods and Therapists (e.g. patient's sex: each therapist may treat equal numbers of women and men).

SPSS offers the possibility of testing nested designs, but this application is beyond the scope of this course.