Research Skills for Psychology Majors: Everything You Need to Know to Get Started

Analysis of Variance: General Concepts

This chapter is designed to present the most basic ideas in analysis of variance in a non-statistical manner. Its intent is to communicate the general idea of the analysis and provide enough information to begin to read research result sections that report ANOVA analyses.

Analysis of Variance is a general-purpose statistical procedure that is used to analyze a wide range of research designs and to investigate many complex problems. In this chapter we will only discuss the original, basic use of ANOVA: analysis of experiments that include more than two groups. When ANOVA is used in this simple sense, it follows directly from a still simpler procedure, the t-test. The t-test compares two groups, either in a between-subjects design (different subjects in the groups) or a repeated-measures design (same subjects assessed twice). ANOVA can be thought of as an extension of the t-test to situations in which there are more than two groups (a one-way design) or in which there is more than one independent variable (a factorial design). These situations are the most common in research, so ANOVA is used far more frequently than t-tests.

Variance is Analyzed

The name “analysis of variance” is more representative of what the analysis is about than “t-test” because we are in fact focusing on analyzing variances. The conceptual model for ANOVA follows the familiar pattern first introduced in the Inferential Statistics chapter: a ratio is formed between the differences in the means of the groups and the error variance. In the same way that a variance (or standard deviation) can be calculated from a set of data, a variance can be calculated from a set of means. So the differences among the means are thought of as their variance: higher variance among the means indicates that there are more differences (which is good, right?).
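The claim that a variance can be calculated from a set of means works exactly like calculating a variance from a set of raw scores. A minimal sketch using Python's standard-library statistics module; the numbers below are invented purely for illustration:

```python
from statistics import variance  # sample variance (n - 1 denominator)

# Two hypothetical sets of four group means (invented numbers).
close_means = [99, 100, 101, 102]   # the groups barely differ
spread_means = [90, 100, 110, 120]  # the groups differ a lot

print(variance(close_means))   # ≈ 1.67  -> little variance among the means
print(variance(spread_means))  # ≈ 166.7 -> much more variance among the means
```

The second set of means is more spread out, so its variance is larger, which is exactly what "more differences among the groups" means in ANOVA terms.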
The variance among the group means is called the between-groups variance. The ratio, then, is between-groups variance divided by error variance. A larger ratio indicates that the differences between the groups are greater than the error or “noise” going on inside the groups. If this ratio, the F statistic, is large enough given the size of the sample, we can reject the null hypothesis. The whole story in ANOVA is figuring out how to calculate (and understand) these two types of variance.

©2003 W. K. Gabrenya Jr. Version: 1.0

Sidebar: A Deeper Truth
Actually, the t-test is a special case of ANOVA. ANOVA is the real thing.

Sidebar: A Still Deeper Truth
Actually, ANOVA is a simplification of very complex correlations. Correlation is the real thing.

A Visual Example

Here is an example of a one-way, between-groups design that would be analyzed using ANOVA. Four groups of participants are randomly sampled from four majors on campus. We will not identify the majors for the sake of interdepartmental harmony, but the identity of Group 4 is clear. Each sample includes five students. They are each administered the Wechsler Adult Intelligence Scale (WAIS-III) to obtain a measure of IQ. IQs have a mean of 100 in the population as a whole. Our question: which major is smarter?

The following table presents the raw data (IQ scores), the means within each group, the standard deviation within each group, and the variance. The variance is simply the SD squared, a more useful number for certain aspects of the calculations. It is normal that some of the SDs are larger than others. The gray bars below the scale represent the range of the IQs in each major, which is one indication of the within-group variability. (A wider range often produces a higher SD.) In the last column, the mean of the means (grand mean), the standard deviation of the means, and the variance of the means are presented.
Data:

              Group 1        Group 2        Group 3        Group 4        Grand Mean
  Scores:     80, 85, 90,    97, 100, 103,  90, 93, 96,    105, 110, 115,
              95, 100        106, 109       99, 102        120, 125
  Mean:       90.0           103.0          96.0           115.0          101.0
  Std. Dev.:  7.9            4.7            4.7            7.9
  Variance:   62.5           22.5           22.5           62.5

[Figure: a number line from 80 to 125 with a gray bar marking each group's range of IQs and a marker at the grand mean (101). Group 1 sits lowest, Groups 3 and 2 in the middle, and Group 4 highest.]

What’s the null hypothesis? The null condition is that there is no difference between the population means: H0: µ1 = µ2 = µ3 = µ4, where µ is “mu,” the population mean. Our task is to determine if the sample means presented in the table above are sufficiently different from each other, compared to the error variance within the groups, to reject the null hypothesis. Of course the sample size will also affect the outcome because larger samples allow for better tests of the null hypothesis. In the language of ANOVA, we will look at the ratio of the between-groups variance to the within-groups (error) variance:

  F = between-groups variance / within-groups (error) variance

In the example, we have included the individual data for group 1 as circles inside the group 1 gray bar. The SD of group 1, 7.9, is computed from these 5 values. Recall that the SD is the variability of the individual data based on how distant each one is from the group mean (90.0). In other words, it is a measure of the extent to which the five students sampled for that major are not exactly of the same intelligence. The students in group 2 are more similar to each other and produce an SD of 4.7. The overall error variance for the sample is computed by combining these four SDs (see sidebar).

Sidebar: Calculating the Variances

Within-Groups (Error) Variance: The overall amount of error variance is the combined variances of the four groups. Combining the variances from several groups together is called pooling, so the resulting combined variance is termed the pooled variance. Averaging the variances in this study produces a pooled error variance of 6.52² = 42.5.
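The pooled error variance, and (anticipating the next section) the between-groups variance and the F ratio, can be checked with Python's standard library. The group scores below follow the example table (they reproduce its reported means and SDs); the step of scaling the variance of the means by the group size n is the one explained in the Between-Groups Variance discussion that follows:

```python
from statistics import mean, variance  # sample variance (n - 1 denominator)

# The four majors' IQ scores from the example table.
groups = [
    [80, 85, 90, 95, 100],      # Group 1: mean 90.0,  variance 62.5
    [97, 100, 103, 106, 109],   # Group 2: mean 103.0, variance 22.5
    [90, 93, 96, 99, 102],      # Group 3: mean 96.0,  variance 22.5
    [105, 110, 115, 120, 125],  # Group 4: mean 115.0, variance 62.5
]

# Within-groups (error) variance: pool the four group variances.
# With equal group sizes, pooling reduces to averaging the variances.
pooled_error = mean(variance(g) for g in groups)   # 42.5

# Between-groups variance: variance of the group means around the grand
# mean, scaled up by the group size n (see the next section).
group_means = [mean(g) for g in groups]            # [90.0, 103.0, 96.0, 115.0]
n = len(groups[0])                                 # 5 students per group
between = n * variance(group_means)                # ≈ 576.7

F = between / pooled_error                         # ≈ 13.57
df_between = len(groups) - 1                       # 3
df_error = sum(len(g) - 1 for g in groups)         # 16
print(f"F({df_between},{df_error}) = {F:.2f}")
```

These values match the source table reported later in the chapter (Mean Squares of 576.667 and 42.500, F = 13.569, within rounding).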
Between-Groups Variance: Calculation of the between-groups variance is not as intuitive as the within-groups variance. Conceptually, it seems that the SD of the four group means would be a good measure. (The SD of the means is 10.7.) However, the actual between-groups SD is 24.0, so the between-groups variance is 24.0² ≈ 577.

The between-groups variability is computed in the same way as the within-groups variability, but we look at how much the group means vary from the grand mean (the mean of the means). The higher this variability, the more the means differ from each other and the more the null hypothesis looks “rejectable.” (See sidebar.)

Finally, the ANOVA

The ANOVA focuses on the ratio of the between-groups variance to the within-groups variance. SPSS produces an ANOVA source table to report the result of the analysis. This table is called a source table because it identifies the sources of variability in the data. As explained above, there are two kinds of variability: variability between group means, and variability within groups (error variance). The source table provides information about these two sources. The column numbers have been added for our use. Column 3, reflecting the number of groups and the sample size, is discussed in the sidebar. Column 4 presents the variance associated with the mean differences (between groups) and within-group error. These numbers are discussed in the Variances sidebar. Column 5 is the ratio of these two values, the F statistic. Column 6 presents the p-value (see the Inferential Statistics chapter) of the F statistic based on the sample size. Because our normal criterion for rejecting the null hypothesis is p < .05, this p value is very good (good = low), and we can reject the null hypothesis.

ANOVA Source Table

  (1) Source        (2) Sum of Squares  (3) df  (4) Mean Square  (5) F    (6) Sig.
  Between Groups    1730.000            3       576.667          13.569   .0001
  Within Groups     680.000             16      42.500
  Total             2410.000            19

Sidebar: Degrees of Freedom in ANOVA

All statistics, such as F, t, and chi-square, are evaluated in the context of the sample size: larger samples allow lower statistical values to reach the magic .05 level of confidence. The sample size is expressed in terms of degrees of freedom (df). Your statistics class has more to say about df. In a t-test, the df is the sample size minus 2 (N − 2). In ANOVA, we use two df values. The df-error is based on the sample size:

  df_e = Σ(n_g − 1), where n_g is the size of each of the group samples
  16 = (5−1) + (5−1) + (5−1) + (5−1)

ANOVA also requires a df for the number of groups:

  df_bg = g − 1, where g is the number of groups

The F statistic is always presented along with these df values, e.g., F(3,16) = 13.6, p < .0001.

What has been rejected? By rejecting the null hypothesis, we conclude that the four means are not equal in the population; that is, all majors are not created equal. However, what it does not tell us is exactly which major is smarter than which other major. Is Group 4 smarter than Group 2, or just smarter than the hapless Group 1? Just eyeballing the means is not good enough: we need to know if the differences between particular pairs of means are really significantly different. How is this done? One way is to perform t-tests between pairs of means (there are several other ways as well).

Using SPSS to Calculate One-Way ANOVA

A one-way ANOVA is an analysis in which there is only one independent variable, as in the preceding example. This is the simplest kind of ANOVA, and SPSS dedicates a procedure purely to it. (See menu screen illustration.) The dialog window in which the details of the analysis are entered is quite simple. In the dialog illustration, the dependent variable (IQ) and the independent variable (group) have been entered. In the Options...
dialog you can ask for descriptive statistics and a rather sorry-looking graph of the means.

Syntax:

  ONEWAY iq BY group
    /STATISTICS DESCRIPTIVES
    /PLOT MEANS
    /MISSING ANALYSIS .

The principal output of the procedure is the source table shown above. In a paper, the appropriate way to report the results of an ANOVA is a variation of:

  A one-way between-groups ANOVA revealed a significant effect of major, F(3,16) = 13.6, p < .05.

Note that the ANOVA used two types of degrees of freedom: the between-groups df and the error df.

Factorial ANOVA

If indeed “the truth lies in the interactions,” then we need to perform more complicated studies that include more than one IV. Factorial designs of this kind were introduced in the research designs chapter. For example, in the study presented above, we might want to know if gender is related to IQ. The obvious design would be a 4x2 between-subjects factorial: four majors crossed with gender. In the table below, the 40 students are indicated by S1...S40 in the 8 cells of the factorial design.

              Male                       Female
  Group 1     S1, S2, S3, S4, S5         S6, S7, S8, S9, S10
  Group 2     S11, S12, S13, S14, S15    S16, S17, S18, S19, S20
  Group 3     S21, S22, S23, S24, S25    S26, S27, S28, S29, S30
  Group 4     S31, S32, S33, S34, S35    S36, S37, S38, S39, S40

The mathematics of a factorial ANOVA are more complicated than those of the one-way ANOVA, but the principles are the same. The ANOVA compares the variability due to between-groups differences to the amount of error variance in the sample. However, in this two-way factorial, we need to look at three types of between-groups variability: the variability between the majors, the variability between the genders, and the interaction effect variability. A ratio (F statistic) of between-groups variability to error variance is calculated for each of these three types of between-groups variability. How many null hypotheses are there?

SPSS and Factorial ANOVA

The simple one-way ANOVA procedure cannot be used.
Instead, factorial ANOVAs are produced by the SPSS GLM procedure. GLM means “general linear model.” You will study the GLM in your second year of graduate-level statistics. GLM is a very powerful and flexible procedure that was only introduced to SPSS in the 1980s. Because it is powerful and flexible, it can be configured in many ways and has a large number of options. “Univariate” refers to the fact that you will be analyzing one dependent variable at a time.

The IQ-across-majors study presented previously was enhanced by adding gender as a second independent variable to serve as an example of a factorial ANOVA. The analysis dialog box shown here has been configured to run this 4x2 ANOVA. Use the Fixed Factors box for the IVs. Ignore the boxes below that until you get to graduate school. You can specify in detail which means tables you would like to see displayed in the output by clicking on Options. Double-clicking on the items in the left-side box moves them to the right-side ‘Display Means for’ box. In this case, moving ‘group’ to the Display Means box produces a means table that includes just the main effect of group. The ‘group*gender’ item displays a 4x2 table of means from which you can see if there is an interaction effect.

Sidebar: What Other Goodies are in this Menu?

Multivariate ANOVA (MANOVA) allows you to analyze several DVs simultaneously, in a single analysis. Repeated Measures ANOVA analyzes the repeated measures designs introduced in the research designs chapter.

Syntax:

  UNIANOVA iq BY group gender
    /METHOD = SSTYPE(3)
    /INTERCEPT = INCLUDE
    /EMMEANS = TABLES(group*gender)
    /CRITERIA = ALPHA(.05)
    /DESIGN = group gender group*gender .

The Output

The source table in a factorial ANOVA expands on that of the one-way ANOVA. Two additional sources are reported: the second IV, and the interaction effect. (See the Tests of Between-Subjects Effects table.)
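What an interaction means in such a table of cell means can be sketched numerically. The 2x2 cell means below are invented for illustration (they are not the chapter's data): an interaction is present when the effect of one IV differs across the levels of the other, i.e., when the "difference of differences" is not zero.

```python
# Hypothetical cell means for a 2x2 slice of a factorial design
# (invented numbers, not the chapter's data).
no_interaction = {("major1", "male"): 100, ("major1", "female"): 104,
                  ("major2", "male"): 110, ("major2", "female"): 114}
interaction    = {("major1", "male"): 100, ("major1", "female"): 104,
                  ("major2", "male"): 110, ("major2", "female"): 102}

def diff_of_diffs(cells):
    """Gender effect in major 2 minus gender effect in major 1.
    Zero means the IVs act additively (no interaction)."""
    eff_major1 = cells[("major1", "female")] - cells[("major1", "male")]
    eff_major2 = cells[("major2", "female")] - cells[("major2", "male")]
    return eff_major2 - eff_major1

print(diff_of_diffs(no_interaction))  # 0   -> parallel lines, no interaction
print(diff_of_diffs(interaction))     # -12 -> the gender effect reverses
```

In the first set the gender difference is +4 in both majors (parallel lines when plotted); in the second it flips from +4 to −8, which is the pattern an interaction F-test is sensitive to.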
The only rows of importance in this source table are those indicating the effects in the factorial model: GROUP, GENDER, and GROUP*GENDER. The F statistics in this type of source table are calculated by dividing a factor’s Mean Square by the Mean Square of the Error row. (“Mean Square” is another way of saying “variance.”) Hence, F for the Group factor is: MS_group / MS_error = 477.2 / 14.167 = 33.685.

These results show that the Group and Gender main effects are significant at a very low p value. SPSS will not print all of the significant digits of a very small p value. For Group, the actual p value is .000039, but no one cares because it is so far below .05. The Group X Gender interaction is not significant because the p value is so large (p = .567).

In a paper, there are several forms for reporting the results of a factorial ANOVA:

  A 4 (major) x 2 (gender) between-groups ANOVA revealed significant main effects of major, F(3,12) = 33.7, p < .05, and gender, F(1,12) = 33.9, p < .05. The interaction effect did not approach significance, F < 1.

or, if the interaction had been stronger:

  A 4 (major) x 2 (gender) between-groups ANOVA revealed significant main effects of major, F(3,12) = 33.7, p < .05, and gender, F(1,12) = 33.9, p < .05. However, these main effects must be interpreted within the significant Major X Gender interaction, F(3,12) = 8.5, p < .05.

Note that the ANOVA used two types of degrees of freedom: the between-groups df and the error df.

Tests of Between-Subjects Effects
Dependent Variable: IQ

  Source            Type III Sum of Squares  df   Mean Square   F           Sig.
  Corrected Model   2240.000                 7    320.000       22.588      .000
  Intercept         195859.200               1    195859.200    13825.355   .000
  GROUP             1431.600                 3    477.200       33.685      .000
  GENDER            480.000                  1    480.000       33.882      .000
  GROUP * GENDER    30.000                   3    10.000        .706        .567
  Error             170.000                  12   14.167
  Total             206430.000               20
  Corrected Total   2410.000                 19

  R Squared = .929 (Adjusted R Squared = .888)

Digging Deeper

Overall, do major and gender help us know what students’ IQs are? Said another way, do major and gender predict IQ? The ‘Corrected Model’ row in the source table answers this general question: yes. The idea of a model was introduced in an early chapter. Here, the model is expressed mathematically:

  IQ = ƒ(major, gender)

The Corrected Model essentially combines all the predictors of IQ (Group, Gender, and their interaction) to see if, as a whole, they predict the dependent variable. (Hint: add the df.) Of course, we usually don’t care about the whole model, but rather only about its component parts, the individual IVs. The ‘Intercept’ row in the table is not usually important. It compares the grand mean (101.0) to zero. Because 101 is so far from zero, the F is enormous. (But see the sidebar for its deeper meaning.)

Sidebar: Reprise of “Still Deeper Truth”

The intercept reveals a clue to the ridiculous conspiracy theory that ANOVA is just a lot of correlations. Do you remember the equation for a line from algebra? In statistics we call this a regression line, and write the equation as

  y = a + bx + e

where:
  y is the dependent variable, IQ
  x is (sort of) the independent variables — major, gender, and the interaction, all rolled into one
  a is the y-intercept of the line
  b is the slope of the line
  e is the error variance

In the ANOVA table, the intercept F-test is testing whether the y-intercept (a) is different from zero. In a certain sense, the corrected model F-test is testing whether the slope (b) is different from zero. When the slope is different from zero, the independent variables (x) affect the dependent variable (y). In the manner of a correlation, a slope (b) near 1.0 and a low error (e) give us a correlation scattergram with a long, skinny oval (i.e., a good correlation). Error variance (e) is analogous to the fatness of the oval.

[Figure: a scattergram with the DV (y) on the vertical axis and the IV (x) on the horizontal axis, showing the regression line with its slope (b) and y-intercept (a). A skinny oval of points and a slope near 1.0 indicate a high correlation coefficient.]

What’s Next?

A lot more...
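As a closing check on the factorial source table: every F ratio in it can be reproduced from just the Sum of Squares and df columns, since each Mean Square is SS/df and each F is MS_effect / MS_error. A minimal sketch using the table's own numbers:

```python
# Sums of squares and dfs copied from the Tests of Between-Subjects
# Effects table for the 4x2 (major x gender) ANOVA.
ss = {"GROUP": 1431.6, "GENDER": 480.0, "GROUP*GENDER": 30.0, "Error": 170.0}
df = {"GROUP": 3,      "GENDER": 1,     "GROUP*GENDER": 3,    "Error": 12}

ms_error = ss["Error"] / df["Error"]          # Mean Square error ≈ 14.167
for effect in ("GROUP", "GENDER", "GROUP*GENDER"):
    ms = ss[effect] / df[effect]              # Mean Square = SS / df
    f_ratio = ms / ms_error                   # F = MS_effect / MS_error
    print(f"{effect}: F({df[effect]},{df['Error']}) = {f_ratio:.3f}")
```

This prints F values of 33.685, 33.882, and 0.706, matching the GROUP, GENDER, and GROUP*GENDER rows of the SPSS output.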