Chapter 8
ANALYSIS OF VARIANCE (ONE WAY)

Learning Objectives
In this chapter you will learn how to analyze differences between more than two groups of subjects.
Analysis of variance (ANOVA) is a statistical test designed to examine means across more than two groups by comparing variances, based upon the variability in each sample and in the combined samples.

SOURCE OF ANOVA
With ANOVA there is no limit to the number of groups that can be compared.
When there are three or more levels of the nominal (grouping) variable, the number of pairwise comparisons increases with the number of groups. Multiple t-tests would be required to make all the comparisons two at a time, and the results would be very difficult to interpret.

ANOVA reports the variance within the groups, then calculates how that variation would translate into differences between the groups while taking into account how many groups there are.
In your data set, region is the grouping variable, and the number of prisoners executed between 1977-1995 is the dependent variable.
We see that the mean number of prisoners executed varies across the regions over the time period.

Under ANOVA the null hypothesis is that the means for all the groups (regions) are equal.
Here, the null hypothesis is that there is no difference in the average number of inmates executed between 1977 and 1995 across the four regions of the country.
ANOVA tests whether region (the independent variable) is producing an effect, that is, whether at least one of the group means is different from the others.

SOURCES OF ANOVA
When scores are divided into three or more groups, the variation can be divided into two parts:
– 1. The variation of scores within the groups, measured by the variance and standard deviation.
– 2. The variation of scores between the groups. There is variation from one group to the next.
It accounts for the variability of the means between the samples.

ANOVA compares these two estimates of variability.
The main function of ANOVA is to determine whether the between-groups variance is significantly larger than the within-groups variance. If we show that the between-groups variance is significantly larger than the within-groups variance, then we reject the null hypothesis.
If the means of the groups are equal, they will vary little around the total mean across the groups; if the group means are different, they will vary around the total mean significantly more than the scores vary within their groups.

ANOVA tests the null hypothesis by comparing the variation between the groups to that within the groups. To compute ANOVA you calculate:
– 1. The total amount of variation among all scores combined (total sum of squares, SSt)
– 2. The amount of variation between the groups (between-groups sum of squares, SSb)
– 3. The amount of variation within the groups (within-groups sum of squares, SSw)

THE F TEST (F RATIO)
The ‘F’ test is a ratio of the two estimates of variability (the between-groups mean square divided by the within-groups mean square).
If the null hypothesis is true, the ‘F’ ratio should be close to one; if the null hypothesis is false, the ‘F’ ratio will tend to be greater than one.
If the ‘F’ value is statistically significant, you reject the null hypothesis and conclude that the group means are not all equal.
You can look at the group means to see this, but inspection does not reveal where the differences leading to the ‘F’ value and the rejection of the null hypothesis originate.
To locate the source of the difference you must use a multiple comparison procedure; the Bonferroni procedure (discussed later) is recommended.
This test allows you to pinpoint where the differences originate.
In our example we have four groups, thus six possible pairwise comparisons. The Bonferroni procedure pinpoints where the differences come from while protecting you from concluding that too many of the differences between the group means are statistically significant.

ANOVA Requirements:
– The data must be a random sample from a population
– The single dependent variable must be measured at the interval level (in order to compute a mean)
– The independent variable need only be measured categorically, at either the nominal or ordinal level (to provide group means)

ANOVA and SPSS
Calculating ANOVA by hand is nearly impossible with large groups, yet the example in the book shows where the numbers that SPSS generates come from.
Using SPSS, we are going to compare the average number of inmates executed across the four regions of the United States to see if different categories of states (region is measured at the nominal level) vary significantly in the mean number of prisoners executed (a ratio-level variable) over this time period (1977-1995). The independent variable is the grouping of the regions, and the dependent variable is the number of prisoners executed between 1977-1995.

SPSS, One way ANOVA
To obtain your SPSS output for ANOVA, follow the steps outlined here.
1. Open the existing file, “StateData.sav” (Figure 8.1). This is the state data set containing crime and other data from the criminal justice system for all fifty states.
2. Choose “Analyze,” “Compare Means,” and then “One-Way ANOVA” (Figure 8.2).
3. In the “One-Way ANOVA” window (Figure 8.3), select and enter the dependent variable, “Prisoners Executed between 1977-1995,” in the “Dependent List” box. The program will calculate the mean and other statistics for this variable. Select and enter the independent variable, “Region,” in the “Factor” box. This variable divides the sample into groups.
4. Returning to the One-Way ANOVA window (Figure 8.4), click on the “Options” button to open the One-Way ANOVA: Options window. Under Statistics check “Descriptive.” Click on “Continue” to return to the One-Way ANOVA window.
5. Click on “Post Hoc.” Select the “Bonferroni” method (Figure 8.5). Click on “Continue” to return to the One-Way ANOVA window.
6. Click on “OK” to generate your output.

SPSS & One Way ANOVA
In this example we selected “Prisoners Executed between 1977-1995” as the dependent variable.
The null hypothesis is that there is no regional difference in the average number of prisoners executed between 1977 and 1995.
The research hypothesis is that the South has the highest average number of prisoners executed during this time period.

Results
The ANOVA table from 8.1 is what we need to focus on, because we are testing the null hypothesis (that there is no difference in the average number of executions across the regions during this time period). Thus we must decide whether to reject or fail to reject the null hypothesis.

The first column in the ANOVA table gives us the sum of squares between and within the groups and for the entire sample. The total sum of squares represents the total variation in the dependent variable for the entire sample.
The second column gives the degrees of freedom. The total degrees of freedom equal the sample size minus one (50-1=49); the between-groups degrees of freedom equal the number of groups minus one (4-1=3); the within-groups degrees of freedom equal 49-3=46.

The third (mean square) column in figure 8.1 contains the estimates of variability between and within the groups. Each mean square estimate is equal to the sum of squares divided by its degrees of freedom.
The between-groups mean square is 2469.194/3=823.065; the within-groups mean square is 10260.426/46=223.053.

The fourth column, the F ratio, is calculated by dividing the mean square between groups by the mean square within the groups.
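The arithmetic behind the mean square and F ratio columns can be retraced in a few lines. The sketch below uses Python rather than SPSS, purely as an illustration; the variable names are hypothetical, while the sums of squares, the sample size of 50 states, and the four regions are the figures quoted above.

```python
# Figures quoted in the text for the executions-by-region example.
ss_between = 2469.194   # between-groups sum of squares (SSb)
ss_within = 10260.426   # within-groups sum of squares (SSw)
n_cases = 50            # fifty states
n_groups = 4            # four regions

# Degrees of freedom for each source of variation.
df_total = n_cases - 1
df_between = n_groups - 1
df_within = df_total - df_between

# A mean square is a sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# The F ratio compares the two estimates of variability.
f_ratio = ms_between / ms_within

print(f"df: between={df_between}, within={df_within}, total={df_total}")
print(f"MS between = {ms_between:.3f}")
print(f"MS within  = {ms_within:.3f}")
print(f"F ratio    = {f_ratio:.2f}")
```

Run as written, this should reproduce the mean squares of 823.065 and 223.053 and an F ratio of about 3.69. It does not compute the significance level, which SPSS derives from the F distribution with 3 and 46 degrees of freedom.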
If the null hypothesis is true, both mean square estimates should be about equal and the F ratio should be close to one; the larger the F ratio, the greater the likelihood that the difference between the means does not result from chance. In our example the F ratio is 3.69 (823.065/223.053).

The last column in figure 8.1 is the significance level (Sig.), and it tells us that the value of our F ratio (3.69) is large enough to reject the null hypothesis: the Sig. level of .018 is less than .05. The mean numbers of executions in the different regions of the country between 1977-1995 were significantly different, and this difference is greater than we would expect by chance. We still cannot say where the differences lie.

Bonferroni Procedure
The Bonferroni procedure comes into play here (Post Hoc Tests). It adjusts the observed significance level by multiplying it by the number of comparisons being made. Since four groups give six possible pairwise comparisons of group means, the unadjusted sig. level must be 0.05/6, or about 0.008, for the difference between two group means to be significant at the 0.05 level.

Multiple comparison procedures protect you from calling differences significant when they are not. The more comparisons you make, the larger the difference between a pair of means must be for a multiple comparison procedure to call it statistically significant.
The multiple comparisons table, table 8.1, gives us all possible combinations here. Each block represents one region compared to all the others, row by row.
We see that the difference between the mean number of executions in southern states and in western states is statistically significant at the 0.05 level.

Conclusion
ANOVA is a statistical technique that compares the differences between sample means when you have more than two samples or groups.
We are analyzing the variance within and between the samples to determine the significance of any differences.
The F ratio of the mean square between the groups (MSb) to the mean square within the groups (MSw) is the heart of ANOVA. If the null hypothesis is rejected by the F test, there is a statistically significant difference among the means of the groups.
To determine where the significant difference lies, a multiple comparison procedure such as the Bonferroni method must be used. This test tells you which of the pairwise comparisons between groups are statistically significant.
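
To tie the pieces of the chapter together, here is a minimal sketch of the whole calculation in Python: splitting the total variation into between-groups and within-groups sums of squares, forming the F ratio, and listing the pairwise comparisons that a Bonferroni procedure would examine. The function names (`one_way_anova`, `bonferroni_pairs`) and the three tiny groups of scores are invented for illustration and are not taken from StateData.sav. The Bonferroni step is shown in its equivalent "divide alpha by the number of comparisons" form, not as a reproduction of the SPSS output.

```python
from itertools import combinations


def one_way_anova(groups):
    """Partition variation for a dict mapping group name -> list of scores."""
    all_scores = [x for scores in groups.values() for x in scores]
    n_total = len(all_scores)
    grand_mean = sum(all_scores) / n_total

    # Total variation of every score around the grand mean (SSt).
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

    ss_between = 0.0  # variation of group means around the grand mean (SSb)
    ss_within = 0.0   # variation of scores around their own group mean (SSw)
    for scores in groups.values():
        group_mean = sum(scores) / len(scores)
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in scores)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": ms_between / ms_within,
    }


def bonferroni_pairs(group_names, alpha=0.05):
    """List every pair of groups and the per-comparison alpha level."""
    pairs = list(combinations(group_names, 2))
    return pairs, alpha / len(pairs)


# Made-up scores for three hypothetical groups.
toy = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5, 6, 7]}

result = one_way_anova(toy)
pairs, per_test_alpha = bonferroni_pairs(list(toy))

print(result)
print(pairs, per_test_alpha)
```

In this toy data the between-groups and within-groups sums of squares add up to the total sum of squares, which is the partition the F test relies on. Deciding whether the resulting F ratio is significant still requires the F distribution, which this sketch leaves to a statistics package.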