Sociology 541
Analysis of Variance (ANOVA)

So far, we've been comparing the means of two groups, but we often want to compare three or more groups. ANOVA lets us compare means across multiple (two or more) groups with a single test. Running a separate t test for every pair of groups instead rapidly increases the probability of making at least one Type I error. For example, three groups require three pairwise tests; if each were run at the 0.05 level and the tests were independent, the chance of at least one false rejection would be about 1 - 0.95^3 ≈ 0.14.

ONE-WAY ANOVA

An alternative way to compare means is a technique called Analysis of Variance (ANOVA), developed by Ronald Fisher in the 1920s.

ANOVA can be used when we have:
- a quantitative (interval-level) dependent/response variable (e.g., hours of exercise per week)
- a qualitative independent/explanatory variable (e.g., age group)

Notation

- g denotes the number of groups/populations. Used as a subscript (as in $n_g$ or $\bar{Y}_g$), g indexes a particular group.
- The population means of the response/dependent variable for the g groups are $\mu_1, \mu_2, \ldots, \mu_g$.
- The group sample sizes are $n_1, n_2, \ldots, n_g$, and the total sample size is $N = n_1 + n_2 + \cdots + n_g$.
- The sample means for each group are $\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_g$.
- The sample standard deviations for each group are $s_1, s_2, \ldots, s_g$.
- The grand mean (across all groups) is $\bar{Y}$.

ANOVA is a test of $H_0: \mu_1 = \mu_2 = \cdots = \mu_g$ against the alternative $H_1$ that at least two means are unequal.

Basic logic behind ANOVA: The greater the variability between the sample means and the smaller the variability within each group of sample observations, the stronger the evidence that the population means are not all equal.

An ANOVA constructs a test statistic, called an F statistic, that compares the variation between groups (variability of the sample means about the overall mean) to the variation within groups (variability of the sample observations about their separate group means).

How can we go about measuring these different types of variation?

Sum of Squares

The sum of squares is found by squaring the deviation of each observation from the mean of a distribution and adding these squared deviations together. It is the numerator of the variance.
$$\sum_i (Y_i - \bar{Y})^2$$

Remember that this is equivalent to the following expression:

$$\sum_i Y_i^2 - \frac{\left(\sum_i Y_i\right)^2}{n}$$

Taking into account that each of the N observations belongs to one of the g groups:

$$\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \sum_{g} \sum_{i=1}^{n_g} (Y_{ig} - \bar{Y})^2$$

where the outer sum runs over all groups and $Y_{ig}$ is the i-th observation in group g.

To estimate the proportion of variation in the outcome Y (hours of exercise) that is due to group effects (age) and the proportion due to unexplained factors or random variation, we can partition the numerator of the total variance into two independent additive components: variation between groups and variation within groups.

How might we partition this total sum of squares? Add and subtract the group mean:

$$(Y_{ig} - \bar{Y}) = Y_{ig} + (\bar{Y}_g - \bar{Y}_g) - \bar{Y}$$

You can re-group these terms:

$$(Y_{ig} - \bar{Y}) = (Y_{ig} - \bar{Y}_g) + (\bar{Y}_g - \bar{Y})$$

The first term in this equation is the deviation of an observation from its own group mean (within-group variation). The second term is the deviation of the group mean from the grand mean (between-group variation). Squaring both sides and summing over all observations and groups gives the partition below; the cross-product term drops out because the deviations from a group mean sum to zero within each group.

$$SS_{total} = SS_{within} + SS_{between}$$

Estimate of Total Variance (ignoring groups)

$$\frac{\sum_i (Y_i - \bar{Y})^2}{N - 1}$$

The numerator of this variance estimate, based on the entire sample, is called the total sum of squares (TSS). The denominator is the total degrees of freedom (DFt), N - 1.

Between Estimate of Variance (or mean square between groups)

How much variation is there between the groups in the outcome of interest? The estimate of $\sigma^2$ (the variance) based on the variability between each sample mean and the overall mean is:

$$\hat{\sigma}^2_{\text{between groups}} = \frac{\sum_{g} n_g (\bar{Y}_g - \bar{Y})^2}{g - 1}$$

The numerator of this estimate is called the between sum of squares (BSS). The denominator is the degrees of freedom between groups. The ratio of BSS to its degrees of freedom, g - 1, is the between-groups estimate of the variance.

Within Estimate of Variance (or mean square within groups)

How much variation is there within each group in the outcome of interest? Pool together the sums of squares of the observations about their respective group means. Since we're pooling these variances, the homogeneity of variance assumption is required.
$$\hat{\sigma}^2_{\text{within groups}} = \frac{\sum_{g} \sum_{i=1}^{n_g} (Y_{ig} - \bar{Y}_g)^2}{N - g}$$

The numerator of this expression is called the within sum of squares (WSS). The denominator is called the degrees of freedom within groups. WSS has degrees of freedom equal to the sum of the degrees of freedom of its component parts:

$$(n_1 - 1) + (n_2 - 1) + \cdots + (n_g - 1) = N - g$$

Note: Kurtz uses the sums-of-squares formulas to calculate the different variances. Remember that this approach and the one presented in this handout for calculating variances are equivalent (review pages 69-74 of Kurtz).

F Test Statistic

The above formulas enable you to partition the variation in the total sample into the amount between groups and the amount within groups. If the variance estimate between groups is large relative to the estimate within groups, we can conclude that the differences between groups are probably real. To determine this, calculate an F statistic.

The F test statistic for $H_0: \mu_1 = \mu_2 = \cdots = \mu_g$ is the ratio of the between-group variance estimate to the within-group variance estimate:

$$F = \frac{\text{Between estimate}}{\text{Within estimate}} = \frac{BSS/(g - 1)}{WSS/(N - g)}$$

This is known as the Analysis of Variance F statistic, or ANOVA F statistic.

We know the sampling distribution of F and therefore know the probability of finding a given F when the null hypothesis is true. Thus, we know the magnitudes of F needed to establish statistical significance at various levels. Table b.7 in Appendix B of Kurtz presents the minimum F ratios necessary for significance at different p levels.

The probability associated with an F ratio depends on the degrees of freedom. The two degrees of freedom terms are the denominators of the between estimate and the within estimate. This F test statistic has the F sampling distribution with df1 = g - 1 and df2 = N - g.

A p-value less than 0.05 means that, if the null hypothesis were true (the outcome does not differ by group), an F statistic at least as large as the one observed would occur in fewer than 5% of samples.
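As a rough illustration of the formulas above, the following Python sketch computes BSS, WSS, the two mean squares, and F for the hours-of-exercise data used in the Example that follows. The function and variable names (`one_way_anova`, `upper_tail_p_when_df1_is_2`, `hours`) are illustrative choices, not part of the handout. The p-value helper relies on a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2 (that is, three groups); for other designs, a table such as the one in Kurtz or statistical software would be needed.

```python
from statistics import mean


def one_way_anova(groups):
    """Compute one-way ANOVA quantities for a dict of group name -> observations."""
    all_values = [y for values in groups.values() for y in values]
    grand_mean = mean(all_values)
    n_total = len(all_values)  # N, the total sample size
    g = len(groups)            # number of groups

    bss = 0.0  # between sum of squares
    wss = 0.0  # within sum of squares
    for values in groups.values():
        group_mean = mean(values)
        # n_g * (group mean - grand mean)^2
        bss += len(values) * (group_mean - grand_mean) ** 2
        # squared deviations of observations from their own group mean
        wss += sum((y - group_mean) ** 2 for y in values)

    df_between = g - 1
    df_within = n_total - g
    ms_between = bss / df_between
    ms_within = wss / df_within
    return {
        "BSS": bss,
        "WSS": wss,
        "TSS": bss + wss,  # total sum of squares, by the partition
        "df_between": df_between,
        "df_within": df_within,
        "MS_between": ms_between,
        "MS_within": ms_within,
        "F": ms_between / ms_within,
    }


def upper_tail_p_when_df1_is_2(f_value, df_within):
    """P(F >= f_value) for an F distribution with 2 and df_within df.

    Assumption: this closed form is valid ONLY when the numerator df is 2.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


# Hours of exercise per week, by age group (data from the handout's Example).
hours = {
    "young": [11, 12, 6, 7, 3, 9],
    "middle-aged": [0, 8, 4, 2, 5, 5],
    "old": [0, 2, 4, 1, 3, 2],
}

result = one_way_anova(hours)
p_value = upper_tail_p_when_df1_is_2(result["F"], result["df_within"])

for name, value in result.items():
    print(f"{name}: {value:.3f}")
print(f"p-value: {p_value:.3f}")
```

Run on these data, the sketch gives BSS = 112, WSS = 104, F ≈ 8.077, and p ≈ .004, which agrees with the ANOVA table shown later in the handout.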
NOTE FOR FUTURE REFERENCE: Post-hoc comparisons, such as Tukey's HSD (honestly significant difference) or Bonferroni, can be used to compare pairs of means while controlling the overall Type I error probability.

Example

| Group | Young | Middle-aged | Old |
|---|---|---|---|
| Hours of exercise per week | 11 | 0 | 0 |
| | 12 | 8 | 2 |
| | 6 | 4 | 4 |
| | 7 | 2 | 1 |
| | 3 | 5 | 3 |
| | 9 | 5 | 2 |
| Group mean | | | |
| Grand mean | | | |
| Sample size | 6 | 6 | 6 |

Are the age groups really different in their propensity to exercise? Or are the differences due to chance fluctuations alone?

I. State the null hypothesis.
II. Calculate the mean hours of exercise for each group (group mean) and the mean hours of exercise for all groups combined (grand mean). Does there appear to be variation between groups based on this information? Does there appear to be variation within groups in hours of exercise?
III. Calculate the between-group sum of squares (BSS) and the between-group variance.
IV. Calculate the within-group sum of squares (WSS) and the within-group variance.
V. Calculate the appropriate test statistic (F) and find the corresponding p-value.
VI. Conclusions.

ANOVA Table

A common way to summarize the results of an analysis of variance.

ANOVA

| HOURS | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 112.000 | 2 | 56.000 | 8.077 | .004 |
| Within Groups | 104.000 | 15 | 6.933 | | |
| Total | 216.000 | 17 | | | |

The sum of BSS and WSS is the total sum of squares, denoted by TSS:

$$TSS = \sum_i (Y_i - \bar{Y})^2 = BSS + WSS$$

Sums of squares divided by their degrees of freedom are called mean squares. The two mean squares are the between-groups and within-groups estimates of the population variance $\sigma^2$. The ratio of the two mean squares is the F test statistic.

Assumptions of ANOVA

1. The dependent variable/outcome is measured at the interval/ratio level.
2. Random samples are selected from the g populations.
3. The g samples are independent of one another.
4. The population distributions of the response variable for the g groups are normal. (One-way independent-groups ANOVA is generally considered robust against violation of this assumption if n ≥ 30 for all groups.)
5. The standard deviations/variances of the population distributions for the g groups are equal (homogeneity of variance assumption).

Problem

Listed below are gains on the SAT for three groups: those who did nothing special to prepare for the test (Control), those who prepared by using a set of print materials designed to improve SAT scores (Print), and those who used a computer-assisted set of materials designed to improve the scores (Computer):

| Control | Print | Computer |
|---|---|---|
| 4 | 5 | 7 |
| 2 | 7 | 9 |
| 2 | 7 | 10 |
| 3 | 8 | 10 |
| 5 | 5 | 11 |

Conduct an ANOVA of these data and display your results in an ANOVA summary table. Interpret all components of the ANOVA table.

Analyze > Compare Means > One-Way ANOVA > Options (in file called anova1.sav)

    ONEWAY hours BY agegrp
      /STATISTICS DESCRIPTIVES HOMOGENEITY
      /PLOT MEANS
      /MISSING ANALYSIS .

Oneway

Descriptives

| HOURS | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean, Lower Bound | 95% CI for Mean, Upper Bound | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| young | 6 | 8.0000 | 3.3466 | 1.3663 | 4.4879 | 11.5121 | 3.00 | 12.00 |
| middle-aged | 6 | 4.0000 | 2.7568 | 1.1255 | 1.1069 | 6.8931 | .00 | 8.00 |
| old | 6 | 2.0000 | 1.4142 | .5774 | .5159 | 3.4841 | .00 | 4.00 |
| Total | 18 | 4.6667 | 3.5645 | .8402 | 2.8941 | 6.4393 | .00 | 12.00 |

ANOVA

| HOURS | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 112.000 | 2 | 56.000 | 8.077 | .004 |
| Within Groups | 104.000 | 15 | 6.933 | | |
| Total | 216.000 | 17 | | | |

Means Plots

[Figure: mean of HOURS (vertical axis) plotted by AGEGRP (young, middle-aged, old).]

Relationship of t to F

The t test (assuming equal variances) and ANOVA comparing two groups yield identical results. Indeed, for the same set of data, the square of the t statistic with x degrees of freedom equals the F statistic with 1 and x degrees of freedom; equivalently, the square root of F equals the absolute value of t.

Take the example of the difference in hours of exercise per week for the young and middle-aged groups only (a two-group comparison).

T-Test

Group Statistics

| HOURS, by AGEGRP | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| young | 6 | 8.0000 | 3.3466 | 1.3663 |
| middle-aged | 6 | 4.0000 | 2.7568 | 1.1255 |

Independent Samples Test

| HOURS | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | .488 | .501 | 2.260 | 10 | .047 | 4.0000 | 1.7701 | 5.592E-02 | 7.9441 |
| Equal variances not assumed | | | 2.260 | 9.646 | .048 | 4.0000 | 1.7701 | 3.623E-02 | 7.9638 |

Oneway

Descriptives

| HOURS | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean, Lower Bound | 95% CI for Mean, Upper Bound | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| young | 6 | 8.0000 | 3.3466 | 1.3663 | 4.4879 | 11.5121 | 3.00 | 12.00 |
| middle-aged | 6 | 4.0000 | 2.7568 | 1.1255 | 1.1069 | 6.8931 | .00 | 8.00 |
| Total | 12 | 6.0000 | 3.5929 | 1.0372 | 3.7172 | 8.2828 | .00 | 12.00 |

ANOVA

| HOURS | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 48.000 | 1 | 48.000 | 5.106 | .047 |
| Within Groups | 94.000 | 10 | 9.400 | | |
| Total | 142.000 | 11 | | | |

Here t = 2.260 with 10 degrees of freedom, and t squared (about 5.11) matches F = 5.106 with 1 and 10 degrees of freedom up to rounding of t. Both tests give p = .047.

1998 GSS: Test the hypothesis that marital status (MARITAL) is related to hours per day spent watching TV (TVHOURS) by doing an ANOVA. Set α = 0.01. Interpret the results.

MARITAL: What is your current marital status?
  1 married
  2 widowed
  3 divorced
  4 separated
  5 never married
  9 NA

TVHOURS: Hours per day watching TV?
  98 DK
  99 NA

Remember to make any necessary recodes: treat all NA/DK responses as missing.

Test the hypothesis that marital status (MARITAL) is related to happiness (HAPPY) by doing an ANOVA. Set α = 0.01. Interpret the results.

HAPPY: General happiness?
  0 NAP
  1 Very happy
  2 Pretty happy
  3 Not too happy
  8 DK
  9 NA
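The recoding step mentioned above could be sketched in Python roughly as follows. The missing-value codes come from the codebook entries listed in this exercise; the names `MISSING_CODES` and `recode_missing` and the handful of sample responses are hypothetical and made up purely for illustration (they are not GSS data). Treating the HAPPY code 0 (NAP) as missing alongside DK and NA is an assumption.

```python
# Codes to treat as missing, taken from the codebook entries in this exercise.
# Assumption: HAPPY's 0 (NAP) is also treated as missing; the handout names only NA/DK.
MISSING_CODES = {
    "MARITAL": {9},
    "TVHOURS": {98, 99},
    "HAPPY": {0, 8, 9},
}


def recode_missing(variable, values):
    """Replace the missing-value codes for `variable` with None."""
    codes = MISSING_CODES[variable]
    return [None if v in codes else v for v in values]


# Made-up responses for five hypothetical respondents (NOT real GSS data).
marital_raw = [1, 5, 9, 3, 2]
tvhours_raw = [2, 98, 4, 99, 0]

marital = recode_missing("MARITAL", marital_raw)
tvhours = recode_missing("TVHOURS", tvhours_raw)

# Keep only respondents with valid values on both variables.
complete_cases = [
    (m, t) for m, t in zip(marital, tvhours)
    if m is not None and t is not None
]
print(complete_cases)
```

Note that a TVHOURS value of 0 is a valid response (zero hours) and is kept, while the same value on HAPPY would be set to missing under the assumption stated above. This is why the codes are stored per variable rather than in one shared list.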