ANALYSIS OF VARIANCE (ANOVA)

STATISTICAL DATA ANALYSIS

COMMON TYPES OF ANALYSIS?
1. Examine Strength and Direction of Relationships
   a. Bivariate (e.g., Pearson Correlation, r)
      Between one variable and another: $r_{xy}$ or $Y = a + b_1 x_1$
   b. Multivariate (e.g., Multiple Regression Analysis)
      Between one dep. var. and each of several indep. variables, while holding all other indep. variables constant:
      $Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$
2. Compare Groups
   a. Compare Proportions (e.g., Chi-Square Test, $\chi^2$)
      H0: P1 = P2 = P3 = … = Pk
   b. Compare Means (e.g., Analysis of Variance)
      H0: µ1 = µ2 = µ3 = … = µk

ONE-WAY ANOVA
ANOVA was developed in 1919 by Sir Ronald Fisher (1890-1962), a British statistician and geneticist/evolutionary biologist.

When Do You Use ANOVA?
• To compare the mean values of a certain characteristic among two or more groups.
• To see whether two or more groups are equal (or different) on a given metric characteristic.
• To examine whether a metric dependent variable is a function of a categorical independent variable.

Remember: Level of measurement determines choice of statistical method.

Statistical Techniques and Levels of Measurement:
• Dependent NOMINAL, independent NOMINAL/CATEGORICAL: Chi-Square, Fisher’s Exact Prob.
• Dependent METRIC, independent NOMINAL/CATEGORICAL: T-Test, Analysis of Variance (An example?)
• Dependent NOMINAL, independent METRIC (ORDERED METRIC or HIGHER): Discriminant Analysis, Logit Regression
• Dependent METRIC, independent METRIC (ORDERED METRIC or HIGHER): Correlation Analysis, Regression Analysis

ONE-WAY ANOVA
H0 in ANOVA?
H0: There are no differences among the mean values of the groups being compared (i.e., the group means are all equal):
H0: µ1 = µ2 = µ3 = … = µk
Ha (Conclusion if H0 rejected)?
Not all group means are equal (i.e., at least one group mean is different from the rest).

ONE-WAY ANOVA
So, the number of steps involved in ANOVA depends on whether we are comparing 2 groups or more than 2 groups:
• Scenario 1. When comparing 2 groups, it is a one-step test:
  2 Groups: A B
  Step 1: Check to see if the two groups are different or not, and if so, how.
• Scenario 2. When comparing 3 or more groups, if H0 is rejected, it is a two-step test:
  3 or more Groups: A B C
  Step 1: Overall test that examines if all groups are equal or not. And, if not all are equal (H0 rejected), then:
  Step 2: Pair-wise (post-hoc) comparison tests to see where (i.e., among which groups) the differences exist, and how.

The typical solution presented in statistics classes requires…
• Constructing an ANOVA TABLE

ANOVA TABLE
Source            Sum of Squares   df       Mean Squares          F-Ratio (test statistic)
Between Groups    SSB              K – 1    MSB = SSB / (K – 1)   F = MSB / MSW
Within Groups     SSW              N – K    MSW = SSW / (N – K)
Total             SST              N – 1

SSB = Between Groups Sum of Squares, SSW = Within Groups Sum of Squares, SST = Total Sum of Squares; K = number of groups, N = total sample size. The corresponding mean squares are:

$$MSB = \frac{n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \dots + n_k(\bar{x}_k - \bar{\bar{x}})^2}{K - 1}$$

$$MSW = \frac{\sum_i (x_{1i} - \bar{x}_1)^2 + \sum_i (x_{2i} - \bar{x}_2)^2 + \dots + \sum_i (x_{ki} - \bar{x}_k)^2}{N - K}$$

Let’s see the intuitive logic…

ONE-WAY ANOVA
EXAMPLE: Whether or not average earnings per share (EPS) for commercial banks, retailing operations, and utility companies (variable Industry) was the same last year.
• Sample Data: A random sample of 9 banks, 10 retailers, and 10 utilities.
• Table 1. Earnings Per Share (EPS) of Sample Firms in the Three Industries

Banking    Retailing    Utility
$6.42      $3.52        $3.55
 2.83       4.21         2.13
 8.94       4.36         3.24
 6.80       2.67         6.47
 5.70       3.49         3.06
 4.65       4.68         1.80
 6.20       3.30         5.29
 2.71       2.68         2.96
 8.34       7.25         2.90
 -----      0.16         1.73
nB = 9     nR = 10      nU = 10      n = 29

H0: There were no differences in average EPS of Banks, Utilities, and Retailers.
First logical thing you do? Compute the group means and the grand mean:
$\bar{x}_B = 5.84$, $\bar{x}_R = 3.63$, $\bar{x}_U = 3.31$, $\bar{\bar{x}} = 4.21$

ONE-WAY ANOVA
Why is it called ANOVA?
• Differences in EPS (Dep. Var.) among all 29 firms have two components: differences among the groups and differences within the groups. That is,
  a. There are some differences in EPS among the three groups of firms (Banks vs. Retailers vs. Utilities), and
  b. There are also some differences/variations in EPS of the firms within each of these groups (among banks themselves, among retailers themselves, and among utilities themselves).
• ANOVA will partition/analyze the variance of the dependent variable (i.e., the differences in EPS) and trace it to its two components/sources, i.e., to differences between groups vs. differences within groups. WHY?

ONE-WAY ANOVA
The underlying intuitive logic in ANOVA:
If the groups that are being compared come from the same population (i.e., if the groups are alike/equal):
• They should exhibit similar differences (have equal variability).
• Hence, the differences among these groups should be no more than the differences within them (i.e., among members within the same groups).
• That is, groups that are alike/similar are expected to have about as much variability between them as they have within them.

ONE-WAY ANOVA
On the other hand…
If the groups being compared are divergent/dissimilar/unequal:
They would exhibit more difference between them than they show within them (i.e., among members within the same groups).
That is, they will have greater similarity/commonality internally than they have externally (with members of the other groups).

ONE-WAY ANOVA
CRITERION USED BY ANOVA: Groups can be considered different if there exist…?
…if there exist larger differences among these groups than there are among members within them.
QUESTION: Given the above, what would one have to do to conduct ANOVA?
• That is, what do you have to do to judge whether or not two or more groups can be considered different/equal (with respect to a given characteristic)?
  a. Compute the differences that exist among these groups, and
  b. Compare them with the differences that exist within these groups.
And that is exactly what ANOVA does….

ONE-WAY ANOVA
QUESTION: How do we usually measure differences/variations?
• VARIANCE: A useful index of differences/variations/dispersion among a set of values/scores.
  – Estimate of the average (i.e., per observation) squared difference from the mean
• Computation?

$$S^2 = \frac{\text{Sum of squared deviations from the mean}}{\text{Sample Size} - 1} = \frac{\sum (x - \bar{x})^2}{n - 1}$$

ONE-WAY ANOVA
• So, steps in performing ANOVA:
  a. Compute the BETWEEN-GROUP VARIANCE for the characteristic under study (i.e., the dependent variable),
  b. Compute the WITHIN-GROUP VARIANCE for the same characteristic/variable, and then
  c. COMPARE the two (i.e., check to see if Between Group Var. > Within Group Var.)
NOTE: In ANOVA the term “MEAN SQUARE,” rather than variance, is utilized.

ONE-WAY ANOVA
• Table 1. Earnings Per Share (EPS) of Sample Firms in the Three Industries

Banking    Retailing    Utility
 6.42       3.52         3.55
 2.83       4.21         2.13
 8.94       4.36         3.24
 6.80       2.67         6.47
 5.70       3.49         3.06
 4.65       4.68         1.80
 6.20       3.30         5.29
 2.71       2.68         2.96
 8.34       7.25         2.90
 -----      0.16         1.73
nB = 9     nR = 10      nU = 10      n = 29

$\bar{x}_B = 5.84$, $\bar{x}_R = 3.63$, $\bar{x}_U = 3.31$, $\bar{\bar{x}} = 4.21$

Total WITHIN Group Variance (or Mean Square WITHIN, MSW)?

$$MSW = \frac{(6.42 - 5.84)^2 + \dots + (8.34 - 5.84)^2 + (3.52 - 3.63)^2 + \dots + (0.16 - 3.63)^2 + (3.55 - 3.31)^2 + \dots + (1.73 - 3.31)^2}{9 + 10 + 10 - 3} = \frac{87.112}{26} = 3.350$$

Let’s see what we just did:

$$MSW = \frac{\text{Sum of Squared Deviations of All Observations From Their Respective Group Means}}{\text{Total Sample Size} - \text{Number of Groups}} = \frac{SS_{Within}}{N - K}$$

The generic mathematical formula for MSW:

$$MSW = \frac{\sum_i (x_{Bi} - \bar{x}_B)^2 + \sum_i (x_{Ri} - \bar{x}_R)^2 + \sum_i (x_{Ui} - \bar{x}_U)^2}{N - K}$$

The denominator is called “Degrees of Freedom” = (nB – 1) + (nR – 1) + (nU – 1).

ONE-WAY ANOVA
Let’s now compute the BETWEEN Group Variance (Mean Square BETWEEN, MSB):

$$MSB = \frac{9(5.84 - 4.21)^2 + 10(3.63 - 4.21)^2 + 10(3.31 - 4.21)^2}{3 - 1} = \frac{35.397}{2} = 17.698$$

Let’s see what we just did:

$$MSB = \frac{\text{Sum of Squared Deviations of Group Means from the Grand Mean (weighted by respective group sizes)}}{\text{Number of Groups} - 1} = \frac{SS_{Between}}{K - 1}$$

Mathematical formula for MSB:

$$MSB = \frac{n_B(\bar{x}_B - \bar{\bar{x}})^2 + n_R(\bar{x}_R - \bar{\bar{x}})^2 + n_U(\bar{x}_U - \bar{\bar{x}})^2}{K - 1}$$

The denominator, K – 1, is called Degrees of Freedom.

ONE-WAY ANOVA
Mean Square Between Groups = MSB = 17.698
MSB represents the portion of the total differences/variations in EPS (the dependent variable) that is attributable to (or explained by) differences BETWEEN groups (e.g., industries).
• That is, the part of the differences in companies’ EPS that results from whether they are banks, retailers, or utilities.

ONE-WAY ANOVA
Mean Square Within Groups (MS Residual/Error) = MSW = 3.350
MSW represents:
a. The differences in EPS (the dependent variable) that are due to all other factors that are not examined and not controlled for in the study (e.g., diversification level, firm size, etc.)
Plus . . .
b. The natural variability of EPS (the dependent variable) among members within each of the comparison groups. (Note that even banks of the same size and the same level of diversification would have different EPS levels.)

ONE-WAY ANOVA
Now, let’s compare MSB and MSW: MSB = 17.698 and MSW = 3.350.
QUESTION: Based on the logic of ANOVA, when would we consider two (or more) groups as different/unequal?
When MSB is significantly larger than MSW.
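The MSW and MSB computations above can be reproduced directly from the Table 1 data. The following is a minimal Python sketch and is not part of the original slides: the variable names (eps, ss_within, ss_between, msw, msb) are illustrative choices, and only the EPS figures come from Table 1.

```python
# EPS values transcribed from Table 1 (one list per industry).
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

n_total = sum(len(values) for values in eps.values())   # N = 29
k = len(eps)                                            # K = 3 groups

# Group means and grand mean (unrounded).
group_means = {g: sum(v) / len(v) for g, v in eps.items()}
grand_mean = sum(sum(v) for v in eps.values()) / n_total

# SS within: squared deviations of each firm from its own group mean.
ss_within = sum((x - group_means[g]) ** 2 for g, v in eps.items() for x in v)

# SS between: squared deviations of group means from the grand mean,
# weighted by group size.
ss_between = sum(len(v) * (group_means[g] - grand_mean) ** 2
                 for g, v in eps.items())

msw = ss_within / (n_total - k)   # mean square within, df = N - K = 26
msb = ss_between / (k - 1)        # mean square between, df = K - 1 = 2

print(f"SS within  = {ss_within:.3f}, MSW = {msw:.3f}")   # 87.112, 3.350
print(f"SS between = {ss_between:.3f}, MSB = {msb:.3f}")  # 35.397, 17.698
```

Because the sketch works with unrounded means, its sums of squares match the slide values to three decimals; plugging the rounded means (5.84, 3.63, 3.31, 4.21) into the MSB formula by hand gives a slightly different total.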
QUESTION: What would be a reasonable index (a single number) that will show how large MSB is compared to MSW (i.e., a single number that will show if MSB is larger than, equal to, or smaller than MSW)?

Compare BETWEEN and WITHIN Group Variances/Mean Squares: Compute the F-Ratio
• Ratio of MSB and MSW (call it the F-Ratio):

$$F = \frac{MSB}{MSW}$$

• What can we infer when the F-ratio is close to 1?
  – MSB and MSW are likely to be equal and, thus, there is a strong likelihood that NO difference exists among the comparison groups.
• How about when the F-ratio is significantly larger than 1?
  – The more the F-ratio exceeds 1, the larger MSB is compared to MSW and, thus, the stronger the likelihood/evidence that group difference(s) exist.

$$F = \frac{MSB}{MSW} = \frac{17.698}{3.350} = 5.282$$

• Results of the above computations are usually summarized in an ANOVA TABLE such as the one that follows:

ANOVA TABLE
Source            Sum of Squares   df            Mean Squares           F
Between Groups    35.397           K – 1 = 2     35.397 / 2 = 17.698    17.698 / 3.350 = 5.282
Within Groups     87.112           N – K = 26    87.112 / 26 = 3.350
Total             122.509          N – 1 = 28

ONE-WAY ANOVA
Interpretation and Conclusion:
QUESTION: What does F = 5.28 mean, intuitively?
For our sample companies, the EPS difference across the three industries (MSB) is more than 5 times the EPS difference among firms within the industries (MSW).
• QUESTION: What is our null hypothesis?
• QUESTION: Is the above F-ratio of 5.28 large enough to warrant rejecting the null?
  – ANSWER: It would be if the chance of being wrong (in rejecting the null) does not exceed 5%.
  – So, look up the F-value in the table of the F-distribution (under the appropriate degrees of freedom) to find out what the α-level will be if, given this F-value, we decide to reject the null.
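Instead of reading the α-level off a printed F-table, the upper-tail probability can be computed. The sketch below is an illustration, not part of the original slides. It relies on a closed form that holds only when the numerator degrees of freedom equal 2 (as they do here, since K – 1 = 2): P(F > f) = (1 + 2f/v2)^(–v2/2). The function name is hypothetical.

```python
def f_upper_tail_df1_is_2(f_value, df1, df2):
    """Return P(F > f_value) for an F(df1, df2) distribution.

    Uses a closed form that is valid ONLY when df1 == 2; any other
    numerator df is rejected rather than silently mishandled.
    """
    if df1 != 2:
        raise ValueError("closed form only valid for numerator df = 2")
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


# Values taken from the ANOVA table: MSB = 17.698, MSW = 3.350.
f_ratio = 17.698 / 3.350
df_between, df_within = 2, 26

p_value = f_upper_tail_df1_is_2(f_ratio, df_between, df_within)

print(f"F = {f_ratio:.3f}, alpha-level if we reject H0 = {p_value:.4f}")
print("Reject H0 at the 5% level" if p_value < 0.05 else "Do not reject H0")
```

The result is roughly 0.012, i.e., well under 5%. For numerator degrees of freedom other than 2, a general-purpose routine such as `scipy.stats.f.sf(f_value, df1, df2)` would be the usual choice.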
• Degrees of Freedom: v1 = k – 1 = 2, v2 = n – k = 26

From the F-distribution table (v1 = 2, v2 = 26):
• F = 3.37 is significant at α = 0.05. (If F = 3.37 and we reject H0, there is a 5% chance of being wrong.)
• F = 4.27 is significant at α = 0.025. That is, if F = 4.27 and we reject H0, we would face a 2.5% chance of being wrong.
• But our F = 5.28 > 4.27. So, what can we say about our α-level? Will it be larger or smaller than 0.025?

ONE-WAY ANOVA
• Our F = 5.28 > 4.27
• The odds of being wrong, if we decide to reject the null, would be less than 2.5% (i.e., α < 0.025). Would rejecting the null be a safe bet?
Conclusion? Reject the null and conclude that the average EPS is NOT EQUAL FOR ALL GROUPS (industries) being compared.
Is the analysis complete?

ONE-WAY ANOVA
• Is our analysis complete?
  – It would be if we were comparing only two groups; simply examine which sample mean is larger than which and report!
HOWEVER, …
  – If the null is rejected and more than two groups are being compared:
    • REMAINING QUESTION: Where exactly (i.e., between which groups) do the differences lie? And, which group(s) of firms exhibit relatively higher, lower, or equal EPS levels?
    • ANSWER: Perform post hoc, multiple comparison tests.
      – SPSS (and other software packages) offers a variety of options (e.g., LSD, Bonferroni, Tukey, etc.) to choose from.
Let’s now review the steps involved…

ONE-WAY ANOVA
Overall H0: All group means are equal. H1: Not all group means are equal.
Is the overall F significant (i.e., α < 0.05)?
• No (α > .05): Don’t reject H0; no group diff. found. Stop.
• Yes (α < .05): Reject H0; not all group means are equal (i.e., at least 2 groups are diff.). How many groups are being compared?
  – If only 2: Examine the group means. Report which group has the higher/lower mean. Stop.
  – If more than 2: Conduct post-hoc pairwise comparison tests to see where the differences lie. Examine the results. Examine the group means. Report which groups have higher/lower means.
Stop.

ANOVA in SPSS
Let’s now use SPSS to perform the same analysis.
NOTE: Students are supposed to have printed and brought the “SPSS OUTPUT One-Way ANOVA” PDF file with them to class.
ONE_WAY_EPS_SPSS_FILE

TWO-WAY ANOVA (with Interaction)
In our EPS example, suppose you suspect that a company’s size category (small vs. large) may also have a significant effect on EPS. Since you did not attempt to control for company size when selecting your sample firms, small and large companies may not have been equally represented in the three industry groups (e.g., what if, compared to the banks in the sample, all or a much greater % of retailers and utilities were small?). You are therefore concerned that the potential confounding effect of company size may have distorted your earlier results.
So, you now wish to examine possible EPS differences among the 3 industries while controlling for the possible confounding effect of company size (i.e., holding size constant/equal for the firms in our three industries). In other words, you wish to know if there are any differences among the average EPS of banks, retailers, and utilities of equal size.

SOURCES OF BETWEEN-GROUP DIFFERENCES
                 COMPANY SIZE
INDUSTRY         Small      Large
Bank
Retailer
Utility

TWO-WAY ANOVA (with Interaction)
• So, Two-Way ANOVA will help us learn if banks in general, even after controlling for company size, would, on average, have higher EPS than retailers and utilities.
• But an additional advantage of Two-Way ANOVA is that it can also show us whether a particular group of banks (i.e., CERTAIN COMBINATIONS of industry and size category) is more/less conducive to EPS than other combinations of the two characteristics. As just one example, it can show us if only the larger banks (and not all banks in general) have significantly higher EPS compared to firms in the other two industries (or compared to only the smaller firms in the other two industries).
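The distinction drawn above between the overall effect of industry, the overall effect of size, and the effect of particular industry-size combinations can be written as a model equation. The formulation below is a common textbook way of expressing a two-way ANOVA with interaction. It is not taken from the slides, and the subscripts and term names are assumptions introduced here for illustration.

```latex
\[
\mathrm{EPS}_{ijk} = \mu
  + \mathrm{Industry}_i
  + \mathrm{Size}_j
  + (\mathrm{Industry} \times \mathrm{Size})_{ij}
  + \varepsilon_{ijk}
\]
% i = bank, retailer, utility;  j = small, large;  k = firm within cell (i, j)
% mu               : overall (grand) mean EPS
% Industry_i       : main effect of industry i
% Size_j           : main effect of size category j
% (Industry x Size): interaction effect of the specific combination (i, j)
% epsilon_ijk      : within-cell (residual) variation
```

In this notation, the "larger banks only" example corresponds to a non-zero interaction term for the (bank, large) cell, over and above the two main effects.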
ANOVA Using SPSS
• TWO-WAY ANOVA (with Main & Interaction Effects):
  – Analyze: General Linear Model
  – Univariate: Y to “Dependent” box, Categorical X1 & X2 to the “Fixed Factors” box
  – Model: Full, Continue
  – Plots: X1 to “Horizontal”, X2 to “Separate Lines”, Add, Continue
  – Post Hoc: Move factors (IVs) with >2 groups to “Post Hoc Tests” box, select “Tukey” or “Bonferroni”, Continue
  – Options: Move Overall, X1, X2, and X1*X2 to “Display Means” box, check “Descriptive Stats.”, Continue
  – OK
NOTE: Students are supposed to have printed and brought the “SPSS OUTPUT Two-Way ANOVA with Interaction” PDF file with them to class.
TWO_WAY_EPS_SPSS_FILE

TWO-WAY ANOVA (Main & Interaction Effects Model)
H0: There are no differences among the groups represented by either variable.
Is the overall F significant (i.e., α < 0.05)?
• No: Don’t reject H0; no group diff. found. STOP.
• Yes: Reject H0; there are some differences among the groups represented by at least one of the variables. Determine if the interaction effect is significant:
  – YES: Examine the plot of the interaction effect for results. STOP.
  – NO:
    a. Examine which main effect, if any, is significant (i.e., differences exist across categories of which independent variable).
    b. Is the significant indep. var. dichotomous (i.e., does it represent only 2 groups)?
       • Yes, only 2 groups: Examine the group means for that variable; report which group has the higher/lower mean. STOP.
       • No, more than 2 groups: Conduct post-hoc pairwise comparison tests for that var. to see where the differences lie. Examine the results. Examine the group means for that variable; report which groups have higher/lower means. STOP.

ANOVA
CAUTION: Don’t get carried away with the number of factors (independent categorical variables); DON’T DO N-WAY ANOVA!!!

ANOVA Using SPSS
ANOTHER EXAMPLE:
• Using the gss.sav data file, we wish to find out if the age at which one gets married (agewed) is a function of one’s gender (sex) and highest educational degree (degree). That is, is the average marriage age different between the two genders and among the various educational groups? If so, in what way?
• NOTE: Here, we are considering/treating educational degree as a nominal/categorical variable, and NOT as an ordered metric variable.

ASSIGNMENT 4
1. Suppose, as a social scientist, you are interested in studying gender differences in preference for different types of music. Specifically, you wish to know if there are differences between men and women relative to how much they like classical music (variable classical). The gss.sav data file (on your SPSS Data Disk) includes data regarding such issues. This data set represents 1500 randomly selected cases from the 1993 General Social Survey. Use the data from this SPSS file to address the above questions.
NOTE: If you check the value labels for the variables classical, opera, and country in the gss.sav file, you will see that they were measured on 5-point scales (1=Like Very Much, 5=Dislike Very Much) and, thus, can be considered metric.

ASSIGNMENT 4
2. As a staff researcher in the HR Department of a major company, you are interested in learning if there are differences among male and female employees and among employees who have different levels of education regarding the level of importance that they attach (a) to having a fulfilling job. Data regarding such issues have been obtained through the General Social Survey using a representative sample of approximately 1500 working men and women in the U.S. You have access to the resulting data (see gss.sav SPSS data file, variables sex, impjob, and degree). Use this data set to address the above issues.

IMPORTANT NOTES FOR QUESTIONS 2, 3, AND 4:
• Treat variable “degree” as a categorical/nominal variable.
• When interpreting the results, please pay attention to the fact that, if you check the value labels for the dependent variable, you will notice that it was measured on a 5-point scale (1=One of Most Important, 5=Not at All Important).
• If you find it necessary to conduct post-hoc multiple comparison tests, use the Tukey option.
• IMPORTANT: If the alpha level for a given test is just slightly higher than 0.05 (e.g., 0.054), consider that difference statistically significant.

REMINDERS:
– For each analysis, include the Notes part of the SPSS output in the printout. Also edit the first page of every output to include your name.
– Make sure that you state your complete interpretations and explanations on the appropriate pages of the output. Be specific as to how you have used what parts of the output to reach your conclusions.
– Make sure that your explanations are complete. For example, it is not enough to say that there is a difference between groups A and B regarding characteristic C. You have to go on to indicate how the two groups are different on characteristic C (e.g., “on average, group A exhibits more/less of the characteristic C”).

QUESTIONS OR COMMENTS