Chapter 12
Analysis of Variance

Goals
1. List the characteristics of the F distribution
2. Conduct a test of hypothesis to determine whether the variances of two populations are equal
3. Discuss the general idea of analysis of variance
4. Organize data into an ANOVA table
5. Conduct a test of hypothesis among three or more treatment means

F Distribution
1. Used to test whether two samples are from populations having equal variances
2. Applied when we want to compare several population means simultaneously to determine whether they came from equal populations (ANOVA: analysis of variance)
In both situations:
• Populations must be normally distributed
• Data must be interval scale or higher

Characteristics Of The F Distribution
1. There is a family of F distributions. Each is determined by:
   1. df in the numerator (comes from pop. 1, which has the larger sample variation)
   2. df in the denominator (comes from pop. 2, which has the smaller sample variation)
2. The F distribution is continuous
   • The F value can assume an infinite number of values from 0 to ∞
3. The value of F cannot be negative
   • Smallest value = 0
4. Positively skewed
   • Long tail is always to the right
   • As the # of df increases in both the numerator and the denominator, the distribution approaches normal
5. Asymptotic
   • As X increases, the F curve approaches the X-axis

Why Do We Want To Compare To See If Two Populations Have Equal Variances?
• What if two machines are making the same part for an airplane? Do we want the parts to be identical or nearly identical? Yes!
  • We would test to see if the means are the same: Chapters 10 & 11
  • We would test to see if the variation is the same for the two machines: Chapter 12
• What if two stocks have similar mean returns? Would we like to test and see if one stock has more variation than the other?

Why Do We Want To Compare To See If Two Populations Have Equal Variances? (continued)
Remember Chapter 11: Assumptions for small sample tests of means:
1. Sampled populations must follow the normal distribution
2. Two samples must be from independent (unrelated) populations
3. The variances & standard deviations of the two populations are equal

Conduct A Test Of Hypothesis To Determine Whether The Variances Of Two Populations Are Equal
To conduct a test:
• Take two random samples
• List population 1 as the sample with the larger variance:
  • $n_1$ = # of observations
  • $s_1^2$ = sample variance
  • $n_1 - 1 = df_1$ = degrees of freedom (numerator for critical value lookup)
• List population 2 as the sample with the smaller variance:
  • $n_2$ = # of observations
  • $s_2^2$ = sample variance
  • $n_2 - 1 = df_2$ = degrees of freedom (denominator for critical value lookup)
• Always list the sample with the larger sample variance as population 1 (allows us to use fewer tables)

Step 1: State null and alternate hypotheses
• List the population with the suspected largest variance as population 1
• Because we want to limit the number of F tables we need to use to look up values, we always put the larger variance in the numerator and the smaller variance in the denominator
• This will force the F value to be at least 1
• We will only use the right tail of the F distribution
• Examples of Step 1:
  • Two-tailed: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$
  • One-tailed: $H_0: \sigma_1^2 \le \sigma_2^2$, $H_1: \sigma_1^2 > \sigma_2^2$

Step 2: Select a level of significance
• Appendix G only lists significance levels .05 and .01
• Two-tailed ($H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$): significance level = .10, .10/2 = .05, so use the .05 table in Appendix G
• One-tailed ($H_0: \sigma_1^2 \le \sigma_2^2$, $H_1: \sigma_1^2 > \sigma_2^2$): significance level = .05, so use the .05 table in Appendix G

Step 3: Identify the test statistic (F), find critical value and draw picture
• Look up the critical value in Appendix G and draw your picture

Level of significance 0.05
Rows: degrees of freedom for numerator (from pop. 1)
Columns: degrees of freedom for denominator (from pop. 2)

Num df     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
1        161  18.5  10.1   7.7  6.61  5.99  5.59  5.32  5.12  4.96  4.84  4.75  4.67   4.6  4.54  4.49  4.45
2        200  19.0  9.55  6.94  5.79  5.14  4.74  4.46  4.26   4.1  3.98  3.89  3.81  3.74  3.68  3.63  3.59
3        216  19.2  9.28  6.59  5.41  4.76  4.35  4.07  3.86  3.71  3.59  3.49  3.41  3.34  3.29  3.24   3.2
4        225  19.3  9.12  6.39  5.19  4.53  4.12  3.84  3.63  3.48  3.36  3.26  3.18  3.11  3.06  3.01  2.96
5        230  19.3  9.01  6.26  5.05  4.39  3.97  3.69  3.48  3.33   3.2  3.11  3.03  2.96   2.9  2.85  2.81
6        234  19.3  8.94  6.16  4.95  4.28  3.87  3.58  3.37  3.22  3.09     3  2.92  2.85  2.79  2.74   2.7
7        237  19.4  8.89  6.09  4.88  4.21  3.79   3.5  3.29  3.14  3.01  2.91  2.83  2.76  2.71  2.66  2.61

If your df is not listed in the table border, estimate the critical value by taking a value between the two nearest listed values.
HW #5: df = 11, use a value between the entries for 10 & 12. Book says: (3.14 + 3.07)/2 = 3.105 ≈ 3.10

Step 4
• Step 4: Formulate a decision rule
• Example:
  • If our calculated test statistic is greater than 3.87, reject H0 and accept H1; otherwise fail to reject H0

Step 5
• Step 5: Take a random sample, compute the test statistic, compare it to the critical value, and make a decision to reject or not reject the null hypothesis
• Test statistic F: $F = \frac{s_1^2}{s_2^2}$ (larger variance in the numerator, always!!)
• Let's look at the handout
• Example conclusions for a two-tail test:
  • Fail to reject null: "The evidence suggests that there is not a difference in variation"
  • Reject null and accept alternate: "The evidence suggests that there is a difference in variation"

Example 1
Colin, a stockbroker at Critical Securities, reported that the mean rate of return on a sample of 10 software stocks was 12.6 percent with a standard deviation of 3.9 percent. The mean rate of return on a sample of 8 utility stocks was 10.9 percent with a standard deviation of 3.5 percent. At the .05 significance level, can Colin conclude that there is more variation in the software stocks?
Step 1: The hypotheses are $H_0: \sigma_I^2 \le \sigma_U^2$ and $H_1: \sigma_I^2 > \sigma_U^2$, where I denotes the software stocks and U the utility stocks.
Step 2: The significance level is .05.
Step 3: The test statistic is the F distribution.

Example 1 continued
Step 4: H0 is rejected if F > 3.68 or if p < .05. The degrees of freedom are $n_1 - 1$ or 9 in the numerator and $n_2 - 1$ or 7 in the denominator.
Step 5: Compute the value of F from the two sample variances.
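Steps 4 and 5 of Example 1 can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name `variance_ratio_f` is hypothetical, and the critical value 3.68 is copied from the slide rather than computed.

```python
def variance_ratio_f(s_a, n_a, s_b, n_b):
    """Return (F, df_numerator, df_denominator), larger sample variance on top."""
    var_a, var_b = s_a ** 2, s_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Example 1: software stocks (s = 3.9, n = 10) vs. utility stocks (s = 3.5, n = 8)
f_stat, df_num, df_den = variance_ratio_f(3.9, 10, 3.5, 8)

# Tabled critical value for alpha = .05 with df = (9, 7), as quoted on the slide
F_CRITICAL = 3.68

# Decision rule: reject H0 only if the computed F exceeds the critical value
reject_h0 = f_stat > F_CRITICAL

print(round(f_stat, 4), df_num, df_den, reject_h0)  # prints: 1.2416 9 7 False
```

Because the larger variance always goes in the numerator, the computed F is at least 1 and only the right tail of the table is needed.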
$$F = \frac{s_1^2}{s_2^2} = \frac{(3.9)^2}{(3.5)^2} = 1.2416$$
H0 is not rejected. There is insufficient evidence to show more variation in the software stocks.

ANOVA: Analysis Of Variance
• Technique in which we compare three or more population means to determine whether they could be equal
• Assumptions necessary:
  • Populations follow the normal distribution
  • Populations have equal standard deviations (σ)
  • Populations are independent
• Why ANOVA? Using the t distribution for repeated pairwise comparisons leads to a buildup of Type I error
• "Treatment" = different populations being examined

[Figure: Case Where Treatment Means Are Different]

[Figure: Case Where Treatment Means Are The Same]

Example Of ANOVA Test To See If Four Treatment Means Are Equal
22 students earned the following grades in Professor Rad's class. The grades are listed under the rating the student gave to the instructor. Is there a difference in the mean score of the students in each of the four categories? Use significance level α = .01

# of "Treatments"        1           2      3      4
Rating of Instructor     Excellent   Good   Fair   Poor
Course Grades            94          75     70     68
                         78          90     80     82
                         81          77     76     72
                         80          83     89     73
                                     88     80     74
                                            75     65
                                            65

Conduct A Test Of Hypothesis Among Four Treatment Means
Step 1: State H0 and H1
H0: µ1 = µ2 = µ3 = µ4
H1: The mean scores are not all equal (at least one treatment mean is different)
Step 2: Significance Level?
α = .01

Step 3: Determine Test Statistic And Select Critical Value
For the course grades listed by instructor rating (Excellent, Good, Fair, Poor):
k = Number of treatments = 4
n = Total number of observations from all the treatments = 22
Degrees of freedom in the numerator = k - 1 = 3
Degrees of freedom in the denominator = n - k = 18
α = Level of significance = 0.01
F from Appendix G (df = (3, 18), α = 0.01) = 5.09

Step 4: State Decision Rule
If our calculated test statistic is greater than 5.09, we reject H0 and accept H1; otherwise we fail to reject H0
Now we move on to Step 5: Select the sample, perform calculations, and make a decision… Are you ready for a lot of procedures?!!

ANOVA Table

Source of Variation   Sum of Squares               Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            SST                          k - 1                SST/(k - 1) = MST                     MST/MSE
Error                 SSE                          n - k                SSE/(n - k) = MSE
Total                 SS Total (Total Variation)   n - 1

The idea is: we estimate variation in two ways and use one estimate in the numerator and the other estimate in the denominator:
• If we divide and get 1 or close to 1, the treatment means are assumed to be the same
• If we get a number far from 1, we say that the means are assumed to be different
• The F critical value will determine whether we are close to 1 or not

ANOVA Table So Far

Source of Variation   Sum of Squares               Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            SST                          3                    SST/3 = MST                           MST/MSE
Error                 SSE                          18                   SSE/18 = MSE
Total                 SS Total (Total Variation)   21

Let's go calculate this!
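The bookkeeping in Steps 3 and 4 (k, n, the two degrees of freedom, and the decision rule) can be sketched as follows. This is a minimal illustration: the dictionary layout and the helper name `decide` are assumptions, and the critical value 5.09 is taken from the slides rather than computed.

```python
# Course grades grouped by the rating each student gave the instructor
grades = {
    "Excellent": [94, 78, 81, 80],
    "Good": [75, 90, 77, 83, 88],
    "Fair": [70, 80, 76, 89, 80, 75, 65],
    "Poor": [68, 82, 72, 73, 74, 65],
}

# Appendix G value for alpha = .01 with df = (3, 18), as quoted on the slide
F_CRITICAL = 5.09

k = len(grades)                                      # number of treatments
n = sum(len(scores) for scores in grades.values())   # total observations
df_numerator = k - 1
df_denominator = n - k
df_total = n - 1


def decide(f_statistic, critical_value=F_CRITICAL):
    """Step 4 decision rule: reject H0 only when F exceeds the critical value."""
    return "reject H0" if f_statistic > critical_value else "fail to reject H0"


print(k, n, df_numerator, df_denominator, df_total)  # prints: 4 22 3 18 21
```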
Calculation 1: Treatment Means and Overall Mean

# of "Treatments"        1           2       3       4
Rating of Instructor     Excellent   Good    Fair    Poor
Course Grades            94          75      70      68
                         78          90      80      82
                         81          77      76      72
                         80          83      89      73
                                     88      80      74
                                             75      65
                                             65
Treatment Mean           83.25       82.60   76.43   72.33

Grand Mean = Overall Mean for all the data = $\bar{X}_G$ = 77.95

Calculation 2: Total Variation
$$SS\ Total = \sum (X - \bar{X}_G)^2$$
SS Total = Sum of Squares Total = Total Variation
X = A particular observation
$\bar{X}_G$ = Overall Mean

Calculation 2: Total Variation (continued)
Example of 1st: 94 - 77.95 = 16.05
Example of 2nd: 16.05^2 = 257.6025

Excellent   (X - 77.95)   (X - 77.95)^2
94           16.05        257.6025
78            0.05          0.0025
81            3.05          9.3025
80            2.05          4.2025
Totals                    271.11

Good        (X - 77.95)   (X - 77.95)^2
75           -2.95          8.7025
90           12.05        145.2025
77           -0.95          0.9025
83            5.05         25.5025
88           10.05        101.0025
Totals                    281.3125

Fair        (X - 77.95)   (X - 77.95)^2
70           -7.95         63.2025
80            2.05          4.2025
76           -1.95          3.8025
89           11.05        122.1025
80            2.05          4.2025
75           -2.95          8.7025
65          -12.95        167.7025
Totals                    373.9175

Poor        (X - 77.95)   (X - 77.95)^2
68           -9.95         99.0025
82            4.05         16.4025
72           -5.95         35.4025
73           -4.95         24.5025
74           -3.95         15.6025
65          -12.95        167.7025
Totals                    358.615

Total Variation = SS Total = 271.11 + 281.3125 + 373.9175 + 358.615 = 1284.96

ANOVA Table So Far

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            SST              3                    SST/3 = MST                           MST/MSE
Error                 SSE              18                   SSE/18 = MSE
Total                 1284.96          21

Let's go calculate this!
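Calculations 1 and 2 can be reproduced with the standard library. This is a sketch, not the method used to build the slides: the variable names are illustrative, and the grand mean is left unrounded here. The slides round it to 77.95 before squaring, so their SS Total (1284.96) differs from the unrounded result in the second decimal place.

```python
from statistics import mean

# Course grades grouped by the rating each student gave the instructor
grades = {
    "Excellent": [94, 78, 81, 80],
    "Good": [75, 90, 77, 83, 88],
    "Fair": [70, 80, 76, 89, 80, 75, 65],
    "Poor": [68, 82, 72, 73, 74, 65],
}

# Calculation 1: one mean per treatment, plus the grand mean over all 22 grades
treatment_means = {rating: mean(scores) for rating, scores in grades.items()}
all_scores = [x for scores in grades.values() for x in scores]
grand_mean = mean(all_scores)

# Calculation 2: total variation = sum of squared deviations from the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

for rating, m in treatment_means.items():
    print(f"{rating}: {m:.2f}")
print(f"grand mean = {grand_mean:.2f}")
print(f"SS Total = {ss_total:.4f}")
```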
Calculation 3: Random Variation
$$SSE = \sum (X - \bar{X}_c)^2$$
SSE = Sum of Squares Error = Random Variation
X = A particular observation
$\bar{X}_c$ = Sample mean for treatment c
c = A particular treatment

Calculation 3: Random Variation (continued)

Excellent   (X - 83.25)   (X - 83.25)^2
94           10.75        115.5625
78           -5.25         27.5625
81           -2.25          5.0625
80           -3.25         10.5625
Xbar = 83.25      Totals  158.75

Good        (X - 82.6)    (X - 82.6)^2
75           -7.6          57.76
90            7.4          54.76
77           -5.6          31.36
83            0.4           0.16
88            5.4          29.16
Xbar = 82.6       Totals  173.2

Fair        (X - 76.43)   (X - 76.43)^2
70           -6.43         41.3449
80            3.57         12.7449
76           -0.43          0.1849
89           12.57        158.0049
80            3.57         12.7449
75           -1.43          2.0449
65          -11.43        130.6449
Xbar = 76.43      Totals  357.7143

Poor        (X - 72.33)   (X - 72.33)^2
68           -4.33         18.7489
82            9.67         93.5089
72           -0.33          0.1089
73            0.67          0.4489
74            1.67          2.7889
65           -7.33         53.7289
Xbar = 72.33      Totals  169.3334

Sum of Squares Error = SSE = 158.75 + 173.2 + 357.7143 + 169.3334 = 859 (rounded)

ANOVA Table So Far

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            SST              3                    SST/3 = MST                           MST/47.72
Error                 859              18                   859/18 = 47.72
Total                 1284.96          21

Let's go calculate this!

Calculation 4: Treatment Variation
$$SST = SS\ Total - SSE = 1284.96 - 859 = 425.96$$
SST = Sum of Squares Treatment = Treatment Variation
= The sum of the squared differences between each treatment mean and the grand (overall) mean, each weighted by that treatment's sample size

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            425.96           3                    425.96/3 = 141.99                     141.99/47.72 = 2.98
Error                 859              18                   859/18 = 47.72
Total                 1284.96          21

Simple subtraction!
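Calculations 3 and 4 can be checked with a short sketch, again using only the standard library and illustrative variable names. It computes SSE within each treatment, gets SST by subtraction as on the slide, and then cross-checks SST against the direct definition (squared distance of each treatment mean from the grand mean, weighted by sample size). Means are left unrounded, so results match the slides only after rounding.

```python
from statistics import mean

# Course grades grouped by the rating each student gave the instructor
grades = {
    "Excellent": [94, 78, 81, 80],
    "Good": [75, 90, 77, 83, 88],
    "Fair": [70, 80, 76, 89, 80, 75, 65],
    "Poor": [68, 82, 72, 73, 74, 65],
}

all_scores = [x for scores in grades.values() for x in scores]
grand_mean = mean(all_scores)
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Calculation 3: random variation, measured around each treatment's own mean
sse = 0.0
for scores in grades.values():
    treatment_mean = mean(scores)
    sse += sum((x - treatment_mean) ** 2 for x in scores)

# Calculation 4: treatment variation by subtraction
sst = ss_total - sse

# Cross-check: the direct definition should give the same SST
sst_direct = sum(
    len(scores) * (mean(scores) - grand_mean) ** 2 for scores in grades.values()
)

print(f"SSE = {sse:.2f}, SST = {sst:.2f}, direct SST = {sst_direct:.2f}")
```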
Calculation 5: Mean Square (Estimate of Variation)
MST = SST/(k - 1) = 425.96/3 = 141.99
MSE = SSE/(n - k) = 859/18 = 47.72

Calculation 6: F
F = MST/MSE = 141.99/47.72 = 2.98

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            425.96           3                    425.96/3 = 141.99                     141.99/47.72 = 2.98
Error                 859              18                   859/18 = 47.72
Total                 1284.96          21

Step 5: Make A Decision
Because 2.98 is less than 5.09, we fail to reject H0
The evidence does not show a difference in the mean scores of the students in the four categories

Summarize Chapter 12
1. List the characteristics of the F distribution
2. Conduct a test of hypothesis to determine whether the variances of two populations are equal
3. Discuss the general idea of analysis of variance
4. Organize data into an ANOVA table
5. Conduct a test of hypothesis among three or more treatment means
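The whole Step 5 procedure can be collected into one function. This is a minimal sketch under stated assumptions: `one_way_anova` is a hypothetical helper written for this example, and the critical value 5.09 is the Appendix G figure quoted on the slides, not something the code derives.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    all_values = [x for group in groups for x in group]
    k, n = len(groups), len(all_values)
    grand_mean = mean(all_values)

    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    sse = 0.0
    for group in groups:
        group_mean = mean(group)
        sse += sum((x - group_mean) ** 2 for x in group)
    sst = ss_total - sse

    mst = sst / (k - 1)   # mean square for treatments
    mse = sse / (n - k)   # mean square for error
    return {
        "SST": sst, "SSE": sse, "SS Total": ss_total,
        "df": (k - 1, n - k), "MST": mst, "MSE": mse, "F": mst / mse,
    }


# Grades for the Excellent, Good, Fair and Poor ratings, in that order
table = one_way_anova([
    [94, 78, 81, 80],
    [75, 90, 77, 83, 88],
    [70, 80, 76, 89, 80, 75, 65],
    [68, 82, 72, 73, 74, 65],
])

F_CRITICAL = 5.09  # alpha = .01, df = (3, 18), as quoted on the slides
reject_h0 = table["F"] > F_CRITICAL

print(f"F = {table['F']:.2f}, df = {table['df']}, reject H0: {reject_h0}")
# prints: F = 2.98, df = (3, 18), reject H0: False
```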