ANOVA

ANOVA - What is it?
Analysis of variance: a method for splitting the total variation of a data set into meaningful components that measure different sources of variation.

One-Way Classification (equal samples)

Assumption
Random samples of size n are selected from each of the k populations. The k populations are independent and normally distributed with means $\mu_1, \mu_2, \ldots, \mu_k$ and common variance $\sigma^2$.

Ho & Ha
Ho: $\mu_1 = \mu_2 = \cdots = \mu_k$
Ha: at least two of the means are not equal

Critical Region and ANOVA Table [equal samples]
$f > f_\alpha[k - 1,\ k(n - 1)]$

Computational Formulas

EXAMPLE
A company has three manufacturing plants, and company officials want to determine whether there is a difference in the average age of workers at the three locations. The following data are the ages of five randomly selected workers at each plant. Perform a one-way ANOVA to determine whether there is a significant difference in the mean ages of the workers at the three plants. Use the 0.01 level of significance.

Between Groups = Column Means
Within Groups = Error

Critical Region and ANOVA Table [unequal samples]
If the sample sizes for the k populations are $n_1, n_2, \ldots, n_k$, then the critical region is given by
$f > f_\alpha[k - 1,\ N - k]$, where $N = \sum_{i=1}^{k} n_i$

Computational Formulas

Example
It is suspected that higher-priced automobiles are assembled with greater care than lower-priced automobiles. To investigate whether there is any basis for this feeling, a large luxury model A, a medium-size sedan B, and a subcompact hatchback C were compared for defects when they arrived at the dealer’s showroom. All cars were manufactured by the same company. The numbers of defects for several cars of each of the three models are recorded. Test the hypothesis at the 0.05 level of significance that the average number of defects is the same for the three models.

                MODEL
          A      B      C
          4      5      8
          7      1      6
          6      3      8
          6      5      9
                 3      5
                 4
TOTAL    23     21     36     80

EXAMPLE
A milk company has four machines that fill gallon jugs with milk.
The quality control manager is interested in determining whether the average fill for these machines is the same. The following data represent random samples of fill measures (in quarts) for 19 jugs of milk filled by the different machines. Use $\alpha = 0.01$ to test the hypotheses. Discuss the business implications of your findings.

MACHINE 1   MACHINE 2   MACHINE 3   MACHINE 4
  4.05        3.99        3.97        4.00
  4.01        4.02        3.98        4.02
  4.02        4.01        3.97        3.99
  4.04        3.99        3.95        4.01
              4.00        4.00
              4.00

Tukey’s Honestly Significant Difference (HSD) Test

Equal Samples
$HSD = q_{\alpha,\,k,\,k(n-1)} \sqrt{\dfrac{MSE}{n}}$

Unequal Samples
$HSD = q_{\alpha,\,k,\,N-k} \sqrt{\dfrac{MSE}{2}\left(\dfrac{1}{n_r} + \dfrac{1}{n_s}\right)}$

If $|\bar{x}_r - \bar{x}_s| > HSD$, then $\mu_r$ is significantly different from $\mu_s$.

Example (Milk)

Two-Way ANOVA (w/o replication)

ROWS     COLUMNS                                       TOTAL   MEAN
         1       2       ⋯    j       ⋯    c
1        x_11    x_12    ⋯    x_1j    ⋯    x_1c        T_1.    x̄_1.
2        x_21    x_22    ⋯    x_2j    ⋯    x_2c        T_2.    x̄_2.
⋮        ⋮       ⋮            ⋮            ⋮           ⋮       ⋮
i        x_i1    x_i2    ⋯    x_ij    ⋯    x_ic        T_i.    x̄_i.
⋮        ⋮       ⋮            ⋮            ⋮           ⋮       ⋮
r        x_r1    x_r2    ⋯    x_rj    ⋯    x_rc        T_r.    x̄_r.
TOTAL    T_.1    T_.2    ⋯    T_.j    ⋯    T_.c        T_..
MEAN     x̄_.1    x̄_.2    ⋯    x̄_.j    ⋯    x̄_.c                x̄_..

We wish to test the following hypotheses:
Ho: The row means are all equal
H1: The row means are significantly different

Ho: The column means are all equal
H1: The column means are significantly different

Computational Formulas
$SST = \sum_{i=1}^{r} \sum_{j=1}^{c} x_{ij}^2 - \dfrac{T_{..}^2}{rc}$
$SSC = \dfrac{1}{r} \sum_{j=1}^{c} T_{.j}^2 - \dfrac{T_{..}^2}{rc}$
$SSR = \dfrac{1}{c} \sum_{i=1}^{r} T_{i.}^2 - \dfrac{T_{..}^2}{rc}$
$SSE = SST - SSC - SSR$

Example 4
The yields of three varieties of wheat using four different kinds of fertilizer were recorded and are shown on the next page. Test the hypothesis at the 0.05 level of significance that there is no difference in the average yield of wheat when different kinds of fertilizer are used. Also, test the hypothesis that there is no difference in the average yield of the three varieties of wheat.

Two-Way ANOVA (with Replication)
We wish to test the following hypotheses:
Ho: The row means are all equal
H1: The row means are significantly different

Ho: The column means are all equal
H1: The column means are significantly different

Ho: There is no significant interaction effect.
H1: There is a significant interaction effect.

Computational Formulas

Example 5
Aside from testing the difference in the yields according to fertilizer and variety of wheat, determine whether there is a significant interaction effect between the two variables, given the following data set. Use a 0.05 level of significance.

ANOVA : PLBautista
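
The one-way test with unequal samples can be sketched in code using the automobile defect data from the example above. The extracted slides do not show the one-way computational formulas, so this sketch assumes the usual totals-based shortcuts: $SST = \sum x_{ij}^2 - T_{..}^2/N$, $SSC = \sum_i T_i^2/n_i - T_{..}^2/N$, and $SSE = SST - SSC$. The function name `one_way_anova` and the label SSC for the between-groups sum of squares are illustrative choices, not taken from the slides.

```python
# A minimal sketch of one-way ANOVA for samples of unequal size.
# Assumption: the totals-based shortcut formulas named in the lead-in;
# the slides' own computational formulas are not reproduced in the text.

def one_way_anova(groups):
    """Return (ssc, sse, sst, df_between, df_within, f) for a list of samples."""
    k = len(groups)                              # number of populations
    n_total = sum(len(g) for g in groups)        # N = n_1 + ... + n_k
    grand_total = sum(sum(g) for g in groups)    # T..
    correction = grand_total ** 2 / n_total      # T..^2 / N

    # Total, between-groups (column means), and within-groups (error) sums of squares
    sst = sum(x * x for g in groups for x in g) - correction
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    sse = sst - ssc

    df_between = k - 1
    df_within = n_total - k
    f = (ssc / df_between) / (sse / df_within)   # ratio of mean squares
    return ssc, sse, sst, df_between, df_within, f


# Defect counts for models A, B, and C from the automobile example
defects = {
    "A": [4, 7, 6, 6],
    "B": [5, 1, 3, 5, 3, 4],
    "C": [8, 6, 8, 9, 5],
}

ssc, sse, sst, df1, df2, f = one_way_anova(list(defects.values()))
print(f"SSC = {ssc:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
print(f"f = {f:.2f} with ({df1}, {df2}) degrees of freedom")
```

For these data the sketch gives f ≈ 8.49 with 2 and 12 degrees of freedom. Compared with the tabled value $f_{0.05}(2, 12) = 3.89$, the statistic falls in the critical region, so the hypothesis of equal mean defects would be rejected at the 0.05 level.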