Factorial Experiments
Analysis of Variance

Experimental Design
• Dependent variable Y
• k categorical independent variables A, B, C, … (the factors)
• Let
– a = the number of categories of A
– b = the number of categories of B
– c = the number of categories of C
– etc.

The Completely Randomized Design
• We form the set of all treatment combinations, i.e. the set of all combinations of the levels of the k factors.
• Total number of treatment combinations: t = abc…
• In the completely randomized design, n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination.
– Total number of experimental units: N = nt = nabc…

The treatment combinations can be thought of as arranged in a k-dimensional rectangular block.
[Figure: the treatment combinations as a rectangular block with axes A (levels 1, 2, …, a), B (levels 1, 2, …, b) and C]

[Figure: another way of representing the treatment combinations in a factorial experiment with factors A, B, C, D, …]

Example
In this example we are examining the effect of
• the level of protein A (High or Low) and
• the source of protein B (Beef, Cereal, or Pork)
on weight gains Y (grams) in rats. We have n = 10 test animals randomly assigned to each of the t = 6 diets (N = 60 animals in all). The 6 = 3 × 2 diets are the Level-Source combinations:
1. High - Beef
2. High - Cereal
3. High - Pork
4. Low - Beef
5. Low - Cereal
6. Low - Pork

Table: Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

Level of protein     High Protein              Low Protein
Source of protein    Beef    Cereal  Pork      Beef    Cereal  Pork
Diet                 1       2       3         4       5       6
                     73      98      94        90      107     49
                     102     74      79        76      95      82
                     118     56      96        90      97      73
                     104     111     98        64      80      86
                     81      95      102       86      98      81
                     107     88      102       51      74      97
                     100     82      108       72      74      106
                     87      77      91        90      67      70
                     117     86      120       95      89      61
                     111     92      105       78      58      82
Mean                 100.0   85.9    99.5      79.2    83.9    78.7
Std. Dev.            15.14   15.02   10.92     13.89   15.71   16.55

Example – Four factor experiment
Four factors are studied for their effect on Y (luster of paint film).
The four factors are:
1) Film thickness (1 or 2 mils)
2) Drying conditions (Regular or Special)
3) Length of wash (20, 30, 40 or 60 minutes), and
4) Temperature of wash (92 ˚C or 100 ˚C)
Two observations of film luster (Y) are taken for each treatment combination. The data are tabulated below:

                        Regular Dry                    Special Dry
Thickness   Minutes     92 ˚C        100 ˚C            92 ˚C        100 ˚C
1-mil       20          3.4   3.4    19.6   14.5       2.1   3.8    17.2   13.4
            30          4.1   4.1    17.5   17.0       4.0   4.6    13.5   14.3
            40          4.9   4.2    17.6   15.2       5.1   3.3    16.0   17.8
            60          5.0   4.9    20.9   17.1       8.3   4.3    17.5   13.9
2-mil       20          5.5   3.7    26.6   29.5       4.5   4.5    25.6   22.5
            30          5.7   6.1    31.6   30.2       5.9   5.9    29.2   29.8
            40          5.5   5.6    30.5   30.2       5.5   5.8    32.6   27.4
            60          7.2   6.0    31.4   29.6       8.0   9.9    33.5   29.5

Notation
Let the single observations be denoted by a single letter with a number of subscripts, $y_{ijk\ldots l}$. The number of subscripts is equal to (the number of factors) + 1:
• 1st subscript = level of the first factor
• 2nd subscript = level of the second factor
• …
• Last subscript = index of the different observations on the same treatment combination

Notation for Means
When averaging over one or several subscripts, we put a "bar" above the letter and replace those subscripts by •.
• Example: $\bar{y}_{241\bullet\bullet}$ is the mean of all observations at level 2 of the first factor, level 4 of the second factor and level 1 of the third factor.

Profile of a Factor
A plot of observation means vs. levels of the factor. The levels of the other factors may be held constant, or we may average over the levels of the other factors.

Definition:
A factor is said not to affect the response if the profile of the factor is horizontal for all combinations of levels of the other factors: there is no change in the response when you change the level of the factor (true for all combinations of levels of the other factors). Otherwise the factor is said to affect the response.

Definition:
• Two (or more) factors are said to interact if the change in the response when you change the level of one factor depends on the level(s) of the other factor(s).
• Profiles of the factor for different levels of the other factor(s) are not parallel.
• Otherwise the factors are said to be additive.
• Profiles of the factor for different levels of the other factor(s) are parallel.
• If two (or more) factors interact, each factor affects the response.
• If two (or more) factors are additive, it still remains to be determined whether the factors affect the response.
• In factorial experiments we are interested in determining
– which factors affect the response and
– which groups of factors interact.

[Figure: "Factor A has no effect": response vs. A, one profile per level of B]
[Figure: "Additive Factors": response vs. A, one profile per level of B]
[Figure: "Interacting Factors": response vs. A, one profile per level of B]

The testing in factorial experiments
1. Test the higher-order interactions first.
2. If an interaction is present, there is no need to test lower-order interactions or main effects involving those factors: all factors in the interaction affect the response, and they interact.
3. The testing continues with lower-order interactions and main effects for factors which have not yet been determined to affect the response.

Example: Diet Example
Summary Table of Cell Means

                      Source of Protein
Level of Protein      Beef      Cereal    Pork      Overall
High                  100.00    85.90     99.50     95.13
Low                   79.20     83.90     78.70     80.60
Overall               89.60     84.90     89.10     87.87

[Figure: Profiles of Weight Gain for Source and Level of Protein: weight gain vs. source (Beef, Cereal, Pork) for High Protein, Low Protein and Overall]
[Figure: Profiles of Weight Gain for Source and Level of Protein: weight gain vs. level (High Protein, Low Protein) for Beef, Cereal, Pork and Overall]

Models for Factorial Experiments

Single Factor: A (a levels)
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, n$$
Random error $\varepsilon_{ij}$: normal, mean 0, standard deviation σ.
• $\mu_i = \mu + \alpha_i$ = the mean of y when A = i
• $\mu = \frac{1}{a}\sum_{i=1}^{a}\mu_i$ = overall mean
• $\alpha_i = \mu_i - \mu$ = effect on y of factor A when A = i, with $\sum_{i=1}^{a}\alpha_i = 0$

[Figure: observations $y_{i1}, y_{i2}, \ldots, y_{in}$ at each level i = 1, …, a of A, normally distributed about the mean $\mu_i = \mu + \alpha_i$]

Two Factor: A (a levels), B (b levels)
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \quad i = 1, \ldots, a; \; j = 1, \ldots, b; \; k = 1, \ldots, n$$
• $\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$ = the mean of y when A = i and B = j
• μ = overall mean; $\alpha_i$ = main effect of A; $\beta_j$ = main effect of B; $(\alpha\beta)_{ij}$ = interaction effect of A and B
• $\sum_{i=1}^{a}\alpha_i = 0, \quad \sum_{j=1}^{b}\beta_j = 0, \quad \sum_{i=1}^{a}(\alpha\beta)_{ij} = 0, \quad \sum_{j=1}^{b}(\alpha\beta)_{ij} = 0$
[Tables: table of means $\mu_{ij}$; table of effects (overall mean, main effects, interaction effects)]

Three Factor: A (a levels), B (b levels), C (c levels)
$$y_{ijkl} = \mu + \underbrace{\alpha_i + \beta_j + \gamma_k}_{\text{main effects}} + \underbrace{(\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}}_{\text{two-factor interactions}} + \underbrace{(\alpha\beta\gamma)_{ijk}}_{\text{three-factor interaction}} + \underbrace{\varepsilon_{ijkl}}_{\text{random error}}$$
$i = 1, \ldots, a; \; j = 1, \ldots, b; \; k = 1, \ldots, c; \; l = 1, \ldots, n$
Each effect sums to zero over any of its subscripts: $\sum_i\alpha_i = \sum_j\beta_j = \sum_k\gamma_k = \sum_i(\alpha\beta)_{ij} = \cdots = \sum_k(\alpha\beta\gamma)_{ijk} = 0$.
$\mu_{ijk}$ = the mean of y when A = i, B = j, C = k:
$$\mu_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}$$

[Figures: profiles over the levels of A and B at each level of C, illustrating (i) no interaction; (ii) A and B interact, no interaction with C; (iii) A, B and C interact]

Four Factor:
$$y_{ijklm} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\delta)_{il} + (\beta\delta)_{jl} + (\gamma\delta)_{kl} + (\alpha\beta\gamma)_{ijk} + (\alpha\beta\delta)_{ijl} + (\alpha\gamma\delta)_{ikl} + (\beta\gamma\delta)_{jkl} + (\alpha\beta\gamma\delta)_{ijkl} + \varepsilon_{ijklm}$$
• Overall mean: μ
• Main effects: $\alpha_i, \beta_j, \gamma_k, \delta_l$
• Two-factor interactions: $(\alpha\beta)_{ij}, (\alpha\gamma)_{ik}, (\beta\gamma)_{jk}, (\alpha\delta)_{il}, (\beta\delta)_{jl}, (\gamma\delta)_{kl}$
• Three-factor interactions: $(\alpha\beta\gamma)_{ijk}, (\alpha\beta\delta)_{ijl}, (\alpha\gamma\delta)_{ikl}, (\beta\gamma\delta)_{jkl}$
• Four-factor interaction: $(\alpha\beta\gamma\delta)_{ijkl}$
• Random error: $\varepsilon_{ijklm}$
$i = 1, \ldots, a; \; j = 1, \ldots, b; \; k = 1, \ldots, c; \; l = 1, \ldots, d; \; m = 1, \ldots, n$
where $0 = \sum\alpha_i = \sum\beta_j = \sum(\alpha\beta)_{ij} = \sum\gamma_k = \sum(\alpha\gamma)_{ik} = \sum(\beta\gamma)_{jk} = \sum(\alpha\beta\gamma)_{ijk} = \sum\delta_l = \sum(\alpha\delta)_{il} = \sum(\beta\delta)_{jl} = \sum(\alpha\beta\delta)_{ijl} = \sum(\gamma\delta)_{kl} = \sum(\alpha\gamma\delta)_{ikl} = \sum(\beta\gamma\delta)_{jkl} = \sum(\alpha\beta\gamma\delta)_{ijkl}$, and Σ denotes summation over any one of the subscripts.

Estimation of Main Effects and Interactions
• Estimator of the main effect of a factor at level i = mean at level i of the factor − overall mean.
• Estimator of a k-factor interaction effect at a combination of levels of the k factors = mean at that combination of levels of the k factors − sum of all means at the (k − 1)-factor sub-combinations of those levels + sum of all means at the (k − 2)-factor sub-combinations − etc. (the signs alternate, ending with ± the overall mean).
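As a check on these estimation rules, the short sketch below applies them to the 2 × 3 table of cell means from the diet example (each mean based on n = 10 rats). It assumes a balanced design, so marginal means are simple averages of cell means; `two_factor_effects` is a helper written for this illustration, not a library function. It also applies the rule that a sum of squares is n times the number of levels of the factors not in the effect, times the sum of the squared estimated effects.

```python
from statistics import mean

# Cell means (grams) from the diet example's summary table; each is based on
# n = 10 rats. Keys are (level of protein, source of protein).
N_PER_CELL = 10
LEVELS = ["High", "Low"]
SOURCES = ["Beef", "Cereal", "Pork"]
CELL_MEANS = {
    ("High", "Beef"): 100.0, ("High", "Cereal"): 85.9, ("High", "Pork"): 99.5,
    ("Low", "Beef"): 79.2, ("Low", "Cereal"): 83.9, ("Low", "Pork"): 78.7,
}


def two_factor_effects(cell_means, rows, cols):
    """Estimate the overall mean, main effects and interaction effects of a
    balanced two-factor design from its table of cell means."""
    grand = mean(cell_means.values())
    row_means = {r: mean(cell_means[(r, c)] for c in cols) for r in rows}
    col_means = {c: mean(cell_means[(r, c)] for r in rows) for c in cols}
    # main effect = mean at the level - overall mean
    row_eff = {r: row_means[r] - grand for r in rows}
    col_eff = {c: col_means[c] - grand for c in cols}
    # interaction = cell mean - row mean - column mean + overall mean
    inter = {(r, c): cell_means[(r, c)] - row_means[r] - col_means[c] + grand
             for r in rows for c in cols}
    return grand, row_eff, col_eff, inter


grand, level_eff, source_eff, inter = two_factor_effects(CELL_MEANS, LEVELS, SOURCES)

# SS = n x (levels of the factors not in the effect) x (sum of squared effects)
ss_source = N_PER_CELL * len(LEVELS) * sum(e ** 2 for e in source_eff.values())
ss_level = N_PER_CELL * len(SOURCES) * sum(e ** 2 for e in level_eff.values())
ss_inter = N_PER_CELL * sum(e ** 2 for e in inter.values())

print(f"overall mean = {grand:.3f}")
print({s: round(e, 3) for s, e in source_eff.items()})
print({lv: round(e, 3) for lv, e in level_eff.items()})
print({k: round(e, 3) for k, e in inter.items()})
print(f"SS source = {ss_source:.3f}, SS level = {ss_level:.3f}, "
      f"SS interaction = {ss_inter:.3f}")
```

The effects reproduce the diet-example estimates (overall mean 87.867; Beef 1.733, Cereal −2.967, Pork 1.233; High 7.267), and the three sums of squares (266.533, 3168.267, 1178.133) match the SOURCE, LEVEL and SOURCE * LEVEL rows of the SPSS output for this example.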
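Example:
• The main effect of factor B at level j in a four-factor (A, B, C and D) experiment is estimated by:
$$\hat{\beta}_j = \bar{y}_{\bullet j\bullet\bullet\bullet} - \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$$
• The two-factor interaction effect between factors B and C when B is at level j and C is at level k is estimated by:
$$\widehat{(\beta\gamma)}_{jk} = \bar{y}_{\bullet jk\bullet\bullet} - \bar{y}_{\bullet j\bullet\bullet\bullet} - \bar{y}_{\bullet\bullet k\bullet\bullet} + \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$$
• The three-factor interaction effect between factors B, C and D when B is at level j, C is at level k and D is at level l is estimated by:
$$\widehat{(\beta\gamma\delta)}_{jkl} = \bar{y}_{\bullet jkl\bullet} - \bar{y}_{\bullet jk\bullet\bullet} - \bar{y}_{\bullet j\bullet l\bullet} - \bar{y}_{\bullet\bullet kl\bullet} + \bar{y}_{\bullet j\bullet\bullet\bullet} + \bar{y}_{\bullet\bullet k\bullet\bullet} + \bar{y}_{\bullet\bullet\bullet l\bullet} - \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$$
• Finally, the four-factor interaction effect between factors A, B, C and D when A is at level i, B is at level j, C is at level k and D is at level l is estimated by:
$$\widehat{(\alpha\beta\gamma\delta)}_{ijkl} = \bar{y}_{ijkl\bullet} - \bar{y}_{ijk\bullet\bullet} - \bar{y}_{ij\bullet l\bullet} - \bar{y}_{i\bullet kl\bullet} - \bar{y}_{\bullet jkl\bullet} + \bar{y}_{ij\bullet\bullet\bullet} + \bar{y}_{i\bullet k\bullet\bullet} + \bar{y}_{i\bullet\bullet l\bullet} + \bar{y}_{\bullet jk\bullet\bullet} + \bar{y}_{\bullet j\bullet l\bullet} + \bar{y}_{\bullet\bullet kl\bullet} - \bar{y}_{i\bullet\bullet\bullet\bullet} - \bar{y}_{\bullet j\bullet\bullet\bullet} - \bar{y}_{\bullet\bullet k\bullet\bullet} - \bar{y}_{\bullet\bullet\bullet l\bullet} + \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$$

ANOVA Table Entries
• Sum of squares for the interaction (or main) effect being tested = (product of the sample size n and the numbers of levels of the factors not included in the effect) × (sum of squares of the estimated effects being tested).
• Degrees of freedom = df = product of (number of levels − 1) over the factors included in the effect.

Analysis of Variance (ANOVA) Table Entries (Two factors: A and B)
$$SS_A = nb\sum_{i=1}^{a}\hat{\alpha}_i^2, \qquad SS_B = na\sum_{j=1}^{b}\hat{\beta}_j^2, \qquad SS_{AB} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\widehat{(\alpha\beta)}_{ij}^2$$
$$SS_{Error} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(y_{ijk} - \bar{y}_{ij\bullet}\right)^2$$
[Table: the ANOVA table for two factors]

Analysis of Variance (ANOVA) Table Entries (Three factors: A, B and C)
$$SS_A = nbc\sum_{i=1}^{a}\hat{\alpha}_i^2, \qquad SS_B = nac\sum_{j=1}^{b}\hat{\beta}_j^2, \qquad SS_C = nab\sum_{k=1}^{c}\hat{\gamma}_k^2$$
$$SS_{AB} = nc\sum_{i=1}^{a}\sum_{j=1}^{b}\widehat{(\alpha\beta)}_{ij}^2, \qquad SS_{AC} = nb\sum_{i=1}^{a}\sum_{k=1}^{c}\widehat{(\alpha\gamma)}_{ik}^2, \qquad SS_{BC} = na\sum_{j=1}^{b}\sum_{k=1}^{c}\widehat{(\beta\gamma)}_{jk}^2$$
$$SS_{ABC} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\widehat{(\alpha\beta\gamma)}_{ijk}^2, \qquad SS_{Error} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{l=1}^{n}\left(y_{ijkl} - \bar{y}_{ijk\bullet}\right)^2$$

The ANOVA Table
Source   SS         df
A        SSA        a − 1
B        SSB        b − 1
C        SSC        c − 1
AB       SSAB       (a − 1)(b − 1)
AC       SSAC       (a − 1)(c − 1)
BC       SSBC       (b − 1)(c − 1)
ABC      SSABC      (a − 1)(b − 1)(c − 1)
Error    SSError    abc(n − 1)

• When every treatment combination has the same number n of observations, the completely randomized design is called balanced.
• If the number of observations per treatment combination is unequal, the design is called unbalanced (resulting in a mathematically more complex analysis and more complex computations).
• If for some of the treatment combinations there are no observations, the design is called incomplete.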
(Some of the parameters, that is, some of the main effects and interactions, cannot be estimated.)

Example: Diet example
(In this example factor A denotes the source of protein and factor B the level of protein.)
Mean: $\hat{\mu} = \bar{y}_{\bullet\bullet\bullet} = 87.867$

Main effects for factor A (source of protein): $\hat{\alpha}_i = \bar{y}_{i\bullet\bullet} - \bar{y}_{\bullet\bullet\bullet}$
Beef 1.733     Cereal −2.967     Pork 1.233

Main effects for factor B (level of protein): $\hat{\beta}_j = \bar{y}_{\bullet j\bullet} - \bar{y}_{\bullet\bullet\bullet}$
High 7.267     Low −7.267

AB interaction effects: $\widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij\bullet} - \bar{y}_{i\bullet\bullet} - \bar{y}_{\bullet j\bullet} + \bar{y}_{\bullet\bullet\bullet}$

                      Source of Protein
Level of Protein      Beef      Cereal    Pork
High                  3.133     −6.267    3.133
Low                   −3.133    6.267     −3.133

Example 2: Paint Luster Experiment
[Table: means and cell frequencies]
[Table: means and frequencies for the AB interaction (Temp × Drying)]
[Figure: Profiles showing the Temp-Dry interaction: luster vs. temperature (92, 100) for Regular Dry, Special Dry and Overall]
[Table: means and frequencies for the AD interaction (Temp × Thickness)]
[Figure: Profiles showing the Temp-Thickness interaction: luster vs. temperature (92, 100) for 1-mil, 2-mil and Overall]
The main effect of C (Length)
[Figure: Profile of the effect of length on luster: luster vs. length (minutes)]

Objectives
• Determine which factors have some effect on the response.
• Determine which groups of factors interact.

ANOVA Table for the 3 Factor Experiment
Source   SS         df                       MS        F               p-value
A        SSA        a − 1                    MSA       MSA/MSError
B        SSB        b − 1                    MSB       MSB/MSError
C        SSC        c − 1                    MSC       MSC/MSError
AB       SSAB       (a − 1)(b − 1)           MSAB      MSAB/MSError
AC       SSAC       (a − 1)(c − 1)           MSAC      MSAC/MSError
BC       SSBC       (b − 1)(c − 1)           MSBC      MSBC/MSError
ABC      SSABC      (a − 1)(b − 1)(c − 1)    MSABC     MSABC/MSError
Error    SSError    abc(n − 1)               MSError

Sum of Squares Entries
$$SS_A = nbc\sum_{i=1}^{a}\hat{\alpha}_i^2 = nbc\sum_{i=1}^{a}\left(\bar{y}_{i\bullet\bullet\bullet} - \bar{y}_{\bullet\bullet\bullet\bullet}\right)^2$$
Similar expressions hold for SSB and SSC.
$$SS_{AB} = nc\sum_{i=1}^{a}\sum_{j=1}^{b}\widehat{(\alpha\beta)}_{ij}^2 = nc\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\bar{y}_{ij\bullet\bullet} - \bar{y}_{i\bullet\bullet\bullet} - \bar{y}_{\bullet j\bullet\bullet} + \bar{y}_{\bullet\bullet\bullet\bullet}\right)^2$$
Similar expressions hold for SSBC and SSAC.
$$SS_{ABC} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\widehat{(\alpha\beta\gamma)}_{ijk}^2 = n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\left(\bar{y}_{ijk\bullet} - \bar{y}_{ij\bullet\bullet} - \bar{y}_{i\bullet k\bullet} - \bar{y}_{\bullet jk\bullet} + \bar{y}_{i\bullet\bullet\bullet} + \bar{y}_{\bullet j\bullet\bullet} + \bar{y}_{\bullet\bullet k\bullet} - \bar{y}_{\bullet\bullet\bullet\bullet}\right)^2$$
Finally,
$$SS_{Error} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{l=1}^{n}\left(y_{ijkl} - \bar{y}_{ijk\bullet}\right)^2$$

The statistical model for the 3 factor experiment:
$$y_{ijkl} = \underbrace{\mu}_{\text{mean}} + \underbrace{\alpha_i + \beta_j + \gamma_k}_{\text{main effects}} + \underbrace{(\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}}_{\text{2-factor interactions}} + \underbrace{(\alpha\beta\gamma)_{ijk}}_{\text{3-factor interaction}} + \underbrace{\varepsilon_{ijkl}}_{\text{random error}}$$
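The sum-of-squares formulas above follow one pattern for any number of factors: estimate each effect as the alternating sum of marginal means, square and add, then multiply by n times the numbers of levels of the factors not in the effect. The sketch below implements that pattern for a balanced design of any size (pure Python; `factorial_ss` and the data layout are our own illustration, not a library API) and applies it to the paint-luster data from the four-factor example. It assumes every treatment combination has the same number of observations; for unbalanced data these formulas do not hold.

```python
from itertools import combinations, product
from math import prod
from statistics import mean

FACTORS = ["TEMP", "COND", "LENGTH", "THICK"]
LEVELS = [(92, 100), ("Regular", "Special"), (20, 30, 40, 60), (1, 2)]

# Paint-luster data: for each (temperature, drying condition, thickness), the two
# readings at wash lengths 20, 30, 40 and 60 minutes, in that order.
RAW = {
    (92, "Regular", 1): [(3.4, 3.4), (4.1, 4.1), (4.9, 4.2), (5.0, 4.9)],
    (92, "Regular", 2): [(5.5, 3.7), (5.7, 6.1), (5.5, 5.6), (7.2, 6.0)],
    (100, "Regular", 1): [(19.6, 14.5), (17.5, 17.0), (17.6, 15.2), (20.9, 17.1)],
    (100, "Regular", 2): [(26.6, 29.5), (31.6, 30.2), (30.5, 30.2), (31.4, 29.6)],
    (92, "Special", 1): [(2.1, 3.8), (4.0, 4.6), (5.1, 3.3), (8.3, 4.3)],
    (92, "Special", 2): [(4.5, 4.5), (5.9, 5.9), (5.5, 5.8), (8.0, 9.9)],
    (100, "Special", 1): [(17.2, 13.4), (13.5, 14.3), (16.0, 17.8), (17.5, 13.9)],
    (100, "Special", 2): [(25.6, 22.5), (29.2, 29.8), (32.6, 27.4), (33.5, 29.5)],
}
# Cells keyed in FACTORS order: (temp, cond, length, thick) -> observations.
CELLS = {(t, c, length, th): list(pair)
         for (t, c, th), pairs in RAW.items()
         for length, pair in zip(LEVELS[2], pairs)}


def factorial_ss(cells, levels):
    """Sums of squares for every main effect and interaction of a balanced
    factorial design, plus the error sum of squares."""
    k = len(levels)
    n_total = sum(len(obs) for obs in cells.values())
    # Marginal means for every subset of factors; the empty subset is the grand mean.
    means = {}
    for r in range(k + 1):
        for subset in combinations(range(k), r):
            for combo in product(*(levels[f] for f in subset)):
                means[subset, combo] = mean(
                    y for key, obs in cells.items()
                    if all(key[f] == lv for f, lv in zip(subset, combo))
                    for y in obs)
    ss = {}
    for r in range(1, k + 1):
        for subset in combinations(range(k), r):
            total = 0.0
            for combo in product(*(levels[f] for f in subset)):
                # Estimated effect: alternating sum of the means over all
                # sub-combinations of this combination of levels.
                effect = sum((-1) ** (r - q) * means[tuple(subset[i] for i in idx),
                                                     tuple(combo[i] for i in idx)]
                             for q in range(r + 1)
                             for idx in combinations(range(r), q))
                total += effect ** 2
            # n x (levels of factors not in the effect) = N / (levels of factors in it)
            ss[subset] = n_total / prod(len(levels[f]) for f in subset) * total
    ss_error = sum((y - mean(obs)) ** 2 for obs in cells.values() for y in obs)
    return ss, ss_error


ss, ss_error = factorial_ss(CELLS, LEVELS)
for subset, value in ss.items():
    print(f"{' * '.join(FACTORS[f] for f in subset):<30}{value:10.3f}")
print(f"{'Error':<30}{ss_error:10.3f}")
```

For this balanced design the printed values can be compared with the Type III sums of squares in the SPSS output for the paint example (for instance TEMP 5039.225, THICK 844.629, TEMP * THICK 511.325 and Error 87.995); in the balanced case Type I, II and III sums of squares coincide.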
Examples Using SPSS

Example
The diet experiment described earlier: level of protein (High or Low) and source of protein (Beef, Cereal, or Pork), with n = 10 rats per diet and weight gain (grams) as the response. The data are those in the table of weight gains given with that example.

[Screenshot: the data as it appears in SPSS]
To perform the ANOVA, select Analyze->General Linear Model->Univariate. The following dialog box appears:
[Screenshot: the Univariate dialog box]
Select the dependent variable and the fixed factors, then press OK to perform the analysis.

The Output
Tests of Between-Subjects Effects
Dependent Variable: WTGN
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   4612.933(a)               5    922.587       4.300      .002
Intercept         463233.1                  1    463233.1      2159.036   .000
SOURCE            266.533                   2    133.267       .621       .541
LEVEL             3168.267                  1    3168.267      14.767     .000
SOURCE * LEVEL    1178.133                  2    589.067       2.746      .073
Error             11586.000                 54   214.556
Total             479432.0                  60
Corrected Total   16198.933                 59
a. R Squared = .285 (Adjusted R Squared = .219)

Example – Four factor experiment
Four factors are studied for their effect on Y (luster of paint film).
The four factors are:
1) Film thickness (1 or 2 mils)
2) Drying conditions (Regular or Special)
3) Length of wash (20, 30, 40 or 60 minutes), and
4) Temperature of wash (92 ˚C or 100 ˚C)
Two observations of film luster (Y) are taken for each treatment combination; the data are those tabulated earlier for this example.

[Screenshot: the data as it appears in SPSS]
[Screenshot: the dialog box for performing the ANOVA]

The Output
Tests of Between-Subjects Effects
Dependent Variable: LUSTRE
Source                         Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model                6548.020(a)               31   211.226       76.814     .000
Intercept                      12586.035                 1    12586.035     4577.000   .000
TEMP                           5039.225                  1    5039.225      1832.550   .000
COND                           5.700                     1    5.700         2.073      .160
LENGTH                         70.285                    3    23.428        8.520      .000
THICK                          844.629                   1    844.629       307.155    .000
TEMP * COND                    15.504                    1    15.504        5.638      .024
TEMP * LENGTH                  3.155                     3    1.052         .383       .766
COND * LENGTH                  9.890                     3    3.297         1.199      .326
TEMP * COND * LENGTH           6.422                     3    2.141         .778       .515
TEMP * THICK                   511.325                   1    511.325       185.947    .000
COND * THICK                   1.410                     1    1.410         .513       .479
TEMP * COND * THICK            .150                      1    .150          .055       .817
LENGTH * THICK                 15.642                    3    5.214         1.896      .150
TEMP * LENGTH * THICK          11.520                    3    3.840         1.396      .262
COND * LENGTH * THICK          7.320                     3    2.440         .887       .458
TEMP * COND * LENGTH * THICK   5.840                     3    1.947         .708       .554
Error                          87.995                    32   2.750
Total                          19222.050                 64
Corrected Total                6636.015                  63
a. R Squared = .987 (Adjusted R Squared = .974)

Random Effects and Fixed Effects Factors
• So far the factors that we have considered are fixed effects factors.
• This is the case if the levels of the factor are a fixed set of levels and the conclusions of any analysis are in relation to these levels.
• If the levels have been selected at random from a population of levels, the factor is called a random effects factor.
• The conclusions of the analysis will be directed at the population of levels and not only at the levels selected for the experiment.

Example - Fixed Effects
Source of protein, level of protein, weight gain
• Dependent: weight gain
• Independent:
– Source of protein (Beef, Cereal, Pork)
– Level of protein (High, Low)

Example - Random Effects
In this example a taxi company is interested in comparing the effects of three brands of tires (A, B and C) on mileage (mpg). Mileage will also be affected by the driver. The company selects b = 4 drivers at random from its collection of drivers. Each driver has n = 3 opportunities to use each brand of tire, and mileage is measured on each occasion.
• Dependent: mileage
• Independent:
– Tire brand (A, B, C): fixed effects factor
– Driver (1, 2, 3, 4): random effects factor

The Model for the Fixed Effects Experiment
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where $\mu, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, (\alpha\beta)_{11}, (\alpha\beta)_{21}, (\alpha\beta)_{31}, (\alpha\beta)_{12}, (\alpha\beta)_{22}, (\alpha\beta)_{32}$ are fixed unknown constants (here a = 3, b = 2), and $\varepsilon_{ijk}$ is random, normally distributed with mean 0 and variance σ².
Note: $\sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = \sum_{i=1}^{a}(\alpha\beta)_{ij} = \sum_{j=1}^{b}(\alpha\beta)_{ij} = 0$

The Model for the Case When Factor B Is a Random Effects Factor
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where $\mu, \alpha_1, \alpha_2, \alpha_3$ are fixed unknown constants, and $\varepsilon_{ijk}$ is random, normally distributed with mean 0 and variance σ².
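$\beta_j$ is normal with mean 0 and variance $\sigma_B^2$, and $(\alpha\beta)_{ij}$ is normal with mean 0 and variance $\sigma_{AB}^2$.
Note: $\sum_{i=1}^{a}\alpha_i = 0$
This model is called a variance components model.

The ANOVA Table for the Two Factor Model
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
Source   SS         df               MS
A        SSA        a − 1            SSA/(a − 1)
B        SSB        b − 1            SSB/(b − 1)
AB       SSAB       (a − 1)(b − 1)   SSAB/[(a − 1)(b − 1)]
Error    SSError    ab(n − 1)        SSError/[ab(n − 1)]

The ANOVA Table for the Two Factor Model (A, B fixed)
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
Source: SS, df, MS; EMS; F
• A: SSA, a − 1, MSA; EMS = $\sigma^2 + \frac{nb}{a-1}\sum_{i=1}^{a}\alpha_i^2$; F = MSA/MSError
• B: SSB, b − 1, MSB; EMS = $\sigma^2 + \frac{na}{b-1}\sum_{j=1}^{b}\beta_j^2$; F = MSB/MSError
• AB: SSAB, (a − 1)(b − 1), MSAB; EMS = $\sigma^2 + \frac{n}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2$; F = MSAB/MSError
• Error: SSError, ab(n − 1), MSError; EMS = $\sigma^2$
EMS = Expected Mean Square

The ANOVA Table for the Two Factor Model (A fixed, B random)
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
Source: SS, df, MS; EMS; F
• A: SSA, a − 1, MSA; EMS = $\sigma^2 + n\sigma_{AB}^2 + \frac{nb}{a-1}\sum_{i=1}^{a}\alpha_i^2$; F = MSA/MSAB
• B: SSB, b − 1, MSB; EMS = $\sigma^2 + na\sigma_B^2$; F = MSB/MSError
• AB: SSAB, (a − 1)(b − 1), MSAB; EMS = $\sigma^2 + n\sigma_{AB}^2$; F = MSAB/MSError
• Error: SSError, ab(n − 1), MSError; EMS = $\sigma^2$
Note: The divisor for testing the main effect of A is no longer MSError but MSAB.

Rules for Determining Expected Mean Squares (EMS) in an ANOVA Table (Both Fixed and Random Effects)
Formulated by Schultz [1].
[1] Schultz, E. F., Jr. "Rules of Thumb for Determining Expectations of Mean Squares in Analysis of Variance," Biometrics, Vol. 11, 1955, 123-48.
1. The EMS for Error is σ².
2. The EMS for each ANOVA term contains two or more terms, the first of which is σ².
3. All other terms in each EMS contain both coefficients and subscripts (the total number of letters being one more than the number of factors; e.g., if the number of factors is k = 3, the number of letters is 4).
4. The subscript of σ² in the last term of each EMS is the same as the treatment designation.
5. The subscripts of all σ² other than the first contain the treatment designation. These are written with the combination involving the most letters written first and ending with the treatment designation.
6. When a capital letter is omitted from a subscript, the corresponding small letter appears in the coefficient.
7. For each EMS in the table, ignore the letter or letters that designate the effect. If any of the remaining letters designate a fixed effect, delete that term from the EMS.
8. Replace σ² whose subscripts are composed entirely of fixed effects by the appropriate sum: replace $\sigma_A^2$ by $\frac{\sum_{i=1}^{a}\alpha_i^2}{a-1}$ and $\sigma_{AB}^2$ by $\frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2}{(a-1)(b-1)}$.

Example: 3 factors A, B, C, all random effects
• A: EMS = $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + nb\sigma_{AC}^2 + nbc\sigma_A^2$; F: (no exact F ratio)
• B: EMS = $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + na\sigma_{BC}^2 + nac\sigma_B^2$; F: (no exact F ratio)
• C: EMS = $\sigma^2 + n\sigma_{ABC}^2 + na\sigma_{BC}^2 + nb\sigma_{AC}^2 + nab\sigma_C^2$; F: (no exact F ratio)
• AB: EMS = $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2$; F = MSAB/MSABC
• AC: EMS = $\sigma^2 + n\sigma_{ABC}^2 + nb\sigma_{AC}^2$; F = MSAC/MSABC
• BC: EMS = $\sigma^2 + n\sigma_{ABC}^2 + na\sigma_{BC}^2$; F = MSBC/MSABC
• ABC: EMS = $\sigma^2 + n\sigma_{ABC}^2$; F = MSABC/MSError
• Error: EMS = $\sigma^2$

Example: 3 factors, A fixed; B, C random
• A: EMS = $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + nb\sigma_{AC}^2 + \frac{nbc}{a-1}\sum_{i=1}^{a}\alpha_i^2$; F: (no exact F ratio)
• B: EMS = $\sigma^2 + na\sigma_{BC}^2 + nac\sigma_B^2$; F = MSB/MSBC
• C: EMS = $\sigma^2 + na\sigma_{BC}^2 + nab\sigma_C^2$; F = MSC/MSBC
• AB: EMS = $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2$; F = MSAB/MSABC
• AC: EMS = $\sigma^2 + n\sigma_{ABC}^2 + nb\sigma_{AC}^2$; F = MSAC/MSABC
• BC: EMS = $\sigma^2 + na\sigma_{BC}^2$; F = MSBC/MSError
• ABC: EMS = $\sigma^2 + n\sigma_{ABC}^2$; F = MSABC/MSError
• Error: EMS = $\sigma^2$

Example: 3 factors, A, B fixed; C random
• A: EMS = $\sigma^2 + nb\sigma_{AC}^2 + \frac{nbc}{a-1}\sum_{i=1}^{a}\alpha_i^2$; F = MSA/MSAC
• B: EMS = $\sigma^2 + na\sigma_{BC}^2 + \frac{nac}{b-1}\sum_{j=1}^{b}\beta_j^2$; F = MSB/MSBC
• C: EMS = $\sigma^2 + nab\sigma_C^2$; F = MSC/MSError
• AB: EMS = $\sigma^2 + n\sigma_{ABC}^2 + \frac{nc}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2$; F = MSAB/MSABC
• AC: EMS = $\sigma^2 + nb\sigma_{AC}^2$; F = MSAC/MSError
• BC: EMS = $\sigma^2 + na\sigma_{BC}^2$; F = MSBC/MSError
• ABC: EMS = $\sigma^2 + n\sigma_{ABC}^2$; F = MSABC/MSError
• Error: EMS = $\sigma^2$

Example: 3 factors A, B and C, all fixed
• A: EMS = $\sigma^2 + \frac{nbc}{a-1}\sum_{i=1}^{a}\alpha_i^2$; F = MSA/MSError
• B: EMS = $\sigma^2 + \frac{nac}{b-1}\sum_{j=1}^{b}\beta_j^2$; F = MSB/MSError
• C: EMS = $\sigma^2 + \frac{nab}{c-1}\sum_{k=1}^{c}\gamma_k^2$; F = MSC/MSError
• AB: EMS = $\sigma^2 + \frac{nc}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2$; F = MSAB/MSError
• AC: EMS = $\sigma^2 + \frac{nb}{(a-1)(c-1)}\sum_{i=1}^{a}\sum_{k=1}^{c}(\alpha\gamma)_{ik}^2$; F = MSAC/MSError
• BC: EMS = $\sigma^2 + \frac{na}{(b-1)(c-1)}\sum_{j=1}^{b}\sum_{k=1}^{c}(\beta\gamma)_{jk}^2$; F = MSBC/MSError
• ABC: EMS = $\sigma^2 + \frac{n}{(a-1)(b-1)(c-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(\alpha\beta\gamma)_{ijk}^2$; F = MSABC/MSError
• Error: EMS = $\sigma^2$

Example - Random Effects
In this example a taxi company is interested in comparing the effects of three brands of tires (A, B and C) on mileage (mpg). Mileage will also be affected by the driver.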
The company selects b = 4 drivers at random from its collection of drivers. Each driver has n = 3 opportunities to use each brand of tire, and mileage is measured on each occasion.
• Dependent: mileage
• Independent:
– Tire brand (A, B, C): fixed effects factor
– Driver (1, 2, 3, 4): random effects factor

The Data (mileage, mpg)
Driver   Tire A                Tire B                Tire C
1        39.6   38.6   41.9    18.1   20.4   19.0    31.1   29.8   26.6
2        38.1   35.4   38.8    18.2   14.0   15.6    30.2   27.9   27.2
3        33.9   43.2   41.3    17.8   21.3   22.3    31.3   28.7   29.7
4        36.9   30.3   35.0    17.8   21.2   24.3    27.4   26.6   21.0

Asking SPSS to perform the univariate ANOVA
[Screenshot: selecting the dependent variable, fixed factors and random factors]

The Output
Tests of Between-Subjects Effects
Dependent Variable: MILEAGE
Source                       Type III Sum of Squares   df   Mean Square   F          Sig.
Intercept       Hypothesis   28928.340                 1    28928.340     1270.836   .000
                Error        68.290                    3    22.763(a)
TIRE            Hypothesis   2072.931                  2    1036.465      71.374     .000
                Error        87.129                    6    14.522(b)
DRIVER          Hypothesis   68.290                    3    22.763        1.568      .292
                Error        87.129                    6    14.522(b)
TIRE * DRIVER   Hypothesis   87.129                    6    14.522        2.039      .099
                Error        170.940                   24   7.123(c)
a. MS(DRIVER)   b. MS(TIRE * DRIVER)   c. MS(Error)

The divisor for both the fixed and the random main effects is MSAB. This is contrary to the advice of some texts.

The ANOVA Table for the Two Factor Model (A fixed, B random)
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
Source: SS, df, MS; EMS; F
• A: SSA, a − 1, MSA; EMS = $\sigma^2 + n\sigma_{AB}^2 + \frac{nb}{a-1}\sum_{i=1}^{a}\alpha_i^2$; F = MSA/MSAB
• B: SSB, b − 1, MSB; EMS = $\sigma^2 + na\sigma_B^2$; F = MSB/MSError
• AB: SSAB, (a − 1)(b − 1), MSAB; EMS = $\sigma^2 + n\sigma_{AB}^2$; F = MSAB/MSError
• Error: SSError, ab(n − 1), MSError; EMS = $\sigma^2$
Note: The divisor for testing the main effect of A is no longer MSError but MSAB.
References
Guenther, W. C.
"Analysis of Variance," Prentice Hall, 1964.

The ANOVA Table for the Two Factor Model (A fixed, B random)
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
Source: SS, df, MS; EMS; F
• A: SSA, a − 1, MSA; EMS = $\sigma^2 + n\sigma_{AB}^2 + \frac{nb}{a-1}\sum_{i=1}^{a}\alpha_i^2$; F = MSA/MSAB
• B: SSB, b − 1, MSB; EMS = $\sigma^2 + n\sigma_{AB}^2 + na\sigma_B^2$; F = MSB/MSAB
• AB: SSAB, (a − 1)(b − 1), MSAB; EMS = $\sigma^2 + n\sigma_{AB}^2$; F = MSAB/MSError
• Error: SSError, ab(n − 1), MSError; EMS = $\sigma^2$
Note: In this case the divisor for testing the main effects of both A and B is MSAB. This is the approach used by SPSS.
References
Searle, S. R. "Linear Models," John Wiley, 1971.

Crossed and Nested Factors
The factors A, B are called crossed if every level of A appears with every level of B in the treatment combinations.
[Figure: crossed factors: levels of A × levels of B]
Factor B is said to be nested within factor A if the levels of B differ for each level of A.
[Figure: nested factors: a separate set of levels of B under each level of A]

Example: A company has a = 4 plants for producing paper. Each plant has 6 machines for producing the paper. The company is interested in how paper strength (Y) differs from plant to plant and from machine to machine within plant.
[Figure: plants, each with its own machines]
Machines (B) are nested within plants (A).

The model for a two factor experiment with B nested within A:
$$y_{ijk} = \underbrace{\mu}_{\text{overall mean}} + \underbrace{\alpha_i}_{\text{effect of factor A}} + \underbrace{\beta_{j(i)}}_{\text{effect of B within A}} + \underbrace{\varepsilon_{ijk}}_{\text{random error}}$$

The ANOVA Table
Source   SS          df           MS          F                  p-value
A        SSA         a − 1        MSA         MSA/MSError
B(A)     SSB(A)      a(b − 1)     MSB(A)      MSB(A)/MSError
Error    SSError     ab(n − 1)    MSError
Note: SSB(A) = SSB + SSAB and a(b − 1) = (b − 1) + (a − 1)(b − 1).

Example: A company has a = 4 plants for producing paper. Each plant has 6 machines for producing the paper. The company is interested in how paper strength (Y) differs from plant to plant and from machine to machine within plant.
Also, we have n = 3 measurements of paper strength for each of the 24 machines.

The Data
Plant   Machine   Strength
1       1         98.7    93.1    100.0
1       2         59.2    87.8    84.1
1       3         84.1    86.3    83.4
1       4         72.3    110.3   81.6
1       5         83.5    89.3    86.1
1       6         60.6    84.8    83.6
2       7         33.6    48.2    68.9
2       8         44.8    57.3    66.5
2       9         58.9    51.6    45.2
2       10        63.9    62.3    61.1
2       11        63.7    54.6    55.3
2       12        48.1    50.6    39.9
3       13        83.6    84.6    90.6
3       14        76.1    55.4    92.3
3       15        64.2    58.4    75.4
3       16        69.2    86.7    60.8
3       17        77.4    63.3    76.6
3       18        61.0    81.3    73.8
4       19        64.2    50.3    32.1
4       20        35.5    30.8    36.3
4       21        46.9    43.1    40.8
4       22        37.0    47.8    41.0
4       23        43.8    62.4    60.8
4       24        30.0    43.0    56.9

ANOVA Table Treating the Factors (Plant, Machine) as Crossed
Tests of Between-Subjects Effects
Dependent Variable: STRENGTH
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   21031.065(a)              23   914.394       7.972      .000
Intercept         298531.4                  1    298531.4      2602.776   .000
PLANT             18174.761                 3    6058.254      52.820     .000
MACHINE           1238.379                  5    247.676       2.159      .074
PLANT * MACHINE   1617.925                  15   107.862       .940       .528
Error             5505.469                  48   114.697
Total             325067.9                  72
Corrected Total   26536.534                 71
a. R Squared = .793 (Adjusted R Squared = .693)

ANOVA Table: Two Factor Experiment, B (Machine) Nested in A (Plant)
Source           Sum of Squares   df   Mean Square   F           p-value
Plant            18174.76119      3    6058.253731   52.819506   0.00000
Machine(Plant)   2856.303672      20   142.8151836   1.2451488   0.26171
Error            5505.469467      48   114.6972806
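The nested table for the paper-strength example can be obtained from the crossed SPSS output by pooling, exactly as the note SSB(A) = SSB + SSAB and a(b − 1) = (b − 1) + (a − 1)(b − 1) indicates. The sketch below does that pooling with the sums of squares and degrees of freedom copied from the crossed table; like the nested table in the slides, it tests both Plant and Machine(Plant) against MSError. The function name `nested_from_crossed` is hypothetical, chosen for this illustration.

```python
def nested_from_crossed(ss_a, df_a, ss_b, df_b, ss_ab, df_ab, ss_err, df_err):
    """Pool the crossed B and A*B terms into B(A) and return
    (source, SS, df, MS, F) rows, with both effects tested against MSError."""
    ms_err = ss_err / df_err
    rows = []
    # SS_B(A) = SS_B + SS_AB and df_B(A) = df_B + df_AB
    for source, ss, df in (("A", ss_a, df_a), ("B(A)", ss_b + ss_ab, df_b + df_ab)):
        ms = ss / df
        rows.append((source, ss, df, ms, ms / ms_err))
    rows.append(("Error", ss_err, df_err, ms_err, None))
    return rows


# Values from the crossed-factor SPSS output for the paper-strength data.
rows = nested_from_crossed(
    ss_a=18174.761, df_a=3,        # PLANT
    ss_b=1238.379, df_b=5,         # MACHINE
    ss_ab=1617.925, df_ab=15,      # PLANT * MACHINE
    ss_err=5505.469, df_err=48,    # Error
)
for source, ss, df, ms, f in rows:
    f_text = "" if f is None else f"{f:.4f}"
    print(f"{source:<8}{ss:12.3f}{df:5d}{ms:12.4f}{f_text:>10}")
```

The pooled row gives SS = 2856.304 on 20 df and F ≈ 1.2451, matching the Machine(Plant) line of the nested table. The p-values would additionally need the upper tail of the F distribution, for example `scipy.stats.f.sf(F, dfn, dfd)` if SciPy is available.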
Random Effects and Fixed Effects Factors
Fixed effects factors
• The levels of the factor are a fixed set of levels, and the conclusions of any analysis are in relation to these levels.
Random effects factors
• The levels have been selected at random from a population of levels.
• The conclusions of the analysis will be directed at the population of levels and not only at the levels selected for the experiment.

The Analysis of Covariance (ANACOVA)

Multiple Regression
1. Dependent variable Y (continuous)
2. Continuous independent variables X1, X2, …, Xp
The continuous independent variables X1, X2, …, Xp are quite often measured and observed (not set at specific values or levels).

Analysis of Variance
1. Dependent variable Y (continuous)
2. Categorical independent variables (factors) A, B, C, …
The categorical independent variables A, B, C, … are set at specific values or levels.

Analysis of Covariance
1. Dependent variable Y (continuous)
2. Categorical independent variables (factors) A, B, C, …
3. Continuous independent variables (covariates) X1, X2, …, Xp

Example
1. Dependent variable Y: weight gain
2. Categorical independent variables (factors):
i. A = level of protein in the diet (High, Low)
ii. B = source of protein (Beef, Cereal, Pork)
3. Continuous independent variables (covariates):
i. X1 = initial weight of the animal

Dependent variable is continuous:
Statistical Technique    Independent variables
                         continuous    categorical
Multiple Regression      ×
ANOVA                                  ×
ANACOVA                  ×             ×

It is possible to treat categorical independent variables in multiple regression using dummy variables.

The Multiple Regression Model
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$$
The ANOVA Model
$$Y = \mu + \underbrace{\alpha_i + \beta_j}_{\text{main effects}} + \underbrace{(\alpha\beta)_{ij}}_{\text{interactions}} + \varepsilon$$
The ANACOVA Model
$$Y = \mu + \underbrace{\alpha_i + \beta_j}_{\text{main effects}} + \underbrace{(\alpha\beta)_{ij}}_{\text{interactions}} + \underbrace{\gamma_1 X_1 + \cdots + \gamma_p X_p}_{\text{covariate effects}} + \varepsilon$$

ANOVA Tables (n = total number of observations)
The Multiple Regression Model
Source        S.S.       d.f.
Regression    SSReg      p
Error         SSError    n − p − 1
Total         SSTotal    n − 1

The ANOVA Model
Source                   S.S.       d.f.
Main effects    A        SSA        a − 1
                B        SSB        b − 1
Interactions    AB       SSAB       (a − 1)(b − 1)
                ⋮
Error                    SSError    n − 1 − (sum of the d.f. of the effects above)
Total                    SSTotal    n − 1

The ANACOVA Model
Source                   S.S.            d.f.
Covariates               SSCovariates    p
Main effects    A        SSA             a − 1
                B        SSB             b − 1
Interactions    AB       SSAB            (a − 1)(b − 1)
                ⋮
Error                    SSError         n − 1 − (sum of the d.f. of all the terms above)
Total                    SSTotal         n − 1
(In the example below, n = 60 and the error d.f. = 59 − (1 + 1 + 2 + 2) = 53.)

The data
Level   Source
High    Beef      wtgn:        112    126    88     97     91     78     86     83     108    104
                  initial wt:  1031   1087   890    1089   894    917    972    899    821    846
High    Cereal    wtgn:        42     93     102    77     85     88     82     41     63     88
                  initial wt:  1041   1108   1132   1023   1090   921    909    1091   838    935
High    Pork      wtgn:        104    114    78     111    109    115    47     124    80     97
                  initial wt:  1098   888    1000   993    1043   992    834    1005   905    1059
Low     Beef      wtgn:        56     86     78     69     76     65     60     80     78     41
                  initial wt:  1044   1025   878    1193   1024   1078   965    958    1135   847
Low     Cereal    wtgn:        68     67     71     76     85     37     119    91     51     57
                  initial wt:  986    1003   968    1035   1018   882    1053   978    1057   1035
Low     Pork      wtgn:        96     67     85     17     67     54     105    64     92     62
                  initial wt:  965    1025   970    836    961    931    1017   845    1092   932

The ANOVA Table
Source                 Sum of Squares   df   Mean Square   F        Sig.
Initial (Covariate)    3357.8165        1    3357.82       9.075    0.00397
LEVEL                  6523.4815        1    6523.48       17.631   0.0001
SOURCE                 2013.6469        2    1006.82       2.721    0.07499
LEVEL * SOURCE         2528.0163        2    1264.01       3.416    0.04022
Error                  19609.4835       53   369.99
Total                  31966.8500       59

Using SPSS to perform ANACOVA
[Screenshot: the data file]
Select Analyze->General Linear Model->Univariate.
Choose the dependent variable, the fixed factor(s) and the covariates.
[Screenshot: the Univariate dialog box]
The following ANOVA table appears:
Tests of Between-Subjects Effects
Dependent Variable: WTGN
Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   12357.366(a)              6    2059.561      5.567    .000
Intercept         24.883                    1    24.883        .067     .796
INITIAL           3357.816                  1    3357.816      9.075    .004
LEVEL             6523.482                  1    6523.482      17.631   .000
SOURCE            2013.647                  2    1006.823      2.721    .075
LEVEL * SOURCE    2528.016                  2    1264.008      3.416    .040
Error             19609.484                 53   369.990
Total             421265.0                  60
Corrected Total   31966.850                 59
a. R Squared = .387 (Adjusted R Squared = .317)

The Process of Analysis of Covariance
[Figure: dependent variable vs. covariate]
[Figure: adjusted dependent variable vs. covariate]
• The dependent variable (Y) is adjusted, for each case, to the value it would have if the covariate took its average value.
• The effects of the factors (A, B, etc.) are determined using the adjusted values of the dependent variable.
• ANOVA and ANACOVA can be handled by a multiple regression package by using dummy variables for the categorical independent variables.
• The results would be the same.

Analysis of Unbalanced Factorial Designs
Type I, Type II and Type III Sums of Squares

Sum of squares for testing an effect:
$$SS_{\text{Effect}} = RSS\left(\text{model}_{\text{Reduced}}\right) - RSS\left(\text{model}_{\text{Complete}}\right)$$
where RSS is the residual sum of squares,
model_Complete ≡ model with the effect in,
model_Reduced ≡ model with the effect out.

Type I SS
• Type I estimates of the sum of squares associated with an effect in a model are calculated when the sums of squares for a model are calculated sequentially.
Example
• Consider the three-factor factorial experiment with factors A, B and C.
The complete model
• Y = μ + A + B + C + AB + AC + BC + ABC
A sequence of increasingly simpler models
1.
Y = m + A + B + C + AB + AC + BC + ABC 2. Y = m + A+ B + C + AB + AC + BC 3. Y = m + A + B+ C + AB + AC 4. Y = m + A + B + C+ AB 5. Y = m + A + B + C 6. Y = m + A + B 7. Y = m + A 8. Y = m Type I S.S. SS I ABC RSS model2 RSS model1 I SSBC RSS model3 RSS model2 SS I AC RSS model4 RSS model3 I SSAB RSS model5 RSS model4 SSCI RSS model6 RSS model5 SS RSS model7 RSS model6 I B SS RSS model8 RSS model7 I A Type II SS • Type two sum of squares are calculated for an effect assuming that the Complete model contains every effect of equal or lesser order. The reduced model has the effect removed , The Complete models 1. Y = m + A + B + C + AB + AC + BC + ABC (the three factor model) 2. Y = m + A+ B + C + AB + AC + BC (the all two factor model) 3. Y = m + A + B + C (the all main effects model) The Reduced models For a k-factor effect the reduced model is the all k-factor model with the effect removed II SSABC RSS model2 RSS model1 II SSAB RSS Y m A B C AC BC RSS model2 II SSAC RSS Y m A B C AB BC RSS model2 II SSBC RSS Y m A B C AB AC RSS model2 SSAII RSS Y m B C RSS model3 SSBII RSS Y m A C RSS model3 SSCII RSS Y m A B RSS model3 Type III SS • The type III sum of squares is calculated by comparing the full model, to the full model without the effect. Comments • When using The type I sum of squares the effects are tested in a specified sequence resulting in a increasingly simpler model. The test is valid only the null Hypothesis (H0) has been accepted in the previous tests. • When using The type II sum of squares the test for a k-factor effect is valid only the all kfactor model can be assumed. • When using The type III sum of squares the tests require neither of these assumptions. An additional Comment • When the completely randomized design is balanced (equal number of observations per treatment combination) then type I sum of squares, type II sum of squares and type III sum of squares are equal. Example • A two factor (A and B) experiment, response variable y. 
• The SPSS data file
In the SPSS ANOVA procedure, select the type of sum of squares in the Model dialog.

ANOVA table – Type I S.S.
Tests of Between-Subjects Effects (Dependent Variable: Y)
Source | Type I Sum of Squares | df | Mean Square | F | Sig.
Corrected Model | 11545.858^a | 8 | 1443.232 | 45.554 | .000
Intercept | 61603.201 | 1 | 61603.201 | 1944.440 | .000
A | 3666.552 | 2 | 1833.276 | 57.865 | .000
B | 809.019 | 2 | 404.509 | 12.768 | .000
A*B | 7070.287 | 4 | 1767.572 | 55.792 | .000
Error | 760.361 | 24 | 31.682 | |
Total | 73909.420 | 33 | | |
Corrected Total | 12306.219 | 32 | | |
a. R Squared = .938 (Adjusted R Squared = .918)

ANOVA table – Type II S.S.
Tests of Between-Subjects Effects (Dependent Variable: Y)
Source | Type II Sum of Squares | df | Mean Square | F | Sig.
Corrected Model | 11545.858^a | 8 | 1443.232 | 45.554 | .000
Intercept | 61603.201 | 1 | 61603.201 | 1944.440 | .000
A | 3358.643 | 2 | 1679.321 | 53.006 | .000
B | 809.019 | 2 | 404.509 | 12.768 | .000
A*B | 7070.287 | 4 | 1767.572 | 55.792 | .000
Error | 760.361 | 24 | 31.682 | |
Total | 73909.420 | 33 | | |
Corrected Total | 12306.219 | 32 | | |
a. R Squared = .938 (Adjusted R Squared = .918)

ANOVA table – Type III S.S.
Tests of Between-Subjects Effects (Dependent Variable: Y)
Source | Type III Sum of Squares | df | Mean Square | F | Sig.
Corrected Model | 11545.858^a | 8 | 1443.232 | 45.554 | .000
Intercept | 52327.002 | 1 | 52327.002 | 1651.647 | .000
A | 2812.027 | 2 | 1406.013 | 44.379 | .000
B | 1010.809 | 2 | 505.405 | 15.953 | .000
A*B | 7070.287 | 4 | 1767.572 | 55.792 | .000
Error | 760.361 | 24 | 31.682 | |
Total | 73909.420 | 33 | | |
Corrected Total | 12306.219 | 32 | | |
a. R Squared = .938 (Adjusted R Squared = .918)

The three tables compare as follows:
• The A*B row is the same in all three tables.
• The B row agrees between Type I and Type II, because in the Type I sequence B is entered after A.
• The A row differs in every table, because the design is unbalanced.

Next Topic: Other Experimental Designs
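The Type I, II and III sums of squares compared in the SPSS tables above can all be obtained as differences of residual sums of squares of nested regression models, with dummy variables for the factors. Below is a minimal Python/NumPy sketch for two factors. It assumes that every treatment combination has at least one observation. The function names (`effect_code`, `rss`, `two_factor_ss`) are hypothetical, not from any SPSS or statistics-library API. The example's data file is not listed in the text, so the sketch runs on a small made-up data set and is not checked against the SPSS numbers. The factors use sum-to-zero ("effect") dummy coding. This coding is what makes "drop the effect from the full model" give the usual Type III sums of squares.

```python
import numpy as np


def effect_code(levels):
    """Sum-to-zero ("effect") dummy variables for one factor (hypothetical helper).

    One column per level except the last: an observation at level k gets a 1
    in column k, and an observation at the last level gets -1 in every column.
    """
    cats = sorted(set(levels))
    cols = np.zeros((len(levels), len(cats) - 1))
    for row, lev in enumerate(levels):
        k = cats.index(lev)
        if k == len(cats) - 1:
            cols[row, :] = -1.0
        else:
            cols[row, k] = 1.0
    return cols


def rss(y, blocks):
    """Residual sum of squares of y regressed on an intercept plus the given column blocks."""
    X = np.column_stack([np.ones(len(y))] + blocks)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def two_factor_ss(y, a, b):
    """Type I, II and III sums of squares for A, B and AB (A entered before B).

    Returns (ss, sse): ss["I"]["A"] is the Type I SS for A, and so on;
    sse is the residual sum of squares of the full model.
    """
    y = np.asarray(y, dtype=float)
    A, B = effect_code(a), effect_code(b)
    # Interaction columns: every product of an A column with a B column.
    AB = np.column_stack([A[:, i] * B[:, j]
                          for i in range(A.shape[1])
                          for j in range(B.shape[1])])
    full = rss(y, [A, B, AB])
    ss = {
        # Type I: sequential fits  mu -> mu + A -> mu + A + B -> full model.
        "I": {"A": rss(y, []) - rss(y, [A]),
              "B": rss(y, [A]) - rss(y, [A, B]),
              "AB": rss(y, [A, B]) - full},
        # Type II: main effects compared within the all-main-effects model.
        "II": {"A": rss(y, [B]) - rss(y, [A, B]),
               "B": rss(y, [A]) - rss(y, [A, B]),
               "AB": rss(y, [A, B]) - full},
        # Type III: each effect dropped from the full model.
        "III": {"A": rss(y, [B, AB]) - full,
                "B": rss(y, [A, AB]) - full,
                "AB": rss(y, [A, B]) - full},
    }
    return ss, full


if __name__ == "__main__":
    # Hypothetical balanced data: 2 levels of A x 3 levels of B, 2 replicates per cell.
    a = [i for i in ("a1", "a2") for _ in range(6)]
    b = [j for _ in range(2) for j in ("b1", "b1", "b2", "b2", "b3", "b3")]
    y = [3, 5, 4, 6, 8, 7, 2, 4, 9, 11, 6, 5]
    ss, sse = two_factor_ss(y, a, b)
    for kind, table in ss.items():
        print(kind, {k: round(v, 3) for k, v in table.items()}, "SSE", round(sse, 3))
```

With balanced data, as in the demo, the three dictionaries agree up to rounding. This matches the comment that the three types coincide for balanced designs. With unequal cell counts, only two equalities are guaranteed by construction:
• The AB entries agree across all three types.
• The Type I and Type II entries for B agree, because B is entered after A.

This mirrors the pattern in the SPSS tables.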