IE341: Introduction to Design of Experiments

Table of Contents

Single-factor ANOVAs with more than 2 levels
Completely randomized designs
Fixed effects models
Decomposition of total sum of squares
ANOVA table
Testing model adequacy
Multiple comparison methods
Random effects model
Repeated measures design
Randomized complete blocks design
Latin Square
Graeco-Latin Square
Regression approach to ANOVA
Factorial designs
Regression model version
2-factor experiments
3-factor experiments
Tests for quadratic effects
Blocking
2^k factorial designs
Single replicate designs
Addition of center points
Fractional factorials
Design resolution
Complementary ½ fractions
¼ fractions
Taguchi approach to experimentation
Random effects model
Mixed models
ANCOVA
Nested designs
2-stage nested designs
3-stage and m-stage nested designs
Split-plot designs
Split-split-plot designs
Transformations
Log transformation of standard deviations
Logit transformations of odds ratios
Kruskal-Wallis rank transformation
Response Surface Methodology
First order
Second order and CCD
Canonical analysis
Response surface designs
Central Composite Design (CCD)
Box-Behnken designs
Mixture experiments
Simplex designs

Last term we talked about testing the difference between two independent means. For means from a normal population, the test statistic is

$$t = \frac{\bar{X}_A - \bar{X}_B}{s_{\text{diff}}} = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$$

where the denominator is the estimated standard deviation of the difference between two independent means. This denominator represents the random variation to be expected with two different samples. Only if the difference between the sample means is much greater than the expected random variation do we declare the means different.

We also covered the case where the two means are not independent, and what we must do to account for the fact that they are dependent. And finally, we talked about the difference between two variances, where we used the F ratio. The F distribution is a ratio of two chi-square variables. So if $s_1^2$ and $s_2^2$ possess independent chi-square distributions with $v_1$ and $v_2$ df, respectively, then

$$F = \frac{s_1^2}{s_2^2}$$

has the F distribution with $v_1$ and $v_2$ df.

All of this is valuable if we are testing only two means. But what if we want to test to see if there is a difference among three means, or four, or ten? What if we want to know whether fertilizer A or fertilizer B or fertilizer C is best? In this case, fertilizer is called a factor, which is the condition under test. A, B, and C, the three types of fertilizer under test, are called levels of the factor fertilizer. Or what if we want to know if treatment A or treatment B or treatment C or treatment D is best? In this case, treatment is called a factor, and A, B, C, D, the four types of treatment under test, are called levels of the factor treatment. It should be noted that the factor may be quantitative or qualitative.

Enter the analysis of variance! ANOVA, as it is usually called, is a way to test the differences between means in such situations. Previously, we tested single-factor experiments with only two treatment levels. These experiments are called single-factor because there is only one factor under test. Single-factor experiments are more commonly called one-way experiments. Now we move to single-factor experiments with more than two treatment levels.
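Both statistics are easy to check numerically. Here is a minimal Python sketch (assuming numpy and scipy are available; the two samples are made up purely for illustration):

```python
import numpy as np
from scipy import stats

# two hypothetical independent samples (values made up for illustration)
a = np.array([20.1, 22.4, 19.8, 21.7, 20.9])
b = np.array([18.2, 19.5, 17.9, 20.0, 18.8])

# t statistic with the unpooled standard error from the formula above
s_diff = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
t_by_hand = (a.mean() - b.mean()) / s_diff
print(t_by_hand)
print(stats.ttest_ind(a, b, equal_var=False))   # same t, with its p value

# F ratio of the two sample variances, with (nA-1, nB-1) df
F = a.var(ddof=1) / b.var(ddof=1)
print(F, 2 * min(stats.f.cdf(F, 4, 4), stats.f.sf(F, 4, 4)))
```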
Let's start with some notation.

$Y_{ij}$ = the ith observation in the jth level.

$N$ = the total number of experimental observations.

$\bar{Y}$ = the grand mean of all N experimental observations:

$$\bar{Y} = \frac{\sum_{j=1}^{J}\sum_{i=1}^{n_j} Y_{ij}}{N}$$

$\bar{Y}_j$ = the mean of the observations in the jth level:

$$\bar{Y}_j = \frac{\sum_{i=1}^{n_j} Y_{ij}}{n_j}$$

$n_j$ = the number of observations in the jth level; the $n_j$ are called replicates. Replication of the design refers to using more than one experimental unit for each level.

If there is the same number n of replicates for each treatment, the design is said to be balanced. Designs are more powerful if they are balanced, but balance is not always possible. Suppose you are doing an experiment and the equipment breaks down on one of the tests. Now, not by design but by circumstance, you have unequal numbers of replicates for the levels. In all the formulas, we use $n_j$ as the number of replicates in treatment j, not n, so there is no problem.

Notation continued:

$\tau_j$ = the effect of the jth level: $\tau_j = \bar{Y}_j - \bar{Y}$.

$J$ = the number of treatment levels.

$e_{ij}$ = the "error" associated with the ith observation in the jth level. The $e_{ij}$ are assumed to be independent, normally distributed random variables with mean 0 and variance $\sigma^2$, constant for all levels of the factor.

For all experiments, randomization is critical. So to draw any conclusions from the experiment, we must require that the treatments be run in random order. We must also assign the experimental units to the treatments randomly. If all this randomization occurs, the design is called a completely randomized design.

ANOVA begins with a linear statistical model:

$$Y_{ij} = \mu + \tau_j + e_{ij}$$

This model is for a one-way or single-factor ANOVA. The goal of the model is to test hypotheses about the treatment effects and to estimate them. If the treatments have been selected by the experimenter, the model is called a fixed-effects model. For fixed-effects models, we are interested in differences between treatment means, and the conclusions will apply only to the treatments under consideration.

Another type of model is the random effects model, or components of variance model. In this situation, the treatments used are a random sample from a large population of treatments. Here the $\tau_j$ are random variables, and we are interested in their variability, not in the differences among the means being tested.

First, we will talk about fixed-effects, completely randomized, balanced models. In the model we showed earlier, the $\tau_j$ are defined as deviations from the grand mean, so

$$\sum_{j=1}^{J} \tau_j = 0$$

It follows that the mean of the jth treatment is $\bar{Y}_j = \bar{Y} + \tau_j$.

Now the hypothesis under test is:

Ho: $\mu_1 = \mu_2 = \mu_3 = \ldots = \mu_J$
Ha: $\mu_j \neq \mu_k$ for at least one j, k pair

The test procedure is ANOVA, which is a decomposition of the total sum of squares into its component parts according to the model. The total SS is

$$SS_{\text{total}} = \sum_{j=1}^{J}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2$$

and ANOVA is about dividing it into its component parts.

$SS_\tau$ = the variability of the differences among the J levels:

$$SS_\tau = \sum_{j=1}^{J} n_j (\bar{Y}_j - \bar{Y})^2$$

$SS_{\text{error}}$ = the pooled variability of the random error within levels:

$$SS_{\text{error}} = \sum_{j=1}^{J}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2$$

This is easy to see because

$$\sum_{j=1}^{J}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2 = \sum_{j=1}^{J}\sum_{i=1}^{n_j} \left[ (\bar{Y}_j - \bar{Y}) + (Y_{ij} - \bar{Y}_j) \right]^2 = \sum_{j=1}^{J} n_j (\bar{Y}_j - \bar{Y})^2 + \sum_{j=1}^{J}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 + 2\sum_{j=1}^{J} (\bar{Y}_j - \bar{Y}) \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)$$

But the cross-product term vanishes because

$$\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j) = 0$$

So $SS_{\text{total}} = SS_{\text{treatments}} + SS_{\text{error}}$. Most of the time, this is called $SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}$. Each of these terms becomes an MS (mean square) term when divided by the appropriate df.
$$MS_{\text{treatments}} = \frac{SS_{\text{treatments}}}{df_{\text{treatments}}} = \frac{SS_{\text{treatments}}}{J - 1} \qquad MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}} = \frac{SS_{\text{error}}}{N - J}$$

The df for $SS_{\text{error}}$ is N - J because

$$SS_{\text{error}} = \sum_{j=1}^{J}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 = \sum_{j=1}^{J} (n_j - 1)s_j^2$$

so pooling over levels,

$$MS_{\text{error}} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \ldots + (n_J - 1)s_J^2}{(n_1 - 1) + (n_2 - 1) + \ldots + (n_J - 1)} = \frac{SS_{\text{error}}}{N - J}$$

and the df for $SS_{\text{between}}$ is J - 1 because there are J levels.

Now the expected values of each of these terms are:

$$E(MS_{\text{error}}) = \sigma^2 \qquad E(MS_{\text{treatments}}) = \sigma^2 + \frac{\sum_{j=1}^{J} n_j \tau_j^2}{J - 1}$$

Now if there are no differences among the treatment means, then $\tau_j = 0$ for all j. So we can test for differences with our old friend F:

$$F = \frac{MS_{\text{treatments}}}{MS_{\text{error}}}$$

with J - 1 and N - J df. Under Ho, both numerator and denominator are estimates of $\sigma^2$, so the result will not be significant. Under Ha, the result should be significant because the numerator is estimating the treatment effects as well as $\sigma^2$.

The results of an ANOVA are presented in an ANOVA table. For this one-way, fixed-effects, balanced model:

Source   SS           df    MS           p
Model    SS_between   J-1   MS_between   p
Error    SS_within    N-J   MS_within
Total    SS_total     N-1

Let's look at a simple example. A product engineer is investigating the tensile strength of a synthetic fiber to make men's shirts. He knows from prior experience that the strength is affected by the weight percent of cotton in the material. He also knows that the percent should range between 10% and 40% so that the shirts can receive permanent press treatment. The engineer decides to test 5 levels (15%, 20%, 25%, 30%, 35%) and to have 5 replicates in this design. His data are:

Weight %   Observations         $\bar{Y}_j$
15         7, 7, 15, 11, 9      9.8
20         12, 17, 12, 18, 18   15.4
25         14, 18, 18, 19, 19   17.6
30         19, 25, 22, 19, 23   21.6
35         7, 10, 11, 15, 11    10.8

Grand mean $\bar{Y}$ = 15.04.

In this tensile strength example, the ANOVA table is:

Source   SS       df   MS       p
Model    475.76   4    118.94   <0.01
Error    161.20   20   8.06
Total    636.96   24

In this case, we would reject Ho and declare that there is an effect of the cotton weight percent. We can estimate the treatment parameters by subtracting the grand mean from the treatment means. In this example,

τ1 = 9.80 - 15.04 = -5.24
τ2 = 15.40 - 15.04 = +0.36
τ3 = 17.60 - 15.04 = +2.56
τ4 = 21.60 - 15.04 = +6.56
τ5 = 10.80 - 15.04 = -4.24

Clearly, treatment 4 is the best because it provides the greatest tensile strength. Now you could have computed these values from the raw data yourself instead of doing the ANOVA. You would get the same results, but you wouldn't know if treatment 4 was significantly better. But if you did a scatter diagram of the original data, you would see that treatment 4 was best, with no analysis whatsoever. In fact, you should always look at the original data to see if the results make sense. A scatter diagram of the raw data usually tells as much as any analysis can.

[Figure: scatter plot of tensile strength vs. weight percent cotton]

How do you test the adequacy of the model? The model makes certain assumptions that must hold for the ANOVA to be useful. Most importantly, the errors must be distributed normally and independently. The error for each observation, sometimes called the residual, is

$$e_{ij} = Y_{ij} - \bar{Y}_j$$

A residual check is very important for testing for nonconstant variance. The residuals should be structureless, that is, they should have no pattern whatsoever, which, in this case, they do not.

[Figure: scatter plot of residuals vs. fitted values]

These residuals show no extreme differences in variation because they all have about the same spread.
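Here is a short Python sketch (again assuming numpy and scipy) that reproduces this ANOVA table from the raw data; the last line builds the residuals $e_{ij} = Y_{ij} - \bar{Y}_j$ used in the diagnostic plots:

```python
import numpy as np
from scipy import stats

# tensile strength observations by cotton weight percent
levels = {15: [7, 7, 15, 11, 9],
          20: [12, 17, 12, 18, 18],
          25: [14, 18, 18, 19, 19],
          30: [19, 25, 22, 19, 23],
          35: [7, 10, 11, 15, 11]}

y = np.concatenate([np.array(v, float) for v in levels.values()])
grand = y.mean()                                                  # 15.04
ss_total = ((y - grand) ** 2).sum()                               # 636.96
ss_between = sum(len(v) * (np.mean(v) - grand) ** 2 for v in levels.values())
ss_within = ss_total - ss_between                                 # 161.20
df_b, df_w = len(levels) - 1, len(y) - len(levels)
F = (ss_between / df_b) / (ss_within / df_w)
print(ss_between, ss_within, F, stats.f.sf(F, df_b, df_w))

print(stats.f_oneway(*levels.values()))   # scipy agrees in one call

# the residuals for the diagnostic plots: Y_ij minus its level mean
residuals = {pct: np.array(v) - np.mean(v) for pct, v in levels.items()}
```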
They also do not show the presence of any outlier. An outlier is a residual value that is very much larger than any of the others. The presence of an outlier can seriously jeopardize the ANOVA, so if one is found, its cause should be carefully investigated.

A histogram of residuals shows that the distribution is slightly skewed. Small departures from symmetry are of less concern than heavy tails.

[Figure: histogram of residuals]

Another check is for normality. If we do a normal probability plot of the residuals, we can see whether normality holds.

[Figure: normal probability plot of residuals]

A normal probability plot is made with the ascending ordered residuals on the x-axis and their cumulative probability points, 100(k - 0.5)/n, on the y-axis, where k is the order of the residual and n is the number of residuals. There is no evidence of an outlier here. The plot above is not exactly a normal probability plot because the y-axis is not scaled properly, but it gives a pretty good suggestion of linearity.

A plot of residuals vs. run order is useful to detect correlation between the residuals, a violation of the independence assumption. Runs of positive or of negative residuals indicate correlation. None is observed here.

[Figure: residuals vs. run order]

One of the goals of the analysis is to choose among the level means. If the results of the ANOVA show that the factor is significant, we know that at least one of the means stands out from the rest. But which one or ones? The procedures for making these mean comparisons are called multiple comparison methods. These methods use linear combinations called contrasts. A contrast is a particular linear combination of level means, such as $\bar{Y}_4 - \bar{Y}_5$ to test the difference between level 4 and level 5. Or if one wished to test the average of levels 1 and 3 vs. the average of levels 4 and 5, he would use $(\bar{Y}_1 + \bar{Y}_3) - (\bar{Y}_4 + \bar{Y}_5)$.

In general,

$$C = \sum_{j=1}^{J} n_j c_j \bar{Y}_j \qquad \text{where} \qquad \sum_{j=1}^{J} c_j = 0$$

An important case of contrasts is called orthogonal contrasts. Two contrasts with coefficients $c_j$ and $d_j$ are orthogonal if

$$\sum_{j=1}^{J} n_j c_j d_j = 0$$

or, in a balanced design, if

$$\sum_{j=1}^{J} c_j d_j = 0$$

There are many ways to choose the orthogonal contrast coefficients for a set of levels. For example, if level 1 is a control and levels 2 and 3 are two real treatments, a logical choice is to compare the average of the two treatments with the control, $1\bar{Y}_2 + 1\bar{Y}_3 - 2\bar{Y}_1$, and then the two treatments against one another, $1\bar{Y}_2 - 1\bar{Y}_3 + 0\bar{Y}_1$. These two contrasts are orthogonal because

$$\sum_{j=1}^{3} c_j d_j = (1)(1) + (1)(-1) + (-2)(0) = 0$$

Only J - 1 orthogonal contrasts may be chosen because the J levels have only J - 1 df. So for only three levels, the two contrasts chosen exhaust those available for this experiment. Contrasts must be chosen before seeing the data so that experimenters aren't tempted to contrast the levels with the greatest differences.

For the tensile strength experiment with 5 levels and thus 4 df, the 4 contrasts are:

C1 = 0(5)(9.8) + 0(5)(15.4) + 0(5)(17.6) - 1(5)(21.6) + 1(5)(10.8) = -54
C2 = +1(5)(9.8) + 0(5)(15.4) + 1(5)(17.6) - 1(5)(21.6) - 1(5)(10.8) = -25
C3 = +1(5)(9.8) + 0(5)(15.4) - 1(5)(17.6) + 0(5)(21.6) + 0(5)(10.8) = -39
C4 = -1(5)(9.8) + 4(5)(15.4) - 1(5)(17.6) - 1(5)(21.6) - 1(5)(10.8) = 9

These 4 contrasts completely partition $SS_{\text{treatments}}$, as the short check below confirms.
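A quick numerical check of the four contrasts (a numpy sketch; the matrix rows are the $c_j$ coefficients above):

```python
import numpy as np

means = np.array([9.8, 15.4, 17.6, 21.6, 10.8])   # level means, n_j = 5 each
n = 5
C = np.array([[ 0, 0,  0, -1,  1],    # level 5 vs. level 4
              [ 1, 0,  1, -1, -1],    # levels 1 and 3 vs. levels 4 and 5
              [ 1, 0, -1,  0,  0],    # level 1 vs. level 3
              [-1, 4, -1, -1, -1]])   # level 2 vs. the other four

print(C @ (n * means))   # contrast values: -54, -25, -39, 9
print(C @ C.T)           # all off-diagonal entries are 0
```

The off-diagonal zeros of $CC^T$ confirm that all four contrasts are mutually orthogonal in this balanced design.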
Then the SS for each contrast is formed:

$$SS_C = \frac{\left( \sum_{j=1}^{J} n_j c_j \bar{Y}_j \right)^2}{\sum_{j=1}^{J} n_j c_j^2}$$

So for the 4 contrasts we have:

$$SS_{C1} = \frac{(-54)^2}{5[0^2 + 0^2 + 0^2 + (-1)^2 + 1^2]} = \frac{2916}{10} = 291.60$$

$$SS_{C2} = \frac{(-25)^2}{5[1^2 + 0^2 + 1^2 + (-1)^2 + (-1)^2]} = \frac{625}{20} = 31.25$$

$$SS_{C3} = \frac{(-39)^2}{5[1^2 + 0^2 + (-1)^2 + 0^2 + 0^2]} = \frac{1521}{10} = 152.10$$

$$SS_{C4} = \frac{9^2}{5[(-1)^2 + 4^2 + (-1)^2 + (-1)^2 + (-1)^2]} = \frac{81}{100} = 0.81$$

Now the revised ANOVA table is:

Source     SS       df   MS       p
Weight %   475.76   4    118.94   <0.001
  C1       291.60   1    291.60   <0.001
  C2       31.25    1    31.25    0.06
  C3       152.10   1    152.10   <0.001
  C4       0.81     1    0.81     0.76
Error      161.20   20   8.06
Total      636.96   24

So contrast 1 (level 5 vs. level 4) and contrast 3 (level 1 vs. level 3) are significant.

Although the orthogonal contrast approach is widely used, the experimenter may not know in advance which levels to test, or may be interested in more than J - 1 comparisons. A number of other methods are available for such testing. These methods include:

Scheffé's method
The least significant difference (LSD) method
Duncan's multiple range test
The Newman-Keuls test

There is some disagreement about which is the best method, but it is best if all are applied only after there is significance in the overall F test.

Now let's look at the random effects model. Suppose there is a factor of interest with an extremely large number of levels. If the experimenter selects J of these levels at random, we have a random effects model or a components of variance model. The linear statistical model is

$$Y_{ij} = \mu + \tau_j + e_{ij}$$

as before, except that both $\tau_j$ and $e_{ij}$ are random variables instead of simply $e_{ij}$. Because $\tau_j$ and $e_{ij}$ are independent, the variance of any observation is

$$Var(Y_{ij}) = \sigma_\tau^2 + \sigma^2$$

These two variances are called variance components, hence the name of the model. The requirements of this model are that the $e_{ij}$ are NID(0, $\sigma^2$), as before, that the $\tau_j$ are NID(0, $\sigma_\tau^2$), and that $e_{ij}$ and $\tau_j$ are independent. The normality assumption is not required in the random effects model.

As before, $SS_{\text{total}} = SS_{\text{treatments}} + SS_{\text{error}}$, and $E(MS_{\text{error}}) = \sigma^2$. But now

$$E(MS_{\text{treatments}}) = \sigma^2 + n\sigma_\tau^2$$

So the estimate of $\sigma_\tau^2$ is

$$\hat{\sigma}_\tau^2 = \frac{MS_{\text{treatments}} - MS_{\text{error}}}{n}$$

The computations and the ANOVA table are the same as before, but the conclusions are quite different. Let's look at an example. A textile company uses a large number of looms. The process engineer suspects that the looms are of different strength, and selects 4 looms at random to investigate this. The results of the experiment are shown in the table below.

Loom   Observations     $\bar{Y}_j$
1      98, 97, 99, 96   97.50
2      91, 90, 93, 92   91.50
3      96, 95, 97, 95   95.75
4      95, 96, 99, 98   97.00

Grand mean $\bar{Y}$ = 95.44.

The ANOVA table is:

Source   SS       df   MS      p
Looms    89.19    3    29.73   <0.001
Error    22.75    12   1.90
Total    111.94   15

In this case, the estimates of the variance components are:

$$\hat{\sigma}^2 = 1.90 \qquad \hat{\sigma}_\tau^2 = \frac{29.73 - 1.90}{4} = 6.96$$

so the estimated variance of any observation is $\hat{\sigma}^2 + \hat{\sigma}_\tau^2 = 1.90 + 6.96 = 8.86$. Thus most of the variability in the observations is due to variability in loom strength. If you can isolate the causes of this variability and eliminate them, you can reduce the variability of the output and increase its quality.
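The variance-component arithmetic for the loom data is only a few lines of numpy (a sketch, assuming the balanced layout above):

```python
import numpy as np

# strength data: one row per loom, four observations per loom
y = np.array([[98, 97, 99, 96],
              [91, 90, 93, 92],
              [96, 95, 97, 95],
              [95, 96, 99, 98]], float)
J, n = y.shape

grand = y.mean()
ms_looms = n * ((y.mean(axis=1) - grand) ** 2).sum() / (J - 1)               # 29.73
ms_error = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (J * (n - 1))  # 1.90
var_tau = (ms_looms - ms_error) / n                                          # 6.96
print(ms_looms, ms_error, var_tau, var_tau + ms_error)                       # total 8.86
```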
When we studied the differences between two treatment means, we considered repeated measures on the same individual experimental unit. With three or more treatments, we can still do this. The result is a repeated measures design. Consider a repeated measures ANOVA partitioning $SS_{\text{total}}$:

$$\sum_{i=1}^{n}\sum_{j=1}^{J} (Y_{ij} - \bar{Y})^2 = J\sum_{i=1}^{n} (\bar{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_i)^2$$

This is the same as $SS_{\text{total}} = SS_{\text{between subjects}} + SS_{\text{within subjects}}$. The within-subjects SS may be further partitioned into $SS_{\text{treatment}} + SS_{\text{error}}$. In this partition, the first term on the RHS below is the differences between treatment effects and the second term is the random error:

$$\sum_{i=1}^{n}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_i)^2 = n\sum_{j=1}^{J} (\bar{Y}_j - \bar{Y})^2 + \sum_{i=1}^{n}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_i - \bar{Y}_j + \bar{Y})^2$$

Now the ANOVA table looks like this:

Source             SS                                                              df
Between subjects   $J\sum_i (\bar{Y}_i - \bar{Y})^2$                               n-1
Within subjects    $\sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2$                          n(J-1)
  Treatments       $n\sum_j (\bar{Y}_j - \bar{Y})^2$                               J-1
  Error            $\sum_i \sum_j (Y_{ij} - \bar{Y}_i - \bar{Y}_j + \bar{Y})^2$    (J-1)(n-1)
Total              $\sum_i \sum_j (Y_{ij} - \bar{Y})^2$                            Jn-1

The test for treatment effects is the usual $F = MS_{\text{treatment}}/MS_{\text{error}}$, but now it is done entirely within subjects. This design is really a randomized complete block design with subjects considered to be the blocks.

Now what is a randomized complete block design? Blocking is a way to eliminate the effect of a nuisance factor on the comparisons of interest. Blocking can be used only if the nuisance factor is known and controllable. Let's use an illustration. Suppose we want to test the effect of four different tips on the readings from a hardness testing machine. The tip is pressed into a metal test coupon, and from the depth of the depression, the hardness of the coupon can be measured. The only factor is tip type, and it has four levels. If 4 replications are desired for each tip, a completely randomized design would seem to be appropriate. This would require assigning each of the 4 x 4 = 16 runs randomly to 16 different coupons. The only problem is that the coupons need to be all of the same hardness, and if they are not, the differences in coupon hardness will contribute to the variability observed. Blocking is the way to deal with this problem.

In the block design, only 4 coupons are used and each tip is tested on each of the 4 coupons. So the blocking factor is the coupon, with 4 levels. In this setup, the block forms a homogeneous unit on which to test the tips. This strategy improves the accuracy of the tip comparison by eliminating variability due to coupons. Because all 4 tips are tested on each coupon, the design is a complete block design. The data from this design are shown below.

           Test coupon
Tip type   1     2     3     4
1          9.3   9.4   9.6   10.0
2          9.4   9.3   9.8   9.9
3          9.2   9.4   9.5   9.7
4          9.7   9.6   10.0  10.2

Now we analyze these data the same way we did for the repeated measures design. The model is

$$Y_{jk} = \mu + \tau_j + \beta_k + e_{jk}$$

where $\beta_k$ is the effect of the kth block and the rest of the terms are those we already know. Since the block effects are deviations from the grand mean,

$$\sum_{k=1}^{K} \beta_k = 0 \qquad \text{just as} \qquad \sum_{j=1}^{J} \tau_j = 0$$

We can express the total SS as

$$\sum_{j=1}^{J}\sum_{k=1}^{K} (Y_{jk} - \bar{Y})^2 = \sum_{j=1}^{J}\sum_{k=1}^{K} \left[ (\bar{Y}_j - \bar{Y}) + (\bar{Y}_k - \bar{Y}) + (Y_{jk} - \bar{Y}_j - \bar{Y}_k + \bar{Y}) \right]^2 = K\sum_{j=1}^{J} (\bar{Y}_j - \bar{Y})^2 + J\sum_{k=1}^{K} (\bar{Y}_k - \bar{Y})^2 + \sum_{j=1}^{J}\sum_{k=1}^{K} (Y_{jk} - \bar{Y}_j - \bar{Y}_k + \bar{Y})^2$$

which is equivalent to $SS_{\text{total}} = SS_{\text{treatments}} + SS_{\text{blocks}} + SS_{\text{error}}$ with df

$$N - 1 = (J - 1) + (K - 1) + (J - 1)(K - 1)$$

The test for equality of treatment means is $F = MS_{\text{treatments}}/MS_{\text{error}}$, and the ANOVA table is:

Source       SS              df           MS
Treatments   SS_treatments   J-1          MS_treatments
Blocks       SS_blocks       K-1          MS_blocks
Error        SS_error        (J-1)(K-1)   MS_error
Total        SS_total        N-1

For the hardness experiment, the ANOVA table is:

Source     SS       df   MS      p
Tip type   38.50    3    12.83   0.0009
Coupons    82.50    3    27.50
Error      8.00     9    0.89
Total      129.00   15

As is obvious, this is the same analysis as the repeated measures design.
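The same decomposition in numpy, as a sketch on the raw hardness values. The SS in the table above appear to be in coded units (they are exactly 100 times the raw-scale SS, consistent with the data having been analyzed as 10 times the deviation from 9.5), but the F ratio and p value are unaffected by any linear coding:

```python
import numpy as np
from scipy import stats

# hardness data: rows = tip type (treatment), columns = coupon (block)
y = np.array([[9.3, 9.4, 9.6, 10.0],
              [9.4, 9.3, 9.8,  9.9],
              [9.2, 9.4, 9.5,  9.7],
              [9.7, 9.6, 10.0, 10.2]])
J, K = y.shape
grand = y.mean()

ss_treat = K * ((y.mean(axis=1) - grand) ** 2).sum()
ss_block = J * ((y.mean(axis=0) - grand) ** 2).sum()
ss_error = ((y - grand) ** 2).sum() - ss_treat - ss_block
F = (ss_treat / (J - 1)) / (ss_error / ((J - 1) * (K - 1)))
p = stats.f.sf(F, J - 1, (J - 1) * (K - 1))
print(ss_treat, ss_block, ss_error)   # 0.385, 0.825, 0.080 on the raw scale
print(F, p)                           # F = 14.44, p = 0.0009
```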
Now let's consider the Latin Square design. We'll introduce it with an example. The object of study is the effect of 5 different formulations of a rocket propellant on the burning rate of aircraft escape systems. Each formulation comes from a batch of raw material large enough for only 5 formulations. Moreover, the formulations are prepared by 5 different operators, who differ in skill and experience. The way to test in this situation is with a 5x5 Latin Square, which allows for double blocking and therefore the removal of two nuisance factors. The Latin Square for this example is:

Batches of       Operators
raw material     1    2    3    4    5
1                A    B    C    D    E
2                B    C    D    E    A
3                C    D    E    A    B
4                D    E    A    B    C
5                E    A    B    C    D

Note that each row and each column has all 5 letters, and each letter occurs exactly once in each row and column. The statistical model for a Latin Square is

$$Y_{jkl} = \mu + \tau_j + \beta_k + \gamma_l + e_{jkl}$$

where $Y_{jkl}$ is the jth treatment observation in the kth row and the lth column. Again we have

$$SS_{\text{total}} = SS_{\text{rows}} + SS_{\text{columns}} + SS_{\text{treatments}} + SS_{\text{error}}$$

and for a p x p square the df decompose as

$$N - 1 = (p - 1) + (p - 1) + (p - 1) + (p - 2)(p - 1)$$

The ANOVA table for the propellant data is:

Source             SS       df   MS      p
Formulations       330.00   4    82.50   0.0025
Material batches   68.00    4    17.00
Operators          150.00   4    37.50   0.04
Error              128.00   12   10.67
Total              676.00   24

So both the formulations and the operators were significantly different. The batches of raw material were not, but it still is a good idea to block on them because they often are different. This design was not replicated, and Latin Squares often are not, but it is possible to put n replicates in each cell.

Now if you superimposed one Latin Square on another Latin Square of the same size, you would get a Graeco-Latin Square. In one Latin Square, the treatments are designated by Roman letters. In the other Latin Square, the treatments are designated by Greek letters. Hence the name Graeco-Latin Square. A 5x5 Graeco-Latin Square is:

Batches of       Operators
raw material     1     2     3     4     5
1                Aα    Bγ    Cε    Dβ    Eδ
2                Bβ    Cδ    Dα    Eγ    Aε
3                Cγ    Dε    Eβ    Aδ    Bα
4                Dδ    Eα    Aγ    Bε    Cβ
5                Eε    Aβ    Bδ    Cα    Dγ

Note that the five Greek treatments appear exactly once in each row and column, just as the Latin treatments do. If test assemblies had been added as an additional factor to the original propellant experiment, the ANOVA table for the propellant data would be:

Source             SS       df   MS      p
Formulations       330.00   4    82.50   0.0033
Material batches   68.00    4    17.00
Operators          150.00   4    37.50   0.0329
Test assemblies    62.00    4    15.50
Error              66.00    8    8.25
Total              676.00   24

The test assemblies turned out to be nonsignificant. Note that the ANOVA tables for the Latin Square and the Graeco-Latin Square designs are identical, except for the error term. The SS(error) for the Latin Square design was decomposed into both Test Assemblies and error in the Graeco-Latin Square. This is a good example of how the error term is really a residual. Whatever isn't controlled falls into error.

Before we leave one-way designs, we should look at the regression approach to ANOVA. The model is

$$Y_{ij} = \mu + \tau_j + e_{ij}$$

Using the method of least squares, we minimize

$$E = \sum_{j=1}^{J}\sum_{i=1}^{n_j} e_{ij}^2 = \sum_{j=1}^{J}\sum_{i=1}^{n_j} (Y_{ij} - \mu - \tau_j)^2$$

Now to find the LS estimates of $\mu$ and the $\tau_j$, we set

$$\frac{\partial E}{\partial \mu} = 0 \qquad \frac{\partial E}{\partial \tau_j} = 0$$

When we do this differentiation with respect to $\mu$ and $\tau_j$ and equate to 0, we obtain

$$-2\sum_{j=1}^{J}\sum_{i=1}^{n_j} (Y_{ij} - \hat{\mu} - \hat{\tau}_j) = 0 \qquad \text{and} \qquad -2\sum_{i=1}^{n_j} (Y_{ij} - \hat{\mu} - \hat{\tau}_j) = 0 \ \text{ for all } j$$

After simplification, these reduce to

$$N\hat{\mu} + n_1\hat{\tau}_1 + n_2\hat{\tau}_2 + \ldots + n_J\hat{\tau}_J = Y_{..}$$
$$n_1\hat{\mu} + n_1\hat{\tau}_1 = Y_{.1}$$
$$n_2\hat{\mu} + n_2\hat{\tau}_2 = Y_{.2}$$
$$\vdots$$
$$n_J\hat{\mu} + n_J\hat{\tau}_J = Y_{.J}$$

In these equations, $Y_{..} = N\bar{Y}$ and $Y_{.j} = n_j\bar{Y}_j$. These J + 1 equations are called the least squares normal equations. If we add the constraint

$$\sum_{j=1}^{J} \hat{\tau}_j = 0$$

we get a unique solution to these normal equations:

$$\hat{\mu} = \bar{Y} \qquad \hat{\tau}_j = \bar{Y}_j - \bar{Y}$$

It is important to see that ANOVA designs are simply regression models.
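To see the normal equations at work, here is a small sketch on made-up data. Appending the constraint $\sum_j \hat{\tau}_j = 0$ as an extra row works because the data rows alone leave the solution undetermined along the direction $(\hat{\mu} + c, \hat{\tau}_j - c)$, and the extra row picks out the member of that family with $\sum_j \hat{\tau}_j = 0$:

```python
import numpy as np

# hypothetical one-way layout: J = 3 levels, 2 observations each
y = np.array([12., 14., 20., 22., 30., 32.])
level = np.array([0, 0, 1, 1, 2, 2])

# columns: mu, tau_1, tau_2, tau_3; the last row imposes sum(tau_j) = 0
X = np.zeros((7, 4))
X[np.arange(6), 0] = 1           # every observation contains mu
X[np.arange(6), 1 + level] = 1   # indicator for its own tau_j
X[6, 1:] = 1                     # constraint row
rhs = np.append(y, 0.0)

est, *_ = np.linalg.lstsq(X, rhs, rcond=None)
print(est)   # [Ybar, Ybar_1 - Ybar, Ybar_2 - Ybar, Ybar_3 - Ybar]
print(y.mean(), [y[level == j].mean() - y.mean() for j in range(3)])
```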
If we have a one-way design with 3 levels, the regression model is

$$Y_{ij} = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + e_{ij}$$

where $X_{i1}$ = 1 if the observation is from level 1 and 0 otherwise, and $X_{i2}$ = 1 if the observation is from level 2 and 0 otherwise. Although the treatment levels may be qualitative, they are handled through these "dummy" variables.

For observations from level 1, $X_{i1} = 1$ and $X_{i2} = 0$, so

$$Y_{i1} = \beta_0 + \beta_1(1) + \beta_2(0) + e_{ij} = \beta_0 + \beta_1 + e_{ij} \qquad \text{so} \qquad \beta_0 + \beta_1 = \bar{Y}_1$$

Similarly, if the observations are from level 2,

$$Y_{i2} = \beta_0 + \beta_1(0) + \beta_2(1) + e_{ij} = \beta_0 + \beta_2 + e_{ij} \qquad \text{so} \qquad \beta_0 + \beta_2 = \bar{Y}_2$$

Finally, consider observations from level 3, for which $X_{i1} = X_{i2} = 0$. Then the regression model becomes

$$Y_{i3} = \beta_0 + \beta_1(0) + \beta_2(0) + e_{ij} = \beta_0 + e_{ij} \qquad \text{so} \qquad \beta_0 = \bar{Y}_3$$

Thus in the regression model formulation of this one-way ANOVA with 3 levels, the regression coefficients describe comparisons of the first two level means with the third:

$$\beta_0 = \bar{Y}_3 \qquad \beta_1 = \bar{Y}_1 - \bar{Y}_3 \qquad \beta_2 = \bar{Y}_2 - \bar{Y}_3$$

Thus, testing $\beta_1 = \beta_2 = 0$ provides a test of the equality of the three means. In general, for J levels, the regression model will have J - 1 variables,

$$Y_{ij} = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_{J-1} X_{i,J-1} + e_{ij}$$

with $\beta_0 = \bar{Y}_J$ and $\beta_j = \bar{Y}_j - \bar{Y}_J$.

Now what if you have two factors under test? Or three? Or four? Or more? Here the answer is the factorial design. A factorial design crosses all factors. Let's take a two-way design. If there are J levels of factor A and K levels of factor B, then all JK treatment combinations appear in the experiment. Most commonly, J = K = 2. In a two-way design with two levels of each factor we have, where -1 and +1 are codes for the low and high levels respectively:

Factor A    Factor B    Response
-1 (low)    -1 (low)    20
+1 (high)   -1 (low)    40
-1 (low)    +1 (high)   30
+1 (high)   +1 (high)   52

We can have as many replicates as we want in this design. With n replicates, there are n observations in each cell of the design.

$$SS_{\text{total}} = SS_A + SS_B + SS_{AB} + SS_{\text{error}}$$

This decomposition should be familiar by now except for $SS_{AB}$. What is this term? Its official name is interaction. This is the magic of factorial designs. We find out about not only the effect of factor A and the effect of factor B, but the effect of the two factors in combination.

How do we compute main effects? The main effect of factor A is the difference between the average response at A high and the average response at A low:

$$A_{\text{effect}} = \frac{40 + 52}{2} - \frac{20 + 30}{2} = 46 - 25 = 21$$

Similarly, the B effect is the difference between the average response at B high and the average response at B low:

$$B_{\text{effect}} = \frac{30 + 52}{2} - \frac{20 + 40}{2} = 41 - 30 = 11$$

You can always find main effects from the design matrix. Just multiply the response by the +1 and -1 codes and divide by the number of +1 codes in the column. For example,

$$A_{\text{effect}} = \frac{(-1)(20) + (1)(40) + (-1)(30) + (1)(52)}{2} = 21$$

$$B_{\text{effect}} = \frac{(-1)(20) + (-1)(40) + (1)(30) + (1)(52)}{2} = 11$$

So the main effect of factor A is 21 and the main effect of factor B is 11. That is, changing the level of factor A from the low level to the high level brings a response increase of 21 units. And changing the level of factor B from the low level to the high level increases the response by 11 units.

[Figure: main effect plots for factors A and B]

Both A and B are significant, which you can see by the fact that the slope is not 0. A 0 slope in the effect line that connects the response at the high level with the response at the low level indicates that it doesn't matter to the response whether the factor is set at its high value or its low value, so the effect of such a factor is not significant.
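The "multiply by the codes and divide by the number of +1 codes" recipe is two lines of numpy (a sketch using the cell responses above in standard order):

```python
import numpy as np

# cell responses of the 2x2 design above, in standard order
resp = np.array([20., 40., 30., 52.])
A = np.array([-1., 1., -1., 1.])
B = np.array([-1., -1., 1., 1.])

# effect = (response dotted with the +-1 codes) / (number of +1 codes)
print("A effect:", A @ resp / 2)          # 21
print("B effect:", B @ resp / 2)          # 11
print("AB effect:", (A * B) @ resp / 2)   # 1, the interaction, taken up next
```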
Of course, the p value from the F test gives the significance of the factors precisely, but it is usually evident from the effects plots.

Now how do you compute the interaction effect? Interaction occurs when the difference in response between the levels of one factor is different at the different levels of the other factor:

$$AB_{\text{effect}} = \frac{(A_{+}B_{+} - A_{-}B_{+}) - (A_{+}B_{-} - A_{-}B_{-})}{2}$$

The first term here is the difference between the two levels of factor A at the high level of factor B, that is, 52 - 30 = 22. And the difference between the two levels of factor A at the low level of factor B is 40 - 20 = 20. Then the interaction effect is (22 - 20)/2 = 1.

Of course, you can compute the interaction effect from the interaction column, just as we did with main effects. But how do you get the interaction column's +1 and -1 codes? Simply multiply the codes for factor A by those for factor B:

Factor A    Factor B    AB    Response
-1          -1          +1    20
+1          -1          -1    40
-1          +1          -1    30
+1          +1          +1    52

Now you can compute the interaction effect by multiplying the response by the interaction codes and dividing by the number of +1 codes:

$$AB_{\text{effect}} = \frac{(1)(20) + (-1)(40) + (-1)(30) + (1)(52)}{2} = 1$$

And, of course, the interaction effect is again 1. Because the interaction effect is 1, which is very small, it is not significant. The interaction plot shows almost parallel lines, which indicates no interaction.

[Figure: interaction plot of factors A and B; the B-high and B-low lines are nearly parallel]

Now suppose the two factors are quantitative, like temperature, pressure, time, etc. Then you could write a regression model version of the design:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2 + e$$

As before, $X_1$ represents factor A and $X_2$ represents factor B. $X_1 X_2$ is the interaction term, and e is the error term. The parameter estimates for this model turn out to be ½ of the effect estimates. The β estimates are:

$$\beta_0 = \bar{Y} = \frac{20 + 40 + 30 + 52}{4} = 35.5$$

$$\beta_1 = \frac{1}{2}(A_{\text{effect}}) = \frac{21}{2} = 10.5 \qquad \beta_2 = \frac{1}{2}(B_{\text{effect}}) = \frac{11}{2} = 5.5 \qquad \beta_{12} = \frac{1}{2}(AB_{\text{effect}}) = \frac{1}{2} = 0.5$$

So the model is

$$\hat{Y} = 35.5 + 10.5X_1 + 5.5X_2 + 0.5X_1X_2$$

With this equation, you can find all the effects of the design. For example, if you want to know the mean when both A and B are at the high (+1) level, the equation is

$$\hat{Y} = 35.5 + 10.5(1) + 5.5(1) + 0.5(1)(1) = 52$$

Now if you want the mean when A is at the high level and B is at the low level, the equation is

$$\hat{Y} = 35.5 + 10.5(1) + 5.5(-1) + 0.5(1)(-1) = 40$$

All you have to do is fill in the values of $X_1$ and $X_2$ with the appropriate codes, +1 or -1.
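Because the four columns of the model matrix are orthogonal and there are exactly as many parameters as cells, the fit can be obtained with a direct solve. A sketch:

```python
import numpy as np

# cell means in standard order: (A,B) = (-,-), (+,-), (-,+), (+,+)
y = np.array([20., 40., 30., 52.])
A = np.array([-1., 1., -1., 1.])
B = np.array([-1., -1., 1., 1.])

X = np.column_stack([np.ones(4), A, B, A * B])
beta = np.linalg.solve(X, y)   # 4 cells, 4 parameters: exactly determined
print(beta)                    # [35.5, 10.5, 5.5, 0.5] -- grand mean + half-effects

print(X @ beta)                # reproduces the cell means [20, 40, 30, 52]
```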
Now suppose the data in this experiment are:

Factor A    Factor B    AB    Response
-1          -1          +1    20
+1          -1          -1    50
-1          +1          -1    40
+1          +1          +1    12

Now let's look at the main and interaction effects. The main effects are:

$$A_{\text{effect}} = \frac{(-1)(20) + (1)(50) + (-1)(40) + (1)(12)}{2} = 1$$

$$B_{\text{effect}} = \frac{(-1)(20) + (-1)(50) + (1)(40) + (1)(12)}{2} = -9$$

The interaction effect is

$$AB_{\text{effect}} = \frac{(1)(20) + (-1)(50) + (-1)(40) + (1)(12)}{2} = -29$$

which is very large in magnitude and is significant.

Now let's look at the main effects of the factors graphically.

[Figure: main effect plots for factors A and B for these data]

Clearly, factor A is not significant, which you can see by the approximately 0 slope. Factor B is probably significant because the slope is not close to 0. The p value from the F test gives the actual significance.

Now let's look at the interaction effect. This is the effect of factors A and B in combination, and is often the most important effect.

[Figure: interaction plot of factors A and B; the two lines cross]

Now these two lines are definitely not parallel, so there is an interaction. It probably is very significant because the two lines cross. Only the p value associated with the F test can give the actual significance, but you can see with the naked eye that there is no question about significance here.

Interaction of factors is the key to the East, as we say in the West. Suppose you wanted the factor levels that give the lowest possible response. If you picked by main effects, you would pick A low and B high. But look at the interaction plot and it will tell you to pick A high and B high. This is why, if the interaction term is significant, you never, never, never interpret the corresponding main effects. They are meaningless in the presence of interaction. And it is because factorial designs provide the ability to test for interactions that they are so popular and so successful.

You can get response surface plots for these regression equations. If there is no interaction, the response surface is a plane in the 3rd dimension above the $X_1, X_2$ Cartesian space. The plane may be tilted, but it is still a plane. If there is interaction, the response surface is a twisted plane representing the curvature in the model.

The simplest factorials are two-factor experiments. As an example, a battery must be designed to be robust to extreme variations in temperature. The engineer has three possible choices for the plate material. He decides to test all three plate materials at three temperatures, with four batteries at each combination of material type and temperature. The response variable is battery life. Here are the data he got:

                      Temperature (˚F)
Plate material   -15                  70                   125
1                130, 155, 74, 180    34, 40, 80, 75       20, 70, 82, 58
2                150, 188, 159, 126   136, 122, 106, 115   25, 70, 58, 45
3                138, 110, 168, 160   174, 120, 150, 139   96, 104, 82, 60

The model here is

$$Y_{ijk} = \mu + \tau_j + \beta_k + (\tau\beta)_{jk} + \varepsilon_{ijk}$$

Both factors are fixed, so we have the same constraints as before,

$$\sum_{j=1}^{J} \tau_j = 0 \qquad \sum_{k=1}^{K} \beta_k = 0$$

and in addition

$$\sum_{j=1}^{J} (\tau\beta)_{jk} = 0 \qquad \sum_{k=1}^{K} (\tau\beta)_{jk} = 0$$

The experiment has n = 4 replicates, so there are nJK total observations, and the means are

$$\bar{Y} = \frac{\sum_i \sum_j \sum_k Y_{ijk}}{nJK} \qquad \bar{Y}_j = \frac{\sum_i \sum_k Y_{ijk}}{nK} \qquad \bar{Y}_k = \frac{\sum_i \sum_j Y_{ijk}}{nJ} \qquad \bar{Y}_{jk} = \frac{\sum_i Y_{ijk}}{n}$$

The total sum of squares can be partitioned into four components:

$$\sum_{i}\sum_{j}\sum_{k} (Y_{ijk} - \bar{Y})^2 = nK\sum_{j} (\bar{Y}_j - \bar{Y})^2 + nJ\sum_{k} (\bar{Y}_k - \bar{Y})^2 + n\sum_{j}\sum_{k} (\bar{Y}_{jk} - \bar{Y}_j - \bar{Y}_k + \bar{Y})^2 + \sum_{i}\sum_{j}\sum_{k} (Y_{ijk} - \bar{Y}_{jk})^2$$

that is, $SS_{\text{total}} = SS_A + SS_B + SS_{AB} + SS_e$. The expectation of the MS due to each of these components is

$$E(MS_E) = \sigma^2 \qquad E(MS_{AB}) = \sigma^2 + \frac{n\sum_j\sum_k (\tau\beta)_{jk}^2}{(J-1)(K-1)}$$

$$E(MS_A) = \sigma^2 + \frac{Kn\sum_j \tau_j^2}{J-1} \qquad E(MS_B) = \sigma^2 + \frac{Jn\sum_k \beta_k^2}{K-1}$$

So the appropriate F ratio for testing each of these effects is:

Test of A effect: $F = MS_A / MS_E$
Test of B effect: $F = MS_B / MS_E$
Test of AB interaction: $F = MS_{AB} / MS_E$

and the ANOVA table is:

Source   SS         df           MS   p
A        SS_A       J-1
B        SS_B       K-1
AB       SS_AB      (J-1)(K-1)
Error    SS_e       JK(n-1)
Total    SS_total   JKn-1

For the battery life experiment, the cell and marginal means are:

                Temperature (˚F)
Material type   -15      70       125     $\bar{Y}_j$
1               134.75   57.25    57.50   83.17
2               155.75   119.75   49.50   108.33
3               144.00   145.75   85.50   125.08
$\bar{Y}_k$     144.83   107.58   64.17   $\bar{Y}$ = 105.53

The ANOVA table is:

Source        SS          df   MS          p
Material      10,683.72   2    5,341.86    0.0020
Temperature   39,118.72   2    19,558.36   0.0001
Interaction   9,613.78    4    2,403.44    0.0186
Error         18,230.75   27   675.21
Total         77,646.97   35
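Here is a numpy sketch that reproduces this two-way ANOVA from the raw battery data:

```python
import numpy as np
from scipy import stats

# battery life: life[material][temperature] holds the 4 replicates
life = np.array([
    [[130, 155,  74, 180], [ 34,  40,  80,  75], [ 20,  70,  82,  58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [ 25,  70,  58,  45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [ 96, 104,  82,  60]]], float)
J, K, n = life.shape
grand = life.mean()

ssA  = n * K * ((life.mean(axis=(1, 2)) - grand) ** 2).sum()
ssB  = n * J * ((life.mean(axis=(0, 2)) - grand) ** 2).sum()
cell = life.mean(axis=2)
ssAB = n * ((cell - life.mean(axis=(1, 2))[:, None]
                  - life.mean(axis=(0, 2))[None, :] + grand) ** 2).sum()
ssE  = ((life - cell[:, :, None]) ** 2).sum()

for name, ss, df in [("A", ssA, J - 1), ("B", ssB, K - 1),
                     ("AB", ssAB, (J - 1) * (K - 1)), ("Error", ssE, J * K * (n - 1))]:
    print(name, round(ss, 2), df)
F_AB = (ssAB / ((J - 1) * (K - 1))) / (ssE / (J * K * (n - 1)))
print("F_AB =", F_AB, "p =", stats.f.sf(F_AB, (J - 1) * (K - 1), J * K * (n - 1)))
```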
Because the interaction is significant, the only plot of interest is the interaction plot.

[Figure: interaction plot of material type vs. temperature for battery life]

Although it is not the best at the lowest temperature, Type 3 is much better than the other two at normal and high temperatures. Its life at the lowest temperature is just an average of about 12 hours less than the life with Type 2. Type 3 would probably provide the design most robust to temperature differences.

Suppose you have a factorial design with more than two factors. Take, for example, a three-way factorial design, where the factors are A, B, and C. All the theory is the same, except that now you have three 2-way interactions, AB, AC, BC, and one 3-way interaction, ABC.

Consider the problem of soft-drink bottling. The idea is to get each bottle filled to a uniform height, but there is variation around this height. Not every bottle is filled to the same height. The process engineer can control three variables during the filling process: percent carbonation (A), operating pressure (B), and number of bottles produced per minute or line speed (C). The engineer chooses three levels of carbonation (factor A), two levels of pressure (factor B), and two levels of line speed (factor C). This is a fixed effects design. He also decides to run two replicates. The response variable is the average deviation from the target fill height in a production run of bottles at each set of conditions. Positive deviations are above the target and negative deviations are below the target. The data are:

                         Operating pressure (B)
                 25 psi               30 psi
Percent          Line speed (C)       Line speed (C)
carbonation (A)  200       250        200       250
10               -3, -1    -1, 0      -1, 0     1, 1
12               0, 1      2, 1       2, 3      6, 5
14               5, 4      7, 6       7, 9      10, 11

The 3-way cell means are:

                         Operating pressure (B)
                 25 psi               30 psi
Percent          Line speed (C)       Line speed (C)
carbonation (A)  200      250         200      250
10               -2       -0.5        -0.5     1
12               0.5      1.5         2.5      5.5
14               4.5      6.5         8        10.5

The 2-way means are:

       B (25 psi)   B (30 psi)          C (200)   C (250)
A 10   -1.25        0.25         A 10   -1.25     0.25
A 12   1.00         4.00         A 12   1.50      3.50
A 14   5.50         9.25         A 14   6.25      8.50

           C (200)   C (250)
B 25 psi   1.00      2.50
B 30 psi   3.33      5.67

The main effect means are:

Factor A   Mean     Factor B   Mean   Factor C   Mean
10%        -0.500   25 psi     1.75   200        2.167
12%        2.500    30 psi     4.50   250        4.083
14%        7.375

The ANOVA table is:

Source   SS        df   MS        p
A        252.750   2    126.375   <0.0001
B        45.375    1    45.375    <0.0001
C        22.042    1    22.042    0.0001
AB       5.250     2    2.625     0.0557
AC       0.583     2    0.292     0.6713
BC       1.042     1    1.042     0.2485
ABC      1.083     2    0.542     0.4867
Error    8.500     12   0.708
Total    336.625   23

So the only clearly significant effects are those for A, B, and C. The AB interaction is marginal (p = 0.0557), so interpretation must be tempered by what we see in the A and B main effects.

[Figure: main effect plots for factors A, B, and C, and the AB interaction plot]

Our goal is to minimize the response. Given the ANOVA table and these plots, we would choose the low level of factor A, 10% carbonation, and the low level of factor B, 25 psi. This is true whether we look at the two main effects plots or the interaction plot, because the AB interaction is at most marginal. We would also choose the slower line speed, 200 bottles per minute.
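If pandas and statsmodels are available (an assumption; any ANOVA routine would do), the whole three-factor table can be reproduced in a few lines:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# fill-height deviations, keyed by (carbonation %, pressure psi, line speed)
obs = {(10, 25, 200): [-3, -1], (12, 25, 200): [0, 1], (14, 25, 200): [5, 4],
       (10, 25, 250): [-1, 0],  (12, 25, 250): [2, 1], (14, 25, 250): [7, 6],
       (10, 30, 200): [-1, 0],  (12, 30, 200): [2, 3], (14, 30, 200): [7, 9],
       (10, 30, 250): [1, 1],   (12, 30, 250): [6, 5], (14, 30, 250): [10, 11]}
rows = [dict(carb=a, pres=b, speed=c, y=y)
        for (a, b, c), ys in obs.items() for y in ys]
df = pd.DataFrame(rows)

# full three-factor fixed-effects model with all interactions
model = smf.ols("y ~ C(carb) * C(pres) * C(speed)", data=df).fit()
print(sm.stats.anova_lm(model))
```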
Now suppose you do an experiment where you suspect nonlinearity and want to test for both linear and quadratic effects. Consider a tool life experiment, where the life of a tool is thought to be a function of cutting speed and tool angle. Three levels of each factor are used, so this is a 2-way factorial fixed effects design. The three levels of cutting speed are 125, 150, and 175 in/min. The three levels of tool angle are 15˚, 20˚, and 25˚. Two replicates are used, and the data are shown below.

             Cutting Speed (in/min)
Tool Angle   125       150      175
15˚          -2, -1    -3, 0    2, 3
20˚          0, 2      1, 3     4, 6
25˚          -1, 0     5, 6     0, -1

The ANOVA table for this experiment is:

Source       SS       df   MS      p
Tool Angle   24.33    2    12.17   0.0086
Cut Speed    25.33    2    12.67   0.0076
TC           61.34    4    15.34   0.0018
Error        13.00    9    1.44
Total        124.00   17

The table of cell and marginal means is:

            Factor C (speed)
Factor T    125     150    175    $\bar{Y}_j$
15˚         -1.5    -1.5   2.5    -0.167
20˚         1.0     2.0    5.0    2.667
25˚         -0.5    5.5    -0.5   1.500
$\bar{Y}_k$ -0.33   2.0    2.33

[Figure: main effect plots for tool angle and cutting speed; both rise and then bend over]

Clearly there is reason to suspect quadratic effects here. So we can break down each factor's df into linear and quadratic components. We do this by using orthogonal contrasts. The contrast for linear is (-1, 0, +1) and the contrast for quadratic is (+1, -2, +1). We need a table of factor totals to proceed. For factor T, with the contrasts applied to the level sums:

Factor T   Sum of obs   Linear   Quadratic
15˚        -1           -1       +1
20˚        16           0        -2
25˚        9            +1       +1
Contrast                10       -24

Now to find the SS due to these two new contrasts (each level total is the sum of 6 observations, and the denominator also carries $\sum_j c_j^2$):

$$SS_{\text{lin}} = \frac{10^2}{(2)(3)(2)} = 8.33 \qquad SS_{\text{quad}} = \frac{(-24)^2}{(2)(3)(6)} = 16.00$$

Now we can do the same thing for factor C. The table of sums with the contrasts included is:

Factor C   Sum of obs   Linear   Quadratic
125        -2           -1       +1
150        12           0        -2
175        14           +1       +1
Contrast                16       -12

Now for the SS due to each contrast:

$$SS_{\text{lin}} = \frac{16^2}{(2)(3)(2)} = 21.33 \qquad SS_{\text{quad}} = \frac{(-12)^2}{(2)(3)(6)} = 4.00$$

Now we can write the new ANOVA table:

Source       SS       df   MS      p
Tool angle   24.33    2    12.17   0.0086
  Linear     8.33     1    8.33    0.0396
  Quad       16.00    1    16.00   0.0088
Cut Speed    25.33    2    12.67   0.0076
  Linear     21.33    1    21.33   0.0039
  Quad       4.00     1    4.00    0.1304
TC           61.34    4    15.34   0.0018
Error        13.00    9    1.44
Total        124.00   17

Now see how the df for each of the factors has been split into its two components, linear and quadratic. It turns out that everything except the quadratic for Cutting Speed is significant.

Now guess what! There are 4 df for the interaction term, and why not split them into linear and quadratic components as well? It turns out that you can get TlinClin, TlinCquad, TquadClin, and TquadCquad. These 4 components use up the 4 df for the interaction term. There is reason to believe the quadratic component in the interaction, as the interaction plots suggest, but we'll pass on this for now.

[Figure: interaction plots of tool angle and cutting speed, drawn both ways]
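The linear and quadratic partitions above are easy to verify (a numpy sketch; each level total is the sum of 6 observations):

```python
import numpy as np

# level totals over the 6 observations at each level (tool-life data)
totals_T = np.array([-1., 16., 9.])    # tool angle 15, 20, 25 degrees
totals_C = np.array([-2., 12., 14.])   # cutting speed 125, 150, 175

lin, quad = np.array([-1., 0., 1.]), np.array([1., -2., 1.])
nobs = 6                               # observations behind each level total

for name, tot in [("T", totals_T), ("C", totals_C)]:
    for cname, c in [("linear", lin), ("quadratic", quad)]:
        ss = (c @ tot) ** 2 / (nobs * (c @ c))
        print(name, cname, round(ss, 2))   # 8.33, 16.0, 21.33, 4.0
```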
Now let's talk about blocking in a factorial design. The concept is identical to blocking in a 1-way design: either there is a nuisance factor, or it is not possible to completely randomize all the runs in the design. For example, there simply may not be enough time to run the entire experiment in one day, so the experimenter could run one complete replicate on one day and another complete replicate on the second day, and so on. In this case, days would be a blocking factor.

Let's look at an example. An engineer is studying ways to improve detecting targets on a radar scope. The two factors of importance are background clutter and the type of filter placed over the screen. Three levels of background clutter and two filter types are selected to be tested. This is a fixed effects 2 x 3 factorial design. To get the response, a signal is introduced into the scope and its intensity is increased until an operator sees it. Intensity at detection is the response variable. Because of operator availability, an operator must sit at the scope until all necessary runs have been made. But operators differ in skill and ability to use the scope, so it makes sense to use operators as a blocking variable. Four operators are selected for use in the experiment. So each operator receives the 2 x 3 = 6 treatment combinations in random order, and the design is a randomized complete block design. The data are:

          Operator 1    Operator 2    Operator 3    Operator 4
Ground    Filter type   Filter type   Filter type   Filter type
clutter   1      2      1      2      1      2      1      2
Low       90     86     96     84     100    92     92     81
Medium    102    87     106    90     105    97     96     80
High      114    93     112    91     108    95     98     83

Since each operator (block) represents the complete experiment, all treatment effects are within operators. The ANOVA table is:

Source             SS        df   MS        p
Within blocks      1479.33   5    295.87    <0.000001
  Ground clutter   335.58    2    167.79    <0.0003
  Filter type      1066.67   1    1066.67   <0.0001
  GF interaction   77.08     2    38.54     0.0573
Between blocks     402.17    3    134.06    <0.0002
Error              166.33    15   11.09
Total              2047.83   23

The effects of both the background clutter and the filter type are highly significant, and their interaction is marginally significant. As suspected, the operators are significantly different in their ability to detect the signal, so it is good that they were used as blocks.

Now let's look at the 2^k factorial design. This notation means that there are k factors, each at 2 levels, usually a high and a low level. These factors may be qualitative or quantitative. This is a very important class of designs and is widely used in screening experiments. Because there are only 2 levels, it is assumed that the response is linear over the range of values chosen.

Let's look at an example of the simplest of these designs, the 2^2 factorial design. Consider the effect of reactant concentration (factor A) and amount of catalyst (factor B) on the yield in a chemical process. The 2 levels of factor A are 15% and 25%. The 2 levels of factor B are 1 pound and 2 pounds. The experiment is replicated three times. Here are the data:

Factor A    Factor B     Rep 1   Rep 2   Rep 3   $\bar{Y}_{jk}$
-1 (low)    -1 (low)     28      25      27      26.67
+1 (high)   -1 (low)     36      32      32      33.33
-1 (low)    +1 (high)    18      19      23      20.00
+1 (high)   +1 (high)    31      30      29      30.00

This design can be pictured as a rectangle, with the cell means at the corners:

[Figure: the 2^2 design square, factor A horizontal and factor B vertical; cell means 26.67 and 33.33 along B low, 20 and 30 along B high]

The interaction codes can be derived from this table:

Factor A     Factor B     AB interaction
-1 (low)     -1 (low)     (-1)(-1) = +1
+1 (high)    -1 (low)     (+1)(-1) = -1
-1 (low)     +1 (high)    (-1)(+1) = -1
+1 (high)    +1 (high)    (+1)(+1) = +1

Multiplying the A and B factor level codes gives the AB interaction codes. This is always the way interaction codes are obtained; the sketch below automates it for any 2^k design.
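A sketch that builds every interaction column of a full 2^k design this way (itertools generates the runs in standard order):

```python
import itertools
import numpy as np

def two_level_design(k):
    """Full 2^k design: main-effect columns plus every interaction column."""
    runs = np.array(list(itertools.product([-1, 1], repeat=k)))[:, ::-1]
    labels = [chr(ord("A") + i) for i in range(k)]
    cols = {}
    for r in range(1, k + 1):
        for combo in itertools.combinations(range(k), r):
            name = "".join(labels[i] for i in combo)
            cols[name] = runs[:, list(combo)].prod(axis=1)  # multiply factor codes
    return cols

design = two_level_design(3)
print(design["AB"])    # elementwise product of the A and B columns
print(design["ABC"])
```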
Now averaging according to the AB codes gives the interaction effect, and we can find all the effects easily from the table below.

A     B     AB    Replicate average
-1    -1    +1    26.67
+1    -1    -1    33.33
-1    +1    -1    20.00
+1    +1    +1    30.00

$$A_{\text{effect}} = \frac{(-1)(26.67) + (1)(33.33) + (-1)(20.00) + (1)(30.00)}{2} = \frac{16.66}{2} = 8.33$$

$$B_{\text{effect}} = \frac{(-1)(26.67) + (-1)(33.33) + (1)(20.00) + (1)(30.00)}{2} = \frac{-10.00}{2} = -5.00$$

$$AB_{\text{effect}} = \frac{(1)(26.67) + (-1)(33.33) + (-1)(20.00) + (1)(30.00)}{2} = \frac{3.34}{2} = 1.67$$

Because there are only first-order effects, the response surface is a plane. Yield increases with increasing reactant concentration (factor A) and decreases with increasing catalyst amount (factor B), just as the signs of the effects indicate. The ANOVA table is:

Source   SS       df   MS       p
A        208.33   1    208.33   <0.0001
B        75.00    1    75.00    0.0024
AB       8.33     1    8.33     0.1826
Error    31.34    8    3.92
Total    323.00   11

It is clear that both main effects are significant and that there is no AB interaction. The regression model is

$$\hat{Y} = 27.5 + \frac{8.33}{2}X_1 - \frac{5.00}{2}X_2$$

where the β coefficients are ½ the effects, as before, and 27.5 is the grand mean of all 12 observations.

Now let's look at the 2^3 factorial design. In this case, there are three factors, each at 2 levels. The design is:

Run   A    B    C    AB   AC   BC   ABC
1     -1   -1   -1   +1   +1   +1   -1
2     +1   -1   -1   -1   -1   +1   +1
3     -1   +1   -1   -1   +1   -1   +1
4     +1   +1   -1   +1   -1   -1   -1
5     -1   -1   +1   +1   -1   -1   +1
6     +1   -1   +1   -1   +1   -1   -1
7     -1   +1   +1   -1   -1   +1   -1
8     +1   +1   +1   +1   +1   +1   +1

where, as always, each interaction column is the product of its factor columns.

Remember the beverage filling study we talked about earlier? Now assume that each of the 3 factors has only two levels. So we have factor A (% carbonation) at levels 10% and 12%, factor B (operating pressure) at levels 25 psi and 30 psi, and factor C (line speed) at levels 200 and 250. Now our experimental matrix becomes:

Run   A: % carbonation   B: pressure   C: line speed   Rep 1   Rep 2   Mean
1     10                 25            200             -3      -1      -2
2     12                 25            200             0       1       0.5
3     10                 30            200             -1      0       -0.5
4     12                 30            200             2       3       2.5
5     10                 25            250             -1      0       -0.5
6     12                 25            250             2       1       1.5
7     10                 30            250             1       1       1
8     12                 30            250             6       5       5.5

In coded units, the design matrix is exactly the 2^3 table of ±1 codes above, with these replicates and means attached. From this matrix, we can determine all our effects by applying the ±1 codes to the cell means and dividing by 4, the number of +1 codes in each column.
The effects are:

$$A_{\text{effect}} = \frac{(-1)(-2) + (1)(0.5) + (-1)(-0.5) + (1)(2.5) + (-1)(-0.5) + (1)(1.5) + (-1)(1) + (1)(5.5)}{4} = \frac{12}{4} = 3$$

$$B_{\text{effect}} = \frac{(-1)(-2) + (-1)(0.5) + (1)(-0.5) + (1)(2.5) + (-1)(-0.5) + (-1)(1.5) + (1)(1) + (1)(5.5)}{4} = \frac{9}{4} = 2.25$$

$$C_{\text{effect}} = \frac{(-1)(-2) + (-1)(0.5) + (-1)(-0.5) + (-1)(2.5) + (1)(-0.5) + (1)(1.5) + (1)(1) + (1)(5.5)}{4} = \frac{7}{4} = 1.75$$

$$AB_{\text{effect}} = \frac{(1)(-2) + (-1)(0.5) + (-1)(-0.5) + (1)(2.5) + (1)(-0.5) + (-1)(1.5) + (-1)(1) + (1)(5.5)}{4} = \frac{3}{4} = 0.75$$

$$AC_{\text{effect}} = \frac{(1)(-2) + (-1)(0.5) + (1)(-0.5) + (-1)(2.5) + (-1)(-0.5) + (1)(1.5) + (-1)(1) + (1)(5.5)}{4} = \frac{1}{4} = 0.25$$

$$BC_{\text{effect}} = \frac{(1)(-2) + (1)(0.5) + (-1)(-0.5) + (-1)(2.5) + (-1)(-0.5) + (-1)(1.5) + (1)(1) + (1)(5.5)}{4} = \frac{2}{4} = 0.5$$

$$ABC_{\text{effect}} = \frac{(-1)(-2) + (1)(0.5) + (1)(-0.5) + (-1)(2.5) + (1)(-0.5) + (-1)(1.5) + (-1)(1) + (1)(5.5)}{4} = \frac{2}{4} = 0.5$$

The ANOVA table is:

Source            SS      df   MS      p
A: Percent carb   36.00   1    36.00   <0.0001
B: Op pressure    20.25   1    20.25   <0.0005
C: Line speed     12.25   1    12.25   0.0022
AB                2.25    1    2.25    0.0943
AC                0.25    1    0.25    0.5447
BC                1.00    1    1.00    0.2415
ABC               1.00    1    1.00    0.2415
Error             5.00    8    0.625
Total             78.00   15

There are only 3 significant effects: factors A, B, and C. None of the interactions is significant. The regression model for soft-drink fill height deviation is

$$\hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 = 1.00 + \frac{3}{2}X_1 + \frac{2.25}{2}X_2 + \frac{1.75}{2}X_3$$

Because the interactions are not significant, they are not included in the regression model. So the response surface here is a plane at each level of line speed.
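All seven effects can be reproduced at once from the ±1 matrix (a numpy sketch using the cell means above):

```python
import numpy as np
from itertools import combinations

# cell means of the two replicates, standard (Yates) run order
ybar = np.array([-2, 0.5, -0.5, 2.5, -0.5, 1.5, 1, 5.5])
base = {"A": np.array([-1, 1, -1, 1, -1, 1, -1, 1]),
        "B": np.array([-1, -1, 1, 1, -1, -1, 1, 1]),
        "C": np.array([-1, -1, -1, -1, 1, 1, 1, 1])}

for r in (1, 2, 3):
    for names in combinations("ABC", r):
        col = np.prod([base[f] for f in names], axis=0)
        print("".join(names), (col @ ybar) / 4)   # divide by number of +1 codes
```

Running this prints A = 3, B = 2.25, C = 1.75, AB = 0.75, AC = 0.25, BC = 0.5, ABC = 0.5, matching the hand computations.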
All along we have had at least 2 replicates for each design, so we could get an error term; without an error term, there is no denominator for the F ratio that tests for significance. But think about it: a 2^4 design has 16 runs, and with 2 replicates that doubles to 32 runs. The resources needed for so many runs are often not available, so some large designs are run with only 1 replicate. Now what do we do for an error term to test for effects?

The idea is to pool some high-order interactions, under the assumption that they are not significant anyway, and use them as an error term. If indeed they are not significant, this is OK. But what if you pool them as error and they are significant? This is not OK. So it would be nice to know, before we pool, which terms are actually poolable. Thanks to Cuthbert Daniel, we can do this. Daniel's idea is to do a normal probability plot of the effects. All negligible effects will fall along a line, and those that do not fall along the line are significant. So we may pool all effects that are on the line. The reasoning is that the negligible effects, like error, are normally distributed with mean 0 and variance σ² and so will fall along the line.

Let's look at an example of a chemical product. The purpose of this experiment is to maximize the filtration rate of this product, which is thought to be influenced by 4 factors: temperature (A), pressure (B), concentration of formaldehyde (C), and stirring rate (D). The design matrix and response are:

Run   A    B    C    D    AB   AC   BC   AD   BD   CD   ABC  ABD  ACD  BCD  ABCD   Filt rate
1     -1   -1   -1   -1   +1   +1   +1   +1   +1   +1   -1   -1   -1   -1   +1     45
2     +1   -1   -1   -1   -1   -1   +1   -1   +1   +1   +1   +1   +1   -1   -1     71
3     -1   +1   -1   -1   -1   +1   -1   +1   -1   +1   +1   +1   -1   +1   -1     48
4     +1   +1   -1   -1   +1   -1   -1   -1   -1   +1   -1   -1   +1   +1   +1     65
5     -1   -1   +1   -1   +1   -1   -1   +1   +1   -1   +1   -1   +1   +1   -1     68
6     +1   -1   +1   -1   -1   +1   -1   -1   +1   -1   -1   +1   -1   +1   +1     60
7     -1   +1   +1   -1   -1   -1   +1   +1   -1   -1   -1   +1   +1   -1   +1     80
8     +1   +1   +1   -1   +1   +1   +1   -1   -1   -1   +1   -1   -1   -1   -1     65
9     -1   -1   -1   +1   +1   +1   +1   -1   -1   -1   -1   +1   +1   +1   -1     43
10    +1   -1   -1   +1   -1   -1   +1   +1   -1   -1   +1   -1   -1   +1   +1     100
11    -1   +1   -1   +1   -1   +1   -1   -1   +1   -1   +1   -1   +1   -1   +1     45
12    +1   +1   -1   +1   +1   -1   -1   +1   +1   -1   -1   +1   -1   -1   -1     104
13    -1   -1   +1   +1   +1   -1   -1   -1   -1   +1   +1   +1   -1   -1   +1     75
14    +1   -1   +1   +1   -1   +1   -1   +1   -1   +1   -1   -1   +1   -1   -1     86
15    -1   +1   +1   +1   -1   -1   +1   -1   +1   +1   -1   -1   -1   +1   -1     70
16    +1   +1   +1   +1   +1   +1   +1   +1   +1   +1   +1   +1   +1   +1   +1     96

From this matrix, we can estimate all the effects and then do a normal probability plot of them. The effects are:

A = 21.625     B = 3.125      C = 9.875      D = 14.625
AB = 0.125     AC = -18.125   AD = 16.625    BC = 2.375
BD = -0.375    CD = -1.125    ABC = 1.875    ABD = 4.125
ACD = -1.625   BCD = -2.625   ABCD = 1.375

[Figure: normal probability plot of the 15 effects]

There are only 5 effects that are off the line. These are, in the upper right corner, C, D, AD, and A, and in the lower left corner, AC. All of the points on the line are negligible, behaving like residuals. Because we drop factor B and all its interactions, we now get an ANOVA table with the extra observations as error:

Source   SS        df   MS        p
A        1870.56   1    1870.56   <0.0001
C        390.06    1    390.06    <0.0001
D        855.56    1    855.56    <0.0001
AC       1314.06   1    1314.06   <0.0001
AD       1105.56   1    1105.56   <0.0001
CD       5.06      1    5.06
ACD      10.56     1    10.56
Error    179.52    8    22.44
Total    5730.94   15

Essentially, we have changed the design from a 2^4 design with only 1 replicate to a 2^3 design with two replicates. This is called projecting a higher-level design into a lower-level design. If you start with an unreplicated 2^k design and drop h of the factors, you can continue with a 2^(k-h) design with 2^h replicates. In this case, we started with a 2^4 design, dropped h = 1 factor, and ended up with a 2^3 design with 2^1 = 2 replicates.

[Figure: main effect plots for factors A, C, and D]

[Figure: the two significant interaction plots, AC and AD]
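Here is a sketch that computes all 15 effects from the response column and prints the coordinates for Daniel's normal probability plot; feeding the two printed columns to any plotting routine reproduces the figure (the plotting itself is omitted):

```python
import numpy as np
from itertools import combinations, product
from scipy import stats

y = np.array([45, 71, 48, 65, 68, 60, 80, 65,
              43, 100, 45, 104, 75, 86, 70, 96], float)

# +-1 columns for A..D in standard order (A alternates fastest)
runs = np.array(list(product([-1, 1], repeat=4)))[:, ::-1]
effects = {}
for r in (1, 2, 3, 4):
    for combo in combinations(range(4), r):
        name = "".join("ABCD"[i] for i in combo)
        effects[name] = runs[:, list(combo)].prod(axis=1) @ y / 8

# coordinates for the normal probability plot of the 15 effects
for rank, (name, e) in enumerate(sorted(effects.items(), key=lambda kv: kv[1]), 1):
    z = stats.norm.ppf((rank - 0.5) / 15)
    print(f"{name:5s} {e:8.3f}  z = {z:6.3f}")
```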
Now we are going to talk about the addition of center points to a 2^k design. In this case, we are looking for quadratic curvature, so we must have quantitative factors. The center points are run at 0 for each of the k factors in the design, so now the codes are -1, 0, +1, and we have $n_C$ replicates at the center point.

Go back to the box we used earlier to describe a 2^2 design. At each corner, we have a point of the design: (A-, B-), (A+, B-), (A-, B+), and (A+, B+). Now we add center points to this design to see if there is quadratic curvature, giving $n_C$ observations at the center point (A = 0, B = 0).

Now if we average the 4 factorial points to get $\bar{Y}_{\text{factorial}}$, and then average the $n_C$ center points to get $\bar{Y}_{\text{center}}$, we can tell if there is a quadratic effect by the size of $\bar{Y}_{\text{factorial}} - \bar{Y}_{\text{center}}$. If this difference is small, then the center points lie on the plane established by the factorial points. If this difference is large, then there is quadratic curvature present. A single-df SS for pure quadratic curvature is given by

$$SS_{\text{quad}} = \frac{n_F n_C (\bar{Y}_{\text{factorial}} - \bar{Y}_{\text{center}})^2}{n_F + n_C}$$

where $n_F$ is the number of design points in the 2^k design and $n_C$ is the number of replicates at the center point. This $SS_{\text{quad}} = MS_{\text{quad}}$ (1 df) can be tested for significance against $MS_{\text{error}}$.

Let's look at an example. A chemical engineer is studying the yield of a process. The two factors of interest are reaction time (A) and reaction temperature (B). The engineer decides to run a 2^2 unreplicated factorial and include 5 center points to test for quadratic curvature. The design then has reaction time at 30, 35, and 40 minutes and reaction temperature at 150˚C, 155˚C, and 160˚C. The design points and the yield data are:

Design point (A, B)   Yield
(-1, -1)              39.3
(+1, -1)              40.9
(-1, +1)              40.0
(+1, +1)              41.5
(0, 0) x 5            40.3, 40.5, 40.7, 40.2, 40.6

The ANOVA table for this experiment is:

Source      SS       df   MS       p
A (time)    2.4025   1    2.4025   0.0017
B (temp)    0.4225   1    0.4225   0.0350
AB          0.0025   1    0.0025   0.8185
Pure quad   0.0027   1    0.0027   0.8185
Error       0.1720   4    0.0430
Total       3.0022   8

In this design, $\bar{Y}_{\text{factorial}}$ = 40.425 and $\bar{Y}_{\text{center}}$ = 40.46. Since the difference is very small, there is no quadratic effect, so the center points may be used to get an error term to test each of the effects:

$$MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}} = \frac{\sum_{c=1}^{5} (Y_c - \bar{Y}_{\text{center}})^2}{n_C - 1} = \frac{\sum_{c=1}^{5} (Y_c - 40.46)^2}{5 - 1} = 0.0430$$

So now this unreplicated design has an error term from the replicated center points, which lie on the same plane as the factorial points. In this experiment, a first-order model is appropriate because there is no quadratic effect and no interaction.
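The curvature check is a one-liner once the two averages are in hand (a numpy sketch):

```python
import numpy as np

factorial = np.array([39.3, 40.9, 40.0, 41.5])     # the four corner runs
center = np.array([40.3, 40.5, 40.7, 40.2, 40.6])  # the n_C = 5 center runs
nF, nC = len(factorial), len(center)

ss_quad = nF * nC * (factorial.mean() - center.mean()) ** 2 / (nF + nC)
ms_error = center.var(ddof=1)           # pure error from the replicated center
print(factorial.mean(), center.mean())  # 40.425 vs. 40.46
print(ss_quad, ms_error, ss_quad / ms_error)   # 0.0027, 0.0430, F well below 1
```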
To create the 2^(3-1) design, set up a 2^2 design and put the third factor in the AB interaction column.

Run   Factor A   Factor B   Factor C = AB
1     -1         -1         +1
2     +1         -1         -1
3     -1         +1         -1
4     +1         +1         +1

Now factor C is confounded with AB: you cannot separate the effect of the AB interaction from the effect of factor C. In other words, C is aliased with AB. What are the possible consequences of this confounding?

1. The AB effect and the C effect may both be large and significant but in opposite directions, so they cancel each other out. You would never know.
2. The AB effect and the C effect may both be small but in the same direction, so the combined effect looks significant even though neither AB nor C separately is. Again you wouldn't know.
3. One effect may be significant and the other not, but you cannot tell which one.

But this isn't all. Where are the AC and BC interactions? Multiplying the codes, we get

Run   A + BC   B + AC   C + AB
1     -1       -1       +1
2     +1       -1       -1
3     -1       +1       -1
4     +1       +1       +1

So this fractional factorial doesn't just confound C with AB; it confounds every main effect with a 2-way interaction. When effects are confounded, they are called aliased. Since A is aliased with BC, the first column actually estimates A+BC. Similarly, the second column estimates B+AC, and the third column estimates C+AB.

There are better fractional designs, but you have to look at the generator to see them. C = AB is called the generator of the design. Multiplying both sides by C gives I = ABC, so I = ABC is the defining relation for the design: the set of columns equal to I. ABC is also called a word, and the length of the shortest word in the defining relation tells you the resolution of the design. The defining relation ABC is a 3-letter word, so this design is of resolution III.

What does design resolution mean? It tells you the degree of confounding in the design. The three resolutions you will meet most often are:

Resolution III: main effects are aliased with 2-factor interactions, and 2-factor interactions may be aliased with one another.
Resolution IV: main effects are not aliased with 2-factor interactions, but 2-factor interactions are aliased with one another.
Resolution V: main effects are not aliased with 2-factor interactions, and 2-factor interactions are not aliased with one another. But main effects and 2-way interactions may be aliased with higher-order interactions.

Of course, we would like the highest-resolution design possible under the circumstances. You can also use the defining relation to get the aliasing: multiply any effect by the defining relation, remembering that any squared letter equals I. In this example, where I = ABC,

Alias of A: A·ABC = A²BC = BC, so A is aliased with BC.
Alias of B: B·ABC = AB²C = AC, so B is aliased with AC.

Let's look at a 2^(4-1) factorial, a resolution IV design. First we create a 2^3 design, then alias the 4th factor with the highest-order interaction:

Run   A    B    C    AB   AC   BC   D = ABC
1     -1   -1   -1   +1   +1   +1   -1
2     +1   -1   -1   -1   -1   +1   +1
3     -1   +1   -1   -1   +1   -1   +1
4     +1   +1   -1   +1   -1   -1   -1
5     -1   -1   +1   +1   -1   -1   +1
6     +1   -1   +1   -1   +1   -1   -1
7     -1   +1   +1   -1   -1   +1   -1
8     +1   +1   +1   +1   +1   +1   +1

The generator for this design is D = ABC. To get the defining relation, multiply both sides by D, giving I = ABCD. Since the defining word here has length 4, this is a resolution IV design. Now let's look at the aliases for this design.
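These aliases can be worked out by hand, or mechanized. Multiplying effect words is just the symmetric difference of their letter sets, since any repeated letter squares to I. A minimal sketch:

```python
# Minimal sketch: alias computation as "word algebra". Multiplying two
# effect words is the symmetric difference of their letter sets, because
# any repeated letter squares to the identity I.
def multiply(word_a: str, word_b: str) -> str:
    letters = set(word_a) ^ set(word_b)     # symmetric difference
    return "".join(sorted(letters)) or "I"

defining_word = "ABCD"                      # from I = ABCD for the 2^(4-1)
for effect in ["A", "B", "C", "D", "AB", "AC", "BC"]:
    print(f"{effect} is aliased with {multiply(effect, defining_word)}")
```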
Alias of A: A·ABCD = BCD
Alias of B: B·ABCD = ACD
Alias of C: C·ABCD = ABD
Alias of D: D·ABCD = ABC

Alias of AB: AB·ABCD = CD
Alias of AC: AC·ABCD = BD
Alias of BC: BC·ABCD = AD

After all the aliasing, the design is

Run   A+BCD   B+ACD   C+ABD   AB+CD   AC+BD   BC+AD   D+ABC
1     -1      -1      -1      +1      +1      +1      -1
2     +1      -1      -1      -1      -1      +1      +1
3     -1      +1      -1      -1      +1      -1      +1
4     +1      +1      -1      +1      -1      -1      -1
5     -1      -1      +1      +1      -1      -1      +1
6     +1      -1      +1      -1      +1      -1      -1
7     -1      +1      +1      -1      -1      +1      -1
8     +1      +1      +1      +1      +1      +1      +1

Note that the main effects are aliased with 3-factor interactions and the 2-factor interactions are aliased with one another, so this is a resolution IV design.

Now let's look at a 2^(5-1) factorial, a resolution V design. First we create the 2^4 design, then place the 5th factor in the highest-order interaction column:

Run    A   B   C   D    AB  AC  AD  BC  BD  CD    ABC ABD ACD BCD   E=ABCD
 1    -1  -1  -1  -1    +1  +1  +1  +1  +1  +1     -1  -1  -1  -1     +1
 2    +1  -1  -1  -1    -1  -1  -1  +1  +1  +1     +1  +1  +1  -1     -1
 3    -1  +1  -1  -1    -1  +1  +1  -1  -1  +1     +1  +1  -1  +1     -1
 4    +1  +1  -1  -1    +1  -1  -1  -1  -1  +1     -1  -1  +1  +1     +1
 5    -1  -1  +1  -1    +1  -1  +1  -1  +1  -1     +1  -1  +1  +1     -1
 6    +1  -1  +1  -1    -1  +1  -1  -1  +1  -1     -1  +1  -1  +1     +1
 7    -1  +1  +1  -1    -1  -1  +1  +1  -1  -1     -1  +1  +1  -1     +1
 8    +1  +1  +1  -1    +1  +1  -1  +1  -1  -1     +1  -1  -1  -1     -1
 9    -1  -1  -1  +1    +1  +1  -1  +1  -1  -1     -1  +1  +1  +1     -1
10    +1  -1  -1  +1    -1  -1  +1  +1  -1  -1     +1  -1  -1  +1     +1
11    -1  +1  -1  +1    -1  +1  -1  -1  +1  -1     +1  -1  +1  -1     +1
12    +1  +1  -1  +1    +1  -1  +1  -1  +1  -1     -1  +1  -1  -1     -1
13    -1  -1  +1  +1    +1  -1  -1  -1  -1  +1     +1  +1  -1  -1     +1
14    +1  -1  +1  +1    -1  +1  +1  -1  -1  +1     -1  -1  +1  -1     -1
15    -1  +1  +1  +1    -1  -1  -1  +1  +1  +1     -1  -1  -1  +1     -1
16    +1  +1  +1  +1    +1  +1  +1  +1  +1  +1     +1  +1  +1  +1     +1

This is a resolution V design because the generator E = ABCD makes the defining relation I = ABCDE, a 5-letter word. Now let's check the aliases.

Alias of A: A·ABCDE = BCDE        Alias of AB: AB·ABCDE = CDE
Alias of B: B·ABCDE = ACDE        Alias of AC: AC·ABCDE = BDE
Alias of C: C·ABCDE = ABDE        Alias of AD: AD·ABCDE = BCE
Alias of D: D·ABCDE = ABCE        Alias of AE: AE·ABCDE = BCD
Alias of E: E·ABCDE = ABCD        Alias of BC: BC·ABCDE = ADE
                                  Alias of BD: BD·ABCDE = ACE
                                  Alias of BE: BE·ABCDE = ACD
                                  Alias of CD: CD·ABCDE = ABE
                                  Alias of CE: CE·ABCDE = ABD
                                  Alias of DE: DE·ABCDE = ABC

After aliasing, the fifteen columns estimate A+BCDE, B+ACDE, C+ABDE, D+ABCE, E+ABCD, AB+CDE, AC+BDE, AD+BCE, BC+ADE, BD+ACE, CD+ABE, ABC+DE, ABD+CE, ACD+BE, and BCD+AE; the ± pattern of the runs is identical to the table above, only the column labels change.

So the main effects are aliased with 4-factor interactions and the 2-factor interactions are aliased with 3-factor interactions. On the surface this all seems better than a resolution III design, but remember that these effects are still confounded, and all the consequences of confounding are still there. In all of the cases so far, we have had 1/2 fractions. If necessary to clear up any ambiguities, we can always run the other 1/2 fraction.
If the original 1/2 fraction has defining relation I = ABCD, the complementary 1/2 fraction has defining relation I = -ABCD. Remember the filtration rate experiment we talked about earlier, an unreplicated 2^4 factorial. We used the normal probability plot of effects to determine that temperature (A), concentration of formaldehyde (C), and stirring rate (D) were significant, along with the AC and AD interactions. What would have happened if we had run a 1/2 fraction instead of the full factorial? Since the original design was a 2^4, we use a 2^(4-1) 1/2 fraction with generator D = ABC:

Run   A+BCD   B+ACD   C+ABD   AB+CD   AC+BD   BC+AD   D+ABC   Filt rate
1     -1      -1      -1      +1      +1      +1      -1        45
2     +1      -1      -1      -1      -1      +1      +1       100
3     -1      +1      -1      -1      +1      -1      +1        45
4     +1      +1      -1      +1      -1      -1      -1        65
5     -1      -1      +1      +1      -1      -1      +1        75
6     +1      -1      +1      -1      +1      -1      -1        60
7     -1      +1      +1      -1      -1      +1      -1        80
8     +1      +1      +1      +1      +1      +1      +1        96

The effect estimated by the first column is

A+BCD = [(-1)45 + (+1)100 + (-1)45 + (+1)65 + (-1)75 + (+1)60 + (-1)80 + (+1)96]/4 = 76/4 = 19

The other effects are found the same way. They are:

B+ACD =  1.5     AB+CD =  -1.0
C+ABD = 14.0     AC+BD = -18.5
D+ABC = 16.5     AD+BC =  19.0

But these effects are all confounded. The engineer suspects, because the B column effect is small and the A, C, and D column effects are large, that the A, C, and D effects are the significant ones. He also thinks the significant interactions are AC and AD, not BD or BC, because the B effect is so small. He may suspect, but is he right? Let's run the complementary 1/2 fraction and find out. For the complementary 1/2 fraction, the generator is D = -ABC, so the effect confounding is

Run   A-BCD   B-ACD   C-ABD   AB-CD   AC-BD   BC-AD   D-ABC   Filt rate
1     -1      -1      -1      +1      +1      +1      +1        43
2     +1      -1      -1      -1      -1      +1      -1        71
3     -1      +1      -1      -1      +1      -1      -1        48
4     +1      +1      -1      +1      -1      -1      +1       104
5     -1      -1      +1      +1      -1      -1      -1        68
6     +1      -1      +1      -1      +1      -1      +1        86
7     -1      +1      +1      -1      -1      +1      +1        70
8     +1      +1      +1      +1      +1      +1      -1        65

The effects from this design, found the same way as before, are

A-BCD = 24.25    AB-CD =   1.25
B-ACD =  4.75    AC-BD = -17.75
C-ABD =  5.75    AD-BC =  14.25
D-ABC = 12.75

Now we can resolve the ambiguities by combining the two sets of estimates. Since the original 1/2 fraction estimates A+BCD and the complementary 1/2 fraction estimates A-BCD, averaging the two isolates A:

[(A+BCD) + (A-BCD)]/2 = 2A/2 = A

and half their difference isolates BCD:

[(A+BCD) - (A-BCD)]/2 = 2BCD/2 = BCD

The unconfounded estimates are

Column   Original   Complementary   1/2(Orig + Comp)   1/2(Orig - Comp)
A          19          24.25         21.63 -> A         -2.63 -> BCD
B           1.5         4.75          3.13 -> B         -1.63 -> ACD
C          14           5.75          9.88 -> C          4.13 -> ABD
D          16.5        12.75         14.63 -> D          1.88 -> ABC
AB         -1           1.25          0.13 -> AB        -1.13 -> CD
AC        -18.5       -17.75        -18.13 -> AC        -0.38 -> BD
AD         19          14.25         16.63 -> AD         2.38 -> BC

So we can unconfound the effects by running the complementary 1/2 fraction. This should not be surprising, because the two fractions together make up the complete factorial, which has no confounding.

Now let's look at the 1/4 fraction, designated a 2^(k-2) fractional design. To make a 1/4 fraction such as a 2^(6-2), we first create a 2^4 design and associate the two extra factors with high-order interactions, which means a 1/4 fraction has two generators. In the 2^(6-2) example, we may associate factor E with ABC and factor F with BCD. The two generators are E = ABC and F = BCD, so the two defining relations are I = ABCE and I = BCDF.
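Before developing the quarter fraction further, note that the de-aliasing arithmetic in the table above is easy to script. A minimal sketch:

```python
# Minimal sketch of the de-aliasing arithmetic in the table above: the
# original half estimates (effect + alias), the complement estimates
# (effect - alias), so half their sum and difference separate the pair.
pairs = {  # column: (original estimate, complementary estimate, alias name)
    "A":  (19.0, 24.25, "BCD"), "B":  (1.5, 4.75, "ACD"),
    "C":  (14.0, 5.75, "ABD"),  "D":  (16.5, 12.75, "ABC"),
    "AB": (-1.0, 1.25, "CD"),   "AC": (-18.5, -17.75, "BD"),
    "AD": (19.0, 14.25, "BC"),
}
for effect, (orig, comp, alias) in pairs.items():
    print(f"{effect:3s} = {(orig + comp) / 2: .2f}   "
          f"{alias:3s} = {(orig - comp) / 2: .2f}")
```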
To get the complete defining relation, we use every column equal to I, so the complete defining relation consists of the two generator words and their generalized interaction: I = ABCE = BCDF = ADEF. Because the shortest word here has length 4, this is a resolution IV design. To find the aliases of an effect, multiply each word in the complete defining relation by that effect. For example,

Aliases of A:  A·ABCE = BCE,  A·BCDF = ABCDF,  A·ADEF = DEF

so A = BCE = ABCDF = DEF. In 1/4 fraction designs, each effect has several aliases. The complete alias structure for the 2^(6-2) design with I = ABCE = BCDF = ADEF is

A = BCE = DEF = ABCDF        AB = CE = ACDF = BDEF
B = ACE = CDF = ABDEF        AC = BE = ABDF = CDEF
C = ABE = BDF = ACDEF        AD = EF = BCDE = ABCF
D = BCF = AEF = ABCDE        AE = BC = DF = ABCDEF
E = ABC = ADF = BCDEF        AF = DE = BCEF = ABCD
F = BCD = ADE = ABCEF        BD = CF = ACDE = ABEF
ABD = CDE = ACF = BEF        BF = CD = ACEF = ABDE
ACD = BDE = ABF = CEF

There are three complementary fractions for the I = ABCE = BCDF = ADEF design. They have defining relations

I =  ABCE = -BCDF = -ADEF
I = -ABCE =  BCDF = -ADEF
I = -ABCE = -BCDF =  ADEF

In the first and third complementary fractions, the expression -BCDF means that F is set equal to -BCD: F is placed in the BCD column with all the signs reversed. Similarly, in the second and third complementary fractions, -ABCE means that E is placed in the ABC column with all the signs reversed. The alias structure changes accordingly; for example, in the first complementary fraction, the aliases of A are A = BCE = -DEF = -ABCDF.

Whole tables of these fractional factorials exist, where you can find 1/8 fractions, 1/16 fractions, and so on. So if you have a large number of factors to study and want a small design and a huge headache, consult these tables. There are also 3^k designs, with three levels of each of k factors, and these can be fractionalized as well. They work pretty much the same way as the 2^k fractionals, except that 3^(k-1) designs have complex alias relationships that require the assumption of no interaction to be useful. In addition, 3-level designs are quite large even for a modest number of factors, so they tend to be used only occasionally: mostly for testing quadratic relationships, and they are not the best way to do that. The 2^k designs and their fractions, on the other hand, are used quite extensively in industrial experimentation, despite the confounding in the fractions.

The most vigorous proponent of fractional factorials is a Japanese gentleman named Genichi Taguchi. Taguchi has been rightly credited with popularizing experimental design in manufacturing. He has incorrectly credited himself with creating what he calls orthogonal arrays, which really are fractional factorials; Taguchi calls them L8, L16, etc. designs, depending on the number of runs. Taguchi is a proponent of quality engineering and manufacturing. His design philosophy is that all products and processes should be robust to the various forms of noise they meet in use: airplanes should fly as well in thunderstorms as in clear skies, and cars should drive as well in rain and snow as in good weather. In addition to promoting robust design, Taguchi emphasizes the reduction of variability in manufacturing and the importance of minimizing cost. For all of this, he deserves great credit. Taguchi designs his experiments to cover both controllable factors and uncontrollable noise.
[Figure: Taguchi's view of a system; a signal drives the system, noise factors induce variation, and the system produces a response]

Taguchi puts the controllable factors in an inner array and the noise factors in an outer array, so his design looks like

                             Outer array (L4)
                             D:    1      1      2      2
                             E:    1      2      1      2
Inner array (L8)
Run    A    B    C
1      1    1    1           resp   resp   resp   resp     Ȳ   S-N
2      2    1    1           resp   resp   resp   resp     Ȳ   S-N
3      1    2    1           resp   resp   resp   resp     Ȳ   S-N
4      2    2    1           resp   resp   resp   resp     Ȳ   S-N
5      1    1    2           resp   resp   resp   resp     Ȳ   S-N
6      2    1    2           resp   resp   resp   resp     Ȳ   S-N
7      1    2    2           resp   resp   resp   resp     Ȳ   S-N
8      2    2    2           resp   resp   resp   resp     Ȳ   S-N

For each row of the inner array, Taguchi uses both the mean and a measure of variation he calls the signal-to-noise (S-N) ratio. The mean for each combination of controllable factors is the average over all noise combinations; this is what makes the chosen settings robust. Taguchi chooses the best combination of conditions by looking at plots of factor effects; he does not believe in significance tests. He also proposes analyzing the S-N ratio for each combination of conditions. His S-N ratio for larger-the-better is

$$SN_L = -10\log_{10}\!\left(\frac{1}{n}\sum_{i=1}^{n}\frac{1}{Y_i^2}\right)$$

If smaller is better, the smaller-the-better S-N is

$$SN_S = -10\log_{10}\!\left(\frac{1}{n}\sum_{i=1}^{n}Y_i^2\right)$$

Taguchi believes that the S-N ratios separate location from variability, and that by analyzing them as response variables in an ANOVA he is simultaneously optimizing the response and reducing the variability around it. This has been shown to be incorrect, but Taguchi adherents still plot S-N for each effect just as they do for Ȳ. Taguchi also does not believe in interactions, although they are sometimes present in the experiments he has designed. He claims that if the engineer is working at the "energy level of the system," there are no interactions. But since Taguchi eyeballs marginal-means plots and S-N plots to pick the winners, he clearly misses some of the best combinations when interactions exist. Another criticism of the Taguchi approach is that the combined inner-and-outer-array setup produces very large designs. A better strategy might be a single design containing both controllable and noise factors, analyzed with their interactions, as we did in the battery life experiment earlier (slide 109), where batteries had to be robust to extreme temperatures.

Now let's look further at random effects factorial experiments. You already know that a random effects model has factors with very many levels, and that the levels used in the experiment are chosen at random from all those available. Take a two-factor factorial where both factors are random. The model is

$$Y_{ijk} = \bar Y + \tau_j + \beta_k + (\tau\beta)_{jk} + \varepsilon_{ijk}$$

where j = 1, 2, ..., J; k = 1, 2, ..., K; and i = 1, 2, ..., n replicates. The model parameters τ_j, β_k, (τβ)_jk, and ε_ijk are all independent normally distributed random variables with mean 0 and variances σ²_τ, σ²_β, σ²_τβ, and σ². The SS and MS for each factor are calculated exactly as in the fixed effects model, but for the F-ratios we must examine the expected MS for each of the variance components:

E(MS_E)  = σ²
E(MS_A)  = σ² + nσ²_τβ + Knσ²_τ
E(MS_B)  = σ² + nσ²_τβ + Jnσ²_β
E(MS_AB) = σ² + nσ²_τβ

To test these effects, we form the F-ratios

A effect:           F = MS_A / MS_AB
B effect:           F = MS_B / MS_AB
Interaction effect: F = MS_AB / MS_E

Note that the main effects tests are different from those in the fixed-effects model. In the fixed effects model, each MS term estimates only the error variance plus its own effect, so all effects can be tested against MS_E.
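As an aside, the Taguchi S-N ratios defined above are simple to compute. A minimal sketch, using one hypothetical row of four responses (one inner-array run measured under the four outer-array noise conditions):

```python
import numpy as np

# Minimal sketch of the Taguchi S-N ratios; the four responses below are
# hypothetical, standing in for one inner-array run measured under the
# four outer-array noise conditions.
def sn_larger_is_better(y):
    y = np.asarray(y, dtype=float)
    return -10 * np.log10(np.mean(1.0 / y**2))

def sn_smaller_is_better(y):
    y = np.asarray(y, dtype=float)
    return -10 * np.log10(np.mean(y**2))

row = [68, 72, 65, 70]          # hypothetical responses for one run
print(f"mean = {np.mean(row):.2f}")
print(f"SN_L = {sn_larger_is_better(row):.2f} dB")
print(f"SN_S = {sn_smaller_is_better(row):.2f} dB")
```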
For comparison, the fixed-effects expected mean squares are

$$E(MS_E) = \sigma^2, \qquad E(MS_{AB}) = \sigma^2 + \frac{n\sum_{j}\sum_{k}(\tau\beta)_{jk}^2}{(J-1)(K-1)}$$
$$E(MS_A) = \sigma^2 + \frac{Kn\sum_{j}\tau_j^2}{J-1}, \qquad E(MS_B) = \sigma^2 + \frac{Jn\sum_{k}\beta_k^2}{K-1}$$

Most people fit random effects models more to estimate the variance components than to test for significance. The estimates are

$$\hat\sigma^2 = MS_E, \quad \hat\sigma^2_{\tau\beta} = \frac{MS_{AB} - MS_E}{n}, \quad \hat\sigma^2_{\tau} = \frac{MS_A - MS_{AB}}{Kn}, \quad \hat\sigma^2_{\beta} = \frac{MS_B - MS_{AB}}{Jn}$$

Gage R&R studies are a common industrial application of random effects models to test a measurement system. In a typical experiment of this sort, there are J parts to be measured by some gage and K operators to do the measuring, with n repetitions of each measurement. In this example, there are J = 20 parts, K = 3 operators, and n = 2 repetitions. The data are

Part   Operator 1   Operator 2   Operator 3
1      21  20       20  20       19  21
2      24  23       24  24       23  24
3      20  21       19  21       20  22
4      27  27       28  26       27  28
5      19  18       19  18       18  21
6      23  21       24  21       23  22
7      22  21       22  24       22  20
8      19  17       18  20       19  18
9      24  23       25  23       24  24
10     25  23       26  25       24  25
11     21  20       20  20       21  20
12     18  19       17  19       18  19
13     23  25       25  25       25  25
14     24  24       23  25       24  25
15     29  30       30  28       31  30
16     26  26       25  26       25  27
17     20  20       19  20       20  20
18     19  21       19  19       21  23
19     25  26       25  24       25  25
20     19  19       18  17       19  17

The total variability can be divided into parts, operators, their interaction, and the gage itself:

$$\sigma_Y^2 = \sigma_\tau^2 + \sigma_\beta^2 + \sigma_{\tau\beta}^2 + \sigma^2$$

where σ²_τ is the variance component for parts, σ²_β the component for operators, σ²_τβ the component for the interaction of parts and operators, and σ² the random experimental error variance. A gage R&R study is a repeatability and reproducibility study. The repeatability part is given by σ², because it reflects variation when the same part is measured by the same operator. The reproducibility part is given by σ²_β + σ²_τβ, because it reflects the additional variability in the system from different operators using the gage.

The ANOVA table for this study is

Source         SS         df    MS      p
A: Parts       1185.43     19   62.39   <0.00001
B: Operators      2.62      2    1.31   0.1730
AB               27.05     38    0.71   0.8614
Error            59.50     60    0.99
Total          1274.60    119

The estimates of the variance components are

σ̂²    = MS_E = 0.99
σ̂²_τβ = (MS_AB - MS_E)/n = (0.71 - 0.99)/2 = -0.14
σ̂²_τ  = (MS_A - MS_AB)/(Kn) = (62.39 - 0.71)/(3·2) = 10.28
σ̂²_β  = (MS_B - MS_AB)/(Jn) = (1.31 - 0.71)/(20·2) = 0.015

Notice that one of the variance components is estimated to be negative, which is impossible because a variance is nonnegative by definition. What can you do about this? You could just call it 0 and leave the other components unchanged. Or you could notice that the interaction is far from significant and redo the ANOVA for a reduced model excluding the interaction term. The reduced ANOVA table is

Source         SS         df    MS      p
A: Parts       1185.43     19   62.39   <0.00001
B: Operators      2.62      2    1.31   0.2324
Error            86.55     98    0.88
Total          1274.60    119

and the new variance components are

σ̂²   = 0.88
σ̂²_τ = (MS_A - MS_E)/(Kn) = (62.39 - 0.88)/(3·2) = 10.25
σ̂²_β = (MS_B - MS_E)/(Jn) = (1.31 - 0.88)/(20·2) = 0.0108

The gage variance is then σ̂² + σ̂²_β = 0.88 + 0.0108 = 0.8908, and the total variance is 0.88 + 0.0108 + 10.25 = 11.1408. So most of the total variance is due to variability in the product; very little is due to the operators or to the repeatability of the gage.

Of course, it had to come to this: if factors can be fixed and factors can be random, there are certainly going to be studies with a combination of fixed and random factors. These are called mixed models. Take the simple case of one fixed factor A and one random factor B. The model is

$$Y_{ijk} = \bar Y + \tau_j + \beta_k + (\tau\beta)_{jk} + \varepsilon_{ijk}$$

where τ_j is fixed, so Σ_j τ_j = 0, and the other effects are random. However, the interaction components summed over the levels of the fixed factor must also equal zero.
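The variance-component arithmetic for the gage R&R study above can be scripted directly from the mean squares; a minimal sketch:

```python
# Minimal sketch of the variance-component arithmetic for the gage R&R
# study above (J = 20 parts, K = 3 operators, n = 2 repeats), using the
# mean squares from the full-model ANOVA table.
J, K, n = 20, 3, 2
ms_parts, ms_ops, ms_inter, ms_error = 62.39, 1.31, 0.71, 0.99

var_error = ms_error                        # repeatability
var_inter = (ms_inter - ms_error) / n       # parts x operators
var_parts = (ms_parts - ms_inter) / (K * n)
var_ops   = (ms_ops - ms_inter) / (J * n)

print(f"sigma^2       = {var_error:.3f}")
print(f"sigma^2_tb    = {var_inter:.3f}  (negative -> drop interaction, refit)")
print(f"sigma^2_parts = {var_parts:.3f}")
print(f"sigma^2_ops   = {var_ops:.4f}")
```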
That is,

$$\sum_{j=1}^{J}(\tau\beta)_{jk} = 0 \quad \text{for each } k$$

This restriction implies that some of the interaction elements at different levels of the fixed factor are not independent, and it makes this a restricted model. The expected MS are

E(MS_E)  = σ²
E(MS_AB) = σ² + nσ²_τβ
E(MS_A)  = σ² + nσ²_τβ + Kn Σ_j τ²_j / (J-1)
E(MS_B)  = σ² + Jnσ²_β

This implies that the F-ratio for the fixed factor is F = MS_A / MS_AB, while the tests for the random factor B and the AB interaction are F = MS_B / MS_E and F = MS_AB / MS_E.

Let's look at a mixed effects ANOVA. Suppose we still have a gage R&R study, but now there are only 3 operators who ever use this gage. In this case, Operators is a fixed factor, not a random factor as we had earlier. The parts are still random, of course, because they are chosen from production randomly. We still have the same observations, so the ANOVA table is the same as before except for the p values, because the F-ratios are different. The F-ratio for operators is, as before,

F_operators = MS_operators / MS_AB

but the F-ratio for parts is now

F_parts = MS_parts / MS_E

The conclusions are the same: only the parts factor is significant, which is expected because the parts are different and should measure differently. The variance estimates are also virtually identical to those in the completely random model, including the negative estimate for the AB interaction, and the reduced model produces the same results as before.

So far, we have talked about several methods for reducing the residual variance by controlling nuisance variables. Remember that nuisance variables are expected to affect the response, but we don't want their effects contaminating the effect of interest. If the nuisance factors are known and controllable, we can use blocking (most common), Latin squares, or Graeco-Latin squares. But suppose the nuisance variable is known but uncontrollable. Then we need a new way to compensate for its effects. This new way is called the analysis of covariance, or ANCOVA.

Say we know that our response variable Y is linearly related to another variable X, which cannot be controlled but can be observed along with Y. Then X is called a covariate, and ANCOVA adjusts the response Y for the effect of the covariate X. If we don't make this adjustment, MS_E can be inflated, reducing the power of the test to find real treatment differences in Y. ANCOVA uses both ANOVA and regression analysis. For a one-way design, the model is

$$Y_{ij} = \bar Y + \tau_j + \beta (X_{ij} - \bar X) + e_{ij}$$

As usual, we assume that the errors are normal with constant variance and that Σ_j τ_j = 0, because this is a fixed-effects model. For ANCOVA, we also assume that β ≠ 0, so there is a real linear relationship between X and Y, and that β is the same at every treatment level. The estimate of β is the pooled within-treatment sum of cross-products of X and Y divided by the pooled within-treatment sum of squares of the covariate:

$$\hat\beta = \frac{\sum_{i=1}^{n}\sum_{j=1}^{J}(X_{ij}-\bar X_j)(Y_{ij}-\bar Y_j)}{\sum_{i=1}^{n}\sum_{j=1}^{J}(X_{ij}-\bar X_j)^2} = \frac{SCP_{XY,pooled}}{SS_{X,pooled}}$$

The SSE for this model is

$$SS_{error} = \sum_{i}\sum_{j}(Y_{ij}-\bar Y_j)^2 - \frac{\left[\sum_{i}\sum_{j}(X_{ij}-\bar X_j)(Y_{ij}-\bar Y_j)\right]^2}{\sum_{i}\sum_{j}(X_{ij}-\bar X_j)^2}$$

with nJ - J - 1 df. But if there were no treatment effect, the SSE for the reduced model would be

$$SS'_{error} = \sum_{i}\sum_{j}(Y_{ij}-\bar Y)^2 - \frac{\left[\sum_{i}\sum_{j}(X_{ij}-\bar X)(Y_{ij}-\bar Y)\right]^2}{\sum_{i}\sum_{j}(X_{ij}-\bar X)^2}$$

with nJ - 2 df. Note that SS_error is smaller than the reduced SS'_error, because the full model contains the additional parameters τ_j; SS'_error - SS_error is the reduction in SS due to the τ_j.
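A minimal sketch of these pooled ANCOVA quantities, computed for the fiber strength example that follows (three machines, five fibers each):

```python
import numpy as np

# Minimal sketch of the pooled ANCOVA quantities above, using the fiber
# strength (Y) / diameter (X) data from the worked example below.
X = np.array([[20, 25, 24, 25, 32],    # machine 1
              [22, 28, 22, 30, 28],    # machine 2
              [21, 23, 26, 21, 15]])   # machine 3
Y = np.array([[36, 41, 39, 42, 49],
              [40, 48, 39, 45, 44],
              [35, 37, 42, 34, 32]])

# Pooled (within-treatment) cross-products and sums of squares.
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
scp_pooled = (Xc * Yc).sum()
ssx_pooled = (Xc ** 2).sum()
beta_hat = scp_pooled / ssx_pooled

sse_full = (Yc ** 2).sum() - scp_pooled ** 2 / ssx_pooled
# Reduced model (no treatment effect): center on the grand means instead.
xg, yg = X - X.mean(), Y - Y.mean()
sse_reduced = (yg ** 2).sum() - (xg * yg).sum() ** 2 / (xg ** 2).sum()

print(f"beta_hat = {beta_hat:.3f}, SSE = {sse_full:.2f}, SSE' = {sse_reduced:.2f}")
```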
So to test the treatment effect, we use

$$F = \frac{(SS'_{error} - SS_{error})/(J-1)}{SS_{error}/(nJ - J - 1)}$$

The ANCOVA table is

Source       SS                        df
Regression   SCP²_XY / SS_X            1
Treatments   SS'_error - SS_error      J - 1
Error        SS_error                  Jn - J - 1
Total        SS_Total = ΣΣ(Y_ij - Ȳ)²  Jn - 1

Consider an experiment seeking to determine whether there is a difference in the breaking strength of a fiber produced by three different machines. Clearly the breaking strength of the fiber is affected by its thickness, so the diameter of the fiber is recorded along with each strength measurement: a perfect example for ANCOVA. The data (breaking strength in pounds, diameter in 10^-3 inches) are

Machine 1            Machine 2            Machine 3
strength  diameter   strength  diameter   strength  diameter
36        20         40        22         35        21
41        25         48        28         37        23
39        24         39        22         42        26
42        25         45        30         34        21
49        32         44        28         32        15

[Figure: scatterplot of strength vs. diameter; the points fall close to a single straight line, so strength and diameter are clearly linearly related]

The ANCOVA table is

Source       SS       df   MS       p
Regression   305.13    1   305.13
Treatments    13.28    2     6.64   0.118
Error         27.99   11     2.54
Total        346.40   14

In this case, the machines have not been shown to be different. But suppose you had ignored the relationship of the diameter to the breaking strength and instead done a simple one-way ANOVA:

Source     SS      df   MS      p
Machines   140.4    2   70.20   0.0442
Error      206.0   12   17.17
Total      346.4   14

You would have concluded that the machines were different. Then you would have been misled into spending resources trying to equalize the strength output of the machines, when instead you should have been trying to reduce the variability of the fiber diameter. This shows how important it is to control for nuisances.

Now we are going to talk about nested designs. A nested design is one in which the levels of one factor are not identical across the levels of another factor. For example, a company purchases its raw material from 3 different suppliers and wants to know whether the purity of the material is the same for all three. There are 4 batches of raw material available from each supplier, and three determinations of purity are made on each batch. The design has this hierarchical structure (subscripts read supplier, batch, observation):

Supplier        1                         2                         3
Batch       1    2    3    4          1    2    3    4          1    2    3    4
1st obs    Y111 Y121 Y131 Y141       Y211 Y221 Y231 Y241       Y311 Y321 Y331 Y341
2nd obs    Y112 Y122 Y132 Y142       Y212 Y222 Y232 Y242       Y312 Y322 Y332 Y342
3rd obs    Y113 Y123 Y133 Y143       Y213 Y223 Y233 Y243       Y313 Y323 Y333 Y343

The batches are nested within suppliers: batch 1 from supplier 1 has nothing to do with batch 1 from the other two suppliers, and the same is true for the other batches. So suppliers and batches are not crossed. This is called a two-stage nested design, because one factor is nested within one other factor: suppliers are the first stage and batches the second. It is possible to have higher-stage designs; if each batch had another factor nested within it, this would become a three-stage hierarchical, or nested, design. The linear model for the two-stage nested design is

$$Y_{ijk} = \bar Y + \tau_j + \beta_{k(j)} + \varepsilon_{i(jk)}$$

The notation β_k(j) is read "β_k nested in level j." There are J levels of factor A, as usual, K levels of factor B within each level of A, and n replicates for each level of B nested in A.
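A minimal sketch of the two-stage nested SS decomposition, applied to the supplier/batch purity data tabled in the example below:

```python
import numpy as np

# Minimal sketch of the two-stage nested SS decomposition, using the
# supplier/batch purity data from the example below (J = 3 suppliers,
# K = 4 batches nested in each supplier, n = 3 determinations per batch).
y = np.array([  # shape (J, K, n)
    [[1, -1, 0], [-2, -3, -4], [-2, 0, 1], [1, 4, 0]],
    [[1, -2, -3], [0, 4, 2], [-1, 0, -2], [0, 3, 2]],
    [[2, 4, 0], [-2, 0, 2], [1, -1, 2], [3, 2, 1]],
], dtype=float)
J, K, n = y.shape

grand = y.mean()
supplier_means = y.mean(axis=(1, 2))
batch_means = y.mean(axis=2)

ss_a = K * n * ((supplier_means - grand) ** 2).sum()
ss_b_within_a = n * ((batch_means - supplier_means[:, None]) ** 2).sum()
ss_error = ((y - batch_means[..., None]) ** 2).sum()

print(f"SS_suppliers  = {ss_a:.2f}  (df {J - 1})")
print(f"SS_batches(A) = {ss_b_within_a:.2f}  (df {J * (K - 1)})")
print(f"SS_error      = {ss_error:.2f}  (df {J * K * (n - 1)})")
```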
A nested design like this one is balanced: there is the same number of levels of B nested within each level of A and the same number of replicates throughout. As always, the total SS can be partitioned:

$$\sum_{i}\sum_{j}\sum_{k}(Y_{ijk}-\bar Y)^2 = Kn\sum_{j}(\bar Y_j - \bar Y)^2 + n\sum_{j}\sum_{k}(\bar Y_{jk} - \bar Y_j)^2 + \sum_{i}\sum_{j}\sum_{k}(Y_{ijk} - \bar Y_{jk})^2$$

That is, SS_T = SS_A + SS_B(A) + SS_E, with df

JKn - 1 = (J-1) + J(K-1) + JK(n-1)

The appropriate F-ratio for each factor depends on whether the factor is fixed or random. For A and B both fixed,

E(MS_A)    = σ² + Kn Σ_j τ²_j / (J-1)
E(MS_B(A)) = σ² + n ΣΣ β²_k(j) / [J(K-1)]
E(MS_E)    = σ²

so the F-ratio for A is MS_A / MS_E and for B(A) it is MS_B(A) / MS_E. If both factors are random, the expectations are

E(MS_A)    = σ² + nσ²_β + Knσ²_τ
E(MS_B(A)) = σ² + nσ²_β
E(MS_E)    = σ²

so the F-ratio for A is MS_A / MS_B(A) and for B(A) it is MS_B(A) / MS_E. If we have a mixed model with A fixed and B random, the expectations are

E(MS_A)    = σ² + nσ²_β + Kn Σ_j τ²_j / (J-1)
E(MS_B(A)) = σ² + nσ²_β
E(MS_E)    = σ²

so the F-ratios are the same as in the fully random model. This mixed model would be used if the batches were a random selection from each supplier's full set of batches.

Suppose this were such a mixed model, with the batches a random selection from the total set of batches for each supplier. The (coded) purity data are

Supplier        1                    2                    3
Batch       1   2   3   4        1   2   3   4        1   2   3   4
1st obs     1  -2  -2   1        1   0  -1   0        2  -2   1   3
2nd obs    -1  -3   0   4       -2   4   0   3        4   0  -1   2
3rd obs     0  -4   1   0       -3   2  -2   2        0   2   2   1

The ANOVA table is

Source                       SS       df   MS     p
Suppliers                    15.06     2   7.53   0.42
Batches (within suppliers)   69.92     9   7.77   0.02
Error                        63.33    24   2.64
Total                       148.31    35

Only batches-within-suppliers is significant. If this were a real experiment, this would be an important conclusion. If the suppliers had differed in the purity of their raw material, the company could just pick the best supplier. But since it is the purity from batch to batch within suppliers that differs, the company has a real problem and must get the suppliers to reduce their batch-to-batch variability.

For the m-stage nested design, we just extend the results from the 2-stage nested design. For example, suppose a foundry is studying the hardness of two metal alloy formulations. For each alloy formulation, three heats are prepared, two ingots are selected at random from each heat, and two hardness measurements are made on each ingot:

Alloy      1                          2
Heat       1      2      3            1      2      3
Ingot     1  2   1  2   1  2         1  2   1  2   1  2
          (two hardness observations per ingot)

The ingots (random) are nested within the heats (fixed), and the heats are nested within the alloy formulations (fixed). So this is a 3-stage nested design with 2 replicates, analyzed the same way as a 2-stage design except that there is an additional factor to consider.

There are also designs with both crossed and nested factors. Let's look at an example. An industrial engineer needs to improve the assembly speed of inserting electronic components on printed circuit boards. He has designed three assembly fixtures and two workplace layouts, and he randomly selects 4 operators for each layout. In this experiment, the 4 operators are nested within layouts, and the fixtures are crossed with layouts.
The design is

                Workplace Layout 1               Workplace Layout 2
Operator        1      2      3      4           1      2      3      4
Fixture 1    Y1111  Y1121  Y1131  Y1141       Y1211  Y1221  Y1231  Y1241
             Y1112  Y1122  Y1132  Y1142       Y1212  Y1222  Y1232  Y1242
Fixture 2    Y2111  Y2121  Y2131  Y2141       Y2211  Y2221  Y2231  Y2241
             Y2112  Y2122  Y2132  Y2142       Y2212  Y2222  Y2232  Y2242
Fixture 3    Y3111  Y3121  Y3131  Y3141       Y3211  Y3221  Y3231  Y3241
             Y3112  Y3122  Y3132  Y3142       Y3212  Y3222  Y3232  Y3242

The model is

$$Y_{ijkl} = \bar Y + \tau_j + \beta_k + \gamma_{l(k)} + (\tau\beta)_{jk} + (\tau\gamma)_{jl(k)} + \varepsilon_{i(jkl)}$$

where τ_j is the effect of the jth fixture, β_k is the effect of the kth layout, γ_l(k) is the effect of the lth operator within the kth layout, (τβ)_jk is the fixture-by-layout interaction, (τγ)_jl(k) is the fixture-by-operators-within-layout interaction, and ε_i(jkl) is the error term. Designs such as this are used when it is physically impossible to cross all the factors. In this design, the workplace layouts were in different parts of the plant, so the same 4 operators could not be used for both layouts. The fixtures, on the other hand, could be installed in both workplace layouts, so they could be crossed with layouts.

Now we'll look at split-plot designs. The split-plot design is a generalization of the randomized block design for cases where we cannot randomize the order of the runs within a block: that is, there is a restriction on randomization. For example, a paper manufacturer is interested in the tensile strength of his paper. He has three pulp preparation methods and four cooking temperatures, and he wants three replicates. How can he design his experiment? The pilot plant where the experiment is run can make only 12 runs a day, so he runs one replicate on each of three days; days are blocks. On each day, the first preparation method is used, and the resulting pulp is divided into four samples, each cooked at one of the four temperatures. Then the second method is run and its pulp divided into four samples for the four temperatures, and finally the third method, likewise. The design and data are

                  Block 1 (day 1)    Block 2 (day 2)    Block 3 (day 3)
Pulp prep method:  1    2    3        1    2    3        1    2    3
Temp 200          30   34   29       28   31   31       31   35   32
     225          35   41   26       32   36   30       37   40   34
     250          37   38   33       40   42   32       41   39   39
     275          36   42   36       41   40   40       40   44   45

Each block (day) is divided into three main plots, the pulp prep methods. Each main plot is further divided into four subplots, or split plots, the cooking temperatures. So this design has three main plots and four split plots. The model for the split-plot design is

$$Y_{jkl} = \bar Y + \tau_j + \beta_k + (\tau\beta)_{jk} + \gamma_l + (\tau\gamma)_{jl} + (\beta\gamma)_{kl} + (\tau\beta\gamma)_{jkl}$$

where the second, third, and fourth terms represent the whole plot: τ_j is blocks (factor A), β_k is pulp prep methods (factor B), and (τβ)_jk is the AB interaction, which serves as the whole-plot error.
The rest of the terms represent the split plot: γ_l is the temperature (factor C), (τγ)_jl is the AC interaction, (βγ)_kl is the BC interaction, and (τβγ)_jkl is the ABC interaction, which serves as the subplot error. With blocks random and pulp prep methods and temperatures fixed, the expected MS for this design are

Whole plot:
E(MS_A)  = σ² + KLσ²_τ
E(MS_B)  = σ² + Lσ²_τβ + JL Σ_k β²_k / (K-1)
E(MS_AB) = σ² + Lσ²_τβ

Split plot:
E(MS_C)   = σ² + Kσ²_τγ + JK Σ_l γ²_l / (L-1)
E(MS_AC)  = σ² + Kσ²_τγ
E(MS_BC)  = σ² + σ²_τβγ + J ΣΣ(βγ)²_kl / [(K-1)(L-1)]
E(MS_ABC) = σ² + σ²_τβγ

With these expected mean squares, it is clear how to form the F-ratio for each effect: B is tested by AB, C is tested by AC, and BC is tested by ABC. A, AB, AC, and ABC would all be tested by MS_E if it were estimable, but it is not, because there is only one observation per prep-method/temperature combination per day. The ANOVA table for this design is

Source                   SS       df   MS       p
Blocks (A)                77.55    2    38.78
Prep method (B)          128.39    2    64.20   0.05
AB (whole-plot error)     36.28    4     9.07
Temperature (C)          434.08    3   144.69   <0.01
AC                        20.67    6     3.45
BC                        75.17    6    12.53   0.05
ABC (subplot error)       50.83   12     4.24
Total                    822.97   35

The split-plot design, due to Fisher, had its origin in agriculture. There is usually a very large plot of land, a field, which is divided into subplots. If each field is planted with a different crop and different fertilizers are used in the field's subplots, the crop varieties are the main treatments and the fertilizers the subtreatments. In applications other than agriculture, there are often factors whose levels are more difficult to change than those of other factors, or factors that require larger experimental units than others. In these cases, the hard-to-vary factors form the whole plots and the easy-to-vary factors are run in the subplots.

Of course, if there are split plots, there would have to be split-split plots. These designs occur when there is more than one restriction on randomization. Consider a researcher studying how fast a drug capsule is absorbed into the bloodstream. His study has 3 technicians, 3 dosage strengths, and 4 capsule wall thicknesses. In addition, the researcher wants 4 replicates, each run on a different day, so days are blocks. Within each day (block), each technician runs the three dosage strengths at all four wall thicknesses. But once a dosage strength is formulated, all the wall thicknesses must be run at that strength by that technician: a restriction on randomization. The first dosage strength is formulated, and the first technician runs it at all four wall thicknesses; then another dosage strength is formulated and run at all four wall thicknesses; then the third. On the same day, the other two technicians are doing the same thing. This procedure has two randomization restrictions within each block: technician and dosage strength. The technician is the whole plot. The dosage strengths form three subplots, randomly assigned within each technician, and within each dosage strength (subplot) there are four sub-subplots, the capsule wall thicknesses, which may be run in random order. The layout for one block (day) is

Technician          1              2              3
Dosage strength     1   2   3      1   2   3      1   2   3
Wall thickness      1, 2, 3, 4 within each dosage strength

and it is repeated for the remaining blocks (days). Blocks (days) are a random factor; the other factors are fixed.
So the expected mean squares are (A = blocks, random, with J levels; B = technicians, K levels; C = dosage strengths, L levels; D = wall thicknesses, H levels):

Whole plot:
E(MS_A)  = σ² + KLHσ²_τ
E(MS_B)  = σ² + LHσ²_τβ + JLH Σ_k β²_k / (K-1)
E(MS_AB) = σ² + LHσ²_τβ

Subplot:
E(MS_C)   = σ² + KHσ²_τγ + JKH Σ_l γ²_l / (L-1)
E(MS_AC)  = σ² + KHσ²_τγ
E(MS_BC)  = σ² + Hσ²_τβγ + JH ΣΣ(βγ)²_kl / [(K-1)(L-1)]
E(MS_ABC) = σ² + Hσ²_τβγ

Sub-subplot:
E(MS_D)    = σ² + KLσ²_τδ + JKL Σ_h δ²_h / (H-1)
E(MS_AD)   = σ² + KLσ²_τδ
E(MS_BD)   = σ² + Lσ²_τβδ + JL ΣΣ(βδ)²_kh / [(K-1)(H-1)]
E(MS_ABD)  = σ² + Lσ²_τβδ
E(MS_CD)   = σ² + Kσ²_τγδ + JK ΣΣ(γδ)²_lh / [(L-1)(H-1)]
E(MS_ACD)  = σ² + Kσ²_τγδ
E(MS_BCD)  = σ² + σ²_τβγδ + J ΣΣΣ(βγδ)²_klh / [(K-1)(L-1)(H-1)]
E(MS_ABCD) = σ² + σ²_τβγδ

From these expectations, the tests are MS_B/MS_AB, MS_C/MS_AC, MS_BC/MS_ABC, MS_D/MS_AD, MS_BD/MS_ABD, MS_CD/MS_ACD, and MS_BCD/MS_ABCD. Factor A and all its interactions (AB, AC, AD, ABC, ABD, ACD, ABCD) would be tested by MS_E if it were estimable, but it is not.

All along in our discussion of fixed models, we have been interested in whether the means of different treatments differ. What if we are interested in whether the variances differ across the levels of a factor? How would we handle this? We know that variances are not normally distributed, so we cannot do an ANOVA on raw variances. Suppose there is an experiment in an aluminum smelter, where alumina is added to a reaction cell with other ingredients. Four algorithms for maintaining the ratio of alumina to the other ingredients in the cell are under test. The response is related to cell voltage: a sensor scans the cell voltage several times per second, producing thousands of voltage measurements during each run of the experiment. The average cell voltage and the standard deviation of cell voltage were recorded for each run under each ratio control algorithm:

Cell voltage means
Algorithm   Run 1   Run 2   Run 3   Run 4   Run 5   Run 6
1           4.93    4.86    4.75    4.95    4.79    4.88
2           4.85    4.91    4.79    4.85    4.75    4.85
3           4.83    4.88    4.90    4.75    4.82    4.90
4           4.89    4.77    4.94    4.86    4.79    4.76

Cell voltage standard deviations
Algorithm   Run 1   Run 2   Run 3   Run 4   Run 5   Run 6
1           0.05    0.04    0.05    0.06    0.03    0.05
2           0.04    0.02    0.03    0.05    0.03    0.02
3           0.09    0.13    0.11    0.15    0.08    0.12
4           0.03    0.04    0.05    0.05    0.03    0.02

The engineers want to test both the means and the standard deviations. The mean is important because it affects cell temperature. The standard deviation, called "pot noise" by the engineers, is important because it affects overall cell efficiency. But standard deviations are not normally distributed. The way around this is a log transformation of the standard deviation, used as the response in the ANOVA. Because all the standard deviations are less than 1, it is best to use Y = -ln(SD), the natural log of pot noise, as the response variable. The ANOVA table for pot noise is

Source         SS      df   MS      p
RC algorithm   6.166    3   2.055   <0.001
Error          1.872   20   0.094
Total          8.038   23

Clearly the ratio control algorithm affects pot noise; in particular, algorithm 3 produces greater pot noise than the other three. There are other occasions when it is appropriate to transform the response variable, and many different transformations are available; Box and Cox developed a systematic method for choosing among them. Leo Goodman introduced logit analysis to analyze frequency data in an ANOVA. Consider the problem of car insurance for teens.
In America, teens pay (or their parents do) about three times as much for car insurance as adults. Teens claim that they are better drivers than adults, but the insurance companies are interested in accident rates. Consider the following frequency data:

           Adults                   Teens
           Accidents   No acc.     Accidents   No acc.    Total
Males          2          98           16         84       200
Females        3          97           12         88       200
Total          5         195           28        172       400

Now look at the odds of having an accident. The odds for a teen driver are 28:172; for an adult driver, 5:195. We can test whether teens have a higher accident rate by forming the odds ratio. We can do the same for males vs. females, and we can even look at the interaction of gender and age. But we cannot analyze the raw odds ratios, because they are not normally distributed. When Goodman considered this, he came up with "let's log it," and the log-transformed odds ratio has been called a logit ever since. Once you have the logits, you can do the ANOVA in the usual way.

Another transformation that is useful when non-normality is suspected is the rank transformation introduced by Kruskal and Wallis. To use this technique, put the observations in ascending order and assign each observation a rank, R_ij; tied observations get the average of their ranks. The test statistic is

$$H = \frac{(N-1)\left[\sum_{j=1}^{J}\dfrac{R_j^2}{n_j} - \dfrac{N(N+1)^2}{4}\right]}{\sum_{i}\sum_{j}R_{ij}^2 - \dfrac{N(N+1)^2}{4}}$$

where N is the total number of observations and R_j is the sum of the ranks in treatment j. The denominator divided by N-1 is the variance of the ranks. H is referred to the χ² table to determine the probability. As an example, consider the following data:

Level 1          Level 2          Level 3
Y_i1    R_i1     Y_i2    R_i2     Y_i3    R_i3
 7      1.5      12       5.5     14       7
 7      1.5      17       9       18      11.5
15      8        12       5.5     18      11.5
11      4        18      11.5     19      14.5
 9      3        18      11.5     19      14.5
R_j = 18         R_j = 43         R_j = 59

How did we get these ranks?

Level   Ordered Y_ij   Order   Rank
1        7              1       1.5
1        7              2       1.5
1        9              3       3
1       11              4       4
2       12              5       5.5
2       12              6       5.5
3       14              7       7
1       15              8       8
2       17              9       9
2       18             10      11.5
2       18             11      11.5
3       18             12      11.5
3       18             13      11.5
3       19             14      14.5
3       19             15      14.5

Now it's a simple matter to apply the formula:

$$H = \frac{(15-1)(1130.8 - 960)}{1233.5 - 960} = \frac{2391.2}{273.5} = 8.74$$

where Σ R²_j / n_j = 18²/5 + 43²/5 + 59²/5 = 1130.8, N(N+1)²/4 = 15(16)²/4 = 960, and ΣΣR²_ij = 1233.5. The χ² value for J-1 = 2 df at the .025 level is 7.38. Since the observed value 8.74 exceeds 7.38, we reject H0 and conclude that the treatments are different.

The Kruskal-Wallis rank procedure is a very powerful nonparametric alternative to ANOVA. It relies on no assumption of normality, and because the rank transformation is not distorted by unusual observations, it is very robust to outliers. It is equivalent to doing an ANOVA on the ranks. So if you are in doubt about whether the random variable is normal, use the Kruskal-Wallis rank transformation.

Now let's take a look at Response Surface Methodology, or RSM. RSM is used to analyze problems where several variables influence the response and the purpose of the experiment is to optimize the response. Say we're dealing with two factors that affect the response Y. Then the model is Y = f(X1, X2), where f(X1, X2) is a response surface: a surface in the third dimension over the (X1, X2) plane. Under the response surface, right on the (X1, X2) plane, we can draw the contours of the response surface. These contours are lines of constant response. Both the response surface and the corresponding contour plot are shown in the handouts.
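Returning briefly to the Kruskal-Wallis example: the hand calculation can be checked with scipy, whose kruskal function applies the same rank transformation and tie correction.

```python
from scipy import stats

# Minimal check of the hand calculation above.
level1 = [7, 7, 15, 11, 9]
level2 = [12, 17, 12, 18, 18]
level3 = [14, 18, 18, 19, 19]

H, p = stats.kruskal(level1, level2, level3)
print(f"H = {H:.2f}, p = {p:.4f}")   # H should be about 8.74
```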
Generally, the exact nature of the response surface is unknown, and the model we decide to use is an attempt at a reasonable approximation to it. If we use a first-order model, such as

$$\hat Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$

we are assuming that the response is a linear function of the independent variables. If there is some curvature in the relationship, we try a second-order polynomial:

$$\hat Y = \beta_0 + \sum_{k=1}^{K}\beta_k X_k + \sum_{k=1}^{K}\beta_{kk}X_k^2 + \sum_{j<k}\beta_{jk}X_jX_k$$

No model ever perfectly fits the relationship, but over a relatively small region they work pretty well. When we use response surface methods, we are looking for an optimum. If the optimum is a maximum, we are hill-climbing toward it, taking the path of steepest ascent, the direction of maximum increase in the response. If the optimum is a minimum, we are descending into a valley, taking the path of steepest descent. So RSM is sequential: we continue along the path of steepest ascent (descent) until no further increase (decrease) in the response is observed.

[Figure: contours of a first-order response surface at 10, 20, 30, 40, with the path of steepest ascent running perpendicular to the contours]

Let's look at an example of the method of steepest ascent. A chemical engineer wants to improve the yield of his process. He knows that two variables affect yield: reaction time and temperature. Currently he uses a reaction time of 35 minutes and a temperature of 155˚, and his current yield is 40 percent. He wants to explore reaction times between 30 and 40 minutes and temperatures from 150˚ to 160˚. To have the variables coded in the usual (-1, +1) way, he uses

$$X_1 = \frac{\xi_1 - 35}{5}, \qquad X_2 = \frac{\xi_2 - 155}{5}$$

He decides to use a 2^2 factorial design with 5 center points. His data are

Natural variables   Coded variables   Response
ξ1      ξ2          X1      X2        Y
30      150         -1      -1        39.3
30      160         -1      +1        40.0
40      150         +1      -1        40.9
40      160         +1      +1        41.5
35      155          0       0        40.3
35      155          0       0        40.5
35      155          0       0        40.7
35      155          0       0        40.2
35      155          0       0        40.6

The fitted model is

$$\hat Y = 40.44 + 0.775X_1 + 0.325X_2$$

and the ANOVA table is

Source            SS       df   MS       p
Regression        2.8250    2   1.4125   0.0002
Residual          0.1772    6
  Interaction     0.0025    1   0.0025   0.8215
  Pure quadratic  0.0027    1   0.0027   0.8142
  Pure error      0.1720    4   0.0430
Total             3.0022    8

Note that this ANOVA finds the two β coefficients significant. The error SS is obtained from the center points in the usual way. The interaction SS is found by computing

β̂12 = ¼[(+1)(39.3) + (+1)(41.5) + (-1)(40.0) + (-1)(40.9)] = -0.025

SS_interaction = (4 × (-0.025))² / 4 = 0.0025

which is not significant. The pure quadratic test compares the average of the four factorial points, 40.425, with the average of the center points, 40.46:

$$\bar Y_F - \bar Y_C = 40.425 - 40.46 = -0.035$$

$$SS_{quad} = \frac{n_F\,n_C\,(\bar Y_F - \bar Y_C)^2}{n_F + n_C} = \frac{(4)(5)(-0.035)^2}{4+5} = 0.0027$$

This quadratic effect is also not significant. The purpose of testing the interaction and quadratic effects is to make sure a first-order model is adequate, and here it is. To move away from the design center (0, 0) along the path of steepest ascent, we move 0.325 units in the X2 direction for every 0.775 units in the X1 direction; that is, the slope of the path of steepest ascent is 0.325/0.775 = 0.42. The engineer decides to use 5 minutes as the basic step size for reaction time, so the coded step in X1 is 1, which makes the step in X2 equal to 0.42, the slope of the path. Now the engineer computes points along the path until the response decreases.
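A minimal sketch of this stepping procedure (note that the table below rounds the temperature step to 2 degrees per step, so its values differ slightly from the exact path):

```python
# Minimal sketch of stepping along the path of steepest ascent.
b1, b2 = 0.775, 0.325            # fitted first-order coefficients
dx1 = 1.0                        # chosen step in coded X1 (5 minutes)
dx2 = dx1 * b2 / b1              # about 0.42, the slope of the path

origin = (35.0, 155.0)           # natural-variable center of the design
scale = (5.0, 5.0)               # coding divisors
for step in range(1, 13):
    x1, x2 = step * dx1, step * dx2
    time = origin[0] + scale[0] * x1
    temp = origin[1] + scale[1] * x2
    print(f"step {step:2d}: coded ({x1:4.1f}, {x2:4.2f}) "
          f"-> time {time:4.0f} min, temp {temp:5.1f}")
```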
The computed points are

Step            Coded X1   X2      Natural ξ1   ξ2     Response Y
Origin           0         0        35          155
Δ                1         0.42      5            2
Origin +  1Δ     1         0.42     40          157     41.0
Origin +  2Δ     2         0.84     45          159     42.9
Origin +  3Δ     3         1.26     50          161     47.1
Origin +  4Δ     4         1.68     55          163     49.7
Origin +  5Δ     5         2.10     60          165     53.8
Origin +  6Δ     6         2.52     65          167     59.9
Origin +  7Δ     7         2.94     70          169     65.0
Origin +  8Δ     8         3.36     75          171     70.4
Origin +  9Δ     9         3.78     80          173     77.6
Origin + 10Δ    10         4.20     85          175     80.3
Origin + 11Δ    11         4.62     90          177     76.2
Origin + 12Δ    12         5.04     95          179     75.1

[Figure: yield vs. steps along the path of steepest ascent; the response rises through step 10, then falls at steps 11 and 12]

From steps 1 through 10 the response increases, but at step 11 it begins to decrease and continues to fall at step 12. From these computations, it is clear that the maximum is somewhere close to (reaction time 85, temperature 175), the natural values of X1 and X2 at step 10. So the next experiment is designed in the vicinity of (85, 175). We still retain the first-order model, because nothing in the first experiment refuted it. This time the region of exploration for ξ1 is (80, 90) and for ξ2 it is (170, 180), so the coded variables are

$$X_1 = \frac{\xi_1 - 85}{5}, \qquad X_2 = \frac{\xi_2 - 175}{5}$$

Again, the design is a 2^2 factorial with 5 center points. The data for this second experiment are

Natural variables   Coded variables   Response
ξ1      ξ2          X1      X2        Y
80      170         -1      -1        76.5
80      180         -1      +1        77.0
90      170         +1      -1        78.0
90      180         +1      +1        79.5
85      175          0       0        79.9
85      175          0       0        80.3
85      175          0       0        80.0
85      175          0       0        79.7
85      175          0       0        79.8

The ANOVA table for this design is

Source            SS      df   MS      p
Regression         5.00    2
Residual          11.12    6
  Interaction      0.25    1    0.25   0.0955
  Pure quadratic  10.66    1   10.66   0.0001
  Pure error       0.21    4    0.05
Total             16.12    8

The first-order model fitted to the coded values is

$$\hat Y = 78.97 + 1.00X_1 + 0.50X_2$$

but the first-order model is now in question, because the pure quadratic term is significant in this region. We may be seeing curvature because we are near the optimum, and further analysis is needed to reach it. To explore further, we need a second-order model, and to fit a second-order model we need a different kind of design: a 2^2 factorial with 5 center points does not have enough points. We augment the original design with four axial points to get a central composite design, or CCD. The data for this third (CCD) experiment are

Natural variables    Coded variables      Response
ξ1       ξ2          X1        X2         Y
80       170         -1        -1         76.5
80       180         -1        +1         77.0
90       170         +1        -1         78.0
90       180         +1        +1         79.5
85       175          0         0         79.9
85       175          0         0         80.3
85       175          0         0         80.0
85       175          0         0         79.7
85       175          0         0         79.8
92.07    175          1.414     0         78.4
77.93    175         -1.414     0         75.6
85       182.07       0         1.414     78.5
85       167.93       0        -1.414     77.0

[Figure: the CCD in coded units; factorial points (±1, ±1), center point (0, 0), and axial points (±1.414, 0) and (0, ±1.414)]

In the ANOVA for the CCD fit, the regression (SS 28.25, 5 df, MS 5.649) is highly significant (p < 0.001), with A (time), B (temperature), A², and B² each significant at p < 0.001 and the AB interaction not significant (p = 0.103). The residual is SS 0.50 on 7 df, splitting into lack of fit (SS 0.28, 3 df) and pure error (SS 0.22, 4 df); the total is SS 28.75 on 12 df. So the quadratic model is significant, but not the interaction. The final equation in terms of coded values is

$$\hat Y = 79.94 + 0.995X_1 + 0.515X_2 - 1.376X_1^2 - 1.001X_2^2 + 0.250X_1X_2$$

and in terms of the natural values,

Yield = -1430.69 + 7.81(time) + 13.27(temp) - 0.055(time²) - 0.04(temp²) + 0.01(time × temp)

The optimum turns out to be very near 175˚ and 85 minutes of reaction time, where the response is maximized.
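The coded second-order coefficients above can be reproduced by ordinary least squares on the 13 CCD runs; a minimal sketch:

```python
import numpy as np

# Minimal sketch: refitting the coded second-order model from the CCD runs.
a = 1.414
X = np.array([(-1, -1), (-1, 1), (1, -1), (1, 1),
              (0, 0), (0, 0), (0, 0), (0, 0), (0, 0),
              (a, 0), (-a, 0), (0, a), (0, -a)], dtype=float)
y = np.array([76.5, 77.0, 78.0, 79.5,
              79.9, 80.3, 80.0, 79.7, 79.8,
              78.4, 75.6, 78.5, 77.0])

x1, x2 = X.T
M = np.column_stack([np.ones(len(y)), x1, x2, x1**2, x2**2, x1 * x2])
coef, *_ = np.linalg.lstsq(M, y, rcond=None)
for name, c in zip(["b0", "b1", "b2", "b11", "b22", "b12"], coef):
    print(f"{name:3s} = {c: .3f}")   # ~ 79.94, 0.995, 0.515, -1.376, -1.001, 0.25
```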
When the experiment is relatively close to the optimum, the second-order model is usually sufficient to find it:

$$\hat Y = \beta_0 + \sum_{j=1}^{J}\beta_j X_j + \sum_{j=1}^{J}\beta_{jj}X_j^2 + \sum_{j<k}\beta_{jk}X_jX_k$$

How do we use this model to find the optimum point? The optimum is the set of X_k at which all the partial derivatives vanish:

$$\frac{\partial\hat Y}{\partial X_1} = \frac{\partial\hat Y}{\partial X_2} = \dots = \frac{\partial\hat Y}{\partial X_K} = 0$$

This point is called the stationary point. It could be a maximum, a minimum, or a saddle point. We write the second-order model in matrix notation:

$$\hat Y = \beta_0 + X'B + X'QX$$

where X = (X1, ..., XK)' is the vector of the K variables, B = (β̂1, ..., β̂K)' is the vector of first-order regression coefficients, and Q is the K × K symmetric matrix whose diagonal elements are the pure quadratic coefficients β̂kk and whose off-diagonal elements are half the interaction coefficients, β̂jk/2. The derivative of Ŷ with respect to X, equated to 0, is

$$\frac{\partial\hat Y}{\partial X} = B + 2QX = 0$$

so the stationary point is the solution

$$X_s = -\tfrac{1}{2}Q^{-1}B$$

and the response at the stationary point is

$$\hat Y_s = \beta_0 + \tfrac{1}{2}X_s'B$$

After we find the stationary point, we want to know whether it is a maximum, a minimum, or a saddle point, and we want to know the relative sensitivity of the response to the variables X. The easiest way is to examine the response surface or its contour plot. With only two variables this is easy, but with more we need another method, the formal method called canonical analysis. It consists of first translating the model so that the stationary point is at the origin, then rotating the axes about this new origin until they are parallel to the principal axes of the fitted response surface. The result of this translation and rotation is the canonical form of the model,

$$\hat Y = \hat Y_s + \lambda_1 w_1^2 + \lambda_2 w_2^2 + \dots + \lambda_K w_K^2$$

where the {w_j} are the transformed, rotated independent variables and the {λ_j} are the eigenvalues of the matrix Q. If all the λ_j are positive, the stationary point is a minimum. If all are negative, it is a maximum. If they have mixed signs, it is a saddle point. The magnitudes of the λ_j matter as well: the surface is steepest in the w_j direction for which |λ_j| is greatest.

Continuing our example, recall the final equation in coded values. In this case,

$$X = \begin{pmatrix}X_1\\ X_2\end{pmatrix},\qquad B = \begin{pmatrix}0.995\\ 0.515\end{pmatrix},\qquad Q = \begin{pmatrix}-1.376 & 0.125\\ 0.125 & -1.001\end{pmatrix}$$

so the stationary point is

$$X_s = -\tfrac{1}{2}Q^{-1}B = \begin{pmatrix}0.389\\ 0.306\end{pmatrix}$$

in coded values. In the natural values,

ξ1 = 85 + 5(0.389) = 86.95 ≈ 87 minutes,   ξ2 = 175 + 5(0.306) = 176.53 ≈ 176.5˚

So the stationary point is about 87 minutes of reaction time and 176.5˚ temperature. Now we use canonical analysis to see whether this is a max, min, or saddle point. The roots of the determinantal equation |Q - λI| = 0 come from the quadratic

$$\lambda^2 + 2.3788\lambda + 1.3639 = 0$$

whose roots are λ1 = -0.9641 and λ2 = -1.4147. From these roots, we get the canonical form of the fitted model,

$$\hat Y = 80.21 - 0.9641\,w_1^2 - 1.4147\,w_2^2$$

Since λ1 and λ2 are both negative within the region of exploration, the stationary point is a maximum, and the yield is more sensitive to changes in w2 than to changes in w1.

Designs used to fit response surfaces are called response surface designs, and it is critical to have the right design if you want the best fitted response surface.
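The canonical analysis above reduces to one linear solve and one eigenvalue computation; a minimal sketch with numpy:

```python
import numpy as np

# Minimal sketch of the canonical analysis above.
b0 = 79.94
b = np.array([0.995, 0.515])                 # linear coefficients
Q = np.array([[-1.376, 0.125],               # diagonal: quadratic coefficients
              [0.125, -1.001]])              # off-diagonal: interaction / 2

x_s = -0.5 * np.linalg.solve(Q, b)           # stationary point, coded units
y_s = b0 + 0.5 * x_s @ b                     # predicted response there
lam = np.linalg.eigvalsh(Q)                  # eigenvalues of the symmetric Q

print(f"stationary point (coded): {np.round(x_s, 3)}")   # ~ [0.389, 0.306]
print(f"time = {85 + 5 * x_s[0]:.1f} min, temp = {175 + 5 * x_s[1]:.1f}")
print(f"Y_s = {y_s:.2f}, eigenvalues = {np.round(lam, 4)}")
# Both eigenvalues negative, so the stationary point is a maximum.
```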
Returning to design selection: the best response surface designs have the following features.

1. They provide a reasonable distribution of points in the region of interest.
2. They allow investigation of lack of fit.
3. They allow experiments to be performed in blocks.
4. They allow designs of higher order to be built up sequentially.
5. They provide an internal estimate of error.
6. They do not require a large number of runs.
7. They do not require too many levels of the independent variables.
8. They keep the calculation of the model parameters simple.

Designs for first-order models include the orthogonal designs, a class of designs that minimize the variance of the β coefficients. A first-order design is orthogonal if the sums of cross-products (equivalently, the covariances) of the columns of independent variables are all 0. These include replicated 2^k designs and 2^k designs with replicated center points.

The most popular type of design for second-order models is the central composite design, CCD: a 2^k factorial design with n_c center points and 2k axial points added. This is the design we used in our example after the first-order design proved inadequate. Two parameters must be specified in a CCD:

(1) α, the distance of the axial points from the design center. Taking α = n_f^(1/4), where n_f is the number of factorial points in the design, makes the design rotatable.
(2) n_c, the number of center-point runs, which usually equals 3 or 5.

Another class of designs used for fitting response surfaces is the Box-Behnken designs.

[Figure: the three-factor Box-Behnken design; points at the midpoints of the edges of the cube, plus center points, with no points at the corners]

Note that there are no points on the corners of the design region. The three-variable Box-Behnken design is

Run   X1   X2   X3
1     -1   -1    0
2     -1   +1    0
3     +1   -1    0
4     +1   +1    0
5     -1    0   -1
6     -1    0   +1
7     +1    0   -1
8     +1    0   +1
9      0   -1   -1
10     0   -1   +1
11     0   +1   -1
12     0   +1   +1
13     0    0    0
14     0    0    0
15     0    0    0

There are other response surface designs as well, but the most commonly used are the CCD and the Box-Behnken. In all of these designs, the levels of each factor are independent of the levels of the other factors. But what if the levels cannot be independent? In mixture experiments, the factors are components of a mixture, so their levels cannot be independent: together they constitute the entire mixture. If you have more of component A's level, you must have less of some other component's. That is,

X1 + X2 + X3 + ... + XK = 1

where there are K components in the mixture. If K = 2, the factor space is the set of points on the line segment X1 + X2 = 1, with each component bounded by 0 and 1.

[Figure: for K = 2, the feasible region is the segment of the line X1 + X2 = 1 between the points (1, 0) and (0, 1)]

If K = 3, the mixture space is a triangle, whose vertices are mixtures of 100% of a single component.

[Figure: for K = 3, the feasible region is the triangle with vertices X1 = 1, X2 = 1, and X3 = 1]

With 3 components, the experimental region can be represented on trilinear coordinate paper.

[Figure: the three-component mixture space drawn on trilinear coordinates, with the X1, X2, and X3 axes marked at 0.2 and 0.8]

The type of design used for studying mixtures is called a simplex design. In a simplex lattice design for K components, each component takes the m + 1 equally spaced values

X_k = 0, 1/m, 2/m, ..., 1   for k = 1, 2, ..., K

and the design consists of all combinations of these values that sum to 1. So the kth component may be the whole mixture (X_k = 1), may not be used at all (X_k = 0), or may be somewhere in between. If there are K = 3 components in the mixture and m = 2, the three levels of each component are X_k = 0, 1/2, 1, and the simplex lattice consists of the following six runs:

(1, 0, 0)   (0, 1, 0)   (0, 0, 1)   (1/2, 1/2, 0)   (1/2, 0, 1/2)   (0, 1/2, 1/2)

They are shown in the figure below.
[Figure: the {3, 2} simplex lattice; the pure blends (1,0,0), (0,1,0), (0,0,1) at the vertices of the triangle, and the binary blends (1/2,1/2,0), (1/2,0,1/2), (0,1/2,1/2) at the midpoints of its sides]

Note that the three pure blends occur at the vertices and the other three points occur at the midpoints of the three sides. A criticism of the simplex lattice is that all of its runs occur on the boundary of the region and thus include only K-1 of the K components: when K = 3, the pure blends include only one of the 3 components and the other runs include only 2 of the three. If you augment the simplex lattice with additional points in the interior of the region, it becomes a simplex centroid design, in which some mixtures consist of portions of all K components.

[Figure: the augmented design for K = 3; the six lattice points plus the centroid (1/3, 1/3, 1/3), where the mixture is one third of each component]

Mixture models are constrained by

$$\sum_{k=1}^{K}X_k = 1$$

which makes them different from the usual models for response surfaces. The linear model is

$$\hat Y = \sum_{k=1}^{K}\beta_k X_k$$

where the β coefficients represent the responses to the pure blends, in which only one component is present. The quadratic model is

$$\hat Y = \sum_{k=1}^{K}\beta_k X_k + \sum_{j<k}\beta_{jk}X_jX_k$$

where a positive β_jk in the nonlinear portion represents synergistic blending and a negative β_jk represents antagonistic blending. Higher-order terms, including cubics, are frequently necessary in mixture work, because the process is generally very complex, with numerous points in the interior of the simplex region.

Let's look at an example of a mixture experiment with 3 components, polyethylene (X1), polystyrene (X2), and polypropylene (X3), which are blended to make yarn for draperies. The response is the yarn elongation, measured in kilograms of force applied. A simplex lattice design is used, with 2 replicates at each of the pure blends and 3 replicates at each of the binary blends. The averaged data are

Design point (X1, X2, X3)   Average response
(1, 0, 0)                   11.7
(1/2, 1/2, 0)               15.3
(0, 1, 0)                    9.4
(0, 1/2, 1/2)               10.5
(0, 0, 1)                   16.4
(1/2, 0, 1/2)               16.9

The fitted model is

$$\hat Y = 11.7X_1 + 9.4X_2 + 16.4X_3 + 19.0X_1X_2 + 11.4X_1X_3 - 9.6X_2X_3$$

and it turns out to be an adequate representation of the response. Since component 3 (polypropylene) has the largest β, it produces the yarn with the highest elongation. Since β12 and β13 are positive, a blend of components 1 and 2, or of components 1 and 3, produces a higher elongation than linear blending of the pure components would predict. This is an example of synergistic effects. But a blend of components 2 and 3 is antagonistic (it produces less elongation than linear blending would predict), because β23 is negative.
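As a closing sketch, the fitted mixture model above can be recovered exactly from the six averaged responses, since the {3, 2} lattice is saturated for the quadratic model:

```python
import numpy as np

# Minimal sketch: recovering the quadratic mixture model above from the
# six averaged responses (an exact solve: 6 design points, 6 terms).
pts = np.array([(1, 0, 0), (0.5, 0.5, 0), (0, 1, 0),
                (0, 0.5, 0.5), (0, 0, 1), (0.5, 0, 0.5)], dtype=float)
y = np.array([11.7, 15.3, 9.4, 10.5, 16.4, 16.9])

x1, x2, x3 = pts.T
# Columns: X1, X2, X3, X1X2, X1X3, X2X3 (no intercept in a mixture model).
M = np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])
coef, *_ = np.linalg.lstsq(M, y, rcond=None)

for name, c in zip(["b1", "b2", "b3", "b12", "b13", "b23"], coef):
    print(f"{name} = {c: .1f}")   # expect 11.7, 9.4, 16.4, 19.0, 11.4, -9.6
```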