FRACTIONAL FACTORIAL ($2^{p-q}$) STUDIES

Motivation: For $p$ factors, even $2^p$ gets big fast: for $p = 10$, $2^p = 1024$.

Example (Hendrix 1979 Chemtech)

Factor  Name                          Levels
A       Coating Roll Temp             115° vs 125°
B       Solvent                       Recycled vs Refined
C       Polymer X-12 Preheat          No vs Yes
D       Web Type                      LX-14 vs LB-17
E       Coating Roll Tension          30 vs 40
F       Number of Chill Rolls         1 vs 2
G       Drying Roll Temp              75° vs 80°
H       Humidity of Air Feed          75% vs 90%
J       Feed Air to Dryer Preheat     Yes vs No
K       Dibutylfutile in Formula      12% vs 15%
L       Surfactant in Formula         .5% vs 1%
M       Dispersant in Formula         .1% vs .2%
N       Wetting Agent in Formula      1.5% vs 2.5%
O       Time Lapse                    10 min vs 30 min
P       Mixer Agitation Speed         100 rpm vs 250 rpm

y = a measure of product cold crack resistance

Here $2^{15} = 32{,}768$ combinations!!!

"Solution": Collect data for only some (a fraction) of all possible combinations of levels of the factors.

Qualitative Points That Ought to be "Obvious" a priori:
• necessary information loss (relative to the full factorial)
• some ambiguity inevitable because of the loss
• careful planning and wise analysis needed to hold this to a minimum

Example (hypothetical) $2^{2-1}$ ... a half fraction of a 2 × 2 factorial
[Figure: the combinations (1), a, b and ab at the corners of a square with axes Factor A (-/+) and Factor B (-/+); two of the four corners make up the half fraction.]

Example (hypothetical) $2^{3-1}$ ... Suppose that the $2^3$ factorial effects and combination means are as below:
$$\mu_{...} = 10,\ \alpha_2 = 3,\ \beta_2 = 1,\ \gamma_2 = 2,\ \alpha\beta_{22} = 2,\ \alpha\gamma_{22} = 0,\ \beta\gamma_{22} = 0,\ \alpha\beta\gamma_{222} = 0$$
[Figure: cube with axes Factor A, Factor B and Factor C, each at (-) and (+), with corner means $\mu_{(1)} = 6$, $\mu_a = 8$, $\mu_b = 4$, $\mu_{ab} = 14$, $\mu_c = 10$, $\mu_{ac} = 12$, $\mu_{bc} = 8$, $\mu_{abc} = 18$.]

Suppose further that one gets data adequate to essentially reveal the mean responses for combinations a, b, c and abc (4 of the 8 corners of the cube) but has no data on the other combinations.

$\alpha_2$ = "right face average" - "grand average"
(the right face holds the 4 combinations with A at its high level).

A "half-fraction version" of this might be
$\alpha_2^*$ = "available right face average" - "available grand average" $= \frac{8 + 18}{2} - \frac{8 + 4 + 10 + 18}{4} = 13 - 10 = 3$

Here $\alpha_2^* = \alpha_2$!!! Something for nothing?
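The face-average arithmetic above is easy to check mechanically. A minimal Python sketch (the helper name main_effect is ours, purely for illustration) computes a main effect as "face average minus grand average" over whatever combination means happen to be available, first for the full $2^3$ and then for the half fraction a, b, c, abc:

```python
# Combination means for the hypothetical 2^3 example. Each key lists the
# factors at their high level; "" stands for combination (1).
means = {"": 6, "a": 8, "b": 4, "ab": 14, "c": 10, "ac": 12, "bc": 8, "abc": 18}


def main_effect(mu: dict[str, float], letter: str) -> float:
    """Face average (combinations with `letter` high) minus grand average."""
    high = [m for combo, m in mu.items() if letter in combo]
    return sum(high) / len(high) - sum(mu.values()) / len(mu)


# Full 2^3 factorial: alpha_2 = 13 - 10.
alpha2 = main_effect(means, "a")

# Half fraction: only combinations a, b, c and abc are observed.
half = {k: means[k] for k in ("a", "b", "c", "abc")}
alpha2_star = main_effect(half, "a")  # (8 + 18)/2 - (8 + 4 + 10 + 18)/4

print(alpha2, alpha2_star)  # 3.0 3.0
```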
A similar calculation for the C main effect, however, gives
$\gamma_2^*$ = "available back face average" - "available grand average" $= \frac{10 + 18}{2} - 10 = 4$ ????
(the back face holds the combinations with C at its high level), so $\gamma_2^* \neq \gamma_2 = 2$.

The general story behind this situation is that for this $2^{3-1}$ fractional factorial
$$\alpha_2^* = \alpha_2 + \beta\gamma_{22} \quad \text{and} \quad \gamma_2^* = \gamma_2 + \alpha\beta_{22}$$
(here $3 = 3 + 0$ and $4 = 2 + 2$).
Confounding/aliasing ... ambiguity

Issues to be Addressed in Order to Use $2^{p-q}$ Fractional Factorials:
• how to rationally choose $2^{p-q}$ out of $2^p$ combinations for study
• how to determine the corresponding aliasing/confounding pattern
• how to do data analysis
First consider these in the context of half fractions ... then for general $q$.

Choice of standard half fractions of $2^p$ factorials: Write out signs for specifying levels for all possible combinations of the "first" $p - 1$ factors. Then "multiply" these together for a given combination of the "first" factors to arrive at a corresponding level to use for the "last" factor.

Example ($2^{4-1}$) With 4 two-level factors A, B, C and D one proceeds as per

A  B  C  Product (used for D)  Combination
-  -  -  -                     (1)
+  -  -  +                     ad
-  +  -  +                     bd
+  +  -  -                     ab
-  -  +  +                     cd
+  -  +  -                     ac
-  +  +  -                     bc
+  +  +  +                     abcd

Example (Snee in 1985 ASQC Technical Supplement)

Factor  Name                Levels (- vs +)
A       Solvent/Reactant    low vs high
B       Catalyst/Reactant   .025 vs .035
C       Temperature         150° vs 160°
D       Reactant Purity     92% vs 96%
E       pH of Reactant      8.0 vs 8.7

y = color index

combination  e     a     b      abe    c     ace   bce    abc
y            -.63  2.51  -2.68  -1.66  2.06  1.22  -2.09  1.93

combination  d     ade   bde    abd    cde   acd   bcd    abcde
y            6.79  6.47  3.45   5.68   5.22  9.38  4.30   4.05

These are data from half of all 32 combinations of 2 levels of each of the 5 factors (half of all possible labels of combinations based on the 5 letters a, b, c, d and e are given above, namely those involving an odd number of letters). Snee followed the standard recommendation for choosing the half fraction.

Determining the "alias structure" of the half fraction (the implied pattern of ambiguities): Use a method of formal multiplication, beginning from a so-called "generator" that represents the way in which the half fraction was chosen.
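The half-fraction recipe described above (multiply the signs of the first $p - 1$ factors to set the level of the last one) is mechanical enough to sketch in Python. The function name standard_half_fraction is hypothetical, and the sketch assumes factors are labelled a, b, c, ... in order, listing combinations in Yates order for the first $p - 1$ factors:

```python
import string
from itertools import product
from math import prod


def standard_half_fraction(p: int) -> list[str]:
    """Combination labels of the standard half fraction of a 2^p factorial.

    Signs (+1/-1) are written out for all combinations of the first p - 1
    factors in Yates order; their product sets the level of the last factor.
    """
    letters = string.ascii_lowercase[:p]  # assumed labels a, b, c, ...
    labels = []
    for signs in product((-1, 1), repeat=p - 1):
        signs = signs[::-1]               # make the first factor vary fastest
        levels = signs + (prod(signs),)   # last factor = product of the signs
        label = "".join(letter for letter, s in zip(letters, levels) if s == 1)
        labels.append(label or "(1)")
    return labels


print(standard_half_fraction(4))
# ['(1)', 'ad', 'bd', 'ab', 'cd', 'ac', 'bc', 'abcd']
```

With $p = 4$ this returns the last column of the $2^{4-1}$ table; with $p = 5$ it reproduces Snee's 16 combinations, every one of which has an odd number of letters.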
The generator is of the form
    name of "last" factor ↔ product of names of "first" factors
The rules of multiplication are that
• letter × I ↔ the same letter
• letter × same letter ↔ I

Example (the $2^{3-1}$ numerical example used above) The generator here is
    C ↔ AB
(C is at its high level exactly when the signs of A and B multiply to +). We can multiply through by C to obtain the so-called "defining relation"
    I ↔ ABC
This first says that the ABC 3-factor interaction $\alpha\beta\gamma_{222}$ is aliased with the grand mean. That is, only $\mu_{...} + \alpha\beta\gamma_{222}$ can be estimated, not $\mu_{...}$ or $\alpha\beta\gamma_{222}$ alone.

Multiplying through the defining relation by any set of letters of interest produces a statement of what effect(s) are "aliased with" the corresponding effect. For example, multiplying by A gives A ↔ A·ABC = BC, i.e. A ↔ BC (read "the A main effect is aliased with the BC 2-factor interaction"). Similarly C ↔ AB, as was illustrated earlier. In fact, the whole alias structure is
    I ↔ ABC
    A ↔ BC
    B ↔ AC
    C ↔ AB
The $2^3$ effects are aliased in 4 pairs. The technical meaning of aliasing is that only sums of effects can be estimated, not individual effects.

Example (the $2^{4-1}$ again) With the generator D ↔ ABC the defining relation is
    I ↔ ABCD
From this, e.g., we see that the AB 2-factor interaction is aliased with the CD 2-factor interaction.

Example Snee's $2^{5-1}$ study had generator E ↔ ABCD and hence defining relation
    I ↔ ABCDE
From this one sees, e.g., that the AB 2-factor interaction is aliased with the CDE 3-factor interaction.

Data Analysis for Standard Half Fractions: Initially, temporarily ignore the "last" factor and, treating the data as a full factorial in the "first" $p - 1$ factors, judge the statistical significance and practical importance of estimates derived from the Yates algorithm. Then interpret these estimates in light of the alias structure as estimates of appropriate sums of the $2^p$ factorial effects.

Where there is some replication (not all $2^{p-1}$ sample sizes are 1), confidence intervals can be made for the (sums of) effects:
$$\widehat{\text{effect}} \pm t\, s_{\text{pooled}}\, \frac{1}{2^{p-1}} \sqrt{\frac{1}{n_{(1)}} + \frac{1}{n_a} + \frac{1}{n_b} + \frac{1}{n_{ab}} + \cdots}$$
where
$$s^2_{\text{pooled}} = \frac{\sum (n_{\text{combination}} - 1)\, s^2_{\text{combination}}}{\sum (n_{\text{combination}} - 1)}$$
and the appropriate degrees of freedom for $t$ are $\sum (n_{\text{combination}} - 1) = n - 2^{p-1}$.

Lacking any replication, normal plotting of the output of the Yates algorithm (ignoring the "last" factor) can be used in judging statistical significance.

Example (another hypothetical $2^{3-1}$) Suppose
    n_a = 1,    y_a = 5
    n_b = 2,    ȳ_b = 3,      s²_b = 1.5
    n_abc = 3,  ȳ_abc = 5.5,  s²_abc = 1.8
    n_c = 1,    y_c = 2.5
Yates (ignoring C, in the standard order c, a, b, abc for A and B) is applied to: 2.5, 5, 3, 5.5.

$$s^2_{\text{pooled}} = \frac{0 + (2-1)(1.5) + 0 + (3-1)(1.8)}{0 + (2-1) + 0 + (3-1)} = \frac{5.1}{3} = 1.7$$

Intervals ($t$ with 3 degrees of freedom):
$$\widehat{\text{effect}} \pm t\, s_{\text{pooled}}\, \frac{1}{2^{3-1}} \sqrt{\frac{1}{1} + \frac{1}{2} + \frac{1}{1} + \frac{1}{3}}$$

Example Snee's $2^{5-1}$ study had no replication. Ignoring factor E temporarily, Yates can be applied to the 16 responses exactly as listed earlier. The result is estimates as below.

combination  y      estimate (16 divisor)
e            -.63   2.875
a            2.51   .823
b            -2.68  -1.253
abe          -1.66  .055
c            2.06   .384
ace          1.22   .064
bce          -2.09  .041
abc          1.93   .001
d            6.79   2.793
ade          6.47   -.095
bde          3.45   -.045
abd          5.68   -.288
cde          5.22   -.314
acd          9.38   .186
bcd          4.30   -.306
abcde        4.05   -.871

Normal plot the (last 15) estimates ...

Tentative engineering conclusion of Snee study: For uniform color index, attention must be paid to controlling/reducing variation in
1st, Factor D, Reactant Purity
2nd, Factor B, Catalyst/Reactant Ratio
3rd, Factor E, pH of Reactant
4th, Factor A, Solvent/Reactant Ratio

Smaller (than half) Fractions of $2^p$ Studies ($2^{p-q}$ Fractional Factorials)
    q = 1  half fractions
    q = 2  quarter fractions
    q = 3  eighth fractions
    etc.
Issues (still):
• how to rationally choose $2^{p-q}$ of the $2^p$ possible combinations of levels of $p$ two-level factors
• how to determine the corresponding aliasing/confounding pattern
• how to do data analysis
Answers: the natural generalizations of the half fraction answers just discussed

Choice of standard $1/2^q$ fractions of $2^p$ factorials: Write out signs for specifying levels for all possible combinations of the "first" $p - q$ factors.
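The half-fraction analysis recipe (Yates on the combination means, then the pooled-variance interval) can be sketched in Python. The helper names are hypothetical, and $t = 3.182$ (the 97.5% point of $t$ with 3 degrees of freedom, i.e. 95% two-sided intervals) is an assumed choice, since the hypothetical example does not fix a confidence level. The same yates function, applied to Snee's 16 responses, reproduces the estimates tabled above:

```python
from math import sqrt


def yates(values: list[float]) -> list[float]:
    """Yates algorithm for a 2^k layout listed in standard (Yates) order.

    Returns the grand-mean estimate followed by the effect estimates
    (divisor 2^k), in the same standard order (I, A, B, AB, C, ...).
    """
    n = len(values)
    k = n.bit_length() - 1
    if n == 0 or 2 ** k != n:
        raise ValueError("need 2^k values")
    col = list(values)
    for _ in range(k):
        pairs = range(0, n, 2)
        col = [col[i] + col[i + 1] for i in pairs] + [col[i + 1] - col[i] for i in pairs]
    return [c / n for c in col]


def pooled_variance(ns: list[int], s2s: list[float]) -> tuple[float, int]:
    """s^2_pooled and its degrees of freedom, sum(n - 1)."""
    df = sum(n - 1 for n in ns)
    return sum((n - 1) * s2 for n, s2 in zip(ns, s2s)) / df, df


def half_width(t: float, s_pooled: float, ns: list[int]) -> float:
    """The "±" part: t * s_pooled * (1 / number of combinations) * sqrt(sum 1/n)."""
    return t * s_pooled / len(ns) * sqrt(sum(1 / n for n in ns))


# Hypothetical 2^(3-1) example, in Yates order for A and B: c, a, b, abc.
means = [2.5, 5, 3, 5.5]
ns = [1, 1, 2, 3]
s2s = [0.0, 0.0, 1.5, 1.8]             # single observations contribute 0 df
est = yates(means)                     # [4.0, 1.25, 0.25, 0.0]
s2p, df = pooled_variance(ns, s2s)     # 1.7 (up to rounding) with 3 df
hw = half_width(3.182, sqrt(s2p), ns)  # assumed 95% level, t with 3 df

# Snee's 2^(5-1) data, ignoring E, in Yates order for A, B, C, D.
snee = [-.63, 2.51, -2.68, -1.66, 2.06, 1.22, -2.09, 1.93,
        6.79, 6.47, 3.45, 5.68, 5.22, 9.38, 4.30, 4.05]
snee_est = yates(snee)
print([round(e, 3) for e in snee_est])
```

The interval half-width for the hypothetical example comes out near 1.75, so under that assumed 95% level only the A estimate (1.25) would fall well short of it as well; the point of the sketch is the mechanics, not that conclusion.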
Pick $q$ different groups of the first $p - q$ factors. Use the products of the signs corresponding to members of the groups to specify levels for the "last" $q$ factors.

Example (Best and Hanson 1986 ASA Meeting Presentation) development of a catalyst for producing ethyleneamines by the amination of monoethanolamine ... $p = 5$ factors

Factor  Name              Levels
A       Ni/Re Ratio       2/1 vs 20/1
B       Precipitant       (NH4)2CO3 vs none
C       Calcining Temp    300° vs 500°
D       Reduction Temp    300° vs 500°
E       Support Used      alpha-alumina vs silica alumina

y = % water produced

$q = 2$, i.e. a 1/4 fraction contemplated ... i.e. $2^{5-2} = 8$ out of the $2^5 = 32$ possible A, B, C, D, E combinations.

The (somewhat arbitrary) choice was made to use ABC sign products to choose levels of D, and BC sign products to choose levels of E. (Other choices are possible and lead to different aliasing patterns that might, for some other studies, be preferred by the engineer in charge.)

A  B  C  ABC Product (for D)  BC Product (for E)  Combination
-  -  -  -                    +                   e
+  -  -  +                    +                   ade
-  +  -  +                    -                   bd
+  +  -  -                    -                   ab
-  -  +  +                    -                   cd
+  -  +  -                    -                   ac
-  +  +  -                    +                   bce
+  +  +  +                    +                   abcde

The last column specifies those 8 combinations actually used in the study. The data obtained were as below.

combination  y                    ȳ       s²
e            8.70, 11.60, 9.00    9.767   2.543
ade          26.80                26.800
bd           24.88                24.880
ab           33.15                33.150
cd           28.90, 30.98         29.940  2.163
ac           30.20                30.200
bce          8.00, 8.69           8.345   .238
abcde        29.30                29.300

Determining the "alias structure" of the $1/2^q$ fraction: Use the method of formal multiplication, beginning from the $q$ generators that represent the way in which the $2^{p-q}$ fraction was chosen. To find the defining relation (the list of all products 'equivalent to' I), first convert the generators to statements of products equivalent to I, and then multiply these in pairs, then in triples, then in sets of four, etc. The letter I will have $2^q - 1$ equivalent products ... i.e. effects are aliased in $2^{p-q}$ different groups of $2^q$ each.

Example (Hanson and Best again)
    D ↔ ABC so I ↔ ABCD
    E ↔ BC so I ↔ BCE
Further, multiplying these two we get
    I · I ↔ (ABCD) · (BCE), i.e.
    I ↔ ADE
So the defining relation for the catalyst study is
    I ↔ ABCD ↔ BCE ↔ ADE
and therefore effects are aliased in 8 groups of 4. For example, multiplying through the defining relation by A gives
    A ↔ BCD ↔ ABCE ↔ DE
and we see that, for example, the A main effect is aliased with the DE 2-factor interaction.

Data analysis for standard $2^{p-q}$ studies: Initially ignore the "last" $q$ factors and, treating the data as a full factorial in the "first" $p - q$ factors, judge the statistical significance and practical importance of estimates produced by the Yates algorithm. Then interpret these in light of the alias structure as estimates of appropriate sums of the $2^p$ factorial effects.

With some replication, confidence intervals can be made for the (sums of) effects and used in the process of judging statistical significance:
$$\widehat{\text{effect}} \pm t\, s_{\text{pooled}}\, \frac{1}{2^{p-q}} \sqrt{\frac{1}{n_{(1)}} + \frac{1}{n_a} + \frac{1}{n_b} + \frac{1}{n_{ab}} + \cdots}$$
where (as always)
$$s^2_{\text{pooled}} = \frac{\sum (n_{\text{combination}} - 1)\, s^2_{\text{combination}}}{\sum (n_{\text{combination}} - 1)}$$
and the appropriate degrees of freedom for $t$ are $\sum (n_{\text{combination}} - 1) = n - 2^{p-q}$.

Lacking any replication, one can normal plot estimates, looking for ones clearly of larger order of magnitude than the rest (and therefore larger than background noise as well).

Example (Hanson and Best catalyst study) The 8 sample means, ȳ, listed before were in Yates standard order for factors A, B and C (the "first" $p - q = 3$), ignoring D and E (the "last" $q = 2$). So the Yates algorithm can be applied to them in the order listed.

combination  ȳ       estimate  sum estimated
e            9.767   24.048    grand mean + aliases
ade          26.800  5.815     A main effect + aliases
bd           24.880  -.129     B main effect + aliases
ab           33.150  1.492     AB interaction + aliases
cd           29.940  .399      C main effect + aliases
ac           30.200  -.511     AC interaction + aliases
bce          8.345   -5.495    BC interaction + aliases
abcde        29.300  3.682     ABC interaction + aliases

Statistical significance/detectability of these?
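Formal multiplication amounts to cancelling letters in pairs, so the defining relation and the alias groups can be generated mechanically. A small Python sketch (function names hypothetical; it assumes, as in these notes, that no factor is named I) that rebuilds the catalyst defining relation and the "sum estimated" column:

```python
from itertools import combinations


def multiply(*words: str) -> str:
    """Formal multiplication of effect names: a letter times itself gives I,
    and I times anything leaves it unchanged (so no factor may be named I)."""
    letters: set[str] = set()
    for word in words:
        for letter in word.replace("I", ""):
            letters ^= {letter}  # toggle membership: squares cancel
    return "".join(sorted(letters)) or "I"


def defining_relation(generators: dict[str, str]) -> list[str]:
    """All 2^q - 1 words equivalent to I, from generators {last: product}."""
    base = [multiply(last, first) for last, first in generators.items()]
    relation = []
    for r in range(1, len(base) + 1):  # singles, then pairs, triples, ...
        for group in combinations(base, r):
            relation.append(multiply(*group))
    return relation


def alias_group(effect: str, relation: list[str]) -> list[str]:
    """The effect together with everything it is aliased with."""
    return [effect] + [multiply(effect, word) for word in relation]


# Hanson and Best catalyst study: D <-> ABC, E <-> BC.
relation = defining_relation({"D": "ABC", "E": "BC"})
print(relation)  # ['ABCD', 'BCE', 'ADE']

# Yates-order column labels for A, B, C, with the estimates tabled above.
labels = ["I", "A", "B", "AB", "C", "AC", "BC", "ABC"]
estimates = [24.048, 5.815, -.129, 1.492, .399, -.511, -5.495, 3.682]
for label, est in zip(labels, estimates):
    print(f"{est:8.3f}  " + " + ".join(alias_group(label, relation)))
```

For a $1/2^q$ fraction the list returned by defining_relation always has $2^q - 1$ words, one per nonempty subset of the generators.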
$$s^2_{\text{pooled}} = \frac{(3-1)(2.543) + (2-1)(2.163) + (2-1)(.238)}{(3-1) + (2-1) + (2-1)} = 1.872$$
So $s_{\text{pooled}} = \sqrt{1.872} = 1.368$, and this can be used as a measure of background noise and as a basic ingredient of confidence intervals for the sums of effects.

$s_{\text{pooled}}$ has 4 associated degrees of freedom. So if, e.g., 95% confidence intervals for the sums of effects are desired, the "±" part of the confidence interval formula becomes
$$\pm 2.776(1.368)\, \frac{1}{2^3} \sqrt{\frac{1}{3} + \frac{1}{1} + \frac{1}{1} + \frac{1}{1} + \frac{1}{2} + \frac{1}{1} + \frac{1}{2} + \frac{1}{1}}$$
i.e. $\pm 1.195$.

We therefore might judge any estimate larger in absolute value than 1.195 to represent a sum of effects clearly large enough to see above the background experimental variation. Note the "detectable" sums are (in order of magnitude):

sum                                                                                           estimate
$\alpha_2 + \beta\gamma\delta_{222} + \alpha\beta\gamma\varepsilon_{2222} + \delta\varepsilon_{22}$            5.815
$\beta\gamma_{22} + \alpha\delta_{22} + \varepsilon_2 + \alpha\beta\gamma\delta\varepsilon_{22222}$            -5.495
$\alpha\beta\gamma_{222} + \delta_2 + \alpha\varepsilon_{22} + \beta\gamma\delta\varepsilon_{2222}$            3.682
$\alpha\beta_{22} + \gamma\delta_{22} + \alpha\gamma\varepsilon_{222} + \beta\delta\varepsilon_{222}$           1.492

Tentative interpretations?
• A main effect ??
• E main effect ??
• D main effect ?? or AE interaction ??
• ?????? (happily much smaller than the other sums)
(And there are other equally plausible interpretations of the 3 large sums!) In fact, a follow-up study confirmed the importance of the D main effect.

If the A (Ni/Re ratio) main effect, the E (Support Type) main effect and the D (Reduction Temp) main effect are indeed the most important determiners of y, and large y is desirable, the signs of the estimates indicate the need for "high A" (20/1 Ni/Re ratio), "low E" (alpha-alumina support) and "high D" (500° reduction temp).

Notice!!! The larger $q$, the larger the inevitable ambiguity of interpretation of the fractional factorial results and the more likely the need for follow-up study. Small fractions are really most useful as screening studies, to pick a few likely candidates out of many potentially important factors for subsequent more detailed study.

We end with an extreme example of large $q$, i.e. a small fraction.
Example (Hendrix Chemtech study mentioned at the beginning)
y = cold crack resistance of a product
$p = 15$ factors A, B, C, D, E, F, G, H, J, K, L, M, N, O, P (factor names and levels given earlier)
$p - q = 4$, i.e., only $2^4 = 16$ combinations were run!!! This was a $1/2^{15-4} = 1/2048$ fraction!!!

The 11 generators used were:
    E ↔ ABCD    K ↔ CD    P ↔ AB
    F ↔ BCD     L ↔ BD
    G ↔ ACD     M ↔ AD
    H ↔ ABC     N ↔ BC
    J ↔ ABD     O ↔ AC
These led to the 16 combinations and (ultimately) the data tabulated below.

Pretty clearly it isn't sensible to write out the whole defining relation here ... effects are going to be aliased in 16 groups of $2^{11} = 2048$ effects. But for a most tentative interpretation, let's see what we might glean if the physical system is so simple that only main effects dominate. (Physically reasonable???? Ask the engineer, not the statistician!)

The 16 observations are listed in Yates order for factors A, B, C and D (ignoring the rest). We therefore begin by running them through the Yates algorithm, with the results below.

combination      y     estimate (16 divisor)  sum estimated
eklmnop          14.8  20.28                  grand mean + ...
aghjkln          16.3  .13                    A + ...
bfhjkmo          23.5  2.87                   B + ...
abefgkp          23.9  -.08                   AB + P + ...
cfghlmp          19.6  .27                    C + ...
acefjlo          18.6  -.08                   AC + O + ...
bcegjmn          22.3  -.19                   BC + N + ...
abchnop          22.2  .36                    ABC + H + ...
dfgjnop          17.8  .13                    D + ...
adefhmn          18.9  .03                    AD + M + ...
bdeghlo          23.1  .04                    BD + L + ...
abdjlmp          21.8  -.06                   ABD + J + ...
cdehjkp          16.6  -.26                   CD + K + ...
acdgkmo          16.7  .29                    ACD + G + ...
bcdfkln          23.5  1.06                   BCD + F + ...
abcdefghjklmnop  24.9  .11                    ABCD + E + ...

There is no replication in this data set ... so we're driven to normal plotting in order to judge statistical significance of these estimates.
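The 16 combination labels follow from the 11 generators by the same sign-product rule, and the estimates from the Yates algorithm. A Python sketch under those assumptions (helper names hypothetical) that regenerates the labels in Yates order and prints each estimate next to its "first"-factor word and any main effect generated from that word:

```python
from itertools import product
from math import prod

FIRST = "ABCD"
# Hendrix generators: last factor -> product of "first" factors.
GENERATORS = {"E": "ABCD", "F": "BCD", "G": "ACD", "H": "ABC", "J": "ABD",
              "K": "CD", "L": "BD", "M": "AD", "N": "BC", "O": "AC", "P": "AB"}


def fraction_labels(first: str, generators: dict[str, str]) -> list[str]:
    """Combination labels of the 2^(p-q) fraction, in Yates order for `first`."""
    labels = []
    for signs in product((-1, 1), repeat=len(first)):
        level = dict(zip(first, reversed(signs)))  # first factor varies fastest
        for last, word in generators.items():
            level[last] = prod(level[f] for f in word)  # sign product sets the level
        high = "".join(f.lower() for f in sorted(level) if level[f] == 1)
        labels.append(high or "(1)")
    return labels


def yates(values: list[float]) -> list[float]:
    """Yates algorithm (divisor 2^k) for 2^k values in standard order."""
    col, n = list(values), len(values)
    for _ in range(n.bit_length() - 1):
        pairs = range(0, n, 2)
        col = [col[i] + col[i + 1] for i in pairs] + [col[i + 1] - col[i] for i in pairs]
    return [c / n for c in col]


y = [14.8, 16.3, 23.5, 23.9, 19.6, 18.6, 22.3, 22.2,
     17.8, 18.9, 23.1, 21.8, 16.6, 16.7, 23.5, 24.9]
labels = fraction_labels(FIRST, GENERATORS)
estimates = yates(y)

# Yates position i contains factor FIRST[j] exactly when bit j of i is set.
word_to_factor = {word: last for last, word in GENERATORS.items()}
for i, (label, est) in enumerate(zip(labels, estimates)):
    word = "".join(f for j, f in enumerate(FIRST) if i >> j & 1) or "I"
    extra = word_to_factor.get(word)
    name = f"{word} + {extra}" if extra else word
    print(f"{label:16s} {est:7.2f}  {name} + ...")
```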
A normal plot of the (last 15) estimates is:
[Figure: normal plot of the last 15 Yates estimates]

Tentative interpretation: The most important factors appear to be B (Solvent) and F (Number of Chill Rolls), and for large cold crack resistance "high B" (refined solvent) and "high F" (2 chill rolls) appear best. (Note that the analysis does point out what is in retrospect quite obvious, namely that it is those combinations in the data set with "high B" and "high F" that have the largest y's.)
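To draw such a normal plot one needs the ordered estimates paired with standard normal quantiles. A minimal sketch using Python's statistics.NormalDist, assuming plotting positions $(i - .5)/15$ (one common convention; others exist) and the rounded, signed estimates from the Yates output; the keys are the "first"-factor words, so BCD stands for the F alias group and ABCD for the E alias group:

```python
from statistics import NormalDist

# Last 15 Yates estimates for the Hendrix data (16 divisor), rounded to 2 places.
estimates = {"A": .13, "B": 2.87, "AB": -.08, "C": .27, "AC": -.08, "BC": -.19,
             "ABC": .36, "D": .13, "AD": .03, "BD": .04, "ABD": -.06,
             "CD": -.26, "ACD": .29, "BCD": 1.06, "ABCD": .11}

ordered = sorted(estimates.items(), key=lambda kv: kv[1])
n = len(ordered)
# Pair the i-th smallest estimate with the standard normal quantile of (i - .5)/n.
points = [(NormalDist().inv_cdf((i - 0.5) / n), name, est)
          for i, (name, est) in enumerate(ordered, start=1)]

for z, name, est in points:
    print(f"{z:6.2f}  {est:6.2f}  {name}")
```

Plotting est against z, the points for B and BCD (that is, F) stand well off the line through the other 13, matching the interpretation above.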