Dip DoE, A07 Solutions

1. (a) The spread in all four samples appears normal, apart from possible exceptionally high cases in samples A and C. Membrane B appears best and Membrane C worst, with not much difference between A and D.

(b) (i) The "Membrane" mean square measures the variation between the sample means, plus chance variation. (MS(Membrane) estimates $\sigma^2$ + Membrane effect.)

The "Error" mean square measures chance variation within samples. (MS(Error) estimates $\sigma^2$.)

The F ratio measures the relative excess of variation between sample means over chance variation. ($F \approx 1 + (\text{Membrane effect})/\sigma^2$.)

(ii)

(c) (i) Null hypothesis: All four membrane types have the same mean burst strength.

Critical value: $F_{0.05;\,3,36} \approx 2.85$.

Conclusion: 15.54 > 2.85, so the null hypothesis is rejected.

The Membrane B mean is significantly bigger than the Membrane C and D means and close to significantly bigger than the Membrane A mean. The Membrane C mean is significantly smaller than the other three means. The Membrane A and D means are not significantly different.

(ii) The simultaneous confidence intervals are slightly wider than the confidence intervals for differences between individual pairs of means. This is because the level of confidence in several intervals simultaneously is reduced relative to the level of confidence in one of those intervals individually. Widening the intervals increases the confidence. The extent of widening may be chosen to compensate for the reduction in confidence involved.

(d) Membrane C can be eliminated from our inquiries. Membrane D shows no sign of being an improvement on the existing Membrane A and so need not be considered further. Membrane B shows some improvement on Membrane A but not enough to recommend a change. It may be worthwhile carrying out further comparisons between Membranes A and B.

2. (a) An interaction between two factors occurs when the effect on the response of changing the level of one factor depends on the level of the other factor.
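The definition above can be made concrete as a "difference of differences". The following is a minimal sketch only: the cell means, the factor names A and B and their level labels are hypothetical values chosen for illustration and are not taken from the experiment.

```python
def simple_effects(means, a_levels, b_levels):
    """Effect of moving factor A from its first to its second level,
    computed separately at each level of factor B."""
    a_first, a_second = a_levels
    return {b: means[(a_second, b)] - means[(a_first, b)] for b in b_levels}


# Hypothetical cell means, keyed by (level of A, level of B).
means = {
    ("low", "x"): 10.0, ("high", "x"): 12.0,
    ("low", "y"): 10.0, ("high", "y"): 17.0,
}

effects = simple_effects(means, ("low", "high"), ("x", "y"))

# If there were no interaction, the effect of A would be the same at both
# levels of B and this contrast would be zero.
interaction_contrast = effects["y"] - effects["x"]

print(effects)               # {'x': 2.0, 'y': 7.0}
print(interaction_contrast)  # 5.0
```

In this made-up case the effect of A is 2.0 at level x of B but 7.0 at level y, so the effect of one factor depends on the level of the other.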
Summary: Cooking in iron pots adds substantially to the average iron content of all cooked foods. However, it adds considerably more to the iron content of meat, around 2.5 to 2.6 milligrams per 100 g on average, than to that of legumes or vegetables, around 1.2 to 1.5. The added iron content is very similar using aluminium and clay (marginally higher for clay) for all three food types.

Alternatively: Vegetables tend to have lower iron content than either legumes or meat. Using aluminium or clay pots, the differences are similar. Using iron pots, the difference from legumes is similar, while the difference for meat is much higher. The iron content of meat is slightly lower than that of legumes using aluminium or clay pots but considerably higher using iron pots.

(b) Iron content includes an added contribution for each food type, plus an added contribution for each pot type, plus an added contribution for each food type / pot type combination, plus an added contribution due to chance variation.

Alternatively:

Iron content (i, j, k) = overall mean + food type effect (i) + pot type effect (j) + food/pot interaction effect (i, j) + chance variation (i, j, k),

i = Meat, Legumes, Vegetables, j = Aluminium, Clay, Iron, k = sample number 1, 2, 3, 4.

or:

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$,

i = Meat, Legumes, Vegetables, j = Aluminium, Clay, Iron, k = sample number 1, 2, 3, 4,

where $Y$ is iron content, $\mu$ is the overall mean, $\alpha$ represents the food type effect, $\beta$ represents the pot type effect, $(\alpha\beta)$ represents the effect of the interaction of food type and pot type and $\varepsilon$ represents chance variation.

Summary: The pot type and food type effects are very highly statistically significant and the pot type / food type interaction is also highly statistically significant.

(c) The adjustment makes no difference in this case because the experimental layout is balanced, that is, the same number of samples is allocated to each pot type / food type combination.
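The role of balance can be checked numerically. The sketch below is illustrative only and covers the main-effect columns, not the interaction: it assumes 0/1 indicator coding for one pot type and one food type, and the helper names `layout` and `cross_term` are hypothetical. With four samples in every combination the pot and food indicators are uncorrelated; dropping a single sample makes them correlated.

```python
from itertools import product

POTS = ["Aluminium", "Clay", "Iron"]
FOODS = ["Meat", "Legumes", "Vegetables"]


def layout(reps):
    """List of (pot, food) runs with `reps` samples per combination."""
    return [(p, f) for p, f in product(POTS, FOODS) for _ in range(reps)]


def cross_term(runs, pot, food):
    """n * n_both - n_pot * n_food, in exact integer arithmetic.

    This is n^2 times the covariance between the 0/1 indicator columns
    'run uses this pot' and 'run uses this food'; zero means the two
    columns are uncorrelated.
    """
    n = len(runs)
    n_pot = sum(1 for p, _ in runs if p == pot)
    n_food = sum(1 for _, f in runs if f == food)
    n_both = sum(1 for p, f in runs if p == pot and f == food)
    return n * n_both - n_pot * n_food


balanced = layout(4)                 # 3 x 3 combinations x 4 samples = 36 runs
unbalanced = layout(4)
unbalanced.remove(("Iron", "Meat"))  # lose one sample: 35 runs

print(all(cross_term(balanced, p, f) == 0 for p in POTS for f in FOODS))
print(cross_term(unbalanced, "Iron", "Meat"))
```

In the balanced layout every cross term is zero, so the order in which the pot and food terms enter the model cannot change their sums of squares; in the unbalanced layout the cross term is non-zero.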
Adjusted sums of squares are the additional sums of squares determined by adding each particular term to the model, given that the other terms are already in the model. When the terms in the model are uncorrelated, the presence or absence of other terms has no effect on the sum of squares for a particular term. However, when terms are correlated, the additional sum of squares determined by adding each particular term to the model depends on which other terms are already in the model. Terms will be correlated when the experimental layout is unbalanced, in this case when there are unequal numbers of samples for different pot type / food type combinations.

If the adjustment is not made, then the sums of squares associated with different terms will depend on the order in which the terms are included in the model. Thus, a term could appear to be statistically insignificant merely because another term with which it was correlated was added first.

3. (a) When there are possible differences between experimental units, units may be grouped into blocks of units that are as similar to each other as possible, with estimates of factor effects being made within each block and combined across blocks. Randomisation involves randomly allocating treatments to units and to the order of experimentation with units.

The purpose of blocking is to eliminate the effects of known sources of systematic variation between experimental units from the assessment of factor effects. The purpose of randomisation is to give protection against the presence of unknown sources of systematic variation that may lead to differences between experimental units. (Randomisation also provides a basis for valid statistical inference.)

In the experiment described, Figure 2 shows clear evidence of a decline in defect rate over time.
In the experiment carried out, days were blocked in pairs, comparisons of new and old were made within each pair (block) and the comparisons were combined across pairs. Combined assessment via Figure 1 shows that there was no significant difference between new and old. If blocking had not been used and one process had been applied for the first four weeks and the other for the second four, the observed difference that actually occurred, as shown in Table 2, would have been ascribed to a difference between processes.

In the experiment, allocation of new and old processes to days within pairs was done systematically, with systematic switching of order to negate the effect of the trend from one day to the next. An alternative approach would have been to randomise the order within each day pair. This would have given protection against the trend effect and also against any other effect that might have been present without the knowledge of the experimenters.

(b) To illustrate the interaction effect, consider an experiment to assess the effects of Training (Yes, No) and use of a Script (Yes, No) on the performance of telemarketing sales personnel, as reflected in the percentage of sales calls that result in a sale. The results are summarised below.

[Diagram: results by Script (No/Yes) and Training (No/Yes). Values as extracted: 10.8, 41.8, 26.6, -4.4, 21.2, 9.8, 15.2, 20.6.]

(i) Suppose that 21 sales representatives were available. Using the traditional "one-factor-at-a-time" approach, one trial will be run to compare both levels of one factor at the "standard" level of the second, and a second trial will be run to compare the "standard" and "new" levels of the second factor using the "best" level of the first factor. For example, testing the script factor first, there will be two comparisons: first, (no script, no training) is compared with (script, no training); then the better of these is compared with a third combination, (best script level, training). It makes sense to allocate 7 sales representatives to each of the three combinations involved, for balanced comparisons.
The statistical "multi-factor" approach will allocate sales representatives to all four possible combinations. A balanced allocation would involve assigning 5 sales representatives to each and either not allocating the 21st or allocating the 21st arbitrarily to one of the combinations. Either way, a comparison of Script with No Script now involves a comparison of at least 10 results with at least 10, rather than 7 with 7 as in the "one-factor-at-a-time" approach, and similarly for the comparison of Training with No Training. For the purpose of identifying the best levels of the two factors separately, this is a more efficient use of resources.

(ii) Note that, if the traditional approach sequence in part (i) is followed, the first comparison will lead to using No Script as the best level of the first factor and the second comparison will lead to using No Script and Training as the best combination. Clearly, the best combination of Script and Training will be found by the statistically recommended "multi-factor" approach. It would not have been found by the "one-factor-at-a-time" approach as outlined in part (i), although it would have been if we happened to test the Training factor first. Having to depend on the choice of factor to test first in order to discover the best combination of factor levels can hardly be described as good science.

N.B. Other illustrations may be used.
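The argument in parts (i) and (ii) can be traced in a short sketch. Two things here are assumptions rather than part of the solution. First, the assignment of the four cell means to Script / Training combinations is one reading of the summarised results, chosen because it agrees with the conclusions stated in part (ii). Second, the standard error comparison assumes a common standard deviation in every group. The function names are hypothetical.

```python
import math

# ASSUMED reading of the summarised results: mean percentage of calls
# resulting in a sale, keyed by (script, training).
means = {
    ("No", "No"): 15.2,
    ("Yes", "No"): 10.8,
    ("No", "Yes"): 20.6,
    ("Yes", "Yes"): 41.8,
}


def one_factor_at_a_time(means, first):
    """Start at (No, No); choose the best level of the `first` factor,
    then the best level of the other factor at that chosen level."""
    script, training = "No", "No"
    second = "training" if first == "script" else "script"
    for factor in (first, second):
        if factor == "script":
            script = max(("No", "Yes"), key=lambda s: means[(s, training)])
        else:
            training = max(("No", "Yes"), key=lambda t: means[(script, t)])
    return script, training


def se_factor(n_per_group):
    """Standard error of a difference between two group means, in units of
    the (assumed common) standard deviation of a single result."""
    return math.sqrt(2.0 / n_per_group)


print(one_factor_at_a_time(means, "script"))    # ('No', 'Yes')
print(one_factor_at_a_time(means, "training"))  # ('Yes', 'Yes')
print(max(means, key=means.get))                # ('Yes', 'Yes')

# 7 v 7 (one-factor-at-a-time) against 10 v 10 (multi-factor).
print(se_factor(7), se_factor(10))
```

Under these assumptions, testing Script first stops at No Script with Training, while the full set of four combinations identifies Script with Training. The 10 v 10 comparison also has the smaller standard error.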