Lecture & Examples
Topic 2: Completely Randomized Design

The completely randomized design is the simplest form of experimental design. In a completely randomized design, treatments are assigned to the experimental units completely by chance. Although the completely randomized design is very simple, it has many advantages:
(1) It is very flexible. Any number of treatments and any number of experimental units can be used.
(2) The statistical analysis is easy even if the number of experimental units differs from treatment to treatment.
(3) The statistical analysis remains simple in the presence of missing values. It can be shown that the relative loss of information due to missing data is smaller than in any other experimental design.

The completely randomized design is especially attractive when the experimental units are homogeneous. For example, it is the method of choice for many laboratory experiments, e.g., in physics, chemistry, or cookery, where a quantity of material, after thorough mixing, is divided into small samples or batches to which the treatments are applied.

Steps for Conducting an Analysis of Variance (ANOVA) for a Completely Randomized Design:

Step 1: Make sure the data come from a completely randomized design, that is, each treatment was assigned to the experimental units completely by chance.

Step 2: Use a graphical procedure such as box-plots or dot-plots to check the equal-variance assumption. The normality assumption is satisfied if the data truly come from a completely randomized design. If the equal-variance assumption is not satisfied, you can either use a nonparametric method, discussed in Chapter 15, or find a statistical consultant for help. The method discussed in Chapter 15 is just one alternative when the equal-variance assumption is not satisfied; it is not the "best" alternative.

Step 3: Create an ANOVA table similar to Table 10.1 below with a statistical package such as SAS (used in this lecture) or SPSS. In Table 10.1, we assume that there are p treatments and n experimental units.

Table 10.1: ANOVA Table for a Completely Randomized Design

Source       df       SS           MS         F
Treatment    p - 1    SST          MST (a)    MST/MSE
Error        n - p    SSE          MSE (b)
Total        n - 1    SS(Total)

(a) MST = SST/(p - 1)
(b) MSE = SSE/(n - p)

Note:
(1) The degrees of freedom for treatment is (p - 1) in a completely randomized design with p treatments.
(2) The degrees of freedom for error is (n - p) in a completely randomized design with p treatments and n experimental units.
(3) SST + SSE = SS(Total).
(4) We can complete the ANOVA table if we know any two of the three quantities SST, SSE, and SS(Total).
(5) We can complete the ANOVA table if we know both MST and MSE.
(6) A partially completed ANOVA table will always be provided on exams and practice problems.

Step 4: Use the F-test provided in the ANOVA table to perform the following test:
H0: μ1 = μ2 = . . . = μp
Ha: At least two treatment means are different.
The output from SAS or SPSS always includes the p-value of the above F-test. We can reject the null hypothesis at significance level α when "p-value ≤ α."

Step 5: If the F-test rejects the null hypothesis, one can then use the multiple comparison procedures discussed in Section 10.3 to find which means differ. However, those multiple comparison procedures are very conservative and will not be discussed this semester.
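The following is a minimal sketch, in Python with scipy rather than the SAS code used in this lecture, of how the quantities in Table 10.1 and the F-test of Step 4 can be computed from raw data. The function name crd_anova and the small data set at the end are made up for illustration only.

from scipy import stats

def crd_anova(groups):
    """One-way ANOVA for a completely randomized design.
    groups: one list of responses per treatment."""
    p = len(groups)                               # number of treatments
    n = sum(len(g) for g in groups)               # total experimental units
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]     # treatment means

    # SST: between-treatment sum of squares; SSE: within-treatment (error) sum of squares
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))

    mst = sst / (p - 1)                    # MST = SST/(p - 1)
    mse = sse / (n - p)                    # MSE = SSE/(n - p)
    f = mst / mse                          # F = MST/MSE
    p_value = stats.f.sf(f, p - 1, n - p)  # upper tail of the F(p - 1, n - p) distribution
    return {"df": (p - 1, n - p, n - 1), "SST": sst, "SSE": sse,
            "SS(Total)": sst + sse, "MST": mst, "MSE": mse,
            "F": f, "p-value": p_value}

# Hypothetical data: three treatments assigned at random to 12 units
print(crd_anova([[12.0, 15.0, 14.0, 13.0],
                 [22.0, 20.0, 19.0, 21.0],
                 [15.0, 17.0, 16.0, 18.0]]))

Note that SST and SSE computed this way fill in the ANOVA table exactly as described in Notes (3)-(5) above: any two of SST, SSE, and SS(Total) determine the third.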
Example 10.4: A manufacturer of television sets is interested in the effect on tube conductivity of four different types of coating for color picture tubes. The following conductivity data are obtained.

Coating Type    Conductivity
1               143  141  150  146
2               152  149  137  143
3               134  136  132  127
4               129  127  132  129

(a) Complete the following ANOVA table.

General Linear Models Procedure
Dependent Variable: RESP

Source            DF     Sum of Squares    Mean Square    F Value    Pr > F
Model             (1)    844.68750         (5)            (7)        .00029
Error             (2)    (4)               (6)
Corrected Total   (3)    1080.93750

Solution: n = 16, p = 4, df(Treatment) = (p - 1), df(Error) = (n - p), df(Total) = (n - 1), SSE = SS(Total) - SST, MST = SST/(p - 1), MSE = SSE/(n - p), F = MST/MSE.

Note: In the above SAS printout, Model = Treatment and Pr > F = p-value.

General Linear Models Procedure
Dependent Variable: RESP

Source            DF     Sum of Squares    Mean Square    F Value    Pr > F
Model             3      844.68750         281.56250      14.30      .00029
Error             12     236.25000         19.68750
Corrected Total   15     1080.93750

(b) Test the null hypothesis that μ1 = μ2 = μ3 = μ4, against the alternative that at least two of the means differ. Use α = 0.05.

Solution:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least two means differ
Test Statistic: Fc = 14.30
p-value: p-value = 0.00029
Since the p-value = 0.00029 < α = 0.05, one can reject the null hypothesis.

Example 10.5: A manufacturer suspects that the batches of raw material furnished by her supplier differ significantly in calcium content. There is a large number of batches currently in the warehouse. Five of these are randomly selected for study. A chemist makes five determinations on each batch and obtains the following data.

Batch 1    23.46  23.48  23.56  23.39  23.40
Batch 2    23.59  23.46  23.42  23.49  23.50
Batch 3    23.51  23.64  23.46  23.52  23.49
Batch 4    23.28  23.40  23.37  23.46  23.39
Batch 5    23.29  23.46  23.37  23.32  23.38

(a) Complete the following ANOVA table.

General Linear Models Procedure
Dependent Variable: RESP

Source            DF     Sum of Squares    Mean Square    F Value    Pr > F
Model             (1)    (4)               0.0242440      (7)        .00363
Error             (2)    (5)               0.0043800
Corrected Total   (3)    (6)

Solution: n = 25, p = 5, SST = (MST)(p - 1), SSE = (MSE)(n - p), SS(Total) = SST + SSE.

General Linear Models Procedure
Dependent Variable: RESP

Source            DF     Sum of Squares    Mean Square    F Value    Pr > F
Model             4      0.0969760         0.0242440      5.54       .00363
Error             20     0.0876000         0.0043800
Corrected Total   24     0.1845760

(b) Is there a significant variation in calcium content from batch to batch? Use α = 0.05.

Solution:
H0: μ1 = μ2 = μ3 = μ4 = μ5
Ha: At least two means differ
Test Statistic: Fc = 5.54
p-value: p-value = 0.00363
Since the p-value = 0.00363 < α = 0.05, one can reject the null hypothesis that all means are equal. Thus, there is significant variation in calcium content from batch to batch at α = 0.05.
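As a quick check of Example 10.4, the same F statistic and p-value can be reproduced with a one-way ANOVA routine. The sketch below uses Python's scipy.stats.f_oneway rather than the SAS PROC GLM output shown above; the variable names are illustrative only.

from scipy.stats import f_oneway

# Example 10.4 data, one list per coating type
coating1 = [143, 141, 150, 146]
coating2 = [152, 149, 137, 143]
coating3 = [134, 136, 132, 127]
coating4 = [129, 127, 132, 129]

f_stat, p_value = f_oneway(coating1, coating2, coating3, coating4)
print(f"F = {f_stat:.2f}, p-value = {p_value:.5f}")  # F = 14.30, p-value = 0.00029

# Example 10.5 can be checked the same way: pass the five batches of
# calcium determinations to f_oneway and compare with F = 5.54,
# p-value = 0.00363 from the ANOVA table above.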