Defn: A completely randomized design (CRD) is an experimental design in which:
1) the data are treated as a random sample from a normal distribution, and
2) experimental units are randomly assigned to each experimental treatment.

The optical lens experiment described above is an example of a completely randomized design. The experiment is a balanced CRD, since equal numbers of experimental units are randomly assigned to each treatment. If, for some reason, the numbers of experimental units assigned to the different treatments are not equal, we have an unbalanced CRD.

For an unbalanced CRD, the model is

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad \text{for } j = 1, 2, \ldots, n_i \text{ and } i = 1, 2, \ldots, a,$$

where
a = number of levels of the experimental factor,
$n_i$ = number of experimental units assigned to the ith level of the experimental factor,
$Y_{ij}$ = value of the response variable for the jth experimental unit assigned to the ith experimental treatment,
$\mu$ = the grand mean of the response variable,
$\tau_i$ = the effect of the ith experimental treatment,
$\varepsilon_{ij}$ = a random error term associated with the jth experimental unit assigned to the ith experimental treatment.

It is assumed that the random error terms are independent and identically distributed, having a normal distribution with mean 0 and variance $\sigma^2$.

The sums of squares for an unbalanced CRD are given by

$$SST = \sum_{i=1}^{a} \sum_{j=1}^{n_i} Y_{ij}^2 - \frac{Y_{\cdot\cdot}^2}{\sum_{i=1}^{a} n_i},$$

$$SSTR = \sum_{i=1}^{a} \frac{Y_{i\cdot}^2}{n_i} - \frac{Y_{\cdot\cdot}^2}{\sum_{i=1}^{a} n_i}, \quad \text{and}$$

$$SSE = SST - SSTR,$$

where $Y_{i\cdot}$ is the total of the observations for the ith treatment and $Y_{\cdot\cdot}$ is the grand total of all observations. With these changes, the analysis of the data proceeds as before.

Example: p. 253, Exercise 5-58.

Which means differ (for the balanced CRD)?

If we reject the null hypothesis, our conclusion is that there are some differences among the means. It remains to determine which means differ from each other, and how they differ. There are rigorous mathematical methods for doing this, called multiple comparison procedures. These would be covered in a more advanced course. We will look at a graphical method.
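As a side illustration of the unbalanced sums of squares given earlier, the following is a minimal Python sketch of the computing formulas. The function name `unbalanced_crd_sums_of_squares` and the data are hypothetical, made up only to show unequal group sizes; they are not taken from the course text or from Exercise 5-58.

```python
def unbalanced_crd_sums_of_squares(groups):
    """Return (SST, SSTR, SSE) for treatment groups of possibly unequal size.

    `groups` is a list of lists; groups[i] holds the responses for treatment i.
    """
    total_n = sum(len(g) for g in groups)          # sum of the n_i
    grand_total = sum(sum(g) for g in groups)      # Y..
    correction = grand_total ** 2 / total_n        # Y..^2 / (sum of n_i)

    # SST: sum of squared observations minus the correction term
    sst = sum(y ** 2 for g in groups for y in g) - correction
    # SSTR: sum of (treatment total)^2 / n_i minus the correction term
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    # SSE is obtained by subtraction
    sse = sst - sstr
    return sst, sstr, sse


# Hypothetical unbalanced data: three treatments with 3, 4, and 2 units.
groups = [[9.1, 10.4, 8.7], [12.3, 12.9, 11.8, 13.0], [10.2, 9.6]]
sst, sstr, sse = unbalanced_crd_sums_of_squares(groups)
print(f"SST = {sst:.4f}, SSTR = {sstr:.4f}, SSE = {sse:.4f}")
```

The SSE obtained by subtraction should agree, up to rounding, with the sum of squared deviations of each observation from its own treatment mean, which is a convenient check on hand calculations.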
Assuming that the normality assumption is correct, each of the sample means has the following distribution:

$$\bar{Y}_{i\cdot} \sim \text{Normal}\left(\mu_i, \frac{\sigma^2}{n}\right),$$

that is, a normal distribution with mean $\mu_i$ and standard deviation $\sigma/\sqrt{n}$.

We know from the empirical rule that there is about a 99.73% probability that the measured value of a normally distributed random variable is within three standard deviations on either side of the mean of the distribution. Hence, the measured value, $\bar{y}_{i\cdot}$, of the random variable $\bar{Y}_{i\cdot}$ is nearly certain to lie in the interval

$$\left(\mu_i - 3\frac{\sigma}{\sqrt{n}},\ \mu_i + 3\frac{\sigma}{\sqrt{n}}\right).$$

Now, we don't know the value of $\sigma$, but if the sample size is large enough, it should be sufficient to estimate $\sigma$ using $\sqrt{MSE}$, the square root of the estimated error variance, obtained from the ANOVA table. Hence, we may use the interval

$$\left(\bar{y}_{i\cdot} - 3\sqrt{\frac{MSE}{n}},\ \bar{y}_{i\cdot} + 3\sqrt{\frac{MSE}{n}}\right).$$

If we find that the sample mean for treatment group j lies outside this interval, then we conclude that $\mu_j \ne \mu_i$.

Example: Optical lens example, continued. Which means differ from the others?

From the data in the table below, we obtain the following four intervals.
a) Lens coating 1: ( 7.9769, 10.9174)
b) Lens coating 2: (11.1411, 14.0817)
c) Lens coating 3: (10.7183, 13.6588)
d) Lens coating 4: ( 8.5826, 11.5231)

We conclude that there are differences between a) $\mu_1$ and $\mu_2$, b) $\mu_1$ and $\mu_3$, c) $\mu_4$ and $\mu_2$, and d) $\mu_4$ and $\mu_3$. In particular, we can conclude that the mean increase in haziness is greater for coating 2 than for coatings 1 or 4, and greater for coating 3 than for coatings 1 or 4. We cannot conclude that there are any other differences.

Groups      Count    Sum      Average        Variance
Column 1    7        66.13    9.447142857    0.512957143
Column 2    7        88.28    12.61142857    0.173780952
Column 3    7        85.32    12.18857143    4.343714286
Column 4    7        70.37    10.05285714    1.694890476

Residual Analysis and Model Checking

The single-factor ANOVA model makes the assumption that the random error terms $\varepsilon_{ij}$ are i.i.d. Normal(0, $\sigma^2$). The results of the data analysis can be considered valid only if these assumptions are reasonable.
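As a brief aside on the lens coating intervals given earlier, the following minimal Python sketch reproduces them from the summary table. It assumes that MSE can be taken as the average of the four group variances, which holds for a balanced design; the variable names are illustrative only.

```python
import math

# Summary statistics copied from the table of treatment groups (n = 7 each).
averages = {1: 9.447142857, 2: 12.61142857, 3: 12.18857143, 4: 10.05285714}
variances = {1: 0.512957143, 2: 0.173780952, 3: 4.343714286, 4: 1.694890476}
n = 7

# For a balanced CRD, MSE equals the average of the group sample variances.
mse = sum(variances.values()) / len(variances)
half_width = 3 * math.sqrt(mse / n)

# Interval for each coating: sample mean plus or minus 3 * sqrt(MSE / n).
intervals = {i: (m - half_width, m + half_width) for i, m in averages.items()}

# Pairs (i, j) for which the sample mean of group j lies outside interval i.
differing = sorted(
    (i, j)
    for i in averages
    for j in averages
    if i < j and not (intervals[i][0] <= averages[j] <= intervals[i][1])
)

for i, (low, high) in intervals.items():
    print(f"Lens coating {i}: ({low:.4f}, {high:.4f})")
print("Pairs of means that differ:", differing)
```

Because every interval has the same half-width in a balanced design, checking whether mean j falls outside interval i gives the same answer as checking mean i against interval j, so each pair needs to be examined only once. The pairs found are (1, 2), (1, 3), (2, 4), and (3, 4), in agreement with the conclusions stated for the example.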
In the CRD single-factor experimental design, the errors may be assumed to be independent because of random sampling and randomization. It remains to check the assumptions of normality and of common error variance.

Defn: The difference between an observed value, $y_{ij}$, of the response variable, $Y_{ij}$, and the estimated value, $\hat{y}_{ij}$, for the observation based on fitting a particular model (such as the single-factor ANOVA model) is called the residual, $e_{ij}$, of the observation. A residual may be considered to be an observed value of the random error term in the model.

For the single-factor ANOVA model (balanced), the residuals are given by

$$e_{ij} = y_{ij} - \bar{y}_{i\cdot}, \quad \text{for } i = 1, 2, \ldots, a \text{ and } j = 1, 2, \ldots, n.$$

We may use the set of residuals to check the assumptions of the model. First, we make a normal Q-Q plot to check the normality assumption. Then we calculate the standard deviations of the residuals for each treatment group and compare these values to check the assumption of constant error variance. If we know the time order in which the runs of the experiment were performed, we may check the independence assumption by plotting the residuals versus time. If any pattern is observed in the plot, we have evidence that the independence assumption is violated.

Example: Optical lens example (continued).

We may calculate the residuals using the original table of the data together with the table of descriptive statistics for the treatment groups. The results are in the table below:

Trt 1      Trt 2      Trt 3      Trt 4
-0.9271    -0.1114    -3.7386     0.6771
-0.2371    -0.7714    -1.2986    -2.0529
 1.0029     0.0786    -0.6986    -0.3029
 0.7829    -0.1814     0.6814    -1.3429
-0.6971     0.1686     2.3314     0.3971
-0.1271     0.5386     1.7514     1.3271
 0.2029     0.2786     0.9714     1.2971

The normal Q-Q plot of residuals is shown below. Most of the residuals lie near the straight line; only the smallest residual lies some distance off the line. Hence we may conclude that the normality assumption is reasonable.
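The coordinates of a normal Q-Q plot for these residuals can be sketched in Python as follows. The plotting positions (k - 0.5)/m used here are an assumption, since the notes do not say which convention produced the plot; other common conventions give slightly different horizontal coordinates.

```python
from statistics import NormalDist

# Residuals copied from the table above, listed treatment by treatment.
residuals = [
    -0.9271, -0.2371, 1.0029, 0.7829, -0.6971, -0.1271, 0.2029,   # Trt 1
    -0.1114, -0.7714, 0.0786, -0.1814, 0.1686, 0.5386, 0.2786,    # Trt 2
    -3.7386, -1.2986, -0.6986, 0.6814, 2.3314, 1.7514, 0.9714,    # Trt 3
    0.6771, -2.0529, -0.3029, -1.3429, 0.3971, 1.3271, 1.2971,    # Trt 4
]

ordered = sorted(residuals)
m = len(ordered)

# Standard normal quantiles at the assumed plotting positions (k - 0.5) / m.
standard_normal = NormalDist()
quantiles = [standard_normal.inv_cdf((k - 0.5) / m) for k in range(1, m + 1)]

# Each pair is (standard normal quantile, ordered residual).
qq_points = list(zip(quantiles, ordered))

for q, r in qq_points:
    print(f"{q:8.4f}  {r:8.4f}")
```

Plotting the second coordinate against the first gives the Q-Q plot; the first pair printed corresponds to the smallest residual, -3.7386.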
[Figure: Residual Normal Q-Q Plot. Vertical axis: Ordered Residuals; horizontal axis: Standard Normal Quantiles.]

Next we want to check the assumption of constant error variance. The sample standard deviations of the residuals for the four treatment groups are:

Trt 1: 0.7162
Trt 2: 0.4169
Trt 3: 2.0842
Trt 4: 1.3019

There seems to be a problem with the assumption of common error variance. The sample standard deviation for Coating #3 is about 5 times the sample standard deviation for Coating #2. Note that the extreme residual found in the normal Q-Q plot corresponds to the first observation from Trt 3.

Since the order in which the observations were taken is not known for these data, there is no way to check the assumption of independence.

In practice, if it is found that an assumption of the model is violated, remedial action is taken, possibly including a functional transformation of the response variable.
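The group standard deviations quoted above can be checked with a short Python sketch. The residual values are copied from the residual table of the optical lens example; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
from statistics import stdev

# Residuals for each treatment group, copied from the residual table.
residuals_by_trt = {
    "Trt 1": [-0.9271, -0.2371, 1.0029, 0.7829, -0.6971, -0.1271, 0.2029],
    "Trt 2": [-0.1114, -0.7714, 0.0786, -0.1814, 0.1686, 0.5386, 0.2786],
    "Trt 3": [-3.7386, -1.2986, -0.6986, 0.6814, 2.3314, 1.7514, 0.9714],
    "Trt 4": [0.6771, -2.0529, -0.3029, -1.3429, 0.3971, 1.3271, 1.2971],
}

# Sample standard deviation (divisor n - 1) of the residuals in each group.
sds = {trt: stdev(values) for trt, values in residuals_by_trt.items()}

# Ratio of the largest to the smallest group standard deviation.
ratio = max(sds.values()) / min(sds.values())

for trt, sd in sds.items():
    print(f"{trt}: {sd:.4f}")
print(f"Largest / smallest standard deviation: {ratio:.2f}")
```

A ratio near 1 would support the assumption of common error variance; here the ratio is close to 5, which is the discrepancy between Coating #3 and Coating #2 noted above.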