Minitab Notes for STAT 6305                Dept. of Statistics — CSU East Bay

Unit 4: Block Designs as a Generalization of Paired t Tests

4.1. The Data

Insurance adjusters took each of 15 damaged automobiles to both Garage 1 and Garage 2, obtaining estimates for repair at each garage. These two garages are of particular interest because they have reputations for doing good work and they are conveniently located. Data are estimated repair costs in hundreds of dollars. The issue is whether one of the garages tends to give higher bids than the other.

Bids (in $100) for Repairing 15 Damaged Automobiles

Garage 1   Garage 2
--------   --------
   7.6        7.3
  10.2        9.1
   9.5        8.4
   1.3        1.5
   3.0        2.7
   6.3        5.8
   5.3        4.9
   6.2        5.3
   2.2        2.0
   4.8        4.2
  11.3       11.0
  12.1       11.0
   6.9        6.1
   7.6        6.7
   8.4        7.5

Note: The data shown above are from Table 6.6 of Ott: An Introduction to Statistical Methods and Data Analysis, 4th ed., page 292. These data also appear in O/L 6e, Table 6.14, page 315, but in modified form. The modification is that 10 (that is, $1000) has been added to each observation, presumably to reflect inflation in auto repair costs. It is useful to consider which computations are changed by this modification and which are not (see Problem 4.5.4). We do not know whether the original data above are real or contrived; it seems clear that the modified data are not real.

These are "paired" data; each row of the data table shows two bids on the same car. The two columns are not independent: a heavily damaged car (as in rows 11 and 12) will be expensive to repair at either garage, and a car with relatively minor damage (as in row 4) will receive a lower bid at both.

Problems

4.1.1. This problem deals with preparation of the data for exploration and analysis.
(a) Label c1 and c2 of a Minitab worksheet as Gar1 and Gar2, respectively. Then copy these data from Word or your browser and paste them into c1 and c2. (When pasting, put the cursor in the first row of c1 and use spaces as delimiters.)
(b) Anticipating our work in the next section, put the differences in bids into c3, labeled Diff.

CALC > Calculator, results in c3, expression c1 - c2
MTB > let c3 = c1 - c2

(c) Print the results to the Session Window and verify that they are correct. Your output should be similar to that shown below, except that the display shown here has been edited into three panels for more compact display.

DATA > Display Data
MTB > print c1-c3

ROW  Gar1  Gar2  Diff    ROW  Gar1  Gar2  Diff    ROW  Gar1  Gar2  Diff
  1   7.6   7.3   0.3      6   6.3   5.8   0.5     11  11.3  11.0   0.3
  2  10.2   9.1   1.1      7   5.3   4.9   0.4     12  12.1  11.0   1.1
  3   9.5   8.4   1.1      8   6.2   5.3   0.9     13   6.9   6.1   0.8
  4   1.3   1.5  -0.2      9   2.2   2.0   0.2     14   7.6   6.7   0.9
  5   3.0   2.7   0.3     10   4.8   4.2   0.6     15   8.4   7.5   0.9

4.1.2. Suppose that you have 30 randomly chosen damaged cars available. How could this experiment have been designed in order to produce data that are not paired? Rewrite the first sentence of this section (above the data) to indicate the experimental procedure you have in mind to produce two independent columns of data. Which design, paired or independent, do you think is best for the purpose at hand, and why?

4.1.3. Return now to the original paired-data experiment. It can be viewed as a two-factor experiment. One factor is Garage and the other is Car. In Units 1 and 2 we dealt with fixed factors, the levels of which are determined to be of particular interest to the experimenters. For example, the three levels of type of hot dogs in Unit 2 were Beef, Meat, and Poultry. In Unit 3 we dealt with a random factor, in which the levels were four randomly chosen batches. The individual batches are of interest only insofar as they may reflect random batch-to-batch variability. In the current experiment, one of the factors is fixed and the other is random. Which is the fixed factor, which is the random factor, and why?

4.2. Descriptive Techniques

Separate dotplots for the two repair shops mainly show the high variability among the 15 cars in the amount of damage to be repaired. A comparison of the two dotplots shows no important difference between the garages. This graphical comparison is ineffective partly because we do not know which dot in one plot corresponds to which dot in the other; the paired nature of the data is obscured.

GRAPH > Dotplot > Multiple Y's, Simple, select 'Gar1' and 'Gar2' (GPRO plot)
MTB > gstd
MTB > dotp c1 c2;
SUBC> same.

[Character dotplots of Gar1 and Gar2 on a common scale from 2.0 to 12.0]

However, if we look at a dotplot of the differences (below), we see that all but one are positive. Thus Garage 1 tends to give higher bids than Garage 2. From this dotplot we can also see that, on average for our small sample of cars, bids from Garage 1 are about $60 higher than those from Garage 2. If the two garages have equivalent bidding rules, it is difficult to imagine that such a large difference resulted just from random error.

GRAPH > Dotplot > One Y, Simple (GPRO plot)
MTB > gstd
MTB > dotp c3

[Character dotplot of Diff on a scale from -0.25 to 1.00]

Problems

4.2.1. It was claimed above that we should expect the bids by the two garages to be associated. (For example, a heavily damaged car will get a high bid at both garages.) Make a scatterplot of Gar1 vs. Gar2 and describe what you see. (Where do the points lie relative to the line Gar1 = Gar2?) Note that the syntax for a scatterplot in gpro mode requires an asterisk (*) between the column numbers. (Menu path and commands shown; results not shown.)

Graph > Plot
MTB > gpro      # (if needed)
MTB > plot c1*c2

4.2.2. Reproduce the two dotplots on the same scale as shown above. Make a printout and connect each dot for Garage 1 with the corresponding dot (same damaged car) for Garage 2. (The plot below shows similar information; it was made using a procedure intended for another purpose, so the axis label Mean should read Bid.)

[Figure: Plot Showing the Paired Nature of the Data. For each of Autos 1 through 15, the bid at Gar 1 is connected to the bid at Gar 2; vertical axis labeled Mean (0 to 12), horizontal axis Gar (1, 2).]

4.3. Paired t Test

Assuming that the differences are approximately normally distributed, an appropriate test to confirm what we saw in the dotplot of the differences is a paired t test. In Minitab, this can be performed as a one-sample t test of the null hypothesis that the population of differences has zero mean. We conclude that the difference between garages is highly significant, as indicated by the P-value < 0.00005.

STAT > Basic > 1-sample t, test mean = 0
MTB > ttest 0 c3

TEST OF MU = 0.000 VS MU N.E. 0.000

         N     MEAN    STDEV   SE MEAN      T   P VALUE
Diff    15    0.613    0.394     0.102   6.02    0.0000

Recent releases of Minitab have a paired procedure that finds the differences and performs the above test on them with a single command (or menu box). The two columns used must be of equal length. Compare the printout below with the one just above.

STAT > Basic > Paired-t
MTB > Paired c1 c2

Paired T-Test and CI: Gar1, Gar2

Paired T for Gar1 - Gar2

             N      Mean     StDev   SE Mean
Gar1        15   6.84667   3.20399   0.82727
Gar2        15   6.23333   2.94125   0.75943
Difference  15  0.613333  0.394365  0.101825

95% CI for mean difference: (0.394941, 0.831725)
T-Test of mean difference = 0 (vs not = 0): T-Value = 6.02  P-Value = 0.000

It is crucial to take the paired nature of the data into account. If we had incorrectly analyzed these data using a two-sample t test, we would have failed to detect the significant effect; note the 0.589 P-value below.
(This is the computational equivalent of the incorrect and futile graphical comparison of the separate dotplots for the two garages that we made in Section 4.2.)

INCORRECT PROCEDURE
STAT > Basic > 2-sample t, different columns, assume equal variances
MTB > twos c1 c2;
SUBC> pool.

Two-Sample T-Test and CI: Gar1, Gar2

Two-sample T for Gar1 vs Gar2

       N  Mean  StDev  SE Mean
Gar1  15  6.85   3.20     0.83
Gar2  15  6.23   2.94     0.76

Difference = mu (Gar1) - mu (Gar2)
Estimate for difference: 0.613333
95% CI for difference: (-1.687000, 2.913667)
T-Test of difference = 0 (vs not =): T-Value = 0.55  P-Value = 0.589  DF = 28
Both use Pooled StDev = 3.0754

Problems

4.3.1. Looking at the dotplot of differences, one may legitimately wonder whether they are normally distributed. Perform one of the three formal tests of normality available in the menu path STAT > Basic > Normality. Give the P-value and say what it indicates. Also try the normal probability plot in the menu path GRAPH > Normal probability plot. What device does the latter plot use to help you judge whether the data are normal?

Note: The null hypothesis of normality is not rejected. But even if this hypothesis had been rejected at the borderline, many statisticians would say that the conclusion from the t test is still valid. The criteria for using the t test are that (i) the sample size is moderately large, (ii) there is no obvious skewness, and (iii) there are no outliers. When these criteria are met, the Central Limit Theorem suggests that the sample mean is nearly normally distributed even if the original data are not, so the t statistic has nearly a t-distribution.

4.3.2. If one chose to do a nonparametric test, either a sign test (STAT > Nonparametrics > 1-Sample Sign; command stest 0 c3) or a Wilcoxon signed-rank test (STAT > Nonparametrics > 1-Sample Wilcoxon; command wtest 0 c3) would be appropriate. Remember that, for these tests, the null hypothesis is that the population of differences has zero median.
Perform both tests, give P-values, and interpret the results.

4.3.3. In Problem 4.1.2 you were asked to consider redesigning this experiment so that the data are in two independent groups. Could data collected in this way be correctly analyzed according to the two-sample t procedure shown at the end of this section? Explain briefly.

4.4. Stacked Data

An important goal of this unit is to show that a randomized block design is a generalization of the paired t test shown above. However, we need to put the data into stacked format before we can continue.

MTB > name c11 'Bid' c12 'Garage'
MTB > stack c1 c2 c11;
SUBC> subs c12.

[Instead of the last subcommand, you could use the set command with data (1:2)15.]

But there is one more step in order for the stacked data to fully represent the structure of the experiment. We need to identify which car produced which bids. This requires another column labeled Car. (The results are shown below.)

MTB > name c13 'Car'
MTB > set c13
DATA> 2(1:15)
DATA> end

Now we have the bids in c11 (Bid), subscripts 1 and 2 for garages in c12 (Garage), and subscripts 1 through 15 for cars in c13 (Car). Below is a printout of the stacked data. Study it carefully to make sure that you understand how all of the information in the columns Gar1 and Gar2 has been re-expressed in the columns Bid, Garage, and Car. (For your convenience, columns Gar1 and Gar2 are shown again.)
STACKED FORMAT                 UNSTACKED FORMAT
---------------------------    --------------------
ROW   Bid  Garage  Car         ROW  Gar1  Gar2
  1   7.6       1    1           1   7.6   7.3
  2  10.2       1    2           2  10.2   9.1
  3   9.5       1    3           3   9.5   8.4
  4   1.3       1    4           4   1.3   1.5
  5   3.0       1    5           5   3.0   2.7
  6   6.3       1    6           6   6.3   5.8
  7   5.3       1    7           7   5.3   4.9
  8   6.2       1    8           8   6.2   5.3
  9   2.2       1    9           9   2.2   2.0
 10   4.8       1   10          10   4.8   4.2
 11  11.3       1   11          11  11.3  11.0
 12  12.1       1   12          12  12.1  11.0
 13   6.9       1   13          13   6.9   6.1
 14   7.6       1   14          14   7.6   6.7
 15   8.4       1   15          15   8.4   7.5
 16   7.3       2    1
 17   9.1       2    2
 18   8.4       2    3
 19   1.5       2    4
 20   2.7       2    5
 21   5.8       2    6
 22   4.9       2    7
 23   5.3       2    8
 24   2.0       2    9
 25   4.2       2   10
 26  11.0       2   11
 27  11.0       2   12
 28   6.1       2   13
 29   6.7       2   14
 30   7.5       2   15

In stacked format each row gives full information on how a particular bid arose: its amount, which garage, and which car. The order in which these self-sufficient rows are recorded in the worksheet is not important for an analysis of variance. For example, as long as each row stays intact, it is not necessary for all of the bids for Garage 1 to appear first, nor for the cars to be listed in any particular order. However, if the data were collected in a particular known order (one would hope based on randomization), then the actual order of collection should be reflected in the order of the rows of the stacked format in the worksheet. (In each format, locate the data for the first five cars.)

Problems

4.4.1. Which format would you choose if you were presenting data in a report, stacked or unstacked? Give reasons briefly.

4.4.2. How could the data table at the very beginning of this unit be modified to make it clearer to the reader that the data are paired and that each row shows the results for a particular car?

4.4.3. In conducting an experiment such as this one, damaged cars would be sampled and then each car would be taken to the two garages to get bids for repair. Perhaps the toss of a coin would decide whether a particular car was taken to Garage 1 or Garage 2 first.
Which format, stacked or unstacked, would be natural for recording the results of this experiment as data collection progresses?

4.5. Block Design

The randomized block design, explained in Ott/Longnecker (Sec. 15.3), is a generalization of the two-sided paired t test that is able to handle more than two treatments. The word block should be considered as a generalization of the word pair. When paired data are analyzed according to a block design, the conclusion should be the same as for the paired t test. We use Minitab's balanced ANOVA procedure to analyze the data as a block design.

STAT > ANOVA > Balanced, Response: Bid, Model: Garage Car, Random: Car, Storage: Residuals
MTB > anova c11 = c12 c13;
SUBC> random c13;
SUBC> restrict;
SUBC> resids c14.

ANOVA: Bid versus Garage, Car

Factor  Type    Levels  Values
Garage  fixed        2  1, 2
Car     random      15  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

Analysis of Variance for Bid

Source  DF       SS      MS       F      P
Garage   1    2.821   2.821   36.28  0.000
Car     14  263.742  18.839  242.26  0.000
Error   14    1.089   0.078
Total   29  267.652

S = 0.278858   R-Sq = 99.59%   R-Sq(adj) = 99.16%

Notes on Minitab's anova procedure:

• The notation for the model (in the "Model" box for the menu, or after the equal sign (=) for the command) is shorthand for the model $Y_{ij} = \mu + \alpha_i + B_j + e_{ij}$, where $i = 1, 2$; $j = 1, \ldots, 15$; $\sum_i \alpha_i = 0$; the $B_j$ are iid $N(0, \sigma_B^2)$; and the $e_{ij}$ are iid $N(0, \sigma^2)$. A grand mean $\mu$ and an error term must be present in every ANOVA model, so there is no need for these terms to appear in the syntax. What we must specify is c12 (Garage) and c13 (Car), together with the column of measurements c11 (Bid).

• We take Garage to be a fixed effect (Greek letter α) because the two garages are of specific interest. Perhaps they are the garages most conveniently located to an insurance company that routinely seeks bids. The restriction that the α's sum to 0 is expressed in the subcommand restrict.
• We take Car to be a random effect (Latin letter B) because the 15 cars are chosen at random. Minitab requires random effects to be declared using the subcommand random.

• With a design as simple as the current one, omitting the subcommands random and restrict would not have changed the output. However, we shall soon see models for which these specifications are crucial, and it is a good idea to start now to include all of the specifications needed to make the model correct.

• The resids (residuals) subcommand stores the residuals from the model in the empty column c14. (When using this subcommand, be careful to specify an empty column, because it will overwrite data without warning. With menus, Minitab automatically selects an empty column for the residuals, but it may not be especially conveniently located in the worksheet.)

The F-ratio for testing the Garage effect is F = 2.8213 / 0.0778 = 36.3. The corresponding P-value, printed as 0.000, indicates a value smaller than 0.0005. Clearly, the difference between the garages is highly significant. This F test is equivalent to the paired t test performed above: F = 36.3 = (6.02)^2 is the square of the t statistic for the paired t test. (Depending on the release of Minitab and the procedure used, the P-value may be printed as 0.0000, meaning less than 0.00005, or as 0.000, meaning less than 0.0005. These displays differ only in the number of decimal places shown; both report the same very small P-value.)

The Car effect ("block effect") can be tested with the statistic F = 18.8387 / 0.0778 = 242.26; it is very highly significant. Of course, this is no surprise. We used Cars as blocks precisely because we anticipated large differences from car to car.
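For readers who want to check the arithmetic behind F = t^2 outside Minitab, here is a minimal sketch in Python (standard library only). It is not Minitab code, and the helper names paired_t and block_anova are hypothetical, chosen for this illustration. The sums of squares follow the usual formulas for a block design with one bid per garage-car cell and no interaction term. Up to rounding, the results should agree with the Garage, Car, Error, and Total lines of the Minitab table above.

```python
from math import sqrt

# Bids (in $100) from Section 4.1; position j in each list is the same car.
gar1 = [7.6, 10.2, 9.5, 1.3, 3.0, 6.3, 5.3, 6.2, 2.2, 4.8, 11.3, 12.1, 6.9, 7.6, 8.4]
gar2 = [7.3, 9.1, 8.4, 1.5, 2.7, 5.8, 4.9, 5.3, 2.0, 4.2, 11.0, 11.0, 6.1, 6.7, 7.5]


def mean(xs):
    return sum(xs) / len(xs)


def paired_t(x, y):
    """Hypothetical helper: one-sample t statistic (mu = 0) for the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    dbar = mean(d)
    sd = sqrt(sum((v - dbar) ** 2 for v in d) / (n - 1))
    return dbar / (sd / sqrt(n))


def block_anova(x, y):
    """Hypothetical helper: sums of squares and F ratios for a randomized block
    design with treatments = garages (columns) and blocks = cars (rows)."""
    cols = [x, y]
    a, b = len(cols), len(x)  # a = 2 garages, b = 15 cars
    grand = mean(x + y)
    ss_total = sum((v - grand) ** 2 for col in cols for v in col)
    ss_garage = b * sum((mean(col) - grand) ** 2 for col in cols)
    car_means = [mean(row) for row in zip(*cols)]  # one (Gar1, Gar2) pair per car
    ss_car = a * sum((m - grand) ** 2 for m in car_means)
    ss_error = ss_total - ss_garage - ss_car
    ms_error = ss_error / ((a - 1) * (b - 1))
    return {
        "SS_Garage": ss_garage,
        "SS_Car": ss_car,
        "SS_Error": ss_error,
        "SS_Total": ss_total,
        "F_Garage": (ss_garage / (a - 1)) / ms_error,
        "F_Car": (ss_car / (b - 1)) / ms_error,
    }


t = paired_t(gar1, gar2)
table = block_anova(gar1, gar2)
print(f"paired t = {t:.2f}   t squared = {t * t:.2f}")
for name, value in table.items():
    print(f"{name:<10} {value:10.3f}")
```

The agreement is exact, not approximate. With two garages, each car's two residuals are plus and minus half of (its difference minus the mean difference). As a result, MS(Error) equals half the variance of the 15 differences, and SS(Garage) equals 15 times the squared mean difference divided by 2. Their ratio is therefore the square of the paired t statistic.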
The reason for introducing a blocking factor into an experimental design is the suspicion that it will be significant. A significant blocking factor helps to explain variability that would otherwise inflate MS(Error), make the F-ratio for the "main" effect small, and possibly lead to a Type II error. (See Problem 4.5.5 for an illustration.) If Blocks are not significant, it may be a signal that a block design should not have been used.

In performing any ANOVA you should always generate the residuals and look at them graphically, usually starting with a boxplot and a normal probability plot.

The normal probability plot of the residuals is shown below. It is not precisely linear, but the departure from normality is not extreme. (Various tests of normality have P-values around 0.1. The result of the Anderson-Darling test is shown as part of the plot below.) The plot is symmetric because there are only two observations per car, so each car gives two residuals equal in absolute value and opposite in sign.

Such a plot can be made from the Graphs option in the menu box for the balanced ANOVA or, after storing the residuals, by using the menu path shown below. Without some practice, the precision of the high-resolution plot (and the way it treats repeated values) can lead to over-interpretation of small deviations from linearity. It is a good idea for beginners to include "error bands" in their high-resolution plots (see the graph below).

GRAPH > Probability plot > Single, select residuals

[Figure: Normal Probability Plot of Residuals (Normal, 95% CI), ANOVA for Block Design. Percent (1 to 99) vs. RESI1 (-0.75 to 0.50). Legend: Mean -5.18104E-17, StDev 0.1938, N 30, AD 0.614, P-Value 0.100.]

Problems

4.5.1. Suppose we want to use the fixed significance level 1% for testing whether there is a difference between the two garages.
Find the critical values for both the paired t test and the F test of the block procedure. Show that the square of the critical value of t is the critical value of F.

Note: Critical values for a given significance level can be obtained by using the invcdf command, or by selecting CALC > Probability distributions > t (or F), inverse cumulative, from the menus. The inverse cumulative distribution function gives the cut-off value that has a designated probability below it; adjust appropriately to get the critical values for each test. The t-distribution has 14 degrees of freedom; the F-distribution has 1 degree of freedom in the numerator and 14 degrees of freedom in the denominator.

4.5.2. Use Minitab's general linear model procedure (STAT > ANOVA > GLM) to get mainly the same output as at the beginning of this section. Notice that the standardized residuals for one of the cars are (barely) greater than 2 in absolute value, and hence the observations for that car are noted as "unusual." Which car? Make a boxplot of the residuals (or standardized residuals). What outliers, if any, are indicated?

4.5.3. Earlier in this unit we looked at nonparametric alternatives to the one-sample t test. There are also nonparametric alternatives for analyzing a block design. Perhaps the best known of them is the Friedman test (menu path STAT > Nonparametrics > Friedman, or command friedman c11 c12 c13). Do the Friedman test for the present data and interpret the result.

4.5.4. Read the note following the data table in Section 4.1. Add 10 to each observation shown in the table and repeat the ANOVA of Section 4.5 for the modified data. Comment on which results changed and which did not. Would a linear transformation of the data affect the F-statistic?

4.5.5. In this problem we ask you to perform two INCORRECT procedures so that you will understand the consequences of ignoring or misunderstanding the block structure of an experimental design.
What result do you obtain if you analyze the data (as given in Section 4.1):
• As a one-way ANOVA, ignoring Cars?
• As a one-way ANOVA, ignoring Garages?
Why is each of these analyses incorrect? Which one corresponds to the incorrect two-sample t test done at the end of Section 4.3? Finally, compare the values of MS(Error) for the correct analysis as a block design and for the incorrect analysis ignoring Cars, and comment. In this comparison, what two sums of squares in the correct ANOVA table sum to SS(Error) in the incorrect one? Can the correct ANOVA table be constructed from information in the two incorrect tables?

4.5.6. For some years, the "Taster's Choice" column was a regular feature in the Wednesday Food Section of the San Francisco Chronicle. The column of December 17, 2003, reported results when five professional tasters rated four brands of creamy horseradish (a common condiment for prime rib of beef), declaring Mezzetta the "winner." The tasters' scores are shown below; each score is out of a total of 20 points. Treat Tasters as random blocks (analogous to Cars in the example of this unit) and Brands of horseradish as the fixed main effect (analogous to Garages). According to the appropriate ANOVA, are there any statistically significant differences among brands? (Because there are now more than two observations per block, a paired t test is no longer an option. In a subsequent unit we take a more careful look at such block designs.)

                          BRAND
            ---------------------------------------
TASTER      Mezzetta   Sosnick   Beaver   Tulelake
---------------------------------------------------
Carroll        16         13        16         3
Halperin       19         10         9        13
Hatfield       18         14        12        16
Katzl          16         16        14        16
Passot         16         16        16         9

Minitab Notes for Statistics 6350 by Bruce E. Trumbo, Department of Statistics, CSU East Bay, Hayward CA, 94542, Email: bruce.trumbo@csueastbay.edu. Comments and corrections welcome. Copyright © 1991, 2010 by Bruce E. Trumbo. All rights reserved.
These notes are intended for use by students at California State University, East Bay. Please contact the author for permission to use elsewhere. Preparation partially supported by NSF grant USE-9150433. Modified: 1/10