Design of Experiments in R: Full Factorial Designs

The following example is based on a battery-life example in Chapter 5 of the textbook "Design and Analysis of Experiments" by Douglas Montgomery. To make the experiment more relevant to the life sciences, we'll consider the following scenario. We are concerned that bacteria are contaminating cell cultures, causing failures and unreliable results in our experiments. We think that the type of material in the cell culture containers (Material) and the temperature of the cell culture containers (Temperature) affect the survival of the bacteria (Life). We perform an experiment to examine the following.

Temperature: 3 levels: 15, 70, 125 degrees F
Material: 3 types: 1, 2, 3
Response: survival time of viable bacteria (Life)

setwd("C:/Users/Walker/Desktop/UCSD Biom 285/Data")
bacterialife <- read.table("bacterialife.txt", header=TRUE)

> bacterialife
   Life Material Temperature
1   130        1          15
2    74        1          15
3   155        1          15
4   180        1          15
5   150        2          15
6   159        2          15
7   188        2          15
8   126        2          15
9   138        3          15
10  168        3          15
11  110        3          15
12  160        3          15
13   34        1          70
14   80        1          70
15   40        1          70
16   75        1          70
17  136        2          70
18  106        2          70
19  122        2          70
20  115        2          70
21  174        3          70
22  150        3          70
23  120        3          70
24  139        3          70
25   20        1         125
26   82        1         125
27   70        1         125
28   58        1         125
29   25        2         125
30   58        2         125
31   70        2         125
32   45        2         125
33   96        3         125
34   82        3         125
35  104        3         125
36   60        3         125

We don't know if the Temperature effect will be linear, so we'll re-define Temperature to be a factor. Material is coded as 1, 2, 3, so R would treat Material as a quantitative variable. We want it to be a factor, too.

bacterialife$Material <- as.factor(bacterialife$Material)
bacterialife$Temperature <- as.factor(bacterialife$Temperature)
summary(bacterialife)

> summary(bacterialife)
      Life       Material Temperature
 Min.   : 20.0   1:12     15 :12
 1st Qu.: 70.0   2:12     70 :12
 Median :108.0   3:12     125:12
 Mean   :105.5
 3rd Qu.:141.8
 Max.   :188.0

Here is the two-way ANOVA model, including an interaction effect.
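Before fitting it, it can help to look at the nine Material x Temperature cell means, which are exactly what the interaction term will model. A minimal sketch (the data values from the listing above are re-entered inline so the snippet stands alone; with the file available, the same tapply call on bacterialife gives the same table):

```r
# Same values as bacterialife.txt above, entered inline for a self-contained check
bacterialife <- data.frame(
  Life = c(130,  74, 155, 180, 150, 159, 188, 126, 138, 168, 110, 160,
            34,  80,  40,  75, 136, 106, 122, 115, 174, 150, 120, 139,
            20,  82,  70,  58,  25,  58,  70,  45,  96,  82, 104,  60),
  Material    = factor(rep(rep(1:3, each = 4), times = 3)),
  Temperature = factor(rep(c(15, 70, 125), each = 12))
)

# Mean Life for each Material x Temperature cell
cellmeans <- with(bacterialife, tapply(Life, list(Material, Temperature), mean))
print(cellmeans)
```

At 125 degrees the cell means are 57.5, 49.5, and 85.5 for Materials 1, 2, and 3, while at 15 degrees the three materials are much closer together; that non-parallel pattern is what the interaction term captures.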
Notice that Life~Material*Temperature is equivalent to Life~Material+Temperature+Material:Temperature; in R model formulas, * expands to both main effects plus the interaction, while : denotes the interaction term alone.

bacterialife.aov <- aov(Life~Material*Temperature, data=bacterialife)
summary(bacterialife.aov)

> summary(bacterialife.aov)
                     Df Sum Sq Mean Sq F value    Pr(>F)
Material              2  10684  5341.9  7.9114  0.001976 **
Temperature           2  39119 19559.4 28.9677 1.909e-07 ***
Material:Temperature  4   9614  2403.4  3.5595  0.018611 *
Residuals            27  18231   675.2
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Create an interaction plot:

with(bacterialife, interaction.plot(Temperature, Material, Life, type="b", pch=19,
     fixed=T, xlab="Temperature (°F)", ylab="Average life"))

Average life decreases as temperature increases, with Material type 3 giving extended bacterial life compared to the others, especially at the highest temperature; hence the interaction effect. Material 1 gives short bacterial life at either 70 or 125 degrees. Material 2 is effective at suppressing bacterial life at 125 degrees, but not at lower temperatures.

Another useful plot:

plot.design(Life~Material*Temperature, data=bacterialife)

We can use Tukey's test to determine which pairwise differences are significant among the material types.

TukeyHSD(bacterialife.aov, which="Material")

> TukeyHSD(bacterialife.aov, which="Material")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Life ~ Material * Temperature, data = bacterialife)

$Material
        diff       lwr      upr     p adj
2-1 25.16667 -1.135677 51.46901 0.0627571
3-1 41.91667 15.614323 68.21901 0.0014162
3-2 16.75000 -9.552344 43.05234 0.2717815

Tukey's test for temperature and material types:
TukeyHSD(bacterialife.aov, which=c("Material","Temperature"))

> TukeyHSD(bacterialife.aov, which=c("Material","Temperature"))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Life ~ Material * Temperature, data = bacterialife)

$Material
        diff       lwr      upr     p adj
2-1 25.16667 -1.135677 51.46901 0.0627571
3-1 41.91667 15.614323 68.21901 0.0014162
3-2 16.75000 -9.552344 43.05234 0.2717815

$Temperature
            diff        lwr       upr     p adj
70-15  -37.25000  -63.55234 -10.94766 0.0043788
125-15 -80.66667 -106.96901 -54.36432 0.0000001
125-70 -43.41667  -69.71901 -17.11432 0.0009787

96-well plate example

Before we do an experiment, we should test that our assay is working correctly. For a 96-well plate, we can put the identical solution into every well. The response in each well should then be very similar: any differences among the wells should be very small compared to the treatment effects we hope to detect.

We'll look at 4 examples:
Response1: roughly uniform response
Response2: row effect
Response3: column effect
Response4: edge effect for rows

We have two factors, row and column, that may affect the response, so we'll use a linear model with two factors. We'll code Column as a numeric value (Col.Num) so we can look for linear trends.

library(Rcmdr)  # used to create 3D graphs
setwd("C:/Users/Walker/Desktop/UCSD Biom 285/Data")
RowColEg <- read.table("RowColEg.csv", header=TRUE, sep=",")

Here are the first few rows of the data.
> RowColEg
   Row Column Col.Num Response1 Response2 Response3 Response4
1    1      A       1      0.23       0.7       0.5       7.5
2    2      A       1      0.25       0.8       0.7      12.0
3    3      A       1      0.51       1.5       0.7      15.3
4    4      A       1      0.11       2.2       0.7      18.2
5    5      A       1      0.36       2.7       0.5      19.7
6    6      A       1      0.57       3.0       0.7      20.0
7    7      A       1      0.62       3.3       0.5      19.5
8    8      A       1      0.92       3.8       0.7      18.2
9    9      A       1      0.08       4.5       0.7      15.5
10  10      A       1      0.68       5.2       0.3      12.2
11  11      A       1      0.30       5.5       0.5       7.7
12  12      A       1      0.09       5.8       0.7       1.8
...
(output truncated; the full data set has 96 rows, one per well)

# Roughly uniform response
with(RowColEg, scatter3d(Row, Response1, Col.Num, fit="linear", rev=1))
lm.Response1 = lm(Response1 ~ Row + Col.Num, data=RowColEg)
summary(lm.Response1)

> summary(lm.Response1)

Call:
lm(formula = Response1 ~ Row + Col.Num, data = RowColEg)

Residuals:
     Min       1Q   Median       3Q      Max
-0.48779 -0.23849 -0.00123  0.20389  0.50699

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4889435  0.0870877   5.614 2.03e-07 ***
Row         0.0003365  0.0087031   0.039    0.969
Col.Num     0.0010218  0.0131120   0.078    0.938
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2944 on 93 degrees of freedom
Multiple R-squared: 8.137e-05,  Adjusted R-squared: -0.02142
F-statistic: 0.003784 on 2 and 93 DF,  p-value: 0.9962

Neither row nor column is significant. Good news! We should check that the standard deviation of the responses in the wells is small compared to the treatment effects we want to detect.

sd(RowColEg$Response1)
[1] 0.2912613

Provided that the treatment effect of interest is greater than 0.29, and preferably greater than 0.29*2 = 0.58, a moderate sample size should be sufficient to detect the effect.

# Row effect
Response2 values increase across the rows.

with(RowColEg, scatter3d(Row, Response2, Col.Num, fit="linear", rev=1))
lm.Response2 = lm(Response2 ~ Row + Col.Num, data=RowColEg)
summary(lm.Response2)

> summary(lm.Response2)

Call:
lm(formula = Response2 ~ Row + Col.Num, data = RowColEg)

Residuals:
      Min        1Q    Median        3Q       Max
-0.216264 -0.168061  0.009426  0.082585  0.245430

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.028761   0.044949   0.640    0.524
Row          0.494843   0.004492 110.162   <2e-16 ***
Col.Num     -0.002183   0.006768  -0.322    0.748
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1519 on 93 degrees of freedom
Multiple R-squared: 0.9924,  Adjusted R-squared: 0.9922
F-statistic: 6068 on 2 and 93 DF,  p-value: < 2.2e-16

Row is significant: Response2 values increase across the rows. Column is not significant. We should check that the standard deviation of the responses in the wells is small compared to the treatment effects we want to detect.

sd(RowColEg$Response2)
[1] 1.723764

We will have difficulty detecting effects less than 1.7.

# Column effect
with(RowColEg, scatter3d(Row, Response3, Col.Num, fit="linear", rev=1))
lm.Response3 = lm(Response3 ~ Row + Col.Num, data=RowColEg)
summary(lm.Response3)

> summary(lm.Response3)

Call:
lm(formula = Response3 ~ Row + Col.Num, data = RowColEg)

Residuals:
     Min       1Q   Median       3Q      Max
-0.24886 -0.08380 -0.01121  0.16994  0.21512

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.011282   0.045740   0.247    0.806
Row         0.003759   0.004571   0.822    0.413
Col.Num     0.496230   0.006887  72.057   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1546 on 93 degrees of freedom
Multiple R-squared: 0.9824,  Adjusted R-squared: 0.982
F-statistic: 2596 on 2 and 93 DF,  p-value: < 2.2e-16

Column is significant: Response3 values increase across the columns. Row is not significant.
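The same logic can be checked on simulated data. A minimal sketch (hypothetical values, not the course file) of a plate with a pure column trend like Response3: the raw standard deviation of the response is dominated by the trend, while the linear model recovers the slope and a much smaller residual sd:

```r
set.seed(1)

# Hypothetical 12 x 8 plate: a column trend of 0.5 per column, small well noise
sim <- expand.grid(Row = 1:12, Col.Num = 1:8)
sim$Response <- 0.5 * sim$Col.Num + rnorm(nrow(sim), sd = 0.15)

fit <- lm(Response ~ Row + Col.Num, data = sim)

coef(fit)["Col.Num"]   # estimated column slope, close to the true 0.5
sd(sim$Response)       # raw sd: inflated by the column trend
summary(fit)$sigma     # residual sd: close to the well-to-well noise of 0.15
```

This is the Response3 situation in miniature: the raw standard deviation overstates the noise, and modeling the column trend isolates the part that would actually obscure a treatment effect.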
# Edge effect
with(RowColEg, scatter3d(Row, Response4, Col.Num, fit="linear", rev=1))

Notice that the points are above the plane in the middle rows, but below the plane along the edges. The edges are giving lower signals. We see curvature in the graph, so we'll ask for a quadratic fit:

with(RowColEg, scatter3d(Row, Response4, Col.Num, fit="quadratic", rev=1))

The curvature in the graph suggests we should add a quadratic (squared) term to the model. Notice the syntax I(Row^2) for adding a squared term to an R model formula.

lm.Response4 = lm(Response4 ~ Row + I(Row^2) + Col.Num, data=RowColEg)
summary(lm.Response4)

> summary(lm.Response4)

Call:
lm(formula = Response4 ~ Row + I(Row^2) + Col.Num, data = RowColEg)

Residuals:
      Min        1Q    Median        3Q       Max
-0.234692 -0.186465 -0.007008  0.170427  0.240736

Coefficients:
              Estimate Std. Error  t value Pr(>|t|)
(Intercept)  1.9493506  0.0682036   28.581   <2e-16 ***
Row          6.0277035  0.0211490  285.011   <2e-16 ***
I(Row^2)    -0.5021916  0.0015837 -317.099   <2e-16 ***
Col.Num     -0.0009921  0.0072894   -0.136    0.892
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1636 on 92 degrees of freedom
Multiple R-squared: 0.9992,  Adjusted R-squared: 0.9991
F-statistic: 3.709e+04 on 3 and 92 DF,  p-value: < 2.2e-16

The model including the I(Row^2) term shows that both Row and Row^2 are significant. This indicates a strong edge effect, with differences among the rows. Column is not significant. We should check that the standard deviation of the responses in the wells is small compared to the treatment effects we want to detect.

sd(RowColEg$Response4)
> sd(RowColEg$Response4)
[1] 5.602771

We will have difficulty detecting effects less than 5.6. How does this large standard deviation affect the power and sample size of an experiment? We wish to calculate the sample size for the following experiment.

• 4 groups
• Estimated group means are 2, 4, 6, 8
• Power = 0.9
• Within-group standard deviation is 5.6
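The between.var argument that power.anova.test expects is simply the variance of the hypothesized group means; a quick sketch of that input:

```r
# Hypothesized group means from the bullet list above
groupmeans <- c(2, 4, 6, 8)

# between.var = sample variance of the means: deviations -3, -1, 1, 3 from
# the mean of 5, so var = (9 + 1 + 1 + 9) / 3 = 20/3
var(groupmeans)  # 6.666667
```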
So the within-group variance is 5.6^2 = 31.36, giving within.var = 31.36. Here's the R code:

groupmeans <- c(2, 4, 6, 8)
power.anova.test(groups = length(groupmeans), between.var=var(groupmeans),
                 within.var=31.36, power=.90)

> power.anova.test(groups = length(groupmeans), between.var=var(groupmeans),
+                  within.var=31.36, power=.90)

     Balanced one-way analysis of variance power calculation

         groups = 4
              n = 23.22364
    between.var = 6.666667
     within.var = 31.36
      sig.level = 0.05
          power = 0.9

NOTE: n is number in each group

If the standard deviation for replicates (within group) is 5.6, then we need n = 23.2 animals per group. Rounding up to 24 gives us 4*24 = 96 animals total.

Suppose that we tested the 96-well plate with identical samples in each well, found the edge effect, and took action to reduce it. What would our power and sample size be if reducing the unexplained variability (due to the edge effect) reduced the standard deviation from 5.6 to 2.0? Let's calculate the sample size using sd = 2. The within-group standard deviation is 2, so the within-group variance is 2^2 = 4, giving within.var = 4.

groupmeans <- c(2, 4, 6, 8)
power.anova.test(groups = length(groupmeans), between.var=var(groupmeans),
                 within.var=4, power=.90)

> power.anova.test(groups = length(groupmeans), between.var=var(groupmeans),
+                  within.var=4, power=.90)

     Balanced one-way analysis of variance power calculation

         groups = 4
              n = 3.975304
    between.var = 6.666667
     within.var = 4
      sig.level = 0.05
          power = 0.9

NOTE: n is number in each group

If the standard deviation for replicates (within group) is 2, then we need n = 3.98 animals per group; rounding up to 4 gives us 4*4 = 16 animals total. If the standard deviation is 5.6, we need n = 23.2 animals per group; rounding up to 24 gives us 4*24 = 96 animals total.

Which would you rather do?
• Find the edge effect by running replicates in a 96-well plate, plotting the data, and perhaps doing a 2-way ANOVA, thereby reducing the standard deviation and requiring 16 animals, or
• Ignore the variability of replicates within the plate and require 96 animals?

If you reduce unexplained variability in your experiments, you increase power and reduce sample size.

96-well plate example: recovering signal from noise

Suppose you have tested four different treatments, A, B, C, and D, using a 96-well plate. You are concerned that row or column effects may be present, but the experiment described above to find and remove row and column effects hasn't been done. Is there anything you can do? Yes. You can build a statistical model to test for treatment effects, including row and column effects in the model to control for those sources of unexplained variance.

library(Rcmdr)  # used to create 3D graphs
setwd("C:/Users/Walker/Desktop/UCSD Biom 285/Data")
RowColAnovaData <- read.table("RowColAnovaData.csv", header=TRUE, sep=",")

> RowColAnovaData[1:20,]
   Row Column Col.Num Dose Treatment Response
1    1      A       1    0         A      5.8
2    2      A       1   10         B      8.4
3    3      A       1   20         C      4.8
4    4      A       1   30         D      7.4
5    5      A       1    0         A      6.8
6    6      A       1   10         B      9.0
7    7      A       1   20         C      4.1
8    8      A       1   30         D      5.8
9    9      A       1    0         A      2.7
10  10      A       1   10         B      3.3
11  11      A       1   20         C      9.9
12  12      A       1   30         D      6.6
13   1      B       2    0         A     12.0
14   2      B       2   10         B     11.1
15   3      B       2   20         C     12.2
16   4      B       2   30         D     11.1
17   5      B       2    0         A      7.2
18   6      B       2   10         B      4.1
19   7      B       2   20         C     10.3
20   8      B       2   30         D     10.8

with(RowColAnovaData, scatter3d(Row, Response, Col.Num, fit="linear"))

There appears to be a strong column effect. First, build a model to test for a dose effect, ignoring rows and columns.

lm.RowColAnovaData = lm(Response ~ Dose, data=RowColAnovaData)
summary(lm.RowColAnovaData)

> summary(lm.RowColAnovaData)

Call:
lm(formula = Response ~ Dose, data = RowColAnovaData)

Residuals:
    Min      1Q  Median      3Q     Max
-6.9383 -2.5788 -0.4144  2.6010  7.8904

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.63833    0.65527  14.709   <2e-16 ***
Dose         0.05904    0.03503   1.686   0.0952 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.837 on 94 degrees of freedom
Multiple R-squared: 0.02934,  Adjusted R-squared: 0.01902
F-statistic: 2.841 on 1 and 94 DF,  p-value: 0.09518

Dose is not significant (p = 0.0952) in the model ignoring row and column. Now let's build a model that includes row and column along with Dose.

lm.RowColAnovaData2 = lm(Response ~ Dose + Row + Col.Num, data=RowColAnovaData)
summary(lm.RowColAnovaData2)

> summary(lm.RowColAnovaData2)

Call:
lm(formula = Response ~ Dose + Row + Col.Num, data = RowColAnovaData)

Residuals:
    Min      1Q  Median      3Q     Max
-5.6377 -2.6199  0.2516  2.2967  4.6246

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.34503    0.90393   5.913 5.65e-08 ***
Dose         0.06904    0.02848   2.424   0.0173 *
Row         -0.10000    0.09225  -1.084   0.2812
Col.Num      1.06518    0.13150   8.100 2.23e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.952 on 92 degrees of freedom
Multiple R-squared: 0.4376,  Adjusted R-squared: 0.4193
F-statistic: 23.86 on 3 and 92 DF,  p-value: 1.640e-11

The Col.Num p-value is 2.23e-12, indicating a significant column effect, as we saw in the graph. The columns are a large source of variance. By including columns and rows in the model, removing them as sources of unexplained variance, the p-value for Dose becomes 0.0173, a significant result.

with(RowColAnovaData, scatter3d(Row, Response, Col.Num, fit="linear", group=factor(Dose)))
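The gain from modeling nuisance factors can be reproduced on simulated data. A minimal sketch (hypothetical effect sizes and values, not the course file): a modest linear dose effect is buried under a strong column gradient, and the Dose p-value sharpens once Row and Col.Num enter the model:

```r
set.seed(42)

# Hypothetical 12 x 8 plate; doses cycle down each column, so Dose is
# balanced across (orthogonal to) the columns
d <- expand.grid(Row = 1:12, Col.Num = 1:8)
d$Dose <- rep(c(0, 10, 20, 30), length.out = nrow(d))

# True model: small dose effect (0.05 per unit) + large column effect + noise
d$Response <- 0.05 * d$Dose + 1.5 * d$Col.Num + rnorm(nrow(d), sd = 1)

# p-value for Dose when the column effect is left in the error term...
p_ignore  <- summary(lm(Response ~ Dose, data = d))$coefficients["Dose", 4]
# ...and when row and column are modeled explicitly
p_control <- summary(lm(Response ~ Dose + Row + Col.Num, data = d))$coefficients["Dose", 4]

c(ignoring = p_ignore, controlling = p_control)
```

With the column effect left unmodeled it inflates the residual variance and masks the dose effect; adding Row and Col.Num moves that variance out of the error term, just as in the RowColAnovaData example above.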