
Chapter 3  Experiments with a Single Factor: The Analysis of Variance

3.1 An Example
• Chapter 2 considered a single-factor experiment with two levels of the factor.
• Here we consider single-factor experiments with a levels of the factor, a ≥ 2.
• Example:
  – The response is the tensile strength of a new synthetic fiber.
  – The factor is the weight percent of cotton.
  – Five levels: 15%, 20%, 25%, 30%, 35%.
  – a = 5 and n = 5.
• Does changing the cotton weight percent change the mean tensile strength?
• Is there an optimum level for cotton content?

3.2 The Analysis of Variance
• a levels (treatments) of a factor and n replicates for each level.
• $y_{ij}$: the jth observation taken under factor level or treatment i.

Models for the Data
• Means model: $y_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, n$, where
  – $y_{ij}$ is the ijth observation,
  – $\mu_i$ is the mean of the ith factor level,
  – $\varepsilon_{ij}$ is a random error with mean zero.
• Let $\mu_i = \mu + \tau_i$, where $\mu$ is the overall mean and $\tau_i$ is the ith treatment effect.
• Effects model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, n$.
• This is a linear statistical model.
• It is the one-way or single-factor analysis of variance model.
• Completely randomized design: the experiments are performed in random order so that the environment in which the treatments are applied is as uniform as possible.
• For hypothesis testing, the model errors are assumed to be normally and independently distributed random variables with mean zero and variance $\sigma^2$, i.e. $y_{ij} \sim N(\mu + \tau_i, \sigma^2)$.
• Fixed effects model: the a levels have been specifically chosen by the experimenter.

3.3 Analysis of the Fixed Effects Model
• We are interested in testing the equality of the a treatment means; $E(y_{ij}) = \mu_i = \mu + \tau_i$, $i = 1, 2, \ldots, a$:
  $H_0: \mu_1 = \cdots = \mu_a$  vs.  $H_1: \mu_i \neq \mu_j$ for at least one pair (i, j).
• Constraint: $\sum_{i=1}^{a} \tau_i = 0$.
• Equivalently, $H_0: \tau_1 = \cdots = \tau_a = 0$  vs.  $H_1: \tau_i \neq 0$ for at least one i.
• Notation: $y_{i\cdot} = \sum_{j=1}^{n} y_{ij}$, $y_{\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}$, $\bar{y}_{i\cdot} = y_{i\cdot}/n$, $\bar{y}_{\cdot\cdot} = y_{\cdot\cdot}/N$, where $N = an$.

3.3.1 Decomposition of the Total Sum of Squares
• Partition the total variability into its component parts.
• The total sum of squares (a measure of overall variability in the data):
  $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2$, with an − 1 = N − 1 degrees of freedom.
• Decomposition:
  $\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} \left[(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (y_{ij} - \bar{y}_{i\cdot})\right]^2 = n\sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$,
  that is, $SS_T = SS_{Treatments} + SS_E$.
• $SS_{Treatments}$: sum of squares of the differences between the treatment averages and the grand average (sum of squares due to treatments), with a − 1 degrees of freedom.
• $SS_E$: sum of squares of the differences of observations within treatments from the treatment average (sum of squares due to error), with N − a degrees of freedom.
• A large value of $SS_{Treatments}$ reflects large differences in treatment means.
• A small value of $SS_{Treatments}$ likely indicates no differences in treatment means.
• $df_{Total} = df_{Treatments} + df_{Error}$.
• The error mean square is a pooled estimate of the common within-treatment variance:
  $\dfrac{SS_E}{N-a} = \dfrac{(n-1)S_1^2 + \cdots + (n-1)S_a^2}{(n-1) + \cdots + (n-1)}$.
• If there are no differences between the a treatment means, $\dfrac{SS_{Treatments}}{a-1} = \dfrac{n\sum_{i=1}^{a}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2}{a-1}$ also estimates $\sigma^2$.
• Mean squares: $MS_{Treatments} = \dfrac{SS_{Treatments}}{a-1}$, $MS_E = \dfrac{SS_E}{N-a}$.
• Expected mean squares: $E(MS_E) = \sigma^2$ and $E(MS_{Treatments}) = \sigma^2 + \dfrac{n\sum_{i=1}^{a}\tau_i^2}{a-1}$.

3.3.2 Statistical Analysis
• Assumption: the $\varepsilon_{ij}$ are normally and independently distributed with mean zero and variance $\sigma^2$.
• $SS_T/\sigma^2 \sim \chi^2_{N-1}$, $SS_E/\sigma^2 \sim \chi^2_{N-a}$, and under $H_0$, $SS_{Treatments}/\sigma^2 \sim \chi^2_{a-1}$; moreover $SS_E/\sigma^2$ and $SS_{Treatments}/\sigma^2$ are independent (Theorem 3.1).
• To test $H_0: \tau_1 = \cdots = \tau_a = 0$ vs. $H_1: \tau_i \neq 0$ for at least one i, use
  $F_0 = \dfrac{MS_{Treatments}}{MS_E}$, and reject $H_0$ if $F_0 > F_{\alpha,\,a-1,\,N-a}$.
• Computing formulas for the sums of squares:
  $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \dfrac{y_{\cdot\cdot}^2}{N}$, $\quad SS_{Treatments} = \dfrac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2 - \dfrac{y_{\cdot\cdot}^2}{N}$, $\quad SS_E = SS_T - SS_{Treatments}$.
• See page 71.
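• A minimal computational sketch of these formulas, assuming Python with NumPy/SciPy and using the tensile-strength observations reported for this example in the textbook's Table 3.1 (treat the values as illustrative and verify against the source). The results reproduce the ANOVA quantities in the DESIGN-EXPERT output that follows.

```python
# Minimal sketch of the fixed-effects one-way ANOVA computing formulas.
# Data: tensile strength at five cotton weight percents (textbook Table 3.1, illustrative).
import numpy as np
from scipy import stats

y = np.array([
    [ 7,  7, 15, 11,  9],   # 15% cotton
    [12, 17, 12, 18, 18],   # 20% cotton
    [14, 18, 18, 19, 19],   # 25% cotton
    [19, 25, 22, 19, 23],   # 30% cotton
    [ 7, 10, 11, 15, 11],   # 35% cotton
])
a, n = y.shape
N = a * n

ss_total = ((y - y.mean()) ** 2).sum()                    # SS_T,           N - 1 df
ss_treat = n * ((y.mean(axis=1) - y.mean()) ** 2).sum()   # SS_Treatments,  a - 1 df
ss_error = ss_total - ss_treat                            # SS_E,           N - a df

ms_treat = ss_treat / (a - 1)
ms_error = ss_error / (N - a)
F0 = ms_treat / ms_error
p_value = stats.f.sf(F0, a - 1, N - a)                    # P(F > F0)

print(f"SS_Treatments = {ss_treat:.2f}, SS_E = {ss_error:.2f}, SS_T = {ss_total:.2f}")
print(f"F0 = {F0:.2f}, p = {p_value:.4g}")
# Reject H0 at level alpha if F0 > stats.f.ppf(1 - alpha, a - 1, N - a).
```

• Here $F_0 = 118.94/8.06 \approx 14.76$, which far exceeds $F_{0.05,4,20} \approx 2.87$, so cotton weight percent affects mean tensile strength.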
• DESIGN-EXPERT output for the tensile strength example:

  Response: Strength — ANOVA for Selected Factorial Model
  Analysis of variance table [Partial sum of squares]

  Source       Sum of Squares   DF   Mean Square   F Value   Prob > F
  Model              475.76      4        118.94     14.76   < 0.0001
  A                  475.76      4        118.94     14.76   < 0.0001
  Pure Error         161.20     20          8.06
  Cor Total          636.96     24

  Std. Dev.   2.84     R-Squared       0.7469
  Mean       15.04     Adj R-Squared   0.6963
  C.V.       18.88     Pred R-Squared  0.6046
  PRESS     251.88     Adeq Precision  9.294

3.3.3 Estimation of the Model Parameters
• Model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$.
• Estimators: $\hat{\mu} = \bar{y}_{\cdot\cdot}$, $\hat{\tau}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}$, $\hat{\mu}_i = \hat{\mu} + \hat{\tau}_i = \bar{y}_{i\cdot}$.
• Confidence intervals: since $\bar{y}_{i\cdot} \sim N(\mu_i, \sigma^2/n)$,
  $\bar{y}_{i\cdot} - t_{\alpha/2,\,N-a}\sqrt{\dfrac{MS_E}{n}} \le \mu_i \le \bar{y}_{i\cdot} + t_{\alpha/2,\,N-a}\sqrt{\dfrac{MS_E}{n}}$,
  $\bar{y}_{i\cdot} - \bar{y}_{j\cdot} - t_{\alpha/2,\,N-a}\sqrt{\dfrac{2MS_E}{n}} \le \mu_i - \mu_j \le \bar{y}_{i\cdot} - \bar{y}_{j\cdot} + t_{\alpha/2,\,N-a}\sqrt{\dfrac{2MS_E}{n}}$.
• Example 3.3 (page 75).
• Simultaneous confidence intervals (Bonferroni method): to construct a set of r simultaneous confidence intervals on treatment means with joint confidence at least 100(1 − α)%, use 100(1 − α/r)% individual confidence intervals.

3.3.4 Unbalanced Data
• Let $n_i$ observations be taken under treatment i, $i = 1, 2, \ldots, a$, with $N = \sum_i n_i$. Then
  $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - \dfrac{y_{\cdot\cdot}^2}{N}$, $\quad SS_{Treatments} = \sum_{i=1}^{a}\dfrac{y_{i\cdot}^2}{n_i} - \dfrac{y_{\cdot\cdot}^2}{N}$.
• Two advantages of the balanced design:
  1. The test statistic is relatively insensitive to small departures from the assumption of equal variances for the a treatments if the sample sizes are equal.
  2. The power of the test is maximized if the samples are of equal size.

3.4 Model Adequacy Checking
• Assumptions: $y_{ij} \sim N(\mu + \tau_i, \sigma^2)$.
• Check the assumptions through the examination of residuals.
• Definition of a residual: $e_{ij} = y_{ij} - \hat{y}_{ij}$, where $\hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{y}_{\cdot\cdot} + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) = \bar{y}_{i\cdot}$.
• The residuals should be structureless.

3.4.1 The Normality Assumption
• Plot a histogram of the residuals.
• Plot a normal probability plot of the residuals.
• See Table 3-6.
  [Figure: normal probability plot of the residuals (normal % probability vs. residual).]
• The plot may indicate
  – slight skewness (the right tail is longer than the left), or
  – light tails (the left tail of the error distribution is thinner than the corresponding tail of the standard normal).
• Watch for outliers.
• Possible causes of outliers: calculation mistakes, data coding or copying errors, ....
• Sometimes outliers are more informative than the rest of the data.
• To detect outliers, examine the standardized residuals $d_{ij} = \dfrac{e_{ij}}{\sqrt{MS_E}}$.

3.4.2 Plot of Residuals in Time Sequence
• Plotting the residuals in the time order of data collection is helpful in detecting correlation between the residuals.
• This checks the independence assumption.
  [Figure: DESIGN-EXPERT plot of residuals vs. run number.]

3.4.3 Plot of Residuals Versus Fitted Values
• Plot the residuals versus the fitted values.
• The plot should be structureless.
  [Figure: DESIGN-EXPERT plot of residuals vs. predicted strength.]
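• A short diagnostic sketch, again assuming Python with NumPy/SciPy/Matplotlib and reusing the y array from the earlier sketch; it builds the three residual displays described above. The true randomized run order is not given here, so the data order stands in for it purely as an illustration.

```python
# Residual diagnostics for the one-way ANOVA fit (reuses `y` from the sketch above).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

fitted = np.repeat(y.mean(axis=1), y.shape[1])    # y-hat_ij = treatment average
resid = y.ravel() - fitted                        # e_ij = y_ij - y-hat_ij

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
stats.probplot(resid, dist="norm", plot=axes[0])  # normal probability plot
axes[0].set_title("Normal probability plot")

axes[1].plot(np.arange(1, resid.size + 1), resid, "o")   # residuals vs. (assumed) run order
axes[1].set_xlabel("Run number"); axes[1].set_ylabel("Residual")

axes[2].plot(fitted, resid, "o")                  # residuals vs. fitted values
axes[2].set_xlabel("Fitted value"); axes[2].set_ylabel("Residual")
plt.tight_layout(); plt.show()

ms_error = resid @ resid / (y.size - y.shape[0])  # MS_E
d = resid / np.sqrt(ms_error)                     # standardized residuals d_ij
print("largest |d_ij|:", np.abs(d).max())         # values beyond about 3 suggest outliers
```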
• Nonconstant variance: sometimes the variance of the observations increases as the magnitude of the observations increases, i.e. $\sigma_{y_{ij}} \propto E(y_{ij})$.
• If the factor levels having the larger variances also have the smaller sample sizes, the actual type I error rate is larger than anticipated.
• Variance-stabilizing transformations:

  Distribution   Transformation
  Poisson        square root, $y_{ij}^{*} = \sqrt{y_{ij}}$ or $\sqrt{1 + y_{ij}}$
  Lognormal      logarithmic, $y_{ij}^{*} = \log y_{ij}$
  Binomial       arcsin, $y_{ij}^{*} = \arcsin\sqrt{y_{ij}}$

• Statistical tests for equality of variance:
  $H_0: \sigma_1^2 = \cdots = \sigma_a^2$  vs.  $H_1$: the above is not true for at least one $\sigma_i^2$.
  – Bartlett's test:
    $\chi_0^2 = 2.3026\,\dfrac{q}{c}$, where
    $q = (N-a)\log_{10} S_p^2 - \sum_{i=1}^{a}(n_i - 1)\log_{10} S_i^2$,
    $c = 1 + \dfrac{1}{3(a-1)}\left(\sum_{i=1}^{a}(n_i-1)^{-1} - (N-a)^{-1}\right)$,
    $S_p^2 = \dfrac{\sum_{i=1}^{a}(n_i-1)S_i^2}{N-a}$.
  – Reject the null hypothesis if $\chi_0^2 > \chi^2_{\alpha,\,a-1}$.
• Example 3.4: the test statistic is $\chi_0^2 = 0.93$ and $\chi^2_{0.05,4} = 9.49$, so the hypothesis of equal variances is not rejected.
• Bartlett's test is sensitive to the normality assumption.
• The modified Levene test:
  – Use the absolute deviation of each observation from its treatment median:
    $d_{ij} = |y_{ij} - \tilde{y}_i|$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, n_i$.
  – If the mean deviations are equal, the variances of the observations in all treatments will be the same.
  – The test statistic for Levene's test is simply the ANOVA F statistic for testing equality of means, applied to the $d_{ij}$.
• Example 3.5:
  – Four methods of estimating flood flow frequency (see Table 3.7).
  – ANOVA table (Table 3.8).
  – Plot of residuals vs. fitted values (Figure 3.7).
  – Modified Levene's test: $F_0 = 4.55$ with P-value = 0.0137, so reject the null hypothesis of equal variances.
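• Both tests are available in SciPy; a brief sketch, reusing the per-treatment rows of the y array from the first sketch (scipy.stats.levene with center="median" is the median-based, modified Levene variant). For the tensile-strength data it should give approximately the $\chi_0^2 \approx 0.93$ quoted in Example 3.4.

```python
# Equality-of-variance tests for the a treatment groups (rows of `y`).
from scipy import stats

groups = [row for row in y]                       # one array of observations per treatment

bart_stat, bart_p = stats.bartlett(*groups)       # Bartlett's chi-square test
lev_stat, lev_p = stats.levene(*groups, center="median")   # modified Levene test

print(f"Bartlett: chi2_0 = {bart_stat:.2f}, p = {bart_p:.3f}")
print(f"Modified Levene: F0 = {lev_stat:.2f}, p = {lev_p:.3f}")
# Reject H0 of equal variances when the p-value is below the chosen alpha.
```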
• Choosing a variance-stabilizing (power) transformation:
  – Let $E(y) = \mu$ and suppose $\sigma_y \propto \mu^{\alpha}$.
  – Find $y^{*} = y^{\lambda}$ that yields a constant variance; then $\sigma_{y^{*}} \propto \mu^{\alpha + \lambda - 1}$, so take $\lambda = 1 - \alpha$.

  Relationship                 α      λ = 1 − α   Transformation
  $\sigma_y \propto$ constant  0      1           No transformation
  $\sigma_y \propto \mu^{1/2}$ 1/2    1/2         Square root
  $\sigma_y \propto \mu$       1      0           Log
  $\sigma_y \propto \mu^{3/2}$ 3/2    −1/2        Reciprocal square root
  $\sigma_y \propto \mu^{2}$   2      −1          Reciprocal

• How to find α empirically: since $\sigma_{y_i} \approx \theta\mu_i^{\alpha}$ and $\bar{y}_{i\cdot}$ estimates $\mu_i$, $\log S_i \approx \log\theta + \alpha\log\bar{y}_{i\cdot}$; plot $\log S_i$ against $\log\bar{y}_{i\cdot}$ and use the slope as an estimate of α.
• See Figure 3.8, Table 3.10 and Figure 3.9.

3.5 Practical Interpretation of Results
• Conduct the experiment ⇒ perform the statistical analysis ⇒ investigate the underlying assumptions ⇒ draw practical conclusions.

3.5.1 A Regression Model
• Qualitative factor: compare the differences between the levels of the factor.
• Quantitative factor: develop an interpolation equation for the response variable.
• Regression analysis (see Figure 3.1; DESIGN-EXPERT one-factor plot of Strength vs. Cotton Weight %). Final equation in terms of actual factors:
  Strength = +62.61143 − 9.01143 × (Cotton Weight %) + 0.48143 × (Cotton Weight %)^2 − 7.60000E−003 × (Cotton Weight %)^3
• This is an empirical model of the experimental results.

3.5.2 Comparisons Among Treatment Means
• If the hypothesis of equal treatment means is rejected, we don't know which specific means are different.
• Determining which specific means differ following an ANOVA is called the multiple comparisons problem.

3.5.3 Graphical Comparisons of Means

3.5.4 Contrasts
• A contrast is a linear combination of the parameters of the form
  $\Gamma = \sum_{i=1}^{a} c_i \mu_i$ with $\sum_{i=1}^{a} c_i = 0$.
• Test $H_0: \Gamma = 0$ vs. $H_1: \Gamma \neq 0$. There are two equivalent methods for this test.
• The first method (t test):
  Let $C = \sum_{i=1}^{a} c_i \bar{y}_{i\cdot}$. Then $\operatorname{Var}(C) = \dfrac{\sigma^2}{n}\sum_{i=1}^{a} c_i^2$, and under $H_0$,
  $\dfrac{\sum_{i=1}^{a} c_i \bar{y}_{i\cdot}}{\sqrt{\frac{\sigma^2}{n}\sum_{i=1}^{a} c_i^2}} \sim N(0, 1)$; hence the statistic
  $t_0 = \dfrac{\sum_{i=1}^{a} c_i \bar{y}_{i\cdot}}{\sqrt{\frac{MS_E}{n}\sum_{i=1}^{a} c_i^2}} \sim t_{N-a}$.
• The second method (F test):
  $F_0 = t_0^2 = \dfrac{\left(\sum_{i=1}^{a} c_i \bar{y}_{i\cdot}\right)^2}{\frac{MS_E}{n}\sum_{i=1}^{a} c_i^2} \sim F_{1,\,N-a}$,
  equivalently $F_0 = \dfrac{MS_C}{MS_E} = \dfrac{SS_C/1}{MS_E}$ with $SS_C = \dfrac{\left(\sum_{i=1}^{a} c_i \bar{y}_{i\cdot}\right)^2}{\frac{1}{n}\sum_{i=1}^{a} c_i^2}$.
• The confidence interval for a contrast $\sum_{i=1}^{a} c_i \mu_i$:
  $\sum_{i=1}^{a} c_i \bar{y}_{i\cdot} \pm t_{\alpha/2,\,N-a}\sqrt{\dfrac{MS_E}{n}\sum_{i=1}^{a} c_i^2}$.
• Unequal sample sizes:
  1. Require $\sum_{i=1}^{a} n_i c_i = 0$.
  2. $t_0 = \dfrac{\sum_{i=1}^{a} c_i \bar{y}_{i\cdot}}{\sqrt{MS_E\sum_{i=1}^{a} c_i^2/n_i}}$.
  3. $SS_C = \dfrac{\left(\sum_{i=1}^{a} c_i \bar{y}_{i\cdot}\right)^2}{\sum_{i=1}^{a} c_i^2/n_i}$.

3.5.5 Orthogonal Contrasts
• Two contrasts with coefficients $\{c_i\}$ and $\{d_i\}$ are orthogonal if $\sum_{i=1}^{a} c_i d_i = 0$.
• For a treatments, a set of a − 1 orthogonal contrasts partitions the sum of squares due to treatments into a − 1 independent single-degree-of-freedom components. Thus, tests performed on orthogonal contrasts are independent.
• See Example 3.6 (page 94).

3.5.6 Scheffé's Method for Comparing All Contrasts
• Scheffé (1953) proposed a method for comparing any and all possible contrasts between treatment means. Suppose the contrasts of interest are
  $\Gamma_u = c_{1u}\mu_1 + \cdots + c_{au}\mu_a$, $u = 1, 2, \ldots, m$, estimated by $C_u = \sum_{i=1}^{a} c_{iu}\bar{y}_{i\cdot}$ with standard error $S_{C_u} = \sqrt{MS_E\sum_{i=1}^{a} c_{iu}^2/n_i}$.
  The critical value is $S_{\alpha,u} = S_{C_u}\sqrt{(a-1)F_{\alpha,\,a-1,\,N-a}}$; if $|C_u| > S_{\alpha,u}$, reject $H_0: \Gamma_u = 0$.
• See pages 95 and 96.

3.5.7 Comparing Pairs of Treatment Means
• Compare all pairs of the a treatment means.
• Tukey's test:
  – The studentized range statistic: $q = \dfrac{\bar{y}_{\max} - \bar{y}_{\min}}{\sqrt{MS_E/n}}$, where $\bar{y}_{\max}$ and $\bar{y}_{\min}$ are the largest and smallest sample means out of a group of p sample means.
  – The critical difference is $T_{\alpha} = q_{\alpha}(a, f)\sqrt{\dfrac{MS_E}{n}}$, or $T_{\alpha} = \dfrac{q_{\alpha}(a, f)}{\sqrt{2}}\sqrt{MS_E\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}$ for unequal sample sizes.
  – See Example 3.7.
• Sometimes the overall F test from the ANOVA is significant, but the pairwise comparison of means fails to reveal any significant differences.
• This can happen because the F test simultaneously considers all possible contrasts involving the treatment means, not just the pairwise comparisons.

The Fisher Least Significant Difference (LSD) Method
• For $H_0: \mu_i = \mu_j$, use $t_0 = \dfrac{\bar{y}_{i\cdot} - \bar{y}_{j\cdot}}{\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}$.
• The least significant difference is $\mathrm{LSD} = t_{\alpha/2,\,N-a}\sqrt{MS_E\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}$; declare $\mu_i$ and $\mu_j$ different if $|\bar{y}_{i\cdot} - \bar{y}_{j\cdot}| > \mathrm{LSD}$.
• See Example 3.8.

Duncan's Multiple Range Test
• The a treatment averages are arranged in ascending order, and the standard error of each average is $S_{\bar{y}_{i\cdot}} = \sqrt{\dfrac{MS_E}{n_h}}$, where $n_h = \dfrac{a}{\sum_{i=1}^{a} 1/n_i}$.
• Assuming equal sample sizes, the significant ranges are $R_p = r_{\alpha}(p, f)\,S_{\bar{y}_{i\cdot}}$, $p = 2, 3, \ldots, a$.
• A total of a(a − 1)/2 pairs is examined.
• Example 3.9.

The Newman–Keuls Test
• Similar to Duncan's multiple range test.
• The critical values are $K_p = q_{\alpha}(p, f)\,S_{\bar{y}_{i\cdot}}$.

3.5.8 Comparing Treatment Means with a Control
• Assume one of the treatments is a control, and the analyst is interested in comparing each of the other a − 1 treatment means with the control.
• Test $H_0: \mu_i = \mu_a$ vs. $H_1: \mu_i \neq \mu_a$, $i = 1, 2, \ldots, a - 1$.
• Dunnett (1964).
• Compute the differences $\bar{y}_{i\cdot} - \bar{y}_{a\cdot}$, $i = 1, 2, \ldots, a - 1$.
• Reject $H_0$ if $|\bar{y}_{i\cdot} - \bar{y}_{a\cdot}| > d_{\alpha}(a-1, f)\sqrt{MS_E\left(\dfrac{1}{n_i} + \dfrac{1}{n_a}\right)}$.
• Example 3.10.

3.7 Determining Sample Size
• Determine the number of replicates to run.

3.7.1 Operating Characteristic Curves (OC Curves)
• An OC curve is a plot of the type II error probability of a statistical test,
  $\beta = 1 - P\{\text{Reject } H_0 \mid H_0 \text{ is false}\} = 1 - P(F_0 > F_{\alpha,\,a-1,\,N-a} \mid H_0 \text{ is false})$.
• If $H_0$ is false, then $F_0 = MS_{Treatments}/MS_E$ follows a noncentral F distribution with a − 1 and N − a degrees of freedom and noncentrality parameter $\delta$.
• Use Chart V of the Appendix, indexed by $\Phi^2 = \dfrac{n\sum_{i=1}^{a}\tau_i^2}{a\sigma^2}$.
• Let $\mu_i$ be the specified treatment means. Then the $\tau_i$ are estimated by $\tau_i = \mu_i - \bar{\mu}$, where $\bar{\mu} = \sum_{i=1}^{a}\mu_i/a$.
• For $\sigma^2$, use prior experience, a previous experiment, a preliminary test, or a judgment estimate.
• Example 3.11.
• Difficulty: how to select the set of treatment means on which the sample size decision should be based.
• Another approach: select a sample size such that if the difference between any two treatment means exceeds a specified value D, the null hypothesis should be rejected; then use
  $\Phi^2 = \dfrac{nD^2}{2a\sigma^2}$.

3.7.2 Specifying a Standard Deviation Increase
• Let P be the percentage increase in the standard deviation of an observation that we want to detect. Then
  $\Phi = \dfrac{\sqrt{\sum_{i=1}^{a}\tau_i^2/a}}{\sigma/\sqrt{n}} = \sqrt{(1 + 0.01P)^2 - 1}\,\sqrt{n}$.
• For example (page 110): if P = 20, then $\Phi = \sqrt{1.2^2 - 1}\,\sqrt{n} = 0.66\sqrt{n}$.

3.7.3 Confidence Interval Estimation Method
• Choose n so that the confidence interval
  $\bar{y}_{i\cdot} - \bar{y}_{j\cdot} \pm t_{\alpha/2,\,N-a}\sqrt{\dfrac{2MS_E}{n}}$
  on the difference of any two treatment means has a specified width.
• For example: we want the 95% confidence interval on the difference in mean tensile strength for any two cotton weight percentages to be ±5 psi, with σ = 3. See page 110.
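• A small sketch of the OC-curve idea in code rather than charts, assuming Python/SciPy and the "difference D between any two means" approach: the power for each candidate n comes from the noncentral F distribution with noncentrality $\lambda = nD^2/(2\sigma^2)$ (i.e. $\lambda = a\Phi^2$). The values of D, σ, α, and the target power below are illustrative, not from the text.

```python
# Sample-size determination via the noncentral F distribution (OC-curve approach).
from scipy import stats

a, alpha = 5, 0.05      # number of treatments, type I error rate
D, sigma = 10.0, 3.0    # illustrative: smallest difference of interest and error SD

for n in range(2, 11):                            # candidate replicates per treatment
    N = a * n
    lam = n * D**2 / (2 * sigma**2)               # noncentrality parameter = a * Phi^2
    f_crit = stats.f.ppf(1 - alpha, a - 1, N - a)
    power = stats.ncf.sf(f_crit, a - 1, N - a, lam)   # P(F0 > f_crit | H0 false)
    print(f"n = {n}: power = {power:.3f}")
    if power >= 0.90:                             # stop at the first n with power >= 0.90
        break
```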
3.9 The Regression Approach to the Analysis of Variance
• Model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$.
• Least squares criterion:
  $L = \sum_{i=1}^{a}\sum_{j=1}^{n}\varepsilon_{ij}^2 = \sum_{i=1}^{a}\sum_{j=1}^{n}\left(y_{ij} - \mu - \tau_i\right)^2$.
• Setting $\dfrac{\partial L}{\partial \mu} = 0$ and $\dfrac{\partial L}{\partial \tau_i} = 0$, $i = 1, 2, \ldots, a$, gives
  $\sum_{i=1}^{a}\sum_{j=1}^{n}\left(y_{ij} - \hat{\mu} - \hat{\tau}_i\right) = 0$ and $\sum_{j=1}^{n}\left(y_{ij} - \hat{\mu} - \hat{\tau}_i\right) = 0$, $i = 1, 2, \ldots, a$.
• The normal equations:
  $N\hat{\mu} + n\hat{\tau}_1 + n\hat{\tau}_2 + \cdots + n\hat{\tau}_a = y_{\cdot\cdot}$,
  $n\hat{\mu} + n\hat{\tau}_i = y_{i\cdot}$, $i = 1, 2, \ldots, a$.
• Apply the constraint $\sum_{i=1}^{a}\hat{\tau}_i = 0$. Then the estimates are $\hat{\mu} = \bar{y}_{\cdot\cdot}$ and $\hat{\tau}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}$.
• Regression sum of squares (the reduction due to fitting the full model):
  $R(\mu, \tau) = \hat{\mu}\,y_{\cdot\cdot} + \sum_{i=1}^{a}\hat{\tau}_i\,y_{i\cdot} = \sum_{i=1}^{a}\dfrac{y_{i\cdot}^2}{n}$.
• The error sum of squares: $SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - R(\mu, \tau)$.
• The sum of squares resulting from the treatment effects (full vs. reduced model):
  $R(\tau \mid \mu) = R(\mu, \tau) - R(\mu) = R(\text{Full Model}) - R(\text{Reduced Model}) = \sum_{i=1}^{a}\dfrac{y_{i\cdot}^2}{n} - \dfrac{y_{\cdot\cdot}^2}{N}$.
• The test statistic for $H_0: \tau_1 = \cdots = \tau_a = 0$ is
  $F_0 = \dfrac{R(\tau \mid \mu)/(a-1)}{\left(\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - R(\mu, \tau)\right)/(N-a)} \sim F_{a-1,\,N-a}$.
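• A compact sketch of this full-versus-reduced-model comparison, assuming Python/NumPy and reusing the y array from the first sketch. For simplicity the full model is fit with one indicator column per treatment (a cell-means parameterization, whose column space is the same as the $\mu + \tau_i$ model), so no explicit constraint is needed.

```python
# Regression (full vs. reduced model) formulation of the one-way ANOVA
# (reuses `y` from the first sketch).
import numpy as np
from scipy import stats

a, n = y.shape
N = a * n
yv = y.ravel()                               # row-major: first n values are treatment 1, etc.

X_full = np.kron(np.eye(a), np.ones((n, 1))) # full model: indicator column per treatment
X_red = np.ones((N, 1))                      # reduced model: overall mean only

def reg_ss(X, yv):
    """Reduction in sum of squares due to fitting the model: R = b_hat' X' y."""
    b_hat, *_ = np.linalg.lstsq(X, yv, rcond=None)
    return b_hat @ X.T @ yv

R_full = reg_ss(X_full, yv)                  # R(mu, tau) = sum_i y_i.^2 / n
R_red = reg_ss(X_red, yv)                    # R(mu)      = y..^2 / N
R_tau_given_mu = R_full - R_red              # treatment sum of squares
SS_E = yv @ yv - R_full                      # error sum of squares

F0 = (R_tau_given_mu / (a - 1)) / (SS_E / (N - a))
print(f"R(tau|mu) = {R_tau_given_mu:.2f}, SS_E = {SS_E:.2f}, F0 = {F0:.2f}")
print("p-value:", stats.f.sf(F0, a - 1, N - a))
```

• For the tensile-strength data this recovers the same $R(\tau \mid \mu) = 475.76$, $SS_E = 161.20$, and $F_0 \approx 14.76$ as the direct ANOVA decomposition, illustrating that the regression approach and the classical ANOVA are the same analysis.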