Always be mindful of the kindness and not the faults of others.

One-way Anova: Inferences about More than Two Population Means

What is Anova?
One-Way Anova; F tests
Pairwise comparisons: Bonferroni procedure

Analysis of Variance & One Factor Designs

Y = DEPENDENT VARIABLE ("yield") ("response variable") ("quality indicator")
X = INDEPENDENT VARIABLE (a possibly influential FACTOR)

OBJECTIVE: To determine the impact of X on Y

Mathematical Model: $Y = f(x, \epsilon)$, where $\epsilon$ = (impact of) all factors other than X

Ex: Y = Forced expiratory volume in one second (liters)
    X = Medical center (Johns Hopkins, Rancho Los Amigos, St. Louis)
    $\epsilon$ = Many other factors (possibly, some we're unaware of)

Statistical Model

"LEVEL" OF Center
(Center is, of course, represented as "categorical")

        1           2           ...    C
1       $Y_{11}$    $Y_{12}$    ...    $Y_{1C}$
2       $Y_{21}$    ...
...                 $Y_{ij}$
n       $Y_{n1}$    ...                $Y_{nC}$

$Y_{ij} = \mu + \tau_j + \epsilon_{ij}$,   $i = 1, \ldots, n_j$;   $j = 1, \ldots, C$

Where
$\mu$ = OVERALL AVERAGE
$j$ = index for FACTOR (center) LEVEL
$i$ = index for "replication"
$\tau_j$ = Differential effect (response) associated with the jth level of X
and $\epsilon_{ij}$ = "noise" or "error" associated with the (particular) (i,j)th data value.

Let $\mu_j$ = AVERAGE associated with the jth level of X. Then
$\tau_j = \mu_j - \mu$ and $\mu$ = AVERAGE of the $\mu_j$.

        1                    2                    ...    C
        $Y_{11}$             $Y_{12}$             ...    $Y_{1C}$
        $Y_{21}$             ...
        ...
        $Y_{R1}$             ...                         $Y_{RC}$
Mean    $\bar{Y}_{\cdot 1}$  $\bar{Y}_{\cdot 2}$  ...    $\bar{Y}_{\cdot C}$

$\bar{Y}_{\cdot 1}$, $\bar{Y}_{\cdot 2}$, etc., are Column Means (in general, $\bar{Y}_{\cdot j}$).

$\bar{Y}_{\cdot\cdot} = \sum_{j=1}^{C} \bar{Y}_{\cdot j} / C$ = "GRAND MEAN"
(assuming same # data points in each column)
(otherwise, $\bar{Y}_{\cdot\cdot}$ = mean of all the data)

MODEL: $Y_{ij} = \mu + \tau_j + \epsilon_{ij}$

$\bar{Y}_{\cdot\cdot}$ estimates $\mu$
$\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}$ estimates $\tau_j$ (= $\mu_j - \mu$) (for all j)

These estimates are based on Gauss' (1796) PRINCIPLE OF LEAST SQUARES and on COMMON SENSE.

MODEL: $Y_{ij} = \mu + \tau_j + \epsilon_{ij}$

If you insert the estimates into the MODEL,
(1) $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + \hat{\epsilon}_{ij}$,
it follows that our estimate of $\epsilon_{ij}$ is
(2) $\hat{\epsilon}_{ij} = Y_{ij} - \bar{Y}_{\cdot j}$

Then,
$Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j})$
or,
(3) $(Y_{ij} - \bar{Y}_{\cdot\cdot}) = (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j})$

TOTAL VARIABILITY in Y = Variability in Y associated with X + Variability in Y associated with all other factors

If you square both sides of (3), and double sum both sides (over i and j), you get [after some unpleasant algebra, but lots of terms which "cancel"]

$\sum_{j=1}^{C} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{C} n_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{C} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2$

TSS = SSB + SSW (SSE)
TOTAL SUM OF SQUARES = SUM OF SQUARES BETWEEN COLUMNS + SUM OF SQUARES WITHIN COLUMNS

ANOVA TABLE

SOURCE OF VARIABILITY                    SSQ    DF     Mean square (M.S.)
Between Columns (due to center)          SSB    C-1    MSB = SSB / (C-1)
Within Columns (due to other factors)    SSW    N-C    MSW = SSW / (N-C)
TOTAL                                    TSS    N-1

ANOVA TABLE

Source of Variability    SSQ       df              M.S.
CENTER                   1.583     2  (= 3-1)      0.791
ERROR                    14.480    57 (= 59-2)     0.254
TOTAL                    16.063    59 (= 60-1)

We can show:
$E(MSB) = \sigma^2 + V_{COL}$
$E(MSW) = \sigma^2$

This suggests that
if MSB/MSW > 1, there's some evidence of non-zero $V_{COL}$, or "level of X affects Y";
if MSB/MSW < 1, there's no evidence that $V_{COL} > 0$, or that "level of X affects Y".

With
$H_0$: Level of X has no impact on Y
$H_1$: Level of X does have impact on Y,
we need MSB/MSW >> 1 to reject $H_0$.

More Formally,
$H_0$: $\tau_1 = \tau_2 = \cdots = \tau_C = 0$
$H_1$: not all $\tau_j = 0$
OR
$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_C$ (All column means are equal)
$H_1$: not all $\mu_j$ are EQUAL

The distribution of MSB/MSW = "Fcalc" is the F-distribution with ($df_B$, $df_W$) degrees of freedom, assuming $H_0$ is true.
$C_\alpha$ = Table Value

In our problem:

ANOVA TABLE

Source of Variability    SSQ       df              M.S.     Fcalc
CENTER                   1.583     2  (= 3-1)      0.791    3.12 (= 0.791 / 0.254)
ERROR                    14.480    57 (= 59-2)     0.254
TOTAL                    16.063    59 (= 60-1)

F table: Table A-5
$\alpha$ = .05
$C_{.05}$ = 3.15
Fcalc = 3.12 (2, 57 DF)

Hence, at $\alpha$ = .05, Do Not Reject $H_0$, i.e., conclude that centers don't differ significantly on FEV1 at the 5% level.
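The arithmetic behind the F test above can be checked with a short Python sketch. It uses only the summary values from the ANOVA table (SSB = 1.583, SSW = 14.480, 2 and 57 degrees of freedom). The function name `f_test_two_numerator_df` is hypothetical, introduced here for illustration. The tail probability uses the closed form for an F distribution with exactly 2 numerator degrees of freedom, $P(F > f) = (1 + 2f/d)^{-d/2}$, so this is a sketch for the three-center case only, not a general F-test routine.

```python
# Summary values taken from the ANOVA table above.
SSB = 1.583        # sum of squares between centers
SSW = 14.480       # sum of squares within centers (error)
DF_WITHIN = 57     # error degrees of freedom


def f_test_two_numerator_df(ssb, ssw, df_within):
    """Return (MSB, MSW, Fcalc, p-value) when the numerator df is exactly 2.

    Hypothetical helper for illustration; valid only for 3 groups (C - 1 = 2).
    """
    msb = ssb / 2                  # mean square between columns
    msw = ssw / df_within          # mean square within columns
    f_calc = msb / msw
    # Closed-form upper tail of F(2, d): P(F > f) = (1 + 2f/d) ** (-d/2)
    p_value = (1 + 2 * f_calc / df_within) ** (-df_within / 2)
    return msb, msw, f_calc, p_value


msb, msw, f_calc, p_value = f_test_two_numerator_df(SSB, SSW, DF_WITHIN)
print(f"MSB = {msb:.3f}, MSW = {msw:.3f}, Fcalc = {f_calc:.2f}, p = {p_value:.3f}")

# Decision at the 5% level, using the table value quoted in the slides.
TABLE_VALUE = 3.15
print("Reject H0" if f_calc > TABLE_VALUE else "Do not reject H0")
```

For a design with a different number of groups, the closed form no longer applies. If SciPy is available, `scipy.stats.f.sf(f_calc, dfn, dfd)` gives the general upper-tail probability.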
The P-value is .052, so it is significant at the 6% level.

Multiple Comparison Procedures

Once we reject $H_0$: $\mu_1 = \mu_2 = \cdots = \mu_C$ in favor of $H_1$: NOT all $\mu$'s are equal, we don't yet know the way in which they're not all equal, but simply that they're not all the same. If there are 4 columns, are all 4 $\mu$'s different? Are 3 the same and one different? If so, which one? etc.

Overall Type I Error Rate

We set up "$\alpha$" as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at $\alpha$ = .05; each test has type I error (rej $H_0$ when it's true) of .05. However,

P(at least one type I error in the 3 tests) = 1 - P(accept all 3, given all true) = $1 - (.95)^3 \approx .14$

Pairwise Comparisons

Bonferroni Correction: Do a series of pairwise t-tests, each with the specified $\alpha$ value divided by the # of comparisons.

MINITAB INPUT

center    fev1
1         3.23
1         3.47
1         1.86
1         2.47
. . .     . . .
3         2.85
3         2.43
3         3.20
3         3.53

ONE FACTOR ANOVA (MINITAB)

MINITAB: STAT >> ANOVA >> ONE-WAY
Click for comparisons

Minitab Outputs

Fisher 98.3% Individual Confidence Intervals
All Pairwise Comparisons among Levels of center
Simultaneous confidence level = 95.58%

center = 1 subtracted from:

center    Lower      Center     Upper
2         -0.0049    0.4063     0.8176
3         -0.1215    0.2525     0.6266

center = 2 subtracted from:

center    Lower      Center     Upper
3         -0.5572    -0.1538    0.2496
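The overall type I error rate and the Bonferroni adjustment described above can be illustrated with a small Python sketch. The function names `familywise_error` and `bonferroni_alpha` are hypothetical, chosen for this example. The interval endpoints are copied from the Minitab output above; the code only checks whether each interval contains zero and does not recompute the intervals.

```python
from math import comb


def familywise_error(alpha, n_tests):
    """P(at least one type I error) across n independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_groups):
    """Per-comparison alpha when all pairs of groups are compared."""
    n_pairs = comb(n_groups, 2)    # number of pairwise comparisons
    return alpha / n_pairs, n_pairs


overall = familywise_error(0.05, 3)            # 1 - 0.95 ** 3
per_test_alpha, n_pairs = bonferroni_alpha(0.05, 3)
individual_conf = 1 - per_test_alpha           # individual confidence level

print(f"Overall type I error for 3 tests: {overall:.3f}")
print(f"{n_pairs} comparisons, each at alpha = {per_test_alpha:.4f}")
print(f"Individual confidence level: {100 * individual_conf:.1f}%")

# Interval endpoints (lower, upper) copied from the Minitab output above.
intervals = {
    "center 2 - center 1": (-0.0049, 0.8176),
    "center 3 - center 1": (-0.1215, 0.6266),
    "center 3 - center 2": (-0.5572, 0.2496),
}
for label, (lower, upper) in intervals.items():
    contains_zero = lower <= 0 <= upper
    verdict = "no significant difference" if contains_zero else "significant difference"
    print(f"{label}: ({lower}, {upper}) -> {verdict}")
```

With three centers there are three pairwise comparisons, so each is run at .05/3, which corresponds to the 98.3% individual confidence level shown in the Minitab heading. All three intervals contain zero, which agrees with the earlier decision not to reject $H_0$ at the 5% level.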