Chapter 6: Linear Combinations and Multiple Comparisons of Means

The F-test for equality of means does not tell us which means differ from which others, nor does it typically answer the specific questions of interest. It is mainly an initial screening device. This chapter discusses using linear combinations of means to address specific questions, along with methods for adjusting for multiple comparisons.

6.1 Case studies

6.1.1 Discrimination Against the Handicapped

• Question of Interest: Do subjects systematically evaluate qualifications differently according to the candidate's handicap? If so, which handicaps produce the different evaluations?
• Scope of Inference?

6.1.2 Preexisting Preferences of Fish

• Scientific Question: Do female platyfish have a preexisting bias for a male trait, even before the males of the same species possess it?
• Specific Question of Interest: Did the female platyfish show a preference for the males that were given the yellow swordtails?
• Scope of Inference?

6.2 Inferences About Linear Combinations of Group Means

• Questions of interest often involve several group means, not just two!
  – Example: the difference between the averages of several group means.
• What is a linear combination of group means?

    γ = C1·µ1 + C2·µ2 + … + CI·µI

  where C1, C2, …, CI are constants, called coefficients, chosen by the researcher.
• What is a linear contrast?
  – A linear combination whose coefficients (the C's) add to zero.

6.2.2 Inferences About Linear Combinations of Group Means

• How do we estimate linear combinations? Replace each µi with the group average Ȳi:

    g = C1·Ȳ1 + C2·Ȳ2 + … + CI·ȲI

    SD(g) = σ · sqrt( C1²/n1 + C2²/n2 + … + CI²/nI )

• We still need a standard error for g! Replace σ with the pooled estimate sp:

    SE(g) = sp · sqrt( C1²/n1 + C2²/n2 + … + CI²/nI )

• Where do we get sp?
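The estimate g, its standard error, and the t-ratio can be illustrated with a short numerical sketch. This is in Python rather than the R used elsewhere in these notes, and all group summaries below (averages, sample sizes, pooled SD) are made up for illustration:

```python
import math

# Hypothetical summaries for I = 3 groups (made-up numbers, for illustration only):
ybar = [10.0, 12.0, 15.0]   # group averages Y-bar_i
n    = [8, 10, 9]           # group sample sizes n_i
sp   = 2.5                  # pooled SD from the one-way ANOVA (assumed)

# Contrast comparing group 1 with the average of groups 2 and 3:
C = [1.0, -0.5, -0.5]
assert abs(sum(C)) < 1e-12   # coefficients sum to zero, so this is a contrast

g    = sum(c * y for c, y in zip(C, ybar))                    # estimate of gamma
se_g = sp * math.sqrt(sum(c * c / m for c, m in zip(C, n)))   # SE(g)
df   = sum(n) - len(n)                                        # n - I = 24
t    = g / se_g                                               # t-ratio for H0: gamma = 0
# For a CI, use g +/- t_mult * SE(g), with t_mult from a t(df) table
# (qt(1 - alpha/2, df) in R).
```

Note that the d.f. come from the pooled SD, not from any single pair of groups.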
• Two-sample (and one-sample) comparisons are just linear combinations of means:
• Inferences based on the t-distributions:
  ∗ t-ratio: (g − γ)/SE(g)
  ∗ d.f.: n − I (the degrees of freedom associated with sp)
• Example: Handicap study (Display 6.4)

6.2.3 Specific Linear Combinations

• Comparing Averages of Group Means:
• Comparing Rates:
  – Goal: report results as the rate of increase in the mean response per unit change in an explanatory variable.
  – Example: Diet Restriction study
    ∗ Estimate the increase in mean lifetime associated with a reduction from 50 to 40 kcal/wk −→ units: months/(kcal/wk)
• Linear Trends:
  – Goal: determine whether the group means are linearly associated with an explanatory variable.
    ∗ Linear combination for a linear trend:
      · If Xi is the explanatory variable value associated with group i, then Ci = (Xi − X̄).
      · NOTE: Inference is unchanged if all the Ci's are multiplied by the same constant!
    ∗ Example: Platyfish study (see Display 6.5):
      · t-test for linear effect of body size:
• Averages:
  – Goal: estimate the average of the group means and/or test whether it equals some value.
  – Is this a contrast? (No: the coefficients, each 1/I, sum to 1, not 0.)
  – Example: Platyfish study

6.3 Simultaneous Inferences

• Individual (pairwise) confidence level: the frequency with which a single interval captures its parameter.
• Overall (familywise) confidence level: the frequency with which multiple intervals simultaneously capture their parameters.
• Compound uncertainty: the increased chance of making at least one mistake when drawing more than one inference.
  – If you make k 95% confidence intervals, the familywise confidence level is between 100(1 − 0.05k)% and 95%.
    ∗ How small can the familywise level be with k = 10? 100(1 − 10 × 0.05)% = 50%.
  – How does this relate to p-values?
• Multiple comparison procedures: ways of constructing individual CIs so that the familywise confidence level is controlled at a specified level when needed.
• Planned comparisons: the researcher knows before seeing the data that the comparison will be reported.
  −→ use the individual confidence level
• Unplanned comparisons: examine all possible pairs (or other linear combinations) of differences and then report the "significant" ones.
  −→ use the familywise confidence level
• Data snooping: the hypothesis or comparison chosen originates from looking at the data. (Ex.: look at a plot of the groups or at the individual group means and then choose the two most different.)
  −→ use the familywise confidence level

Data snooping example

• DNA molecule: 2,436 mononucleotides (A, C, G, and T) (see Display 6.7). There are 11 breaks in the molecule, and the researchers noticed that 6 of them were downstream from a TGG trinucleotide.
• Does this support the hypothesis that TGG is a precursor of breaks in the DNA?

Scenario 1: Suppose this was a planned comparison (hypothesized before looking at the data):
• There are 2,435 possible break positions. How many are downstream from TGG's?
• If breaks occurred at random positions, how many of the 11 breaks would we expect to be downstream of TGG?
• What is the probability of six or more occurring downstream of TGG?
• This is strong evidence of an association between the occurrences of breaks and TGG!
• BUT the researchers chose to look specifically at TGG because it is the most frequently observed trinucleotide upstream from the breaks. Was the visual search for the trinucleotide an integral part of the statistical analysis?

Scenario 2: Use computer simulation to investigate the effect of first searching for the most frequent trinucleotide before calculating the p-value:
• Randomly select 11 break points from the molecule and find the most frequent trinucleotide appearing upstream of them. Record the number of breaks it is upstream of. Do this 1,000 times.
• In 320 of the 1,000 trials, some trinucleotide was found upstream of the 11 breaks six times or more (and 11 times in one trial!).
• What is the resulting p-value and conclusion? (p-value = 320/1,000 = 0.32: once the search for the most striking trinucleotide is accounted for, the data provide no real evidence that breaks are associated with TGG.)
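The Scenario 2 simulation can be sketched as below. One caveat: the study used the actual 2,436-base molecule, whereas this sketch generates a uniformly random stand-in sequence (an assumption), so its simulated p-value reflects that artificial sequence's trinucleotide composition and will not reproduce the 0.32 found with the real molecule:

```python
import random
from collections import Counter

def most_frequent_upstream_count(molecule, breaks):
    """Count each trinucleotide appearing immediately upstream of a break;
    return the count of the most frequent one (the 'snooped' statistic)."""
    counts = Counter(molecule[b - 3:b] for b in breaks)
    return max(counts.values())

random.seed(0)
N_BASES, N_BREAKS, N_TRIALS = 2436, 11, 1000

# Stand-in molecule: uniformly random bases (the real analysis used the
# actual DNA sequence, whose trinucleotide frequencies are far from uniform).
molecule = "".join(random.choice("ACGT") for _ in range(N_BASES))

hits = 0
for _ in range(N_TRIALS):
    breaks = random.sample(range(3, N_BASES + 1), N_BREAKS)  # 11 random break points
    if most_frequent_upstream_count(molecule, breaks) >= 6:
        hits += 1

p_value = hits / N_TRIALS   # fraction of trials as extreme as the observed 6
```

The point of the exercise survives the stand-in: the p-value must be computed against a null distribution that includes the search for the most frequent trinucleotide, not just the count for TGG alone.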
6.4 Some Multiple Comparison Procedures

These procedures change the multiplier used to construct a confidence interval:

Tukey-Kramer Procedure and Studentized Range Distributions

• WHEN? Controls the level for all pairwise comparisons of means.
• To make a 100(1 − α)% CI, use a value from the studentized range distribution rather than the t-distribution: q(1 − α; I, n − I), where I = number of groups and n − I is the d.f. associated with the SE.
• Tukey-Kramer procedure (works for unequal sample sizes):

    TKmult = (1/√2) · q(1 − α; I, n − I)

• Does the CI contain the null parameter value (usually zero)?
• Example:
  – R code:
    ∗ qtukey(1 − α, number of groups, d.f.) = qtukey(1 − α, I, n − I)
      (Example: for a 95% level with 6 groups and n = 84, get qtukey(.95, 6, 78) and then multiply it by 1/√2.)
    ∗ aov.out <- aov(response ~ group)
      TukeyHSD(aov.out)
      plot(TukeyHSD(aov.out))

Scheffé's Procedure

• WHEN? Controls the level for all possible linear contrasts among the group means.
  – For all pairwise differences in means −→ at least 100(1 − α)% confidence.
  – A more appropriate application: intervals for regression curves.
• The multiplier:

    Scheffemult = sqrt( (I − 1) · F_{I−1, d.f.}(1 − α) )

• Example:
  – R code: qf(1 − α, number of groups − 1, d.f.) = qf(1 − α, I − 1, n − I)
    (Example: for a 95% level with 6 groups and n = 84, get qf(.95, 5, 78), multiply it by (I − 1) = 5, and take the square root.)

The Least Significant Difference (LSD)

• WHEN? For planned comparisons.
• The usual t-multiplier: tmult = t_{d.f.}(1 − α/2)
• Any difference that exceeds the LSD in size is significant in an α-level hypothesis test.

F-Protected Inferences

• WHEN? Unplanned comparisons.
• Two steps:
  1. Perform the usual ANOVA F-test.
  2. (a) Large p-value −→ do not declare any individual difference significant.
     (b) Small p-value −→ proceed with individual comparisons using the usual t-multiplier.
• It works for tests, but not for confidence intervals!

Bonferroni

• WHEN? For a wide range of comparisons (usually conservative).
• For k comparisons:

    Bonferronimult = t_{d.f.}(1 − α/(2k))

• Example:

Multiple RESPONSE measurements

• Instead of many groups and one response, you may have many responses and few groups.
• The multiple comparison procedures described here are generally not appropriate, but Bonferroni methods do offer reasonable protection.
• Use a multivariate statistical tool.

Comments

• A LOT of studies fall within the general class of one-way classifications, but the statistical analysis is NOT the same for all! (i.e., you do not have to run the usual ANOVA F-test and then use multiple comparison methods to search for the "significant" differences.)
• A major distinction can be made between studies in which the group structure calls for specific planned comparisons and those in which the only question is which means differ from which others.
• Computer simulation is a powerful tool that can help evaluate evidence about more complex hypotheses suggested by the data.
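The four multipliers of Section 6.4 can be compared numerically for the running example there (6 groups, n = 84, so 78 d.f.). This sketch uses Python's scipy as a stand-in for the R calls qtukey, qf, and qt; k = 15 is the number of pairwise comparisons among 6 groups:

```python
import math
from scipy import stats

alpha, I, n = 0.05, 6, 84
df = n - I                    # 78 d.f., from the pooled SD
k = I * (I - 1) // 2          # 15 pairwise comparisons among 6 groups

# Tukey-Kramer: studentized range quantile / sqrt(2)   (R: qtukey(.95, 6, 78))
tk = stats.studentized_range.ppf(1 - alpha, I, df) / math.sqrt(2)

# Scheffe: sqrt((I - 1) * F quantile)                  (R: qf(.95, 5, 78))
scheffe = math.sqrt((I - 1) * stats.f.ppf(1 - alpha, I - 1, df))

# Bonferroni for k comparisons                         (R: qt(1 - .05/30, 78))
bonf = stats.t.ppf(1 - alpha / (2 * k), df)

# LSD: the unadjusted t multiplier (planned comparisons only)  (R: qt(.975, 78))
lsd = stats.t.ppf(1 - alpha / 2, df)
```

A larger multiplier means a wider interval and a more conservative procedure; for all-pairwise comparisons the ordering here is LSD < Tukey-Kramer < Bonferroni < Scheffé, which is why Scheffé is better reserved for arbitrary contrasts than for pairwise differences.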