Mean Comparison Tests - Crop and Soil Science

Treatment Comparisons
• ANOVA can determine if there are differences among the treatments, but what is the nature of those differences?
• Are the treatments measured on a continuous scale?
  – Look at response surfaces (linear regression, polynomials)
• Is there an underlying structure to the treatments?
  – Compare groups of treatments using orthogonal contrasts or a limited number of preplanned mean comparison tests
  – Use simultaneous confidence intervals on preplanned comparisons
• Are the treatments unstructured?
  – Use appropriate multiple comparison tests (today's topic)

Variety Trials
• In a breeding program, you need to examine large numbers of selections and then narrow to the best
• In the early stages, based on single plants or single rows of related plants. Seed and space are limited, so difficult to have replication
• When numbers have been reduced and there is sufficient seed, you can conduct replicated yield trials and you want to be able to "pick the winner"

Comparison of Means
• Pairwise Comparisons
  – Least Significant Difference (LSD)
• Simultaneous Confidence Intervals
  – Tukey's Honestly Significant Difference (HSD)
  – Dunnett Test (making all comparisons to a control)
    • May be a one-sided or two-sided test
  – Bonferroni Inequality
  – Scheffé's Test – can be used for unplanned comparisons
• Other Multiple Comparison Tests - "Data Snooping"
  – Fisher's Protected LSD (FPLSD)
  – Student-Newman-Keuls test (SNK)
  – Waller and Duncan's Bayes LSD (BLSD)
  – False Discovery Rate Procedure
• Often misused - intended to be used only for data from experiments with unstructured treatments

Multiple Comparison Tests
• Fixed Range Tests – a constant value is used for all comparisons
  – Application
    • Hypothesis Tests
    • Confidence Intervals
• Multiple Range Tests – values used for comparison vary across a range of means
  – Application
    • Hypothesis Tests

Type I vs Type II Errors
• Type I error - saying something is different when it is really the same (false positive) (Paranoia)
  – the rate at which this type of error is made is the significance level
• Type II error - saying something is the same when it is really different (false negative) (Sloth)
  – the probability of committing this type of error is designated $\beta$
  – the probability that a comparison procedure will pick up a real difference is called the power of the test and is equal to $1-\beta$
• Type I and Type II error rates are inversely related to each other
• For a given Type I error rate, the rate of Type II error depends on
  – sample size
  – variance
  – true differences among means

Nobody likes to be wrong...
• Protection against Type I is choosing a significance level
• Protection against Type II is a little harder because
  – it depends on the true magnitude of the difference, which is unknown
  – choose a test with sufficiently high power
• Reasons for not using LSD to make all possible comparisons
  – the chance for a Type I error increases dramatically as the number of treatments increases

Pairwise Comparisons
• Making all possible pairwise comparisons among t treatments
  – # of comparisons: $\binom{t}{2} = \frac{t!}{2!\,(t-2)!} = \frac{t(t-1)}{2}$
• If you have 10 varieties and want to look at all possible pairwise comparisons
  – that would be t(t-1)/2 or 10(9)/2 = 45
  – that's quite a few more than t-1 df = 9

Comparisonwise vs Experimentwise Error
• Comparisonwise error rate ($\alpha = \alpha_C$)
  – measures the proportion of all differences that are expected to be declared real when they are not
• Experimentwise error rate ($\alpha_E$)
  – the risk of making at least one Type I error among the set (family) of comparisons in the experiment
  – measures the proportion of experiments in which one or more differences are falsely declared to be significant
  – the probability of being wrong increases as the number of means being compared increases
  – Also called familywise error rate (FWE)

Comparisonwise vs Experimentwise Error
• Experimentwise error rate ($\alpha_E$)
  – Probability of no Type I errors $= (1-\alpha_C)^x$, where x = number of pairwise comparisons
  – Max x = t(t-1)/2, where t = number of treatments
• Probability of at least one Type I error
  – $\alpha_E = 1-(1-\alpha_C)^x$
  – if t = 10, Max x = 45 and $\alpha_E = 1-(1-0.05)^{45} = 90\%$
• Comparisonwise error rate
  – $\alpha_C = 1-(1-\alpha_E)^{1/x}$

Least Significant Difference
• Calculating a t for testing the difference between two means
  $t_{calc} = (\bar{Y}_1 - \bar{Y}_2) \Big/ \sqrt{s^2_{\bar{Y}_1-\bar{Y}_2}}$
  – Any difference for which $t_{calc} > t_\alpha$ would be declared significant
• Further, $t_\alpha \sqrt{s^2_{\bar{Y}_1-\bar{Y}_2}}$ is the smallest difference for which significance would be declared
  – Therefore $LSD = t_\alpha \sqrt{s^2_{\bar{Y}_1-\bar{Y}_2}}$
  – For equal replication, where r is the number of observations forming each mean: $LSD = t_\alpha \sqrt{2\,MSE/r}$

Do's and Don'ts of using LSD
• LSD is a valid test when
  – Making comparisons planned in advance of seeing the data (this includes the comparison of each treatment with the control)
  – Comparing adjacent ranked means
• The LSD should not (unless F test for treatments is significant**) be used for
  – Making all possible pairwise comparisons
  – Making more comparisons than df for treatments
** Some would say that LSD should never be used unless the F test from ANOVA is significant

Pick the Winner
• A plant breeder wanted to measure resistance to stem rust for six wheat varieties
  – planted 5 seeds of each variety in each of four pots
  – placed the 24 pots randomly on a greenhouse bench
  – inoculated with stem rust
  – measured seed yield per pot at maturity

Ranked Mean Yields (g/pot)

  Variety   Rank   Mean Yield ($\bar{Y}_i$)   Difference ($\bar{Y}_{i-1} - \bar{Y}_i$)
  F         1      95.3
  D         2      94.0                       1.3
  E         3      75.0                       19.0
  B         4      69.0                       6.0
  A         5      50.3                       18.7
  C         6      24.0                       26.3

ANOVA

  Source    df   MS         F
  Variety    5   2,976.44   24.80
  Error     18     120.00

• Compute LSD at 5% and 1%
  $LSD_{\alpha=0.05} = t_{0.05,\,df=18} \sqrt{2\,MSE/r} = 2.101 \sqrt{2 \cdot 120/4} = 16.27$
  $LSD_{\alpha=0.01} = t_{0.01,\,df=18} \sqrt{2\,MSE/r} = 2.878 \sqrt{2 \cdot 120/4} = 22.29$

Back to the data...
$LSD_{\alpha=0.05} = 16.27$, $LSD_{\alpha=0.01} = 22.29$

  Variety   Rank   Mean Yield ($\bar{Y}_i$)   Difference ($\bar{Y}_{i-1} - \bar{Y}_i$)
  F         1      95.3
  D         2      94.0                       1.3
  E         3      75.0                       19.0*
  B         4      69.0                       6.0
  A         5      50.3                       18.7*
  C         6      24.0                       26.3**

  * exceeds the 5% LSD; ** exceeds the 1% LSD

Fisher's protected LSD (FPLSD)
• Uses comparisonwise error rate
• Computed just like LSD but you don't use it unless the F for treatments tests significant
  $LSD = t_\alpha \sqrt{2\,MSE/r}$
• So in our example data, any difference between means that is greater than 16.27 is declared to be significant

Tukey's Honestly Significant Difference (HSD)
• From a table of Studentized range values (see handout), select a value of $Q_\alpha$ which depends on p (the number of means) and v (error df)
• Compute: $HSD = Q_{\alpha,p,v} \sqrt{MSE/r}$
• For any pair of means, if the difference is greater than HSD, it is significant
• Uses an experimentwise error rate
• Use the Tukey-Kramer test with unequal sample size
  $HSD = Q_{\alpha,p,v} \sqrt{\frac{MSE}{2}\left(\frac{1}{r_1} + \frac{1}{r_2}\right)}$

Student-Newman-Keuls Test (SNK)
• Rank the means from high to low
• Compute t-1 significant differences, $SNK_j$, using the studentized values for the HSD
  $SNK_j = Q_{\alpha,k,v} \sqrt{MSE/r}$
  where j = 1, 2, ..., t-1; k = 2, 3, ..., t; k = number of means in the range
• Compare the highest and lowest
  – if less than SNK, no differences are significant
  – if greater than SNK, compare next highest mean with next lowest using next SNK
• Uses experimentwise $\alpha$ for the extremes
• Uses comparisonwise $\alpha$ for adjacent means

Using SNK with example data:

  k     2       3       4       5       6
  Q     2.97    3.61    4.00    4.28    4.49
  SNK   16.27   19.77   21.91   23.44   24.59

  5 + 4 + 3 + 2 + 1 = 15 comparisons
  18 df for error
  $se = \sqrt{MSE/r} = \sqrt{120/4} = 5.477$
  SNK = Q*se

  Variety   Rank   Mean Yield ($\bar{Y}_i$)
  F         1      95.3
  D         2      94.0
  E         3      75.0
  B         4      69.0
  A         5      50.3
  C         6      24.0

Waller-Duncan Bayes LSD (BLSD)
• Do ANOVA and compute F (MST/MSE) with q and f df (corresponds to table nomenclature)
• Choose error weight ratio, k
  – k=100 corresponds to 5% significance level
  – k=500 for a 1% test
• Obtain $t_b$ from table (A7 in Petersen)
  – depends on k, F, q (treatment df) and f (error df)
• Compute $BLSD = t_b \sqrt{2\,MSE/r}$
• Any difference greater than BLSD is significant
• Does not provide complete control of experimentwise Type I error
• Reduces Type II error

Bonferroni Inequality
• Theory: $\alpha_E \le X \cdot \alpha_C$, where X = number of pairwise comparisons
• To get critical probability value for significance: $\alpha_C = \alpha_E / X$, where $\alpha_E$ = maximum desired experimentwise error rate
• Alternatively, multiply observed probability value by X and compare to $\alpha_E$ (values >1 are set to 1)
• Advantages
  – simple
  – strict control of Type I error
• Disadvantage
  – very conservative, low power to detect differences

False Discovery Rate
• Bars show P values for simple t tests among means
  – Largest differences have the smallest P values
• Line represents critical P values = $(i/X) \cdot \alpha_E$

[Figure: "False Positive Procedure". Bar chart of P values (y-axis: Probability, 0.00 to 0.25) against rank i (x-axis: 1 to 21), where i = 1 to X are the ranks for $\bar{Y}_i - \bar{Y}_j$, with the critical-value line and a region labeled "Reject H0".]

More Options!
• Duncan's New Multiple Range Test
  – A multiple range test
  – Less conservative than the SNK test
  – Used to be popular, but no longer recommended
• Dunnett's Test
  – Compare all treatments against a control
  – Compare all treatments against the best treatment
  – Conservative (controls Type I, not Type II error)
• Scheffé's Test
  – Can be used for comparisons that are not preplanned
  – Very conservative!
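
The error-rate formulas, the Bonferroni cutoff, and the False Discovery Rate line described above can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the function names are hypothetical, the P values are made up, and the FDR routine assumes the usual Benjamini-Hochberg step-up reading of the critical line (i/X)*alpha_E, which the slides do not spell out.

```python
from math import comb


def n_pairwise(t):
    """Number of pairwise comparisons among t treatments: t(t-1)/2."""
    return comb(t, 2)


def experimentwise_rate(alpha_c, x):
    """alpha_E = 1 - (1 - alpha_C)^x for x comparisons."""
    return 1 - (1 - alpha_c) ** x


def comparisonwise_rate(alpha_e, x):
    """alpha_C = 1 - (1 - alpha_E)^(1/x), the inverse of the formula above."""
    return 1 - (1 - alpha_e) ** (1 / x)


def bonferroni_reject(p_values, alpha_e):
    """Reject H0 where p <= alpha_E / X (X = number of comparisons)."""
    x = len(p_values)
    return [p <= alpha_e / x for p in p_values]


def fdr_reject(p_values, alpha_e):
    """Step-up rule (assumed): find the largest rank i with
    p_(i) <= (i / X) * alpha_E and reject H0 for ranks 1..i."""
    x = len(p_values)
    order = sorted(range(x), key=lambda j: p_values[j])  # rank 1 = smallest P
    cutoff = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank / x * alpha_e:
            cutoff = rank
    rejected = [False] * x
    for j in order[:cutoff]:
        rejected[j] = True
    return rejected


# Slide example: 10 varieties, all pairwise comparisons at alpha_C = 0.05
x = n_pairwise(10)
print(x, round(experimentwise_rate(0.05, x), 2))

# Hypothetical P values (not from the slides) to contrast the two rules
p_demo = [0.03, 0.001, 0.2, 0.012, 0.004]
print(bonferroni_reject(p_demo, 0.05))
print(fdr_reject(p_demo, 0.05))
```

With these made-up P values, the Bonferroni cutoff of 0.05/5 = 0.01 rejects only the two smallest, while the step-up FDR rule rejects four of the five, which is the power difference the slides describe.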

Most Popular
• FPLSD test is widely used, and widely abused
• BLSD is preferred by some because
  – It is a single value and therefore easy to use
  – Larger when F indicates that the means are homogeneous and small when means appear to be heterogeneous
• The False Discovery Rate (FDR) has nice features
  – Good experimentwise Type I error control
  – Good power (Type II error control)
  – May not be as well-known as some other tests
• Tukey's HSD test
  – Widely accepted and often recommended by statisticians
  – May be too conservative if Type II error has more serious consequences than Type I error
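
The wheat example can be checked numerically. The following is a minimal Python sketch, not part of the original slides. It uses the tabled t and Q values quoted in the example rather than computing quantiles, and all function names are hypothetical. The SNK routine is one reading of the step-down rule: a range of k means is tested only if every wider range containing it was significant.

```python
from math import sqrt

MSE, r = 120.0, 4                      # error mean square (18 df), pots per variety
ranked = [("F", 95.3), ("D", 94.0), ("E", 75.0),
          ("B", 69.0), ("A", 50.3), ("C", 24.0)]   # means, high to low

T_05 = 2.101                           # tabled t for alpha = 0.05, 18 df
Q_05 = {2: 2.97, 3: 3.61, 4: 4.00, 5: 4.28, 6: 4.49}  # tabled Q, 18 df

se = sqrt(MSE / r)                     # standard error of a mean
lsd = T_05 * sqrt(2 * MSE / r)         # LSD = t * sqrt(2*MSE/r)
snk = {k: q * se for k, q in Q_05.items()}  # SNK_k = Q_k * se
hsd = snk[len(ranked)]                 # HSD uses Q for p = all six means


def fixed_range_pairs(ranked, critical):
    """All (higher, lower) pairs whose difference exceeds one fixed value."""
    return {(a, b)
            for i, (a, mean_a) in enumerate(ranked)
            for b, mean_b in ranked[i + 1:]
            if mean_a - mean_b > critical}


def snk_pairs_for(ranked, crit):
    """Step-down SNK: test the widest range first, then narrower ranges."""
    t = len(ranked)
    sig = {}
    for k in range(t, 1, -1):          # k = number of means in the range
        for i in range(t - k + 1):
            j = i + k - 1
            diff = ranked[i][1] - ranked[j][1]
            # Ranges that fall off either end do not exist and count as passed.
            parents_ok = all(sig.get(p, True) for p in ((i - 1, j), (i, j + 1)))
            sig[(i, j)] = parents_ok and diff > crit[k]
    return {(ranked[i][0], ranked[j][0]) for (i, j), s in sig.items() if s}


lsd_pairs = fixed_range_pairs(ranked, lsd)
hsd_pairs = fixed_range_pairs(ranked, hsd)
snk_pairs = snk_pairs_for(ranked, snk)

print(f"LSD = {lsd:.2f}, HSD = {hsd:.2f}")
for name, pairs in (("LSD", lsd_pairs), ("SNK", snk_pairs), ("HSD", hsd_pairs)):
    print(name, len(pairs), "of 15 pairs declared different")
```

On these data the LSD and SNK both declare 13 of the 15 pairs different (all except F vs D and E vs B), while the HSD declares 10, which illustrates how the more conservative fixed-range test gives up some power.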
