Estimation & hypothesis testing (F-test, Chi²-test, t-tests)
• Introduction
• t-tests
• Outlier tests (k • SD; Grubbs, Dixon's Q)
• F-test, χ² (= Chi2)-test (= 1-sample F-test)
• Tests and confidence limits
Exercise files: CI&NHST; CI&NHST-EXCEL; CI&NHST-Exercise; Grubbs, free download from: http://www.graphpad.com/articles/outlier.htm

Analysis of variance (ANOVA)
• Introduction
• Model I ANOVA
  Performance strategy
  – Testing of outliers
  – Testing of variances (Cochran "C", Bartlett)
• Model II ANOVA
  – Applications
Exercise files: Cochran&Bartlett; ANOVA

Power and sample size
Exercise file: Power

Statistics & graphics for the laboratory

Introduction
When we have a set (or sets) of data (a "sample"), we often want to know whether a statistical estimate derived from it (e.g., the difference between 2 means, or the difference of an SD from a target) is pure coincidence or whether it is "statistically significant". We can approach this problem in the following way: the null hypothesis H0 (no difference) is tested against the alternative hypothesis H1 (there is a difference) on the basis of the collected data. The decision to accept or reject the hypothesis is made with a certain probability, most often 95% (statistical significance). Because we usually have only a limited set of data (a "sample"), we extrapolate the estimates from our sample to the underlying populations by use of statistical distribution theory, and we assume random sampling.

Hypothesis testing: Example
Is the difference between the means of two data sets real or only accidental?
[Figure: dot plots of two data sets with different means]

Statistical significance in more detail
In statistics the words 'significant' and 'significance' have specific meanings. A significant difference means a difference that is unlikely to have occurred by chance. A significance test shows up differences unlikely to occur because of purely random variation.
Whether one set of results is significantly different from another depends not only on the magnitude of the difference between the means, but also on the amount of data available and its spread.

Note: Significance is a function of sample size. Comparing very large samples will nearly always lead to a significant difference, but a statistically significant result is not necessarily an important result: does it really matter in practice?

Significance testing – Qualitative investigation
Adapted from: Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK. Understanding the Structure of Scientific Data. LC•GC Europe Online Supplement.
[Figure: example data sets with the following interpretations]
• Probably not different; would 'pass' the t-test (tcrit > tcalc)
• Probably different; would 'fail' the t-test (tcrit < tcalc)
• Could be different, but not enough data to say for sure (i.e., would 'pass' the t-test [tcrit > tcalc])
• Practically identical means, but with so many data points there is a small but statistically significant ('real') difference, and so would 'fail' the t-test (tcrit < tcalc)
• Spread in the data, as measured by the variance, is similar; would 'pass' the F-test (Fcrit > Fcalc)
• Spread in the data, as measured by the variance, is different; would 'fail' the F-test (Fcrit < Fcalc)
• Could be a different spread, but not enough data to say for sure; would 'pass' the F-test (Fcrit > Fcalc)

General remarks
General requirements for parametric tests
• Random sampling
• Normally distributed data
• Homogeneity of variances, when applicable

Note on the testing of means
When we test means, the central limit theorem is of great importance because it favours the use of parametric statistics.

Central limit theorem (see also "sampling statistics")
The means of independent observations tend to be normally distributed, irrespective of the primary type of distribution.
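A quick way to see the central limit theorem at work is to simulate it. The sketch below (assuming NumPy is available) draws many means of N = 30 observations from a strongly right-skewed exponential distribution and checks that the means cluster around the population mean with standard error σ/√N:

```python
import numpy as np

rng = np.random.default_rng(42)

# Parent distribution: exponential with mean 1 (heavily right-skewed)
N = 30                                    # observations per sample
samples = rng.exponential(scale=1.0, size=(100_000, N))
means = samples.mean(axis=1)              # 100 000 sample means

# Central limit theorem: the means are approximately normally distributed
# around the population mean, with SE = sigma/sqrt(N)
print(round(means.mean(), 3))             # close to 1.0
print(round(means.std(ddof=1), 3))        # close to 1/sqrt(30) ~ 0.183
```

A histogram of `means` would look nearly Gaussian even though the parent distribution is anything but — which is why the t-tests below are so forgiving about the primary distribution of the data.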
Implications of the central limit theorem
• When dealing with mean values, the type of primary distribution is of limited importance, e.g. for the t-test for comparison of means.
• When dealing with percentiles, e.g. reference intervals, the type of distribution is indeed important.

Overview of test procedures (parametric)
Testing levels
• 1-sample t-test: comparison of a mean value with a target or limit
• t-test: comparison of mean values (unpaired). Perform the F-test before:
  – t-test, equal variances
  – t-test, unequal variances
• Paired t-test: comparison of paired measurements (x, y)
Testing outliers
• k • SD = Grubbs (http://www.graphpad.com/articles/outlier.htm)
• Dixon's Q (Annex: n = 3 to 25)
Testing dispersions
• F-test for comparison of variances: F = s2²/s1²
• χ² (= Chi2)-test, or 1-sample F-test
Testing variances (several groups)
• Cochran "C"
• Bartlett

Concordance between some parametric and non-parametric tests
Parametric test – Non-parametric analogue
• One-sample t-test – Wilcoxon signed ranks (sign test)
• 2-sample t-test – Mann-Whitney U
• Paired-sample t-test – Wilcoxon signed ranks
• Pearson's correlation – Spearman's correlation
• ANOVA – Kruskal-Wallis one-way ANOVA
• Chi-squared "goodness-of-fit" test – Kolmogorov-Smirnov
Non-parametric tests are more robust towards outliers; otherwise the difference in practice is limited (the central limit theorem makes the t-test asymptotically nonparametric).

t-tests
Difference between a mean and a target ("one-sample" t-test)
t = (xm – µ0)/(s/√N), with degrees of freedom df = N – 1 and probability α = 0.05
95%-CI: xm ± t0.05;df • s/√N
(s/√N = standard error of the mean)
Important: t-distribution (see before: sampling statistics)

Difference between two means
Perform the F-test first and, depending on its outcome, use the t-test with equal or with unequal variances.
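The one-sample test above can be sketched in a few lines (the data and the target value are made up for illustration; SciPy's `stats.t` supplies the t-distribution):

```python
import numpy as np
from scipy import stats

def one_sample_t(x, target, alpha=0.05):
    """t = (xm - mu0)/(s/sqrt(N)) with df = N - 1, plus the 95%-CI of the mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    se = x.std(ddof=1) / np.sqrt(n)               # standard error s/sqrt(N)
    t = (x.mean() - target) / se
    p = 2 * stats.t.sf(abs(t), df=n - 1)          # two-sided P-value
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    ci = (x.mean() - t_crit * se, x.mean() + t_crit * se)
    return t, p, ci

# Hypothetical example: 8 results tested against a target of 100
data = [99.1, 100.4, 98.7, 101.2, 99.9, 100.8, 98.5, 99.6]
t, p, ci = one_sample_t(data, target=100.0)
print(f"t = {t:.3f}, P = {p:.4f}, 95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

If the CI contains the target, the t-test does not reject H0 at the same α — the CI/NHST equivalence discussed under "Tests and confidence limits" below.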
Given independence, the difference between two normally distributed variables is also normally distributed. The variance of the difference is the sum of the individual variances:
t = (xm2 – xm1)/[s²/N1 + s²/N2]^0.5
where s² is a common estimate of the variance (the "pooled variance"):
s² = [(N1 – 1)s1² + (N2 – 1)s2²]/(N1 + N2 – 2)
Standard error of the mean difference
• SEdif = [s²/N1 + s²/N2]^0.5 ; if N1 = N2 = N: SEdif = SQRT(2) • s/SQRT(N)
95%-confidence interval of the mean difference
• Mean difference ± t0.05 • SEdif

Example: t-Test: Two-Sample Assuming Equal Variances (EXCEL output)
                                 A        B
Mean                             99.98    98.40
Variance                         9.995    8.608
Observations                     30       30
Pooled Variance                  9.301
Hypothesized Mean Difference     0
df                               58
t Stat                           2.002
P(T<=t) one-tail                 0.025
t Critical one-tail              1.672
P(T<=t) two-tail                 0.05
t Critical two-tail              2.002
Not given by EXCEL:
Difference: –1.5767; 95% CI: –3.1529 to –0.0004
Unpaired Wilcoxon test (Mann-Whitney): two-tailed probability P = 0.0546

t-test – different variances
The difference is still normally distributed given σ1 ≠ σ2, and the difference of means has the variance σ1²/N1 + σ2²/N2, which is estimated as s1²/N1 + s2²/N2. However, the t value
t' = (xm2 – xm1)/[s1²/N1 + s2²/N2]^0.5
does not strictly follow the t-distribution. The problem is mainly of academic interest, and special tables for t' have been provided (Behrens, Fisher, Welch).
>Perform the F-test before the t-test!

Paired t-test – comparison of mean values (paired data)
Example: measurements before and after treatment in patients. When testing for a difference with paired measurements, the paired t-test is preferable. This is because such measurements are correlated, and pairing of the data reduces the random variation. Thereby, it increases the probability of detecting a difference.
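Returning to the unpaired comparison: the recommended flow — F-test first, then the pooled (Student's) or unequal-variance (Welch's) t-test — can be sketched as follows. The data are hypothetical; SciPy's `ttest_ind` covers both t-test variants via its `equal_var` argument:

```python
import numpy as np
from scipy import stats

def compare_means(x, y, alpha=0.05):
    """F-test on the variances first, then the matching 2-sample t-test."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # Put the larger variance in the numerator so that F >= 1
    # (note: swapping only flips the sign of t, not the P-values)
    if x.var(ddof=1) < y.var(ddof=1):
        x, y = y, x
    F = x.var(ddof=1) / y.var(ddof=1)
    p_f = 2 * stats.f.sf(F, len(x) - 1, len(y) - 1)   # two-sided: 2 x one-tailed P
    equal_var = p_f >= alpha                          # variances compatible?
    t, p_t = stats.ttest_ind(x, y, equal_var=equal_var)
    return F, p_f, t, p_t

# Hypothetical data from two methods
a = [99.2, 101.5, 98.8, 100.9, 99.7, 102.1, 100.3, 98.5]
b = [97.9, 99.4, 98.1, 99.9, 98.6, 99.2, 97.5, 98.8]
F, p_f, t, p_t = compare_means(a, b)
# Here the F-test passes, so the pooled (equal-variance) t-test is used
print(f"F = {F:.2f} (P = {p_f:.3f}); t = {t:.2f} (P = {p_t:.4f})")
```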
Calculations
The individual paired differences are computed: difi = x2i – x1i
The mean and standard deviation of the N (= N1 = N2) differences are computed:
difm = Σ difi/N ; sdif = [Σ (difi – difm)²/(N – 1)]^0.5
SEdif = sdif/N^0.5
Testing whether the mean paired difference deviates from zero:
t = (difm – 0)/SEdif (N – 1 degrees of freedom)

Example: t-Test: Paired Two Sample for Means (EXCEL output)
                                 A        B
Mean                             99.98    98.78
Variance                         9.995    11.61
Observations                     30       30
Pearson Correlation              0.894
Hypothesized Mean Difference     0
df                               29
t Stat                           4.299
P(T<=t) one-tail                 9E-05
t Critical one-tail              1.699
P(T<=t) two-tail                 2E-04
t Critical two-tail              2.045
Not given by EXCEL:
Mean difference: 1.2033; 95% CI: 0.6309 to 1.7758
Paired Wilcoxon test: two-tailed probability P = 0.0005

z-tests
When the SD is known, t can be substituted by z in the above t-tests. Nevertheless, the same propagation rules apply when pooled SDs and SDs of differences are calculated.

Outliers
Outliers have great influence on parametric statistical tests. Therefore, it is desirable to investigate the data for outliers.
[Figure: dot plot in which the upper point is an outlier according to the Grubbs test (P < 0.05)]
Testing for outliers can be done with the following techniques
• k • SD = Grubbs (http://www.graphpad.com/articles/outlier.htm)
• Dixon's Q (Annex: n = 3 to 25)
All assume normally distributed data.

The k • SD method (outlier = point > k • SD away from the mean)
With this method, it is important to know that the statistical chance of finding an outlier increases with the number of data investigated.
Outlier probability: approximately N • P, where P is the probability corresponding to the SD distance for the normal distribution.
Example: N = 100; 3 SD: probability ~ 100 x 0.0013 x 2 ~ 0.26
Exact formula: P = 1 – (1 – 0.0027)^N
Example: N = 100; 4 SD: probability ~ 100 x 0.000032 x 2 ~ 0.006
Note: When you use a k • SD criterion (e.g., 3 SD), expand it when N becomes high (e.g., to 4 SD)!
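The figures above are easy to reproduce. A sketch using SciPy's normal tail function, giving both the rough N • P approximation and the exact "at least one outlier" probability:

```python
from scipy import stats

def outlier_chance(n, k):
    """Chance that at least one of n normal observations lies beyond +/- k SD."""
    p_single = 2 * stats.norm.sf(k)        # two-sided tail probability for one point
    approx = n * p_single                  # rough approximation: N x P
    exact = 1 - (1 - p_single) ** n        # exact: 1 - P(no point beyond k SD)
    return approx, exact

print(outlier_chance(100, 3))   # approx ~0.27, exact ~0.24 (cf. the ~0.26 above)
print(outlier_chance(100, 4))   # both ~0.006
```

The approximation overstates the exact probability somewhat at 3 SD because the same sample can contain more than one extreme point; at 4 SD the two agree closely.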
>Preferably: perform the Grubbs test

Grubbs test (free download from: www.graphpad.com/articles/outlier.htm)
This test is recommended by ISO (software required). The Grubbs test can be used iteratively (after removal of the 1st outlier, look for a second).
Grubbs test, critical values (95%)
n      Value
3      1.153
5      1.672
10     2.176
20     2.557
50     2.956
100    3.207
140    3.318

Dixon's Q-test
A common outlier test for moderate sample sizes. Requires software or a table with critical values (see Annex).

F-test & χ² (= Chi2)-test
F-test: Comparing variances
If we have two data sets, we may want to compare the dispersions of the distributions. Given normal distributions, the ratio between the variances is considered. The variance-ratio test was developed by Fisher; therefore the ratio is usually referred to as the F-ratio and related to tables of the F-distribution.
Calculation
F = s2²/s1²
Note: the greater value goes in the numerator, so F ≥ 1!
Example
F = s2²/s1² = (0.228)²/(0.182)² = 1.6 (n.s.)
Degrees of freedom: df2 (numerator) = 14 – 1 = 13; df1 (denominator) = 21 – 1 = 20
Critical (0.05) F = 2.25

F-test: Some notes
• When testing whether two variances are equal or not, take the larger one and divide by the smaller one: F ≥ 1.
• Testing here is two-sided, i.e. the one-sided P-value from an F-table should be multiplied by two. In other situations, testing may be one-sided, e.g. F-tests in ANOVA and regression.
• Notice that the correct numbers of degrees of freedom are used for the numerator and denominator variance!

χ² (= Chi2)-test (or 1-sample F-test)
Comparing a variance with a target or limit
χ²exp = (s²exp • n)/s²Man
(n = degrees of freedom; s²Man = the claimed variance, e.g. a manufacturer's specification)
Test whether χ²exp ≤ χ²critical (1-sided, 0.05). One-sided, because we test against a target or a limit.
The χ²-test is used in the CLSI EP5 protocol.
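A sketch of this χ²-test against a claimed SD (the numbers are hypothetical; `stats.chi2` supplies the distribution):

```python
from scipy import stats

def chi2_variance_test(s_exp, df, s_claim, alpha=0.05):
    """1-sample F-test: is an experimental SD compatible with a claimed SD?"""
    chi2_exp = (s_exp ** 2 * df) / s_claim ** 2    # Chi2exp = s2exp * n / s2claim
    chi2_crit = stats.chi2.ppf(1 - alpha, df)      # 1-sided critical value
    p = stats.chi2.sf(chi2_exp, df)                # 1-sided P-value
    return chi2_exp, chi2_crit, p

# Hypothetical: observed SD 1.3 (20 degrees of freedom) vs a claimed SD of 1.0
chi2_exp, chi2_crit, p = chi2_variance_test(1.3, 20, 1.0)
print(chi2_exp, chi2_crit, p)   # 33.8 exceeds the critical 31.41 -> claim rejected
```

Equivalently, one could divide χ²exp by its degrees of freedom and refer the result to an F-table with n and ∞ degrees of freedom — hence the alternative name "1-sample F-test".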
Relationships between F, t, and χ²
Relationship between χ² and F
χ²n/n = F(n, ∞), with n = degrees of freedom.
Relationship between F and t
The one-tailed F-test with 1 and n degrees of freedom is equivalent to the t-test with n degrees of freedom. The relationship t² = F holds for both calculated and tabular values of these two distributions:
t(12; 0.05) = 2.17881; F(1, 12; 0.05) = 4.7472

Peculiarities and problems with the EXCEL F-test
Dietmar Stöckl, Diego Rodríguez Cabaleiro, Linda M. Thienpont. Clin Chem Lab Med 2004;42(12):1455.
EXCEL includes two different versions of the F-test. The first can be accessed via Tools (main toolbar)/Data Analysis/F-Test Two-Sample for Variances. It yields the F-value, the one-tailed P-value, and the one-tailed F-critical (the alpha-level is specified during input; e.g. Alpha = 0.05). The second can be accessed with the function FTEST: fx (icon in the Standard toolbar)/Statistical (function category)/FTEST (function name). This function simply returns the P-value. Remarkably, however, the function returns the two-tailed P-value, while the explanation in the pop-up menu states that FTEST(Array1;Array2) "Returns the result of an F-test, the one-tailed probability that the variances in Array1 and Array2 are not significantly different". For completeness of information, EXCEL includes two other functions dealing with F-statistics, namely FINV(probability,degrees_freedom1,degrees_freedom2), which returns F, and FDIST(x,degrees_freedom1,degrees_freedom2), which returns P. Both functions return values with one-tailed probability and correspond to the F-test function in the Data Analysis menu.

P-values
Interpretation of the P-value
A test for statistical significance (at a certain probability P) tests whether a hypothesis has to be rejected or not, for example, the null hypothesis.
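The F–t–χ² relationships above are easy to verify numerically with SciPy's distribution functions (a very large denominator df stands in for ∞):

```python
from scipy import stats

df = 12
t_crit = stats.t.ppf(1 - 0.05 / 2, df)     # two-tailed t, alpha = 0.05
F_crit = stats.f.ppf(1 - 0.05, 1, df)      # one-tailed F with 1 and df deg. of freedom
print(round(t_crit, 5), round(F_crit, 4))  # 2.17881 and 4.7472; t_crit**2 equals F_crit

# Chi2(n)/n approaches F(n, infinity); a huge denominator df mimics infinity
chi2_over_n = stats.chi2.ppf(0.95, df) / df
f_inf = stats.f.ppf(0.95, df, 10 ** 8)
print(round(chi2_over_n, 4), round(f_inf, 4))
```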
The null hypothesis of the F-test is that the 2 variances are not different, or that an experimentally found difference occurred only by chance. The null hypothesis of the F-test is not rejected when the calculated probability Pexp is greater than or equal to the chosen probability P (usually chosen as 0.05 = 5%), or when the experimental Fexp value is smaller than or equal to the critical Fcrit value.
Example
Fexp (calculated)        1.554
Fcrit (critical value)   2.637
Pexp (from experiment)   0.182
Chosen probability P     0.05
Observation
The calculated P-value (0.182 = ~18%) is greater than the chosen P-value (0.05 = 5%). Correspondingly, the experimental F-value is smaller than the critical F-value.
Conclusion
The null hypothesis is not rejected; this means that the difference between the variances is only by chance.
NOTE
The P-value is a fixed, calculated value for a test performed on a specific data set. Don't confuse it with the α-level of the test, which can be chosen in EXCEL (see screenshot). Different α-values do NOT change the P-value in the output, BUT they change the critical value for the test (here Fcrit).

Tests and confidence limits
We have seen for the 1-sample t-test the close relationship between confidence intervals and significance testing. In many situations, one can use either of them for the same purpose. Confidence intervals have the advantage that they can be shown in graphs and that they provide information about the spread of an estimate (e.g., of a mean). The tables below give an overview of the concordance between CI's and significance testing for means and variances (SD's).
[Tables: concordance between CI's and significance tests (not reproduced). Notes:
– t: 2-sided or 1-sided; 1-sided for comparison with claims
– When a stable s is known, z may be chosen instead of t]

Exercises
CI&NHST; CI&NHST-EXCEL
This tutorial/EXCEL template explains the connection between significance tests and confidence intervals when the purpose is Null Hypothesis Significance Testing (NHST).
Indeed, for the specific purpose of NHST, P-values as well as CI's can be used (look whether the null value or target value is inside or outside the CI); they are just two sides of the same coin. Examples are the comparison of i) a standard deviation (SD) with a target value, ii) two standard deviations, iii) a mean with a target value, iv) two means, and v) a mean paired difference with a target value. The statistical tests involved are the 1-sample F-test, F-test, 1-sample t-test, t-test, and the paired t-test, respectively, and the CI's of the SD, F, mean, mean difference, and mean paired difference.
Another exercise shows how NHST is influenced by
– The magnitude of the difference
– The number of data points
– The magnitude of the SD
Please follow the guidance given in the "Exercise Icons" and read the comments.

CI&NHST-Exercise
This file contains exercises for the concepts dealt with in the CI&NHST file. It treats the following cases:
– 1-sample t-test (P = 0.0366; different)
– t-test (1st example: outlier, Grubbs, 96.06; P = 1.19E-5, pipettes cannot be exchanged; 2nd example: F-test P = 0.0028 > t-test with unequal variances: P = 0.2162; precision problems)
– Paired t-test (P = 0.03680; batches are different)
– 1-sample F-test (CI-Calculator: CI = 1.10 – 2.48: different >out of specification)
– F-test (P = 0.001398; performance of Tech2 is worse)
Please follow the instructions given in the respective worksheets.

Grubbs
Not included in the package. Please download it free from: http://www.graphpad.com/articles/outlier.htm

Analysis of Variance: ANOVA
The three universal assumptions of analysis of variance
1. Independence
2. Normality
3. Homogeneity of variance

Overview of the concepts
• Model I (Assessing treatment effects): comparison of the mean values of several groups.
• Model II (Random effects): study of variances – analysis of the components of variance.
Model I and II: identical computations – but different purposes and interpretations!

Why ANOVA? Model I (assessing treatment effects)
• ANOVA is an extension of the commonly used t-test for comparing the means of two groups.
• The aim is a comparison of the mean values of several groups.
• The tool is an assessment of variances.
[Figure: Model I – t-test versus ANOVA; t-test: comparison of two groups; ANOVA: comparison of more than two groups]

Why not multiple t-tests?
• With several groups, many t-tests are necessary for pair-wise comparisons, e.g. 6 tests for 4 groups.
• Multiple comparisons inflate the overall type I error, i.e. too often one will get a "significant" result (a P-value below 5%) by chance alone.
• Thus, ANOVA is useful when dealing with several groups.
Note: ANOVA cannot tell us which individual mean or means are different from the consensus value, nor in what direction they deviate. The most effective way to show this is to plot the data.

Introduction – Types of ANOVA
One-way: only one type of classification, e.g. into various treatment groups.
Ex.: study of the serum cholesterol level in various treatment groups.
Two-way: subclassification within treatment groups, e.g. according to gender.
Ex.: do various treatments influence serum cholesterol in the same way in men and women? (Not considered further here.)

Principle of one-way ANOVA
Distances within (- - -) and between (—) groups are squared and summed, and finally compared.
[Figure: data in 4 groups, with within-group and between-group distances marked]

Case 1: Null hypothesis valid
No significant difference between the groups: the between-group distances (red —) are small, i.e. the main source of variation is within groups.
[Figure: 4 groups with similar means]
Case 2: Alternative hypothesis valid
Significant difference between the groups: the between-group distances (red —) are large, i.e. the main source of variation is between groups.
[Figure: 4 groups with clearly different means]

Introduction – Mathematical model
One-way ANOVA: mathematical model (example: treatment)
Yij = Grand mean + treatment (between-group) effectj + εij (within-group)
• Null hypothesis: the treatment group effects are zero
• Alternative hypothesis: treatment group effects are present

Avoiding some of the pitfalls of using ANOVA
In ANOVA it is assumed that the data are normally distributed. Usually in ANOVA we don't have a large amount of data, so it is difficult to prove any departure from normality. It has been shown, however, that even quite large deviations do not affect the decisions made on the basis of the F-test. A more important assumption in ANOVA is that the variance (spread) is homogeneous across the groups (homoscedastic). The best way to avoid this pitfall is, as ever, to plot the data. There also exist a number of tests for heteroscedasticity (e.g., Bartlett's test and Levene's test). It may be possible to overcome this type of problem in the data structure by transforming it, such as by taking logs. If the variability within a group is correlated with its mean value, then ANOVA may not be appropriate and/or it may indicate the presence of outliers in the data. Cochran's test can be used to test for variance outliers.
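The principle — comparing between-group and within-group sums of squares — can be sketched as follows (hypothetical data for three groups; SciPy's `f_oneway` gives the same F directly):

```python
import numpy as np
from scipy import stats

# Hypothetical data: three treatment groups, the third at a higher level
groups = [np.array([5.2, 5.8, 5.5, 6.0, 5.4]),
          np.array([5.1, 5.6, 5.3, 5.9, 5.2]),
          np.array([6.4, 6.9, 6.6, 7.1, 6.5])]

grand = np.concatenate(groups).mean()
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_b = len(groups) - 1                        # K - 1
df_w = sum(len(g) - 1 for g in groups)        # N - K
F = (ss_between / df_b) / (ss_within / df_w)  # ratio of the mean squares
p = stats.f.sf(F, df_b, df_w)                 # one-sided, as always in ANOVA

F_scipy, p_scipy = stats.f_oneway(*groups)
print(round(F, 3), round(F_scipy, 3))         # identical F from both routes
```

With the outlying third group, F is large and P is well below 0.05; ANOVA flags the difference, but only a plot (or a post-hoc test) shows which group causes it.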
Model I ANOVA – Violation of assumptions
[Figure: six example data sets – outlier within subgroups; large variance in a subgroup; variance heterogeneity; variance increasing with level; valid ANOVA; outlying subgroup, F significant]

PERFORMANCE STRATEGY
Inspection/testing for outliers within a group
• Grubbs test
Variance evaluation
• Cochran's test for a deviating variance of a subgroup (see Annex for critical values). The test should not be used iteratively. Assumes approximately the same number of data in each group.
• Bartlett's test for variance homogeneity (Sokal & Rohlf. Biometry; p. 398)
Consider whether there is a relation between level and variance, e.g. proportionality. In the latter case, consider a logarithmic transformation.

Model I ANOVA – Short summary
• Plot your data.
• Generally, the procedure is robust towards deviations from normality.
• However, it is indeed sensitive towards outliers, i.e. investigate for outliers within groups.
• When the variance within groups is not constant, e.g. being proportional to the level, a logarithmic transformation may be appropriate.
• Testing for variance homogeneity may be carried out with Bartlett's test.
• Cochran's test can be used to test for variance outliers.

When F is significant
Supplementary analyses (not addressed in more detail here):
• Maximum against minimum (Student-Newman-Keuls' procedure)
• Pairwise comparisons with control of the type I error (Tukey)
• Post test for trend (regression analysis)
• Control versus others (Dunnett)

Control group (C) versus treatment groups
[Figure: control group versus treatment groups]
Apply Dunnett's test, based on the principle of "least significant difference" (LSD), i.e.
critical t-values for the differences between the treatment groups and the control group are adjusted to ensure the correct overall type I error (J Am Stat Assoc 1955;50:1096). Often, the focus is on effects in the treatment groups versus the control group, and Dunnett's test addresses exactly that situation.

Hitherto, we considered Model I ANOVA: treatment (fixed) effects – the effect of planned (controlled) interventions. Another approach is to look at the variation within and between groups. This leads us to Model II ANOVA: variation among groups due to random effects, e.g. nature's (uncontrolled) intervention.

Model II (random effects) ANOVA
Example: ranges of serum cholesterol in different subjects.
[Figure: sets of measurements in different subjects]

Model II (random effects) ANOVA (analysis of the components of variation)
Mathematical model
Yij = Grand mean + between-group variation j (σB) + within-group variation i (σW)
Reminder: in Model II ANOVA, the main point is to estimate components of variation, NOT hypothesis testing.

Example: components of variation
• Total dispersion of a single measurement: σT² = σB² + σW²
• Total dispersion of means of n measurements in each group (XGP): σGP² = σB² + σW²/n

The analysis of components of variation has shown us that, generally, standard deviations are propagated by summing their squares (= variances). This means that a "total" standard deviation is itself the square root of the sum of the variances. However, depending on the mathematical relationship between the components that make up a total SD, different propagation rules have to be used (see next page).

Total variance (total standard deviation)
The standard deviation (s) of calculated results (propagation of s)
1. Sums and differences
y = a(±sa) + b(±sb) + c(±sc)
sy = SQRT[sa² + sb² + sc²] (SQRT = square root)
Do not propagate CV!
2.
Products and quotients
y = a(±sa) • b(±sb) / c(±sc)
sy/y = SQRT[(sa/a)² + (sb/b)² + (sc/c)²]
3. Exponents (the x in the exponent is error-free)
y = a(±sa)^x
sy/y = x • sa/a

Addition of variances: stot = SQRT[s1² + s2²]
A large component will dominate:
Component SDs    Total SD
1 + 1            1.41
1 + 0.5          1.12
This forms the basis for the suggestion by Cotlove et al.: SDA < 0.5 x SDI
• A: analytical variation
• I: within-individual biological variation
In a monitoring situation, the total random variation of changes is then increased by only up to 12%, as long as this relation holds true.

Applications of Model II ANOVA
• Quality control/assessment
• Method evaluation
• Biological variation
• Goal-setting

Software output
One-way ANOVA: output of statistical programs
Variances within and between groups are evaluated.
XGP: group mean; XGM: grand mean; df: degrees of freedom; mean square = variance = squared SD.

Interpretation of Model I ANOVA: the F-ratio
If the ratio of the between- to the within-mean square exceeds a critical F-value (refer to a table, or look at the P-value), a significant difference between the group means has been disclosed.
F: Fisher published the ANOVA approach in 1918.

Components of variation: relation to the standard output of statistics programs
F = MSB/MSW = (n • SDB² + SDW²)/SDW²
For unequal group sizes, a sort of average n is calculated according to a special formula:
n0 = [1/(K – 1)] • [N – Σni²/N]
Interpretation of Model II ANOVA: the components of variation σW² and σB²
SDW² = MSW ; SDB² = (MSB – MSW)/n
Note for σB: due to the formula, a negative square root may occur (when MSB < MSW). EXCEL will give an error; in that case, set SDbetween to zero!

Conclusion
Model I ANOVA: a general tool for assessing differences between group means.
Model II ANOVA: useful for assessing components of variation.

Nonparametric ANOVA
• Kruskal-Wallis test: a generalization of the Mann-Whitney test to deal with > 2 groups.
• Friedman's test: a generalization of Wilcoxon's paired rank test to more than two repeats.
The study of components of variation is not suitable for nonparametric analysis.

Software
ANOVA is included in standard statistical packages (SPSS, BMDP, StatView, STATA, StatGraphics, etc.). Variance components may be given directly or be derived from the mean squares as outlined in the tables.
For direct estimation of components of variation, e.g. within- and between-run SD in quality control, or inter-/intra-individual biological variation:
CBstat: a Windows program distributed by K. Linnet. Information and download: http://www.cbstat.com

References
Snedecor GW, Cochran WG. Statistical methods, 8th ed. Iowa State University Press: Ames, Iowa, 1989, Chapters 12-13.
Fraser CG. Biological variation: From principles to practice. AACC Press, Washington, 2001.

Exercises
Cochran&Bartlett
Many statistical programs do not include the Cochran or Bartlett test. Therefore, they have been elaborated in an EXCEL file. The Cochran&Bartlett file contains
– the formulas for the Cochran test for an outlying variance (including the critical values)
– the formulas for the Bartlett test for variance homogeneity
(both are important for ANOVA)
– a calculation example
More experienced EXCEL users may be able to adapt this template to their own applications.

ANOVA
This tutorial contains interactive exercises for self-education in Analysis of Variance (ANOVA). ANOVA can be used for 2 purposes:
– Model I (assessing treatment effects): comparison of the MEAN values of several groups
– Model II (random effects): study of VARIANCES – analysis of the components of variance
Model I and II have IDENTICAL computations, but different purposes and interpretations!
Worksheets 1 - 6 describe a systematic approach to Model I ANOVA. They address:
– Outlier detection
– Investigation of variance homogeneity or an outlying variance
– Performing ANOVA with EXCEL: >Tools >Data Analysis >Anova: Single Factor.
A "Screen Shot" guides the application.
Worksheets 8 & 9 contain Model II ANOVA applications.
NOTE! EXCEL cannot do Model II ANOVA by default. However, it is easy to calculate the components of variation from the ANOVA output. >See the formulas in the respective cells >and the explanation in the PICTURE.
Note for σB: due to the formula, a negative square root may occur. EXCEL will give an error; in that case, set SDbetween to zero!

Power and sample size
The statistical power concept & sample-size calculations
When testing statistical hypotheses, we can make 2 types of errors: the so-called type I (or α) error and the type II (or β) error. The power of a statistical test is defined as 1 – β. The power concept is demonstrated in the figure below, denoting the probability of the α-error by p and that of the β-error by q. Like significance testing, power calculations can be done 1- and 2-sided.
[Figure: relative frequency of the estimated difference under H0 and HA, marking p/2, q, 1 – q, and the true difference D]
• Type I error (p): the probability of detecting a difference when it is not present
• Type II error (q): the probability of not detecting a difference when it is present
• Power = 1 – q

Purpose of power analysis and sample-size calculation
Some key decisions in planning any experiment are: "How precise will my parameter estimates tend to be if I select a particular sample size?" and "How big a sample do I need to attain a desirable level of precision?" Power analysis and sample-size calculation allow you to decide (a) how large a sample is needed to enable statistical judgments that are accurate and reliable, and (b) how likely your statistical test will be to detect effects of a given size in a particular situation.
Calculations
Definitions
zp/2 = z-value corresponding to the significance level of the null hypothesis test (usually 95%, 1- or 2-sided; e.g.: zp/2 = 1.65 or 1.96)
z1-q = z-value corresponding to the desired power (usually 90%, always 1-sided; e.g.: z1-q = 1.28)
N = number of measurements to be performed

Mean versus a target value
N = [SD/(mean – target)]² • (zp/2 + z1-q)²

Detecting a relevant difference (gives the number required in each group)
N = (SDDelta/Delta)² • (zp/2 + z1-q)²
Delta = difference to be detected
SDDelta = SQRT(SDx² + SDy²); usually SDx = SDy, so SDDelta = SQRT(2) • SD
(requires previous knowledge of the SD)

Example: difference between 2 groups
Assumptions:
– Delta = 5; SD = 4.5 for both groups; from this: SDDelta = SQRT(2) • SD = 6.36
– Significance level 2-sided 95%, P = 0.05 (zp/2 = 1.96)
– Power = 90%, P = 0.1 (z1-q = 1.28)
N = (6.36/5)² • (1.96 + 1.28)² = 17
Conclusion: to detect a difference of 5, we would need 17 measurements in each group.

Exercises
Power
This file contains 2 worksheets that explain the power concept and allow simple sample-size calculations. Please use dedicated software for routine power calculations.
Concept
Use the respective "Spinners" to change the values (or enter the values directly in the blue cells) for:
– Mean
– SD
  For comparison of a sample mean versus a target, use the sample SD.
  For comparison of 2 sample means with the same SD, use SD = SQRT(2)*SD.
– Sample size
– Significance level (only with the Spinner!) Limited to the same value for the α- and β-error. NOTE: α = 2-sided, β = 1-sided!
>Observe the effect on the power.
Calculations
This worksheet allows the calculation of sample sizes for
– Comparing a mean with a target
– Comparing 2 means
The calculations are explained in this text and in the "Exercise Icon".
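The example calculation can be sketched as a small function (z-approximation as above; the slide's rounded z-values give N ≈ 17.0, and the exact z-values give 17.02, so in strict practice one would round up):

```python
from math import sqrt
from scipy import stats

def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Measurements per group needed to detect a mean difference delta."""
    z_a = stats.norm.ppf(1 - alpha / 2)    # significance level, two-sided
    z_b = stats.norm.ppf(power)            # power, always one-sided
    sd_delta = sqrt(2) * sd                # SD of a difference, assuming SDx = SDy
    return (sd_delta / delta) ** 2 * (z_a + z_b) ** 2

def n_vs_target(diff, sd, alpha=0.05, power=0.90):
    """Measurements needed to show that a mean differs from a target by diff."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return (sd / diff) ** 2 * (z_a + z_b) ** 2

print(round(n_per_group(5, 4.5), 1))   # ~17 measurements per group
```

Note that the comparison of two groups needs twice as many measurements per group as the comparison of one mean with a fixed target, because SDDelta² = 2 • SD².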