Chapter 2-3. Choice of significance test

The choice of a test statistic, or significance test, is based on the measurement scale of your data, how many groups you want to compare, and whether the groups are independent or related (paired). Once you have identified what category your data fall into, you then choose the test that best fits your research question.

This chapter contains only a small list of the available significance tests. The purpose of the chapter is simply to illustrate that the decision is made based on measurement scale, number of groups to compare, and whether the groups are independent or related.

To use this chapter, determine the measurement scale of your data and the type of statistical problem. Then look up a test in this chapter. If you are unfamiliar with the test, simply look it up in the manual for the statistical software. These manuals describe what each test specifically does. Searching the internet for the statistical test is also helpful.

When computing statistics for someone else, it is generally best to use a statistic they are familiar with. Trust me, this makes them happier, even if it's not the most appropriate statistic; and if it leads to the same conclusion, who cares! I have therefore listed the statistical tests found in introductory statistical courses (the most widely known) at the top of the lists. If you are not familiar with the other tests, just use the first one listed.

Many times, statisticians will use a test which requires only ordered categorical data to analyze continuous data. This works because the continuous data contain all the properties of ordered categorical data (the extra properties are simply not utilized). This might be done to eliminate the effect of an outlier (skewed distribution), for example. Generally, however, this results in a loss of power (basically, the probability that you will get a statistically significant P value when a real effect exists) because you are using less information in the data.
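As a rough illustration of the last point, the sketch below treats continuous data as ordered data by replacing each value with its rank. The data values, the variable names, and the helper functions are all hypothetical, invented for this example; the rank sum shown is only the raw ingredient of a rank-based test such as the Wilcoxon-Mann-Whitney test, not the full test with a P value.

```python
from statistics import mean


def midranks(values):
    """Return 1-based ranks for values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def rank_sum_first_group(first, second):
    """Pool both groups, rank the pooled values, and sum the first group's ranks."""
    ranks = midranks(list(first) + list(second))
    return sum(ranks[:len(first)])


def mean_difference(first, second):
    """Difference in means: the quantity a t-test is built on."""
    return mean(first) - mean(second)


# Hypothetical continuous measurements for two independent groups.
control = [4.1, 4.8, 5.0, 5.3, 5.9]
treated = [5.5, 6.2, 6.8, 7.1, 7.4]
# Same data, but the largest treated value is replaced by an extreme outlier.
treated_with_outlier = treated[:-1] + [74.0]

print("rank sum, original data:", rank_sum_first_group(treated, control))
print("rank sum, with outlier: ", rank_sum_first_group(treated_with_outlier, control))
print("mean difference, original data:", mean_difference(treated, control))
print("mean difference, with outlier: ", mean_difference(treated_with_outlier, control))
```

In this made-up data set the outlier is still the largest value, so its rank and the rank sum do not change, while the mean difference changes a great deal. That is the sense in which a rank-based test ignores the extra information in continuous data: it is protected from the outlier, but it also cannot use the actual distances between values.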
_____________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual. Salt Lake City, UT: University of Utah School of Medicine. Chapter 2-3 Choice of Significance Test. (Accessed July 24, 2013, at http://medicine.utah.edu/ccts/sdbc/stoddard_textbook.php).

Chapter 2-3 (revision 24 Jul 2013) p. 1

Measurement Scale (also called level of measurement)

Nominal scale
    name
    unordered categories
    e.g., cancer therapies: chemo, radiation, surgery

Ordinal scale
    name + order
    ordered categories
    e.g., quality of life: lousy, okay, great

Interval scale
    name + order + equal intervals + arbitrary zero point
    continuous measurement with arbitrary zero
    e.g., body temperature: 0°F does not imply absence of temperature (although perhaps absence of life). The 0 point is just a convention of the scale. Ratios do not make sense: you would not say 101.8°F is 1.05 times as hot as 97°F.

Ratio scale
    name + order + equal intervals + absolute zero point
    continuous measurement with absolute zero
    e.g., hematocrit: 0% means no hematocrit, however unlikely. Ratios make sense (at least arithmetically): a Hct of 48% is 1.2 times a Hct of 40%, although at opposite ends of the normal range (so does not necessarily equate to 1.2 times better health).

Dichotomous scale (a special case of the nominal scale, in that it always has just two categories)
    e.g., gender: male or female

Note 1: it makes sense to do arithmetic on interval scaled variables, since this scale is sufficiently close to our notion of integers and real numbers (both number systems have equal intervals). It does not make sense to do arithmetic on nominal and ordinal scales, since these scales do not have equal intervals.

Note 2: for purposes of choosing test statistics, interval and ratio scales are considered equivalent.
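The difference between an arbitrary and an absolute zero can be checked with a little arithmetic. The sketch below is only an illustration using the temperature and hematocrit values from the examples above; the function and variable names are made up for this purpose. It recomputes each ratio after a change of units: the temperature ratio changes, while the hematocrit ratio does not.

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (degrees_f - 32.0) * 5.0 / 9.0


# Interval scale: the zero point is a convention, so the ratio depends on the units.
temp_ratio_f = 101.8 / 97.0
temp_ratio_c = fahrenheit_to_celsius(101.8) / fahrenheit_to_celsius(97.0)

# Ratio scale: the zero point is absolute, so the ratio survives a change of units
# (hematocrit expressed as a percent versus as a fraction).
hct_ratio_percent = 48.0 / 40.0
hct_ratio_fraction = 0.48 / 0.40

print(f"temperature ratio in Fahrenheit: {temp_ratio_f:.3f}")
print(f"temperature ratio in Celsius:    {temp_ratio_c:.3f}")
print(f"hematocrit ratio as percent:     {hct_ratio_percent:.3f}")
print(f"hematocrit ratio as fraction:    {hct_ratio_fraction:.3f}")
```

The two temperature ratios disagree (about 1.05 in Fahrenheit and about 1.07 in Celsius), which is why a statement such as "1.05 times as hot" has no fixed meaning on an interval scale. Differences, by contrast, convert consistently between the two temperature scales, so arithmetic on differences remains meaningful.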
Note 3: although it is rarely claimed as such, a dichotomous scale could be considered an interval scale, since it has order (although perhaps an arbitrary order), it has equal intervals (one interval that is equal to itself), and one category can be selected to represent 0. Detailed support for this is given in Chapter 2-6.

A second measurement scale scheme is:

    Binary data (dichotomous scale)
    Unordered categorical data (nominal scale)
    Ordered categorical data (ordinal scale)
    Continuous data (interval & ratio scales)

Although the following two tables assume much more than we have covered so far, here is a quick list of an appropriate statistical test for the most commonly encountered statistical problems.

Most commonly used tests when you do NOT need to control for confounding (unadjusted analysis, or univariable analysis) [randomized clinical trial (RCT), or the Table 1 patient characteristics table of an article (you never discuss Table 1 statistics in a grant, since they are not testing a study aim), or a well-controlled basic science experiment].
Tests are listed by level of measurement of the outcome variable (rows) and by study design (entries within each row).

Dichotomous
    Two independent groups: chi-square or Fisher's exact test
    Three or more independent groups: chi-square or Fisher-Freeman-Halton test
    Two correlated* samples: McNemar test
    Three or more correlated samples: Cochran Q test (extension of McNemar test)

Ordered categorical
    Two independent groups: Wilcoxon-Mann-Whitney (WMW) test
    Three or more independent groups:
        Old school***: Kruskal-Wallis analysis of variance (ANOVA)
        New school***: multiplicity adjusted WMW tests
    Two correlated* samples: Wilcoxon signed rank test
    Three or more correlated samples:
        Old school#: Friedman two-way ANOVA by ranks
        New school#: multiplicity adjusted Wilcoxon signed rank tests

Continuous
    Two independent groups: independent groups t-test
    Three or more independent groups:
        Old school***: oneway ANOVA
        New school***: multiplicity adjusted independent groups t tests
    Two correlated* samples: paired t-test
    Three or more correlated samples: mixed effects linear regression

Censored: time to event
    Two independent groups: log-rank test
    Three or more independent groups: multiplicity adjusted log-rank test
    Two correlated* samples: shared-frailty Cox regression
    Three or more correlated samples: shared-frailty Cox regression

* Correlated case: repeated measurements on same person, or clustered data (e.g., patients nested within clinics)
** Mixed effects regression models are synonymously called: multilevel models, hierarchical models, longitudinal models
*** Old school: it is commonly thought that ANOVA must precede the multiplicity adjusted pairwise comparisons. This is not true and many times causes you to lose significance [see Chapter 2-8, page 23, section "Common Misconception of Thinking Analysis of Variance (ANOVA) Must Precede Pairwise Comparisons"]. New school: just go straight to the multiplicity adjusted pairwise comparisons if the pairwise comparisons are of interest, which is usually the case, and limit the comparisons to only those needed for the hypothesis being tested.
# Same argument as ***, only using paired sample tests.

Most commonly used tests when you DO need to control for confounding (adjusted analysis, or multivariable analysis) [always the case with observational studies, since randomization is not employed]

Tests are again listed by level of measurement of the outcome variable (rows) and by study design (entries within each row).

Dichotomous
    Two independent groups: logistic regression
    Three or more independent groups: logistic regression & consider need for multiplicity adjustment
    Two correlated* samples: conditional logistic regression, or mixed effects logistic regression
    Three or more correlated samples: mixed effects logistic regression

Ordered categorical
    Two independent groups: ordinal logistic regression
    Three or more independent groups: ordinal logistic regression & consider need for multiplicity adjustment
    Two correlated* samples: mixed effects ordinal logistic regression
    Three or more correlated samples: mixed effects ordinal logistic regression

Continuous
    Two independent groups: linear regression
    Three or more independent groups: linear regression & consider need for multiplicity adjustment
    Two correlated* samples: mixed effects linear regression
    Three or more correlated samples: mixed effects linear regression

Censored: time to event
    Two independent groups: Cox regression
    Three or more independent groups: Cox regression & consider need for multiplicity adjustment
    Two correlated* samples: shared-frailty Cox regression
    Three or more correlated samples: shared-frailty Cox regression

* Correlated case: repeated measurements on same person, or clustered data (e.g., patients nested within clinics)
** Mixed effects regression models are synonymously called: multilevel models, hierarchical models, longitudinal models

The following pages provide a much larger list of tests, and there are many more than these available.

Type of Statistical Problem

    single sample (one group, frequently compared to a constant)
    related sample comparisons: a variable is measured on the same person more than once (e.g., pretest-posttest, baseline, time 1, time 2....)
        (also called: paired samples, repeated measurements)
    unrelated sample comparisons: groups being compared are different people (e.g., treatment group vs control group)
        (also called: independent samples)
    measures of association (correlation statistics)
    measures of agreement (e.g., interrater agreement)
    regression models
    power and sample size

The layout of the following pages closely follows the one found on the inside front cover of the StatXact-4 Manual (1998). Siegel and Castellan (1988) also provide a similar digest for tests covered in their text.

The following notation is used:

    SX = StatXact-5® Statistical Software
    SPSS = SPSS® 11.0 Statistical Software
    SPow = SamplePower™ 2.0 Statistical Software
    Stata = Stata 11.0 Statistical Software
    C.I. = Confidence Interval

Type of Statistical Problem / Measurement Scale

One Sample / Dichotomous Scale
    Binomial Test (SX, SPSS, Stata: bitest, bitesti)
    Binomial C.I. (SX, Stata: ci, propci, propcii)
    Runs Test (SX, SPSS, Stata: runtest)

One Sample / Nominal Scale
    Chi-Square Goodness-of-Fit Test (SX, SPSS, Stata: mgof; use "findit mgof" to install)
    Multinomial C.I. (SX)
    Test of Homogeneity of Poisson Rates (SX)
    Poisson C.I. (SX, Stata: ci, cii)

One Sample / Ordinal Scale
    Chi-Square Goodness-of-Fit Test (SX, Stata: mgof; use "findit mgof" to install)

One Sample / Interval Scale
    Tests for Location (average)
        One Sample T-Test (SPSS, Stata: ttest, ttesti)
        C.I. for Mean (SPSS, Stata: ci, cii, ttest)
    Tests for Scale (variance)
        Chi-Square test SD = # (Stata: sdtest, sdtesti)
    Tests for Goodness-of-Fit (e.g., Normality Test)
        Shapiro-Wilk Test for normality (SX, SPSS, Stata: swilk)
        Shapiro-Francia test for normality (Stata: sfrancia)
        Kolmogorov Test (SX, SPSS, Stata: ksmirnov)
        Lilliefors Test (SX, SPSS)
        Runs Test (with cut-off) (SX, Stata: runtest)

Two Related Samples / Dichotomous Scale
    McNemar's Test (SX, SPSS, Stata: mcc, mcci)
    C.I. for proportion difference (Stata: mcc, mcci)
    Sign Test (SX, SPSS, Stata: signtest)
    Odds Ratio (Stata: mcc, mcci)
    C.I. for Odds Ratio (Stata: mcc, mcci)

Two Related Samples / Nominal Scale
    Stuart-Maxwell Test (Stata: symmetry, symmi) (this test is an extension of the McNemar test to more than 2 categories)
    Marginal Homogeneity Test (Stata: symmetry, symmi) (SX version 5 assumes an ordinal scale, and so is not appropriate for the nominal scale case)

Two Related Samples / Ordinal Scale
    Wilcoxon Matched-Pairs Signed-Ranks Test, also called the Wilcoxon Signed Rank Test (SX, SPSS, Stata: signrank)
    Sign Test (SX, SPSS, Stata: signtest)
    Marginal Homogeneity Test (SX, Stata: symmetry, symmi) (this test is an extension of the McNemar test to more than 2 categories)

Two Related Samples / Interval Scale
    Paired Sample T-Test (SPSS, Stata: ttest)
    C.I. for Mean Difference (SPSS, Stata: ttest)
    Fisher-Pitman Permutation Test for Paired Replications (Stata: permtest1)
    Permutation Tests with General Scores (SX)
    Hodges-Lehmann C.I. for Shift (SX, Stata: npshift)

Two Unrelated Samples / Dichotomous Scale
    Pearson's Chi-Square Test (SX, SPSS, Stata: tabulate, tab2, tabi)
    Fisher's Exact Test (SX, SPSS, Stata: tabulate, tab2, tabi)
    Likelihood Ratio Test (SX, SPSS, Stata: tabulate, tab2, tabi)
    Odds Ratio (SX, SPSS, Stata: cc, cci)
    C.I. for Odds Ratio (SX, SPSS, Stata: cc, cci)
    Barnard's Test (SX)
    Risk Difference (SX, Stata: cs, csi)
    Risk Ratio (SX, Stata: cs, csi)
    C.I. for P2 - P1 or Risk Difference (SX, Stata: cs, csi)
    C.I. for P2 / P1 or Risk Ratio (SX, Stata: cs, csi)

Two Unrelated Samples / Nominal Scale
    Pearson's Chi-Square Test (SX, SPSS, Stata: tabulate)
    Fisher-Freeman-Halton Test (SX, SPSS, Stata: tabulate) (called Fisher's Exact Test in Stata)
    Likelihood Ratio Test (SX, SPSS, Stata: tabulate)
    Poisson Samples
        Test of Homogeneity of Poisson Rates (SX)
        C.I. for Common Poisson Ratio (SX)

Two Unrelated Samples / Ordinal Scale
    Tests for Location (average)
        Wilcoxon-Mann-Whitney Test (SX, SPSS, Stata: ranksum)
        Median Test (Stata: median)
        Normal Scores Test (SX)
        Savage Scores Test (SX)
        Permutation Tests with General Scores (SX)
        Permutation Tests with MERT Scores (SX)
        C.I. for Common Odds Ratio (SX, Stata: mhodds)
    Tests for Scale (variance)
        Siegel-Tukey Test (SX)
        Mood Test (SX)
        Ansari-Bradley Test (SX)
        Klotz Test (SX)
        Conover Test (SX)
    Omnibus Tests (shape and location)
        Kolmogorov-Smirnov Test (SX, SPSS, Stata: ksmirnov)
        Wald-Wolfowitz Runs Test (SX, SPSS)

Two Unrelated Samples / Interval Scale
    Tests for Location (average)
        Independent Groups T-Test (SPSS, Stata: ttest, ttesti)
        C.I. for Difference Between Means (SPSS, Stata: ttest)
        Fisher-Pitman Permutation Test for Two Independent Samples (Stata: permtest2)
        Hodges-Lehmann C.I. for Shift (SX, Stata: npshift)
    Tests for Scale (variance)
        Levene's test for equality of variances (SPSS, Stata: robvar)
        F-Test for equality of variances (SPSS, Stata: sdtest, sdtesti)
        Moses Rank-Like Test (SPSS)
    Tests for Censored Survival Data
        Logrank Test (SX, SPSS, Stata: ltable, sts), also called Mantel-Cox Test
        Wilcoxon-Gehan Test (SX, Stata: sts)
        Breslow Test (SPSS), also called Breslow Generalized Wilcoxon Test
        Tarone-Ware Test (SPSS, Stata: sts)
        Peto and Peto's Generalized Wilcoxon Test (Stata: sts)

Two Unrelated Samples: Stratified Analysis / Dichotomous Scale
    Test of Homogeneity of Odds-Ratios (SX, SPSS, Stata: mhodds)
    Test for Common Odds-Ratio (SX, SPSS, Stata: mhodds, tabodds)
    C.I. for Common Odds-Ratio (SX, SPSS, Stata: mhodds, tabodds)

Two Unrelated Samples: Stratified Analysis / Nominal Scale
    Poisson Samples
        Test of Homogeneity of Poisson Rate Ratio (SX)
        C.I. for Common Poisson Rate Ratio (SX)

Two Unrelated Samples: Stratified Analysis / Ordinal Scale
    Wilcoxon Rank Sum Test (SX)
    Savage Scores Test (SX)
    Permutation Tests with General Scores (SX)
    Permutation Tests with MERT Scores (SX)
    Permutation Tests with Stratum-Specific Scores (SX)
    Score Test for Trend in Odds (Stata: tabodds)
    C.I. for Common Odds Ratio (SX, Stata: mhodds, tabodds)

Two Unrelated Samples: Stratified Analysis / Interval Scale
    Tests for Censored Survival Data
        Logrank Test (SX, SPSS, Stata: sts), also called Mantel-Cox Test
        Breslow Test (SPSS), also called Breslow Generalized Wilcoxon Test
        Wilcoxon-Gehan Test (SX, Stata: sts)
        Tarone-Ware Test (SPSS, Stata: sts)

K Related Samples: Unordered Treatments / Dichotomous Scale
    Cochran Q Test (SX, SPSS, Stata: cochran; use "findit cochran" to download)

K Related Samples: Unordered Treatments / Nominal Scale
    (no tests listed)

K Related Samples: Unordered Treatments / Ordinal Scale
    Friedman two-way ANOVA by ranks (SX, SPSS, Stata: friedman)
    Quade Test (SX)

K Related Samples: Unordered Treatments / Interval Scale
    (no tests listed)

K Related Samples: Ordered Treatments / Dichotomous Scale
    (no tests listed)

K Related Samples: Ordered Treatments / Nominal Scale
    (no tests listed)

K Related Samples: Ordered Treatments / Ordinal Scale
    Page Test (SX)

K Related Samples: Ordered Treatments / Interval Scale
    Repeated Measures Analysis of Variance (SPSS, Stata: anova)

K Unrelated Samples: Unordered Treatments / Dichotomous Scale
    Pearson Chi-Square Test (SX, SPSS, Stata: tabulate)
    Fisher-Freeman-Halton Test (SX, SPSS Exact Tests module, Stata: tabulate) (called Fisher's Exact Test in Stata)
    Likelihood Ratio Test (SX, SPSS, Stata: tabulate)

K Unrelated Samples: Unordered Treatments / Nominal Scale
    Pearson Chi-Square Test (SX, SPSS, Stata: tabulate)
    Fisher-Freeman-Halton Test (SX, SPSS Exact Tests module, Stata: tabulate) (called Fisher's Exact Test in Stata)
    Likelihood Ratio Test (SX, SPSS, Stata: tabulate)

K Unrelated Samples: Unordered Treatments / Ordinal Scale
    Kruskal-Wallis Test (SX, SPSS, Stata: kwallis)
    Median Test (SX, SPSS, Stata: median)
    Normal Scores Test (SX)
    Savage Scores Test (SX)
    One-Way ANOVA with General Scores (SX)

K Unrelated Samples: Unordered Treatments / Interval Scale
    Tests for Location (average)
        One-way Analysis of Variance (SPSS, Stata: oneway, anova)
        Post-Hoc Multiple Comparison Procedures (SPSS, Stata: oneway)
    Tests for Censored Survival Data
        Logrank Test (SX, SPSS, Stata: sts), also called Mantel-Cox Test
        Breslow Test (SPSS), also called Breslow Generalized Wilcoxon Test
        Wilcoxon-Gehan Test (SX, Stata: sts)
        Tarone-Ware Test (SPSS, Stata: sts)

K Unrelated Samples: Ordered Treatments / Dichotomous Scale
    Cochran-Armitage Trend Test (with stratification) (SX)
    C.I. for Common Odds Ratio (SX, Stata: mhodds, tabodds)

K Unrelated Samples: Ordered Treatments / Nominal Scale
    (no tests listed)

K Unrelated Samples: Ordered Treatments / Ordinal Scale
    Linear by Linear Association Test (SX, SPSS), also called Mantel-Haenszel Chi-Square
    Jonckheere-Terpstra Test (SX, SPSS)

K Unrelated Samples: Ordered Treatments / Interval Scale
    Tests for Censored Survival Data
        Tarone-Ware Trend Test for Censored Survival Data (SX, Stata: sts)

Measures of Association / Dichotomous Scale
    Odds ratio (SX, SPSS, Stata: cc, cci)
    Phi (SPSS, Stata: phi)
    Contingency Coefficients (SX)
    Goodman Kruskal Tau (SX)
    Uncertainty Coefficients (SX)

Measures of Association / Nominal Scale
    Contingency Coefficients (SX)
    Goodman Kruskal Tau (SX, Stata: tabulate, tab2, tabi)
    Uncertainty Coefficients (SX)

Measures of Association / Ordinal Scale
    Spearman's Rank-Order Correlation Coefficient (SX, SPSS, Stata: spearman)
    Kendall's Tau Coefficient (SX, SPSS, Stata: tabulate, tab2, tabi, spearman)
    Somers' D Coefficient (SX, Stata: somersd)
    Goodman-Kruskal Gamma Coefficient (SX, Stata: tabulate, tab2, tabi)
    Kendall's Coefficient of Concordance (SX, Stata: friedman)

Measures of Association / Interval Scale
    Pearson's Product-Moment Correlation Coefficient (SX, SPSS, Stata: pwcorr, correlate)

Measures of Agreement / Dichotomous Scale
    Cohen's Kappa (SX, SPSS, Stata: kappa)

Measures of Agreement / Nominal Scale
    Cohen's Kappa (SX, SPSS, Stata: kappa)

Measures of Agreement / Ordinal Scale
    Weighted Kappa (SX, Stata: kapwgt)
    Kendall (SPSS)

Measures of Agreement / Interval Scale
    Interclass and Intraclass Correlation Coefficient (SPSS, Stata: xtreg)
    Cronbach's alpha (SPSS, Stata: alpha)

Regression Models / Dichotomous Dependent Variable
    Logistic Regression (SPSS, Stata: logistic)
    Conditional Logistic Regression (matched pairs) (Stata: clogit)
    Probit Analysis (SPSS, Stata: probit, biprobit, hetprob, heckprob, glm)
    Discriminant Analysis (SPSS, Stata: discrim)
    Monotonic Regression (SPSS GOLDMineR module)

Regression Models / Nominal Dependent Variable
    Loglinear Analysis (SPSS, Stata: loglin, ipf)
    Logit Loglinear Analysis (SPSS, Stata: glogit, nlogit, xtlogit, ologit, scobit)
    Discriminant Analysis (SPSS, Stata: discrim)
    Poisson Regression (Stata: poisson, xtpois)

Regression Models / Ordinal Dependent Variable
    Ordinal Regression (SPSS, Stata: ologit, oprobit)
    Monotonic Regression (SPSS GOLDMineR module)

Regression Models / Interval Dependent Variable
    Linear Regression (SPSS, Stata: regress)
    Analysis of Variance (SPSS, Stata: anova)
    Nonlinear Regression (SPSS, Stata: nl)
    Factor Analysis (SPSS, Stata: factor)
    Cluster Analysis (SPSS, Stata: cluster, cldis, clavg, clcomp, clkmeans, clkmed, clsing)
    Multidimensional Scaling (SPSS)
    Time Series Analysis (SPSS, Stata: arima, xt)
    Variance Components Analysis (SPSS)
    Censored Survival Data
        Kaplan-Meier Survival Curves (SPSS, Stata: ltable, sts)
        Cox Regression (SPSS, Stata: cox, stcox)
        Life Tables (SPSS, Stata: ltable)

Power & Sample Size / Dichotomous Scale
    Fisher's Exact Test (SX, SPow)
    Pearson's Chi-Square Test (SX, SPow, Stata: sampsi)
    Likelihood Ratio Test (SX)
    Equivalence Test (SX)
    Binomial Test (SPow)
    McNemar Test (SPow)
    Sign Test (SPow)
    Logistic Regression (SPow)
    Survival Analysis (SPow, Stata: stpower)
    Cohen's Kappa Inter-rater agreement (Stata: sskdlg)

Power & Sample Size / Nominal Scale
    Pearson's Chi-Square Test (SPow)

Power & Sample Size / Ordinal Scale
    Wilcoxon-Mann-Whitney Test (SX)
    Trend Tests on K Binomial Samples (SX)
    Linear Rank Tests on Two Multinomial Samples (SX)

Power & Sample Size / Interval Scale
    One Sample T Test (SPow, Stata: sampsi)
    Paired Sample T Test (SPow, Stata: sampsi)
    Two Sample T Test (SPow, Stata: sampsi)
    One Sample Pearson Correlation = # (SPow)
    Two Sample Equality of Pearson Correlations (SPow)
    One-Way Analysis of Variance/Covariance (SPow)
    K-Way (Factorial) Analysis of Variance/Covariance (SPow)
    Linear Regression (SPow)
    Nonlinear Regression (SPow)
    Equivalence Study Two Sample T Test (SPow)

References

Siegel S, Castellan NJ. (1988). Nonparametric Statistics for the Behavioral Sciences. 2nd ed. New York, McGraw-Hill.

StatXact 5 for Windows: Statistical Software for Exact Nonparametric Inference User Manual. (2001). Cambridge MA, Cytel Software Corporation.