Chapter 2-3. Choice of significance test

The choice of a test statistic, or significance test, is based on the measurement scale of your data, how many groups you want to compare, and whether the groups are independent or related (paired). Once you have identified what category your data fall into, you then choose the test that best fits your research question.

This chapter contains only a small list of the available significance tests. Its purpose is mostly to illustrate how the decision is made based on measurement scale, number of groups to compare, and whether the groups are independent or related.

To use this chapter, determine the measurement scale of your data and the type of statistical problem, then look up a test in this chapter. If you are unfamiliar with the test, simply look it up in the manual for your statistical software. These manuals describe what each test specifically does. Searching the internet for the statistical test is also helpful.

When computing statistics for someone else, it is generally best to use a statistic they are familiar with. Trust me, this makes them happier, even if it's not the most appropriate statistic; and if it leads to the same conclusion, who cares! I have therefore listed the statistical tests found in introductory statistical courses (the most widely known) at the top of the lists. If you are not familiar with the other tests, just use the first one listed.

Many times, statisticians will use a test that requires only ordered categorical data to analyze continuous data. This works because continuous data contain all the properties of ordered categorical data (the extra properties are simply not utilized). This might be done to eliminate the effect of an outlier (skewed distribution), for example. Generally, however, this results in a loss of power (the probability that you will get a statistically significant P value when a real effect exists) because you are using less of the information in the data.
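The point about analyzing continuous data with a test that needs only order can be made concrete with a small sketch. It is written in Python purely for illustration (the course itself uses Stata), the data are made up, and the helper name `mann_whitney_u` is hypothetical. It computes the Wilcoxon-Mann-Whitney U statistic by counting pairs, and compares it with the difference in means when one value is replaced by an extreme outlier.

```python
from statistics import mean


def mann_whitney_u(x, y):
    """U statistic for x versus y: the number of (xi, yj) pairs with xi > yj.

    Ties count as one half. Only the ordering of the values is used.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Made-up continuous measurements for two independent groups.
control = [4.1, 4.8, 5.0, 5.3, 5.9]
treated = [5.5, 6.0, 6.2, 6.8, 7.1]
# Same data, but the largest treated value is replaced by an extreme outlier.
treated_outlier = [5.5, 6.0, 6.2, 6.8, 71.0]

u_clean = mann_whitney_u(treated, control)
u_outlier = mann_whitney_u(treated_outlier, control)
diff_clean = mean(treated) - mean(control)
diff_outlier = mean(treated_outlier) - mean(control)

print(f"U statistic:        clean = {u_clean:.1f}, with outlier = {u_outlier:.1f}")
print(f"Difference in means: clean = {diff_clean:.2f}, with outlier = {diff_outlier:.2f}")
```

In this made-up example the U statistic is 24 in both cases, because the outlier is still simply the largest value, while the difference in means jumps from about 1.3 to about 14.1. The same property explains the loss of power: the rank-based statistic ignores how far apart the values are, which is information a t test would use.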
_____________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript]. University of Utah School of Medicine, 2010.
Chapter 2-3 (revision 16 May 2010) p. 1

Measurement Scale (also called level of measurement)

Nominal scale: name
    unordered categories
    e.g., cancer therapies: chemo, radiation, surgery

Ordinal scale: name + order
    ordered categories
    e.g., quality of life: lousy, okay, great

Interval scale: name + order + equal intervals + arbitrary zero point
    continuous measurement with arbitrary zero
    e.g., body temperature: 0°F does not imply absence of temperature (although perhaps absence of life). The 0 point is just a convention of the scale. Ratios do not make sense: you would not say 101.8°F is 1.05 times as hot as 97°F.

Ratio scale: name + order + equal intervals + absolute zero point
    continuous measurement with absolute zero
    e.g., hematocrit: 0% means no hematocrit, however unlikely. Ratios make sense (at least arithmetically): a Hct of 48% is 1.2 times a Hct of 40%, although the two values are at opposite ends of the normal range (so the ratio does not necessarily equate to 1.2 times better health).

Dichotomous scale (a special case of the nominal scale, in that it always has just two categories)
    e.g., gender: male or female

Note 1: it makes sense to do arithmetic on interval scaled variables, since this scale is sufficiently close to our notion of integers and real numbers (both number systems have equal intervals). It does not make sense to do arithmetic on nominal and ordinal scales, since these scales do not have equal intervals.

Note 2: for purposes of choosing test statistics, interval and ratio scales are considered equivalent.

Note 3: although it is rarely claimed as such, a dichotomous scale could be considered an interval scale, since it has order (although perhaps an arbitrary order), it has equal intervals (one interval that is equal to itself), and one category can be selected to represent 0.
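The difference between an arbitrary and an absolute zero can be checked with a few lines of arithmetic. The sketch below uses Python purely for illustration, and its variable and function names are hypothetical. It takes the two temperatures and the two hematocrit values from the examples above and shows that the temperature ratio changes when the unit changes, while the hematocrit ratio does not.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# Interval scale: the zero point is a convention, so ratios depend on the unit.
fever_f, normal_f = 101.8, 97.0
ratio_f = fever_f / normal_f
ratio_c = fahrenheit_to_celsius(fever_f) / fahrenheit_to_celsius(normal_f)

# Ratio scale: zero means "none", so ratios survive a change of unit.
hct_ratio_percent = 48.0 / 40.0    # hematocrit expressed in percent
hct_ratio_fraction = 0.48 / 0.40   # the same values expressed as fractions

print(f"Temperature ratio in Fahrenheit: {ratio_f:.2f}")
print(f"Temperature ratio in Celsius:    {ratio_c:.2f}")
print(f"Hematocrit ratio (percent):      {hct_ratio_percent:.2f}")
print(f"Hematocrit ratio (fraction):     {hct_ratio_fraction:.2f}")
```

The Fahrenheit ratio is about 1.05 and the Celsius ratio about 1.07 for the same two temperatures, so neither number means anything physically. The hematocrit ratio is 1.2 either way.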
A second measurement scale scheme is:

    Binary data (dichotomous scale)
    Unordered categorical data (nominal scale)
    Ordered categorical data (ordinal scale)
    Continuous data (interval & ratio scales)

Although the following two tables assume much more than we have covered so far, here is a quick list of an appropriate statistical test for the most commonly encountered statistical problems.

Most commonly used tests when you do NOT need to control for confounding (unadjusted analysis, or univariable analysis) [randomized clinical trial (RCT), or the Table 1 "Patient characteristics" table of an article (you never discuss Table 1 statistics in a grant, since they are not testing a study aim)].

| Level of measurement of outcome variable | Two independent groups | Three or more independent groups | Two correlated* samples | Three or more correlated samples |
|---|---|---|---|---|
| Dichotomous | chi-square or Fisher's exact test | chi-square or Fisher's exact test | McNemar test | mixed effects** logistic regression |
| Ordered categorical | Wilcoxon-Mann-Whitney (WMW) test | Old school***: Kruskal-Wallis analysis of variance (ANOVA). New school***: multiplicity adjusted WMW tests | Wilcoxon signed-rank test for matched data | mixed effects ordinal logistic regression |
| Continuous | independent groups t test | Old school***: oneway ANOVA. New school***: multiplicity adjusted independent groups t tests | paired t test | mixed effects linear regression |
| Censored: time to event | log-rank test | multiplicity adjusted log-rank test | shared-frailty Cox regression | shared-frailty Cox regression |

* Correlated case: repeated measurements on same person, or clustered data (e.g., patients nested within clinics)
** Mixed effects regression models are synonymously called: multilevel models, hierarchical models, longitudinal models
*** Old school: it is commonly thought that ANOVA must precede the multiplicity adjusted pairwise comparisons. This is not true and many times causes you to lose significance [see Chapter 2-8, page 23, section "Common Misconception of Thinking Analysis of Variance (ANOVA) Must Precede Pairwise Comparisons"]. New school: just go straight to the multiplicity adjusted pairwise comparisons if the pairwise comparisons are of interest, which is usually the case.

Most commonly used tests when you DO need to control for confounding (adjusted analysis, or multivariable analysis) [always the case with observational studies, since randomization is not employed].

| Level of measurement of outcome variable | Two independent groups | Three or more independent groups | Two correlated* samples | Three or more correlated samples |
|---|---|---|---|---|
| Dichotomous | logistic regression | logistic regression & consider need for multiplicity adjustment | conditional logistic regression, or mixed effects logistic regression | mixed effects logistic regression |
| Ordered categorical | ordinal logistic regression | ordinal logistic regression & consider need for multiplicity adjustment | mixed effects ordinal logistic regression | mixed effects ordinal logistic regression |
| Continuous | linear regression | linear regression & consider need for multiplicity adjustment | mixed effects linear regression | mixed effects linear regression |
| Censored: time to event | Cox regression | Cox regression & consider need for multiplicity adjustment | shared-frailty Cox regression | shared-frailty Cox regression |

* Correlated case: repeated measurements on same person, or clustered data (e.g., patients nested within clinics)
** Mixed effects regression models are synonymously called: multilevel models, hierarchical models, longitudinal models

The following pages provide a much larger list of tests.
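The two quick-reference tables above amount to a lookup keyed by outcome scale, study design, and whether adjustment for confounding is needed. A minimal sketch of that lookup follows, in Python purely for illustration. The key strings and the function name `suggest_test` are hypothetical, the cell text is transcribed from the tables, and for three or more independent groups in the unadjusted table only the "new school" entry is recorded.

```python
# Column order shared by both tables (hypothetical short labels).
DESIGNS = ("2 independent", "3+ independent", "2 correlated", "3+ correlated")

# Unadjusted (univariable) analysis, e.g. an RCT or a Table 1 comparison.
UNADJUSTED = {
    "dichotomous": (
        "chi-square or Fisher's exact test",
        "chi-square or Fisher's exact test",
        "McNemar test",
        "mixed effects logistic regression",
    ),
    "ordered categorical": (
        "Wilcoxon-Mann-Whitney (WMW) test",
        "multiplicity adjusted WMW tests",
        "Wilcoxon signed-rank test",
        "mixed effects ordinal logistic regression",
    ),
    "continuous": (
        "independent groups t test",
        "multiplicity adjusted independent groups t tests",
        "paired t test",
        "mixed effects linear regression",
    ),
    "time to event": (
        "log-rank test",
        "multiplicity adjusted log-rank test",
        "shared-frailty Cox regression",
        "shared-frailty Cox regression",
    ),
}

# Adjusted (multivariable) analysis, e.g. an observational study.
ADJUSTED = {
    "dichotomous": (
        "logistic regression",
        "logistic regression & consider need for multiplicity adjustment",
        "conditional logistic regression, or mixed effects logistic regression",
        "mixed effects logistic regression",
    ),
    "ordered categorical": (
        "ordinal logistic regression",
        "ordinal logistic regression & consider need for multiplicity adjustment",
        "mixed effects ordinal logistic regression",
        "mixed effects ordinal logistic regression",
    ),
    "continuous": (
        "linear regression",
        "linear regression & consider need for multiplicity adjustment",
        "mixed effects linear regression",
        "mixed effects linear regression",
    ),
    "time to event": (
        "Cox regression",
        "Cox regression & consider need for multiplicity adjustment",
        "shared-frailty Cox regression",
        "shared-frailty Cox regression",
    ),
}


def suggest_test(outcome, design, adjusted=False):
    """Return the table cell for an outcome scale and a study design."""
    table = ADJUSTED if adjusted else UNADJUSTED
    return table[outcome][DESIGNS.index(design)]


print(suggest_test("continuous", "2 correlated"))
print(suggest_test("dichotomous", "2 independent", adjusted=True))
```

The lookup only mirrors the tables; it does not check the assumptions of any test, so it is a starting point for the decision rather than a replacement for it.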
Type of Statistical Problem

    single sample (one group, frequently compared to a constant)
    related sample comparisons: a variable is measured on the same person more than once (e.g., pretest-posttest, baseline, time 1, time 2, ...) (also called: paired samples, repeated measurements)
    unrelated sample comparisons: groups being compared are different people (e.g., treatment group vs control group) (also called: independent samples)
    measures of association (correlation statistics)
    measures of agreement (e.g., inter-rater agreement)
    regression models
    power and sample size

The layout of the following pages closely follows the one found on the inside front cover of the StatXact-4 Manual (1998). Siegel and Castellan (1988) also provide a similar digest for tests covered in their text.

The following notation is used:

    SX = StatXact-5® Statistical Software
    SPSS = SPSS® 11.0 Statistical Software
    SPow = SamplePower™ 2.0 Statistical Software
    Stata = Stata 7.0 Statistical Software
    C.I. = Confidence Interval

Type of Statistical Problem / Measurement Scale

One Sample / Dichotomous Scale
    Binomial Test (SX, SPSS, Stata: bitest, bitesti)
    Binomial C.I. (SX, Stata: ci, propci, propcii)
    Runs Test (SX, SPSS, Stata: runtest)

One Sample / Nominal Scale
    Chi-Square Goodness-of-Fit Test (SX, SPSS)
    Multinomial C.I. (SX)
    Test of Homogeneity of Poisson Rates (SX)
    Poisson C.I. (SX, Stata: ci, cii)

One Sample / Ordinal Scale
    Chi-Square Goodness-of-Fit Test (SX)

One Sample / Interval Scale
    Tests for Location (average)
        T Test (SPSS, Stata: ttest, ttesti)
        C.I. for Mean (SPSS, Stata: ci, cii, ttest)
    Tests for Scale (variance)
        Chi-Square test SD = # (Stata: sdtest, sdtesti)
    Tests for Goodness-of-Fit (e.g., Normality Test)
        Shapiro-Wilk Test for normality (SX, SPSS, Stata: swilk)
        Shapiro-Francia test for normality (Stata: sfrancia)
        Kolmogorov Test (SX, SPSS, Stata: ksmirnov)
        Lilliefors Test (SX, SPSS)
        Runs Test (with cut-off) (SX, Stata: runtest)

Two Related Samples / Dichotomous Scale
    McNemar's Test (SX, SPSS, Stata: mcc, mcci)
    C.I. for proportion difference (Stata: mcc, mcci)
    Sign Test (SX, SPSS, Stata: signtest)
    Odds Ratio (Stata: mcc, mcci)
    C.I. for Odds Ratio (Stata: mcc, mcci)

Two Related Samples / Nominal Scale
    Marginal Homogeneity Test (Stata: symmetry, symmi) (this test is an extension of the McNemar test to more than 2 categories; SX version 5 assumes an ordinal scale, and so is not appropriate for the nominal scale case)

Two Related Samples / Ordinal Scale
    Wilcoxon Matched-Pairs Signed-Ranks Test (SX, SPSS, Stata: signrank)
    Sign Test (SX, SPSS, Stata: signtest)
    Marginal Homogeneity Test (SX, Stata: symmetry, symmi) (this test is an extension of the McNemar test to more than 2 categories)

Two Related Samples / Interval Scale
    Paired T Test (SPSS, Stata: ttest)
    C.I. for Mean Difference (SPSS, Stata: ttest)
    Fisher-Pitman Permutation Test for Paired Replications (Stata: permtest1)
    Permutation Tests with General Scores (SX)
    Hodges-Lehmann C.I. for Shift (SX, Stata: npshift)

Two Unrelated Samples / Dichotomous Scale
    Pearson's Chi-Square Test (SX, SPSS, Stata: tabulate, tab2, tabi)
    Fisher's Exact Test (SX, SPSS, Stata: tabulate, tab2, tabi)
    Likelihood Ratio Test (SX, SPSS, Stata: tabulate, tab2, tabi)
    Odds Ratio (SX, SPSS, Stata: cc, cci)
    C.I. for Odds Ratio (SX, SPSS, Stata: cc, cci)
    Barnard's Test (SX)
    Risk Difference (SX, Stata: cs, csi)
    Risk Ratio (SX, Stata: cs, csi)
    C.I. for P2 - P1 or Risk Difference (SX, Stata: cs, csi)
    C.I. for P2 / P1 or Risk Ratio (SX, Stata: cs, csi)

Two Unrelated Samples / Nominal Scale
    Pearson's Chi-Square Test (SX, SPSS, Stata: tabulate)
    Fisher-Freeman-Halton Test (SX, SPSS, Stata: tabulate) (called Fisher's Exact Test in Stata)
    Likelihood Ratio Test (SX, SPSS, Stata: tabulate)
    Poisson Samples
        Test of Homogeneity of Poisson Rates (SX)
        C.I. for Common Poisson Ratio (SX)

Two Unrelated Samples / Ordinal Scale
    Tests for Location (average)
        Wilcoxon-Mann-Whitney Test (SX, SPSS, Stata: ranksum)
        Median Test (Stata: median)
        Normal Scores Test (SX)
        Savage Scores Test (SX)
        Permutation Tests with General Scores (SX)
        Permutation Tests with MERT Scores (SX)
        C.I. for Common Odds Ratio (SX, Stata: mhodds)
    Tests for Scale (variance)
        Siegel-Tukey Test (SX)
        Mood Test (SX)
        Ansari-Bradley Test (SX)
        Klotz Test (SX)
        Conover Test (SX)
    Omnibus Tests (shape and location)
        Kolmogorov-Smirnov Test (SX, SPSS, Stata: ksmirnov)
        Wald-Wolfowitz Runs Test (SX, SPSS)

Two Unrelated Samples / Interval Scale
    Tests for Location (average)
        T Test (SPSS, Stata: ttest, ttesti)
        C.I. for Difference Between Means (SPSS, Stata: ttest)
        Fisher-Pitman Permutation Test for Two Independent Samples (Stata: permtest2)
        Hodges-Lehmann C.I. for Shift (SX, Stata: npshift)
    Tests for Scale (variance)
        Levene's test for equality of variances (SPSS, Stata: robvar)
        F Test for equality of variances (SPSS, Stata: sdtest, sdtesti)
        Moses Rank-Like Test (SPSS)
    Tests for Censored Survival Data
        Logrank Test (SX, SPSS, Stata: ltable, sts) also called Mantel-Cox Test
        Wilcoxon-Gehan Test (SX, Stata: sts)
        Breslow Test (SPSS) also called Breslow Generalized Wilcoxon Test
        Tarone-Ware Test (SPSS, Stata: sts)
        Peto and Peto's Generalized Wilcoxon Test (Stata: sts)

Two Unrelated Samples: Stratified Analysis / Dichotomous Scale
    Test of Homogeneity of Odds-Ratios (SX, SPSS, Stata: mhodds)
    Test for Common Odds-Ratio (SX, SPSS, Stata: mhodds, tabodds)
    C.I. for Common Odds-Ratio (SX, SPSS, Stata: mhodds, tabodds)

Two Unrelated Samples: Stratified Analysis / Nominal Scale
    Poisson Samples
        Test of Homogeneity of Poisson Rate Ratio (SX)
        C.I. for Common Poisson Rate Ratio (SX)

Two Unrelated Samples: Stratified Analysis / Ordinal Scale
    Wilcoxon Rank Sum Test (SX)
    Savage Scores Test (SX)
    Permutation Tests with General Scores (SX)
    Permutation Tests with MERT Scores (SX)
    Permutation Tests with Stratum-Specific Scores (SX)
    Score Test for Trend in Odds (Stata: tabodds)
    C.I. for Common Odds Ratio (SX, Stata: mhodds, tabodds)

Two Unrelated Samples: Stratified Analysis / Interval Scale
    Tests for Censored Survival Data
        Logrank Test (SX, SPSS, Stata: sts) also called Mantel-Cox Test
        Breslow Test (SPSS) also called Breslow Generalized Wilcoxon Test
        Wilcoxon-Gehan Test (SX, Stata: sts)
        Tarone-Ware Test (SPSS, Stata: sts)

K Related Samples: Unordered Treatments / Dichotomous Scale
    Cochran Q Test (SX, SPSS)

K Related Samples: Unordered Treatments / Nominal Scale
    (no tests listed)

K Related Samples: Unordered Treatments / Ordinal Scale
    Friedman Test (SX, SPSS, Stata: friedman)
    Quade Test (SX)

K Related Samples: Unordered Treatments / Interval Scale
    (no tests listed)

K Related Samples: Ordered Treatments / Dichotomous Scale
    (no tests listed)

K Related Samples: Ordered Treatments / Nominal Scale
    (no tests listed)

K Related Samples: Ordered Treatments / Ordinal Scale
    Page Test (SX)

K Related Samples: Ordered Treatments / Interval Scale
    Repeated Measures Analysis of Variance (SPSS, Stata: anova)

K Unrelated Samples: Unordered Treatments / Dichotomous Scale
    Pearson Chi-Square Test (SX, SPSS, Stata: tabulate)
    Fisher-Freeman-Halton Test (SX, SPSS Exact Tests module, Stata: tabulate) (called Fisher's Exact Test in Stata)
    Likelihood Ratio Test (SX, SPSS, Stata: tabulate)

K Unrelated Samples: Unordered Treatments / Nominal Scale
    Pearson Chi-Square Test (SX, SPSS, Stata: tabulate)
    Fisher-Freeman-Halton Test (SX, SPSS Exact Tests module, Stata: tabulate) (called Fisher's Exact Test in Stata)
    Likelihood Ratio Test (SX, SPSS, Stata: tabulate)

K Unrelated Samples: Unordered Treatments / Ordinal Scale
    Kruskal-Wallis Test (SX, SPSS, Stata: kwallis)
    Median Test (SX, SPSS, Stata: median)
    Normal Scores Test (SX)
    Savage Scores Test (SX)
    One-Way ANOVA with General Scores (SX)

K Unrelated Samples: Unordered Treatments / Interval Scale
    Tests for Location (average)
        One-way Analysis of Variance (SPSS, Stata: oneway, anova)
        Post-Hoc Multiple Comparison Procedures (SPSS, Stata: oneway)
    Tests for Censored Survival Data
        Logrank Test (SX, SPSS, Stata: sts) also called Mantel-Cox Test
        Breslow Test (SPSS) also called Breslow Generalized Wilcoxon Test
        Wilcoxon-Gehan Test (SX, Stata: sts)
        Tarone-Ware Test (SPSS, Stata: sts)

K Unrelated Samples: Ordered Treatments / Dichotomous Scale
    Cochran-Armitage Trend Test (with stratification) (SX)
    C.I. for Common Odds Ratio (SX, Stata: mhodds, tabodds)

K Unrelated Samples: Ordered Treatments / Nominal Scale
    (no tests listed)

K Unrelated Samples: Ordered Treatments / Ordinal Scale
    Linear by Linear Association Test (SX, SPSS) also called Mantel-Haenszel Chi-Square
    Jonckheere-Terpstra Test (SX, SPSS)

K Unrelated Samples: Ordered Treatments / Interval Scale
    Tests for Censored Survival Data
        Tarone-Ware Trend Test for Censored Survival Data (SX, Stata: sts)

Measures of Association / Dichotomous Scale
    Odds Ratio (SX, SPSS, Stata: cc, cci)
    Phi (SPSS, Stata: phi)
    Contingency Coefficients (SX)
    Goodman-Kruskal Tau (SX)
    Uncertainty Coefficients (SX)

Measures of Association / Nominal Scale
    Contingency Coefficients (SX)
    Goodman-Kruskal Tau (SX, Stata: tabulate, tab2, tabi)
    Uncertainty Coefficients (SX)

Measures of Association / Ordinal Scale
    Spearman's Rank-Order Correlation Coefficient (SX, SPSS, Stata: spearman)
    Kendall's Tau Coefficient (SX, SPSS, Stata: tabulate, tab2, tabi, spearman)
    Somers' D Coefficient (SX, Stata: somersd)
    Goodman-Kruskal Gamma Coefficient (SX, Stata: tabulate, tab2, tabi)
    Kendall's Coefficient of Concordance (SX, Stata: friedman)

Measures of Association / Interval Scale
    Pearson's Product-Moment Correlation Coefficient (SX, SPSS, Stata: pwcorr, correlate)

Measures of Agreement / Dichotomous Scale
    Cohen's Kappa (SX, SPSS, Stata: kappa)

Measures of Agreement / Nominal Scale
    Cohen's Kappa (SX, SPSS, Stata: kappa)

Measures of Agreement / Ordinal Scale
    Weighted Kappa (SX, Stata: kapwgt)
    Kendall (SPSS)

Measures of Agreement / Interval Scale
    Intra-class Correlation Coefficient (SPSS)
    Cronbach's alpha (SPSS, Stata: alpha)

Regression Models / Dichotomous Dependent Variable
    Logistic Regression (SPSS, Stata: logistic)
    Conditional Logistic Regression (matched pairs) (Stata: clogit)
    Probit Analysis (SPSS, Stata: probit, biprobit, hetprob, heckprob, glm)
    Discriminant Analysis (SPSS, Stata: discrim)
    Monotonic Regression (SPSS GOLDMineR module)

Regression Models / Nominal Dependent Variable
    Loglinear Analysis (SPSS, Stata: loglin, ipf)
    Logit Loglinear Analysis (SPSS, Stata: glogit, nlogit, xtlogit, ologit, scobit)
    Discriminant Analysis (SPSS, Stata: discrim)
    Poisson Regression (Stata: poisson, xtpois)

Regression Models / Ordinal Dependent Variable
    Ordinal Regression (SPSS, Stata: ologit, oprobit)
    Monotonic Regression (SPSS GOLDMineR module)

Regression Models / Interval Dependent Variable
    Linear Regression (SPSS, Stata: regress)
    Analysis of Variance (SPSS, Stata: anova)
    Nonlinear Regression (SPSS, Stata: nl)
    Factor Analysis (SPSS, Stata: factor)
    Cluster Analysis (SPSS, Stata: cluster, cldis, clavg, clcomp, clkmeans, clkmed, clsing)
    Multidimensional Scaling (SPSS)
    Time Series Analysis (SPSS, Stata: arima, xt)
    Variance Components Analysis (SPSS)
    Censored Survival Data
        Kaplan-Meier Survival Curves (SPSS, Stata: ltable, sts)
        Cox Regression (SPSS, Stata: cox, stcox)
        Life Tables (SPSS, Stata: ltable)

Power & Sample Size / Dichotomous Scale
    Fisher's Exact Test (SX, SPow)
    Pearson's Chi-Square Test (SX, SPow, Stata: sampsi)
    Likelihood Ratio Test (SX)
    Equivalence Test (SX)
    Binomial Test (SPow)
    McNemar Test (SPow)
    Sign Test (SPow)
    Logistic Regression (SPow)
    Survival Analysis (SPow)
    Cohen's Kappa Inter-rater agreement (Stata: sskdlg)

Power & Sample Size / Nominal Scale
    Pearson's Chi-Square Test (SPow)

Power & Sample Size / Ordinal Scale
    Wilcoxon-Mann-Whitney Test (SX)
    Trend Tests on K Binomial Samples (SX)
    Linear Rank Tests on Two Multinomial Samples (SX)

Power & Sample Size / Interval Scale
    One Sample T Test (SPow, Stata: sampsi)
    Paired Sample T Test (SPow, Stata: sampsi)
    Two Sample T Test (SPow, Stata: sampsi)
    One Sample Pearson Correlation = # (SPow)
    Two Sample Equality of Pearson Correlations (SPow)
    One-Way Analysis of Variance/Covariance (SPow)
    K-Way (Factorial) Analysis of Variance/Covariance (SPow)
    Linear Regression (SPow)
    Nonlinear Regression (SPow)
    Equivalence Study Two Sample T Test (SPow)

References

Siegel S, Castellan NJ. (1988). Nonparametric Statistics for the Behavioral Sciences. 2nd ed. New York, McGraw-Hill.

StatXact 5 for Windows: Statistical Software for Exact Nonparametric Inference User Manual. (2001). Cambridge MA, Cytel Software Corporation.