Chapter 2-3. Choice of significance test

The choice of a test statistic, or significance test, is based on the measurement scale of your data,
how many groups you want to compare, and whether the groups are independent or related
(paired). Once you have identified what category your data fall into, you then choose the test that
best fits your research question. This chapter contains only a small list of the available
significance tests. The purpose of the chapter is mostly to illustrate how the decision is made
based on measurement scales, number of groups to compare, and whether the groups are
independent or related.
To use this chapter, determine the measurement scale of your data and the type of statistical
problem. Then look up a test in this chapter. If you are unfamiliar with the test, simply look it
up in the manual for the statistical software. These manuals will give a description of what each
test specifically does. Searching the internet for the statistical test is also helpful.
When computing statistics for someone else, it is generally best to use a statistic they are familiar
with. Trust me, this makes them happier, even if it’s not the most appropriate statistic; and if it
leads to the same conclusion, who cares! I have therefore listed the statistical tests found in
introductory statistical courses (the most widely known) at the top of the lists. If you are not
familiar with the other tests, just use the first one listed.
Statisticians often analyze continuous data with a test that requires only ordered categorical
data. This works because continuous data have all the properties of ordered categorical data
(the extra properties are simply not used). This might be done, for example, to reduce the
influence of an outlier or a skewed distribution. Generally, however, it results in a loss of
power (the probability of obtaining a statistically significant P value when a true effect
exists) because you are using less information in the data.
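As a rough illustration of why a rank-based test is insensitive to an outlier, the sketch below compares a mean difference with the Mann-Whitney U statistic on made-up data. The data values and the helper name mann_whitney_u are assumptions for this example, not part of the course material.

```python
from statistics import mean

def mann_whitney_u(x, y):
    """U statistic for x: count of (x_i, y_j) pairs with x_i > y_j; ties count 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical measurements; the last treated value is inflated in the second copy.
control = [4.1, 4.8, 5.0, 5.3, 5.9]
treated = [5.5, 6.2, 6.4, 7.0, 7.3]
treated_outlier = [5.5, 6.2, 6.4, 7.0, 73.0]

# The rank-based statistic uses only the ordering, so the outlier does not change it.
u_clean = mann_whitney_u(treated, control)
u_outlier = mann_whitney_u(treated_outlier, control)

# The mean difference uses the actual magnitudes, so the outlier dominates it.
diff_clean = mean(treated) - mean(control)
diff_outlier = mean(treated_outlier) - mean(control)

print(u_clean, u_outlier)
print(round(diff_clean, 2), round(diff_outlier, 2))
```

The U statistic is identical in both cases because only the ordering of the values enters the calculation, which is exactly the information an ordered categorical test uses.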
_____________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah
School of Medicine, 2010.
Chapter 2-3 (revision 16 May 2010)
p. 1
Measurement Scale (also called level of measurement)
Nominal scale
name
unordered categories
e.g., cancer therapies: chemo, radiation, surgery
Ordinal scale
name + order
ordered categories
e.g., quality of life: lousy, okay, great
Interval scale
name + order + equal intervals + arbitrary zero point
continuous measurement with arbitrary zero
e.g., body temperature: 0°F does not imply
absence of temperature (although perhaps absence of
life). The 0 point is just a convention of the scale.
Ratios do not make sense--you would not say 101.8°F
is 1.05 times as hot as 97°F.
Ratio scale
name + order + equal intervals + absolute zero point
continuous measurement with absolute zero
e.g., hematocrit: 0% means no hematocrit, however
unlikely. Ratios make sense (at least arithmetically)–a
Hct of 48% is 1.2 times a Hct of 40%, although at
opposite ends of the normal range (so does not
necessarily equate to 1.2 times better health).
Dichotomous scale (a special case of the nominal scale, in that it always has just two
categories)
e.g., gender: male or female
Note 1: it makes sense to do arithmetic on interval scaled variables, since this scale is
sufficiently close to our notion of integers and real numbers (both number
systems have equal intervals). It does not make sense to do arithmetic on
nominal and ordinal scales, since these scales do not have equal intervals.
Note 2: for purposes of choosing test statistics, interval and ratio scales are considered
equivalent.
Note 3: although it is rarely claimed as such, a dichotomous scale could be considered an
interval scale, since it has order (although perhaps an arbitrary order), it has equal
intervals (one interval that is equal to itself), and one category can be selected
to represent 0.
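The difference between an arbitrary and an absolute zero can be checked numerically. The sketch below reuses the temperature and hematocrit values from the examples above; the function name is an assumption for illustration.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0

# Interval scale: the ratio depends on where the arbitrary zero sits,
# so it changes when the same temperatures are expressed in Celsius.
ratio_f = 101.8 / 97.0
ratio_c = fahrenheit_to_celsius(101.8) / fahrenheit_to_celsius(97.0)

# Ratio scale: rescaling the units (percent vs. fraction) keeps the zero fixed,
# so the ratio is unchanged.
ratio_pct = 48.0 / 40.0
ratio_frac = 0.48 / 0.40

print(round(ratio_f, 3), round(ratio_c, 3))
print(round(ratio_pct, 3), round(ratio_frac, 3))
```

Because the ratio of two temperatures changes with the unit, it carries no meaning, whereas the hematocrit ratio is the same however the measurement is expressed.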
A second measurement scale scheme is:
Binary data (dichotomous scale)
Unordered categorical data (nominal scale)
Ordered categorical data (ordinal scale)
Continuous data (interval & ratio scales)
Although the following two tables assume much more than we have covered so far, here is a
quick list of an appropriate statistical test for the most commonly encountered statistical
problems.
Most commonly used tests when you do NOT need to control for confounding (unadjusted
analysis, or univariable analysis)
[randomized clinical trial (RCT), or the Table 1 "Patient characteristics" table of an article (you
never discuss Table 1 statistics in a grant, since they do not test a study aim)].
| Level of measurement of outcome variable | Two independent groups | Three or more independent groups | Two correlated* samples | Three or more correlated samples |
|---|---|---|---|---|
| Dichotomous | chi-square or Fisher’s exact test | chi-square or Fisher’s exact test | McNemar test | mixed effects** logistic regression |
| Ordered categorical | Wilcoxon-Mann-Whitney (WMW) test | Old school***: Kruskal-Wallis analysis of variance (ANOVA). New school***: multiplicity adjusted WMW tests | Wilcoxon sign rank test for matched data | mixed effects ordinal logistic regression |
| Continuous | independent groups t test | Old school***: oneway ANOVA. New school***: multiplicity adjusted independent groups t tests | paired t test | mixed effects linear regression |
| Censored: time to event | log-rank test | Multiplicity adjusted log-rank test | Shared-frailty Cox regression | Shared-frailty Cox regression |
* Correlated case: repeated measurements on same person, or clustered data (e.g., patients nested
within clinics)
**Mixed effects regression models are synonymously called: multilevel models, hierarchical
models, longitudinal models
***Old school: it is commonly thought that ANOVA must precede the multiplicity adjusted
pairwise comparisons. This is not true, and doing so often causes you to lose significance [see
Chapter 2-8, page 23, section “Common Misconception of Thinking Analysis of Variance
(ANOVA) Must Precede Pairwise Comparisons”]. New school: just go straight to the multiplicity
adjusted pairwise comparisons if the pairwise comparisons are of interest, which is usually the case.
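The chapter does not say which multiplicity adjustment to apply to the pairwise comparisons. As one possible illustration, the sketch below assumes the Holm step-down procedure; the function name and the P values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted P values, in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest P value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease down the ordering.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical unadjusted P values for three pairwise comparisons:
# A vs B, A vs C, B vs C.
raw = [0.010, 0.040, 0.030]
adj = holm_adjust(raw)
print([round(p, 3) for p in adj])
```

Each adjusted P value is then compared with the usual 0.05 threshold, with no preliminary ANOVA or Kruskal-Wallis step required.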
Most commonly used tests when you DO need to control for confounding
(adjusted analysis, or multivariable analysis)
[always the case with observational studies, since randomization is not employed]
| Level of measurement of outcome variable | Two independent groups | Three or more independent groups | Two correlated* samples | Three or more correlated samples |
|---|---|---|---|---|
| Dichotomous | Logistic regression | Logistic regression & consider need for multiplicity adjustment | conditional logistic regression, or mixed effects logistic regression | mixed effects logistic regression |
| Ordered categorical | Ordinal logistic regression | Ordinal logistic regression & consider need for multiplicity adjustment | mixed effects ordinal logistic regression | mixed effects ordinal logistic regression |
| Continuous | Linear regression | Linear regression & consider need for multiplicity adjustment | mixed effects linear regression | mixed effects linear regression |
| Censored: time to event | Cox regression | Cox regression & consider need for multiplicity adjustment | Shared-frailty Cox regression | Shared-frailty Cox regression |
* Correlated case: repeated measurements on same person, or clustered data (e.g., patients nested
within clinics)
**Mixed effects regression models are synonymously called: multilevel models, hierarchical
models, longitudinal models
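The decision encoded in the two tables above (measurement scale, number of groups, independent or correlated, adjusted or not) can be written as a simple lookup. This is only a sketch: the dictionary names, the function name, and the choice to return the "new school" entry for three or more independent groups are assumptions made for illustration.

```python
# Columns in each tuple: two independent, three+ independent,
# two correlated, three+ correlated.
UNADJUSTED = {
    "dichotomous": ("chi-square or Fisher's exact test",
                    "chi-square or Fisher's exact test",
                    "McNemar test",
                    "mixed effects logistic regression"),
    "ordered categorical": ("Wilcoxon-Mann-Whitney (WMW) test",
                            "multiplicity adjusted WMW tests",
                            "Wilcoxon sign rank test for matched data",
                            "mixed effects ordinal logistic regression"),
    "continuous": ("independent groups t test",
                   "multiplicity adjusted independent groups t tests",
                   "paired t test",
                   "mixed effects linear regression"),
    "censored": ("log-rank test",
                 "multiplicity adjusted log-rank test",
                 "shared-frailty Cox regression",
                 "shared-frailty Cox regression"),
}

MULT = "; consider need for multiplicity adjustment"
ADJUSTED = {
    "dichotomous": ("logistic regression",
                    "logistic regression" + MULT,
                    "conditional logistic regression, or mixed effects logistic regression",
                    "mixed effects logistic regression"),
    "ordered categorical": ("ordinal logistic regression",
                            "ordinal logistic regression" + MULT,
                            "mixed effects ordinal logistic regression",
                            "mixed effects ordinal logistic regression"),
    "continuous": ("linear regression",
                   "linear regression" + MULT,
                   "mixed effects linear regression",
                   "mixed effects linear regression"),
    "censored": ("Cox regression",
                 "Cox regression" + MULT,
                 "shared-frailty Cox regression",
                 "shared-frailty Cox regression"),
}

def suggest_test(scale, n_groups, correlated=False, adjusted=False):
    """Look up the table entry for an outcome scale and study layout."""
    if n_groups < 2:
        raise ValueError("these tables cover comparisons of two or more groups")
    table = ADJUSTED if adjusted else UNADJUSTED
    column = (2 if correlated else 0) + (0 if n_groups == 2 else 1)
    return table[scale][column]

print(suggest_test("dichotomous", 2, correlated=True))
print(suggest_test("continuous", 3, adjusted=True))
```

The lookup only mirrors the tables; the larger list that follows still has to be consulted for single-sample problems, measures of association, and measures of agreement.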
The following pages provide a much larger list of tests.
Type of Statistical Problem
single sample (one group, frequently compared to a constant)
related sample comparisons: a variable is measured on the same person more than once
(e.g., pretest-posttest, baseline, time 1, time 2....)
(also called: paired samples, repeated measurements)
unrelated sample comparisons: groups being compared are different people
(e.g., treatment group vs control group)
(also called: independent samples)
measures of association (correlation statistics)
measures of agreement (e.g., inter-rater agreement)
regression models
power and sample size
The layout of the following pages closely follows the one found on the inside front cover of the
StatXact-4 Manual (1998). Siegel and Castellan (1988) also provide a similar digest for tests
covered in their text.
The following notation is used:
SX = StatXact-5® Statistical Software
SPSS = SPSS® 11.0 Statistical Software
SPow = SamplePower™ 2.0 Statistical Software
Stata = Stata 7.0 Statistical Software
C.I. = Confidence Interval
Type of Statistical Problem/Measurement Scale
One Sample Dichotomous Scale
Binomial Test (SX,SPSS,Stata: bitest, bitesti)
Binomial C.I. (SX,Stata: ci, propci, propcii)
Runs Test (SX,SPSS,Stata: runtest)
One Sample Nominal Scale
Chi-Square Goodness-of-Fit Test (SX,SPSS)
Multinomial C.I. (SX)
Test of Homogeneity of Poisson Rates (SX)
Poisson C.I. (SX,Stata: ci, cii)
One Sample Ordinal Scale
Chi-Square Goodness-of-Fit Test (SX)
One Sample Interval Scale
Tests for Location (average)
T Test (SPSS,Stata: ttest, ttesti)
C.I. for Mean (SPSS, Stata: ci, cii, ttest)
Tests for Scale (variance)
Chi-Square test SD = # (Stata: sdtest, sdtesti)
Tests for Goodness-of-Fit (e.g., Normality Test)
Shapiro-Wilk Test for normality (SX,SPSS,Stata: swilk)
Shapiro-Francia test for normality (Stata: sfrancia)
Kolmogorov Test (SX,SPSS,Stata: ksmirnov)
Lilliefors Test (SX,SPSS)
Runs Test (with cut-off) (SX,Stata: runtest)
Two Related Samples Dichotomous Scale
McNemar’s Test (SX,SPSS,Stata: mcc,mcci)
C.I. for proportion difference (Stata: mcc,mcci)
Sign Test (SX,SPSS,Stata: signtest)
Odds Ratio (Stata: mcc,mcci)
C.I. for Odds Ratio (Stata: mcc,mcci)
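For readers unfamiliar with McNemar's test, the sketch below shows the exact version, which uses only the discordant pairs and refers the smaller count to a binomial distribution with probability 1/2. The function name and the counts are hypothetical, and doubling the smaller tail is one common two-sided convention assumed here.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 1/2) probability of k or fewer in the smaller discordant cell.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail and cap at 1.
    return min(1.0, 2.0 * tail)

# Hypothetical paired data: 8 pairs changed in one direction, 2 in the other.
p_value = mcnemar_exact(8, 2)
print(round(p_value, 4))
```

The concordant pairs never enter the calculation, which is why the test suits paired dichotomous data and the ordinary chi-square test does not.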
Two Related Samples Nominal Scale
Marginal Homogeneity Test (Stata: symmetry,symmi)
(this test is an extension of the McNemar test to more than 2
categories—SX version 5 assumes an ordinal scale, and so is not
appropriate for the nominal scale case)
Two Related Samples Ordinal Scale
Wilcoxon Matched-Pairs Signed-Ranks Test (SX,SPSS,Stata: signrank)
Sign Test (SX,SPSS,Stata: signtest)
Marginal Homogeneity Test (SX,Stata: symmetry,symmi)
(this test is an extension of the McNemar test to more than 2 categories)
Two Related Samples Interval Scale
Paired T Test (SPSS,Stata: ttest)
C.I. for Mean Difference (SPSS,Stata: ttest)
Fisher-Putman Permutation Test for Paired Replications (Stata: permtest1)
Permutation Tests with General Scores (SX)
Hodges-Lehmann C.I. for Shift (SX,Stata: npshift)
Two Unrelated Samples Dichotomous Scale
Pearson’s Chi-Square Test (SX,SPSS,Stata: tabulate,tab2,tabi)
Fisher’s Exact Test (SX,SPSS,Stata: tabulate,tab2,tabi)
Likelihood Ratio Test (SX,SPSS,Stata: tabulate,tab2,tabi)
Odds Ratio (SX,SPSS,Stata: cc, cci)
C.I. for Odds Ratio (SX,SPSS,Stata: cc, cci)
Barnard’s Test (SX)
Risk Difference (SX,Stata: cs, csi)
Risk Ratio (SX,Stata: cs, csi)
C.I. for P2 - P1 or Risk Difference (SX,Stata: cs, csi)
C.I. for P2 / P1 or Risk Ratio (SX,Stata: cs, csi)
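The odds ratio, risk ratio, and risk difference listed above are all computed from the same 2 x 2 table. The sketch below shows the point estimates only (no confidence intervals); the function name, cell labels, and counts are assumptions for illustration.

```python
def two_by_two_measures(a, b, c, d):
    """Point estimates from a 2 x 2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "odds_ratio": (a * d) / (b * c),
        "risk_ratio": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
    }

# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
measures = two_by_two_measures(20, 80, 10, 90)
for name, value in measures.items():
    print(name, round(value, 3))
```

Note that the odds ratio and the risk ratio differ even on the same table, so it matters which measure is reported.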
Two Unrelated Samples Nominal Scale
Pearson’s Chi-Square Test (SX,SPSS,Stata: tabulate)
Fisher-Freeman-Halton Test (SX,SPSS,Stata: tabulate)
(called Fisher’s Exact Test in Stata)
Likelihood Ratio Test (SX,SPSS,Stata: tabulate)
Poisson Samples
Test of Homogeneity of Poisson Rates (SX)
C.I. for Common Poisson Ratio (SX)
Two Unrelated Samples Ordinal Scale
Tests for Location (average)
Wilcoxon-Mann-Whitney Test (SX,SPSS,Stata: ranksum)
Median Test (Stata: median)
Normal Scores Test (SX)
Savage Scores Test (SX)
Permutation Tests with General Scores (SX)
Permutation Tests with MERT Scores (SX)
C.I. for Common Odds Ratio (SX,Stata: mhodds)
Tests for Scale (variance)
Siegel-Tukey Test (SX)
Mood Test (SX)
Ansari-Bradley Test (SX)
Klotz Test (SX)
Conover Test (SX)
Omnibus Tests (shape and location)
Kolmogorov-Smirnov Test (SX,SPSS,Stata: ksmirnov)
Wald-Wolfowitz Runs Test (SX,SPSS)
Two Unrelated Samples Interval Scale
Tests for Location (average)
T Test (SPSS,Stata: ttest, ttesti)
C.I. for Difference Between Means (SPSS,Stata: ttest)
Fisher-Putman Permutation Test for Two Independent Samples
(Stata: permtest2)
Hodges-Lehmann C.I. for Shift (SX,Stata: npshift)
Tests for Scale (variance)
Levene’s test for equality of variances (SPSS,Stata: robvar)
F Test for equality of variances (SPSS,Stata: sdtest, sdtesti)
Moses Rank-Like Test (SPSS)
Tests for Censored Survival Data
Logrank Test (SX,SPSS,Stata: ltable, sts)
also called Mantel-Cox Test
Wilcoxon-Gehan Test (SX,Stata: sts)
Breslow Test (SPSS)
also called Breslow Generalized Wilcoxon Test
Tarone-Ware Test (SPSS,Stata: sts)
Peto and Peto’s Generalized Wilcoxon Test (Stata: sts)
Two Unrelated Samples: Stratified Analysis Dichotomous Scale
Test of Homogeneity of Odds-Ratios (SX,SPSS,Stata: mhodds)
Test for Common Odds-Ratio (SX,SPSS,Stata: mhodds,tabodds)
C.I. for Common Odds-Ratio (SX,SPSS,Stata: mhodds,tabodds)
Two Unrelated Samples: Stratified Analysis Nominal Scale
Poisson Samples
Test of Homogeneity of Poisson Rate Ratio (SX)
C.I. for Common Poisson Rate Ratio (SX)
Two Unrelated Samples: Stratified Analysis Ordinal Scale
Wilcoxon Rank Sum Test (SX)
Savage Scores Test (SX)
Permutation Tests with General Scores (SX)
Permutation Tests with MERT Scores (SX)
Permutation Tests with Stratum-Specific Scores (SX)
Score Test for Trend in Odds (Stata: tabodds)
C.I. for Common Odds Ratio (SX,Stata: mhodds,tabodds)
Two Unrelated Samples: Stratified Analysis Interval Scale
Tests for Censored Survival Data
Logrank Test (SX,SPSS,Stata: sts)
also called Mantel-Cox Test
Breslow Test (SPSS)
also called Breslow Generalized Wilcoxon Test
Wilcoxon-Gehan Test (SX,Stata: sts)
Tarone-Ware Test (SPSS,Stata: sts)
K Related Samples: Unordered Treatments Dichotomous Scale
Cochran Q Test (SX,SPSS)
K Related Samples: Unordered Treatments Nominal Scale
K Related Samples: Unordered Treatments Ordinal Scale
Friedman Test (SX,SPSS,Stata: friedman)
Quade Test (SX)
K Related Samples: Unordered Treatments Interval Scale
K Related Samples: Ordered Treatments Dichotomous Scale
K Related Samples: Ordered Treatments Nominal Scale
K Related Samples: Ordered Treatments Ordinal Scale
Page Test (SX)
K Related Samples: Ordered Treatments Interval Scale
Repeated Measures Analysis of Variance (SPSS,Stata: anova)
K Unrelated Samples: Unordered Treatments Dichotomous Scale
Pearson Chi-Square Test (SX,SPSS,Stata: tabulate)
Fisher-Freeman-Halton Test (SX, SPSS Exact Tests module,
Stata: tabulate) (called Fisher’s Exact Test in Stata)
Likelihood Ratio Test (SX,SPSS,Stata: tabulate)
K Unrelated Samples: Unordered Treatments Nominal Scale
Pearson Chi-Square Test (SX,SPSS,Stata: tabulate)
Fisher-Freeman-Halton Test (SX, SPSS Exact Tests module,
Stata: tabulate) (called Fisher’s Exact Test in Stata)
Likelihood Ratio Test (SX,SPSS,Stata: tabulate)
K Unrelated Samples: Unordered Treatments Ordinal Scale
Kruskal-Wallis Test (SX,SPSS,Stata: kwallis)
Median Test (SX,SPSS,Stata: median)
Normal Scores Test (SX)
Savage Scores Test (SX)
One-Way ANOVA with General Scores (SX)
K Unrelated Samples: Unordered Treatments Interval Scale
Tests for Location (average)
One-way Analysis of Variance (SPSS,Stata: oneway, anova)
Post-Hoc Multiple Comparison Procedures (SPSS,Stata: oneway)
Tests for Censored Survival Data
Logrank Test (SX,SPSS,Stata: sts)
also called Mantel-Cox Test
Breslow Test (SPSS)
also called Breslow Generalized Wilcoxon Test
Wilcoxon-Gehan Test (SX,Stata: sts)
Tarone-Ware Test (SPSS,Stata: sts)
K Unrelated Samples: Ordered Treatments Dichotomous Scale
Cochran-Armitage Trend Test (with stratification) (SX)
C.I. for Common Odds Ratio (SX,Stata: mhodds,tabodds)
K Unrelated Samples: Ordered Treatments Nominal Scale
K Unrelated Samples: Ordered Treatments Ordinal Scale
Linear by Linear Association Test (SX,SPSS)
also called Mantel-Haenszel Chi-Square
Jonckheere-Terpstra Test (SX,SPSS)
K Unrelated Samples: Ordered Treatments Interval Scale
Tests for Censored Survival Data
Tarone-Ware Trend Test for Censored Survival Data (SX,Stata: sts)
Measures of Association Dichotomous Scale
odds ratio (SX,SPSS,Stata: cc, cci )
phi (SPSS, Stata: phi)
Contingency Coefficients (SX)
Goodman Kruskal Tau (SX)
Uncertainty Coefficients (SX)
Measures of Association Nominal Scale
Contingency Coefficients (SX)
Goodman Kruskal Tau (SX,Stata: tabulate,tab2,tabi)
Uncertainty Coefficients (SX)
Measures of Association Ordinal Scale
Spearman’s Rank-Order Correlation Coefficient (SX,SPSS,
Stata: spearman)
Kendall’s Tau Coefficient (SX,SPSS,Stata: tabulate,tab2,tabi,spearman)
Somers’ D Coefficient (SX,Stata: somersd)
Goodman-Kruskal Gamma Coefficient (SX,Stata: tabulate,tab2,tabi)
Kendall’s Coefficient of Concordance (SX,Stata: friedman)
Measures of Association Interval Scale
Pearson’s Product-Moment Correlation Coefficient (SX,SPSS,Stata:
pwcorr, correlate)
Measures of Agreement Dichotomous Scale
Cohen’s Kappa (SX,SPSS,Stata: kappa)
Measures of Agreement Nominal Scale
Cohen’s Kappa (SX,SPSS,Stata: kappa)
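Cohen's kappa corrects the observed proportion of agreement for the agreement expected by chance from each rater's marginal frequencies. A minimal sketch of the unweighted statistic for two raters follows; the function name and the ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters rating the same subjects."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(rater1)
    # Observed proportion of subjects on which the raters agree.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from the product of the raters' marginal proportions.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    expected = sum(counts1[cat] * counts2[cat] for cat in counts1) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical yes/no ratings of ten subjects by two raters.
r1 = ["y", "y", "y", "y", "n", "n", "n", "n", "y", "n"]
r2 = ["y", "y", "y", "n", "n", "n", "n", "y", "y", "n"]
kappa = cohens_kappa(r1, r2)
print(round(kappa, 3))
```

The same calculation applies to nominal ratings with more than two categories; ordered categories call for the weighted kappa listed under the ordinal scale.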
Measures of Agreement Ordinal Scale
Weighted Kappa (SX,Stata: kapwgt)
Kendall (SPSS)
Measures of Agreement Interval Scale
Intra-class Correlation Coefficient (SPSS)
Cronbach's alpha (SPSS,Stata: alpha)
Regression Models Dichotomous Dependent Variable
Logistic Regression (SPSS,Stata: logistic)
Conditional Logistic Regression (matched pairs) (Stata: clogit)
Probit Analysis (SPSS,Stata: probit,biprobit,hetprob,heckprob,glm)
Discriminant Analysis (SPSS,Stata: discrim)
Monotonic Regression (SPSS GOLDMineR module)
Regression Models Nominal Dependent Variable
Loglinear Analysis (SPSS,Stata: loglin,ipf)
Logit Loglinear Analysis (SPSS,Stata: glogit,nlogit,xtlogit,ologit,scobit)
Discriminant Analysis (SPSS,Stata: discrim)
Poisson Regression (Stata: poisson,xtpois)
Regression Models Ordinal Dependent Variable
Ordinal Regression (SPSS, Stata: ologit,oprobit)
Monotonic Regression (SPSS GOLDMineR module)
Regression Models Interval Dependent Variable
Linear Regression (SPSS,Stata: regress)
Analysis of Variance (SPSS,Stata: anova)
Nonlinear Regression (SPSS,Stata: nl)
Factor Analysis (SPSS,Stata: factor)
Cluster Analysis (SPSS,Stata: cluster,cldis,clavg,clcomp,clkmeans,
clkmed,clsing)
Multidimensional Scaling (SPSS)
Time Series Analysis (SPSS,Stata: arima,xt)
Variance Components Analysis (SPSS)
Censored Survival Data
Kaplan-Meier Survival Curves (SPSS,Stata: ltable,sts)
Cox Regression (SPSS,Stata: cox,stcox)
Life Tables (SPSS,Stata: ltable)
Power & Sample Size Dichotomous Scale
Fisher’s Exact Test (SX,SPow)
Pearson’s Chi-Square Test (SX,SPow, Stata: sampsi)
Likelihood Ratio Test (SX)
Equivalence Test (SX)
Binomial Test (SPow)
McNemar Test (SPow)
Sign Test (SPow)
Logistic Regression (SPow)
Survival Analysis (SPow)
Cohen’s Kappa Inter-rater agreement (Stata: sskdlg)
Power & Sample Size Nominal Scale
Pearson’s Chi-Square Test (SPow)
Power & Sample Size Ordinal Scale
Wilcoxon-Mann-Whitney Test (SX)
Trend Tests on K Binomial Samples (SX)
Linear Rank Tests on Two Multinomial Samples (SX)
Power & Sample Size Interval Scale
One Sample T Test (SPow,Stata: sampsi)
Paired Sample T Test (SPow,Stata: sampsi)
Two Sample T Test (SPow,Stata: sampsi)
One Sample Pearson Correlation = # (SPow)
Two Sample Equality of Pearson Correlations (SPow)
One-Way Analysis of Variance/Covariance (SPow)
K-Way (Factorial) Analysis of Variance/Covariance (SPow)
Linear Regression (SPow)
Nonlinear Regression (SPow)
Equivalence Study
Two Sample T Test (SPow)
References
Siegel S, Castellan NJ. (1988). Nonparametric Statistics for the Behavioral Sciences. 2nd ed.
New York, McGraw-Hill.
StatXact 5 for Windows: Statistical Software for Exact Nonparametric Inference User
Manual. (2001). Cambridge MA, Cytel Software Corporation.