Group Comparisons Part 3: Nonparametric Tests, Chi-Squares and Fisher's Exact

Robert Boudreau, PhD
Co-Director of Methodology Core, PITT-Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases
Core Director for Biostatistics, Center for Aging and Population Health
Dept. of Epidemiology, GSPH

Flow chart for group comparisons
Measurements to be compared:
  Continuous
    Distribution approx. normal or N ≥ 20?
      Yes => t-tests
      No  => non-parametrics
  Discrete (binary, nominal, ordinal with few values)
    => chi-square or Fisher's exact

Example: A physiologic index of comorbidity: relationship to mortality and disability.
Anne B. Newman, MD, MPH, Robert M. Boudreau, PhD, Barbara L. Naydeck, MPH, Linda F. Fried, MD, MPH and Tamara B. Harris, MD, MS. J Gerontol Med Sci. 2008.

N = 2928 elderly participants in a longitudinal cohort study
5 physiologic system measures:
  - Cystatin C
  - Internal carotid artery (ICA) wall thickness
  - Pulmonary: forced vital capacity (FVC)
  - Fasting glucose
  - White matter grade
Each measure is scored on a 0-2 scale (0 = healthiest, 2 = worst) using tertiles or clinical cutpoints (e.g. glucose < 100, 100-126, 126+).
Physiologic Index = sum of the five scores (range 0 to 10)
* Mortality rates are based on 9 years of follow-up.

Comparisons using 2-sample independent t-tests? [table with √ marking the continuous variables]
Comparisons using chi-square? [table with √ marking the categorical variables]

Pooled or unequal-variance 2-sample t-test?
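Both versions test the same difference in means; they differ in how the standard error and the degrees of freedom are estimated. Below is a minimal Python sketch working from summary statistics (means, SDs, group sizes). `two_sample_t` is a hypothetical helper, the means and SDs in the usage line are made up (only the group sizes 1237 and 1691 come from the Newman example), and turning t and df into a p-value would need a t distribution from a stats library, which is not shown.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and df for the pooled and the unequal-variance (Satterthwaite) t-tests."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    diff = mean2 - mean1

    # Pooled: assume one common variance; df = (n1 - 1) + (n2 - 1).
    df_pooled = (n1 - 1) + (n2 - 1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    # Unequal variances: each group keeps its own variance in the standard error.
    a, b = v1 / n1, v2 / n2
    t_unequal = diff / math.sqrt(a + b)
    # Satterthwaite df: usually not a whole number; lies between min(n1, n2) - 1 and n1 + n2 - 2.
    df_unequal = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

    return {"pooled": (t_pooled, df_pooled), "unequal": (t_unequal, df_unequal)}


# Hypothetical means and SDs; only the group sizes come from the Newman example.
result = two_sample_t(0.0, 1.0, 1237, 0.1, 1.2, 1691)
print(result["pooled"][1])  # 2926
```

The pooled df depends only on the group sizes, whereas the Satterthwaite df shrinks toward the smaller group's n - 1 when that group's variance term dominates.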
Pooled: df = (1237 - 1) + (1691 - 1) = 2926
Unequal variances (Satterthwaite): approximate df computed from both sample variances

2-Sample T-test vs. Non-parametric Wilcoxon Rank-Sum Test

Example: Three-dimensional and thermal surface imaging produces reliable measures of joint shape and temperature: a potential tool for quantifying arthritis.
Steven J Spalding, C Kent Kwoh, Robert Boudreau, Joseph Enama, Julie Lunich, Daniel Huber, Louis Denes and Raphael Hirsch. Arthritis Research & Therapy 2008.

We will focus on the HDI: Heat Distribution Index = SD of the temperatures within a standard, reproducibly defined region.
[Figure: HDI of MCPs, RA vs. controls, showing the MCP region]

HDI (Heat Distribution Index) of MCPs: 10 adult controls vs. 9 adults with active RA
T-test (2-sample independent) vs. Wilcoxon rank-sum (aka Mann-Whitney)

          Control (n=10)   Arthritis (n=9)
            1.2              1.4
            1.1              2.4
            1.0              2.3
            1.2              2.1
            0.6              3.0
            0.5              1.1
            1.0              1.4
            1.0              1.3
            1.3              1.1
            1.2
Mean        1.01             1.79
SD          0.26             0.68
Median      1.05             1.40

T-test (2-sample independent): "pooled" df = 10 + 9 - 2 = 17

T-Tests
Variable  Method         Variances  DF    t Value  Pr > |t|
HDI       Pooled         Equal      17    3.36     0.0037
HDI       Satterthwaite  Unequal    10.2  3.23     0.0089

Test for Equality of Variances
Variable  Method    Num DF  Den DF  F Value  Pr > F
HDI       Folded F  8       9       6.60     0.0105

The test of equality of variances is rejected (p = 0.0105) => use the unequal-variance t-test (Satterthwaite).

Wilcoxon rank-sum (aka Mann-Whitney)
The idea/motivation:
  - The method should work for any distribution (non-parametric).
  - Base the statistical test on ranks: rank = position when all the data are sorted from lowest to highest; each group then gets a "rank sum".
  - It won't be affected by outliers.
  - Like all statistical tests, the p-value is based on the distribution of the test statistic (here, the difference in rank sums) assuming there is no difference between the groups.
  - Under "no difference", this is just like shuffling a deck of cards with only two colors (one color per group; this works even if the n's differ): every arrangement of the group labels over the sorted ranks is equally likely.
  - The critical values are the most "extreme" differences in rank sums between the two groups (α = 0.05 => the most extreme 5% of differences).

Sorted, then assigned ranks:
Obs  Group      HDI  HDI_rank
1    Control    0.5   1.0
2    Control    0.6   2.0
3    Control    1.0   4.0
4    Control    1.0   4.0
5    Control    1.0   4.0
6    Control    1.1   7.0
7    Arthritis  1.1   7.0
8    Arthritis  1.1   7.0
9    Control    1.2  10.0
10   Control    1.2  10.0
11   Control    1.2  10.0
12   Control    1.3  12.5
13   Arthritis  1.3  12.5
14   Arthritis  1.4  14.5
15   Arthritis  1.4  14.5
16   Arthritis  2.1  16.0
17   Arthritis  2.3  17.0
18   Arthritis  2.4  18.0
19   Arthritis  3.0  19.0
Tied values get the average of the ranks they occupy (e.g. the two 1.3's occupy ranks 12 and 13, so each gets 12.5).

Wilcoxon Scores (Rank Sums) for Variable HDI Classified by Variable Group
Group      N   Sum of Scores  Expected Under H0  Std Dev Under H0
Control    10   64.50         100.0              12.172013
Arthritis   9  125.50          90.0              12.172013
Average scores were used for ties.
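The rank-sum table, and the normal-approximation and exact p-values in the Wilcoxon two-sample output, can be approximated with a short Python sketch. `midranks` and `wilcoxon_rank_sum` are hypothetical helper names. The tie-corrected variance, the 0.5 continuity correction and brute-force enumeration for the exact test are standard textbook choices assumed here; enumeration is only practical for small samples like this one.

```python
import math
from collections import Counter
from itertools import combinations


def midranks(values):
    """Rank values from lowest (1) to highest (N); ties get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Rank-sum test of group y vs group x: normal approximation plus exact permutation p."""
    pooled = list(x) + list(y)
    n1, n2, N = len(x), len(y), len(pooled)
    ranks = midranks(pooled)
    s = sum(ranks[n1:])                      # rank sum of group y
    mean = n2 * (N + 1) / 2                  # expected rank sum under H0
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    sd = math.sqrt(n1 * n2 / 12 * ((N + 1) - ties / (N * (N - 1))))
    # Normal approximation with a 0.5 continuity correction toward the mean.
    diff = s - mean
    z = math.copysign(max(abs(diff) - 0.5, 0.0), diff) / sd
    p_normal = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    # Exact test: under H0 every choice of n2 ranks for group y is equally likely
    # (the two-color card shuffle), so count the choices at least as far from the mean.
    # Here that is C(19, 9) = 92,378 choices.
    n_extreme = total = 0
    for idx in combinations(range(N), n2):
        total += 1
        if abs(sum(ranks[k] for k in idx) - mean) >= abs(diff):
            n_extreme += 1
    p_exact = n_extreme / total  # two-sided
    return {"S": s, "mean": mean, "sd": sd, "z": z, "p_normal": p_normal, "p_exact": p_exact}


control = [1.2, 1.1, 1.0, 1.2, 0.6, 0.5, 1.0, 1.0, 1.3, 1.2]
arthritis = [1.4, 2.4, 2.3, 2.1, 3.0, 1.1, 1.4, 1.3, 1.1]
print(wilcoxon_rank_sum(control, arthritis))
```

On the HDI values this gives S = 125.5, an expected value of 90.0 and an SD of about 12.172 for the arthritis group, as in the rank-sum table, and z ≈ 2.875 with a two-sided normal p ≈ 0.004. Ranks are multiples of 0.5, so the rank sums are exact in floating point and the "at least as extreme" comparison needs no tolerance.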
Mean score (sum of scores / N): Control 6.45, Arthritis 13.94

Wilcoxon Two-Sample Test
Statistic (S)                      125.5000
Normal approximation
  Z                                2.8754
  One-sided Pr > Z                 0.0020
  Two-sided Pr > |Z|               0.0040
t approximation
  One-sided Pr > Z                 0.0050
  Two-sided Pr > |Z|               0.0101
Exact test
  One-sided Pr >= S                0.0012
  Two-sided Pr >= |S - Mean|       0.0023
Z includes a continuity correction of 0.5.

Comparing Groups on the Percentage Falling into Categories
Example: treatment for RA, comparing MTX vs. MTX + ETN
Outcomes (at 3 months):
  Dichotomous, e.g.
    - % in remission
    - % with a DAS28 drop > 1.2 points
  Multiple categories:
    - ACR 20/50/70: % of patients reaching each level (sums to 100%)

Comparing Groups on the Percentage Falling into Categories
Rule of thumb:
  [1] All cell sizes (expected counts) ≥ 5 => use chi-square
  [2] Any cell size (expected count) < 5 => use Fisher's exact
Reason: criterion [1] is a condition for the Central Limit Theorem (the normal approximation behind chi-square) to hold with good accuracy, so the p-values are accurate.

Comparing Groups on the Percentage Falling into Categories
Example: Sharma L, et al. Quadriceps Strength and OA Progression in Malaligned and Lax Knees. Ann Intern Med. 2003.
Inclusions:
  - Kellgren-Lawrence (KL) grade ≥ 2
  - At least a little difficulty (Likert category) on at least two items of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) physical function scale
Exclusions: corticosteroid injection < 3 months, avascular necrosis, rheumatoid or other inflammatory arthritis, periarticular fracture, Paget disease, villonodular synovitis, … (etc.)
Comparing Groups on the Percentage Falling into Categories
JSN progression by quadriceps strength

More neutral alignment (< 5 degrees)
                            JSN progression
                            No            Yes           # Knees
Low quadriceps strength     111 (88.8%)   14 (11.2%)    125
High quadriceps strength    111 (88.8%)   14 (11.2%)    125

Malalignment (≥ 5 degrees)
                            JSN progression
                            No            Yes           # Knees
Low quadriceps strength     28 (73.7%)    10 (26.3%)    38
High quadriceps strength    20 (50.0%)    20 (50.0%)    40

Comparing Groups on the Percentage Falling into Categories
Malalignment (≥ 5 degrees)
                            No            Yes           # Knees (row %)
Low quadriceps strength     28 (73.7%)    10 (26.3%)    38 (48.7%)
High quadriceps strength    20 (50.0%)    20 (50.0%)    40 (51.3%)
Column totals               48 (61.5%)    30 (38.5%)    Total = 78

Chi-square statistic: $\chi^2 = \sum_{i,j} (n_{ij} - e_{ij})^2 / e_{ij}$, with df = (rows - 1) x (cols - 1)
  n_ij = observed (actual) cell count
  e_ij = (row %) x (col %) x (total # knees) = (# knees in row) x (col %)
       = expected cell count if the groups were the "same"
  (e_ij effectively applies the "pooled" average JSN progression rate to both groups)

Cells are: # observed (# expected)
                            No            Yes           Row %
Low quadriceps strength     28 (23.4)     10 (14.6)     38 (48.7%)
High quadriceps strength    20 (24.6)     20 (15.4)     40 (51.3%)
Column %                    61.5%         38.5%         Total = 78

High quadriceps strength: expected # Yes = 0.513 x 0.385 x 78 = 0.1975 x 78 = 0.385 x 40 knees = 15.4

Chi-square = 4.6184, df = (2 - 1) x (2 - 1) = 1, p = 0.0316
Fisher's exact: p = 0.0383

Cells are: observed # (alternative #)
Fisher's exact considers all alternative tables that keep the same row and column totals.
                            No            Yes           # Knees
Low quadriceps strength     28 (29)       10 (9)        38
High quadriceps strength    20 (19)       20 (21)       40
Column totals               48 (61.5%)    30 (38.5%)    Total = 78

Fisher's exact p-value is the total hypergeometric probability of all tables (with these row and column totals) that are at least as "extreme" as the observed table. The alternative table above is more "extreme" than the observed one.
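Both p-values for the malaligned knees can be reproduced with a minimal pure-Python sketch. The helper names are hypothetical. The chi-square is Pearson's statistic without a continuity correction, and the two-sided Fisher's exact test treats "at least as extreme" as "no more probable than the observed table", a common convention assumed here. `compare_proportions` applies the rule of thumb to the expected counts, which is also an assumption about how "cell size" is read.

```python
import math


def expected_counts(table):
    """e_ij = (row total) x (column total) / N: the pooled column % applied to each row."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and its p-value with df = 1."""
    exp = expected_counts(table)
    x2 = sum((o - e) ** 2 / e
             for orow, erow in zip(table, exp) for o, e in zip(orow, erow))
    p = math.erfc(math.sqrt(x2 / 2))  # chi-square with 1 df is a squared standard normal
    return x2, p


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c

    def weight(x):
        # Unnormalized hypergeometric probability of the table with top-left cell x
        # (row and column totals held fixed).
        return math.comb(r1, x) * math.comb(r2, c1 - x)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    observed = weight(a)
    # Exact integer comparison: keep every table no more probable than the observed one.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / math.comb(r1 + r2, c1)


def compare_proportions(table):
    """Rule of thumb: chi-square if every expected count is >= 5, else Fisher's exact."""
    if min(min(row) for row in expected_counts(table)) >= 5:
        return "chi-square", chi_square_2x2(table)[1]
    return "Fisher's exact", fisher_exact_2x2(table)


malaligned = [[28, 10], [20, 20]]  # rows: low / high quadriceps strength; cols: No / Yes
print(chi_square_2x2(malaligned))
print(fisher_exact_2x2(malaligned))
print(compare_proportions(malaligned))
```

For the malaligned knees the sketch gives a chi-square of about 4.618 (p ≈ 0.032) and a Fisher's exact p ≈ 0.038, in line with the values reported above. Because every expected count (23.4, 14.6, 24.6, 15.4) is at least 5, `compare_proportions` picks chi-square.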