Statistics for the Terrified
Paul F. Cook, PhD
Center for Nursing Research

What Good are Statistics?
• “How big?” (“how much?”, “how many?”)
  – Descriptive statistics, including effect sizes
  – Describe a population based on a sample
  – Help you make predictions
• “How likely?”
  – Inferential statistics
  – Tell you whether a finding is reliable, or probably just due to chance (sampling error)

Answering the 2 Questions
• Inferential statistics tell you “how likely”
  – Can’t tell you how big
  – Can’t tell you how important
  – “Success” is based on a double negative
• Descriptive statistics tell you “how big”
  – Cohen’s d = (x̄1 – x̄2) / SDpooled
  – Pearson r (or other correlation coefficient)
  – Odds ratio

How Big is “Big”?
• Correlations
  – 0 = no relationship, 1 = upper limit
  – + = positive effect, – = negative effect
  – .3 for small, .5 for medium, .7 for large
  – r² = “percent of variability accounted for”
• Cohen’s d
  – How many SDs apart are the means? 0 = no effect
  – .5 for small, .75 for medium, 1.0 for large
• Odds ratio
  – 1 = no relationship, <1 = negative effect, >1 = positive effect
• All effect size statistics are interchangeable!

How Likely is “Likely”? – Test Statistics
• A ratio of “signal” vs. “noise”:

  z = (x̄1 – x̄2) / √(s1²/n1 + s2²/n2)

• “Signal” (the numerator): AKA “between-groups variability” or “model”
• “Noise” (the denominator): AKA “within-groups variability” or “error”

How do We Get the p-value?
• From the standard normal distribution: |z| > 1.96 is the critical value for p < .05 (half above, half below: always use a 2-tailed test unless you have reason not to)
[Figure: standard normal curve with 2.5% in each tail beyond ±1.96 SD]

Hypothesis Testing – 5 Steps
1. State null and alternative hypotheses
2. Calculate a test statistic
3. Find the corresponding p-value
4. “Reject” or “fail to reject” the null hypothesis (your only 2 choices)
5. Draw substantive conclusions
Red = statistics, blue = logic, black = theory

How Are the Questions Related?
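Both questions can be answered from the same two samples. A minimal Python sketch (assuming NumPy and SciPy are available; the function names are illustrative, not from the lecture) computes “how big” with Cohen’s d and “how likely” with a two-sample z-test:

```python
import numpy as np
from scipy import stats

def cohens_d(x1, x2):
    """'How big': difference in means, divided by the pooled SD."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    sd_pooled = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (np.mean(x1) - np.mean(x2)) / sd_pooled

def z_test(x1, x2):
    """'How likely': signal (mean difference) over noise (standard error)."""
    n1, n2 = len(x1), len(x2)
    se = np.sqrt(np.var(x1, ddof=1) / n1 + np.var(x2, ddof=1) / n2)
    z = (np.mean(x1) - np.mean(x2)) / se
    p = 2 * stats.norm.sf(abs(z))  # two-tailed: 2.5% in each tail
    return z, p

# Hypothetical scores for two groups
treatment = [5, 6, 7, 8, 9]
control = [1, 2, 3, 4, 5]
print(cohens_d(treatment, control))  # d is about 2.53
print(z_test(treatment, control))    # z = 4.0, p well below .05
```

Because z has √n in its denominator while d does not, rerunning the sketch with larger samples of the same scores grows z (and shrinks p) without changing d.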
• “Large” z = a large effect (d) and a low p
• But z depends on sample size; d does not
  – Every test statistic is the product of an effect size and the sample size
  – Example: χ² = φ² × N
• A significant result (power) depends on:
  – What alpha level (α) you choose
  – How large an effect (d) there is to find
  – What sample size (n) is available

What Type of Test?
• N-level (nominal) predictor, 2 groups: t-test or z-test
• N-level (nominal) predictor, 3+ groups: ANOVA (F-test)
• I/R-level (interval/ratio) predictor: correlation/regression
• N-level dependent variable: χ² or logistic regression
• Correlation answers the “how big” question, but can convert to a t-test value to also answer the “how likely” question

The F test
• ANOVA = “analysis of variance”
• Compares variability between groups to variability within groups:

  F = MSb / MSw = (avg. difference among means) / (avg. variability within each group)

• Signal vs. noise again

Omnibus and Post Hoc Tests
• The F-test compares 3+ groups at once
• Benefit: avoids “capitalizing on chance”
• Drawback: can’t see individual differences
• Solution: post hoc tests
  – Bonferroni correction for 1-3 comparisons (uses an “adjusted alpha” of .025 or .01)
  – Tukey test for 4+ comparisons

F and Correlation (eta-squared)
The F-table:

  Source  | SS                    | df                    | MS        | F         | p
  Between | SSb                   | dfb                   | SSb / dfb | MSb / MSw | .05
  Within  | SSw                   | dfw                   | SSw / dfw |           |
  Total   | SStotal (= SSb + SSw) | dftotal (= dfb + dfw) |           |           |

eta² = SSb / SStotal = % of total variability that is due to the IV (i.e., R-squared)

Correlation Seen on a Graph
[Figure: scatterplots showing weak, moderate, and strong correlations, all in the same direction]

Regression and the F-test
• The line of best fit minimizes the sum of squared residuals
• Error variance (residual) = actual value – predicted value
• F = (avg. SSmodel variance) / (avg. SSerror variance): model variance (predicted) vs. error variance (residual)
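The F-table arithmetic above can be sketched in Python (assuming NumPy and SciPy; `anova_eta_squared` is an illustrative name, not a library function):

```python
import numpy as np
from scipy import stats

def anova_eta_squared(*groups):
    """One-way ANOVA: F answers 'how likely', eta-squared answers 'how big'."""
    scores = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand_mean = scores.mean()
    # SSb: between-groups variability (the "model" / signal)
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    # SStotal: all variability around the grand mean (= SSb + SSw)
    ss_total = ((scores - grand_mean) ** 2).sum()
    f, p = stats.f_oneway(*groups)       # F = MSb / MSw
    return f, p, ss_between / ss_total   # eta2 = SSb / SStotal

f, p, eta2 = anova_eta_squared([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f, p, eta2)  # F = 13.0, p < .05, eta2 = .8125
```

With these hypothetical groups, 81% of the total variability is due to group membership, and the omnibus F is significant; post hoc tests would then locate which pairs of means differ.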
Parametric Test Assumptions
• Tests have restrictive assumptions:
  – Normality
  – Independence
  – Homogeneity of variance
  – Linear relationship between IV and DV
• If assumptions are violated, use a nonparametric alternative test:
  – Mann-Whitney U instead of t
  – Kruskal-Wallis H instead of F
  – Chi-square for categorical data

Chi-Square
• The basic nonparametric test
• Also used in logistic regression, SEM
• Compares observed frequencies (Fo) to the frequencies expected by chance (Fe):

  χ² = Σ (Fo – Fe)² / Fe

• Signal vs. noise again
• Easily converts to the phi coefficient: φ = √(χ² / N)

2-by-2 Contingency Tables

Dependent Observations
• Independence is a major assumption of parametric tests (but not nonparametrics)
• Address non-independence by collapsing scores to a single observation per participant:
  – Change score = posttest score – pretest score
  – Can calculate the SD (variability) of change scores
• Determine if the average change is significantly different from zero (i.e., “no change”):
  – t = (average change – zero) / (SDchange / √n)
  – Nonparametric version: Wilcoxon signed-rank test

ANCOVA / Multiple Regression
• Statistical “control” for confounding variables – no competing explanations
• Method adds a “covariate” to the model:
  – That variable’s effects are “partialed out”
  – Remaining effect is “independent” of the confound
• One important application: ANCOVA
  – Test for post-test differences between groups
  – Control for pre-test differences
• Multiple regression: same idea, with an I/R-level DV
  – Stepwise regression finds the “best” predictors
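The χ² and φ formulas above can be sketched in Python (assuming NumPy and SciPy; `chi_square_phi` is an illustrative name):

```python
import numpy as np
from scipy import stats

def chi_square_phi(table):
    """Chi-square on a contingency table, plus the phi effect size."""
    table = np.asarray(table, dtype=float)
    # chi2 = sum of (Fo - Fe)^2 / Fe; correction=False matches the raw formula
    chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
    phi = np.sqrt(chi2 / table.sum())  # phi = sqrt(chi2 / N)
    return chi2, p, phi

# Hypothetical 2-by-2 table: rows = treatment/control, cols = improved/not
chi2, p, phi = chi_square_phi([[30, 10], [10, 30]])
print(chi2, p, phi)  # chi2 = 20.0, p < .05, phi = 0.5
```

Here every expected frequency is 20, so χ² = 4 × (10² / 20) = 20, and φ = √(20 / 80) = .5, answering both the “how likely” and “how big” questions from one table.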
[Figure: Venn diagram. One circle represents all the variability that exists in the dependent variable; it overlaps circles for Independent Variable #1 and Independent Variable #2, marking the “unique variability” for IV1, the “unique variability” for IV2, the “shared variability” for IV1 & IV2, and the unexplained variability remaining for the dependent variable.]
• “Unique variability” for an IV is the amount of variability in the DV that can be accounted for by its association with that IV: the part that can be accounted for by only this IV (and not by any other IV)
• “Shared variability” is the part of the variability in the DV that can be accounted for by more than one IV
• When two IVs account for the same variability in the DV (i.e., when there is shared variability), they are “multicollinear” with each other
• What’s left over (variability in the DV not accounted for by any predictor) is considered “error”: random (i.e., unexplained) variability

This graph can also be used to show the percentage of the variability in the DV that can be accounted for by each IV:
• The percentage of variability in the DV that can be accounted for by an IV is the definition of R², the coefficient of determination
• “Total” R² for IV1 & IV2 together = all unique variability for the IVs (not including any shared) / Total SS for the DV
  – If IV1 is 30% of the DV, then the total R² is .30
• A related concept is the “semipartial R²”, which tells you what % of the variability in the DV can be accounted for by each IV on its own, not counting any shared variability:

  Semipartial R² for IV1 = Type III SS for IV1 / Total SS for the DV

  – If IV1 on its own is 20% of the DV, then the semipartial R² for IV1 is .20
• The semipartial R² is the percentage of variability in the DV that can be accounted for by one individual predictor, independent of the effects of all of the other predictors

Lying with Statistics
• Does the sample reflect your population?
• Is the IV clinically reasonable?
• Were the right set of controls in place?
• Is the DV the right way to measure the outcome?
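The semipartial R² from the regression slides can be sketched by fitting the model twice with plain NumPy least squares (the helper names are illustrative): an IV’s unique contribution is the full-model R² minus the R² of the model with that IV dropped.

```python
import numpy as np

def r_squared(X, y):
    """R2, the coefficient of determination, from least squares (with intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def semipartial_r2(X, y, j):
    """Unique variability for predictor j: full R2 minus R2 without column j."""
    return r_squared(X, y) - r_squared(np.delete(X, j, axis=1), y)

# Hypothetical data where the DV depends only on IV1
iv1 = np.arange(1.0, 7.0)
iv2 = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
X = np.column_stack([iv1, iv2])
y = 2.0 * iv1  # all variability in the DV comes from IV1
print(semipartial_r2(X, y, 0))  # large: IV1 uniquely explains the DV
print(semipartial_r2(X, y, 1))  # near 0: IV2 adds nothing beyond IV1
```

Because the semipartial R² excludes shared variability, two multicollinear IVs can each show a small unique contribution even when together they explain a lot of the DV.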
• Significant p-value = probably replicable
  – APA task force: always report the exact p-value
• Large effect size = potentially important
  – APA, CONSORT guidelines: always report effect size
• Clinical significance still needs evaluation

What We Cover in Quant II
• Basic issues
  – Missing data, data screening and cleaning
  – Meta-analysis
  – Factorial ANOVA and interaction effects
• Multivariate analyses
  – MANOVA
  – Repeated-Measures ANOVA
• Survival analysis
• Classification
  – Logistic regression
  – Discriminant Function Analysis
• Data simplification and modeling
  – Factor Analysis
  – Structural Equation Modeling
• Intensive longitudinal data (hierarchical linear models)
• Exploratory data analysis (cluster analysis, CART)