Statistics for the Terrified

Paul F. Cook, PhD
Center for Nursing Research
What Good are Statistics?
• “How big?” (“how much?”, “how many?”)
– Descriptive statistics, including effect sizes
– Describe a population based on a sample
– Help you make predictions
• “How likely?”
– Inferential statistics
– Tell you whether a finding is reliable or probably just due to chance (sampling error)
Answering the 2 Questions
• Inferential statistics tell you “how likely”
– Can’t tell you how big
– Can’t tell you how important
– “Success” is based on a double negative (rejecting the null hypothesis of “no effect”)
• Descriptive statistics tell you “how big”
– Cohen’s d = (x̄1 − x̄2) / SDpooled
– Pearson r (or other correlation coefficient)
– Odds ratio
How Big is “Big”?
• Correlations
– 0 = no relationship, 1 = upper limit
– + = positive effect, − = negative effect
– .3 for small, .5 for medium, .7 for large
– r² = “percent of variability accounted for”
• Cohen’s d
– How many SDs apart are the means? (0 = no effect)
– .5 for small, .75 for medium, 1.0 for large
• Odds Ratio
– 1 = no relationship, <1 = negative effect, >1 = positive effect
• All effect size statistics are interchangeable!
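To make these concrete, here is a minimal Python sketch (not from the slides; the data and variable names are invented) that computes d, r, and an odds ratio, plus one standard d-to-r conversion:

    import numpy as np

    treat = np.array([5.1, 6.2, 5.8, 7.0, 6.5])
    control = np.array([4.0, 5.0, 4.8, 5.5, 4.2])

    # Cohen's d: how many pooled SDs apart are the two means?
    n1, n2 = len(treat), len(control)
    sd_pooled = np.sqrt(((n1 - 1) * treat.var(ddof=1) +
                         (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
    d = (treat.mean() - control.mean()) / sd_pooled

    # Pearson r and r-squared ("percent of variability accounted for")
    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 4, 5, 4, 6])
    r = np.corrcoef(x, y)[0, 1]
    r_squared = r ** 2

    # Odds ratio from a 2x2 table of counts a, b, c, d
    a, b, c, dd = 20, 10, 5, 15
    odds_ratio = (a * dd) / (b * c)

    # "Interchangeable": converting d to r (equal-group-size approximation)
    r_from_d = d / np.sqrt(d ** 2 + 4)
    print(d, r, r_squared, odds_ratio, r_from_d)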
How Likely is “Likely”? - Test Statistics
A ratio of “signal” vs. “noise”:

z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

[Figure: two overlapping group distributions with means x̄1 and x̄2]

“signal”: AKA “between-groups variability” or “model”
“noise”: AKA “within-groups variability” or “error”
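The same signal-to-noise ratio as a sketch (made-up scores; with samples this small the ratio would normally be treated as t rather than z):

    import numpy as np

    x1 = np.array([10.2, 11.1, 9.8, 10.5, 11.4, 10.9])
    x2 = np.array([9.1, 9.5, 10.0, 8.8, 9.7, 9.3])

    signal = x1.mean() - x2.mean()              # between-groups difference
    noise = np.sqrt(x1.var(ddof=1) / len(x1) +  # within-groups variability,
                    x2.var(ddof=1) / len(x2))   # i.e., the standard error
    z = signal / noise
    print(z)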
How do We Get the p-value?
The standard normal distribution:
z beyond ±1.96 is the critical value for p < .05
(half above, half below: always use a 2-tailed test unless you have reason not to)

[Figure: standard normal curve; 2.5% of cases fall in each tail beyond ±1.96 SD]
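A short sketch of how the p-value falls out of the curve above, using SciPy’s standard normal distribution (the z value is assumed for illustration):

    from scipy.stats import norm

    z = 2.10                             # an assumed test statistic
    p_two_tailed = 2 * norm.sf(abs(z))   # area in both tails beyond |z|
    print(p_two_tailed)                  # ~.036, so p < .05

    # The critical value itself: the z that leaves 2.5% in each tail
    print(norm.ppf(0.975))               # ~1.96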
Hypothesis Testing – 5 Steps
1. State null and alternative hypotheses
2. Calculate a test statistic
3. Find the corresponding p-value
4. “Reject” or “fail to reject” the null hypothesis (your only 2 choices)
5. Draw substantive conclusions
(Color-coded on the slide: red = statistics, blue = logic, black = theory)
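The five steps as a runnable sketch (made-up data; independent-samples t-test from SciPy):

    from scipy import stats

    # 1. H0: the group means are equal; H1: they differ (two-tailed)
    group_a = [23, 25, 28, 22, 26, 27]
    group_b = [20, 19, 24, 21, 18, 22]

    # 2-3. Calculate the test statistic and find its p-value
    t, p = stats.ttest_ind(group_a, group_b)

    # 4. Reject or fail to reject the null (your only 2 choices)
    alpha = 0.05
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(t, p, decision)

    # 5. Substantive conclusions come from theory, not the software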
How Are the Questions Related?
• “Large” z = a large effect (d) and a low p
• But z depends on sample size; d does not
– Every test statistic is the product of an effect size
and the sample size
– Example: χ² = φ² × N (equivalently, φ = √(χ² / N))
• A significant result (power) depends on:
– What alpha level (α) you choose
– How large an effect (d) there is to find
– What sample size (n) is available
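A sketch of the alpha / effect size / n trade-off using statsmodels’ power calculator (the d = .5 target is an assumed example):

    from statsmodels.stats.power import TTestIndPower

    power_calc = TTestIndPower()

    # n per group needed to detect d = .5 at alpha = .05 with 80% power
    n_needed = power_calc.solve_power(effect_size=0.5, alpha=0.05, power=0.80)

    # power actually available with only 30 per group
    power_at_30 = power_calc.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
    print(n_needed, power_at_30)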
What Type of Test?
• N-level (nominal) predictor, 2 groups: t-test or z-test
• N-level predictor, 3+ groups: ANOVA (F-test)
• I/R-level (interval/ratio) predictor: correlation/regression
• N-level dependent variable: χ² or logistic reg.
Correlation answers the “how big” question, but
can convert to a t-test value to also answer the
“how likely” question
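That conversion as a sketch, using the standard formula t = r·√(n − 2) / √(1 − r²) (the r and n values are assumed):

    import math

    r, n = 0.45, 30
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    print(t)   # compare to the critical t with df = n - 2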
The F test
• ANOVA = “analysis of variance”
• Compares variability between groups to
variability within groups
F = MSb / MSw = (avg. difference among means) / (avg. variability within each group)

• Signal vs. noise
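A one-way ANOVA sketch with three made-up groups, using scipy.stats.f_oneway:

    from scipy import stats

    g1 = [4, 5, 6, 5, 4]
    g2 = [6, 7, 8, 7, 6]
    g3 = [8, 9, 10, 9, 8]

    F, p = stats.f_oneway(g1, g2, g3)   # MSb / MSw
    print(F, p)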
Omnibus and Post Hoc Tests
• The F-test compares 3+ groups at once
• Benefit: avoids “capitalizing on chance”
• Drawback: can’t see individual differences
• Solution: post hoc tests (sketched below)
– Bonferroni correction for 1-3 comparisons (uses an “adjusted alpha” of .025 or .01)
– Tukey test for 4+ comparisons
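A post hoc sketch (same made-up groups): Bonferroni-adjusted pairwise t-tests via statsmodels’ multipletests, and a Tukey test via pairwise_tukeyhsd:

    from itertools import combinations
    import numpy as np
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd
    from statsmodels.stats.multitest import multipletests

    groups = {"g1": [4, 5, 6, 5, 4], "g2": [6, 7, 8, 7, 6], "g3": [8, 9, 10, 9, 8]}

    # Bonferroni: run each pairwise t-test, then adjust the p-values
    raw_p = [stats.ttest_ind(groups[a], groups[b]).pvalue
             for a, b in combinations(groups, 2)]
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
    print(adj_p, reject)

    # Tukey HSD for all pairs at once
    scores = np.concatenate(list(groups.values()))
    labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])
    print(pairwise_tukeyhsd(scores, labels, alpha=0.05))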
F and Correlation (eta-squared)
The F-Table:

Source     SS             df             MS          F           p
Between    SSb            dfb            SSb / dfb   MSb / MSw   .05
Within     SSw            dfw            SSw / dfw
Total      (= SSb + SSw)  (= dfb + dfw)

eta² = SSb / SStotal = % of total variability that is due to the IV (i.e., R²)
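Eta-squared computed directly from the sums of squares in the table, as a sketch (groups are made up):

    import numpy as np

    groups = [np.array([4, 5, 6, 5, 4]),
              np.array([6, 7, 8, 7, 6]),
              np.array([8, 9, 10, 9, 8])]

    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    eta_squared = ss_between / (ss_between + ss_within)  # SSb / SStotal
    print(eta_squared)   # % of total variability due to the IV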
Correlation Seen on a Graph
[Figure: three scatterplots labeled “Same Direction, Weak Correlation,” “Moderate Correlation,” and “Same Direction, Strong Correlation”]
Regression and the F-test
[Figure: scatterplot with the line of best fit, which minimizes the sum of squared residuals; the gap between each actual value and its predicted value on the line is the error variance (residual)]

F = (avg. SSmodel variance, predicted) / (avg. SSerror variance, residual)
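A regression F-test sketch with invented x/y data, using statsmodels OLS (model.fvalue is the MSmodel / MSerror ratio above):

    import numpy as np
    import statsmodels.api as sm

    x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

    model = sm.OLS(y, sm.add_constant(x)).fit()   # the line of best fit
    print(model.fvalue, model.f_pvalue)           # signal vs. noise
    print(model.resid)                            # residuals the line minimizes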
Parametric Test Assumptions
• Tests have restrictive assumptions:
– Normality
– Independence
– Homogeneity of variance
– Linear relationship between IV and DV
• If assumptions are violated, use a
nonparametric alternative test:
– Mann-Whitney U instead of t
– Kruskal-Wallis H instead of F
– Chi-square for categorical data
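The named alternatives as SciPy calls, sketched with made-up data:

    from scipy import stats

    g1 = [4, 5, 6, 5, 4]
    g2 = [6, 7, 9, 7, 6]
    g3 = [8, 9, 11, 9, 8]

    print(stats.mannwhitneyu(g1, g2))    # instead of the t-test
    print(stats.kruskal(g1, g2, g3))     # instead of the F-test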
Chi-Square
• The basic nonparametric test
• Also used in logistic regression, SEM
• Compares observed values to model-predicted values; observed minus predicted = “error”

χ² = Σ (Fo − Fe)² / Fe
• Signal vs. noise again
• Easily converts to the phi coefficient: φ = √(χ² / N)
2-by-2 Contingency Tables
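A 2-by-2 sketch (invented counts) tying these two slides together: chi-square from observed vs. expected frequencies, then the phi coefficient:

    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[20, 10],    # e.g., treated: improved / not improved
                         [5, 15]])    #      control: improved / not improved

    chi2, p, dof, expected = chi2_contingency(observed, correction=False)
    phi = np.sqrt(chi2 / observed.sum())   # phi = sqrt(chi2 / N)
    print(chi2, p, phi)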
Dependent Observations
• Independence is a major assumption of
parametric tests (but not nonparametrics)
• Address non-independence by collapsing scores
to a single observation per participant:
– Change score = posttest score – pretest score
– Can calculate SD (variability) of change scores
• Determine if the average change is significantly
different from zero (i.e., “no change”):
– t = (average change – zero) / (SDchange / √ n )
– Nonparametric version: Wilcoxon signed-rank test
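A paired sketch (made-up pre/post scores): the paired t-test asks whether the average change differs from zero, and Wilcoxon signed-rank is its nonparametric version:

    from scipy import stats

    pre = [10, 12, 9, 14, 11, 13]
    post = [13, 14, 12, 15, 13, 16]

    print(stats.ttest_rel(post, pre))   # t on the change scores
    print(stats.wilcoxon(post, pre))    # nonparametric alternative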
ANCOVA / Multiple Regression
• Statistical “control” for confounding variables rules out competing explanations
• Method adds a “covariate” to the model:
– That variable’s effects are “partialed out”
– Remaining effect is “independent” of confound
• One important application: ANCOVA
– Test for post-test differences between groups
– Control for pre-test differences
• Multiple regression: Same idea, I/R-level DV
– Stepwise regression finds “best” predictors
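An ANCOVA-style sketch using statsmodels’ formula interface (the DataFrame and column names are invented): post-test scores regressed on group, with the pre-test partialed out as a covariate:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "post":  [14, 15, 13, 16, 10, 11, 9, 12],
        "pre":   [10, 11, 10, 12, 9, 10, 8, 11],
        "group": ["tx", "tx", "tx", "tx", "ctl", "ctl", "ctl", "ctl"],
    })

    model = smf.ols("post ~ group + pre", data=df).fit()
    print(model.summary())   # the group effect, controlling for pre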
[Figure: Venn diagram of variance partitioning, with circles for two IVs overlapping the DV]
• The large circle represents all of the variability that exists in the dependent variable.
• “Unique Variability” for IV1: the amount of variability in the DV that can be accounted for by its association with IV1. The “unique variability” is the part of the variability in the DV that can be accounted for only by this IV (and not by any other IV).
• “Unique Variability” for IV2: the amount of variability in the DV that can be accounted for by its association with IV2.
• “Shared Variability” for IV1 & IV2: the part of the variability in the DV that can be accounted for by more than one IV. When two IVs account for the same variability in the DV (i.e., when there is shared variability), they are “multicollinear” with each other.
• Unexplained variability remaining for the dependent variable: what’s left over (variability in the DV not accounted for by any predictor) is considered “error”, i.e., random (unexplained) variability.
This graph can also be used to show the percentage of the
variability in the DV that can be accounted for by each IV.
[Figure: if the part of the DV circle covered by IV1 and IV2 is 30% of the DV, then the Total R² is .30]

“Total” R² for IV1 & IV2 together = (all variability accounted for by the IVs, unique and shared) / (Total SS for the DV)

The percentage of variability in the DV that can be accounted for by an IV is the definition of R², the coefficient of determination.
A related concept is the idea of a “semipartial R²,” which tells you what % of the variability in the DV can be accounted for by each IV on its own, not counting any shared variability.

Semipartial R² for IV1 = (Type III SS for IV1) / (Total SS for the DV)
[Figure: if the unique part of IV1 is 20% of the DV circle, then the semipartial R² for IV1 is .20]
The semipartial R2 is the percentage of variability in the DV
that can be accounted for by one individual predictor,
independent of the effects of all of the other predictors.
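One practical way to get a semipartial R² (a sketch with invented data) is model comparison: the drop in R² when one IV is removed from the full model is that IV’s unique share of the DV:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "dv":  [3, 5, 6, 8, 9, 11, 12, 14],
        "iv1": [1, 2, 2, 3, 4, 5, 5, 6],
        "iv2": [2, 2, 3, 3, 4, 4, 5, 6],
    })

    full = smf.ols("dv ~ iv1 + iv2", data=df).fit()
    without_iv1 = smf.ols("dv ~ iv2", data=df).fit()

    semipartial_r2_iv1 = full.rsquared - without_iv1.rsquared
    print(full.rsquared, semipartial_r2_iv1)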
Lying with Statistics
• Does the sample reflect your population?
• Is the IV clinically reasonable?
• Was the right set of controls in place?
• Is the DV the right way to measure the outcome?
• Significant p-value = probably replicable
– APA task force: always report exact p-value
• Large effect size = potentially important
– APA and CONSORT guidelines: always report effect size
• Clinical significance still needs evaluation
What We Cover in Quant II
• Basic issues
– Missing data, data screening and cleaning
– Meta-analysis
– Factorial ANOVA and interaction effects
• Multivariate analyses
– MANOVA
– Repeated-Measures ANOVA
• Survival analysis
• Classification
– Logistic regression
– Discriminant Function Analysis
• Data simplification and modeling
– Factor Analysis
– Structural Equation Modeling
• Intensive longitudinal data (hierarchical linear models)
• Exploratory data analysis (cluster analysis, CART)