The use & abuse of tests

• Statistical significance ≠ practical significance
• Significance ≠ proof of effect (confounds)
• Lack of significance ≠ lack of effect
Factors that affect a hypothesis test
• the actual obtained difference (X̄ − µ)
• the magnitude of the sample variance (s²)
• the sample size (n)
• the significance level (alpha)
• whether the test is one-tail or two-tail
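All of these factors enter through the test statistic and the critical value it is compared against. As a sketch for the single-sample case, assuming the null-hypothesis mean is written µ0 (the bullets above do not name it):

```latex
t = \frac{\bar{X} - \mu_0}{\sqrt{s^2 / n}}
```

The obtained difference is the numerator; the sample variance and sample size set the standard error in the denominator; alpha and the number of tails fix the critical value that the statistic must exceed.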
Why might a hypothesis test fail to find
a real result?
Two types of error
We either reject H0 or fail to reject (“accept”) it.
Either way, we could be wrong:
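• Reject H0 when H0 is true: Type I error (false positive rate)
• Reject H0 when H0 is false: correct decision (“sensitivity” or “power”)
• Accept H0 when H0 is true: correct decision
• Accept H0 when H0 is false: Type II error (false negative rate)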
Error probabilities
When the null hypothesis is true:
P(Type I Error) = alpha
When the alternative hypothesis is true:
P(Type II Error) = beta
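A quick simulation can make these two probabilities concrete. This is a minimal sketch, not part of the original slides: it assumes a one-tailed z-test with known sigma and borrows the numbers from the visual example later in the deck (µ0 = 50, σ = 5, n = 10, µA = 55). The choice of alpha = .05, the trial count, and the function name `rejection_rate` are illustrative assumptions.

```python
import random
from statistics import NormalDist


def rejection_rate(true_mean, mu0=50.0, sigma=5.0, n=10, alpha=0.05,
                   trials=20000, seed=0):
    """Fraction of simulated samples for which an upper-tailed z-test rejects H0."""
    rng = random.Random(seed)                  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha)   # one-tailed critical z
    se = sigma / n ** 0.5                      # standard error of the mean
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if (sample_mean - mu0) / se > z_crit:
            rejections += 1
    return rejections / trials


# Null true: every rejection is a Type I error, so the rate should sit near alpha.
type_1_rate = rejection_rate(true_mean=50.0)
# Alternative true: every failure to reject is a Type II error (beta).
type_2_rate = 1 - rejection_rate(true_mean=55.0)

print(f"Simulated P(Type I error)  = {type_1_rate:.3f}")
print(f"Simulated P(Type II error) = {type_2_rate:.3f}")
```

The Type I rate is controlled by the alpha we pick; the Type II rate is whatever the alternative distribution leaves below the cutoff.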
Type I error
The “false positive rate”
• We decide there is an effect when none exists; we reject the null
wrongly
• By choosing an alpha as our criterion, we are deciding the amount
of Type I error we are willing to live with.
• The p-value is the probability, assuming the null is true, of getting a
result at least as extreme as the one observed; we reject the null when the
p-value falls below alpha
Type II error
The “false negative rate”
• We decide there is nothing going on, and we miss the boat – the
effect was really there and we didn’t catch it.
• Cannot be set directly; it depends on sample size, sample
variability, effect size, and alpha
• More likely when variability is high, the measure is insensitive, or
the effect is small
Power
The “sensitivity” of the test
• The likelihood of picking up on an effect, given that it is really
there.
• Related to Type II error: power = 1 − beta
A visual example
(We are only going to work through a one-tailed example.)
We are going to collect a sample of 10 highly successful
leaders & innovators and measure their scores on a scale that
measures tendencies toward manic states.
We hypothesize that this group has more tendency to mania
than does the general population (µ = 50 and σ = 5).
Step 1: Decide on alpha and identify your decision rule (Zcrit)
[Figure: null distribution centered at µ0 = 50 (Z = 0), with the rejection region in the upper tail beyond Zcrit = 1.64]
Step 2: State your decision rule in units of sample mean (Xcrit)
[Figure: null distribution centered at µ0 = 50 (Z = 0), with the rejection region in the upper tail beyond Xcrit = 52.61 (Zcrit = 1.64)]
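Steps 1 and 2 can be checked numerically. The sketch below assumes alpha = .05 (the slides give only Zcrit) and a population standard deviation of 5 with n = 10. With the unrounded critical value the cutoff comes out at about 52.60; the 52.61 shown on the slide is what you get if Zcrit is rounded up to 1.65.

```python
from statistics import NormalDist

mu0 = 50.0     # null-hypothesis mean
sigma = 5.0    # population standard deviation
n = 10         # sample size
alpha = 0.05   # assumed: one-tailed criterion implied by Zcrit of about 1.64

standard_error = sigma / n ** 0.5          # sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical z
x_crit = mu0 + z_crit * standard_error     # same rule in sample-mean units

print(f"standard error = {standard_error:.3f}")
print(f"Zcrit = {z_crit:.3f}")
print(f"Xcrit = {x_crit:.2f}")
```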
Step 3: Identify µA, the suspected true population mean for your
sample
[Figure: null distribution centered at µ0 = 50 and alternative distribution centered at µA = 55; the acceptance region lies below Xcrit = 52.61 and the rejection region above it]
Step 4: How likely is it that this alternative distribution would
produce a mean in the rejection region?
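[Figure: alternative distribution centered at µA = 55 (Z = 0), with µ0 = 50 marked; Xcrit = 52.61 falls at Z = -1.51 on this distribution. The area in the rejection region above Xcrit is power; the area below Xcrit is beta]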
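Step 4 comes down to one normal-curve lookup. A minimal sketch, taking Xcrit = 52.61 and µA = 55 from the slides and assuming σ = 5 with n = 10; the variable names are illustrative. It should land at a power of roughly 0.93.

```python
from statistics import NormalDist

mu_a = 55.0      # suspected true population mean
sigma = 5.0      # population standard deviation
n = 10           # sample size
x_crit = 52.61   # decision rule from Step 2, in sample-mean units

standard_error = sigma / n ** 0.5

# Locate Xcrit on the ALTERNATIVE distribution (centered at mu_a).
z = (x_crit - mu_a) / standard_error

beta = NormalDist().cdf(z)   # area below Xcrit: we fail to reject, a Type II error
power = 1 - beta             # area above Xcrit: we correctly reject

print(f"Z = {z:.2f}")
print(f"beta = {beta:.4f}")
print(f"power = {power:.4f}")
```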
Power & Error
[Figure: null and alternative distributions with µ0, Xcrit, and µA marked and the alpha and beta areas shaded]
Power is a function of
• The chosen alpha level (α)
• The true difference between µ0 and µA
• The size of the sample (n)
• The standard deviation (s or σ)
(The last two together determine the standard error.)
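For the upper-tailed, known-σ case used in the visual example, these four ingredients combine into a single expression. This is a sketch under those assumptions, with Φ standing for the standard normal cumulative distribution and z_{1−α} for the one-tailed critical value:

```latex
\text{power} = 1 - \beta
  = 1 - \Phi\!\left( z_{1-\alpha} - \frac{\mu_A - \mu_0}{\sigma / \sqrt{n}} \right)
```

A larger alpha lowers z_{1−α}; a larger difference, a larger n, or a smaller σ raises the subtracted term. Each of these pushes power up.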
Changing alpha
[Figure sequence: null and alternative distributions with µ0, Xcrit, and µA marked and the alpha and beta areas shaded, redrawn for several values of alpha]
• Raising alpha gives you less Type II error (more power) but
more Type I error. A trade-off.
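The trade-off can be tabulated directly. A minimal sketch, assuming the numbers of the visual example (µ0 = 50, µA = 55, σ = 5, n = 10) and an upper-tailed z-test; the helper `error_rates` and the three alpha values are illustrative.

```python
from statistics import NormalDist


def error_rates(alpha, mu0=50.0, mu_a=55.0, sigma=5.0, n=10):
    """Return (x_crit, beta, power) for an upper-tailed z-test at this alpha."""
    se = sigma / n ** 0.5
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # cutoff under H0
    beta = NormalDist(mu_a, se).cdf(x_crit)               # alternative's area below cutoff
    return x_crit, beta, 1 - beta


rows = [(alpha, *error_rates(alpha)) for alpha in (0.01, 0.05, 0.10)]

print("alpha   Xcrit    beta   power")
for alpha, x_crit, beta, power in rows:
    print(f"{alpha:5.2f}  {x_crit:6.2f}  {beta:6.3f}  {power:6.3f}")
```

Reading down the table, Xcrit slides toward µ0 as alpha grows, so the beta area shrinks while the alpha area grows.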
Changing distance between µ0 and µA
[Figure sequence: null and alternative distributions with µ0, Xcrit, and µA marked and the alpha and beta areas shaded, redrawn for several distances between µ0 and µA]
• Increasing distance between µ0 and µA lowers Type II error
(improves power) without changing Type I error
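To see why Type I error is untouched, note that the cutoff is computed from the null distribution alone. A minimal sketch, assuming µ0 = 50, σ = 5, n = 10, and alpha = .05; the list of µA values is illustrative.

```python
from statistics import NormalDist

mu0, sigma, n, alpha = 50.0, 5.0, 10, 0.05
se = sigma / n ** 0.5

# The cutoff depends only on the null distribution, so alpha stays fixed
# no matter where the alternative distribution sits.
x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se

powers = {}
for mu_a in (51.0, 53.0, 55.0, 57.0):
    # Power is the alternative distribution's area above the fixed cutoff.
    powers[mu_a] = 1 - NormalDist(mu_a, se).cdf(x_crit)

print(f"Xcrit = {x_crit:.2f} for every row")
for mu_a, power in powers.items():
    print(f"muA = {mu_a:4.0f}  power = {power:.3f}")
```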
Changing standard error
[Figure sequence: null and alternative distributions with µ0, Xcrit, and µA marked and the alpha and beta areas shaded, redrawn for several values of the standard error]
• Decreasing standard error improves power: at a fixed alpha, beta
shrinks; at a fixed Xcrit, both kinds of error shrink.
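The effect of the standard error is easiest to see by varying n. A minimal sketch, assuming µ0 = 50, µA = 55, σ = 5, and alpha held at .05; the sample sizes are illustrative.

```python
from statistics import NormalDist

mu0, mu_a, sigma, alpha = 50.0, 55.0, 5.0, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)   # alpha is held fixed throughout

results = []
for n in (5, 10, 20, 40):
    se = sigma / n ** 0.5                  # larger n gives a smaller standard error
    x_crit = mu0 + z_crit * se             # cutoff moves toward mu0 as se shrinks
    power = 1 - NormalDist(mu_a, se).cdf(x_crit)
    results.append((n, se, x_crit, power))

print("  n     SE   Xcrit   power")
for n, se, x_crit, power in results:
    print(f"{n:3d}  {se:5.2f}  {x_crit:6.2f}  {power:6.3f}")
```

Both distributions narrow as the standard error drops, so they overlap less and less of the alternative falls below the cutoff.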
To increase power
• Try to make µA really different from the null-hypothesis value (if
possible)
• Loosen your alpha criterion (from .05 to .10, for example)
• Reduce the standard error (increase the size of the sample, or
reduce variability)
For a given level of alpha and a given sample size, power
increases with effect size. See Cohen’s power tables,
described in your text.
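A small table in the same spirit can be generated for the simple case worked through here. This sketch is not a reproduction of Cohen's tables: it assumes an upper-tailed, single-sample z-test with known σ, expresses effect size as d = (µA − µ0) / σ, and uses Cohen's conventional small, medium, and large values of d. The function name and the sample sizes are illustrative.

```python
from statistics import NormalDist


def power_one_tailed_z(d, n, alpha=0.05):
    """Power of an upper-tailed z-test for standardized effect size d."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centered at d * sqrt(n).
    return 1 - NormalDist().cdf(z_crit - d * n ** 0.5)


effect_sizes = (0.2, 0.5, 0.8)      # conventional small, medium, large
sample_sizes = (10, 20, 50, 100)

table = {d: [power_one_tailed_z(d, n) for n in sample_sizes]
         for d in effect_sizes}

print("   d  " + "".join(f"  n={n:<4d}" for n in sample_sizes))
for d, row in table.items():
    print(f"{d:4.1f}  " + "".join(f"  {p:6.3f}" for p in row))
```

The visual example above corresponds to d = 1.0 with n = 10, which this function puts at a power of roughly 0.93.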