Statistics
Psych 231: Research
Methods in Psychology

Tomorrow’s office hours: cancelled

Readings:
 Required: Chapter 7
 Recommended: Chapter 14 (covers some specific hypothesis tests and reviews some of the SPSS how-to’s)

Announcements




Step 1: State your hypotheses
Step 2: Set your decision criteria
Step 3: Collect your data from your sample(s)
Step 4: Compute your test statistic(s)
 Descriptive statistics (means, standard deviations, etc.)
 Inferential statistics (t-tests, ANOVAs, etc.)
Step 5: Make a decision about your null hypothesis
 Reject H0: a “statistically significant difference”
 Fail to reject H0: the difference is “not statistically significant”

When you “reject your null hypothesis”
• Essentially this means that the observed difference is above what you’d expect by chance
Testing Hypotheses
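
A minimal sketch of these five steps in Python (hypothetical memory scores, not from the lecture example; scipy.stats supplies the inferential test):

    # Hypothetical walk-through of Steps 1-5 with made-up scores
    import numpy as np
    from scipy import stats

    # Step 1: H0: mu_A = mu_B (no treatment effect); HA: mu_A != mu_B
    # Step 2: decision criterion
    alpha = 0.05

    # Step 3: collect data from your samples (made-up scores)
    group_a = np.array([82, 79, 85, 78, 81, 84, 77, 80])  # treatment
    group_b = np.array([75, 78, 74, 77, 76, 79, 73, 76])  # control

    # Step 4: descriptive statistics, then an inferential statistic (t-test)
    print(group_a.mean(), group_a.std(ddof=1))
    print(group_b.mean(), group_b.std(ddof=1))
    t_stat, p_value = stats.ttest_ind(group_a, group_b)

    # Step 5: decision about H0
    if p_value < alpha:
        print(f"Reject H0: t = {t_stat:.2f}, p = {p_value:.3f}")
    else:
        print(f"Fail to reject H0: t = {t_stat:.2f}, p = {p_value:.3f}")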

Example Experiment:
 Group A - gets treatment to improve memory
 Group B - gets no treatment (control)
 After the treatment period, test both groups for memory
 Results:
 Group A’s average memory score is 80%
 Group B’s is 76%
 Is the 4% difference a “real” difference (statistically
significant) or is it just sampling error?
[Figure: two sample distributions, with sample means X̄A = 80% and X̄B = 76%]
From last time

About populations
 H0: μA = μB
 H0: there is no difference between Grp A and Grp B

Experimenter’s conclusions vs. the real world (‘truth’):

                       H0 is correct        H0 is wrong
  Reject H0            Type I error (α)     correct decision
  Fail to reject H0    correct decision     Type II error (β)

The generic test statistic is a ratio of sources of variability:

  Computed test statistic = Observed difference / Difference from chance = (80% - 76%) / ??

“Statistically significant differences”

When you “reject your null hypothesis”
• Essentially this means that the observed difference is above what you’d expect by chance
• “Chance” is determined by estimating how much sampling error there is
• Factors affecting “chance”
  • Sample size
  • Population variability
Testing for statistical significance
[Figure: a population distribution and sample means for samples of size n = 1, n = 2, and n = 10; sampling error = (population mean - sample mean)]

 Generally, as the sample size (n) increases, the “chance” (sampling error) decreases
Sample size on Sampling error
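
A short simulation of this point (a normal population with mean 100 and SD 15 is an assumed example, not from the slides): as n grows, the average distance between the sample mean and the population mean shrinks.

    import numpy as np

    rng = np.random.default_rng(0)
    pop_mean, pop_sd = 100, 15  # assumed population parameters

    for n in (1, 2, 10, 100):
        # draw many samples of size n; record |population mean - sample mean|
        errors = [abs(pop_mean - rng.normal(pop_mean, pop_sd, n).mean())
                  for _ in range(10_000)]
        print(f"n = {n:3d}: average sampling error = {np.mean(errors):.2f}")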

Typically, the narrower the population distribution, the narrower the range of possible samples, and the smaller the “chance” (sampling error)
[Figure: sampling from a population with small vs. large population variability]
Pop. Var. on Sampling error

These two factors combine to impact the distribution of sample means.

 The distribution of sample means is a distribution of all possible sample means of a particular sample size that can be drawn from the population
[Figure: samples of size n drawn from the population yield means X̄A, X̄B, X̄C, X̄D, ...; together they form the distribution of sample means, whose spread is the average sampling error, i.e. “chance”]
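
A sketch of building the distribution of sample means empirically (same assumed population as the sketch above); its standard deviation, the standard error, matches sigma / sqrt(n).

    import numpy as np

    rng = np.random.default_rng(1)
    pop_mean, pop_sd, n = 100, 15, 10   # assumed population and sample size

    # many sample means of size n form the distribution of sample means
    sample_means = rng.normal(pop_mean, pop_sd, size=(10_000, n)).mean(axis=1)

    print("mean of sample means:", sample_means.mean())       # close to the population mean
    print("SD of sample means:  ", sample_means.std(ddof=1))  # the "chance" / average sampling error
    print("sigma / sqrt(n):     ", pop_sd / np.sqrt(n))       # theoretical standard error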

Tests the question: Are there differences between groups due to a treatment?

Two possibilities in the “real world”:
 H0 is true (no treatment effect): one population; the two sample distributions (X̄A = 80%, X̄B = 76%) differ only because of sampling error
 H0 is false (there is a treatment effect): two populations, no treatment and treatment; people who get the treatment change, and they form a new population (the “treatment population”)

Experimenter’s conclusions vs. the real world (‘truth’): rejecting H0 when it is correct is a Type I error (α); failing to reject H0 when it is wrong is a Type II error (β)
“Generic” statistical test
Why might the samples (X̄A and X̄B) be different?
(What is the source of the variability between groups?)
 ER: Random sampling error
 ID: Individual differences (if a between-subjects factor)
 TR: The effect of a treatment

The generic test statistic is a ratio of sources of variability:

  Computed test statistic = Observed difference / Difference from chance = (TR + ID + ER) / (ID + ER)

“Generic” statistical test
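
A rough illustration of how this ratio behaves (simulated between-subjects data; scipy’s one-way F statistic stands in for the generic ratio): with no treatment effect the ratio hovers around 1, and a large treatment effect pushes it well above 1.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    def simulated_ratio(treatment_effect, n=30):
        # ID + ER contribute to both groups; TR is added only to the treatment group
        control = rng.normal(76, 10, n)
        treatment = rng.normal(76 + treatment_effect, 10, n)
        f_stat, _ = stats.f_oneway(treatment, control)
        return f_stat

    print("TR = 0 (H0 true):  ", round(simulated_ratio(0), 2))
    print("TR = 10 (H0 false):", round(simulated_ratio(10), 2))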

The generic test statistic distribution

 To reject H0, you want a computed test statistic that is large
  • reflecting a large treatment effect (TR)

 What’s large enough? The alpha level gives us the decision criterion

[Figure: distribution of the test statistic, (TR + ID + ER) / (ID + ER), divided into a “Fail to reject H0” region and a “Reject H0” region; the α-level determines where these boundaries go]

 “One-tailed test”: sometimes you know to expect a particular difference (e.g., “improve memory performance”), so the entire rejection region sits in one tail

“Generic” statistical test
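
A sketch of how the alpha level sets those boundaries, using the t distribution as the test-statistic distribution (alpha = .05 and df = 24 are assumed example values):

    from scipy import stats

    alpha, df = 0.05, 24  # assumed alpha level and degrees of freedom

    # two-tailed test: alpha is split between the two tails
    two_tailed_cutoff = stats.t.ppf(1 - alpha / 2, df)

    # one-tailed test: the entire alpha region sits in the predicted tail
    one_tailed_cutoff = stats.t.ppf(1 - alpha, df)

    print(f"two-tailed: reject H0 if |t| > {two_tailed_cutoff:.2f}")
    print(f"one-tailed: reject H0 if  t  > {one_tailed_cutoff:.2f}")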

Things that affect the computed test statistic

 Size of the treatment effect
  • The bigger the effect, the bigger the computed test statistic

 Difference expected by chance (sampling error)
  • Sample size
  • Variability in the population
“Generic” statistical test

“A statistically significant difference” means:

 the researcher is concluding that there is a difference above and beyond chance
 with the probability of making a Type I error at 5% (assuming an alpha level = 0.05)

Note: “statistical significance” is not the same thing as theoretical significance.
 It only means that there is a statistical difference
 It doesn’t mean that it is an important difference
Significance

Failing to reject the null hypothesis

 Generally, we are not interested in “accepting the null hypothesis” (remember: we can’t prove things, only disprove them)
 Usually check to see if you made a Type II error (failed to detect a difference that is really there)
  • Check the statistical power of your test
    • Maybe the sample size is too small
    • Maybe the effects that you’re looking for are really small
  • Check your controls; maybe there is too much variability
Non-Significance
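
One way to check the power of a t-test is statsmodels’ power calculator; a sketch, with an assumed effect size of d = 0.5 and 13 people per group (hypothetical numbers):

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # power of an independent-samples t-test with n = 13 per group and d = 0.5
    power = analysis.power(effect_size=0.5, nobs1=13, alpha=0.05)
    print(f"power = {power:.2f}")   # low power means a Type II error is quite likely

    # sample size per group needed to reach 80% power for the same effect
    n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
    print(f"n per group for 80% power: {n_needed:.0f}")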

1 factor with two groups
 T-tests
  • Between groups: 2 independent samples
  • Within groups: repeated measures samples (matched, related)

1 factor with more than two groups
 Analysis of Variance (ANOVA) (either between groups or repeated measures)

Multi-factorial
 Factorial ANOVA
Some inferential statistical tests
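
For reference, a rough mapping from these designs to common Python routines (scipy and statsmodels; one reasonable set of options, not the only one):

    from scipy import stats

    # 1 factor, two groups
    #   between groups -> stats.ttest_ind(group1, group2)   # 2 independent samples
    #   within groups  -> stats.ttest_rel(cond1, cond2)     # repeated measures
    # 1 factor, more than two groups
    #   stats.f_oneway(group1, group2, group3)               # one-way ANOVA
    # Multi-factorial
    #   statsmodels ols() + anova_lm()                       # factorial ANOVA (see the sketch near the end)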

Design
 2 separate experimental conditions
 Degrees of freedom
  • Based on the size of the sample and the kind of t-test

Formula:

  t = Observed difference / Difference expected by chance = (X̄1 - X̄2) / (estimate based on sampling error)

 The computation of the denominator differs for between and within t-tests
T-test

Reporting your results
 The observed difference between conditions
 Kind of t-test
 Computed t-statistic
 Degrees of freedom for the test
 The “p-value” of the test

“The mean of the treatment group was 12 points higher than the control group. An independent samples t-test yielded a significant difference, t(24) = 5.67, p < 0.05.”

“The mean score of the post-test was 12 points higher than the pre-test. A repeated measures t-test demonstrated that this difference was significant, t(12) = 5.67, p < 0.05.”
T-test
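
A sketch of pulling those pieces out of an analysis for the write-up (made-up scores; for an independent-samples t-test the df are n1 + n2 - 2):

    import numpy as np
    from scipy import stats

    treatment = np.array([85, 90, 78, 92, 88, 84, 91, 87, 83, 89, 86, 90, 82])
    control   = np.array([73, 76, 70, 78, 75, 72, 77, 74, 71, 76, 75, 73, 74])

    t_stat, p = stats.ttest_ind(treatment, control)
    df = len(treatment) + len(control) - 2
    diff = treatment.mean() - control.mean()

    print(f"The mean of the treatment group was {diff:.0f} points higher than the "
          f"control group. An independent samples t-test yielded a significant "
          f"difference, t({df}) = {t_stat:.2f}, p {'<' if p < 0.05 else '>'} 0.05.")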

Designs
 More than two groups (X̄A, X̄B, X̄C, ...)
  • 1 factor ANOVA, factorial ANOVA
  • Both within and between groups factors

 Test statistic is an F-ratio
 Degrees of freedom
  • Several to keep track of
  • The number of them depends on the design
Analysis of Variance
More than two groups (X̄A, X̄B, X̄C)
 Now we can’t just compute a simple difference score, since there is more than one difference
 So we use variance instead of simply the difference
  • Variance is essentially an average difference

  F-ratio = Observed variance / Variance from chance
Analysis of Variance
1 factor, with more than two levels (X̄A, X̄B, X̄C)
 Now we can’t just compute a simple difference score, since there is more than one difference
  • A - B, B - C, & A - C
1 factor ANOVA
Null hypothesis:
 H0: all the groups are equal
   X̄A = X̄B = X̄C
 (The ANOVA tests this one!)

Alternative hypotheses:
 HA: not all the groups are equal
   X̄A ≠ X̄B ≠ X̄C
   X̄A = X̄B ≠ X̄C
   X̄A ≠ X̄B = X̄C
   X̄A = X̄C ≠ X̄B
 (Do further tests to pick between these)
1 factor ANOVA

Planned contrasts and post-hoc tests:
 - Further tests used to rule out the different alternative hypotheses
   e.g., finding Test 1: A ≠ B, Test 2: A ≠ C, and Test 3: B = C would support the pattern X̄A ≠ X̄B = X̄C
1 factor ANOVA

Reporting your results
 The observed differences
 Kind of test
 Computed F-ratio
 Degrees of freedom for the test
 The “p-value” of the test
 Any post-hoc or planned comparison results

“The mean score of Group A was 12, Group B was 25, and Group C was 27. A 1-way ANOVA was conducted and the results yielded a significant difference, F(2,25) = 5.67, p < 0.05. Post hoc tests revealed that the differences between groups A and B and A and C were statistically reliable (respectively t(1) = 5.67, p < 0.05 & t(1) = 6.02, p < 0.05). Groups B and C did not differ significantly from one another.”
1 factor ANOVA
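
A sketch of the corresponding analysis in Python (made-up scores for three groups; scipy’s f_oneway gives the omnibus F, with plain pairwise t-tests as illustrative follow-up comparisons, uncorrected for multiple testing):

    import numpy as np
    from scipy import stats

    group_a = np.array([11, 13, 12, 10, 14, 12, 11, 13, 12, 12])
    group_b = np.array([24, 26, 25, 23, 27, 25, 24, 26, 25, 25])
    group_c = np.array([26, 28, 27, 25, 29, 27, 26, 28, 27, 27])

    # omnibus test of H0: all the group means are equal
    f_stat, p = stats.f_oneway(group_a, group_b, group_c)
    df_between = 3 - 1
    df_within = len(group_a) + len(group_b) + len(group_c) - 3
    print(f"F({df_between},{df_within}) = {f_stat:.2f}, p = {p:.4f}")

    # follow-up pairwise comparisons to pick between the alternative hypotheses
    pairs = {"A vs B": (group_a, group_b),
             "A vs C": (group_a, group_c),
             "B vs C": (group_b, group_c)}
    for name, (x, y) in pairs.items():
        t, p_pair = stats.ttest_ind(x, y)
        print(f"{name}: t = {t:.2f}, p = {p_pair:.4f}")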


We covered much of this in our experimental design lecture

More than one factor
 Factors may be within or between
 Overall design may be entirely within, entirely between, or mixed

Many F-ratios may be computed
 An F-ratio is computed to test the main effect of each factor
 An F-ratio is computed to test each of the potential interactions between the factors
Factorial ANOVAs

Reporting your results

 The observed differences
  • Because there may be a lot of these, you may present them in a table instead of directly in the text

 Kind of design
  • e.g., “2 x 2 completely between factorial design”

 Computed F-ratios
  • May see separate paragraphs for each factor, and for interactions

 Degrees of freedom for the test
  • Each F-ratio will have its own set of df’s

 The “p-value” of the test
  • May want to just say “all tests were tested with an alpha level of 0.05”

 Any post-hoc or planned comparison results
  • Typically only the theoretically interesting comparisons are presented
Factorial ANOVAs
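
A sketch of a 2 x 2 completely between-subjects factorial ANOVA using statsmodels (hypothetical factors and data; anova_lm returns one F-ratio per main effect plus one for the interaction):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(3)

    # hypothetical 2 x 2 between-subjects design: treatment x task difficulty
    data = pd.DataFrame({
        "treatment": np.repeat(["drug", "placebo"], 20),
        "difficulty": np.tile(np.repeat(["easy", "hard"], 10), 2),
        "score": rng.normal(75, 8, 40),
    })

    # one F-ratio for each main effect and one for the interaction
    model = ols("score ~ C(treatment) * C(difficulty)", data=data).fit()
    print(sm.stats.anova_lm(model, typ=2))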