Inferential Statistics II

Confounding Variations
• Anticipating the direction of the change in
mean scores
• The similarity of the samples
• Random selection
• Sample size
• Multiple simultaneous t-tests
Direction
[Figure: sampling distribution with the anticipated mean shift and 5% (.05) in one tail]
In some cases it can be assumed that the difference between mean
scores will be a positive shift. When we give a lesson,
we expect test scores to rise from pretest to posttest.
Direction
[Figure: sampling distribution with the anticipated mean shift and 5% (.05) in one tail]
In some cases it can be assumed that the difference between mean
scores will be a negative shift. If we do conflict management
training, we would anticipate that the number of conflicts
would be reduced.
Direction
[Figure: sampling distribution with 5% (.05) marked in each tail; direction of the shift unknown]
What happens if you can’t anticipate which way the mean will shift?
Will canceling inter-mural sports affect
achievement positively or negatively?
Direction
[Figure: p < .05 split as 2.5% (.025) in each tail of the sampling distribution]
The possible means that would be considered significant must be
split between both ends of the sampling distribution: a two-tailed test
of significance. It is the researcher's job to justify whether a
significance test should be one-tailed or two-tailed.
EZAnalyze Results Report - Paired T-Test of Pretest with Posttest

                   Pretest    Posttest
Mean:              74.611     82.611
Std. Dev.:         13.349     11.850

N Pairs:           36
Mean Difference:   -8.000
SE of Diff.:       2.936
Eta Squared:       .171
T-Score:           2.724
P:                 .010
EZAnalyze always reports two-tailed results.
To compute a one-tailed result, divide the p value in half, provided
the difference falls in the direction you predicted.
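The halving rule can be sketched in a few lines of Python. This is a minimal illustration, not part of EZAnalyze: the function name `one_tailed_p` and its parameters are made up for this example, and it assumes a symmetric test statistic such as t.

```python
def one_tailed_p(two_tailed_p, observed_diff, predicted_sign):
    """Convert a two-tailed p value to a one-tailed p value.

    predicted_sign is +1 if an increase was predicted, -1 if a decrease.
    Halving is only valid when the observed difference is in the
    predicted direction; otherwise the one-tailed p is 1 - p/2.
    """
    half = two_tailed_p / 2
    in_predicted_direction = observed_diff * predicted_sign > 0
    return half if in_predicted_direction else 1 - half

# Values from the report above: pretest 74.611, posttest 82.611, P = .010,
# with a predicted rise from pretest to posttest.
p_one = one_tailed_p(0.010, observed_diff=82.611 - 74.611, predicted_sign=+1)
print(p_one)
```

If the scores had moved the opposite way from the prediction, the same call would return a value close to 1, which is why the direction has to be checked before halving.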
Using EZAnalyze for t-Tests
Similarity of Samples
• Paired—Significance is easier to demonstrate if the
two samples include exactly the same individuals. The
random error based on the respondents being different
is gone.
• Independent Samples—Significance is more difficult
to demonstrate if the two groups are dissimilar.
Random error that appears because the respondents
are different has to be accounted for.
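To see why pairing makes significance easier to demonstrate, the sketch below computes both t statistics on the same numbers. The scores are hypothetical and chosen only for illustration; the independent version uses an unpooled standard error, which matches the pooled form when the groups are the same size.

```python
import math
import statistics

# Hypothetical pre/post scores for the same six students (illustrative only).
pre = [70, 65, 80, 75, 60, 85]
post = [74, 70, 83, 80, 66, 88]

def paired_t(a, b):
    """Paired t: mean of the per-person differences over their standard error."""
    diffs = [y - x for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def independent_t(a, b):
    """Independent samples t, treating the two lists as unrelated groups."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

t_paired = paired_t(pre, post)
t_independent = independent_t(pre, post)
print(round(t_paired, 2), round(t_independent, 2))
```

The mean difference is identical in both calculations. Only the standard error changes: the paired version removes the person-to-person variation, so its t statistic is much larger.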
Random Selection
• With random sampling all members of the population to which you
wish to generalize have an equal chance of being in the sample.
• Scientific studies use true random sampling, which is also called
probability sampling (simple and stratified).
• When all of the members of a population do not have an equal chance
of being in the sample, it is called nonprobability sampling (samples of
convenience).
• If your sample is random you have to carefully explain how you made
it that way. (methods section)
• If the sample isn’t random then you have to work hard at showing that
your sample is not potentially dissimilar from the population.
(methods section)
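As a rough illustration of the two probability sampling approaches named above, the sketch below draws a simple random sample and a stratified random sample. The population, the `grade` field, and the helper `stratified_sample` are all hypothetical.

```python
import random

# Hypothetical population: 200 students, 120 in grade 4 and 80 in grade 5.
population = [{"id": i, "grade": 4 if i < 120 else 5} for i in range(200)]

rng = random.Random(42)  # fixed seed so the draw can be reported and repeated

# Simple random sample: every member has an equal chance of selection.
simple_sample = rng.sample(population, k=20)

def stratified_sample(people, key, fraction, rng):
    """Draw the same fraction at random from each stratum (here, each grade)."""
    strata = {}
    for person in people:
        strata.setdefault(person[key], []).append(person)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, k=round(len(members) * fraction)))
    return chosen

stratified = stratified_sample(population, "grade", 0.10, rng)
print(len(simple_sample), len(stratified))
```

Recording the seed and the procedure is one way to support the explanation the methods section calls for.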
Random Error
• Random normal variation in groups.
• Outside of the researcher's control.
• Inferential statistics deals with random error
really well.
• That is why groups should be formed
randomly.
• We will figure out how to deal with nonrandom error when we talk about validity.
Dealing with Non-Random Samples
• Carefully explain how the sample was formed in the
methods section.
• Carefully describe important elements of the context
of the study that support the idea (or not) that the
sample is like the population.
• Carefully explain how the analysis of data will be
done.
• List the possible effect of sampling procedures in the
limitations section of the conclusions.
Sample Sizes
[Figure: population distribution with sampling distributions of the mean for n=30, n=100, and n=1000]
As the sample size increases, the Sampling Distribution of the Mean gets narrower.
The standard error gets numerically smaller.
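The shrinking standard error follows from the usual formula, standard error = standard deviation / square root of n. A minimal sketch, assuming a hypothetical population standard deviation of 15:

```python
import math

population_sd = 15.0  # hypothetical population standard deviation

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

# The three sample sizes shown on the slide.
standard_errors = {n: standard_error(population_sd, n) for n in (30, 100, 1000)}
for n, se in standard_errors.items():
    print(f"n={n:>4}  SE={se:.2f}")
```

Because n sits under a square root, cutting the standard error in half takes roughly four times as many respondents.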
Effect Size
(Practical Significance)
• With large samples it is possible that significant
differences will appear from very small mean differences.
• When statistical significance appears, practical
significance can be reported by showing the mean
differences in units of standard deviation—not standard
error (remember z scores).
• The simplest calculation is to determine the distance
between the two mean scores and divide by the average
standard deviation. (Cohen’s d)
• Effect sizes over .5 are considered substantial.
Effect Size—Practical Significance
How many standard deviations is the new mean from the first mean?
Effect size of .2 is weak; .5 is moderate; .8 is strong
Practical Significance
The difference of the means in units of standard deviation
Table 1
Mean Scores on Johnson Problem Solving Inventory for Students With and Without
Conflict Resolution Training.

          Pre-Test    Post-Test
          N = 36      N = 36
Mean      74.61       82.61*
SD        13.35       11.85
* = p < .01
Difference in means: 74.61 - 82.61= -8
Average standard deviation: (13.35 + 11.85)/2 = 12.6
Practical significance: -8/12.6 = -.63
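The same arithmetic can be written as a small function. This sketch follows the slide's simplified method of dividing by the average of the two standard deviations; the name `effect_size` is made up here, and textbook versions of Cohen's d often use a pooled standard deviation instead, which gives nearly the same answer for these numbers.

```python
def effect_size(mean_a, sd_a, mean_b, sd_b):
    """Mean difference in units of the average standard deviation."""
    average_sd = (sd_a + sd_b) / 2
    return (mean_a - mean_b) / average_sd

# Pre-test and post-test values from Table 1.
d = effect_size(74.61, 13.35, 82.61, 11.85)
print(round(d, 2))  # -0.63, matching the hand calculation
```

The sign only reflects the order of subtraction (pre minus post). The size of the effect, about .63 standard deviations, is what gets compared with the .2, .5, and .8 benchmarks.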
Only report practical significance if the mean differences are statistically
significant to begin with.
Research Design and Analysis
Research Design
Groups by Treatment (Independent Variable)
Data Gathering Events (Dependent Variable)
Did direct instruction improve students’ ability to recall math facts?

Independent              Dependent
DI to 4th Grade Class    Pre-Test (group data)    Post-Test (group data)

Analysis: t-test, paired if possible
Do students who receive DI achieve better than those who don’t?

Independent                            Dependent
DI to 4th Grade Class                  Test (group data)
Non-DI to different 4th Grade Class    Test (group data)

Analysis: t-test, independent samples
Do students who receive DI for math facts retain learning over the summer?

Independent              Dependent
DI to 4th Grade Class    Pre-Test (group data)    Post-Test (group data)    Post-Post-Test (group data)

Analysis: repeated measures ANOVA
Which instructional strategy works better for teaching math facts?

Independent    Dependent
DI             Test (group data)
Cooperative    Test (group data)
Inquiry        Test (group data)

Analysis: single factor ANOVA
Multiple Tests Simultaneously
• When multiple (more than two) groups are to be
compared on the same measure, it is not
appropriate to test each pair separately. The
comparisons are not independent.
• Use an Analysis of Variance (ANOVA) instead.
• An ANOVA only tells you whether significant differences
exist between at least two groups. It does not tell
you which group pairs differ. A post hoc analysis is
necessary to figure out which group differences
are significant.
• Download OWM Data from the site
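For readers who want to see what a single factor ANOVA computes, here is a minimal sketch of the F statistic: variation between group means divided by variation within groups. The scores are hypothetical, and the function `one_way_anova_f` is an illustration, not EZAnalyze's implementation.

```python
import statistics

# Hypothetical test scores for three instructional groups (illustrative only).
groups = {
    "DI": [82, 85, 88, 90, 79],
    "Cooperative": [75, 78, 80, 74, 77],
    "Inquiry": [70, 72, 68, 75, 71],
}

def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a single factor ANOVA."""
    all_scores = [x for s in samples for x in s]
    grand_mean = statistics.mean(all_scores)
    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples)
    # Within-groups sum of squares: how far each score is from its own group mean.
    ss_within = sum((x - statistics.mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_scores) - len(samples)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f_stat, df_b, df_w = one_way_anova_f(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F says only that at least two group means differ somewhere. Which pairs differ is still a question for the post hoc test.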
ANOVA in EZAnalyze
• Single factor compares different groups on a
single measure.
• Repeated measures compares a single group on
multiple uses of a single measure.
[Screenshot callouts: significance of the whole ANOVA; post hoc comparisons of pre, post, and delayed post]
ANOVA Post Hoc Tests
• Use a Tukey HSD (honestly significant
difference) to compute multiple mean
differences.
• Accurate with groups of equal size.
• Conservative with unequal variance.
• Estimate by doing multiple t-tests.
Factorial ANOVA
Did direct instruction improve students’ ability to recall math facts?
Factorial ANOVA: two independent variables simultaneously, one dependent variable.

                                       Girls         Boys
DI to 4th Grade Class                  group data    group data
Non-DI to different 4th Grade Class    group data    group data

• You will need three columns in Excel.
• The first will be the respondent number.
• The second will indicate which of the four (or more) groups a
score represents. In our case this is DI Girls, DI Boys, Non-DI
Girls, and Non-DI Boys.
• The third column will have the score for each individual.
• Use a single factor ANOVA.
• If significant you will have 6 comparisons to examine post hoc.
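The count of 6 post hoc comparisons comes from pairing every group with every other group. The sketch below lists the pairs and shows roughly how fast the chance of a false positive grows if each pair were tested separately at .05. The independence assumption in that calculation is an approximation, and the Bonferroni adjustment shown is a common conservative alternative that this deck does not cover.

```python
from itertools import combinations

groups = ["DI Girls", "DI Boys", "Non-DI Girls", "Non-DI Boys"]
pairs = list(combinations(groups, 2))  # every distinct pair of groups

alpha = 0.05
# Chance of at least one false positive across all pairs, if each pair were
# tested at alpha and the tests were independent (an approximation).
familywise_rate = 1 - (1 - alpha) ** len(pairs)
# Bonferroni adjustment (not from the slides): a stricter per-comparison alpha.
bonferroni_alpha = alpha / len(pairs)

for a, b in pairs:
    print(f"{a} vs {b}")
print(len(pairs), round(familywise_rate, 3), round(bonferroni_alpha, 4))
```

With four groups the approximate chance of at least one false positive is above one in four, which is the practical reason for running one ANOVA first and a post hoc test second.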
Things to remember…
• You have to figure out which t-test to use by
judging the similarity of the groups.
• Decide if your comparison should be one-tailed or
two.
• If you are comparing more than two groups
simultaneously you have to use an ANOVA not a
t-test.
• Compute effect sizes, particularly if the groups are
large.
• Be random when you can.
Exercise
• Go to the Variable Exercise sheet on the Web site.
• Identify the independent and dependent variables
for all of the studies.
• Pick one of the studies. Design a study following
the prompts on the page.
Excel Again
• Download the data set called Reading Data
• Students were asked about the amount of time
they spent each week reading online, reading
for pleasure (not online), and reading for
homework. Is there a significant difference
among those reported times?
Being Wrong
[Figure: sampling distribution showing the test group mean and the 5% (.05) region]
• We say that a result that would occur by chance less than 5% of the
time is so unlikely that it isn’t random. But when there is no real
difference, that statement will be wrong 5% of the time.
• Type 1 Error: Saying it is not random when it was. (A
false positive)
Being Wrong
[Figure: sampling distribution showing the test group mean and the 5% (.05) region]
• We say that a result that would occur by chance more than 5% of the time
is too likely, so chance is the best explanation. But
sometimes real differences occur even though they look
like chance.
• Type 2 Error: Saying it is random when it was not. (A false
negative)
Reducing Being Wrong
• Reduce Type 1 errors by lowering the alpha level or using
more conservative calculations.
• Reduce Type 2 errors by increasing the sample size.
• Reduce all errors by improving the study design (validity).
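A short simulation can make the Type 1 error rate concrete. In this sketch both groups are drawn from the same hypothetical population, so every "significant" result is a false positive. To keep it to the standard library it uses a z-test with a known standard deviation rather than a t-test, and all of the numbers are made up for illustration.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(0)           # fixed seed so the run is repeatable
alpha, n, trials = 0.05, 30, 2000
mu, sigma = 75.0, 12.0           # hypothetical population mean and SD
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff, about 1.96

false_positives = 0
for _ in range(trials):
    # Both samples come from the same population, so any "significant"
    # difference here is a Type 1 error.
    a = [rng.gauss(mu, sigma) for _ in range(n)]
    b = [rng.gauss(mu, sigma) for _ in range(n)]
    z = (mean(a) - mean(b)) / (sigma * (2 / n) ** 0.5)
    if abs(z) > z_crit:
        false_positives += 1

type1_rate = false_positives / trials
print(type1_rate)
```

The printed rate should land near the alpha level. Setting `alpha` to .01 and rerunning shows the false positive rate dropping with it, which is the first bullet above in action.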