PSY 5100/5110 Lecture 10

Lecture 10 – Single Factor Designs
Factor
New name for nominal/categorical independent variable
In the ANOVA literature, IVs are called Factors.
Values of a factor are called Levels of the factor.
So, a Factor is a nominal (aka categorical) independent variable.
One Factor design: Research involving only one nominal IV, i.e., one factor
Three general types of design
1. Between subjects, no matching
Different groups of participants. No attempt to match
2. Between-subjects, participants matched.
With two groups, fairly easy. With more than two groups, gets harder.
The matching variable must be correlated with the DV.
3. Within-subjects design, same people serve at all levels of the factor.
These are sometimes called repeated measures designs.
This should seem familiar, because it’s the same trichotomy we encountered in the Comparing Two Groups
lecture.
Single Factor Designs - 1
2/8/2016
The Various Tests Comparing K Research Conditions
by Design and Dependent Variable Characteristics

                                 Dependent Variable Characteristics
Design                           Interval / Ratio                Ordinal           Categorical
-------------------------------  ------------------------------  ----------------  ---------------------
Independent Groups /             US: One-way Between Subjects    Kruskal-Wallis    Crosstabulation with
Between Subjects                 Analysis of Variance                              Chi-square Test
                                 Skewed: Kruskal-Wallis
Matched Participants or          Repeated Measures ANOVA         Friedman ANOVA    Advanced analyses
Within-subjects /
Repeated Measures
If this looks familiar, it should. It’s the same table presented in Lecture 9 on Two group comparisons, except
that it now covers comparisons of two or more groups.
One-Way Between-subjects Analysis of Variance
Comparing the means of 3 or more groups.
Suppose there are three groups – Group A, Group B, and Group C.
Why not just perform multiple t-tests?
t-test comparing Mean of Group A with Mean of Group B
t-test comparing mean of Group A with Mean of Group C
t-test comparing mean of Group B with Mean of Group C
The above 3 t-tests exhaust the possible comparisons between 3 groups.
Problem with the above method: Running several separate tests inflates the chance of a Type I error, making it
very difficult to compute a correct overall p-value and thus difficult to use the method in hypothesis testing.
What is needed is a single omnibus test.
Such a test was provided by Sir R. A. Fisher. It’s based on the following idea
Consider 3 populations whose means are all equal:
Now consider samples from each of those populations . . .

[Figure: dot plots of the three samples, overlapping considerably]

Finally, consider the means of the three samples . . .

[Figure: the three sample means, clustered close together]

Sample means not very variable, so N* S2X-bar about equals σ2.
Now think about the variability in the above dots.
Within-group variability
There is variability of the individual scores in each sample.
The variance of scores within each sample would be an estimate of the population variance, σ2.
So the average of the 3 sample variances, (S12 + S22 + S32)/3, would be a really good estimate of σ2.
Between-group variability
But there is more variability in the above situation. There is variability of the sample means, S2X-bar.
The variability of the means S2X-bar would be equal to σ2/N from our study of the standard error of the mean.
Equivalently, N* S2X-bar would be approximately equal to σ2.
That is, N times the variance of the sample means would be about equal to the population variance.
So, we have two estimates of the population variance in the above scenario.
1) The estimate based on the average of the variances of the three samples.
2) The estimate based on the variance of the sample means.
When the samples are from populations with equal means, the two estimates will be about equal.
Now consider 3 populations whose means are NOT equal.
Now consider samples from each of these populations . . .

[Figure: dot plots of the three samples, centered at different locations]

Now consider the means of those samples . . .

[Figure: the three sample means, spread far apart]

Sample means ARE very variable, so N* S2X-bar is > σ2.
Note that the variability of the individual scores within each sample is about the same as above.
BUT, note that when the population means are not equal, the means of samples from those populations are
quite variable, much more so than they were when the population means were equal.
This means that in this case S2X-bar would be LARGER THAN σ2/N.
Equivalently, it means that N* S2X-bar would be LARGER THAN σ2.
This suggests that N* S2X-bar is an indicator of whether or not the population means are equal.
If the means were equal, N* S2X-bar would be about equal to σ2.
But if the means were not equal, N* S2X-bar would be larger than σ2.
Since the variability of individual scores within samples was the same in both situations,
Fisher proposed the ratio: N* S2X-bar / Mean of the individual sample variances as a test statistic.
              N * S2X-bar                N times the variance of the sample means
F = ------------------------------- = ---------------------------------------------
       Mean of sample variances              Mean of the sample variances
If the population means are equal, F will be about equal to 1.
If the population means are not equal, F will be larger than 1.
Fisher computed the sampling distribution of F and proposed it as an omnibus test of the equality of population
means. (He did not name the statistic F after himself.)
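Fisher's two variance estimates can be illustrated with a short sketch. The three samples below are made up for illustration; with equal group sizes, the ratio of N times the variance of the sample means to the mean of the sample variances is exactly the F that a one-way ANOVA routine reports.

```python
import numpy as np
from scipy import stats

# Three illustrative samples of equal size N = 5
groups = [np.array([4.1, 5.0, 4.6, 5.3, 4.8]),
          np.array([5.9, 6.4, 5.5, 6.8, 6.1]),
          np.array([7.2, 6.9, 7.8, 7.0, 7.6])]
n = len(groups[0])

means = np.array([g.mean() for g in groups])
between = n * means.var(ddof=1)                      # N * variance of the sample means
within = np.mean([g.var(ddof=1) for g in groups])    # mean of the sample variances

F = between / within

# scipy's one-way ANOVA gives the same F for equal-n groups
F_scipy, p = stats.f_oneway(*groups)
```

Because these three samples were drawn around different centers, F comes out much larger than 1, just as the lecture's argument predicts.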
Specifics of the One-Way Between-subjects Analysis of Variance
The research design employs two or more independent conditions (no pairing).
The groups are identified by different levels of a single factor.
The dependent variable is interval / ratio scaled.
The distribution of scores within groups is unimodal and symmetric.
Variances of the populations being compared are equal.
Hypotheses:
H0: All population means are equal
H1: At least one inequality is present.
Test Statistic:
        Estimate of population variance based on differences between sample means
F = -------------------------------------------------------------------------------------
     Estimate of population variance based on differences between individual scores within samples
Likely values if null is true
Values around 1
Likely values if null is false.
Values larger than 1
Example problem
Michelle Hinton Watson, a '95 graduate of the program, interviewed employees and former employees of a local
company, Company X (cxfile.sav in In-class Datasets). A set of 7 questions assessing overall job satisfaction
was given to all respondents. She interviewed 107 persons who had left the company prior to her
contacting them. She also interviewed 49 persons who left the company within a year after she contacted
them, and 51 persons who were still working for the company a year after the initial contact. The interest
here is in whether the three groups are distinguished by their average job satisfaction – persons who had
previously left the company, persons who left after the initial interview, or persons who stayed with the
company after the initial interview.
Specifying the analysis
Analyze -> General Linear Model -> Univariate
Click on the Plot… button to specify a graph of means.
Click on the Post Hoc… button to specify Post Hoc comparisons of means.
Click on the Options… button to specify descriptive statistics and estimates of effect size.
Specifying a plot of means –
Specifying Post Hocs
If the overall F statistic is significant, post hoc comparisons are often used to determine exactly which pairs of
means are significantly different.
Post Hoc tests vary on a dimension of liberalness vs conservativeness.
Liberal Test
Tends to find differences, some of them Type I errors
Most Powerful – able to find small differences
For Affordable Care Act
Conservative Test
Tends to not find differences, even those that exist
Least powerful, unable to see small differences
Supports Big Business
The LSD test is the most liberal.
The Scheffé test is the most conservative.
The Tukey-b test is a compromise between the above two extremes.
LSD --------------------------------------------------- Tukey-b------------------------------------------------------ Scheffé
Strategy:
If a conservative test rejects the null, most likely a difference.
If a liberal test fails to reject the null, most likely not a difference.
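As a sketch of how pairwise post hoc comparisons can be computed outside SPSS: scipy (version 1.8 and later) implements the Tukey HSD, a compromise-level test related to the Tukey-b discussed above. The data here are made up for illustration.

```python
import numpy as np
from scipy import stats

# Three illustrative independent samples
a = np.array([4.1, 5.0, 4.6, 5.3, 4.8])
b = np.array([5.9, 6.4, 5.5, 6.8, 6.1])
c = np.array([7.2, 6.9, 7.8, 7.0, 7.6])

res = stats.tukey_hsd(a, b, c)
# res.pvalue[i, j] is the adjusted p-value for the comparison of group i vs group j
print(res.pvalue)
```

Each off-diagonal entry of the p-value matrix answers the question the post hocs are meant to answer: exactly which pairs of means differ.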
The Options: Specifying Printing of Effect Size and Observed Power:
The output
The p-value of .000 for Levene's test of equality of error variances means that we should be particularly cautious
when interpreting the comparisons of means that follow.
We should inspect distributions for each group. We should also consider a nonparametric test of equality of
location, which I will do.
For this semester, ignore the “Corrected Model” and the “Intercept” lines.
Partial Eta squared:
This is the effect size for one way ANOVA. See effect sizes for ANOVA in Power lecture for a
characterization. Recall: Eta2 = .01 for small; Eta2=.059 for medium; Eta2 = .138 for large.
Eta2 = .262 means we have a SuperSized with Fries effect size.
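Partial eta squared is just the effect sum of squares divided by the sum of the effect and error sums of squares. A minimal sketch with hypothetical SS values (not taken from the lecture's output table) that happen to produce an effect size in the "large" range:

```python
# Partial eta squared = SS_effect / (SS_effect + SS_error)
# The SS values below are hypothetical, chosen only for illustration.
ss_effect = 52.0
ss_error = 146.5

partial_eta_sq = ss_effect / (ss_effect + ss_error)
print(round(partial_eta_sq, 3))  # 0.262 -> large by the benchmarks above
```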
Observed Power
Observed power is the power the test would have if the population means were as different as the sample means.
The value, 1.000, means that if the population means were as different as the sample means, and many
independent tests of the null hypothesis of equality of population means were run, the F would be significant in
about 100% of those tests.
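Observed power can be sketched with the noncentral F distribution: compute the critical F for the test's degrees of freedom, then ask how often a noncentral F (with the sample-based noncentrality parameter) exceeds it. The df and noncentrality values below are illustrative, not SPSS's internal computation.

```python
from scipy import stats

# Illustrative values: 3 groups, N = 207 -> df1 = 2, df2 = 204
df1, df2 = 2, 204
nc = 70.0                                      # hypothetical sample-based noncentrality
f_crit = stats.f.ppf(0.95, df1, df2)           # critical F at alpha = .05
power = stats.ncf.sf(f_crit, df1, df2, nc)     # P(F > f_crit | noncentrality nc)
```

With a noncentrality this large, the computed power is essentially 1.000, matching the interpretation above: nearly every replication would reject the null.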
Homogeneous Subsets
If two means are in the same column, they are not significantly different.
If two means appear only in different columns, they ARE significantly different.
Profile Plots
A picture is worth 1000 words.
Kruskal-Wallis One way Analysis of Variance by Ranks
The research design employs two or more independent conditions (no pairing).
The groups are distinguished by different levels of the independent variable.
The dependent variable is ordinal or better.
This test is also used when the DV is interval/ratio scaled but the distributions within groups are skewed or
have unequal variances between groups.
Hypotheses:
H0: All population locations are equal
H1: At least one inequality is present.
From Howell, D. (1997). Statistical Methods for Psychology. 4th Ed. p. 658. "It tests the hypothesis that all
samples were drawn from identical populations and is particularly sensitive to differences in central tendency."
Test Statistic:
Kruskal-Wallis H statistic. The probability distribution of the H statistic when the null is true is the Chi-square
distribution with degrees of freedom equal to the number of groups being compared minus 1.
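A minimal sketch of the test with scipy (illustrative data, not the Company X scores): the returned H statistic is referred to a chi-square distribution with k - 1 degrees of freedom.

```python
from scipy import stats

# Three illustrative independent samples (k = 3, so df = 2)
a = [2.1, 3.0, 2.6, 3.3, 2.8, 2.4]
b = [3.9, 4.4, 3.5, 4.8, 4.1, 3.7]
c = [5.2, 4.9, 5.8, 5.0, 5.6, 5.3]

H, p = stats.kruskal(a, b, c)   # H referred to chi-square with 3 - 1 = 2 df
```

Because the three samples' ranks barely overlap here, H is large and p is small.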
Example problem
(Same problem as above).
The interest here is on whether the three groups are distinguished by their overall job satisfaction – persons who
had previously left the company, persons who left after the initial interview or persons who stayed with the
company after the initial interview.
This test is appropriate since the variances were not homogenous in the above analysis.
Specifying the analysis
Analyze -> Nonparametric tests -> Legacy Dialogs -> K Independent Samples
Put the name(s) of the dependent variable(s) in this box.
Click on this button to invoke the dialog box below. Put the minimum group no. and maximum group no. in the two boxes.
The Results

Kruskal-Wallis Test

Ranks
        finaldest                        N    Mean Rank
ovsat   .00 Left Co. before Q given    107      74.59
        1.00 Left Co. after Q given     49     142.67
        2.00 Stayed w. Co.              51     128.55
        Total                          207

Ranks are from smallest to largest, so group 0 appears to have the smallest scores.

Test Statistics(a,b)
               ovsat
Chi-Square    54.996
df                 2
Asymp. Sig.     .000

a. Kruskal Wallis Test
b. Grouping Variable: finaldest

The Asymp. Sig. value is the probability of a chi-square value as large as the obtained value of 54.996 if the null hypothesis of equal distributions were true.
Alas, there are no post-hoc tests of which I’m aware for the Kruskal-Wallis situation. Some investigators will
follow up with Mann-Whitney U-tests, using that test as a substitute for a true post-hoc test.
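That informal follow-up strategy can be sketched as pairwise Mann-Whitney U tests; a Bonferroni adjustment (multiplying each p-value by the number of comparisons) is one common way to guard against the Type I error inflation such unplanned pairwise tests invite. The data below are illustrative, not the Company X scores.

```python
from itertools import combinations
from scipy import stats

# Three illustrative independent samples
groups = {"A": [2.1, 3.0, 2.6, 3.3, 2.8, 2.4],
          "B": [3.9, 4.4, 3.5, 4.8, 4.1, 3.7],
          "C": [5.2, 4.9, 5.8, 5.0, 5.6, 5.3]}

n_tests = 3  # number of pairwise comparisons among 3 groups
for g1, g2 in combinations(groups, 2):
    u, p = stats.mannwhitneyu(groups[g1], groups[g2], alternative="two-sided")
    print(g1, "vs", g2, "adjusted p =", min(p * n_tests, 1.0))
```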
Chi-Square Analysis of a
Dichotomous Dependent Variable
The research design employs two or more independent conditions (no pairing).
The groups are distinguished by categories of an independent variable or factor.
The dependent variable is categorical. This test may be used when the DV is interval/ratio scaled or ordinal but
you are uncomfortable with the numeric values. But you definitely should not categorize a variable that can
be analyzed as a quantitative variable. You should categorize only in emergencies. It represents the most
conservative assumption you can make about your dependent variable: that its values are only categorizable into
High and Low.
Hypotheses:
H0: Percentages in each category are equal across populations
H1: At least one inequality is present.
Test Statistic:
Two-way chi-square. If the null is true, its probability distribution is the Chi-square distribution with degrees of
freedom equal to the product of (No. of DV categories - 1) x (No. of Groups -1).
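A minimal sketch of the two-way chi-square with scipy. The cell counts below are hypothetical (only the group sizes echo the example that follows); with 2 DV categories and 3 groups, df = (2 - 1) x (3 - 1) = 2.

```python
import numpy as np
from scipy import stats

# Hypothetical crosstabulation: rows = DV categories (Low, High),
# columns = the three groups
table = np.array([[70, 15, 16],
                  [37, 34, 35]])

chi2, p, df, expected = stats.chi2_contingency(table)
# df = (rows - 1) * (columns - 1) = 1 * 2 = 2
```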
Example problem
Same problem as above.
Each OVSAT score was categorized as 0 if it was less than or equal to the median of all the OVSAT scores or
1 if it was greater than the overall median. This is called performing a median split.
The categorized variable is called SATGROUP.
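The median split described above can be mirrored in a few lines of Python; `ovsat` here is a made-up array standing in for the 207 satisfaction scores, so the counts will not match the SPSS output below.

```python
import numpy as np

# Simulated stand-in for the 207 OVSAT scores (illustrative only)
rng = np.random.default_rng(1)
ovsat = rng.uniform(1, 7, size=207)

median = np.median(ovsat)
satgroup = np.where(ovsat <= median, 0, 1)   # 0 = at or below the median, 1 = above
```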
frequencies variable=ovsat /sta=median.

Frequencies

Statistics
ovsat
N         Valid       207
          Missing       0
Median               3.8571

recode ovsat (lowest thru 3.8571=0)(else=1) into satgroup.
frequencies variable=satgroup.

satgroup
                 Frequency    Percent    Valid Percent    Cumulative Percent
Valid   .00          101        48.8          48.8                48.8
        1.00         106        51.2          51.2               100.0
        Total        207       100.0         100.0
Specifying the analysis
Analyze -> Descriptive Statistics -> Crosstabs
Put the dependent variable in the Row(s) box.
Put the independent variable in the Column(s) box.
Click on the "Cells" button to invoke this dialog box. Check "Column" percentages.
Click on the “Statistics” button and check
the Chi-square box.
The Results
Crosstabs
All three tests – analysis of variance, Kruskal-Wallis, and chi-square – resulted in the same conclusion,
suggesting that there are significant differences between the satisfaction scores of the three groups. It appears
that members of group 0 – those that had left prior to the interview – were generally least satisfied.
One way Repeated Measures ANOVA
In Mike Clark's thesis, three versions of the Big Five questionnaire were given to participants under three
instructional conditions . . .
1) Honest: Respond honestly
2) Dollar: Respond honestly, but participants who score highest will be entered into a drawing
3) Instructed: Respond to maximize your chances of obtaining a customer service job.
These three conditions are called the Honest, Dollar, and Instructed - H, D, and I - conditions respectively.
The question here concerns the mean score on the Conscientiousness scale across the three conditions. If
the participants were not paying attention to the instructions, then we’d expect the means to be equal. But if
participants faked in the second two conditions, we’d expect an increase in Conscientiousness scores across the
three conditions. The data are in G:\MdbR\Clark\ClarkDataFiles\ClarkAndNewDataCombined070710.sav
The data should be in 3 columns – an H column, a D column, and an I column.
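Although the lecture runs the analysis in SPSS, the repeated measures F can be computed by hand from those three columns: partition the total sum of squares into condition, subject, and error components. The data matrix below is simulated (the Clark data are not reproduced here), with one row per participant and one column per condition.

```python
import numpy as np
from scipy import stats

# Simulated stand-in data: n participants, k = 3 conditions (H, D, I)
rng = np.random.default_rng(0)
n, k = 30, 3
base = rng.normal(5.0, 0.8, size=(n, 1))                       # subject baselines
scores = base + np.array([0.0, 0.4, 1.0]) + rng.normal(0, 0.4, size=(n, k))

grand = scores.mean()
ss_cond = n * np.sum((scores.mean(axis=0) - grand) ** 2)       # between conditions
ss_subj = k * np.sum((scores.mean(axis=1) - grand) ** 2)       # between subjects
ss_total = np.sum((scores - grand) ** 2)
ss_error = ss_total - ss_cond - ss_subj                        # condition x subject residual

df_cond, df_error = k - 1, (n - 1) * (k - 1)
F = (ss_cond / df_cond) / (ss_error / df_error)
p = stats.f.sf(F, df_cond, df_error)
```

Removing the subject sum of squares from the error term is what makes the within-subjects design more powerful than a between-subjects analysis of the same scores.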
Analysis
Menu sequence: Analyze -> General Linear Model -> Repeated Measures
Enter a name for the Repeated Measures factor here. Enter the number of levels of the factor. Click the [Add] button.
Highlight the name of one of the variables to be included in the analysis and then click on the [>] button.
Click on the [Plots] button in the main dialog box and put the name of the repeated measures factor as the Horizontal Axis of the plot.
Click on the [Options] button in the main dialog box and check the three boxes shown below.
The Output
General Linear Model
[DataSet1] G:\MdbR\Clark\ClarkDataFiles\ClarkAndNewDataCombined070710.sav
Within-Subjects Factors
Measure: MEASURE_1
condit    Dependent Variable
1         hc
2         dc
3         ic

Descriptive Statistics
        Mean      Std. Deviation      N
hc     4.4029         .92630        249
dc     4.7979        1.05333        249
ic     5.4779         .96747        249
The GLM procedure first prints Multivariate Tests of the hypothesis of no difference between means. These
tests are robust with respect to violations of the various assumptions of the analysis, although they are less
powerful than the tests below when those tests' assumptions are met.
Multivariate Tests(c)
Effect: condit
                      Value        F          Hypothesis df   Error df   Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Pillai's Trace         .471    110.015(b)        2.000        247.000    .000          .471               220.031               1.000
Wilks' Lambda          .529    110.015(b)        2.000        247.000    .000          .471               220.031               1.000
Hotelling's Trace      .891    110.015(b)        2.000        247.000    .000          .471               220.031               1.000
Roy's Largest Root     .891    110.015(b)        2.000        247.000    .000          .471               220.031               1.000

a. Computed using alpha = .05
b. Exact statistic
c. Design: Intercept; Within Subjects Design: condit
Mauchly’s test should be nonsignificant. If it is significant, as it is below, then the most powerful test, labeled
“Sphericity Assumed” below, should not be reported.
Mauchly's Test of Sphericity(b)
Measure: MEASURE_1
Within Subjects Effect: condit

                                                            Epsilon(a)
Mauchly's W   Approx. Chi-Square   df   Sig.   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
   .909             23.571          2   .000          .917              .923          .500

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept; Within Subjects Design: condit
Since Mauchly’s test was significant, only the last 3 tests below should be used. It happens, though, that for
these data, all tests give the same result, so in this particular case, it doesn’t make a difference.
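The corrected tests are simple to reproduce: the Greenhouse-Geisser correction multiplies both degrees of freedom by epsilon before the p-value is looked up. A sketch using the epsilon reported for these data (about .917), with k = 3 conditions and n = 249 participants; the small differences from SPSS's printed df come only from rounding epsilon.

```python
from scipy import stats

k, n = 3, 249
eps = 0.917           # Greenhouse-Geisser epsilon (rounded, from the output)
F = 121.599           # the F statistic is unchanged by the correction

df1 = eps * (k - 1)             # about 1.834 instead of 2
df2 = eps * (k - 1) * (n - 1)   # about 454.8 instead of 496
p = stats.f.sf(F, df1, df2)     # p-value from the corrected df
```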
Tests of Within-Subjects Effects
Measure: MEASURE_1

Source                               Type III Sum of Squares     df       Mean Square      F        Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
condit          Sphericity Assumed          147.241              2           73.620     121.599    .000          .329                 243.197              1.000
                Greenhouse-Geisser          147.241              1.833       80.321     121.599    .000          .329                 222.909              1.000
                Huynh-Feldt                 147.241              1.846       79.756     121.599    .000          .329                 224.487              1.000
                Lower-bound                 147.241              1.000      147.241     121.599    .000          .329                 121.599              1.000
Error(condit)   Sphericity Assumed          300.297            496             .605
                Greenhouse-Geisser          300.297            454.622         .661
                Huynh-Feldt                 300.297            457.840         .656
                Lower-bound                 300.297            248.000        1.211

a. Computed using alpha = .05
Tests of Within-Subjects Contrasts (Ignore this for class.)
Measure: MEASURE_1

Source          condit       Type III Sum of Squares    df    Mean Square       F        Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
condit          Linear             143.866                1     143.866      217.487     .000         .467                 217.487              1.000
                Quadratic            3.374                1       3.374        6.142     .014         .024                   6.142               .695
Error(condit)   Linear             164.050              248        .661
                Quadratic          136.246              248        .549

a. Computed using alpha = .05
Tests of Between-Subjects Effects (Ignore for this situation.)
Measure: MEASURE_1
Transformed Variable: Average

Source       Type III Sum of Squares    df    Mean Square        F         Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Intercept         17883.568              1     17883.568     10565.248     .000         .977                10565.248             1.000
Error               419.784            248         1.693

a. Computed using alpha = .05
Profile Plots
Again, worth 1000 words.
Mean Conscientiousness scores increased
significantly from 1 (Honest) to 2
(Dollar) to 3 (Instructed) conditions. The
participants responded to the instructions
in the expected fashion.