PSY 5130 Lecture 9 - Qualitative Independent Variables

Lecture 9 Qualitative Independent Variables
Comparing means using Regression
(I don’t need no stinkin’ ANOVA)
In linear regression analysis, the dependent variable should always be a continuous variable. The same
restriction does not apply to the independent variables, however.
This lecture shows how qualitative variables – variables whose values represent different groups of people, not different quantities – are incorporated into regression analyses, allowing comparison of the means of the groups.
We’ll discover that IVs with only 2 values can be treated as if they are continuous IVs in any regression.
But IVs with 3 or more values must be treated specially. Once that’s done, they also can be included in
regression analyses.
Regression with a single two-valued (dichotomous) predictor
Any two-valued independent variable can be included in a simple or multiple regression analysis. The
regression can be used to compare the means of the two groups, yielding the same conclusion as the equal-variances independent groups t-test.
Suppose the performance of two groups trained using different methods is being compared. Group 1 was
trained using a Lecture only method. Group 2 was trained using a Lecture+CAI method. Performance was
measured using scores on a final exam covering the material being taught. So, the dependent variable is PERF
– performance in the final exam. The independent variable is TP – Training program: Lecture only vs.
Lecture+CAI.
The data follow:

ID  TP  PERF     ID  TP  PERF     ID  TP  PERF     ID  TP  PERF
 1   1    37     13   1    57     25   1    37     38   2    56
 2   1    69     14   1    50     26   2    53     39   2    61
 3   1    64     15   1    58     27   2    62     40   2    62
 4   1    43     16   1    65     28   2    56     41   2    72
 5   1    37     17   1    48     29   2    61     42   2    46
 6   1    54     18   1    34     30   2    63     43   2    64
 7   1    52     19   1    44     31   2    34     44   2    60
 8   1    40     20   1    58     32   2    56     45   2    58
 9   1    61     21   1    45     33   2    54     46   2    73
10   1    48     22   1    35     34   2    60     47   2    57
11   1    44     23   1    45     35   2    59     48   2    53
12   1    65     24   1    52     36   2    67     49   2    43
                                  37   2    42     50   2    61
How should the groups be coded?
In the example data, Training program (TP) was coded as 1 for the Lecture method and 2 for the L+CAI
method. But any two values could have been used. For example 0 and 1 could have been used. Or, 3 and 47
could have been used. When the IV is a dichotomy, the specific values used to represent the two groups formed
by the two values of the IV are completely arbitrary.
When one of the groups has whatever the other has plus something else, my practice is to give it the larger of
the two values, often 0 for the group with less and 1 for the group with more.
When one is a control and the other is an experimental group, my practice is to use 0 for the control and 1 for
the experimental.
Visualizing regressions when the independent variable is a dichotomy.
When an IV is a dichotomy, the scatterplot takes on an unusual appearance. It will be two columns of points,
one over one of the values of the IV and the other over the other value. It can be interpreted in the way all
scatterplots are interpreted, although if the values of the IV are arbitrary, the sign of the relationship may not be
a meaningful characteristic. For example, in the following scatterplot, it would not make any sense to say that
performance was positively related to training program. It would make sense, however, to say that performance
was higher in the Lecture+CAI program than in the Lecture-only program.
In the graph of the example data, the best fitting straight line has been drawn through the scatterplot. When the
independent variable is a dichotomy, the line will always go through the mean value of the dependent variable
at each of the two independent variable values.
We’ll notice that the regression coefficient, the B value, for Training Program is equal to the difference between
the means of performance in the two programs. This will always be the case if the values used to code the two
groups differ by one (1 vs. 2 in this example).
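For reference, a scatterplot like the one described here can be requested with a short GRAPH command. A minimal sketch, assuming the TP and PERF variables from the data above:

* Scatterplot of PERF (vertical axis) against the dichotomous TP (horizontal axis).
GRAPH
  /SCATTERPLOT(BIVAR)=TP WITH PERF.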
[Scatterplot of PERF (30 to 80) against TP (L Only vs. L+CAI), with the best-fitting line passing through the mean PERF of each method.]
SPSS Output and its interpretation.
Regression
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .374a   .140       .122                9.67
a. Predictors: (Constant), TP

R-square is the proportion of variance in Y related to differences between the groups. Some say that R-square is the proportion of variance related to group membership. So in this example, 14% of the variance of Y is related to group membership.
ANOVAb
Model 1       Sum of Squares   df   Mean Square   F       Sig.
Regression    729.620          1    729.620       7.795   .007a
Residual      4492.880         48   93.602
Total         5222.500         49
a. Predictors: (Constant), TP
b. Dependent Variable: PERF

As was the case with simple regression with a continuous predictor, the information in the ANOVA summary table is redundant with the information in the Coefficients box below.
Coefficientsa
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B        Std. Error           Beta    t       Sig.
(Constant)    42.040   4.327                        9.716   .000
TP            7.640    2.736                .374    2.792   .007
a. Dependent Variable: PERF
Interpretation of (Constant): This is the expected value of the dependent variable when the independent
variable = 0. If one of the groups had been coded as 0, then the y-intercept would have been the expected value
of Y in that group. In this example, neither group is coded 0, so the value of the y-intercept has no special
meaning.
Interpretation of B when the IV has only two values . . .
B = the difference in group means divided by the difference in X-values for the two groups.
If the X-values for the groups differ by 1, as they do here, then B = the difference in group means. Here, B = (57.32 - 49.68)/(2 - 1) = 7.64, exactly the B reported in the Coefficients table.
The sign of the B coefficient.
The sign of the B coefficient associated with a dichotomous variable depends on how the groups were labeled.
In this case, the L Only group was labeled 1 and the L+CAI group was labeled 2.
If the sign of the B coefficient is positive, this means that the group with the larger IV value had a larger
mean.
If the sign of the B coefficient is negative, this means that the group with the larger IV value had a
SMALLER mean.
The fact that B is positive means that the L+CAI group mean (coded 2) was larger than the L group mean
(coded 1). If the labeling had been reversed, with L+CAI coded as 1 and L-only coded as 2, the sign of the b
coefficient would have been negative.
The t-value
The t values test the hypothesis that each coefficient equals 0. In the case of the Constant, we don't care.
In the case of the B coefficient, the t value tells us whether the B coefficient, and equivalently, the
difference in means, is significantly different from 0. The p-value of .007 suggests that the B value is
significantly different from 0.
The bottom line
So when the independent variable is a dichotomy, the regression of the dependent variable onto that dichotomy is a comparison of the means of the two groups.
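In SPSS, this comparison is an ordinary simple regression. A minimal syntax sketch, assuming TP and PERF are in the active dataset:

* Regress the dependent variable onto the dichotomous independent variable.
REGRESSION
  /DEPENDENT PERF
  /METHOD=ENTER TP.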
Relationship to independent groups t.
You may be thinking that another way to compare the performance in the two groups would be to perform an
independent groups t-test. This might then lead you to ask whether you'd get a result different from the
regression analysis.
The t-test on the data follows.
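(A sketch of the syntax, assuming TP is coded 1 and 2 as above:)

* Independent groups t-test comparing mean PERF across the two TP groups.
T-TEST GROUPS=TP(1 2)
  /VARIABLES=PERF.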
T-Test

Group Statistics
PERF   TP            N    Mean      Std. Deviation   Std. Error Mean
       1.00 L Only   25   49.6800   10.3952          2.0790
       2.00 L+CAI    25   57.3200    8.8963          1.7793
Note that the difference in means is 57.32 - 49.68 = 7.64.
Independent Samples Test

Levene's Test for Equality of Variances:  F = 1.974, Sig. = .166

t-test for Equality of Means
PERF                          t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference
Equal variances assumed       -2.792   48       .007              -7.6400           2.7364                  (-13.1420, -2.1380)
Equal variances not assumed   -2.792   46.881   .008              -7.6400           2.7364                  (-13.1454, -2.1346)

The t here (equal variances assumed) is the Regression t from the Coefficients table above.
Note that the t-value is 2.792 (its sign is negative here only because the t-test subtracted the means in the opposite order), the same as the t-value from the regression analysis. This indicates a very important relationship between the independent groups t-test and simple regression analysis:
When the independent variable is a dichotomy, the simple regression of Y onto the dichotomy gives the
same test of difference in group means as the equal variances assumed independent groups t-test.
As we'll see when we get to multiple regression, when independent variables represent several groups, the regression of Y onto those independent variables gives the same test of differences in group means as does the analysis of variance. That is, every test that can be conducted using analysis of variance can be conducted using multiple regression analysis.
Analysis of variance – a dinosaur methodology?
Yes, it is. No self-respecting computer program would use the ANOVA formulae taught in many (but fewer
each year) older statistical textbooks. All convert the problem to a regression analysis and conduct the analysis
as if it were a regression, using the techniques to be shown in the following.
But statistics is littered with dinosaurs. Among many analysts, regression analysis itself has been replaced by structural equation modeling, a much more inclusive technique.
Among other analysts, the kinds of regression analyses we’re doing have been replaced by multilevel analyses,
again, a more inclusive technique in a different context.
Comparing Three Group Means using Regression – Start here on 3/31/15
The problem
Consider comparing mean religiosity scores among three religious groups – Protestants, Catholics, and Jews.
Suppose you had the following data
Religion   Naive Religion Code   Religiosity
Prot       1                     6
Prot       1                     12
Prot       1                     13
Prot       1                     11
Prot       1                     9
Prot       1                     14
Prot       1                     12
Cath       2                     5
Cath       2                     7
Cath       2                     8
Cath       2                     9
Cath       2                     10
Cath       2                     8
Cath       2                     9
Jew        3                     4
Jew        3                     3
Jew        3                     6
Jew        3                     5
Jew        3                     7
Jew        3                     8
Jew        3                     2
Obviously, we could compare the means using traditional ANOVA formulas.
But suppose you wished to analyze these data using regression.
One seemingly logical approach would be to assign the successive integers to the religion groups and perform a
simple regression.
In the above, the variable RELCODE is a numeric variable representing the 3 religions. Because this is NOT the appropriate way to represent a three-category variable in a regression analysis, we'll call it the Naïve RELCODE.
The simple regression follows:
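(A syntax sketch of this naive analysis, assuming the religiosity score is named STRENGTH as in the output below:)

* The naive, and inappropriate, simple regression of STRENGTH onto RELCODE.
REGRESSION
  /DEPENDENT STRENGTH
  /METHOD=ENTER RELCODE.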
Scatterplot of Strength of Conviction vs. Naive RELCODE
Below is a scatterplot of the “relationship” of STRENGTH to Naïve RELCODE.
[Scatterplot of STRENGTH against Naïve RELCODE (1 = Prot, 2 = Cath, 3 = Jew).]

This is mostly a page of crap because the analysis is completely inappropriate.
Regression
Variables Entered/Removedb
Model   Variables Entered   Variables Removed   Method
1       RELCODEa            .                   Enter
a. All requested variables entered.
b. Dependent Variable: STRENGTH

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .767a   .589       .567                2.152
a. Predictors: (Constant), RELCODE

Coefficientsa
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B        Std. Error           Beta     t        Sig.
(Constant)    14.000   1.243                         11.267   .000
RELCODE       -3.000   .575                 -.767    -5.216   .000
a. Dependent Variable: STRENGTH
Looks like a strong “negative” relationship.
But wait!! Something’s wrong. <===== Not crap.
For this analysis, I assigned the numbers 1, 2, and 3 to the religions Prot, Cath, and Jew respectively.
But I could just as well have used a different assignment. How about Cath = 1, Prot=2, and Jew=3?
The data would now be

Religion   New Naive RelCode   Strength
Prot       2                   6
Prot       2                   12
Prot       2                   13
Prot       2                   11
Prot       2                   9
Prot       2                   14
Prot       2                   12
Cath       1                   5
Cath       1                   7
Cath       1                   8
Cath       1                   9
Cath       1                   10
Cath       1                   8
Cath       1                   9
Jew        3                   4
Jew        3                   3
Jew        3                   6
Jew        3                   5
Jew        3                   7
Jew        3                   8
Jew        3                   2
The scatterplot would be
[Scatterplot of STRENGTH against the New Naïve RELCODE (1 = Cath, 2 = Prot, 3 = Jew).]

This is another page of crap because the analysis is completely inappropriate.
The analysis would be
Regression
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .384a   .147       .102                3.099
a. Predictors: (Constant), RELCODE

Coefficientsa
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B        Std. Error           Beta     t        Sig.
(Constant)    11.000   1.789                         6.148    .000
RELCODE       -1.500   .828                 -.384    -1.811   .086
a. Dependent Variable: STRENGTH
Whoops! What’s going on? Two analyses of the same data yield two VERY different results. Which is
correct? Answer: Neither is correct. In fact, there is nothing of use in either analysis.
This is a great example of how a statistical analysis can go completely wrong.
The problem
Qualitative Factors, such as religion, race, type of graduate program, etc. with 3 or more values, cannot be
analyzed using simple regression techniques in which the factor is used “as-is” as a predictor.
That’s because the numbers assigned to qualitative factors are simply names. Any set of numbers will do. The
problem with that is that each different set of numbers will yield a different result in a simple regression.
Note: If the qualitative factor has only 2 values, i.e., it’s a dichotomy, it CAN be used as-is in the regression.
(So everything on the first couple of pages of this lecture is still true.) But if it has 3 or more values, it cannot.
Does this mean that regression analysis is useful only for continuous or dichotomous variables? How limiting!!
The solution –
1. Represent each value of the qualitative factor with a combination of two or more values of specially selected
Group Coding Variables.
They’re called group coding variables because each value of a qualitative factor represents a
group of people. For example, RELCODE = 1 in the immediately preceding analysis
represented the group, Catholics. RELCODE = 2 represented Protestant, RELCODE = 3
represented Jews.
If there are K groups, then K-1 group coding variables are required.
2. Regress the dependent variable onto the set of group coding variables in a multiple regression. (A syntax sketch of both steps follows.)
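Here is a minimal sketch of both steps for the three-religion example. The names REL1 and REL2 are invented for illustration; the sketch assumes the factor is named RELCODE (1 = Prot, 2 = Cath, 3 = Jew), the DV is STRENGTH, and Jews serve as the comparison group:

* Step 1: create K-1 = 2 group coding (dummy) variables.
RECODE RELCODE (1=1) (ELSE=0) INTO REL1.
RECODE RELCODE (2=1) (ELSE=0) INTO REL2.
EXECUTE.
* Step 2: regress the DV onto the whole set of group coding variables.
REGRESSION
  /DEPENDENT STRENGTH
  /METHOD=ENTER REL1 REL2.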
Group Coding Variables
The question arises: What actually are the group coding variables? How are they created?
There are 3 common types of group coding variables.
1. Dummy coding variables.
2. Effects coding variables.
3. Contrast coding variables. (We won’t cover this technique this semester. Covered in Advanced SPSS.)
Dummy Variable Codes
In Dummy Variable Coding, one group is designated as the Comparison/Reference group. Its mean is
compared with the means of all the other groups.
If K is the number of groups, then K-1 Dummy variables are created.
The comparison group is assigned the value 0 on all Dummy Variables.
Each other group is assigned the value 1 on one Dummy Variable and 0 on the remaining.
Examples . . .

Two Groups (Special group coding variables are not actually needed for two groups.)
Group   GCV1
G1      1
G2      0    <- The Comparison Group

Three Groups
Group   GCV1   GCV2
G1      1      0
G2      0      1
G3      0      0    <- The Comparison Group

Four Groups
Group   GCV1   GCV2   GCV3
G1      1      0      0
G2      0      1      0
G3      0      0      1
G4      0      0      0    <- The Comparison Group

Five Groups
Group   GCV1   GCV2   GCV3   GCV4
G1      1      0      0      0
G2      0      1      0      0
G3      0      0      1      0
G4      0      0      0      1
G5      0      0      0      0    <- The Comparison Group
Etc.
Because, as will be shown below, the regression results in a comparison of the means of the groups with “1”
codes with the mean of the Comparison Group, this coding scheme is most often used in situations in which
there is a natural comparison group, for example, a control group to be compared with several experimental
groups.
Example Regression Using Dummy Variable Coding Start here on 3/31/15
The hypothetical data are job satisfaction scores (JS) of three groups of employees.
JS   JOB   DC1   DC2
6    1     1     0
7    1     1     0
8    1     1     0
11   1     1     0
9    1     1     0
7    1     1     0
7    1     1     0     Group 1
5    2     0     1
7    2     0     1
8    2     0     1
9    2     0     1
10   2     0     1
8    2     0     1
9    2     0     1     Group 2
4    3     0     0
3    3     0     0
6    3     0     0
5    3     0     0
7    3     0     0
8    3     0     0
2    3     0     0     Group 3, the Comparison Group
The REGRESSION Dialog
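The REGRESSION dialog screenshot is not reproduced here. An equivalent syntax sketch, assuming DC1 and DC2 have been created as above:

* Regress JS onto the full set of dummy variables.
REGRESSION
  /DEPENDENT JS
  /METHOD=ENTER DC1 DC2.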
Regression

Variables Entered/Removedb
Model   Variables Entered   Variables Removed   Method
1       DC2, DC1a           .                   Enter
a. All requested variables entered.
b. Dependent Variable: JS

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .630a   .397       .330                1.84
a. Predictors: (Constant), DC2, DC1

When the predictors are group coding variables, we often say that R2 is the proportion of variance related to group membership.
ANOVAb
Model 1       Sum of Squares   df   Mean Square   F       Sig.
Regression    40.095           2    20.048        5.930   .011a
Residual      60.857           18   3.381
Total         100.952          20
a. Predictors: (Constant), DC2, DC1
b. Dependent Variable: JS

This F tests the overall null hypothesis that there are no differences between the 3 population means. The F is significant, so reject the hypothesis that the population means are equal.
Interpretation of the Coefficients Box.
Each Dummy Variable compares the mean of the group coded 1 on that variable to the mean of the Comparison
group. The value of the B coefficient is the difference in means.
So, for DC1, the B of 2.857 means that the mean of Group1 was 2.857 larger than the Comparison group mean.
For DC2, the B of 3.000 means that the mean of Group2 was 3.000 larger than the Comparison group mean.
Coefficientsa
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B       Std. Error            Beta    t       Sig.
(Constant)    5.000   .695                          7.194   .000
DC1           2.857   .983                  .614    2.907   .009
DC2           3.000   .983                  .645    3.052   .007
a. Dependent Variable: JS

When is dummy coding used? When one of the groups is a natural control group for all the other groups.
Each t tests the significance of the difference between a group mean and the reference group mean.
t = 2.907 tests the significance of the difference between the Group 1 mean and the Reference group mean.
t = 3.052 tests the significance of the difference between the Group 2 mean and the Reference group mean.
So the mean of Group 1 is significantly different from the Reference group mean, and the mean of Group 2 is also significantly different from the Reference group mean.
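A quick check of these B values against the group means. MEANS is one way to get the means that REGRESSION won't print (a sketch, assuming JS and JOB as above):

* Print the three group means: 7.857, 8.000, and 5.000.
* B(DC1) = 7.857 - 5.000 = 2.857 and B(DC2) = 8.000 - 5.000 = 3.000,
* matching the Coefficients table above.
MEANS TABLES=JS BY JOB.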
Effects Coding (called Deviation coding in SPSS)
Effects coding is basically the same as Dummy Variable Coding with the exception that the comparison group
code is switched from all 0s to all -1s.
Two Groups (Coding is not actually needed, since there are two groups.)
Group   Code
G1      1
G2      -1

Three Groups
Group   GCV1   GCV2
G1      1      0
G2      0      1
G3      -1     -1

Four Groups
Group   GCV1   GCV2   GCV3
G1      1      0      0
G2      0      1      0
G3      0      0      1
G4      -1     -1     -1
Etc.
The coding switch changes the interpretation of the B coefficients.
Now, rather than representing a comparison of the mean of a “1” group with the mean of a comparison group,
the B coefficient represents a comparison of the mean of a “1” group with the mean of ALL groups.
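A sketch of how the effects codes for the three-group JOB example below might be created, using the names EC1 and EC2 from the data listing that follows:

* Group 3 is the comparison group: it gets -1 on both coding variables.
RECODE JOB (1=1) (3=-1) (ELSE=0) INTO EC1.
RECODE JOB (2=1) (3=-1) (ELSE=0) INTO EC2.
EXECUTE.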
Regression Example Using Effects Coding
JS   JOB   EC1   EC2
6    1     1     0
7    1     1     0
8    1     1     0
11   1     1     0
9    1     1     0
7    1     1     0
7    1     1     0     Group 1
5    2     0     1
7    2     0     1
8    2     0     1
9    2     0     1
10   2     0     1
8    2     0     1
9    2     0     1     Group 2
4    3     -1    -1
3    3     -1    -1
6    3     -1    -1
5    3     -1    -1
7    3     -1    -1
8    3     -1    -1
2    3     -1    -1    Group 3: Comparison Group
Report
JS
JOB              Mean   N    Std. Deviation
1 Clerks         7.86   7    1.68
2 Receptionist   8.00   7    1.63
3 Mailroom       5.00   7    2.16
Total            6.95   21   2.25

Alas, we can use REGRESSION to compare means, but it won't report them for us. We have to use some other procedure, such as the REPORT procedure, if we want to actually see the values of the means.
Regression

Variables Entered/Removedb
Model   Variables Entered   Variables Removed   Method
1       EC1, EC2a           .                   Enter
a. All requested variables entered.
b. Dependent Variable: JS

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .630a   .397       .330                1.84
a. Predictors: (Constant), EC1, EC2

Everything in the top three boxes is the same as in the dummy variable analysis.

ANOVAb
Model 1       Sum of Squares   df   Mean Square   F       Sig.
Regression    40.095           2    20.048        5.930   .011a
Residual      60.857           18   3.381
Total         100.952          20
a. Predictors: (Constant), EC2, EC1
b. Dependent Variable: JS
Interpretation of the Coefficients Box.
In Effects coding, each B coefficient represents a comparison of the mean of the group coded 1 on the variable
with the mean of ALL the groups.
So, for EC1, the B of .905 indicates that the mean of Group 1 was .905 larger than the mean of all the groups.
For EC2, the B of 1.048 indicates that the mean of Group 2 was 1.048 larger than the mean of all the groups.
There is no B coefficient for Group 3.
Coefficientsa
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B       Std. Error            Beta    t        Sig.
(Constant)    6.952   .401                          17.327   .000
EC1           .905    .567                  .337    1.594    .128
EC2           1.048   .567                  .390    1.846    .081
a. Dependent Variable: JS
The t of 1.594 indicates that the mean of Group 1 was not significantly different from the mean of all groups.
The t of 1.846 indicates that the mean of Group 2 was not significantly different from the mean of all groups.
Remember that these are the same data as above. This shows that one form of analysis of the data may be more informative than another form. In this case, the Dummy Variable analysis was more informative.
Perspective
You may recall that we considered a procedure for comparing means in the fall semester. It was the analysis of
variance. It was a lot easier than creating group-coding variables and performing the regression analyses
we’ve done here. Furthermore, using the analysis of variance procedure in SPSS automatically provided
means and standard deviations of the groups, something we had to do as an extra step when using
REGRESSION. Plus, the analysis of variance provides post hoc tests that aren’t available in regression.
Here’s the output of SPSS’s ONEWAY analysis of variance procedure for the above data . . .
ANOVA
JS
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   40.095           2    20.048        5.930   .011
Within Groups    60.857           18   3.381
Total            100.952          20
Note that the F value (5.930) is exactly the same as the F value from the ANOVA table from the regression
procedure.
So why bother to use the regression procedure to compare group means?
The answer is that if the comparison of a single set of group means were all that there was to the analysis, you
would NOT use the regression procedure - you’d use the analysis of variance procedure.
But here are three reasons for using or at least being familiar with regression-based means comparisons and the
group coding variable schemes upon which they’re based.
1. Whenever you have a mixture of qualitative and quantitative variables in the analysis, regression procedures are the overwhelming choice. Example: Are there differences in the means of three groups controlling for cognitive ability? You can't answer that without including cognitive ability, a quantitative variable, in the analysis. Traditional analysis of variance formulas don't easily incorporate quantitative variables. Once you're familiar with group coding schemes, it's pretty easy to perform analyses with both quantitative and qualitative variables. (A syntax sketch appears after this list.)
2. Most statistical packages perform ALL analyses, whether of qualitative variables, quantitative variables, or mixtures of the two, using regression formulas. When analyzing only qualitative variables they will print output that looks like they've used the analysis of variance formulas, but behind your back, they've actually done regression analyses.
Some of that output may reference the behind-your-back regression that was actually performed. So
knowing about the regression approach to comparison of group means will help you understand the output of
statistical packages performing “analysis of variance”. We’ll see that in the GLM procedure below.
3. Other analyses, for example Logistic Regression and Survival Analyses, to name two in SPSS, have very
regression-like output when qualitative factors are analyzed. That is, they’re quite up-front about the fact
that they do regression analyses. If you don’t understand the regression approach to analysis of variance, it’ll be
very hard for you to understand the output of these procedures.
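A minimal sketch of the analysis imagined in reason 1, using the JOB example; the covariate name COGABIL is hypothetical and not in the example data:

* Are there JOB differences in JS controlling for cognitive ability?
* Enter the covariate first, then the set of group coding variables.
REGRESSION
  /DEPENDENT JS
  /METHOD=ENTER COGABIL
  /METHOD=ENTER DC1 DC2.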
Doing the analyses using the GLM procedure.
JS   JOB
6    1
7    1
8    1
11   1
9    1
7    1
7    1     Group 1
5    2
7    2
8    2
9    2
10   2
8    2
9    2     Group 2
4    3
3    3
6    3
5    3
7    3
8    3
2    3     Group 3: Comparison Group

Note that there are no group-coding variables in the data that must be submitted to GLM. Hurray. Hurray!! Don't need no stinkin' GCVs.
In the GLM Univariate dialog, put the names of qualitative factors in the Fixed Factor(s) field and the names of quantitative variables in the Covariates field.
SAVE OUTFILE='C:\Users\Michael\Documents\JSExampleFor513.sav'
/COMPRESSED.
UNIANOVA JS BY JOB
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/POSTHOC=JOB(BTUKEY)
/PLOT=PROFILE(JOB)
/PRINT=ETASQ HOMOGENEITY DESCRIPTIVE OPOWER
/CRITERIA=ALPHA(.05)
/DESIGN=JOB.
[DataSet0] C:\Users\Michael\Documents\JSExampleFor513.sav
Between-Subjects Factors
         N
JOB   1  7
      2  7
      3  7

Descriptive Statistics
Dependent Variable: JS
JOB     Mean   Std. Deviation   N
1       7.86   1.676            7
2       8.00   1.633            7
3       5.00   2.160            7
Total   6.95   2.247            21

Levene's Test of Equality of Error Variancesa
Dependent Variable: JS
F      df1   df2   Sig.
.572   2     18    .574
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + JOB
Tests of Between-Subjects Effects
Dependent Variable: JS
Source            Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared   Noncent. Parameter   Observed Powerb
Corrected Model   40.095a                   2    20.048        5.930     .011   .397                  11.859               .815
Intercept         1015.048                  1    1015.048      300.225   .000   .943                  300.225              1.000
JOB               40.095                    2    20.048        5.930     .011   .397                  11.859               .815
Error             60.857                    18   3.381
Total             1116.000                  21
Corrected Total   100.952                   20
a. R Squared = .397 (Adjusted R Squared = .330)
b. Computed using alpha = .05
Corrected Model: This is what is in the ANOVA box in regression.
GLM regresses the dependent variable onto ALL of the group coding variables and quantitative
variables, if there are any. This is the report of the significance of that regression.
Intercept: This is the report on the Y-intercept of the "All predictors" regression reported on in the line immediately above.
These are signs of the behind-your-back regression analysis that’s actually been conducted.
JOB: The overall F again, this time for job.
Note that no mention is made of the fact that two group-coding variables were created to represent JOB.
The only indication that something is up is the 2 in the df column. That 2 is the number of actual
independent variables used to represent the JOB factor.
Error: The denominator of the F statistic.
Partial Eta squared: A measure of effect size appropriate for analysis of variance.
See 510/511 notes for interpretation of eta squared.
Observed Power: Probability of a significant F if experiment were conducted again with population means
equal to these sample means.
Profile Plots
[Profile plot of mean JS across the three JOB groups omitted.]

Post Hoc Tests
JOB
Homogeneous Subsets

JS
Tukey B
               Subset
JOB     N      1       2
3       7      5.00
1       7              7.86
2       7              8.00
Means for groups in homogeneous subsets are displayed.
Based on observed means.
The error term is Mean Square(Error) = 3.381.
Having your cake and eating it too - Specifying Coding Schemes in GLM
What if you just miss group coding variables? Is there a way to see them one last time in GLM?
Click on the Contrasts button to work with group coding variables. Here are the SPSS names for the coding schemes we're using:

Our name   SPSS's name
Dummy      Simple
Effects    Deviation
I should have checked the homogeneity box here. Thanks, Stephanie.
UNIANOVA JS BY Job
/CONTRAST(Job)=Deviation
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/PRINT=OPOWER ETASQ DESCRIPTIVE PARAMETER
/CRITERIA=ALPHA(.05)
/DESIGN=Job.
Univariate Analysis of Variance

Checking the Parameter Estimates box tells GLM to print out any regression parameters it might have computed. These are regression parameters for any quantitative independent variables and for group-coding variables that are created automatically by GLM.

Between-Subjects Factors
         N
Job   1  7
      2  7
      3  7

Descriptive Statistics
Dependent Variable: JS
Job     Mean   Std. Deviation   N
1       7.86   1.676            7
2       8.00   1.633            7
3       5.00   2.160            7
Total   6.95   2.247            21
Tests of Between-Subjects Effects
Dependent Variable: JS
Source            Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared   Noncent. Parameter   Observed Powerb
Corrected Model   40.095a                   2    20.048        5.930     .011   .397                  11.859               .815
Intercept         1015.048                  1    1015.048      300.225   .000   .943                  300.225              1.000
Job               40.095                    2    20.048        5.930     .011   .397                  11.859               .815
Error             60.857                    18   3.381
Total             1116.000                  21
Corrected Total   100.952                   20
a. R Squared = .397 (Adjusted R Squared = .330)
b. Computed using alpha = .05

These results are from the default dummy coding that SPSS always does automatically.
Parameter Estimates
Dependent Variable: JS
                                               95% Confidence Interval
Parameter   B       Std. Error   t       Sig.   Lower Bound   Upper Bound   Partial Eta Squared   Noncent. Parameter   Observed Powera
Intercept   5.000   .695         7.194   .000   3.540         6.460         .742                  7.194                1.000
[Job=1]     2.857   .983         2.907   .009   .792          4.922         .319                  2.907                .785
[Job=2]     3.000   .983         3.052   .007   .935          5.065         .341                  3.052                .823
[Job=3]     0b      .            .       .      .             .             .                     .                    .
a. Computed using alpha = .05
b. This parameter is set to zero because it is redundant.
Custom Hypothesis Tests - These are the results for the "deviation" group coding scheme we asked for.

Contrast Results (K Matrix)
Job Deviation Contrasta                                        Dependent Variable: JS
Level 1 vs. Mean   Contrast Estimate                           .905
                   Hypothesized Value                          0
                   Difference (Estimate - Hypothesized)        .905
                   Std. Error                                  .567
                   Sig.                                        .128
                   95% Confidence Interval    Lower Bound      -.287
                   for Difference             Upper Bound      2.097
Level 2 vs. Mean   Contrast Estimate                           1.048
                   Hypothesized Value                          0
                   Difference (Estimate - Hypothesized)        1.048
                   Std. Error                                  .567
                   Sig.                                        .081
                   95% Confidence Interval    Lower Bound      -.145
                   for Difference             Upper Bound      2.240
a. Omitted category = 3

The p-values are the same as those obtained using the effects-coding REGRESSION analysis above.
What’s this???
Test Results
Dependent Variable: JS
Source     Sum of Squares   df   Mean Square   F       Sig.   Partial Eta Squared   Noncent. Parameter   Observed Powera
Contrast   40.095           2    20.048        5.930   .011   .397                  11.859               .815
Error      60.857           18   3.381
a. Computed using alpha = .05