Lecture 9
Qualitative Independent Variables
Comparing means using Regression (I don't need no stinkin' ANOVA)

In linear regression analysis, the dependent variable should always be a continuous variable. The same restriction does not apply to the independent variables, however. This lecture shows how qualitative variables – variables whose values represent different groups of people, not different quantities – are incorporated into regression analyses, allowing comparison of the means of the groups. We'll discover that IVs with only 2 values can be treated as if they were continuous IVs in any regression. But IVs with 3 or more values must be treated specially. Once that's done, they too can be included in regression analyses.

Regression with a single two-valued (dichotomous) predictor

Any two-valued independent variable can be included in a simple or multiple regression analysis. The regression can be used to compare the means of the two groups, yielding the same conclusion as the equal-variances independent groups t-test.

Suppose the performance of two groups trained using different methods is being compared. Group 1 was trained using a Lecture only method. Group 2 was trained using a Lecture+CAI method. Performance was measured using scores on a final exam covering the material being taught. So, the dependent variable is PERF – performance on the final exam. The independent variable is TP – Training program: Lecture only vs. Lecture+CAI. The data follow.

ID  TP  PERF     ID  TP  PERF     ID  TP  PERF     ID  TP  PERF
 1   1   37      13   1   57      25   1   37      38   2   56
 2   1   69      14   1   50      26   2   53      39   2   61
 3   1   64      15   1   58      27   2   62      40   2   62
 4   1   43      16   1   65      28   2   56      41   2   72
 5   1   37      17   1   48      29   2   61      42   2   46
 6   1   54      18   1   34      30   2   63      43   2   64
 7   1   52      19   1   44      31   2   34      44   2   60
 8   1   40      20   1   58      32   2   56      45   2   58
 9   1   61      21   1   45      33   2   54      46   2   73
10   1   48      22   1   35      34   2   60      47   2   57
11   1   44      23   1   45      35   2   59      48   2   53
12   1   65      24   1   52      36   2   67      49   2   43
                                  37   2   42      50   2   61

How should the groups be coded? In the example data, Training program (TP) was coded as 1 for the Lecture method and 2 for the L+CAI method. But any two values could have been used. For example, 0 and 1 could have been used. Or 3 and 47 could have been used. When the IV is a dichotomy, the specific values used to represent the two groups formed by the two values of the IV are completely arbitrary. When one of the groups has whatever the other has plus something else, my practice is to give it the larger of the two values, often 0 for the group with less and 1 for the group with more. When one group is a control and the other an experimental group, my practice is to use 0 for the control and 1 for the experimental group.

Visualizing regressions when the independent variable is a dichotomy.

When an IV is a dichotomy, the scatterplot takes on an unusual appearance: it will be two columns of points, one over each of the two values of the IV. It can be interpreted in the way all scatterplots are interpreted, although if the values of the IV are arbitrary, the sign of the relationship may not be a meaningful characteristic. For example, in the following scatterplot, it would not make any sense to say that performance was positively related to training program. It would make sense, however, to say that performance was higher in the Lecture+CAI program than in the Lecture-only program. In the graph of the example data, the best fitting straight line has been drawn through the scatterplot.
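By the way, a minimal sketch of SPSS syntax that would produce a scatterplot like the one below, assuming the variable names TP and PERF used above (the fit line itself is added afterward in the Chart Editor):

GRAPH
  /SCATTERPLOT(BIVAR)=TP WITH PERF.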
When the independent variable is a dichotomy, the line will always go through the mean value of the dependent variable at each of the two independent variable values. Notice that the regression coefficient – the B value – for Training Program is equal to the difference between the means of performance in the two programs. This will always be the case when the values used to code the two groups differ by one (1 vs. 2 in this example).

[Scatterplot: PERF (30 to 80) vs. TP (L Only, L+CAI), with the fitted line passing through the mean of PERF for Method 1 and the mean of PERF for Method 2.]

SPSS Output and its interpretation.

Regression

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .374a   .140       .122                9.67
a. Predictors: (Constant), TP

R-square is the proportion of variance in Y related to differences between the groups. Some say that R-square is the proportion of variance related to group membership. So in this example, 14% of the variance of Y is related to group membership.

ANOVAb
Model 1       Sum of Squares   df   Mean Square   F       Sig.
Regression    729.620          1    729.620       7.795   .007a
Residual      4492.880         48   93.602
Total         5222.500         49
a. Predictors: (Constant), TP
b. Dependent Variable: PERF

As was the case with simple regression with a continuous predictor, the information in the ANOVA summary table is redundant with the information in the Coefficients box below.

Coefficientsa
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B        Std. Error           Beta                        t       Sig.
(Constant)    42.040   4.327                                            9.716   .000
TP            7.640    2.736                .374                        2.792   .007
a. Dependent Variable: PERF

Interpretation of (Constant): This is the expected value of the dependent variable when the independent variable = 0. If one of the groups had been coded as 0, the y-intercept would have been the expected value of Y in that group. In this example, neither group is coded 0, so the value of the y-intercept has no special meaning.

Interpretation of B when the IV has only two values . . .

1. B = the difference in group means divided by the difference in X-values for the two groups. If the X-values for the groups differ by 1, as they do here, then B = the difference in group means. Here, B = (57.32 - 49.68)/(2 - 1) = 7.64.

2. The sign of the B coefficient. The sign of the B coefficient associated with a dichotomous variable depends on how the groups were labeled. In this case, the L Only group was labeled 1 and the L+CAI group was labeled 2. If the sign of the B coefficient is positive, the group with the larger IV value had the larger mean. If the sign of the B coefficient is negative, the group with the larger IV value had the SMALLER mean. The fact that B is positive means that the L+CAI group mean (coded 2) was larger than the L group mean (coded 1). If the labeling had been reversed, with L+CAI coded as 1 and L-only coded as 2, the sign of the B coefficient would have been negative.

The t-value. The t values test the hypothesis that each coefficient equals 0. In the case of the Constant, we don't care. In the case of the B coefficient, the t value tells us whether the B coefficient – and, equivalently, the difference in means – is significantly different from 0. The p-value of .007 indicates that the B value is significantly different from 0.

The bottom line: when the independent variable is a dichotomy, regression of the dependent variable onto that dichotomous independent variable is a comparison of the means of the two groups.
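For reference, a minimal sketch of the syntax that would produce output like the above, assuming the variable names used here (the Model Summary, ANOVA, and Coefficients boxes are part of REGRESSION's default output):

REGRESSION
  /DEPENDENT PERF
  /METHOD=ENTER TP.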
Relationship to independent groups t. You may be thinking that another way to compare the performance of the two groups would be to perform an independent groups t-test. This might then lead you to ask whether you'd get a result different from the regression analysis. The t-test on the data follows.

T-Test

Group Statistics
PERF   TP            N    Mean      Std. Deviation   Std. Error Mean
       1.00 L Only   25   49.6800   10.3952          2.0790
       2.00 L+CAI    25   57.3200   8.8963           1.7793

Note that the difference in means is 57.32 - 49.68 = 7.64.

Independent Samples Test
                              Levene's Test for           t-test for Equality of Means
                              Equality of Variances
                              F       Sig.     t        df       Sig.        Mean         Std. Error   95% CI of the Difference
                                                                 (2-tailed)  Difference   Difference   Lower       Upper
Equal variances assumed       1.974   .166     -2.792   48       .007        -7.6400      2.7364       -13.1420    -2.1380
Equal variances not assumed                    -2.792   46.881   .008        -7.6400      2.7364       -13.1454    -2.1346

(The equal-variances t here is the same t reported for TP in the regression Coefficients table earlier.)

Note that the t-value is 2.792, the same as the t-value from the regression analysis (apart from its sign, which merely reflects the direction of the subtraction). This indicates a very important relationship between the independent groups t-test and simple regression analysis: when the independent variable is a dichotomy, the simple regression of Y onto the dichotomy gives the same test of the difference in group means as the equal-variances-assumed independent groups t-test.

As we'll see when we get to multiple regression, when independent variables represent several groups, the regression of Y onto those independent variables gives the same test of differences in group means as does the analysis of variance. That is, every test that can be conducted using analysis of variance can be conducted using multiple regression analysis.

Analysis of variance – a dinosaur methodology? Yes, it is. No self-respecting computer program would use the ANOVA formulae taught in many (but fewer each year) older statistics textbooks. All convert the problem to a regression analysis and conduct the analysis as if it were a regression, using the techniques shown in the following. But statistics is littered with dinosaurs. Among many analysts, regression analysis itself has been replaced by structural equation modeling, a much more inclusive technique. Among other analysts, the kinds of regression analyses we're doing have been replaced by multilevel analyses – again, a more inclusive technique in a different context.

Comparing Three Group Means using Regression – Start here on 3/31/15

The problem

Consider comparing mean religiosity scores among three religious groups – Protestants, Catholics, and Jews. Suppose you had the following data.

Religion   Naive Religion Code   Religiosity
Prot       1                     6
Prot       1                     12
Prot       1                     13
Prot       1                     11
Prot       1                     9
Prot       1                     14
Prot       1                     12
Cath       2                     5
Cath       2                     7
Cath       2                     8
Cath       2                     9
Cath       2                     10
Cath       2                     8
Cath       2                     9
Jew        3                     4
Jew        3                     3
Jew        3                     6
Jew        3                     5
Jew        3                     7
Jew        3                     8
Jew        3                     2

Obviously, we could compare the means using traditional ANOVA formulas. But suppose you wished to analyze these data using regression. One seemingly logical approach would be to assign successive integers to the religion groups and perform a simple regression. In the above, the variable RELCODE is a numeric variable representing the 3 religions. Because it is NOT the appropriate way to represent a three-category variable in a regression analysis, we'll call it the Naïve RELCODE.
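A sketch of the syntax for this (as we're about to see, completely inappropriate) analysis, assuming the religiosity score is stored in a variable named STRENGTH, as in the output below:

REGRESSION
  /DEPENDENT STRENGTH
  /METHOD=ENTER RELCODE.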
The simple regression follows.

Scatterplot of Strength of Conviction vs. Naive RELCODE

Below is a scatterplot of the "relationship" of STRENGTH to Naïve RELCODE.

[Scatterplot: STRENGTH (0 to 16) vs. Naïve RELCODE (1, 2, 3).]

This is mostly a page of crap, because the analysis is completely inappropriate.

Regression

Variables Entered/Removedb
Model 1   Variables Entered: RELCODEa   Variables Removed: .   Method: Enter
a. All requested variables entered.
b. Dependent Variable: STRENGTH

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .767a   .589       .567                2.152
a. Predictors: (Constant), RELCODE

Coefficientsa
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B        Std. Error           Beta                        t        Sig.
(Constant)    14.000   1.243                                            11.267   .000
RELCODE       -3.000   .575                 -.767                       -5.216   .000
a. Dependent Variable: STRENGTH

Looks like a strong "negative" relationship. But wait!! Something's wrong. <===== Not crap.

For this analysis, I assigned the numbers 1, 2, and 3 to the religions Prot, Cath, and Jew respectively. But I could just as well have used a different assignment. How about Cath = 1, Prot = 2, and Jew = 3? The data would now be

Religion   New Naive RelCode   Strength
Prot       2                   6
Prot       2                   12
Prot       2                   13
Prot       2                   11
Prot       2                   9
Prot       2                   14
Prot       2                   12
Cath       1                   5
Cath       1                   7
Cath       1                   8
Cath       1                   9
Cath       1                   10
Cath       1                   8
Cath       1                   9
Jew        3                   4
Jew        3                   3
Jew        3                   6
Jew        3                   5
Jew        3                   7
Jew        3                   8
Jew        3                   2

The scatterplot would be

[Scatterplot: STRENGTH (0 to 16) vs. New Naïve RELCODE (1, 2, 3).]

This is another page of crap, because the analysis is completely inappropriate.

The analysis would be

Regression

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .384a   .147       .102                3.099
a. Predictors: (Constant), RELCODE

Coefficientsa
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B        Std. Error           Beta                        t        Sig.
(Constant)    11.000   1.789                                            6.148    .000
RELCODE       -1.500   .828                 -.384                       -1.811   .086
a. Dependent Variable: STRENGTH

Whoops! What's going on? Two analyses of the same data yield two VERY different results. Which is correct? Answer: Neither is correct. In fact, there is nothing of use in either analysis. This is a great example of how a statistical analysis can go completely wrong.

The problem

Qualitative factors with 3 or more values – religion, race, type of graduate program, etc. – cannot be analyzed using simple regression techniques in which the factor is used "as-is" as a predictor. That's because the numbers assigned to the values of a qualitative factor are simply names: any set of numbers will do. The problem is that each different set of numbers will yield a different result in a simple regression.

Note: If the qualitative factor has only 2 values, i.e., it's a dichotomy, it CAN be used as-is in the regression. (So everything on the first couple of pages of this lecture is still true.) But if it has 3 or more values, it cannot. Does this mean that regression analysis is useful only for continuous or dichotomous variables? How limiting!!

The solution

1. Represent each value of the qualitative factor with a combination of values of specially selected Group Coding Variables. They're called group coding variables because each value of a qualitative factor represents a group of people.
For example, RELCODE = 1 in the immediately preceding analysis represented the group Catholics, RELCODE = 2 represented Protestants, and RELCODE = 3 represented Jews. If there are K groups, then K-1 group coding variables are required.

2. Regress the dependent variable onto the set of group coding variables in a multiple regression.

Group Coding Variables

The question arises: What actually are the group coding variables? How are they created? There are 3 common types of group coding variables.

1. Dummy coding variables.
2. Effects coding variables.
3. Contrast coding variables. (We won't cover this technique this semester. It's covered in Advanced SPSS.)

Dummy Variable Codes

In Dummy Variable Coding, one group is designated as the Comparison/Reference group. Its mean is compared with the means of all the other groups. If K is the number of groups, then K-1 Dummy Variables are created. The comparison group is assigned the value 0 on all Dummy Variables. Each other group is assigned the value 1 on one Dummy Variable and 0 on the remaining ones. Examples . . .

Two Groups (special group coding variables are not actually needed for two groups)
Group   GCV1
G1      1
G2      0      <= The Comparison Group

Three Groups
Group   GCV1   GCV2
G1      1      0
G2      0      1
G3      0      0      <= The Comparison Group

Four Groups
Group   GCV1   GCV2   GCV3
G1      1      0      0
G2      0      1      0
G3      0      0      1
G4      0      0      0      <= The Comparison Group

Five Groups
Group   GCV1   GCV2   GCV3   GCV4
G1      1      0      0      0
G2      0      1      0      0
G3      0      0      1      0
G4      0      0      0      1
G5      0      0      0      0      <= The Comparison Group

Etc.

Because, as will be shown below, the regression results in a comparison of the mean of each group with a "1" code with the mean of the Comparison Group, this coding scheme is most often used in situations in which there is a natural comparison group – for example, a control group to be compared with several experimental groups.

Example Regression Using Dummy Variable Coding – Start here on 3/31/15

The hypothetical data are job satisfaction scores (JS) of three groups of employees. (The dummy variables DC1 and DC2 can be typed in by hand or computed from JOB – see the sketch below.)

JS   JOB   DC1   DC2
6    1     1     0
7    1     1     0
8    1     1     0
11   1     1     0
9    1     1     0
7    1     1     0
7    1     1     0      <= Group 1
5    2     0     1
7    2     0     1
8    2     0     1
9    2     0     1
10   2     0     1
8    2     0     1
9    2     0     1      <= Group 2
4    3     0     0
3    3     0     0
6    3     0     0
5    3     0     0
7    3     0     0
8    3     0     0
2    3     0     0      <= Group 3, the Comparison Group

[Screenshot: the REGRESSION dialog.]
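As noted above, the dummy variables need not be typed in by hand. A minimal sketch computing them from JOB, assuming the variable names above:

RECODE JOB (1=1) (ELSE=0) INTO DC1.
RECODE JOB (2=1) (ELSE=0) INTO DC2.
EXECUTE.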
Regression

Variables Entered/Removedb
Model 1   Variables Entered: DC2, DC1a   Variables Removed: .   Method: Enter
a. All requested variables entered.
b. Dependent Variable: JS

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .630a   .397       .330                1.84
a. Predictors: (Constant), DC2, DC1

When the predictors are group coding variables, we often say that R-square is the proportion of variance related to group membership.

ANOVAb
Model 1       Sum of Squares   df   Mean Square   F       Sig.
Regression    40.095           2    20.048        5.930   .011a
Residual      60.857           18   3.381
Total         100.952          20
a. Predictors: (Constant), DC2, DC1
b. Dependent Variable: JS

This F tests the overall null hypothesis that there are no differences among the 3 population means. The F is significant, so reject the hypothesis that the population means are equal.

Interpretation of the Coefficients Box

Coefficientsa
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B       Std. Error            Beta                        t       Sig.
(Constant)    5.000   .695                                              7.194   .000
DC1           2.857   .983                  .614                        2.907   .009
DC2           3.000   .983                  .645                        3.052   .007
a. Dependent Variable: JS

Each Dummy Variable compares the mean of the group coded 1 on that variable to the mean of the Comparison group. The value of the B coefficient is the difference in means. So, for DC1, the B of 2.857 means that the mean of Group 1 was 2.857 larger than the Comparison group mean. For DC2, the B of 3.000 means that the mean of Group 2 was 3.000 larger than the Comparison group mean.

Each t tests the significance of the difference between a group mean and the Reference group mean. The t of 2.907 tests the significance of the difference between the Group 1 mean and the Reference group mean. The t of 3.052 tests the significance of the difference between the Group 2 mean and the Reference group mean. So the mean of Group 1 is significantly different from the Reference group mean, and the mean of Group 2 is also significantly different from the Reference group mean.

When is dummy coding used? When one of the groups is a natural control or comparison group for all the other groups.

Effects Coding (called Deviation coding in SPSS)

Effects coding is basically the same as Dummy Variable Coding except that the comparison group's codes are switched from all 0s to all -1s.

Two Groups (coding is not actually needed, since there are two groups)
Group   Code
G1      1
G2      -1

Three Groups
Group   GCV1   GCV2
G1      1      0
G2      0      1
G3      -1     -1

Four Groups
Group   GCV1   GCV2   GCV3
G1      1      0      0
G2      0      1      0
G3      0      0      1
G4      -1     -1     -1

Etc.

The coding switch changes the interpretation of the B coefficients. Now, rather than representing a comparison of the mean of a "1" group with the mean of a comparison group, each B coefficient represents a comparison of the mean of a "1" group with the mean of ALL the groups.

Regression Example Using Effects Coding

JS   JOB   EC1   EC2
6    1     1     0
7    1     1     0
8    1     1     0
11   1     1     0
9    1     1     0
7    1     1     0
7    1     1     0      <= Group 1
5    2     0     1
7    2     0     1
8    2     0     1
9    2     0     1
10   2     0     1
8    2     0     1
9    2     0     1      <= Group 2
4    3     -1    -1
3    3     -1    -1
6    3     -1    -1
5    3     -1    -1
7    3     -1    -1
8    3     -1    -1
2    3     -1    -1      <= Group 3: Comparison Group

Report
JS
JOB              Mean   N    Std. Deviation
1 Clerks         7.86   7    1.68
2 Receptionist   8.00   7    1.63
3 Mailroom       5.00   7    2.16
Total            6.95   21   2.25

Alas, we can use REGRESSION to compare means, but it won't report them for us. We have to use some other procedure, such as the REPORT procedure, if we want to actually see the values of the means.
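For instance, a minimal sketch using the MEANS procedure, which prints a "Report" table like the one above (assuming the variable names used here):

MEANS TABLES=JS BY JOB
  /CELLS=MEAN COUNT STDDEV.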
Regression

Variables Entered/Removedb
Model 1   Variables Entered: EC2, EC1a   Variables Removed: .   Method: Enter
a. All requested variables entered.
b. Dependent Variable: JS

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .630a   .397       .330                1.84
a. Predictors: (Constant), EC2, EC1

ANOVAb
Model 1       Sum of Squares   df   Mean Square   F       Sig.
Regression    40.095           2    20.048        5.930   .011a
Residual      60.857           18   3.381
Total         100.952          20
a. Predictors: (Constant), EC2, EC1
b. Dependent Variable: JS

Everything in the top three boxes is the same as in the dummy variable analysis.

Interpretation of the Coefficients Box

Coefficientsa
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B       Std. Error            Beta                        t        Sig.
(Constant)    6.952   .401                                              17.327   .000
EC1           .905    .567                  .337                        1.594    .128
EC2           1.048   .567                  .390                        1.846    .081
a. Dependent Variable: JS

In Effects coding, each B coefficient represents a comparison of the mean of the group coded 1 on the variable with the mean of ALL the groups. So, for EC1, the B of .905 indicates that the mean of Group 1 was .905 larger than the mean of all the groups. For EC2, the B of 1.048 indicates that the mean of Group 2 was 1.048 larger than the mean of all the groups. There is no B coefficient for Group 3.

The t of 1.594 indicates that the mean of Group 1 was not significantly different from the mean of all groups. The t of 1.846 indicates that the mean of Group 2 was not significantly different from the mean of all groups. Remember that these are the same data as above. This shows that one form of analysis of the same data may be more informative than another. In this case, the Dummy Variable analysis was more informative.

Perspective

You may recall that we considered a procedure for comparing means in the fall semester: the analysis of variance. It was a lot easier than creating group coding variables and performing the regression analyses we've done here. Furthermore, the analysis of variance procedure in SPSS automatically provided means and standard deviations of the groups, something we had to do as an extra step when using REGRESSION. Plus, the analysis of variance provides post hoc tests that aren't available in regression. Here's the output of SPSS's ONEWAY analysis of variance procedure for the above data . . .

ANOVA
JS
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   40.095           2    20.048        5.930   .011
Within Groups    60.857           18   3.381
Total            100.952          20

Note that the F value (5.930) is exactly the same as the F value in the ANOVA table from the regression procedure.

So why bother to use the regression procedure to compare group means? If the comparison of a single set of group means were all there was to the analysis, you would NOT use the regression procedure – you'd use the analysis of variance procedure. But here are three reasons for using, or at least being familiar with, regression-based means comparisons and the group coding schemes upon which they're based.

1. Whenever you have a mixture of qualitative and quantitative variables in the analysis, regression procedures are the overwhelming choice. Example: Are there differences in the means of three groups controlling for cognitive ability? You can't answer that without including cognitive ability, a quantitative variable, in the analysis. Traditional analysis of variance formulas don't easily incorporate quantitative variables. Once you're familiar with group coding schemes, it's pretty easy to perform analyses with both quantitative and qualitative variables (see the sketch after this list).

2. Most statistical packages perform ALL such analyses – of qualitative variables, quantitative variables, and mixtures – using regression formulas. When analyzing only qualitative variables, they will print output that looks like they've used the analysis of variance formulas, but behind your back they've actually done regression analyses. Some of that output may reference the behind-your-back regression that was actually performed. So knowing about the regression approach to comparison of group means will help you understand the output of statistical packages performing "analysis of variance". We'll see that in the GLM procedure below.

3. Other analyses – for example, Logistic Regression and Survival Analysis, to name two in SPSS – have very regression-like output when qualitative factors are analyzed. That is, they're quite up-front about the fact that they do regression analyses. If you don't understand the regression approach to analysis of variance, it'll be very hard for you to understand the output of these procedures.
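To illustrate reason 1, a sketch of such a mixed analysis. COGABIL is a hypothetical name for a quantitative cognitive ability variable; it is not in the example data.

* COGABIL is hypothetical: a quantitative covariate entered alongside the dummy variables.
REGRESSION
  /DEPENDENT JS
  /METHOD=ENTER DC1 DC2 COGABIL.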
Doing the analyses using the GLM procedure

The data to be submitted to GLM:

JS   JOB
6    1
7    1
8    1
11   1
9    1
7    1
7    1      <= Group 1
5    2
7    2
8    2
9    2
10   2
8    2
9    2      <= Group 2
4    3
3    3
6    3
5    3
7    3
8    3
2    3      <= Group 3: Comparison Group

Note that there are no group coding variables in the data that must be submitted to GLM. Hurray!! Don't need no stinkin' GCVs.

[Screenshot: the GLM Univariate dialog.] Put the names of qualitative factors in the Fixed Factor(s) field. Put the names of quantitative variables in the Covariate(s) field.

SAVE OUTFILE='C:\Users\Michael\Documents\JSExampleFor513.sav'
  /COMPRESSED.
UNIANOVA JS BY JOB
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=JOB(BTUKEY)
  /PLOT=PROFILE(JOB)
  /PRINT=ETASQ HOMOGENEITY DESCRIPTIVE OPOWER
  /CRITERIA=ALPHA(.05)
  /DESIGN=JOB.

[DataSet0] C:\Users\Michael\Documents\JSExampleFor513.sav

Between-Subjects Factors
        N
JOB  1  7
     2  7
     3  7

Descriptive Statistics
Dependent Variable: JS
JOB     Mean   Std. Deviation   N
1       7.86   1.676            7
2       8.00   1.633            7
3       5.00   2.160            7
Total   6.95   2.247            21

Levene's Test of Equality of Error Variancesa
Dependent Variable: JS
F      df1   df2   Sig.
.572   2     18    .574
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + JOB

Tests of Between-Subjects Effects
Dependent Variable: JS
Source            Type III Sum   df   Mean Square   F         Sig.   Partial Eta   Noncent.    Observed
                  of Squares                                         Squared       Parameter   Powerb
Corrected Model   40.095a        2    20.048        5.930     .011   .397          11.859      .815
Intercept         1015.048       1    1015.048      300.225   .000   .943          300.225     1.000
JOB               40.095         2    20.048        5.930     .011   .397          11.859      .815
Error             60.857         18   3.381
Total             1116.000       21
Corrected Total   100.952        20
a. R Squared = .397 (Adjusted R Squared = .330)
b. Computed using alpha = .05

Corrected Model: This is what is in the ANOVA box in REGRESSION. GLM regresses the dependent variable onto ALL of the group coding variables and quantitative variables, if there are any. This line reports the significance of that overall regression.

Intercept: This is the report on the Y-intercept of the "all predictors" regression reported on in the line immediately above. These are signs of the behind-your-back regression analysis that's actually been conducted.

JOB: The overall F again, this time for JOB. Note that no mention is made of the fact that two group coding variables were created to represent JOB. The only indication that something is up is the 2 in the df column. That 2 is the number of actual independent variables used to represent the JOB factor.

Error: The denominator of the F statistic.

Partial Eta Squared: A measure of effect size appropriate for analysis of variance. See the 510/511 notes for the interpretation of eta squared.

Observed Power: The probability of a significant F if the experiment were conducted again with population means equal to these sample means.

Profile Plots

[Profile plot of mean JS across the three levels of JOB.]

Post Hoc Tests – JOB – Homogeneous Subsets

JS
Tukey B
              Subset
JOB   N    1      2
3     7    5.00
1     7           7.86
2     7           8.00
Means for groups in homogeneous subsets are displayed. Based on observed means. The error term is Mean Square(Error) = 3.381.
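The same one-way analysis, with descriptives and the Tukey-b post hoc test shown above, could also be requested from the simpler ONEWAY procedure; a minimal sketch:

ONEWAY JS BY JOB
  /STATISTICS DESCRIPTIVES
  /POSTHOC=BTUKEY ALPHA(.05).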
Having your cake and eating it too – Specifying Coding Schemes in GLM

What if you just miss group coding variables? Is there a way to see them one last time in GLM? Yes – click the Contrasts button in the Univariate dialog to work with group coding variables. Here are the SPSS names for the coding schemes we're using:

Our name   SPSS's name
Dummy      Simple
Effects    Deviation

(I should have checked the homogeneity box here. Thanks, Stephanie.)

UNIANOVA JS BY Job
  /CONTRAST(Job)=Deviation
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PRINT=OPOWER ETASQ DESCRIPTIVE PARAMETER
  /CRITERIA=ALPHA(.05)
  /DESIGN=Job.

Univariate Analysis of Variance

Checking the Parameter Estimates box tells GLM to print out any regression parameters it might have computed. These are regression parameters for any quantitative independent variables and for the group coding variables that are created automatically by GLM.

Between-Subjects Factors
        N
Job  1  7
     2  7
     3  7

Descriptive Statistics
Dependent Variable: JS
Job     Mean   Std. Deviation   N
1       7.86   1.676            7
2       8.00   1.633            7
3       5.00   2.160            7
Total   6.95   2.247            21

Tests of Between-Subjects Effects
Dependent Variable: JS
Source            Type III Sum   df   Mean Square   F         Sig.   Partial Eta   Noncent.    Observed
                  of Squares                                         Squared       Parameter   Powerb
Corrected Model   40.095a        2    20.048        5.930     .011   .397          11.859      .815
Intercept         1015.048       1    1015.048      300.225   .000   .943          300.225     1.000
Job               40.095         2    20.048        5.930     .011   .397          11.859      .815
Error             60.857         18   3.381
Total             1116.000       21
Corrected Total   100.952        20
a. R Squared = .397 (Adjusted R Squared = .330)
b. Computed using alpha = .05

Parameter Estimates
Dependent Variable: JS
                                                 95% Confidence Interval      Partial Eta   Noncent.    Observed
Parameter   B       Std. Error   t        Sig.   Lower Bound   Upper Bound    Squared       Parameter   Powera
Intercept   5.000   .695         7.194    .000   3.540         6.460          .742          7.194       1.000
[Job=1]     2.857   .983         2.907    .009   .792          4.922          .319          2.907       .785
[Job=2]     3.000   .983         3.052    .007   .935          5.065          .341          3.052       .823
[Job=3]     0b      .            .        .      .             .              .             .           .
a. Computed using alpha = .05
b. This parameter is set to zero because it is redundant.

These results are from the default dummy coding that SPSS always does automatically. (Compare the Bs of 2.857 and 3.000 with the dummy variable regression above.)

Custom Hypothesis Tests

These are the results for the "Deviation" group coding scheme we asked for.

Contrast Results (K Matrix)a
Job Deviation Contrast                                      Dependent Variable: JS
Level 1 vs. Mean   Contrast Estimate                        .905
                   Hypothesized Value                       0
                   Difference (Estimate - Hypothesized)     .905
                   Std. Error                               .567
                   Sig.                                     .128
                   95% CI for Difference: Lower Bound       -.287
                                          Upper Bound       2.097
Level 2 vs. Mean   Contrast Estimate                        1.048
                   Hypothesized Value                       0
                   Difference (Estimate - Hypothesized)     1.048
                   Std. Error                               .567
                   Sig.                                     .081
                   95% CI for Difference: Lower Bound       -.145
                                          Upper Bound       2.240
a. Omitted category = 3

The p-values are the same as those obtained using the REGRESSION procedure with effects coding above.

What's this???

Test Results
Dependent Variable: JS
Source     Sum of Squares   df   Mean Square   F       Sig.   Partial Eta   Noncent.    Observed
                                                              Squared       Parameter   Powera
Contrast   40.095           2    20.048        5.930   .011   .397          11.859      .815
Error      60.857           18   3.381
a. Computed using alpha = .05

(It's the overall F test once more – the same numbers as the test of the Job factor above.)
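Finally, if you wanted GLM to reproduce the dummy coding comparisons instead, the CONTRAST subcommand can name Simple – SPSS's name for dummy coding, with the last category as the reference by default. A minimal sketch:

UNIANOVA JS BY Job
  /CONTRAST(Job)=Simple
  /PRINT=PARAMETER
  /DESIGN=Job.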