Two Group Discriminant Function Analysis

In DFA one wishes to predict group membership from a set of (usually continuous) predictor variables. In the simplest case one has two groups and p predictor variables. A linear discriminant equation, $D_i = a + b_1X_1 + b_2X_2 + \ldots + b_pX_p$, is constructed such that the two groups differ as much as possible on D. That is, the weights are chosen so that were you to compute a discriminant score ($D_i$) for each subject and then do an ANOVA on D, the ratio of the between-groups sum of squares to the within-groups sum of squares would be as large as possible. The value of this ratio is the eigenvalue. "Eigen" can be translated from the German as "own," "peculiar," "original," "singular," etc. Check out the page at http://core.ecu.edu/psyc/wuenschk/StatHelp/eigenvalue.txt for a discussion of the origins of the term "eigenvalue."

Read the following article, which has been placed on reserve in Joyner: Castellow, W. A., Wuensch, K. L., & Moore, C. H. (1990). Effects of physical attractiveness of the plaintiff and defendant in sexual harassment judgments. Journal of Social Behavior and Personality, 5, 547-562. The data for this analysis are those used for the research presented in that article. They are in the SPSS data file "Harass90.sav." Download it from my SPSS-Data page and bring it into SPSS.

To do the discriminant analysis, click Analyze, Classify, Discriminant. Place the Verdict variable into the Grouping Variable box and define the range from 1 to 2. Place the 22 rating scale variables (D_excit through P_happy) in the "Independents" box. We are using the ratings the jurors gave the defendant and the plaintiff to predict the verdict. Under Statistics, ask for Means, Univariate ANOVAs, Box's M, Fisher's Coefficients, and Unstandardized Coefficients. Under Classify, ask for Priors Computed From Group Sizes and for a Summary Table. Under Save, ask that the discriminant scores be saved.

Now look at the output. The means show that when the defendant was judged not guilty he was rated more favorably on all 11 scales than when he was judged guilty. When the defendant was judged not guilty the plaintiff was rated less favorably on all 11 scales than when a guilty verdict was returned. The Tests of Equality of Group Means show that the groups differ significantly on every variable except plaintiff excitingness, calmness, independence, and happiness.

The discriminant function, in unstandardized units (Canonical Discriminant Function Coefficients), is D = -0.064 + .083 D_excit + ... + .029 P_happy. The group centroids (mean discriminant scores) are -0.785 for the Guilty group and 1.491 for those jurors who decided the defendant was not guilty. High scores on the discriminant function are associated with the juror deciding to vote not guilty.

The eigenvalue, $SS_{between\ groups}/SS_{within\ groups}$ computed on D (the quantity maximized by the discriminant function coefficients obtained), is 1.187. The canonical correlation, $\sqrt{SS_{between\ groups}/SS_{total}}$ computed on D (equivalent to eta in an ANOVA and equal to the point biserial r between Group and D), is 0.737. Wilks' lambda is used to test the null hypothesis that the populations have identical means on D. Wilks' lambda is $\Lambda = SS_{within\ groups}/SS_{total}$, so the smaller the $\Lambda$, the more doubt cast upon that null hypothesis. SPSS uses a $\chi^2$ approximation to obtain a significance level. For our data, p < .0001.
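To make these relations concrete, here is a minimal Python sketch (not part of the SPSS analysis itself) that recovers the eigenvalue, canonical correlation, and Wilks' lambda from the sums of squares in an ANOVA on the saved discriminant scores. It assumes the saved scores and verdict codes have been exported to a CSV file named harass90_scores.csv with columns verdict and dis1_1; those file and column names are assumptions, not part of the handout.

```python
# Sketch: recover the eigenvalue, canonical correlation, and Wilks' lambda
# from an ANOVA-style decomposition of the saved discriminant scores.
# Assumes a CSV export named "harass90_scores.csv" with columns "verdict"
# (1 = guilty, 2 = not guilty) and "dis1_1" (saved D scores); both names
# are assumptions.
import pandas as pd

df = pd.read_csv("harass90_scores.csv")
d = df["dis1_1"]
groups = df.groupby("verdict")["dis1_1"]

ss_total = ((d - d.mean()) ** 2).sum()
ss_within = groups.apply(lambda g: ((g - g.mean()) ** 2).sum()).sum()
ss_between = ss_total - ss_within

eigenvalue = ss_between / ss_within            # about 1.187 for these data
canonical_r = (ss_between / ss_total) ** 0.5   # about .737
wilks_lambda = ss_within / ss_total            # equals 1 / (1 + eigenvalue)

print(eigenvalue, canonical_r, wilks_lambda)
```

Note that $\Lambda = 1/(1 + \text{eigenvalue})$, which is why subtracting $\Lambda$ from one (next paragraph) gives the squared canonical correlation.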
We can determine how much of the variance in the grouping variable is explained by our predictor variables by subtracting the $\Lambda$ from one. For our data, that is 54% (also the value of the squared canonical correlation).

DFA is mathematically equivalent to a MANOVA. Looking at our analysis from the perspective of a MANOVA, when we combine the rating scales with weights that maximize group differences on the resulting linear combination, the groups do differ significantly from one another. Such a MANOVA is sometimes done prior to doing univariate analyses to provide a bit of protection against inflation of alpha. Recall that the grouping variable is the predictor variable in MANOVA (it is what is being predicted in DFA) and the rating scales are the MANOVA outcome variables (and our DFA predictor variables). If the MANOVA is not significant, we stop. If it is significant, we may go on to do an ANOVA on each dependent variable. SPSS gave us those ANOVAs.

We have created (or discovered) a dimension (like a component in PCA) on which the two groups differ. The univariate ANOVAs may help us explain the nature of the relationship between this discriminant dimension and the grouping variable. For example, some of the variates may have a significant relationship with the grouping variable and others might not, but the univariate ANOVAs totally ignore the correlations among the variates. It is possible for the groups to differ significantly on D but not on any one predictor by itself.

The standardized discriminant function coefficients may help. These may be treated as beta weights in a multiple regression predicting D from z-scores on the X's, $D_i = \beta_1 Z_1 + \beta_2 Z_2 + \ldots + \beta_p Z_p$. Of course, one must realize that these coefficients reflect the contribution of one variate in the context of the other variates in the model. A low standardized coefficient might mean that the groups do not differ much on that variate, or it might just mean that that variate's correlation with the grouping variable is redundant with that of another variate in the model. Suppressor effects can also occur. For our analysis, the predictors with the largest standardized coefficients were D_sincerity, D_warmth, D_kindness, P_sincerity, P_strength, and P_warmth.

Correlations between the variates and D may also be helpful. These are available in the loading or structure matrix. Generally, any variate with a loading of .30 or more is considered to be important in defining the discriminant dimension. These correlations may help us understand the discriminant function we have created. Note that high scores on our D are associated with the defendant being rated as sincere, kind, happy, warm, and calm and with the plaintiff being rated as cold, insincere, and cruel. D scores were higher (mean = 1.49) for jurors who voted not guilty than for those who voted guilty (mean = -0.78). The predictors with the highest loadings are D_sincerity and P_warmth. Note that the standardized weight for D_warmth is negative but its loading is positive, indicating suppression.

If your primary purpose is to predict group membership from the variates (rather than to examine group differences on the variates), you need to do classification. SPSS classifies subjects into predicted groups using Bayes' rule: $p(G_i \mid D) = \dfrac{p(G_i)\,p(D \mid G_i)}{\sum_{i=1}^{g} p(G_i)\,p(D \mid G_i)}$. Each subject's discriminant score is used to determine the posterior probabilities of being in each of the two groups. The subject is then classified (predicted) to be in the group with the higher posterior probability.
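As a hedged illustration of that rule (a sketch of the arithmetic, not SPSS's own routine), the snippet below computes posterior probabilities for a single hypothetical discriminant score, using the group centroids and relative frequencies reported in this handout and assuming D is normally distributed within each group with a pooled within-group standard deviation of 1 (the usual scaling of discriminant scores):

```python
# Sketch of Bayes-rule classification from a discriminant score.
# Centroids and priors come from the values reported in this handout;
# the normal-density-on-D assumption and the subject's score of 0.40
# are illustrative assumptions only.
from scipy.stats import norm

priors = {"guilty": .655, "not guilty": .345}        # group relative frequencies
centroids = {"guilty": -0.785, "not guilty": 1.491}  # mean D in each group
d_score = 0.40                                       # hypothetical subject's D

likelihoods = {g: norm.pdf(d_score, loc=m, scale=1.0) for g, m in centroids.items()}
numerators = {g: priors[g] * likelihoods[g] for g in priors}
denominator = sum(numerators.values())
posteriors = {g: numerators[g] / denominator for g in priors}

predicted = max(posteriors, key=posteriors.get)      # group with higher posterior
print(posteriors, predicted)
```

Under these assumptions the decision threshold falls somewhat above a D of zero, reflecting the larger prior probability for the guilty group.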
By default, SPSS assumes that all groups have equal prior probabilities: for two groups, each prior = 1/2; for three, 1/3; and so on. I asked SPSS to use the group relative frequencies as priors, which should result in better classification.

Another way to classify subjects is to use Fisher's classification function coefficients. For each subject a D is computed for each group, and the subject is classified into the group for which that D is highest. To compute a subject's D1 you would multiply the subject's scores on the 22 rating scales by the indicated coefficients and sum those products and the constant. For D2 you would do the same with the coefficients for Group 2. If D1 > D2 you classify the subject into Group 1; if D2 > D1, you classify the subject into Group 2.

The classification results table shows that we correctly classified 89.5% of those cases where the verdict was guilty and 86% of those where the verdict was not guilty, for an overall success rate of 88.3%. To evaluate how good this is, we should compare that 88% with what would be expected by chance. By just randomly classifying half into group 1 and half into group 2 you would expect to get .5(.655) + .5(.345) = 50% correct. Given that the marginal distribution of Verdict is not uniform, you would do better by randomly putting 65.5% into group 1 and 34.5% into group 2 ("probability matching"), in which case you would expect to be correct .655(.655) + .345(.345) = 54.8% of the time. Even better would be to "probability maximize" by just placing every subject into the most likely group, in which case you would be correct 65.5% of the time. We can do significantly better than any of these by using our discriminant function.

Assumptions: Multivariate normality of the predictors is assumed. One may hope that large sample sizes make the DFA sufficiently robust that one need not worry about moderate departures from normality. One also assumes that the variance-covariance matrix of the predictor variables is the same in all groups (so we can obtain a pooled matrix to estimate error variance). Box's M tests this assumption and indicates a problem with our example data. For validity of significance tests, one generally does not worry about this if sample sizes are equal, and with unequal sample sizes one need not worry unless the p < .001. The DFA is thought to be very robust, and Box's M is very sensitive. Non-normality also tends to lower the p for Box's M. The classification procedures are not, however, as robust as the significance tests are. One may need to transform variables, do a quadratic DFA (SPSS won't do this), or ask that separate rather than pooled variance-covariance matrices be used. Pillai's criterion (rather than Wilks' $\Lambda$) may provide additional robustness for significance testing -- although not available with SPSS discriminant, this criterion is available with SPSS MANOVA.

ANOVA on D. Conduct an ANOVA comparing the verdict groups on the discriminant function. Then you can demonstrate that the DFA eigenvalue is equal to the ratio of the $SS_{between}$ to $SS_{within}$ from that ANOVA and that the ratio of $SS_{between}$ to $SS_{total}$ is the squared canonical correlation coefficient from the DFA.

Correlation Between Groups and D. Correlate the discriminant scores with the verdict variable. You will discover that the resulting point biserial correlation coefficient is the canonical correlation from the DFA.

Binary Logistic Regression. These same data could be analyzed with a binary logistic regression; a rough sketch of such an analysis appears below, followed by the SPSS results.
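Here is a hedged sketch of how such a model might be fit outside SPSS, using Python and statsmodels. It assumes the data have been exported to a CSV file named harass90.csv in which verdict is coded 1 = guilty, 2 = not guilty and the 22 rating-scale columns run contiguously from d_excit to p_happy; all of those file and column names are assumptions, not part of the handout.

```python
# Sketch of the binary logistic regression alternative to the DFA.
# File name, column names, and column ordering are assumptions.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("harass90.csv")
y = (df["verdict"] == 2).astype(int)           # 1 = not guilty, 0 = guilty
X = sm.add_constant(df.loc[:, "d_excit":"p_happy"])  # 22 predictors + intercept

model = sm.Logit(y, X).fit()
print(model.summary())                         # coefficients and their tests
predicted_group = (model.predict(X) >= .5).astype(int)  # classify at p = .5
print(pd.crosstab(y, predicted_group))         # classification table
```

The SPSS logistic regression results are described next.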
Adding the predictors to the intercept-only model significantly improves the fit, $\chi^2$(22, N = 145) = 119.28, p < .001. The "Variables Not in the Equation" output matches the univariate ANOVA output from the DFA. The predictors with significant unique contributions were D_sincerity and P_warmth. The classification table shows that we correctly classified 91.6% of those cases where the verdict was guilty and 84% of those where the verdict was not guilty, for an overall success rate of 89%, slightly better than with the DFA.

SAS: Obtain the data file Harass90.dat from my StatData page and the program DFA2.sas from my SAS Programs Page. Run the program. This program uses SAS to do essentially the same analysis we just did with SPSS. Look at the output from PROC REG. It did a multiple regression to predict group membership (1, 2) from the rating scales. Notice that $SS_{model}/SS_{error}$ equals the eigenvalue from the DFA, and that $SS_{error}/SS_{total}$ equals the Wilks' $\Lambda$ from the DFA. The square root of the $R^2$ equals the canonical correlation from the DFA. The unstandardized discriminant function coefficients (raw canonical coefficients) are equal to the standardized discriminant function coefficients (pooled within-class standardized canonical coefficients) divided by the pooled (within-group) standard deviations. Note also that the DFA's discriminant function coefficients are a linear transformation of the multiple regression b's (multiply each by 4.19395 and you get the unstandardized discriminant function coefficients). I do not know what determines the value of this constant; I determined it empirically for this set of data.

More on Box's M

Return to Wuensch's Statistics Lessons Page

Copyright 2015 Karl L. Wuensch - All rights reserved.