Two Group Discriminant Function Analysis

In DFA one wishes to predict group membership from a set of (usually continuous)
predictor variables. In the simplest case one has two groups and p predictor variables. A
linear discriminant equation, Di = a + b1X1 + b2X2 + ... + bpXp, is constructed such that the
two groups differ as much as possible on D. That is, the weights are chosen so that were you
to compute a discriminant score ( Di ) for each subject and then do an ANOVA on D, the ratio
of the between groups sum of squares to the within groups sum of squares is as large as
possible. The value of this ratio is the eigenvalue. “Eigen” can be translated from the
German as “own,” “peculiar,” “original,” “singular,” etc. Check out the page at
http://core.ecu.edu/psyc/wuenschk/StatHelp/eigenvalue.txt for a discussion of the origins of
the term “eigenvalue.”
Read the following article, which has been placed on reserve in Joyner:
Castellow, W. A., Wuensch, K. L., & Moore, C. H. (1990). Effects of physical attractiveness of
the plaintiff and defendant in sexual harassment judgments. Journal of Social Behavior
and Personality, 5, 547-562.
The data for this analysis are those used for the research presented in that article.
They are in the SPSS data file “Harass90.sav.” Download it from my SPSS-Data page and
bring it into SPSS. To do the discriminant analysis, click Analyze, Classify, Discriminant.
Place the Verdict variable into the Grouping Variable box and define the range from 1 to 2.
Place the 22 rating scale variables (D_excit through P_happy) in the “Independents” box. We
are using the ratings the jurors gave defendant and plaintiff to predict the verdict. Under
Statistics, ask for Means, Univariate ANOVAs, Box’s M, Fisher’s Coefficients, and
Unstandardized Coefficients. Under Classify, ask for Priors Computed From Group Sizes and
for a Summary Table. Under Save ask that the discriminant scores be saved.
Now look at the output. The means show that when the defendant was judged not
guilty he was rated more favorably on all 11 scales than when he was judged guilty. When the
defendant was judged not guilty the plaintiff was rated less favorably on all 11 scales than
when a guilty verdict was returned. The Tests of Equality of Group Means show that the
groups differ significantly on every variable except plaintiff excitingness, calmness,
independence, and happiness.
The discriminant function, in unstandardized units (Canonical Discriminant Function
Coefficients), is D = -0.064 + .083 D_excit + ...... + .029 P_happy. The group centroids
(mean discriminant scores) are -0.785 for the Guilty group and 1.491 for those jurors who
decided the defendant was not guilty. High scores on the discriminant function are associated
with the juror deciding to vote not guilty.
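A minimal sketch of this scoring step follows. Only two of the 22 real coefficients (for D_excit and P_happy) and the two centroids are taken from the text; the juror’s ratings and the nearest-centroid rule are illustrative assumptions, not SPSS’s actual classification rule (which uses posterior probabilities, discussed later).

```python
# Sketch: computing a discriminant score and comparing it to the group
# centroids. Only the constant, the D_excit and P_happy weights, and the
# centroids come from the text; the ratings are hypothetical.
constant = -0.064
weights = {"D_excit": 0.083, "P_happy": 0.029}  # two of the 22 real coefficients
ratings = {"D_excit": 7.0, "P_happy": 4.0}      # hypothetical juror ratings

D = constant + sum(weights[v] * ratings[v] for v in weights)

centroids = {"Guilty": -0.785, "Not Guilty": 1.491}
# classify to the nearer centroid (a simplification of SPSS's Bayes rule)
predicted = min(centroids, key=lambda g: abs(D - centroids[g]))
print(round(D, 3), predicted)
```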

Copyright 2008 Karl L. Wuensch - All rights reserved.
The eigenvalue, SS_between_groups / SS_within_groups on D (the quantity maximized by the
discriminant function coefficients obtained), is 1.187. The canonical correlation,
√(SS_between_groups / SS_total) on D (equivalent to eta in an ANOVA and equal to the point
biserial r between Group and D), is 0.737.
Wilks’ lambda is used to test the null hypothesis that the populations have identical
means on D. Wilks’ lambda is Λ = SS_within_groups / SS_total, so the smaller the Λ, the more
doubt cast upon that null hypothesis. SPSS uses a χ² approximation to obtain a significance level. For
our data, p < .0001. We can determine how much of the variance in the grouping variable is
explained by our predictor variables by subtracting the Λ from one. For our data, that is 54%
(also the value of the squared canonical correlation).
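Because SS_total = SS_between + SS_within on D, all of the statistics above follow from the eigenvalue alone. A quick arithmetic check, taking only the reported eigenvalue (1.187) as input:

```python
# Verifying the relationships among the eigenvalue, canonical correlation,
# and Wilks' lambda reported in the text.
eigenvalue = 1.187  # SS_between / SS_within on D

# Dividing SS_total = SS_between + SS_within through by SS_within:
canonical_r = (eigenvalue / (1 + eigenvalue)) ** 0.5  # sqrt(SS_between / SS_total)
wilks_lambda = 1 / (1 + eigenvalue)                   # SS_within / SS_total
variance_explained = 1 - wilks_lambda                 # = canonical_r ** 2

print(round(canonical_r, 3), round(wilks_lambda, 3), round(variance_explained, 2))
```

These reproduce the reported canonical correlation (.737), Λ (.457), and 54% variance explained.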
DFA is mathematically equivalent to a MANOVA. Looking at our Λ from the perspective
of a MANOVA, when we combine the rating scales with weights that maximize group
differences on the resulting linear combination, the groups do differ significantly from one
another. Such a MANOVA is sometimes done prior to doing univariate analyses to provide a
bit of protection against inflation of alpha. Recall that the grouping variable is the predictor
variable in MANOVA (it is what is being predicted in DFA) and the rating scales are the MANOVA
outcome variables (and our DFA predictor variables). If the MANOVA is not significant, we
stop. If it is significant, we may go on to do an ANOVA on each dependent variable. SPSS
gave us those ANOVAs.
We have created (or discovered) a dimension (like a component in PCA) on which the
two groups differ. The univariate ANOVAs may help us explain the nature of the relationship
between this discriminant dimension and the grouping variable. For example, some of the
variates may have a significant relationship with the grouping variable and others might not,
but the univariate ANOVAs totally ignore the correlations among the variates. It is possible for
the groups to differ significantly on D but not on any one predictor by itself.
The standardized discriminant function coefficients may help. These may be
treated as Beta weights in a multiple regression predicting D from z-scores on the X’s,
Di = β1Z1 + β2Z2 + ... + βpZp. Of course, one must realize that these coefficients reflect the
contribution of one variate in the context of the other variates in the model. A low standardized
coefficient might mean that the groups do not differ much on that variate or it might just mean
that that variate’s correlation with the grouping variable is redundant with that of another
variate in the model. Suppressor effects can also occur. For our analysis, the predictors with
the largest standardized coefficients were D_sincerity, D_warmth, D_kindness, P_sincerity,
P_strength, and P_warmth.
Correlations between variates and D may also be helpful. These are available in the
loading or structure matrix. Generally, any variate with a loading of .30 or more is
considered to be important in defining the discriminant dimension. These correlations may help
us understand the discriminant function we have created. Note that high scores on our D are
associated with the defendant being rated as sincere, kind, happy, warm, and calm and with
the plaintiff being rated as cold, insincere, and cruel. D scores were higher (mean = 1.49) for
jurors who voted not guilty than for those who voted guilty (mean = -0.78). The predictors with
the highest loadings are D_sincerity and P_warmth. Note that the standardized weight for
D_warmth is negative but its loading positive, indicating suppression.
If your primary purpose is to predict group membership from the variates (rather than to
examine group differences on the variates), you need to do classification. SPSS classifies
subjects into predicted groups using Bayes’ rule: p(Gi | D) = p(Gi) · p(D | Gi) / Σ[i = 1 to g] p(Gi) · p(D | Gi).
Each subject’s discriminant score is used to determine the posterior probabilities of
being in each of the two groups. The subject is then classified (predicted) to be in the group
with the higher posterior probability.
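A sketch of this posterior-probability step follows, under stated assumptions: the conditional densities p(D | Gi) are taken as normal with the reported centroids and a pooled within-group SD of 1 (SPSS scales D so its pooled within-group variance is 1), and the priors are the group relative frequencies. The example score of 0.9 is invented.

```python
# Sketch of Bayes-rule classification for the two verdict groups,
# assuming normal densities for D with unit pooled within-group SD.
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd=1.0):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

priors = {"Guilty": 0.655, "Not Guilty": 0.345}     # group relative frequencies
centroids = {"Guilty": -0.785, "Not Guilty": 1.491} # mean D per group

def posteriors(D):
    numer = {g: priors[g] * normal_pdf(D, centroids[g]) for g in priors}
    total = sum(numer.values())
    return {g: numer[g] / total for g in numer}

post = posteriors(0.9)  # a hypothetical juror with discriminant score 0.9
predicted = max(post, key=post.get)
print(predicted, round(post[predicted], 3))
```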
By default, SPSS assumes that all groups have equal prior probabilities. For two
groups, each prior = ½, for three, 1/3, etc. I asked SPSS to use the group relative frequencies
as priors, which should result in better classification.
Another way to classify subjects is to use Fisher’s classification function
coefficients. For each subject a D is computed for each group and the subject classified into
the group for which e’s D is highest. To compute a subject’s D1 you would multiply e’s scores
on the 22 rating scales by the indicated coefficients and sum them and the constant. For e’s
D2 you would do the same with the coefficients for Group 2. If D1 > D2, then you classify the
subject into Group 1; if D2 > D1, then you classify em into Group 2.
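A small sketch of this classification rule, using an invented three-predictor problem (the real analysis has 22 predictors, and these coefficients are hypothetical, not the ones SPSS reports):

```python
# Sketch of classification via Fisher's classification function
# coefficients: one linear function per group, classify to the larger.
# All coefficients and scores below are invented for illustration.
fisher = {
    1: {"const": -30.0, "b": [2.1, 1.4, 0.6]},  # Group 1 (Guilty)
    2: {"const": -25.0, "b": [1.2, 2.0, 1.1]},  # Group 2 (Not Guilty)
}
scores = [5.0, 6.0, 4.0]  # one subject's ratings on three scales

def classification_score(group):
    f = fisher[group]
    return f["const"] + sum(b * x for b, x in zip(f["b"], scores))

D1, D2 = classification_score(1), classification_score(2)
predicted_group = 1 if D1 > D2 else 2
print(D1, D2, predicted_group)
```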
The classification results table shows that we correctly classified 89.5% of those
cases where the verdict was guilty and 86% of those where the verdict was not guilty, for an
overall success rate of 88.3%. To evaluate how good this is we should compare 88% with
what would be expected by chance. By just randomly classifying half into group 1 and half
into group 2 you would expect to get .5(.655) + .5(.345) = 50% correct. Given that the
marginal distribution of Verdict is not uniform, you would do better by randomly putting 65.5%
into group 1 and 34.5% into group 2 (“probability matching”), in which case you would expect
to be correct .655(.655) + .345(.345) = 54.8% of the time. Even better would be to
“probability maximize” by just placing every subject into the most likely group, in which case
you would be correct 65.5% of the time. We can do significantly better than any of these by
using our discriminant function.
Assumptions: Multivariate normality of the predictors is assumed. One may hope
that large sample sizes make the DFA sufficiently robust that one does not worry about
moderate departures from normality. One also assumes that the variance-covariance matrix
of the predictor variables is the same in all groups (so we can obtain a pooled matrix to
estimate error variance). Box’s M tests this assumption and indicates a problem with our
example data. For validity of significance tests, one generally does not worry about this if
sample sizes are equal, and with unequal sample sizes one need not worry unless the p <
.001. The DFA is thought to be very robust and Box’s M is very sensitive. Non-normality also
tends to lower the p for Box’s M. The classification procedures are not, however, so robust
as the significance tests are. One may need to transform variables or do a quadratic DFA
(SPSS won’t do this) or ask that separate rather than pooled variance-covariance matrices be
used. Pillai’s criterion (rather than Wilks’ Λ) may provide additional robustness for
significance testing -- although not available with SPSS discriminant, this criterion is available
with SPSS MANOVA.
ANOVA on D. Conduct an ANOVA comparing the verdict groups on the discriminant
function. Then you can demonstrate that the DFA eigenvalue is equal to the ratio of the
SSbetween to SSwithin from that ANOVA and that the ratio of SSbetween to SStotal is the squared
canonical correlation coefficient from the DFA.
Correlation Between Groups and D. Correlate the discriminant scores with the
verdict variable. You will discover that the resulting point biserial correlation coefficient is the
canonical correlation from the DFA.
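This identity is easy to demonstrate with toy data; the scores below are invented, but the equality of the point biserial r and eta = √(SS_between / SS_total) is general for two groups:

```python
# Demonstration that the point biserial correlation between group
# membership and D equals eta from an ANOVA on D. Data are invented.
import numpy as np

group = np.array([1, 1, 1, 1, 2, 2, 2])  # verdict codes
D = np.array([-0.9, -0.6, -1.1, -0.4, 1.2, 1.6, 1.4])

r_pb = abs(np.corrcoef(group, D)[0, 1])  # point biserial r

grand = D.mean()
ss_total = ((D - grand) ** 2).sum()
ss_between = sum(len(D[group == g]) * (D[group == g].mean() - grand) ** 2
                 for g in (1, 2))
eta = (ss_between / ss_total) ** 0.5

print(abs(r_pb - eta) < 1e-9)  # the two quantities coincide
```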
Binary Logistic Regression. These same data could be analyzed with a binary
logistic regression. Adding the predictors to the intercept-only model significantly improves the
fit, χ²(22, N = 145) = 119.28, p < .001. The “Variables Not in the Equation” output matches
that of the univariate ANOVA output from the DFA. The predictors with significant unique
contributions were D_sincerity and P_warmth.
The classification table shows that we correctly classified 91.6% of those cases where
the verdict was guilty and 84% of those where the verdict was not guilty, for an overall success
rate of 89%, slightly better than with the DFA.
SAS: Obtain the data file Harass90.dat from my StatData page and the program
DFA2.sas from my SAS Programs Page. Run the program. This program uses SAS to do
essentially the same analysis we just did with SPSS. Look at the output from PROC REG. It
did a multiple regression to predict group membership (1, 2) from the rating scales. Notice that
the SSmodel / SSerror = the eigenvalue from the DFA, and that the SSerror / SStotal = the Wilks’ Λ
from the DFA. The square root of the R2 equals the canonical correlation from the DFA. The
unstandardized discriminant function coefficients (raw canonical coefficients) are equal to the
standardized discriminant function coefficients (pooled within-class standardized canonical
coefficients) divided by the pooled (within-group) standard deviations.
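That coefficient relationship can be sketched in a couple of lines; the coefficient and SD values below are invented, only the division rule comes from the text:

```python
# Sketch of the relationship described above: unstandardized (raw)
# coefficients equal the standardized coefficients divided by the pooled
# within-group standard deviations. All numbers are hypothetical.
standardized = [0.45, -0.30, 0.62]  # pooled within-class standardized coefficients
pooled_sd = [1.50, 2.00, 0.80]      # pooled within-group SDs of the predictors

unstandardized = [b / s for b, s in zip(standardized, pooled_sd)]
print([round(u, 3) for u in unstandardized])
```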
Note also that the DFA’s discriminant function coefficients are a linear transformation of
the multiple regression b’s (multiply each by 4.19395 and you get the unstandardized
discriminant function coefficients). I do not know what determines the value of this constant; I
determined it empirically for this set of data.
Copyright 2015 Karl L. Wuensch - All rights reserved.