
Discriminant Analysis (DA)
1. Basic Concept
a) Dependent Variable: Categorical (nominal). If the dependent variable is ordinal or
metric, convert it to mutually exclusive and exhaustive categories.
• If the number of categories is large, one may choose the polar extremes
approach, where only the extreme categories are analyzed and the middle ones
are excluded from the analysis.
• The number of observations in each category (the group size) should be at
least 20.
• If the group sizes vary significantly, randomly select observations from the
larger groups so that the resulting samples are comparable in size to the smaller groups.
Two-group DA - the dependent variable has two categories, e.g. Male = 1,
Female = 2
Multiple Discriminant Analysis (MDA) – the dependent variable has more
than two categories
b) Independent Variables: Metric. The recommended ratio of observations to
independent variables is at least 20 to 1; the minimum acceptable ratio is 5 to 1. This
ratio applies to all variables initially included in the analysis (even if some are later
eliminated by the stepwise procedure).
2. Randomly split (e.g., 50-50, 60-40, or 75-25) the sample into two parts (use a
proportionately stratified sampling procedure):
a) Analysis Sample: for estimating the discriminant function
b) Validation (Holdout) Sample: for validating the discriminant function
• Ex. Split 60-40: Total sample: 80 males and 50 females → Analysis Sample:
48 males and 30 females; Holdout Sample: 32 males and 20 females.
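The 60-40 example above can be sketched in Python. This is only an illustration, not part of the SPSS procedure: the function name stratified_split, the case IDs and the fixed seed are assumptions made for the sketch.

```python
import random


def stratified_split(groups, analysis_share, seed=0):
    """Split each group's case IDs into an analysis part and a holdout part.

    groups: dict mapping a group label to a list of case IDs.
    analysis_share: fraction of every group that goes to the analysis sample.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    analysis, holdout = {}, {}
    for label, cases in groups.items():
        shuffled = list(cases)
        rng.shuffle(shuffled)
        n_analysis = round(len(shuffled) * analysis_share)
        analysis[label] = shuffled[:n_analysis]
        holdout[label] = shuffled[n_analysis:]
    return analysis, holdout


# Hypothetical case IDs standing in for the 80 males and 50 females.
groups = {"male": list(range(80)), "female": list(range(80, 130))}
analysis, holdout = stratified_split(groups, 0.60)

print({label: len(cases) for label, cases in analysis.items()})
print({label: len(cases) for label, cases in holdout.items()})
```

Because the split is done inside each group, both samples keep the same male-to-female proportions as the total sample.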
3. Assumptions of DA
a) No outliers. Check for them with both the SRE (Studentized Residuals) and
the Mahalanobis distance. Remove any outliers from further analysis.
b) Linear relationships between the dependent variable and the independent
variables. Check the linearity with the procedures explained in the previous
sessions. Correct any nonlinear relationships with specific variable
transformations.
c) No multicollinearity among the independent variables. Check the Pooled
Within-Groups Correlation Matrix. Analyze the collinearity diagnostics.
d) Equal covariance matrices for the groups as defined by the dependent variable.
Use Box’s M test. If Sig. > 0.05, we cannot reject H0: the population covariance
matrices are equal. Note: when some of the assumptions of DA are not
clearly met (e.g., normality is dubious), a researcher may use a
threshold value lower than 0.05, e.g., 0.03.
e) The independent variables are normally distributed. Use appropriate tests
explained in the previous sessions.
4. Apply DA to the Analysis Sample
a) Direct (Simultaneous) Method – when the discrimination is to be based on ALL
independent variables
b) Stepwise (Elimination) DA – when only a subset of the most discriminatory
independent variables is to be included in the discriminant function
5. Example: Discriminant Analysis (DA)
• Convert file File4ab.xls to File4ab.sav (the first 30 rows are the Analysis Sample, and
the last 12 rows are the Holdout Sample)
• Transform → Recode → Into Different Variables → Paste the variable “no” as
Numeric Variable into Output Variable (“no1”) → Old and New Values: Range 1
thru 30 → 1; Range 31 thru 42 → 0; Continue → Change
• Analyze → Classify → Discriminant → Grouping Variable (visit) → Define Range
(Minimum 1; Maximum 2) → Independents (income, travel, vacation, hsize,
age) → Selection Variable: no1 = 1 → Statistics (check all boxes) → Classify (Limit
cases to first 30) → Save (check all boxes) → Method (Enter independents together)
• Interpretation of the results based on the Analysis Sample: n = 30 (going from the
top of the computer printout)
Group Means and Standard Deviations:
• Income (large difference in means; large standard deviations in each group)
• Age (small difference in means; large standard deviations in each group)
• Travel, vacation, hsize (small difference in means; small standard
deviations in each group)
Test of Equality of Group Means: Wilks’ Lambda
Variable    Sig.
Income      0.000
Travel      0.143
Vacation    0.021
Hsize       0.001
Age         0.257
• Income, Vacation, and Hsize show significant (at the 0.05 level)
univariate differences between the two groups (Group 1:
those who visited the resort during the last 2 years; Group
2: those who did not).
• Travel (attitude toward travel) and Age do not differ
significantly between the two groups.
• Income, Vacation, and Hsize are therefore the most likely
candidates to discriminate between the two groups of visitors. If
you are interested in the efficiency of only these three
variables in discriminating between the two groups → use
the Stepwise procedure. If you are interested in the
efficiency of each of the five independent variables → use
the Direct procedure (which we follow below).
Pooled within-groups correlation matrix: low correlations → lack of
multicollinearity among the independent variables (rule of thumb: a
correlation coefficient < 0.30 is considered low)
Box’s M Test of Equality of Covariance Matrices
Sig. = 0.141 > 0.05: cannot reject H0: the population covariance
matrices are equal
Summary of Canonical Discriminant Functions
• Canonical Correlation = 0.801 → (0.801)² = 64.1% of the variance in
the dependent variable (Resort Visit) can be accounted for by the
model (all five independent variables).
• Wilks’ Lambda = 0.359 (equivalent to Chi-square =
26.130 with 5 d.f.) has Sig. = 0.000, i.e. p < 0.001. This means that the
discriminant function computed in this procedure is statistically
significant. Only when it is significant should one proceed to interpret
the results.
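The link between Wilks’ Lambda, the chi-square value and the explained variance can be checked with a short Python sketch. It assumes that the printout uses the standard Bartlett approximation, chi-square = -[(N - 1) - (p + g)/2] * ln(Lambda) with p*(g - 1) d.f.; the function name is made up for the sketch, and the inputs are the rounded values quoted above, so the result matches 26.130 only approximately.

```python
import math


def bartlett_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' Lambda (assumed formula)."""
    multiplier = (n_cases - 1) - (n_predictors + n_groups) / 2
    chi_square = -multiplier * math.log(wilks_lambda)
    degrees_of_freedom = n_predictors * (n_groups - 1)
    return chi_square, degrees_of_freedom


# Values from the example: Lambda = 0.359, N = 30, 5 predictors, 2 groups.
chi_square, df = bartlett_chi_square(0.359, n_cases=30, n_predictors=5, n_groups=2)

# With a single discriminant function, 1 - Lambda equals the squared
# canonical correlation, i.e. the share of variance accounted for.
explained = 1 - 0.359

print(round(chi_square, 2), df, round(explained, 3))
```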
Standardized Canonical Discriminant Function Coefficients
Note: the signs (+ or -) indicate a positive or a negative relationship with the dependent
variable.
• The discriminant function (based on standardized discriminant coefficients):
Z = 0.743Income + 0.096Travel + 0.233Vacation + 0.469Hsize + 0.209Age
Structure Matrix (Discriminant Loadings, ordered from highest to lowest by
the absolute size of the loading; the sign + or - indicates only a positive or
a negative relationship with the dependent variable)
The discriminant function (based on discriminant loadings):
Z = 0.822Income + 0.541Hsize + 0.346Vacation + 0.213Travel + 0.164Age
(Unstandardized) Canonical Discriminant Function Coefficients
The discriminant function (based on unstandardized discriminant coefficients):
Z = -7.975 + 0.085Income + 0.050Travel + 0.120Vacation + 0.427Hsize + 0.025Age
Which discriminant function to use and when?
• For interpretation purposes, use the discriminant loadings. The
standardized discriminant coefficients can also be used,
although in the literature they are less preferred than the
loadings. Any variable with a loading above
+0.30 or below -0.30 is considered a substantive
discriminator (here: Income, Hsize, and Vacation).
• To calculate the discriminant Z scores for classification
purposes, use the unstandardized discriminant coefficients.
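The ±0.30 rule for the loadings can be applied mechanically. The sketch below is only an illustration in Python; the variable names are chosen for the sketch, and the loadings are the ones listed in the Structure Matrix above.

```python
# Discriminant loadings from the Structure Matrix in the example.
loadings = {
    "Income": 0.822,
    "Hsize": 0.541,
    "Vacation": 0.346,
    "Travel": 0.213,
    "Age": 0.164,
}
THRESHOLD = 0.30  # rule of thumb quoted in the text

# Rank by absolute size, then keep the loadings beyond +/-0.30.
ranked = sorted(loadings.items(), key=lambda item: abs(item[1]), reverse=True)
substantive = [name for name, value in ranked if abs(value) > THRESHOLD]

print(substantive)
```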
Functions at Group Centroids: the value of the discriminant function
(with the unstandardized coefficients) at the group means.
Ex. Gc1 = -7.975 + 0.085*60.520 + 0.050*5.4 + 0.120*5.8 +
0.427*4.333 + 0.025*53.733 = 1.291 (SPSS value; the rounded coefficients
shown here reproduce it only approximately)
So, the group centroid for the visitors to the resort (Group 1) is +1.291.
The group centroid for non-visitors (Group 2) is -1.291 (in general the two
centroids do not have to be equal in absolute value).
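The centroid calculation can be reproduced with a few lines of Python. This is a sketch using the rounded coefficients and group means printed above, with names chosen for the illustration. With these rounded inputs the sum comes out at about 1.33 rather than 1.291; the gap presumably reflects rounding, since SPSS works with unrounded coefficients.

```python
# Unstandardized coefficients as printed (rounded to three decimals).
intercept = -7.975
coefficients = {
    "income": 0.085,
    "travel": 0.050,
    "vacation": 0.120,
    "hsize": 0.427,
    "age": 0.025,
}

# Group 1 (visitors) means as used in the example.
group1_means = {
    "income": 60.520,
    "travel": 5.4,
    "vacation": 5.8,
    "hsize": 4.333,
    "age": 53.733,
}


def discriminant_score(values):
    """Unstandardized discriminant score Z for one set of variable values."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


centroid_1 = discriminant_score(group1_means)
print(round(centroid_1, 3))
```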
The optimum cutting score is based on the Group Centroids:
For two groups of equal size: Gc = (Gc1 + Gc2)/2 = (1.291 - 1.291)/2 = 0
→ Thus, if Zi > 0 → assign case i to Group 1; if Zi < 0
→ assign case i to Group 2 (see the additional columns dis_1 and
dis1_1 saved by the DA in the SPSS input file).
For example, for Case number 1, dis1_1 = -0.17214 was
calculated as follows (based on the unstandardized discriminant
coefficients): dis1_1 = -7.975476 + 0.0847671*50.2 +
0.04964455*5 + … + 0.0245438*43 = -0.17214 < 0 → Case 1
is assigned to Group 2, which is a mistake, because we know
from the sample that Case 1 belongs to Group 1 (visitors).
There are altogether 3* such mistakes, i.e. cases
from Group 1 assigned to Group 2. However, there
were 0* mistakes when assigning members of
Group 2: all of them were correctly assigned to Group 2.
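The equal-size cutting rule amounts to comparing each Z score with the midpoint of the two centroids. A minimal Python sketch follows, assuming Group 1 is the group with the higher centroid (as in this example). The function names and the handling of a score exactly equal to the cutting score are assumptions of the sketch, not something the SPSS output specifies.

```python
def cutting_score_equal_groups(centroid_1, centroid_2):
    """Midpoint of the two group centroids (groups of equal size)."""
    return (centroid_1 + centroid_2) / 2


def assign_group(z_score, cutting_score):
    """Assign to Group 1 above the cutting score, otherwise to Group 2.

    A score exactly on the cutting score goes to Group 2 here; that
    tie-break is an arbitrary choice made for the sketch.
    """
    return 1 if z_score > cutting_score else 2


cut = cutting_score_equal_groups(1.291, -1.291)

# Case 1 from the example: Z = -0.17214, actual group = 1 (visitor).
predicted_case_1 = assign_group(-0.17214, cut)
misclassified = predicted_case_1 != 1

print(cut, predicted_case_1, misclassified)
```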
Classification Results (Original)
                   Predicted Group 1    Predicted Group 2
Actual Group 1            12                   3*
Actual Group 2             0*                 15
(* misclassified cases)
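For two groups of different sizes n1 and n2, each centroid is weighted by the
size of the other group, so the cutting score moves toward the centroid of the smaller group:
Zcs = (n1*Gc2 + n2*Gc1)/(n1 + n2)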
Classification Function Coefficients (Fisher’s Linear Discriminant
Functions) – can also be used for classification purposes.
Z1 = -57.532 + 0.678Income + 1.509Travel + 0.938Vacation + 3.322Hsize + 0.832Age
Z2 = -36.936 + 0.459Income + 1.381Travel + 0.628Vacation + 2.218Hsize + 0.768Age
Ex. Consider again Case 1.
Calculate: Z1(Case 1) = -57.532 + 0.678*50.2 + … + 0.832*43 = 37.295
Calculate: Z2(Case 1) = -36.936 + 0.459*50.2 + … + 0.768*43 = 37.7128
Because 37.7128 > 37.295 → assign Case 1 to Group 2
Classification Results
Based on the Analysis Sample (called in SPSS: Original)
• The Hit Ratio = % of correctly classified cases = (12 + 15)/30 = 90%
Based on the “leave-one-out” principle (called in SPSS: Cross-validated)
• The Hit Ratio = % of correctly classified cases = (11 + 13)/30 = 80%
Based on the Holdout Sample (n = 12) (called in SPSS: Cases Not
Selected, Original)
• The Hit Ratio = % of correctly classified cases = (4 + 6)/12 = 83.3%
In each case, compare the Hit Ratio with the Chance Ratio:
• If the group sizes are equal: Chance Ratio = 1/(number of groups)
• In our example, Chance Ratio = 1/2 = 0.5. (The Hit Ratio
should be at least 1.25 times the Chance Ratio for
the validity of the DA to be satisfactory.) The
lowest of the three Hit Ratios = 80% > 1.25*50% =
62.5% → The validity of our DA is satisfactory.
If the group sizes are different, two Chance Ratios are used:
• Maximum Chance Criterion (MCC) = the percentage of the
total sample represented by the largest of the groups
Ex. Group 1 = 65, Group 2 = 25, Group 3 = 10 →
The MCC = 0.65 = 65%
• Proportional Chance Criterion (PCC) = p1² + p2² + … + pk²,
where pi is the proportion of cases in group i
Ex. The PCC = 0.65² + 0.25² + 0.10² = 0.495 = 49.5%
• If the Hit Ratio > 1.25*max(MCC, PCC) → The
validity of DA is satisfactory
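The hit-ratio checks can be put together in a short Python sketch. The function names are chosen for the illustration; the counts are those of the example, and the equal group sizes of 15 and 15 are inferred from the classification counts of the analysis sample.

```python
def hit_ratio(correct_per_group, n_total):
    """Share of correctly classified cases."""
    return sum(correct_per_group) / n_total


def maximum_chance(group_sizes):
    """MCC: share of the total sample in the largest group."""
    return max(group_sizes) / sum(group_sizes)


def proportional_chance(group_sizes):
    """PCC: sum of squared group proportions."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


def validity_is_satisfactory(hit, group_sizes, factor=1.25):
    """Rule from the text: Hit Ratio > 1.25 * max(MCC, PCC)."""
    benchmark = max(maximum_chance(group_sizes), proportional_chance(group_sizes))
    return hit > factor * benchmark


hit_ratios = {
    "original": hit_ratio([12, 15], 30),
    "cross_validated": hit_ratio([11, 13], 30),
    "holdout": hit_ratio([4, 6], 12),
}
lowest = min(hit_ratios.values())

# Equal groups (15 and 15): MCC and PCC both reduce to 1/2.
print(validity_is_satisfactory(lowest, [15, 15]))

# Unequal groups from the second example.
mcc = maximum_chance([65, 25, 10])
pcc = proportional_chance([65, 25, 10])
print(round(mcc, 3), round(pcc, 3))
```

For equal group sizes both criteria equal 1/(number of groups), so the same function covers the equal-size case.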
Another statistical test for the discriminatory power of the
classification matrix:
Press’s Q statistic = [N - (Ncorrect*c)]² / [N*(c - 1)]
Where: c = number of groups
N = sample size
Ncorrect = number of observations correctly classified
Ex. c = 2, N = 30, Ncorrect = 24 (based on the
cross-validated procedure) → Press’s Q = [30 -
24*2]² / [30*(2 - 1)] = 10.8.
The critical value at the 0.01 level is Q = 6.63 (chi-square
with 1 d.f.) → Q = 10.8 > Qcritical, hence the classification matrix
can be deemed statistically better than chance
(p < 0.01).
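Press’s Q for the example can be verified with a few lines of Python. This is a sketch with a function name chosen for the illustration; the critical value 6.63 is simply the one quoted in the text.

```python
def press_q(n_total, n_correct, n_groups):
    """Press's Q = [N - (Ncorrect * c)]^2 / [N * (c - 1)]."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


Q_CRITICAL = 6.63  # critical value quoted in the text

# Cross-validated results of the example: N = 30, 24 correct, 2 groups.
q = press_q(n_total=30, n_correct=24, n_groups=2)
better_than_chance = q > Q_CRITICAL

print(round(q, 1), better_than_chance)
```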