Discriminant Analysis
Sunday, 12 April 2015 10:07 PM

Discriminant analysis is used to determine which variables discriminate between two or more naturally occurring groups. Computationally, discriminant function analysis is very similar to analysis of variance (ANOVA).

For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (1) to go to college, (2) to attend a trade or professional school, or (3) to seek no further training or education. For that purpose the researcher could collect data on numerous variables prior to students' graduation. After graduation, most students will naturally fall into one of the three categories. Discriminant analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice.

For example, a medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). A biologist could record different characteristics of similar types (groups) of flowers, and then perform a discriminant function analysis to determine the set of characteristics that allows for the best discrimination between the types.

The data consist of five measurements on each of 32 skulls found in the southwestern and eastern districts of Tibet.

1. Greatest length of skull (measure 1)
2. Greatest horizontal breadth of skull (measure 2)
3. Height of skull (measure 3)
4. Upper face length (measure 4)
5. Face breadth between outermost points of cheekbones (measure 5)

There are also place and grouping variables.

The data can be divided into two groups. The first comprises skulls 1 to 17 found in graves in Sikkim and the neighbouring area of Tibet (Type A skulls).
The remaining 15 skulls (Type B skulls) were picked up on a battlefield in the Lhasa district and are believed to be those of native soldiers from the eastern province of Khams. These skulls were of particular interest since it was thought at the time that Tibetans from Khams might be survivors of a particular human type, unrelated to the Mongolian and Indian types that surrounded them.

There are two questions that might be of interest for these data:

Do the five measurements discriminate between the two assumed groups of skulls, and can they be used to produce a useful rule for classifying other skulls that might become available?

Taking the 32 skulls together, are there any natural groupings in the data and, if so, do they correspond to the groups assumed?

Classification is an important component of virtually all scientific research. Statistical techniques concerned with classification are essentially of two types. The first (cluster analysis) aims to uncover groups of observations from initially unclassified data. The second (discriminant analysis) works with data that are already classified into groups to derive rules for classifying new (and as yet unclassified) individuals on the basis of their observed variable values.

Initially it is wise to take a look at your raw data.

Select matrix scatter. Use Define to select.

Select matrix variables and markers.
Note that greatest length of skull is above the list shown. Use OK to accept.

While the resulting scatterplot matrix only allows us to assess the group separation in two dimensions, it seems to suggest that face breadth between outermost points of cheek bones (meas5), greatest length of skull (meas1), and upper face length (meas4) provide the greatest discrimination between the two skull types.

We shall now use Fisher's linear discriminant function to derive a classification rule for assigning skulls to one of the two predefined groups on the basis of the five measurements available.

Now proceed to complete the analysis.

As before, use the secondary screens to select the grouping variable (place) and use Define Range.

Select the independents, use OK to run.

From the Statistics button, select the statistics required. Now proceed to complete the analysis.

In the resulting descriptive output, the means and standard deviations of each of the five measurements, for each type of skull and overall, are given in the Group Statistics table.

Group Statistics

Place where                                                                                       Valid N (listwise)
skulls were found  Measurement                                           Mean     Std. Deviation  Unweighted  Weighted
Sikkem or Tibet    Greatest length of skull                              174.824  6.7475          17          17.000
                   Greatest horizontal breadth of skull                  139.353  7.6030          17          17.000
                   Height of skull                                       132.000  6.0078          17          17.000
                   Upper face length                                      69.824  4.5756          17          17.000
                   Face breadth between outermost points of cheek bones  130.353  8.1370          17          17.000
Lhasa              Greatest length of skull                              185.733  8.6269          15          15.000
                   Greatest horizontal breadth of skull                  138.733  6.1117          15          15.000
                   Height of skull                                       134.767  6.0263          15          15.000
                   Upper face length                                      76.467  3.9118          15          15.000
                   Face breadth between outermost points of cheek bones  137.500  4.2384          15          15.000
Total              Greatest length of skull                              179.938  9.3651          32          32.000
                   Greatest horizontal breadth of skull                  139.063  6.8412          32          32.000
                   Height of skull                                       133.297  6.0826          32          32.000
                   Upper face length                                      72.938  5.3908          32          32.000
                   Face breadth between outermost points of cheek bones  133.703  7.4443          32          32.000

The within-group covariance matrices shown in the Covariance Matrices table suggest that the sample values differ to some extent, but according to Box's test for equality of covariances (tables Log Determinants and Test Results) these differences are not statistically significant (F(15, 3490) = 1.2, p = 0.25).

Covariance Matrices

Place where
skulls were found  Measurement                                                   meas1    meas2    meas3    meas4    meas5
Sikkem or Tibet    Greatest length of skull (meas1)                              45.529   25.222   12.391   22.154   27.972
                   Greatest horizontal breadth of skull (meas2)                  25.222   57.805   11.875    7.519   48.055
                   Height of skull (meas3)                                       12.391   11.875   36.094    -.313    1.406
                   Upper face length (meas4)                                     22.154    7.519    -.313   20.936   16.769
                   Face breadth between outermost points of cheek bones (meas5)  27.972   48.055    1.406   16.769   66.211
Lhasa              Greatest length of skull (meas1)                              74.424   -9.523   22.737   17.794   11.125
                   Greatest horizontal breadth of skull (meas2)                  -9.523   37.352  -11.263     .705    9.464
                   Height of skull (meas3)                                       22.737  -11.263   36.317   10.724    7.196
                   Upper face length (meas4)                                     17.794     .705   10.724   15.302    8.661
                   Face breadth between outermost points of cheek bones (meas5)  11.125    9.464    7.196    8.661   17.964
Log Determinants

Place where skulls were found  Rank  Log Determinant
Sikkem or Tibet                5     16.164
Lhasa                          5     15.773
Pooled within-groups           5     16.727

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results

Box's M              22.371
F        Approx.      1.218
         df1         15
         df2       3489.901
         Sig.          .249

Tests null hypothesis of equal population covariance matrices.

It appears that the equality of covariance matrices assumption needed for Fisher's linear discriminant approach to be strictly correct is valid here. In practice, Box's test is not of great use, since even if it suggests a departure from the equality hypothesis, the linear discriminant may still be preferable to a quadratic function. Here we shall simply assume normality for our data, relying on the robustness of Fisher's approach to deal with any minor departure from the assumption.

In the resulting discriminant analysis output, the eigenvalue (here 0.93) represents the ratio of the between-group sum of squares to the within-group sum of squares of the discriminant scores. It is this criterion that is maximized in discriminant function analysis.

The canonical correlation is simply the Pearson correlation between the discriminant function scores and group membership coded as 0 and 1. For the skull data, the canonical correlation value is 0.694, so that 0.694² × 100 = 48% of the variance in the discriminant function scores can be explained by group differences.

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         .930(a)     100.0          100.0         .694

a. First 1 canonical discriminant functions were used in the analysis.
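The link between the eigenvalue and the canonical correlation reported above can be checked with a few lines of arithmetic. The sketch below assumes the standard single-function relationship, canonical correlation = sqrt(eigenvalue / (1 + eigenvalue)); the variable names are illustrative and not part of the SPSS output.

```python
import math

# Eigenvalue of the single discriminant function, as printed in the Eigenvalues table.
eigenvalue = 0.930

# With one discriminant function, the canonical correlation follows from the eigenvalue.
canonical_corr = math.sqrt(eigenvalue / (1.0 + eigenvalue))

# Squared canonical correlation: share of score variance explained by group differences.
explained = canonical_corr ** 2

# The complement, 1 / (1 + eigenvalue), is the share left unexplained.
unexplained = 1.0 / (1.0 + eigenvalue)

print(f"canonical correlation: {canonical_corr:.3f}")
print(f"explained share:       {explained:.3f}")
print(f"unexplained share:     {unexplained:.3f}")
```

Rounded to three decimals, the first value agrees with the printed canonical correlation, and the explained share is the 48% quoted in the text.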
Wilks' Lambda provides a test for assessing the null hypothesis that in the population the vectors of means of the five measurements are the same in the two groups. The lambda coefficient is defined as the proportion of the total variance in the discriminant scores not explained by differences among the groups, here 51.8%. The formal test confirms that the sets of five mean skull measurements differ significantly between the two sites (χ²(5) = 18.1, p = 0.003). If the equality of mean vectors hypothesis had been accepted, there would be little point in carrying out a linear discriminant function analysis.

Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .518           18.083      5   .003

Next we come to the Classification Function Coefficients. This table is displayed as a result of checking Fisher's in the Statistics sub-dialogue box.

Classification Function Coefficients

                                                      Place where skulls were found
                                                      Sikkem or Tibet  Lhasa
Greatest length of skull                                 1.468            1.558
Greatest horizontal breadth of skull                     2.361            2.205
Height of skull                                          2.752            2.747
Upper face length                                         .775             .952
Face breadth between outermost points of cheek bones      .195             .372
(Constant)                                            -514.956         -545.419

Fisher's linear discriminant functions

The table can be used to find Fisher's linear discriminant function by simply subtracting the coefficients given for each variable in the two groups, giving the following result:

                                                                  Sikkem or Tibet  Lhasa  Difference
Greatest length of skull (measure 1)                              1.468            1.558  -0.090
Greatest horizontal breadth of skull (measure 2)                  2.361            2.205   0.156
Height of skull (measure 3)                                       2.752            2.747   0.005
Upper face length (measure 4)                                     0.775            0.952  -0.177
Face breadth between outermost points of cheekbones (measure 5)   0.195            0.372  -0.177

z = -0.090 × meas1 + 0.156 × meas2 + 0.005 × meas3 - 0.177 × meas4 - 0.177 × meas5

The difference between the constant coefficients, -514.956 - (-545.419), provides the sample mean of the discriminant function scores, z̄ = 30.463.

The coefficients defining Fisher's linear discriminant function in the equation are proportional to the unstandardised coefficients given in the "Canonical Discriminant Function Coefficients" table, which is produced when Unstandardised is checked in the Statistics sub-dialogue box.

Canonical Discriminant Function Coefficients

                                                      Function 1
Greatest length of skull                                 .048
Greatest horizontal breadth of skull                    -.083
Height of skull                                         -.003
Upper face length                                        .095
Face breadth between outermost points of cheek bones     .095
(Constant)                                            -16.222

Unstandardized coefficients

The discriminant scores computed from these coefficients can be compared with the average of the two group means (shown in the Functions at Group Centroids table) to allocate skulls into groups. Here the threshold against which a skull's discriminant score is evaluated is

0.0585 = ½ × (-0.877 + 0.994)

Functions at Group Centroids

Place where skulls were found  Function 1
Sikkem or Tibet                -.877
Lhasa                           .994

Unstandardized canonical discriminant functions evaluated at group means

Thus new skulls with discriminant scores above 0.0585 would be assigned to the Lhasa site (type B); otherwise, they would be classified as type A.

When variables are measured on different scales, the magnitude of an unstandardised coefficient provides little indication of the relative contribution of the variable to the overall discrimination. The "Standardized Canonical Discriminant Function Coefficients" listed attempt to overcome this problem by rescaling the variables to unit standard deviation.
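The rescaling just described can be illustrated from the figures already shown. The sketch below assumes that each standardized coefficient is the unstandardized coefficient multiplied by the pooled within-group standard deviation of its variable (the usual definition; the helper name `pooled_sd` is ours). Because the unstandardized coefficients are printed to only three decimals, the results should match the standardized coefficients table only approximately.

```python
import math

# Group sizes from the Group Statistics table.
N_A, N_B = 17, 15

# Within-group variances for meas1..meas5 (diagonals of the Covariance Matrices table).
VAR_A = [45.529, 57.805, 36.094, 20.936, 66.211]  # Sikkem or Tibet
VAR_B = [74.424, 37.352, 36.317, 15.302, 17.964]  # Lhasa

# Unstandardized canonical coefficients for meas1..meas5, as printed (3 decimals).
UNSTANDARDIZED = [0.048, -0.083, -0.003, 0.095, 0.095]


def pooled_sd(var_a, var_b):
    """Pooled within-group standard deviation of one variable."""
    pooled_var = ((N_A - 1) * var_a + (N_B - 1) * var_b) / (N_A + N_B - 2)
    return math.sqrt(pooled_var)


# Assumed relationship: standardized = unstandardized * pooled within-group SD.
standardized = [
    coef * pooled_sd(va, vb)
    for coef, va, vb in zip(UNSTANDARDIZED, VAR_A, VAR_B)
]

for name, value in zip(["meas1", "meas2", "meas3", "meas4", "meas5"], standardized):
    print(f"{name}: {value:.3f}")
```

The printed values can be compared with the Function 1 column of the table that follows; small differences in the third decimal are expected from the rounding of the inputs.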
Standardized Canonical Discriminant Function Coefficients

                                                      Function 1
Greatest length of skull                               .367
Greatest horizontal breadth of skull                  -.578
Height of skull                                       -.017
Upper face length                                      .405
Face breadth between outermost points of cheek bones   .627

For our data, such standardisation is not necessary since all skull measurements were in millimetres. Standardisation should, however, not matter much since the within-group standard deviations were similar across the different skull measures. According to the standardized coefficients, skull height (meas3) seems to contribute little to discriminating between the two types of skulls.

A question of some importance about a discriminant function is: how well does it perform? One possible method of evaluating performance is to apply the derived classification rule to the data set and calculate the misclassification rate.

Repeat using the following classification. Now proceed to complete the analysis.

This is known as the re-substitution estimate and the corresponding results are shown in the Original part of the Classification Results table. According to this estimate, 81.3% of skulls can be correctly classified as type A or type B on the basis of the discriminant rule.

Classification Results(b,c)

                                                          Predicted Group Membership
                           Place where skulls were found  Sikkem or Tibet  Lhasa  Total
Original            Count  Sikkem or Tibet                14                3      17
                           Lhasa                           3               12      15
                    %      Sikkem or Tibet                82.4             17.6   100.0
                           Lhasa                          20.0             80.0   100.0
Cross-validated(a)  Count  Sikkem or Tibet                12                5      17
                           Lhasa                           6                9      15
                    %      Sikkem or Tibet                70.6             29.4   100.0
                           Lhasa                          40.0             60.0   100.0

a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b. 81.3% of original grouped cases correctly classified.
c. 65.6% of cross-validated grouped cases correctly classified.

However, estimating misclassification rates in this way is known to be overly optimistic, and several alternatives for estimating misclassification rates in discriminant analysis have been suggested. One of the most commonly used of these alternatives is the so-called leaving-one-out method, in which the discriminant function is first derived from only n − 1 sample members and then used to classify the observation left out. The procedure is repeated n times, each time omitting a different observation.

The Cross-validated part of the Classification Results table shows the results from applying this procedure. The correct classification rate now drops to 65.6%, a considerably lower success rate than suggested by the simple re-substitution rule.

We now turn to applying cluster analysis to the skull data.
Here the prior classification of the skulls will be ignored and the data simply "explored" to see if there is any evidence of interesting "natural" groupings of the skulls and, if there is, whether these groups correspond in any way with Morant's classification. We will use two hierarchical agglomerative clustering procedures, complete and average linkage clustering, and then k-means clustering.

Select Analyze > Classify > Hierarchical Cluster.

In the usual way, select the variables of interest.

Select the plots desired.

Select the desired method. Now proceed to complete the analysis.

The complete linkage clustering output shows which skulls or clusters are combined at each stage of the cluster procedure. First, skull 8 is joined with skull 13, since the Euclidean distance between these two skulls is smaller than the distance between any other pair of skulls. The distance is shown in the column labelled "Coefficients". Second, skull 15 is joined with skull 17, and so on.

Agglomeration Schedule

       Cluster Combined                    Stage Cluster First Appears
Stage  Cluster 1  Cluster 2  Coefficients  Cluster 1  Cluster 2         Next Stage
 1      8         13          3.041         0          0                 4
 2     15         17          5.385         0          0                14
 3      9         23          5.701         0          0                11
 4      8         19          5.979         1          0                 8
 5     24         28          6.819         0          0                17
 6     21         22          6.910         0          0                21
 7     16         29          7.211         0          0                15
 8      7          8          8.703         0          4                13
 9      2          3          8.874         0          0                14
10     27         30          9.247         0          0                23
11      5          9          9.579         0          3                13
12     18         32          9.874         0          0                18
13      5          7         10.700        11          8                24
14      2         15         11.522         9          2                28
15      6         16         12.104         0          7                22
16     14         25         12.339         0          0                21
17     24         31         13.528         5          0                23
18     11         18         13.537         0         12                22
19      1         20         13.802         0          0                26
20      4         10         14.062         0          0                28
21     14         21         15.588        16          6                25
22      6         11         16.302        15         18                24

The dendrogram is simpler to interpret.
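One way to read the agglomeration schedule is to replay it. The sketch below copies the "Cluster Combined" pairs for the 22 stages listed above (the remaining stages of the full schedule are not shown in this document) and rebuilds the clusters present after a chosen stage. It assumes, as the schedule suggests, that a merged cluster keeps the case number given in the Cluster 1 column; the function name `replay` is ours.

```python
# (Cluster 1, Cluster 2) pairs for stages 1-22, copied from the Agglomeration Schedule.
MERGES = [
    (8, 13), (15, 17), (9, 23), (8, 19), (24, 28), (21, 22), (16, 29), (7, 8),
    (2, 3), (27, 30), (5, 9), (18, 32), (5, 7), (2, 15), (6, 16), (14, 25),
    (24, 31), (11, 18), (1, 20), (4, 10), (14, 21), (6, 11),
]


def replay(merges, n_cases, upto):
    """Return the clusters (sorted lists of case numbers) after `upto` stages."""
    # Every case starts as its own cluster, labelled by its case number.
    clusters = {case: {case} for case in range(1, n_cases + 1)}
    for first, second in merges[:upto]:
        # Assumption: the merged cluster keeps the label from the Cluster 1 column.
        clusters[first] |= clusters.pop(second)
    return sorted(sorted(members) for members in clusters.values())


after_22 = replay(MERGES, n_cases=32, upto=22)
for members in after_22:
    print(members)
```

Each stage reduces the number of clusters by one, so 32 skulls and 22 stages leave 10 clusters, two of which are still single skulls.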
The dendrogram may, on occasion, also be useful in deciding the number of clusters in a data set, with a sudden increase in the size of the difference between adjacent steps taken as an informal indication of the appropriate number of clusters to consider. For the complete linkage dendrogram, a fairly large jump occurs between stages 29 and 30 (indicating a three-group solution) and an even bigger one between this penultimate and the ultimate fusion of groups (a two-group solution).

For an alternative approach, use average linkage. Now proceed to produce the plot.

The initial steps agree with the complete linkage solution, but eventually the trees diverge, with the average linkage dendrogram successively adding small clusters to one increasingly large cluster. For the average linkage dendrogram it is not clear where to cut the dendrogram to give a specific number of groups.

Since we believe there are two groups, a final cluster analysis employing this information may be attempted.

The variable selection and number of clusters are shown.

In the resulting cluster output, the Initial Cluster Centers table displays the starting values used by the algorithm.

Initial Cluster Centers

                                                      Cluster
                                                      1      2
Greatest length of skull                              200.0  167.0
Greatest horizontal breadth of skull                  139.5  130.0
Height of skull                                       143.5  125.5
Upper face length                                      82.5   69.5
Face breadth between outermost points of cheek bones  146.0  119.5

The Iteration History table indicates that the algorithm has converged.

Iteration History(a)

           Change in Cluster Centers
Iteration  1       2
1          16.626  16.262
2            .000    .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 2. The minimum distance between initial centers is 48.729.
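The distance quoted in the Iteration History footnote, and the assignment step that k-means repeats, can be sketched from the numbers above. The assignment of values to clusters 1 and 2 below is our reading of the flattened Initial Cluster Centers table, and the two group mean vectors from the Group Statistics table are used purely as illustrative points; this is a generic nearest-centre step, not necessarily the exact SPSS procedure.

```python
import math

# Initial cluster centres for meas1..meas5 (our reading of the flattened table).
INITIAL_CENTERS = {
    1: [200.0, 139.5, 143.5, 82.5, 146.0],
    2: [167.0, 130.0, 125.5, 69.5, 119.5],
}

# Group mean vectors from the Group Statistics table, used here as example points.
GROUP_MEANS = {
    "Sikkem or Tibet": [174.824, 139.353, 132.000, 69.824, 130.353],
    "Lhasa": [185.733, 138.733, 134.767, 76.467, 137.500],
}


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_center(point, centers):
    """Label of the centre closest to `point` (the k-means assignment step)."""
    return min(centers, key=lambda label: euclidean(point, centers[label]))


center_gap = euclidean(INITIAL_CENTERS[1], INITIAL_CENTERS[2])
assignments = {
    place: nearest_center(mean, INITIAL_CENTERS)
    for place, mean in GROUP_MEANS.items()
}

print(f"distance between initial centres: {center_gap:.3f}")
print(assignments)
```

The computed distance reproduces the 48.729 reported in the footnote, which supports this reading of the table.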
The Final Cluster Centers table describes the final cluster solution.

Final Cluster Centers

                                                      Cluster
                                                      1      2
Greatest length of skull                              188.4  174.1
Greatest horizontal breadth of skull                  141.3  137.6
Height of skull                                       135.8  131.6
Upper face length                                      77.6   69.7
Face breadth between outermost points of cheek bones  138.5  130.4

The Number of Cases in each Cluster table gives the size of each cluster.

Number of Cases in each Cluster

Cluster  1  13.000
         2  19.000
Valid       32.000
Missing       .000

How does the k-means two-group solution compare with the original classification of the skulls into types A and B? We can investigate this by first using the Save button on the k-Means Cluster Analysis dialogue box to save cluster membership for each skull in the Data View spreadsheet.

The new categorical variable now available (labelled QCL_1) can be cross-tabulated with assumed skull type (variable place).

                        Assumed type of skull
                        A (Count)  B (Count)
Cluster Number of Case
1                        2         11
2                       15          4

The k-means clusters largely agree with the skull types as originally suggested by Morant, with cluster 1 consisting primarily of Type B skulls (those from Lhasa) and cluster 2 containing mostly skulls of Type A (from Sikkim and the neighbouring area of Tibet). Only six skulls are wrongly placed.
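The count of wrongly placed skulls can be read directly off the cross-tabulation. The short sketch below assumes the reading of the table in which cluster 1 holds 2 Type A and 11 Type B skulls and cluster 2 holds 15 Type A and 4 Type B skulls, which is consistent with the cluster sizes of 13 and 19; the dictionary layout and names are illustrative.

```python
# Counts from the cross-tabulation: outer key = k-means cluster, inner key = assumed type.
CROSSTAB = {
    1: {"A": 2, "B": 11},
    2: {"A": 15, "B": 4},
}

# Interpretation used in the text: cluster 1 is read as Type B, cluster 2 as Type A.
CLUSTER_TO_TYPE = {1: "B", 2: "A"}

total = sum(sum(row.values()) for row in CROSSTAB.values())
agreeing = sum(CROSSTAB[c][CLUSTER_TO_TYPE[c]] for c in CROSSTAB)
misplaced = total - agreeing

print(f"skulls: {total}, agreeing: {agreeing}, misplaced: {misplaced}")
print(f"agreement rate: {agreeing / total:.1%}")
```

Under this reading, 26 of the 32 skulls fall in the cluster matching their assumed type, leaving the six misplaced skulls mentioned above.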