discriminant function

Chapter 8 Discriminant Analysis

8.1 Introduction

Classification is an important issue in multivariate analysis and data mining.

Classification:
- constructs a model from the training set and the values (class labels) of a classifying attribute;
- uses the model to classify new data, i.e., to predict unknown or missing class labels.

Classification: A Two-Step Process

- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
  - The set of tuples used for model construction is the training set.
  - The model is represented as classification rules, decision trees, or mathematical formulae.
- Prediction: classifying future or unknown objects
  - Estimate the accuracy of the model:
    - The known label of each test sample is compared with the classified result from the model.
    - The accuracy rate is the percentage of test set samples that are correctly classified by the model.
    - The test set must be independent of the training set; otherwise over-fitting goes undetected and the accuracy estimate is too optimistic.
  - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.

Classification Process: Model Construction

Training data (input to the classification algorithm):

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Classifier (model):
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process: Use the Model in Prediction

Testing data:

NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen data: (Jeff, Professor, 4). Tenured?

Supervised vs. Unsupervised Learning

- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
  - New data are classified based on the training set.
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown.
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

Discrimination: Introduction

Discrimination is a technique concerned with allocating new observations to previously defined groups. There are k samples from k distinct populations:

$$
G_1:\ \begin{pmatrix} x^{(1)}_{11} & \cdots & x^{(1)}_{1p} \\ \vdots & & \vdots \\ x^{(1)}_{n_1 1} & \cdots & x^{(1)}_{n_1 p} \end{pmatrix},
\quad \ldots, \quad
G_k:\ \begin{pmatrix} x^{(k)}_{11} & \cdots & x^{(k)}_{1p} \\ \vdots & & \vdots \\ x^{(k)}_{n_k 1} & \cdots & x^{(k)}_{n_k p} \end{pmatrix}
$$

One wants to find the so-called discriminant function and the related rule to classify new observations.

[Figure: Example 11.3, bivariate case]

Discriminant function and rule

Discriminant function: $w(x) = l'x$

Rule: $x \in G_1$ if $w(x) \ge a$; $x \in G_2$ if $w(x) < a$.

Example 11.1: Riding mowers

Consider two groups in a city: riding-mower owners and those without riding mowers. In order to identify the best sales prospects for an intensive sales campaign, a riding-mower manufacturer is interested in classifying families as prospective owners or nonowners on the basis of income and lot size.

Owners

x1 (Income in $1000s)   x2 (Lot size in 1000 ft^2)
60                      18.4
85.5                    16.8
64.8                    21.6
61.5                    20.8
87                      23.6
110.1                   19.2
108                     17.6
82.8                    22.4
69                      20
93                      20.8
51                      22
81                      20

Nonowners

x1 (Income in $1000s)   x2 (Lot size in 1000 ft^2)
75                      19.6
52.8                    20.8
64.8                    17.2
43.2                    20.4
84                      17.6
49.2                    17.6
59.4                    16
66                      18.4
47.4                    16.4
33                      18.8
51                      14
63                      14.8

Classification results:

             Classified as
True group   G1    G2
G1           10    2
G2           2     10

8.2 Discriminant by Distance

Assume k = 2 for simplicity:

$$G_1: N_p(\mu^{(1)}, \Sigma_1), \qquad G_2: N_p(\mu^{(2)}, \Sigma_2)$$

Discriminant function:

$$w(x) = d^2(x, G_1) - d^2(x, G_2)$$

Rule: $x \in G_1$ if $w(x) \le 0$ (x is at least as close to $G_1$ as to $G_2$); $x \in G_2$ if $w(x) > 0$.

Here $d$ is the Mahalanobis distance:

$$d^2(x, G_j) = (x - \mu^{(j)})' \Sigma_j^{-1} (x - \mu^{(j)}), \qquad j = 1, 2.$$
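As a minimal sketch of the distance rule (not part of the original slides), the following pure-Python code computes the squared Mahalanobis distance for p = 2 and assigns x to the nearest group. The helper names (`inv2`, `mahalanobis_sq`, `classify_by_distance`) and the group parameters are hypothetical and chosen only for illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)' Sigma^{-1} (x - mu) for p = 2."""
    s_inv = inv2(sigma)
    dx = [x[0] - mu[0], x[1] - mu[1]]
    t = [s_inv[0][0] * dx[0] + s_inv[0][1] * dx[1],
         s_inv[1][0] * dx[0] + s_inv[1][1] * dx[1]]
    return dx[0] * t[0] + dx[1] * t[1]


def classify_by_distance(x, groups):
    """Return the 1-based index of the group with the smallest distance.

    Ties go to the lower index, which matches assigning x to G1
    when d^2(x, G1) - d^2(x, G2) <= 0.
    """
    d2 = [mahalanobis_sq(x, mu, sigma) for mu, sigma in groups]
    return d2.index(min(d2)) + 1


# Hypothetical parameters (mean, covariance) for G1 and G2.
groups = [
    ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
    ([3.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]),
]

near_g1 = classify_by_distance([0.5, 0.0], groups)
# (1.4, 0) is closer to the G1 mean in Euclidean distance, but G2 has a
# larger variance along the first axis, so its Mahalanobis distance is smaller.
near_g2 = classify_by_distance([1.4, 0.0], groups)
```

The second query point shows why the covariance matters: with unequal covariance matrices, the nearest mean in Euclidean terms is not necessarily the nearest group.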
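When $\Sigma_1 = \Sigma_2 = \Sigma$, the distance discriminant function $w(x) = d^2(x, G_1) - d^2(x, G_2)$ becomes

$$
w(x) = (x - \mu^{(1)})' \Sigma^{-1} (x - \mu^{(1)}) - (x - \mu^{(2)})' \Sigma^{-1} (x - \mu^{(2)})
     = -2 \left( x - \frac{\mu^{(1)} + \mu^{(2)}}{2} \right)' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}).
$$

Let

$$\bar{\mu} = \frac{1}{2} (\mu^{(1)} + \mu^{(2)}), \qquad c = \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}).$$

The factor $-2$ only reverses the inequality, so the discriminant function can be taken as

$$w(x) = (x - \bar{\mu})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) = c'(x - \bar{\mu}),$$

with $x \in G_1$ if $w(x) \ge 0$ and $x \in G_2$ if $w(x) < 0$.

When $\mu^{(1)}$, $\mu^{(2)}$, $\Sigma$ are unknown, they are estimated by

$$
\bar{x}^{(j)} = \frac{1}{n_j} \sum_{i=1}^{n_j} x_i^{(j)}, \quad j = 1, 2, \qquad
\hat{\Sigma} = \frac{1}{n_1 + n_2 - 2} (A_1 + A_2),
$$

where

$$A_j = \sum_{i=1}^{n_j} (x_i^{(j)} - \bar{x}^{(j)}) (x_i^{(j)} - \bar{x}^{(j)})'.$$

Example: Univariate case with equal variances

$G_1: N(\mu_1, \sigma_1^2)$, $G_2: N(\mu_2, \sigma_2^2)$ with $\sigma_1^2 = \sigma_2^2$. For $\mu_1 > \mu_2$ the rule is: $x \in G_1$ if $x \ge a$; $x \in G_2$ if $x < a$, where

$$a = \frac{1}{2} (\mu_1 + \mu_2).$$

Example: Univariate case with unequal variances

$G_1: N(\mu_1, \sigma_1^2)$, $G_2: N(\mu_2, \sigma_2^2)$. The cutting point $a$ is replaced by

$$a^* = \frac{\mu_1 \sigma_2 + \mu_2 \sigma_1}{\sigma_1 + \sigma_2}.$$

8.3 Fisher's Discriminant Function

Idea: projection, ANOVA.

Training samples:

$$
G_1: N_p(\mu^{(1)}, \Sigma), \quad x_1^{(1)}, \ldots, x_{n_1}^{(1)}; \qquad \ldots; \qquad
G_k: N_p(\mu^{(k)}, \Sigma), \quad x_1^{(k)}, \ldots, x_{n_k}^{(k)}.
$$

Project the data onto a direction $l \in R^p$. The F-statistic of the projected data is

$$F_l = \frac{l'Bl / (k - 1)}{l'El / (n - k)}, \qquad n = n_1 + \cdots + n_k,$$

where

$$
B = \sum_{a=1}^{k} n_a (\bar{x}^{(a)} - \bar{x}) (\bar{x}^{(a)} - \bar{x})', \qquad
E = \sum_{a=1}^{k} \sum_{j=1}^{n_a} (x_j^{(a)} - \bar{x}^{(a)}) (x_j^{(a)} - \bar{x}^{(a)})'.
$$

We want to find $l^* \in R^p$ such that

$$F_{l^*} = \max_{l \in R^p} F_l.$$

The solution $l^*$ is the eigenvector associated with the largest eigenvalue of $|B - \lambda E| = 0$.

Discriminant function: $u(x) = l'x$, where $l = l^*$.

(B) Two Populations

$$B = n_1 (\bar{x}^{(1)} - \bar{x}) (\bar{x}^{(1)} - \bar{x})' + n_2 (\bar{x}^{(2)} - \bar{x}) (\bar{x}^{(2)} - \bar{x})'$$

Note that

$$\bar{x} = \frac{n_1 \bar{x}^{(1)} + n_2 \bar{x}^{(2)}}{n_1 + n_2} \qquad \text{and} \qquad E = A_1 + A_2.$$

We have

$$B = \frac{n_1 n_2}{n_1 + n_2} (\bar{x}^{(1)} - \bar{x}^{(2)}) (\bar{x}^{(1)} - \bar{x}^{(2)})'.$$

There is only one non-zero eigenvalue of $|B - \lambda E| = 0$, as $\mathrm{rank}(B) = 1$. The associated eigenvector is $E^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)})$.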
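The rank-one form of B for two populations can be checked numerically. The sketch below, written in plain Python, builds B from its definition and from the closed form and compares the two; the toy samples `g1` and `g2` and the helper names are hypothetical and serve only as an illustration.

```python
def mean(rows):
    """Column means of a list of observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]


def scale(s, m):
    return [[s * a for a in row] for row in m]


def add(m1, m2):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Hypothetical toy samples with p = 2.
g1 = [[2.0, 1.0], [3.0, 2.0], [4.0, 3.0], [3.0, 1.0]]
g2 = [[0.0, 0.0], [1.0, 1.0], [2.0, -1.0]]
n1, n2 = len(g1), len(g2)

m1, m2 = mean(g1), mean(g2)
m = mean(g1 + g2)  # overall mean, equal to (n1*m1 + n2*m2) / (n1 + n2)

# B from its definition: sum over groups of n_a (xbar_a - xbar)(xbar_a - xbar)'.
b_def = add(scale(n1, outer(sub(m1, m), sub(m1, m))),
            scale(n2, outer(sub(m2, m), sub(m2, m))))

# B from the closed form: n1*n2/(n1+n2) * (xbar_1 - xbar_2)(xbar_1 - xbar_2)'.
d = sub(m1, m2)
b_rank1 = scale(n1 * n2 / (n1 + n2), outer(d, d))

# For a 2x2 matrix, rank 1 implies a zero determinant.
det_b = b_def[0][0] * b_def[1][1] - b_def[0][1] * b_def[1][0]
```

Both constructions agree up to rounding, and the determinant of B vanishes, which is why only one non-zero eigenvalue exists in the two-population case.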
Discriminant function:

$$u(x) = x' E^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)}) = c'x, \qquad c = E^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)}), \quad E = A_1 + A_2.$$

Rule: $x \in G_1$ if $u(x) \ge \bar{u}$; $x \in G_2$ if $u(x) < \bar{u}$, where, when $\Sigma_1 = \Sigma_2$,

$$\bar{u} = \frac{1}{2} \left( c'\bar{x}^{(1)} + c'\bar{x}^{(2)} \right).$$

When $\Sigma_1 \ne \Sigma_2$, $\bar{u}$ is replaced by

$$u^* = \frac{\hat{\sigma}_2 \, c'\bar{x}^{(1)} + \hat{\sigma}_1 \, c'\bar{x}^{(2)}}{\hat{\sigma}_1 + \hat{\sigma}_2},$$

where

$$
\hat{\sigma}_1^2 = \frac{1}{n_1 - 1} c'A_1 c
= \frac{1}{n_1 - 1} (\bar{x}^{(1)} - \bar{x}^{(2)})' (A_1 + A_2)^{-1} A_1 (A_1 + A_2)^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)}),
$$

$$
\hat{\sigma}_2^2 = \frac{1}{n_2 - 1} c'A_2 c
= \frac{1}{n_2 - 1} (\bar{x}^{(1)} - \bar{x}^{(2)})' (A_1 + A_2)^{-1} A_2 (A_1 + A_2)^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)}).
$$

Example: Insect Classification

Table 2.1 Data of two species of insects

No.   x1     x2     n.g.   c.g.   y
1     6.36   5.24   1      1      2.4713
2     5.92   5.12   1      2      2.3335
3     5.92   5.36   1      1      2.3663
4     6.44   5.64   1      1      2.5481
5     6.40   5.16   1      1      2.4714
6     6.56   5.56   1      1      2.5702
7     6.64   5.36   1      1      2.5650
8     6.68   4.96   1      1      2.5213
9     6.72   5.48   1      1      2.6034
10    6.76   5.60   1      1      2.6309
11    6.72   5.08   1      1      2.5488

No.   x1     x2     n.g.   c.g.   y
1     6.00   4.88   2      2      2.3227
2     5.60   4.64   2      2      2.1796
3     5.65   4.96   2      2      2.2343
4     5.76   4.80   2      2      2.2456
5     5.96   5.08   2      2      2.3391
6     5.72   5.04   2      2      2.2674
7     5.64   4.96   2      2      2.2343
8     5.44   4.88   2      2      2.1682
9     5.04   4.44   2      2      1.9977
10    4.56   4.04   2      2      1.8106
11    5.48   4.20   2      2      2.0863
12    5.76   4.80   2      2      2.2456

Note: x1 and x2 are the characteristics of the insects (Hoel, 1947); n.g. means natural group (species), c.g. the classified group, and y the value of the discriminant function.

The sample means are

$$
\bar{x}^{(1)} = \begin{pmatrix} 6.4654 \\ 5.3236 \end{pmatrix}, \quad
\bar{x}^{(2)} = \begin{pmatrix} 5.5500 \\ 4.7267 \end{pmatrix}, \quad
\bar{x} = \begin{pmatrix} 5.9878 \\ 5.0122 \end{pmatrix},
$$

and

$$
E = \begin{pmatrix} 2.6765 & 1.2942 \\ 1.2942 & 1.7545 \end{pmatrix}, \qquad
B = \begin{pmatrix} 4.8097 & 3.1364 \\ 3.1364 & 2.0453 \end{pmatrix}.
$$

The eigenvalue of $|B - \lambda E| = 0$ is 1.9187 and the associated eigenvector is

$$E^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)}) = \begin{pmatrix} 0.2759 \\ 0.1367 \end{pmatrix}.$$

The discriminant function is

$$u(x_1, x_2) = 0.2759 x_1 + 0.1367 x_2,$$

and the associated value of each observation is given in the table. The cutting point is $\bar{u} = 2.3447$.
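The direction, cutting point and eigenvalue quoted in the insect example can be recomputed from the summary statistics alone. The sketch below uses the printed values of E and the two group means; the variable and function names are hypothetical, and small differences in the last digit come from the rounding of the printed inputs.

```python
# Summary statistics as printed in the insect example.
E = [[2.6765, 1.2942], [1.2942, 1.7545]]
xbar1 = [6.4654, 5.3236]
xbar2 = [5.5500, 4.7267]
n1, n2 = 11, 12

# c = E^{-1} (xbar1 - xbar2), using the closed-form 2x2 inverse.
det = E[0][0] * E[1][1] - E[0][1] * E[1][0]
d = [xbar1[0] - xbar2[0], xbar1[1] - xbar2[1]]
c = [(E[1][1] * d[0] - E[0][1] * d[1]) / det,
     (-E[1][0] * d[0] + E[0][0] * d[1]) / det]


def u(x):
    """Fisher discriminant score c'x."""
    return c[0] * x[0] + c[1] * x[1]


# Equal-covariance cutting point: midpoint of the projected group means.
cut = 0.5 * (u(xbar1) + u(xbar2))

# Non-zero eigenvalue of |B - lambda E| = 0 when B has the rank-one form
# n1*n2/(n1+n2) * d d'.
eigenvalue = n1 * n2 / (n1 + n2) * (d[0] * c[0] + d[1] * c[1])


def classify(x):
    """Return 1 if u(x) >= cut, otherwise 2."""
    return 1 if u(x) >= cut else 2


obs_1 = classify([6.36, 5.24])  # observation 1 of the first species
obs_2 = classify([5.92, 5.12])  # observation 2 of the first species
```

With these inputs, c agrees with (0.2759, 0.1367) and the cutting point with 2.3447 to the printed precision, and observation 2 of the first species falls below the cutting point, which matches its c.g. value of 2 in the table.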
Classification results:

             Classified as
True group   G1    G2
G1           10    1
G2           0     12

If we use the unequal-variance cutting point 2.3831 ($\hat{\sigma}_1 = 0.0939$, $\hat{\sigma}_2 = 0.1497$), we have the same classification.

8.4 Bayes' Discriminant Analysis

A. Idea

There are k populations $G_1, \ldots, G_k$ in $R^p$. A partition of $R^p$, $R_1, \ldots, R_k$, is determined based on a training sample.

Rule: $x \in G_i$ if $x$ falls into $R_i$.

Loss: $c(j|i)$ is the cost incurred when $x$ is from $G_i$ but falls into $R_j$.

The probability of this misclassification is

$$P(j|i) = \int_{R_j} p_i(x) \, dx,$$

where $p_i(x)$ is the density of $x \in G_i$.

The expected cost of misclassification is

$$\mathrm{ECM}(R_1, \ldots, R_k) = \sum_{i=1}^{k} q_i \sum_{j=1}^{k} c(j|i) P(j|i),$$

where $q_1, \ldots, q_k$ are prior probabilities. We want to minimize $\mathrm{ECM}(R_1, \ldots, R_k)$ with respect to $R_1, \ldots, R_k$.

B. Method

Theorem 6.4.1 Let

$$h_t(x) = \sum_{i=1,\, i \ne t}^{k} q_i p_i(x) c(t|i).$$

Then the optimal $R_t$'s are

$$R_t = \{ x : h_t(x) < h_j(x),\ j \ne t \}, \qquad t = 1, \ldots, k.$$

Corollary 1 Take $c(j|i) = 1 - \delta_{ij}$, i.e. $c(j|i) = 1$ if $i \ne j$ and $0$ if $i = j$. Then

$$R_t = \{ x : q_t p_t(x) > q_j p_j(x),\ j \ne t \}, \qquad t = 1, \ldots, k.$$

Proof:

$$h_t(x) = \sum_{i=1}^{k} q_i p_i(x) - q_t p_t(x) = c(x) - q_t p_t(x),$$

where $c(x)$ does not depend on $t$, so minimizing $h_t(x)$ over $t$ is the same as maximizing $q_t p_t(x)$.

Corollary 2 In the case of k = 2,

$$h_1(x) = q_2 p_2(x) c(1|2), \qquad h_2(x) = q_1 p_1(x) c(2|1),$$

and we have (boundary points can be assigned to either region)

$$R_1 = \{ x : q_2 p_2(x) c(1|2) \le q_1 p_1(x) c(2|1) \},$$
$$R_2 = \{ x : q_2 p_2(x) c(1|2) > q_1 p_1(x) c(2|1) \}.$$

Discriminant function:

$$u(x) = \frac{p_1(x)}{p_2(x)}$$

Rule: $x \in G_1$ if $u(x) \ge d$; $x \in G_2$ if $u(x) < d$, where

$$d = \frac{q_2 c(1|2)}{q_1 c(2|1)}.$$

Corollary 3 In the case of k = 2 and

$$x \sim N_p(\mu^{(1)}, \Sigma) \ \text{if } x \in G_1, \qquad x \sim N_p(\mu^{(2)}, \Sigma) \ \text{if } x \in G_2,$$

we have

$$u(x) = \frac{p_1(x)}{p_2(x)} = \exp\{ w(x) \},$$

where

$$w(x) = \left( x - \frac{1}{2} (\mu^{(1)} + \mu^{(2)}) \right)' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}).$$

Rule: $x \in G_1$ if $w(x) \ge \ln d$; $x \in G_2$ if $w(x) < \ln d$.

C. Example 11.3: Detection of hemophilia A carriers

To construct a procedure for detecting potential hemophilia A carriers, blood samples were assayed for two groups of women, and measurements on two variables were recorded.
The first group of 30 women was selected from a population of women who did not carry the hemophilia gene. This group was called the normal group. The second group of 22 women was selected from known hemophilia A carriers. This group was called the obligatory carriers.

Variables:
- log10(AHF activity)
- log10(AHF-like antigen)

Populations:
- population of women who did not carry the hemophilia gene (n1 = 30)
- population of women who are known hemophilia A carriers (n2 = 45)

Data set

Normal group, values listed in observation order:

log10(AHF activity):
-0.0056 -0.1698 -0.3469 -0.0894 -0.1679 -0.0836 -0.1979 -0.0762 -0.1913 -0.1092
-0.5268 -0.0842 -0.0225 0.0084 -0.1827 0.1237 -0.4702 -0.1519 0.0006 -0.2015
-0.1932 0.1507 -0.1259 -0.1551 -0.1952 0.0291 -0.228 -0.0997 -0.1972 -0.0867

log10(AHF-like antigen):
-0.1657 -0.1585 -0.1879 0.0064 0.0713 0.0106 -0.0005 0.0392 -0.2123 -0.119
-0.4773 0.0248 -0.058 0.0782 -0.1138 0.214 -0.3099 -0.0686 -0.1153 -0.0498
-0.2293 0.0933 -0.0669 -0.1232 -0.1007 0.0442 -0.171 -0.0733 -0.0607 -0.056

Obligatory carriers, values listed in observation order:

log10(AHF activity):
-0.3478 -0.4719 -0.2447 -0.3351 -0.1878 -0.3618 -0.4986 -0.5015 -0.1326 -0.6911
-0.3608 -0.4535 -0.3479 -0.3539 -0.361 -0.3226 -0.4319 -0.2734 -0.5573 -0.3755
-0.495 -0.5107 -0.1652 -0.4232 -0.2375 -0.2205 -0.2154 -0.3447 -0.254 -0.3778
-0.4046 -0.0639 -0.0149 -0.0312 -0.174 -0.1416 -0.1508 -0.0964 -0.2642 -0.0234
-0.3352 -0.1744 -0.4055 -0.2444 -0.4784

log10(AHF-like antigen):
0.1151 -0.2008 -0.086 -0.2984 0.0097 -0.339 0.1237 -0.1682 -0.1721 0.0722
-0.1079 -0.0399 0.167 -0.0687 -0.002 0.0548 -0.1865 -0.0153 -0.2483 0.2132
-0.0407 -0.0998 0.2876 0.0046 -0.0219 0.0097 -0.0573 -0.2682 -0.1162 0.1569
-0.1368 0.1539 0.14 -0.0776 0.1642 0.1137 0.0531 0.0867 0.0804 0.0875
0.251 0.1892 -0.2418 0.1614 0.0282

[SAS output]
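The two-population normal rule of Corollary 3 in Section 8.4 can be sketched in a few lines of plain Python for p = 2. This is an illustrative sketch only: the function name `bayes_two_normal`, its argument names, and the means and priors used below are hypothetical, and the common covariance matrix is assumed to be known.

```python
import math


def bayes_two_normal(x, mu1, mu2, sigma, q1=0.5, q2=0.5, c12=1.0, c21=1.0):
    """Bayes rule for two bivariate normal populations with common covariance.

    c12 stands for c(1|2), the cost of assigning an observation from G2 to G1;
    c21 stands for c(2|1). Returns 1 if w(x) >= ln d, otherwise 2.
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    diff = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    # coef = Sigma^{-1} (mu1 - mu2), via the closed-form 2x2 inverse.
    coef = [(d * diff[0] - b * diff[1]) / det,
            (-c * diff[0] + a * diff[1]) / det]
    mid = [(mu1[0] + mu2[0]) / 2, (mu1[1] + mu2[1]) / 2]
    w = coef[0] * (x[0] - mid[0]) + coef[1] * (x[1] - mid[1])
    threshold = math.log((q2 * c12) / (q1 * c21))  # ln d
    return 1 if w >= threshold else 2


# Hypothetical parameters: identity covariance, means on the first axis.
mu1, mu2 = [1.0, 0.0], [-1.0, 0.0]
sigma = [[1.0, 0.0], [0.0, 1.0]]

# Equal priors and costs: threshold is ln 1 = 0, here w(x) = 2 * x[0] = 0.4.
equal_priors = bayes_two_normal([0.2, 0.0], mu1, mu2, sigma)

# A small prior for G1 raises the threshold to ln 9, so the same point moves to G2.
rare_g1 = bayes_two_normal([0.2, 0.0], mu1, mu2, sigma, q1=0.1, q2=0.9)
```

The second call illustrates the role of the priors: the discriminant score w(x) is unchanged, and only the threshold ln d shifts.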
