Chapter 8
Discriminant Analysis
8.1 Introduction
 Classification is an important problem in multivariate analysis and data mining.
 Classification constructs a model from a training set whose class labels (the values of a classifying attribute) are known, and uses the model to classify new data, i.e., to predict unknown or missing class labels.
Classification—A Two-Step Process
 Model construction: describing a set of predetermined classes
 Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
 The set of tuples used for model construction is the training set
 The model is represented as classification rules, decision trees, or mathematical formulae
 Prediction: using the model to classify future or unknown objects
 Estimate the accuracy of the model
 The known label of each test sample is compared with the model's prediction
 The accuracy rate is the percentage of test-set samples correctly classified by the model
 The test set must be independent of the training set; otherwise over-fitting will occur
 If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
Classification Process: Model Construction

Training Data → Classification Algorithm → Classifier (Model)

Training data:

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Resulting model:

IF rank = 'professor' OR years > 6
THEN tenured = 'yes'
Classification Process: Use the Model in Prediction

Testing Data → Classifier → prediction for Unseen Data

Testing data:

NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen data: (Jeff, Professor, 4) → Tenured?
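The two-step process can be sketched in a few lines of Python. This is a toy illustration that hardcodes the rule induced in the figure above; the data layout and function names are mine, not any particular library's API.

```python
# Toy sketch of the two-step classification process on the faculty data.
training_data = [
    # (name, rank, years, tenured)
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor",      2, "yes"),
    ("Jim",  "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]

def classify(rank: str, years: int) -> str:
    """The learned model: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

# Step 1 (model construction) would induce the rule above; here we just
# verify that it reproduces every training label.
correct = sum(classify(r, y) == t for _, r, y, t in training_data)
print(f"training accuracy: {correct}/{len(training_data)}")  # 6/6

# Step 2 (model usage): classify unseen data.
print("Jeff:", classify("Professor", 4))  # -> yes
```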
Supervised vs. Unsupervised Learning
 Supervised learning (classification)
 Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation
 New data are classified based on the training set
 Unsupervised learning (clustering)
 The class labels of the training data are unknown
 Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Discrimination—Introduction
Discrimination is a technique concerned with allocating new
observations to previously defined groups.
There are k samples from k distinct populations:

$$G_1: \begin{pmatrix} x_{11}^{(1)} & \cdots & x_{1p}^{(1)} \\ \vdots & & \vdots \\ x_{n_1 1}^{(1)} & \cdots & x_{n_1 p}^{(1)} \end{pmatrix}, \quad \ldots, \quad G_k: \begin{pmatrix} x_{11}^{(k)} & \cdots & x_{1p}^{(k)} \\ \vdots & & \vdots \\ x_{n_k 1}^{(k)} & \cdots & x_{n_k p}^{(k)} \end{pmatrix}$$
One wants to find a so-called discriminant function and an associated rule for classifying new observations.
Example 11.3 (bivariate case): [scatter plot of the two groups omitted]
Discriminant function and rule

Discriminant function: $w(x) = l'x$

Rule: assign $x \in G_1$ if $w(x) \ge a$; assign $x \in G_2$ if $w(x) < a$.
Example 11.1: Riding mowers
Consider two groups in a city: riding-mower owners and those without riding mowers. To identify the best sales prospects for an intensive sales campaign, a riding-mower manufacturer is interested in classifying families as prospective owners or nonowners on the basis of income and lot size.
Example 11.1: Riding mowers

Riding-mower owners:

No.  x1 (Income, $1000s)  x2 (Lot size, 1000 ft²)
1    60                   18.4
2    85.5                 16.8
3    64.8                 21.6
4    61.5                 20.8
5    87                   23.6
6    110.1                19.2
7    108                  17.6
8    82.8                 22.4
9    69                   20
10   93                   20.8
11   51                   22
12   81                   20

Nonowners:

No.  x1 (Income, $1000s)  x2 (Lot size, 1000 ft²)
1    75                   19.6
2    52.8                 20.8
3    64.8                 17.2
4    43.2                 20.4
5    84                   17.6
6    49.2                 17.6
7    59.4                 16
8    66                   18.4
9    47.4                 16.4
10   33                   18.8
11   51                   14
12   63                   14.8
Example 11.1: Riding mowers

              Classified as
True      G1    G2
G1        10     2
G2         2    10
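As a cross-check on the table above, here is a sketch that refits the rule with scikit-learn's LinearDiscriminantAnalysis and recomputes the apparent (resubstitution) confusion matrix. I am assuming the slide's rule was the usual equal-covariance linear discriminant; whether the 10/2 counts are reproduced exactly depends on that assumption.

```python
# Recompute the apparent (training-set) confusion matrix for the
# riding-mower data with scikit-learn's LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

owners = np.array([
    [60, 18.4], [85.5, 16.8], [64.8, 21.6], [61.5, 20.8], [87, 23.6],
    [110.1, 19.2], [108, 17.6], [82.8, 22.4], [69, 20], [93, 20.8],
    [51, 22], [81, 20],
])
nonowners = np.array([
    [75, 19.6], [52.8, 20.8], [64.8, 17.2], [43.2, 20.4], [84, 17.6],
    [49.2, 17.6], [59.4, 16], [66, 18.4], [47.4, 16.4], [33, 18.8],
    [51, 14], [63, 14.8],
])

X = np.vstack([owners, nonowners])
y = np.array([1] * len(owners) + [2] * len(nonowners))  # 1 = G1, 2 = G2

lda = LinearDiscriminantAnalysis().fit(X, y)
print(confusion_matrix(y, lda.predict(X)))  # slide reports [[10, 2], [2, 10]]
```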
8.2 Discriminant by Distance

Assume k = 2 for simplicity:

$$G_1: N_p\left(\mu^{(1)}, \Sigma_1\right), \qquad G_2: N_p\left(\mu^{(2)}, \Sigma_2\right).$$

Discriminant function: $w(x) = d^2(x, G_1) - d^2(x, G_2)$

Rule: assign $x \in G_1$ if $w(x) \le 0$; assign $x \in G_2$ if $w(x) > 0$.
Consider the Mahalanobis distance

$$d^2(x, G_j) = \left(x - \mu^{(j)}\right)' \Sigma_j^{-1} \left(x - \mu^{(j)}\right), \quad j = 1, 2.$$

When $\Sigma_1 = \Sigma_2 = \Sigma$,

$$w(x) = \left(x - \mu^{(1)}\right)' \Sigma^{-1} \left(x - \mu^{(1)}\right) - \left(x - \mu^{(2)}\right)' \Sigma^{-1} \left(x - \mu^{(2)}\right) = -2\left(x - \bar{\mu}\right)' \Sigma^{-1} \left(\mu^{(1)} - \mu^{(2)}\right),$$

where $\bar{\mu} = \frac{1}{2}\left(\mu^{(1)} + \mu^{(2)}\right)$.

Let $c = \Sigma^{-1}\left(\mu^{(1)} - \mu^{(2)}\right)$. Dropping the factor $-2$ and reversing the sign of the rule accordingly, the discriminant function can be taken as

$$w(x) = \left(x - \bar{\mu}\right)' \Sigma^{-1} \left(\mu^{(1)} - \mu^{(2)}\right) = c'\left(x - \bar{\mu}\right),$$

with the rule: assign $x \in G_1$ if $w(x) \ge 0$; assign $x \in G_2$ if $w(x) < 0$.
When $\mu^{(1)}$, $\mu^{(2)}$, $\Sigma$ are unknown, they are replaced by the estimators

$$\bar{x}^{(j)} = \frac{1}{n_j} \sum_{i=1}^{n_j} x_i^{(j)}, \qquad \tilde{\Sigma} = \frac{1}{n_1 + n_2 - 2}\left(A_1 + A_2\right),$$

where

$$A_j = \sum_{i=1}^{n_j} \left(x_i^{(j)} - \bar{x}^{(j)}\right)\left(x_i^{(j)} - \bar{x}^{(j)}\right)', \quad j = 1, 2.$$
Example: Univariate Case with Equal Variances

$G_1: N(\mu_1, \sigma_1^2)$, $G_2: N(\mu_2, \sigma_2^2)$, with $\sigma_1 = \sigma_2$.

Rule (taking $\mu_1 > \mu_2$): assign $x \in G_1$ if $x \ge a$; assign $x \in G_2$ if $x < a$, where

$$a = \frac{1}{2}\left(\mu_1 + \mu_2\right).$$
Example: Univariate Case with Unequal Variances

$G_1: N(\mu_1, \sigma_1^2)$, $G_2: N(\mu_2, \sigma_2^2)$, with $\sigma_1 \ne \sigma_2$. The cutting point becomes

$$a^* = \frac{\sigma_2 \mu_1 + \sigma_1 \mu_2}{\sigma_1 + \sigma_2}.$$
8.3 Fisher’s Discriminant Function
Idea: project the data onto a direction and compare the groups by ANOVA.
Training samples:

$$G_1: N_p\left(\mu^{(1)}, \Sigma\right): \; x_1^{(1)}, \ldots, x_{n_1}^{(1)}; \quad \ldots; \quad G_k: N_p\left(\mu^{(k)}, \Sigma\right): \; x_1^{(k)}, \ldots, x_{n_k}^{(k)}.$$
Projecting the data onto a direction $l \in \mathbb{R}^p$ gives the F-statistic

$$F_l = \frac{l'Bl/(k-1)}{l'El/(n-k)},$$

where

$$B = \sum_{a=1}^{k} n_a \left(\bar{x}^{(a)} - \bar{x}\right)\left(\bar{x}^{(a)} - \bar{x}\right)', \qquad E = \sum_{a=1}^{k} \sum_{j=1}^{n_a} \left(x_j^{(a)} - \bar{x}^{(a)}\right)\left(x_j^{(a)} - \bar{x}^{(a)}\right)'.$$
We seek $l^* \in \mathbb{R}^p$ such that

$$F_{l^*} = \max_{l \in \mathbb{R}^p} F_l.$$

The solution $l^*$ is the eigenvector associated with the largest eigenvalue $\lambda$ of $|B - \lambda E| = 0$.

Discriminant function: $u(x) = l'x$, where $l = l^*$.
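Numerically, $l^*$ solves the generalized symmetric eigenproblem $Bl = \lambda El$. A minimal sketch with scipy, assuming $E$ is positive definite:

```python
# Fisher's direction as a generalized eigenproblem B l = lambda E l.
# scipy.linalg.eigh(B, E) returns eigenvalues in ascending order, so the
# last eigenpair is (largest eigenvalue, l*). E must be positive definite.
import numpy as np
from scipy.linalg import eigh

def fisher_direction(B: np.ndarray, E: np.ndarray):
    eigvals, eigvecs = eigh(B, E)          # solves B v = lambda E v
    return eigvals[-1], eigvecs[:, -1]     # largest eigenvalue and its vector
```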
(B) Two Populations

$$B = n_1 \left(\bar{x}^{(1)} - \bar{x}\right)\left(\bar{x}^{(1)} - \bar{x}\right)' + n_2 \left(\bar{x}^{(2)} - \bar{x}\right)\left(\bar{x}^{(2)} - \bar{x}\right)', \qquad \bar{x} = \frac{n_1 \bar{x}^{(1)} + n_2 \bar{x}^{(2)}}{n_1 + n_2}.$$

Note that $E = A_1 + A_2$ and

$$B = \frac{n_1 n_2}{n_1 + n_2} \left(\bar{x}^{(1)} - \bar{x}^{(2)}\right)\left(\bar{x}^{(1)} - \bar{x}^{(2)}\right)'.$$

Since $\operatorname{rank}(B) = 1$, there is only one non-zero eigenvalue of $|B - \lambda E| = 0$.
(B) Two Populations (continued)

The associated eigenvector is $E^{-1}\left(\bar{x}^{(1)} - \bar{x}^{(2)}\right)$.

Discriminant function: $u(x) = x' E^{-1} \left(\bar{x}^{(1)} - \bar{x}^{(2)}\right) = c'x$, where $c = E^{-1}\left(\bar{x}^{(1)} - \bar{x}^{(2)}\right)$.

Rule: assign $x \in G_1$ if $u(x) \ge \bar{u}$; assign $x \in G_2$ if $u(x) < \bar{u}$, where

$$\bar{u} = \frac{1}{2}\left(\bar{u}_1 + \bar{u}_2\right), \qquad \bar{u}_i = c'\bar{x}^{(i)},$$

when $\Sigma_1 = \Sigma_2$.
(B) Two Populations (continued)

When $\Sigma_1 \ne \Sigma_2$, $\bar{u}$ is replaced by

$$\bar{u}^* = \frac{\hat\sigma_2 \bar{u}_1 + \hat\sigma_1 \bar{u}_2}{\hat\sigma_1 + \hat\sigma_2}, \qquad \bar{u}_i = c'\bar{x}^{(i)},$$

where

$$\hat\sigma_1^2 = \frac{c' A_1 c}{n_1 - 1} = \frac{1}{n_1 - 1} \left(\bar{x}^{(1)} - \bar{x}^{(2)}\right)' \left(A_1 + A_2\right)^{-1} A_1 \left(A_1 + A_2\right)^{-1} \left(\bar{x}^{(1)} - \bar{x}^{(2)}\right),$$

$$\hat\sigma_2^2 = \frac{c' A_2 c}{n_2 - 1} = \frac{1}{n_2 - 1} \left(\bar{x}^{(1)} - \bar{x}^{(2)}\right)' \left(A_1 + A_2\right)^{-1} A_2 \left(A_1 + A_2\right)^{-1} \left(\bar{x}^{(1)} - \bar{x}^{(2)}\right).$$
Example: Insect Classification

Table 2.1 Data of two species of insects: species 1 (n1 = 11)

No.  x1    x2    n.g.  c.g.  y
1    6.36  5.24  1     1     2.4713
2    5.92  5.12  1     2     2.3335
3    5.92  5.36  1     1     2.3663
4    6.44  5.64  1     1     2.5481
5    6.40  5.16  1     1     2.4714
6    6.56  5.56  1     1     2.5702
7    6.64  5.36  1     1     2.5650
8    6.68  4.96  1     1     2.5213
9    6.72  5.48  1     1     2.6034
10   6.76  5.60  1     1     2.6309
11   6.72  5.08  1     1     2.5488
Table 2.1 (continued): species 2 (n2 = 12)

No.  x1    x2    n.g.  c.g.  y
1    6.00  4.88  2     2     2.3227
2    5.60  4.64  2     2     2.1796
3    5.65  4.96  2     2     2.2343
4    5.76  4.80  2     2     2.2456
5    5.96  5.08  2     2     2.3391
6    5.72  5.04  2     2     2.2674
7    5.64  4.96  2     2     2.2343
8    5.44  4.88  2     2     2.1682
9    5.04  4.44  2     2     1.9977
10   4.56  4.04  2     2     1.8106
11   5.48  4.20  2     2     2.0863
12   5.76  4.80  2     2     2.2456

Note: x1 and x2 are measured characteristics of the insects (Hoel, 1947); n.g. denotes the natural group (species), c.g. the classified group, and y the value of the discriminant function.
Example: Insect Classification (continued)

$$\bar{x}^{(1)} = \begin{pmatrix} 6.4654 \\ 5.3236 \end{pmatrix}, \quad \bar{x}^{(2)} = \begin{pmatrix} 5.5500 \\ 4.7267 \end{pmatrix}, \quad \bar{x} = \begin{pmatrix} 5.9878 \\ 5.0122 \end{pmatrix},$$

$$E = \begin{pmatrix} 2.6765 & 1.2942 \\ 1.2942 & 1.7545 \end{pmatrix}, \qquad B = \begin{pmatrix} 4.8097 & 3.1364 \\ 3.1364 & 2.0453 \end{pmatrix}.$$

The only non-zero eigenvalue of $|B - \lambda E| = 0$ is $\lambda_1 = 1.9187$, and the associated eigenvector is

$$E^{-1}\left(\bar{x}^{(1)} - \bar{x}^{(2)}\right) = \begin{pmatrix} 0.2759 \\ 0.1367 \end{pmatrix}.$$
Example: Insect Classification (continued)

The discriminant function is $u(x_1, x_2) = 0.2759 x_1 + 0.1367 x_2$, and its value y for each observation is given in Table 2.1. The cutting point is $\bar{u} = 2.3447$.

              Classified as
True      G1    G2
G1        10     1
G2         0    12

If we instead use $\bar{u}^* = 2.3831$ (from $\hat\sigma_1 = 0.0939$, $\hat\sigma_2 = 0.1497$), we obtain the same classification.
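As a check, the key quantities of this example can be recomputed directly from Table 2.1. The script below is a sketch using only numpy; its output should agree with the slide's 0.2759, 0.1367 and 2.3447 up to rounding of the slide's intermediate values.

```python
# Recompute the insect example: c = E^{-1}(xbar1 - xbar2) and the cutoff
# u_bar = (c'xbar1 + c'xbar2) / 2, from the raw data in Table 2.1.
import numpy as np

G1 = np.array([[6.36, 5.24], [5.92, 5.12], [5.92, 5.36], [6.44, 5.64],
               [6.40, 5.16], [6.56, 5.56], [6.64, 5.36], [6.68, 4.96],
               [6.72, 5.48], [6.76, 5.60], [6.72, 5.08]])
G2 = np.array([[6.00, 4.88], [5.60, 4.64], [5.65, 4.96], [5.76, 4.80],
               [5.96, 5.08], [5.72, 5.04], [5.64, 4.96], [5.44, 4.88],
               [5.04, 4.44], [4.56, 4.04], [5.48, 4.20], [5.76, 4.80]])

xbar1, xbar2 = G1.mean(axis=0), G2.mean(axis=0)
A1 = (G1 - xbar1).T @ (G1 - xbar1)     # within-group scatter of species 1
A2 = (G2 - xbar2).T @ (G2 - xbar2)     # within-group scatter of species 2
E = A1 + A2

c = np.linalg.solve(E, xbar1 - xbar2)  # direction of Fisher's discriminant
u_bar = (c @ xbar1 + c @ xbar2) / 2    # cutting point
print(c)      # ~ (0.275, 0.137), close to the slide's (0.2759, 0.1367)
print(u_bar)  # ~ 2.34, close to the slide's 2.3447
```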
8.4 Bayes’ Discriminant Analysis

A. Idea

There are k populations G1, …, Gk in $\mathbb{R}^p$. A partition R1, …, Rk of $\mathbb{R}^p$ is determined based on a training sample.

Rule: assign $x \in G_i$ if $x$ falls into $R_i$.

Loss: $c(j \mid i)$ is the cost incurred when $x$ comes from $G_i$ but falls into $R_j$.

The probability of this misclassification is

$$P(j \mid i) = \int_{R_j} p_i(x)\, dx,$$

where $p_i(x)$ is the density of $x \in G_i$.
The expected cost of misclassification is

$$ECM(R_1, \ldots, R_k) = \sum_{i=1}^{k} q_i \sum_{j=1}^{k} c(j \mid i)\, P(j \mid i),$$

where q1, …, qk are prior probabilities. We want to minimize ECM(R1, …, Rk) with respect to R1, …, Rk.
B. Method

Theorem 6.4.1. Let

$$h_t(x) = \sum_{\substack{i=1 \\ i \ne t}}^{k} q_i p_i(x)\, c(t \mid i).$$

Then the optimal $R_t$'s are

$$R_t = \{x : h_t(x) \le h_j(x),\; j \ne t\}, \quad t = 1, \ldots, k.$$
Corollary 1. Take $c(j \mid i) = 1 - \delta_{ij}$, where $\delta_{ij} = 1$ if $i = j$ and $0$ if $i \ne j$. Then

$$R_t = \{x : q_t p_t(x) \ge q_j p_j(x),\; j \ne t\}, \quad t = 1, \ldots, k.$$

Proof:

$$h_t(x) = \sum_{i=1}^{k} q_i p_i(x) - q_t p_t(x) = c(x) - q_t p_t(x),$$

so minimizing $h_t(x)$ over t is equivalent to maximizing $q_t p_t(x)$.
Corollary 2. In the case k = 2,

$$h_1(x) = q_2 p_2(x)\, c(1 \mid 2), \qquad h_2(x) = q_1 p_1(x)\, c(2 \mid 1),$$

so we have

$$R_1 = \{x : q_2 p_2(x)\, c(1 \mid 2) \le q_1 p_1(x)\, c(2 \mid 1)\}, \qquad R_2 = \{x : q_2 p_2(x)\, c(1 \mid 2) > q_1 p_1(x)\, c(2 \mid 1)\}.$$

Discriminant function: $u(x) = p_1(x) / p_2(x)$.

Rule: assign $x \in G_1$ if $u(x) \ge d$; assign $x \in G_2$ if $u(x) < d$, where

$$d = \frac{q_2\, c(1 \mid 2)}{q_1\, c(2 \mid 1)}.$$
Corollary 3. In the case k = 2 and

$$x \sim \begin{cases} N_p\left(\mu^{(1)}, \Sigma\right) & \text{if } x \in G_1, \\ N_p\left(\mu^{(2)}, \Sigma\right) & \text{if } x \in G_2, \end{cases}$$

we have

$$u(x) = \frac{p_1(x)}{p_2(x)} = \exp\{w(x)\}, \qquad w(x) = \left(x - \tfrac{1}{2}\left(\mu^{(1)} + \mu^{(2)}\right)\right)' \Sigma^{-1} \left(\mu^{(1)} - \mu^{(2)}\right).$$

Rule: assign $x \in G_1$ if $w(x) \ge \ln d$; assign $x \in G_2$ if $w(x) < \ln d$.
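A minimal sketch of Corollary 3's rule in numpy (function and argument names are mine); with equal priors and costs, $\ln d = 0$ and this reduces to the distance discriminant of Section 8.2.

```python
# Corollary 3: w(x) = (x - mu_bar)' Sigma^{-1} (mu1 - mu2), assign x to G1
# iff w(x) >= ln d, with d = q2*c(1|2) / (q1*c(2|1)).
import numpy as np

def normal_bayes_rule(mu1, mu2, Sigma, q1=0.5, q2=0.5, c12=1.0, c21=1.0):
    coef = np.linalg.solve(Sigma, mu1 - mu2)   # Sigma^{-1}(mu1 - mu2)
    mu_bar = (mu1 + mu2) / 2
    ln_d = np.log(q2 * c12 / (q1 * c21))       # ln d; 0 for equal priors/costs
    return lambda x: 1 if coef @ (np.asarray(x) - mu_bar) >= ln_d else 2
```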
C. Example 11.3: Detection of Hemophilia A Carriers

To construct a procedure for detecting potential hemophilia A carriers, blood samples were assayed for two groups of women, and measurements were taken on two variables. The first group of 30 women was selected from a population of women who did not carry the hemophilia gene; this group was called the normal group. The second group of 45 women was selected from known hemophilia A carriers; this group was called the obligatory carriers.
Variables:
x1 = log10(AHF activity)
x2 = log10(AHF-like antigen)

Populations:
G1: women who did not carry the hemophilia gene (n1 = 30)
G2: women who are known hemophilia A carriers (n2 = 45)
Data set

Normal group (n1 = 30):

log10(AHF activity):
-0.0056 -0.1698 -0.3469 -0.0894 -0.1679 -0.0836 -0.1979 -0.0762 -0.1913 -0.1092
-0.5268 -0.0842 -0.0225 0.0084 -0.1827 0.1237 -0.4702 -0.1519 0.0006 -0.2015
-0.1932 0.1507 -0.1259 -0.1551 -0.1952 0.0291 -0.228 -0.0997 -0.1972 -0.0867

log10(AHF-like antigen):
-0.1657 -0.1585 -0.1879 0.0064 0.0713 0.0106 -0.0005 0.0392 -0.2123 -0.119 -0.4773
0.0248 -0.058 0.0782 -0.1138 0.214 -0.3099 -0.0686 -0.1153 -0.0498 -0.2293 0.0933
-0.0669 -0.1232 -0.1007 0.0442 -0.171 -0.0733 -0.0607 -0.056

Obligatory carriers (n2 = 45):

log10(AHF activity):
-0.3478 -0.4719 -0.2447 -0.3351 -0.1878
-0.3618 -0.4986 -0.5015 -0.1326 -0.6911 -0.3608 -0.4535 -0.3479 -0.3539
-0.361 -0.3226 -0.4319 -0.2734 -0.5573 -0.3755 -0.495 -0.5107 -0.1652
-0.4232 -0.2375 -0.2205 -0.2154 -0.3447 -0.254 -0.3778 -0.4046 -0.0639
-0.0149 -0.0312 -0.174 -0.1416 -0.1508 -0.0964 -0.2642 -0.0234 -0.3352
-0.1744 -0.4055 -0.2444 -0.4784

log10(AHF-like antigen):
0.1151 -0.2008 -0.086 -0.2984 0.0097 -0.339 0.1237 -0.1682 -0.1721 0.0722
-0.1079 -0.0399 0.167 -0.0687 -0.002 0.0548 -0.1865 -0.0153 -0.2483 0.2132
-0.0407 -0.0998 0.2876 0.0046 -0.0219 0.0097 -0.0573 -0.2682 -0.1162 0.1569
-0.1368 0.1539 0.14 -0.0776 0.1642 0.1137 0.0531 0.0867 0.0804 0.0875 0.251
0.1892 -0.2418 0.1614 0.0282
C. Example 11.3: Detection of Hemophilia A Carriers

[SAS discriminant-analysis output for this example not reproduced here]
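Since the SAS output is not reproduced in this text, the following sketch is a stand-in: it fits the equal-covariance normal rule of Corollary 3 (assuming equal priors and unit costs) to the data set above with scikit-learn, and prints the apparent (resubstitution) confusion matrix. The helper and variable names are mine.

```python
# Linear discriminant analysis on the hemophilia data, as a stand-in for
# the SAS output. Data values are copied verbatim from the data set above.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

def parse(s: str) -> np.ndarray:
    """Parse whitespace-separated numbers into a float array."""
    return np.array(list(map(float, s.split())))

normal_activity = parse("""
-0.0056 -0.1698 -0.3469 -0.0894 -0.1679 -0.0836 -0.1979 -0.0762 -0.1913 -0.1092
-0.5268 -0.0842 -0.0225 0.0084 -0.1827 0.1237 -0.4702 -0.1519 0.0006 -0.2015
-0.1932 0.1507 -0.1259 -0.1551 -0.1952 0.0291 -0.228 -0.0997 -0.1972 -0.0867""")
normal_antigen = parse("""
-0.1657 -0.1585 -0.1879 0.0064 0.0713 0.0106 -0.0005 0.0392 -0.2123 -0.119 -0.4773
0.0248 -0.058 0.0782 -0.1138 0.214 -0.3099 -0.0686 -0.1153 -0.0498 -0.2293 0.0933
-0.0669 -0.1232 -0.1007 0.0442 -0.171 -0.0733 -0.0607 -0.056""")
carrier_activity = parse("""
-0.3478 -0.4719 -0.2447 -0.3351 -0.1878
-0.3618 -0.4986 -0.5015 -0.1326 -0.6911 -0.3608 -0.4535 -0.3479 -0.3539
-0.361 -0.3226 -0.4319 -0.2734 -0.5573 -0.3755 -0.495 -0.5107 -0.1652
-0.4232 -0.2375 -0.2205 -0.2154 -0.3447 -0.254 -0.3778 -0.4046 -0.0639
-0.0149 -0.0312 -0.174 -0.1416 -0.1508 -0.0964 -0.2642 -0.0234 -0.3352
-0.1744 -0.4055 -0.2444 -0.4784""")
carrier_antigen = parse("""
0.1151 -0.2008 -0.086 -0.2984 0.0097 -0.339 0.1237 -0.1682 -0.1721 0.0722
-0.1079 -0.0399 0.167 -0.0687 -0.002 0.0548 -0.1865 -0.0153 -0.2483 0.2132
-0.0407 -0.0998 0.2876 0.0046 -0.0219 0.0097 -0.0573 -0.2682 -0.1162 0.1569
-0.1368 0.1539 0.14 -0.0776 0.1642 0.1137 0.0531 0.0867 0.0804 0.0875 0.251
0.1892 -0.2418 0.1614 0.0282""")

X = np.vstack([np.column_stack([normal_activity, normal_antigen]),
               np.column_stack([carrier_activity, carrier_antigen])])
y = np.array([0] * 30 + [1] * 45)   # 0 = normal, 1 = obligatory carrier

lda = LinearDiscriminantAnalysis(priors=[0.5, 0.5]).fit(X, y)
print(confusion_matrix(y, lda.predict(X)))  # rows: true normal, true carrier
```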