Fisher’s Linear & Quadratic Discriminant Functions:
k: number of multivariate normal (sub) populations
n: number of features (measurements, observations) for each individual = number of
elements in the vector X of observations.
X: the vector of observations on an individual to be classified.
$D^2$: The squared Mahalanobis distance
$F$: The Fisher Linear Discriminant function
$p_j$: The proportion (prior probability) of our big population that is from subpopulation $j$.
$\phi_j$: The multivariate normal distribution (density) for subpopulation $j$, with mean $\mu_j$ and variance $\Sigma_j$.
Goal: We want $\Pr\{\text{pop } i \mid X\}$, where $X$ is a multivariate vector of measurements on an
individual and we have $i = 1, 2, \ldots, k$ populations from which to choose. We will assume
that the $X$'s from each population $j$ have a multivariate normal distribution $\phi_j$.
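(1) Bayes' Rule (relates $\Pr\{A \mid B\}$ to $\Pr\{B \mid A\}$)
$$\Pr\{\text{pop } i \mid X\} = \frac{\Pr\{\text{pop } i \text{ and } X\}}{\Pr\{X\}} = \frac{p_i \Pr\{X \mid \text{pop } i\}}{\sum_{j=1}^{k} p_j \Pr\{X \mid \text{pop } j\}} = \frac{p_i\,\phi_i}{\sum_{j=1}^{k} p_j\,\phi_j}$$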
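As a small numerical illustration of Bayes' Rule in (1), the sketch below turns priors and density values into posterior probabilities. The function name `posterior` and all numbers are hypothetical, chosen only for illustration; the density values stand in for whatever each population's density gives at the observed X.

```python
def posterior(priors, densities):
    """Pr{pop i | X} from priors p_i and density values phi_i(X)."""
    joint = [p * f for p, f in zip(priors, densities)]  # numerators p_i * phi_i(X)
    total = sum(joint)                                  # denominator: sum_j p_j * phi_j(X)
    return [v / total for v in joint]

# Hypothetical example: k = 3 populations
priors = [0.5, 0.3, 0.2]        # p_1, p_2, p_3 (sum to 1)
densities = [0.10, 0.40, 0.20]  # assumed values of phi_j(X) at the observed X

post = posterior(priors, densities)
print(post)
```

Population 2 ends up with the largest posterior here even though population 1 has the largest prior, because its density at X is much higher.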
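(2) Simplest case: all populations are equally likely (common prior $p$) and have the same covariance matrix $\Sigma$.
$$p\,\phi_j = p\,(2\pi)^{-n/2}|\Sigma|^{-1/2}\exp(-0.5\,D_j^2) = p\,(2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\big(-0.5\,(X-\mu_j)'\Sigma^{-1}(X-\mu_j)\big) = \big[\,p\,(2\pi)^{-n/2}|\Sigma|^{-1/2}\exp(-0.5\,X'\Sigma^{-1}X)\big]\exp(F_j)$$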
where

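(a) $D_j^2$ = squared Mahalanobis distance $= (X-\mu_j)'\Sigma^{-1}(X-\mu_j) = X'\Sigma^{-1}X - 2[-0.5\,\mu_j'\Sigma^{-1}\mu_j + \mu_j'\Sigma^{-1}X] = X'\Sigma^{-1}X - 2[a_j + b_j'X]$
(b) $F_j = -0.5\,\mu_j'\Sigma^{-1}\mu_j + \mu_j'\Sigma^{-1}X = a_j + b_j'X$ = "Fisher Linear
Discriminant Function", with $a_j = -0.5\,\mu_j'\Sigma^{-1}\mu_j$ and $b_j = \Sigma^{-1}\mu_j$.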
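The relationship between the squared Mahalanobis distance in (a) and the linear function in (b) can be checked numerically. The following is a minimal sketch assuming NumPy is available; the helper names `fisher_linear` and `mahalanobis_sq` and the numeric values are hypothetical.

```python
import numpy as np

def fisher_linear(X, mu, Sigma):
    """Return (a_j, b_j, F_j) for one population with mean mu and common covariance Sigma."""
    b = np.linalg.solve(Sigma, mu)  # b_j = Sigma^{-1} mu_j
    a = -0.5 * mu @ b               # a_j = -0.5 mu_j' Sigma^{-1} mu_j
    return a, b, a + b @ X          # F_j = a_j + b_j' X

def mahalanobis_sq(X, mu, Sigma):
    """Squared Mahalanobis distance (X - mu)' Sigma^{-1} (X - mu)."""
    d = X - mu
    return d @ np.linalg.solve(Sigma, d)

# Hypothetical values for n = 2 features
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])  # common covariance matrix
mu = np.array([1.0, -1.0])                  # mean of population j
X = np.array([0.5, 0.2])                    # individual to classify

a, b, F = fisher_linear(X, mu, Sigma)
D2 = mahalanobis_sq(X, mu, Sigma)
xSx = X @ np.linalg.solve(Sigma, X)         # X' Sigma^{-1} X, the same for every j

# Identity from (a): D_j^2 = X' Sigma^{-1} X - 2 F_j
assert np.isclose(D2, xSx - 2 * F)
```

Because X' Sigma^{-1} X does not depend on j, the population with the largest F_j is also the one with the smallest Mahalanobis distance.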
(3) Note that anything common to the numerator and denominator of $\Pr\{\text{pop } i \mid X\} = p_i\,\phi_i / \sum_{j=1}^{k} p_j\,\phi_j$ will cancel out, so in this simplest case
$[\,p\,(2\pi)^{-n/2}|\Sigma|^{-1/2}\exp(-0.5\,X'\Sigma^{-1}X)]$ is eliminated, leaving the simpler expression
$$\Pr\{\text{pop } i \mid X\} = \frac{\exp(F_i)}{\sum_{j=1}^{k}\exp(F_j)}$$
So, compute $\exp(F_i)$ for $i = 1, \ldots, k$ and divide each by the sum to get probabilities for each population.
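The recipe in (3) can be sketched in a few lines. The function name `posterior_from_scores` and the scores are hypothetical. Subtracting the largest score before exponentiating is an added numerical safeguard that is not part of the text; it changes nothing mathematically, since the common factor cancels in the ratio for the same reason given in (3).

```python
import math

def posterior_from_scores(F):
    """Pr{pop i | X} = exp(F_i) / sum_j exp(F_j) for a list of discriminant scores."""
    m = max(F)                            # common shift; cancels in the ratio
    w = [math.exp(f - m) for f in F]      # proportional to exp(F_i)
    s = sum(w)
    return [v / s for v in w]

# Hypothetical discriminant scores F_1, F_2, F_3 for one individual
F = [1.2, 0.3, -0.5]
probs = posterior_from_scores(F)
print(probs)
```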
(4) The Fisher Linear Discriminant function is seen to be everything in $\ln(p_i\,\phi_i)$ that
changes with $i$. When $p_i$ and $\Sigma_i$ change, only the $2\pi$ term is constant, so we have:
$$F_i = [-0.5\,\mu_i'\Sigma_i^{-1}\mu_i + \ln(p_i) - 0.5\ln|\Sigma_i|] - 0.5\,X'\Sigma_i^{-1}X + \mu_i'\Sigma_i^{-1}X$$
Note that $\ln(p_i)$ is omitted if $p$ is constant, and $-0.5\ln|\Sigma_i| - 0.5\,X'\Sigma_i^{-1}X$ is omitted if there
is a common variance-covariance matrix. When $X'\Sigma_i^{-1}X$ appears, for obvious reasons
this is called Fisher's Quadratic Discriminant Function. Only the intercept
$[-0.5\,\mu_i'\Sigma_i^{-1}\mu_i + \ln(p_i) - 0.5\ln|\Sigma_i|]$ is affected by unequal $p$.
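The quadratic function in (4) can be illustrated with the sketch below, which assumes NumPy is available. The name `quadratic_score` and the two example populations are hypothetical. It computes the bracketed intercept, the quadratic term, and the linear term separately, then converts the scores to posterior probabilities as in (3).

```python
import numpy as np

def quadratic_score(X, mu, Sigma, p):
    """F_i = [-0.5 mu' S^{-1} mu + ln(p) - 0.5 ln|S|] - 0.5 X' S^{-1} X + mu' S^{-1} X."""
    Sinv_mu = np.linalg.solve(Sigma, mu)   # Sigma_i^{-1} mu_i
    Sinv_X = np.linalg.solve(Sigma, X)     # Sigma_i^{-1} X
    _, logdet = np.linalg.slogdet(Sigma)   # ln|Sigma_i|
    intercept = -0.5 * mu @ Sinv_mu + np.log(p) - 0.5 * logdet
    return intercept - 0.5 * X @ Sinv_X + mu @ Sinv_X

# Hypothetical populations: (mean, covariance, prior)
pops = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.6),
    (np.array([2.0, 1.0]), np.array([[2.0, 0.5], [0.5, 1.5]]), 0.4),
]
X = np.array([1.0, 0.5])  # individual to classify

F = np.array([quadratic_score(X, mu, S, p) for mu, S, p in pops])
post = np.exp(F - F.max())  # shift by the max for numerical stability
post /= post.sum()          # Pr{pop i | X}
print(F, post)
```

With a common covariance matrix and equal priors, the log-determinant, quadratic, and ln(p) terms are the same for every population, so dropping them gives the same posteriors as the linear function.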