Fisher's Linear & Quadratic Discriminant Functions

k: number of multivariate normal (sub)populations.
n: number of features (measurements, observations) for each individual = number of elements in the vector X of observations.
X: the vector of observations on an individual to be classified.
$D^2$: the squared Mahalanobis distance.
F: the Fisher Linear Discriminant function.
$p_j$: the proportion (prior probability) of our big population that is from subpopulation j.
$\phi_j$: the multivariate normal density for subpopulation j, with mean $\mu_j$ and variance (covariance matrix) $\Sigma_j$.

Goal: We want Pr{ pop i | X }, where X is a multivariate vector of measurements on an individual and we have i = 1, 2, …, k populations from which to choose. We will assume that the X's from each population j have a multivariate normal distribution with density $\phi_j$.

(1) Bayes' Rule { relates Pr{A|B} to Pr{B|A} }

$$\Pr\{\text{pop } i \mid X\} = \frac{\Pr\{\text{pop } i \text{ and } X\}}{\Pr\{X\}} = \frac{p_i \Pr\{X \mid \text{pop } i\}}{\sum_{j=1}^{k} p_j \Pr\{X \mid \text{pop } j\}} = \frac{p_i \phi_i}{\sum_{j=1}^{k} p_j \phi_j}$$

(2) Simplest case: all populations are equally likely (common prior p) and have the same covariance matrix $\Sigma$.

$$p\,\phi_j = p\,(2\pi)^{-n/2} |\Sigma|^{-1/2} \exp(-0.5\, D_j^2) = p\,(2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left(-0.5\,(X-\mu_j)'\Sigma^{-1}(X-\mu_j)\right) = \left[ p\,(2\pi)^{-n/2} |\Sigma|^{-1/2} \exp(-0.5\, X'\Sigma^{-1}X) \right] \exp(F_j)$$

where

(a) $D_j^2$ = squared Mahalanobis distance $= (X-\mu_j)'\Sigma^{-1}(X-\mu_j) = X'\Sigma^{-1}X - 2\left[-0.5\,\mu_j'\Sigma^{-1}\mu_j + \mu_j'\Sigma^{-1}X\right] = X'\Sigma^{-1}X - 2\left[a_j + b_j'X\right]$

(b) $F_j = -0.5\,\mu_j'\Sigma^{-1}\mu_j + \mu_j'\Sigma^{-1}X = a_j + b_j'X$ = "Fisher Linear Discriminant Function", with $a_j = -0.5\,\mu_j'\Sigma^{-1}\mu_j$ and $b_j = \Sigma^{-1}\mu_j$, so that $-0.5\,D_j^2 = -0.5\,X'\Sigma^{-1}X + F_j$.

(3) Note that anything common to the numerator and denominator of

$$\Pr\{\text{pop } i \mid X\} = \frac{p_i \phi_i}{\sum_{j=1}^{k} p_j \phi_j}$$

will cancel out, so in this simplest case the factor $\left[ p\,(2\pi)^{-n/2} |\Sigma|^{-1/2} \exp(-0.5\, X'\Sigma^{-1}X) \right]$ is eliminated, leaving the simpler expression

$$\Pr\{\text{pop } i \mid X\} = \frac{\exp(F_i)}{\sum_{j=1}^{k} \exp(F_j)}$$

So, compute $\exp(F_i)$ for i = 1, …, k and divide each by the sum to get probabilities for each population.

(4) The Fisher Linear Discriminant function is seen to be everything in $\ln(p_i \phi_i)$ that changes with i.
When $p_i$ and $\Sigma_i$ change, only the $2\pi$ term is constant, so we have:

$$F_i = \left[-0.5\,\mu_i'\Sigma_i^{-1}\mu_i + \ln(p_i) - 0.5\ln|\Sigma_i|\right] - 0.5\,X'\Sigma_i^{-1}X + \mu_i'\Sigma_i^{-1}X$$

Note that $\ln(p_i)$ is omitted if p is constant, and $-0.5\ln|\Sigma_i| - 0.5\,X'\Sigma_i^{-1}X$ is omitted if there is a common variance-covariance matrix. When $X'\Sigma_i^{-1}X$ appears, for obvious reasons this is called Fisher's Quadratic Discriminant Function. Only the intercept $\left[-0.5\,\mu_i'\Sigma_i^{-1}\mu_i + \ln(p_i) - 0.5\ln|\Sigma_i|\right]$ is affected by unequal p.
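
The recipe in (3) and (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original notes: the helper names (solve_and_logdet, dot, discriminant_score, posterior) are made up for this example, the means, covariance matrices and priors are assumed known rather than estimated from data, and a small Gaussian elimination routine stands in for a proper linear algebra library.

```python
import math

def solve_and_logdet(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting.
    Returns (y, ln|det A|). A is a list of rows, assumed nonsingular."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    logdet = 0.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        logdet += math.log(abs(M[c][c]))  # |det| is the product of |pivots|
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    y = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][j] * y[j] for j in range(r + 1, n))
        y[r] = (M[r][n] - s) / M[r][r]
    return y, logdet

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def discriminant_score(x, mu, cov, prior=1.0, quadratic=False):
    """F_i for one population (hypothetical helper).
    Linear part: -0.5 mu' S^-1 mu + mu' S^-1 x + ln(prior).
    With quadratic=True, also adds -0.5 ln|S| - 0.5 x' S^-1 x."""
    b, logdet = solve_and_logdet(cov, mu)  # b = S^-1 mu
    score = -0.5 * dot(mu, b) + dot(b, x) + math.log(prior)
    if quadratic:
        sinv_x, _ = solve_and_logdet(cov, x)
        score += -0.5 * logdet - 0.5 * dot(x, sinv_x)
    return score

def posterior(scores):
    """exp(F_i) / sum_j exp(F_j); the max is subtracted to avoid overflow."""
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    total = sum(e)
    return [v / total for v in e]

# Two populations, common identity covariance, equal priors.
means = [[0.0, 0.0], [2.0, 0.0]]
cov = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.0]  # halfway between the two means
print(posterior([discriminant_score(x, m, cov) for m in means]))  # [0.5, 0.5]
```

Leaving prior at its default of 1.0 makes ln(prior) zero, which corresponds to omitting the term when p is constant. With a common covariance matrix the quadratic option adds the same amount to every score, so the posterior probabilities match the linear ones.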