
PATTERN RECOGNITION

Lecture 16 – Linear Discriminant Analysis

Professor Aly A. Farag

Computer Vision and Image Processing Laboratory

University of Louisville

URL: www.cvip.uofl.edu; E-mail: aly.farag@louisville.edu

Planned for ECE 620 and ECE 655 - Summer 2011

TA/Grader: Melih Aslan; CVIP Lab Rm 6, msaslan01@lousiville.edu

Introduction

• In chapter 3, the underlying probability densities were known (or given)

• The training sample was used to estimate the parameters of these probability densities (ML, MAP estimations)

• In this chapter, we only know the proper forms for the discriminant functions: similar to nonparametric techniques

• They may not be optimal, but they are very simple to use

• They provide us with linear classifiers


Linear discriminant functions and decision surfaces

• Definition

It is a function that is a linear combination of the components of x:

g(x) = w^T x + w_0        (1)

where w is the weight vector and w_0 is the bias

• A two-category classifier with a discriminant function of the form (1) uses the following rule:

Decide ω1 if g(x) > 0 and ω2 if g(x) < 0

Decide ω1 if w^T x > −w_0 and ω2 otherwise

If g(x) = 0, x is assigned to either class


– The equation g(x) = 0 defines the decision surface that separates points assigned to the category ω1 from points assigned to the category ω2

– When g(x) is linear, the decision surface is a hyperplane

– Algebraic measure of the distance from x to the hyperplane (interesting result!)


Write x = x_p + r · (w/||w||)

(where x_p is the normal projection of x onto the hyperplane H, and w/||w|| is the unit vector collinear with w)

Since g(x_p) = 0 and w^T w = ||w||², it follows that g(x) = w^T x + w_0 = r · ||w||,

therefore  r = g(x) / ||w||

In particular, the distance from the origin to H is  d(0, H) = w_0 / ||w||

– In conclusion, a linear discriminant function divides the feature space by a hyperplane decision surface

– The orientation of the surface is determined by the normal vector w and the location of the surface is determined by the bias
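To make the decision rule and the distance formula concrete, here is a minimal NumPy sketch; the weight vector, bias, and sample point are made-up illustrative values, not taken from the lecture.

import numpy as np

# Made-up weight vector and bias for illustration
w = np.array([2.0, -1.0])
w0 = 0.5

def g(x):
    # Linear discriminant g(x) = w^T x + w0
    return w @ x + w0

x = np.array([1.0, 3.0])
gx = g(x)

# Two-category rule: omega_1 if g(x) > 0, omega_2 if g(x) < 0, either class if g(x) = 0
label = "omega_1" if gx > 0 else ("omega_2" if gx < 0 else "either class")

r = gx / np.linalg.norm(w)    # signed distance from x to the hyperplane H
d0 = w0 / np.linalg.norm(w)   # distance from the origin to H

print(label, r, d0)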


– The multi-category case

• We define c linear discriminant functions

g_i(x) = w_i^T x + w_i0,   i = 1, ..., c

and assign x to ωi if g_i(x) > g_j(x) for all j ≠ i; in case of ties, the classification is undefined

• In this case, the classifier is a “linear machine”

• A linear machine divides the feature space into c decision regions, with g_i(x) being the largest discriminant if x is in the region R_i

For two contiguous regions R_i and R_j, the boundary that separates them is a portion of the hyperplane H_ij defined by:

g_i(x) = g_j(x),  i.e.  (w_i − w_j)^T x + (w_i0 − w_j0) = 0

• w_i − w_j is normal to H_ij, and

• d(x, H_ij) = (g_i(x) − g_j(x)) / ||w_i − w_j||


– It is easy to show that the decision regions for a linear machine are convex; this restriction limits the flexibility and accuracy of the classifier
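As a small illustration of such a linear machine (the weight vectors and biases below are made up for c = 3 classes; this is a sketch, not code from the lecture):

import numpy as np

# Row i holds the weight vector w_i; b[i] holds the bias w_i0 (illustrative values)
W = np.array([[ 1.0,  0.0],
              [-1.0,  1.0],
              [ 0.0, -1.0]])
b = np.array([0.0, 0.5, -0.5])

def linear_machine(x):
    # Evaluate all c discriminants g_i(x) = w_i^T x + w_i0 and pick the largest
    g = W @ x + b
    return int(np.argmax(g))   # ties are broken arbitrarily here (undefined in the theory)

print(linear_machine(np.array([2.0, 1.0])))   # index of the winning class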


Class Exercises

• Ex. 13 p.159

• Ex. 3 p.201

• Write a C/C++/Java program that uses a k-nearest neighbor method to classify input patterns. Use the table on p.209 as your training sample.

Experiment with the program using the following data:

– k = 3:
  x1 = (0.33, 0.58, −4.8)
  x2 = (0.27, 1.0, −2.68)
  x3 = (−0.44, 2.8, 6.20)

– Do the same thing with k = 11

– Compare the classification results between k = 3 and k = 11

(use majority voting: assign the most dominant class amongst the k nearest neighbors)
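The exercise asks for C/C++/Java; for reference, a compact Python sketch of the same k-nearest-neighbor rule with majority voting is given below. The training set used here is a made-up stand-in, since the table on p.209 is not reproduced in these notes.

import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    # Majority vote among the k training samples closest to x (Euclidean distance)
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(int(y_train[i]) for i in nearest)
    return votes.most_common(1)[0][0]

# Stand-in training data: replace with the table on p.209
X_train = np.array([[ 0.1, 0.5, -4.0],
                    [ 0.3, 1.1, -2.5],
                    [-0.5, 2.5,  6.0],
                    [ 0.0, 0.9, -3.0]])
y_train = np.array([1, 2, 3, 2])

for x in ([0.33, 0.58, -4.8], [0.27, 1.0, -2.68], [-0.44, 2.8, 6.20]):
    print(knn_classify(np.array(x), X_train, y_train, k=3))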


Generalized Linear Discriminant Functions

• Decision boundaries which separate classes may not always be linear

• The complexity of the boundaries may sometimes require the use of highly non-linear surfaces

• A popular approach to generalize the concept of linear decision functions is to consider a generalized decision function as:

g(x) = w_1 f_1(x) + w_2 f_2(x) + … + w_N f_N(x) + w_{N+1}        (1)

where the f_i(x), 1 ≤ i ≤ N, are scalar functions of the pattern x, x ∈ R^n (Euclidean space)


• Introducing f_{N+1}(x) = 1 we get:

g(x) = Σ_{i=1}^{N+1} w_i f_i(x) = w^T x*

where w = (w_1, w_2, ..., w_N, w_{N+1})^T and the transformed pattern x* = (f_1(x), f_2(x), ..., f_N(x), f_{N+1}(x))^T

• This latter representation of g(x) implies that any decision function defined by equation (1) can be treated as linear in the (N + 1)-dimensional space (N + 1 > n)

• g(x) maintains its non-linearity characteristics in R^n


• The most commonly used generalized decision function is g(x) for which the f_i(x) (1 ≤ i ≤ N) are polynomials:

g(x) = (w*)^T x*        (T is the vector transpose form)

where w* is a new weight vector, which can be calculated from the original w and the functions f_i(x), 1 ≤ i ≤ N

• Quadratic decision functions for a 2-dimensional feature space:

g(x) = w_1 x_1² + w_2 x_1 x_2 + w_3 x_2² + w_4 x_1 + w_5 x_2 + w_6

here: w = (w_1, w_2, ..., w_6)^T and x* = (x_1², x_1 x_2, x_2², x_1, x_2, 1)^T
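For instance, the 2-D quadratic discriminant above is just a dot product once x is mapped to the transformed pattern x*; a minimal Python sketch with made-up weights (not from the lecture):

import numpy as np

def x_star(x1, x2):
    # Transformed pattern x* = (x1^2, x1*x2, x2^2, x1, x2, 1)^T
    return np.array([x1**2, x1 * x2, x2**2, x1, x2, 1.0])

w = np.array([1.0, -2.0, 1.0, 0.5, 0.5, -1.0])   # made-up weights w = (w1, ..., w6)^T

def g(x1, x2):
    # Non-linear in (x1, x2), but linear in the 6-dimensional transformed space
    return w @ x_star(x1, x2)

print(g(1.0, 2.0))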


• For patterns x ∈ R^n, the most general quadratic decision function is given by:

g(x) = Σ_{i=1}^{n} w_ii x_i² + Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} w_ij x_i x_j + Σ_{i=1}^{n} w_i x_i + w_{n+1}        (2)

The number of terms on the right-hand side is:

l = N + 1 = n + n(n−1)/2 + n + 1 = (n+1)(n+2)/2

This is the total number of weights, which are the free parameters of the problem

– If for example n = 3, the vector is 10-dimensional

– If for example n = 10, the vector is 66-dimensional


• In the case of polynomial decision functions of order m, a typical f_i(x) is given by:

f_i(x) = x_{i1}^{e1} x_{i2}^{e2} ... x_{im}^{em}

where 1 ≤ i1, i2, ..., im ≤ n and e_i, 1 ≤ i ≤ m, is 0 or 1.

– It is a polynomial with a degree between 0 and m. To avoid repetitions, we require i1 ≤ i2 ≤ ... ≤ im

g^m(x) = Σ_{i1=1}^{n} Σ_{i2=i1}^{n} ... Σ_{im=i_{m−1}}^{n} w_{i1 i2 ... im} x_{i1} x_{i2} ... x_{im} + g^{m−1}(x)

(where g^0(x) = w_{n+1}) is the most general polynomial decision function of order m
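A short Python sketch (written for these notes, not part of the original slides) that enumerates the index tuples i1 ≤ i2 ≤ … ≤ im appearing above and counts the weights of g^m(x); it reproduces the 10 free parameters quoted earlier for a quadratic in R³.

from itertools import combinations_with_replacement

def monomials(n, m):
    # All index tuples (i1, ..., ik) with i1 <= ... <= ik, for every order k = 0..m
    # (k = 0 is the single constant term w_{n+1})
    terms = []
    for k in range(m + 1):
        terms.extend(combinations_with_replacement(range(1, n + 1), k))
    return terms

print(len(monomials(3, 2)))   # quadratic decision function in R^3
print(len(monomials(10, 2)))  # quadratic in R^10
print(len(monomials(2, 3)))   # cubic in R^2, as in Example 2 below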


Example 1: Let n = 3 and m = 2, then:

g²(x) = Σ_{i1=1}^{3} Σ_{i2=i1}^{3} w_{i1 i2} x_{i1} x_{i2} + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4

      = w_11 x_1² + w_12 x_1 x_2 + w_13 x_1 x_3 + w_22 x_2² + w_23 x_2 x_3 + w_33 x_3² + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4

Example 2: Let n = 2 and m = 3, then:

g³(x) = Σ_{i1=1}^{2} Σ_{i2=i1}^{2} Σ_{i3=i2}^{2} w_{i1 i2 i3} x_{i1} x_{i2} x_{i3} + g²(x)

      = w_111 x_1³ + w_112 x_1² x_2 + w_122 x_1 x_2² + w_222 x_2³ + g²(x)

where

g²(x) = Σ_{i1=1}^{2} Σ_{i2=i1}^{2} w_{i1 i2} x_{i1} x_{i2} + g¹(x)

      = w_11 x_1² + w_12 x_1 x_2 + w_22 x_2² + w_1 x_1 + w_2 x_2 + w_3


– The commonly used quadratic decision function can be represented as the general n-dimensional quadratic surface:

g(x) = x^T A x + x^T b + c

where the matrix A = (a_ij), the vector b = (b_1, b_2, …, b_n)^T and c depend on the weights w_ii, w_ij, w_i of equation (2)

– If A is positive definite, then the decision function is a hyperellipsoid with axes in the directions of the eigenvectors of A

• In particular: if A = I_n (identity), the decision function is simply the n-dimensional hypersphere


• If A is negative definite, the decision function describes a hyperhyperboloid

• In conclusion: it is only the matrix A which determines the shape and characteristics of the decision function
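A small numerical sketch of this point (the A, b, c values are made up for illustration): the eigenvalues of A reveal the type of surface, and its eigenvectors give the axis directions.

import numpy as np

# Illustrative quadratic surface g(x) = x^T A x + x^T b + c
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])
c = -3.0

def g(x):
    return x @ A @ x + x @ b + c

eigvals, eigvecs = np.linalg.eigh(A)   # A is symmetric
print(g(np.array([1.0, 1.0])))
print(eigvals)    # all positive here, so A is positive definite (hyperellipsoid case)
print(eigvecs)    # columns point along the axes of the ellipsoid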


Problem: Consider a 3-dimensional space and cubic polynomial decision functions

1. How many terms are needed to represent a decision function if only cubic and linear functions are assumed?

2. Present the general 4th-order polynomial decision function for a 2-dimensional pattern space

3. Let R³ be the original pattern space and let the decision function associated with the pattern classes ω1 and ω2 be:

g(x) = 2x_1² + x_3² + x_2 x_3 + 4x_1 + 2x_2 + 1

for which g(x) > 0 if x ∈ ω1 and g(x) < 0 if x ∈ ω2

a) Rewrite g(x) as g(x) = x^T A x + x^T b + c

b) Determine the class of each of the following pattern vectors: (1,1,1), (1,10,0), (0,1/2,0)


• Positive Definite Matrices

1. A square matrix A is positive definite if x T Ax>0 for all nonzero column vectors x.

2. It is negative definite if x T Ax < 0 for all nonzero x.

3. It is positive semi-definite if x^T Ax ≥ 0 for all x.

4. And negative semi-definite if x^T Ax ≤ 0 for all x.

These definitions are hard to check directly and you might as well forget them for all practical purposes.


More useful in practice are the following properties, which hold when the matrix A is symmetric and which are easier to check.

The i-th principal minor of A is the matrix A_i formed by the first i rows and columns of A. So, the first principal minor of A is the matrix A_1 = (a_11), the second principal minor is the matrix

A_2 = [a_11  a_12; a_21  a_22],

and so on.


– The matrix A is positive definite if all its principal minors A_1, A_2, …, A_n have strictly positive determinants

– If these determinants are non-zero and alternate in sign, starting with det(A_1) < 0, then the matrix A is negative definite

– If the determinants are all non-negative, then the matrix is positive semi-definite

– If the determinants alternate in sign, starting with det(A_1) ≤ 0, then the matrix is negative semi-definite
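These rules translate directly into a short test. The sketch below (NumPy, written for these notes, not part of the original slides) computes the determinants of the leading principal minors and applies the four rules; it is shown on two of the matrices from Exercise 1 further below.

import numpy as np

def definiteness(A):
    # Determinants of the principal minors A_1, ..., A_n (A assumed symmetric)
    n = A.shape[0]
    d = [np.linalg.det(A[:i, :i]) for i in range(1, n + 1)]
    if all(x > 0 for x in d):
        return "positive definite"
    if all(x != 0 for x in d) and all((x < 0) == (i % 2 == 0) for i, x in enumerate(d)):
        return "negative definite"        # non-zero, alternating signs, det(A_1) < 0
    if all(x >= 0 for x in d):
        return "positive semi-definite"
    if all((x <= 0) if i % 2 == 0 else (x >= 0) for i, x in enumerate(d)):
        return "negative semi-definite"   # alternating signs, det(A_1) <= 0
    return "none of the above"

print(definiteness(np.array([[2.0, 1.0], [1.0, 4.0]])))   # matrix (a) of Exercise 1
print(definiteness(np.array([[2.0, 4.0], [4.0, 3.0]])))   # matrix (d) of Exercise 1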


To fix ideas, consider a 2x2 symmetric matrix:

A = [a_11  a_12; a_21  a_22]

• It is positive definite if:
  a) det(A_1) = a_11 > 0
  b) det(A_2) = a_11 a_22 − a_12 a_12 > 0

• It is negative definite if:
  a) det(A_1) = a_11 < 0
  b) det(A_2) = a_11 a_22 − a_12 a_12 > 0

• It is positive semi-definite if:
  a) det(A_1) = a_11 ≥ 0
  b) det(A_2) = a_11 a_22 − a_12 a_12 ≥ 0

• And it is negative semi-definite if:
  a) det(A_1) = a_11 ≤ 0
  b) det(A_2) = a_11 a_22 − a_12 a_12 ≥ 0


Exercise 1:

Check whether the following matrices are positive definite, negative definite, positive semi-definite, negative semidefinite or none of the above.

(a) A = [2  1; 1  4]

(b) A = [−2  4; 4  −8]

(c) A = [−2  2; 2  −4]

(d) A = [2  4; 4  3]


Solutions of Exercise 1:

• (a) A_1 = 2 > 0, A_2 = 8 − 1 = 7 > 0 ⇒ A is positive definite

• (b) A_1 = −2 < 0, A_2 = (−2 × −8) − 16 = 0 ⇒ A is negative semi-definite

• (c) A_1 = −2 < 0, A_2 = 8 − 4 = 4 > 0 ⇒ A is negative definite

• (d) A_1 = 2 > 0, A_2 = 6 − 16 = −10 < 0 ⇒ A is none of the above


Exercise 2:

Let A = [2  1; 1  4]

1. Compute the decision boundary assigned to the matrix A (g(x) = x^T A x + x^T b + c) in the case where b^T = (1, 2) and c = −3

2. Solve det(A − λI) = 0 and find the shape and the characteristics of the decision boundary separating the two classes ω1 and ω2

3. Classify the following points:
   x^T = (0, −1)
   x^T = (1, 1)


Solution of Exercise 2:

1. g(x) = (x_1, x_2) [2  1; 1  4] (x_1, x_2)^T + (x_1, x_2) (1, 2)^T − 3

       = (2x_1 + x_2, x_1 + 4x_2) (x_1, x_2)^T + x_1 + 2x_2 − 3

       = 2x_1² + x_1 x_2 + x_1 x_2 + 4x_2² + x_1 + 2x_2 − 3

       = 2x_1² + 2x_1 x_2 + 4x_2² + x_1 + 2x_2 − 3

2. det(A − λI) = (2 − λ)(4 − λ) − 1 = λ² − 6λ + 7 = 0, which gives λ_1 = 3 − √2 and λ_2 = 3 + √2.

For λ_1 = 3 − √2, using

[2 − λ_1   1; 1   4 − λ_1] (x_1, x_2)^T = 0,

we obtain:

(√2 − 1) x_1 + x_2 = 0

This latter equation is a straight line collinear to the vector V_1 = (1, 1 − √2)^T


For λ_2 = 3 + √2, using

[2 − λ_2   1; 1   4 − λ_2] (x_1, x_2)^T = 0,

we obtain:

(1 + √2) x_1 − x_2 = 0

This latter equation is a straight line collinear to the vector V_2 = (1, 1 + √2)^T

Since both eigenvalues are positive, the decision boundary is an ellipse whose two axes are respectively collinear to the vectors V_1 and V_2

3. x^T = (0, −1):  g(0, −1) = −1 < 0  ⇒  x ∈ ω2

   x^T = (1, 1):  g(1, 1) = 8 > 0  ⇒  x ∈ ω1
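A short numerical cross-check of this solution (not part of the original slides): NumPy's eigendecomposition recovers λ_1, λ_2 with eigenvectors collinear to V_1 and V_2, and evaluating g at the two test points reproduces the classifications.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0])
c = -3.0

def g(x):
    return x @ A @ x + x @ b + c

eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)    # approximately 3 - sqrt(2) and 3 + sqrt(2)
print(eigvecs)    # columns collinear with V1 = (1, 1 - sqrt(2)) and V2 = (1, 1 + sqrt(2))

print(g(np.array([0.0, -1.0])))   # -1 < 0  ->  omega_2
print(g(np.array([1.0,  1.0])))   #  8 > 0  ->  omega_1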
