VERSATILE PATTERN RECOGNITION SYSTEM BASED ON FISHER CRITERION
Maciej Smiatacz, Witold Malina
Faculty of Electronics, Telecommunications and Informatics
Gdańsk University of Technology
ul. G. Narutowicza 11/12, 80-952 Gdańsk
ABSTRACT
In this work we present a complete pattern recognition system that can be used for the classification of arbitrary digital images (bitmaps). The feature extraction algorithm that we implemented is universal, so different applications (e.g. character or face recognition) do not require any modifications of the system architecture. The system is entirely based on the Fisher linear classifier.
INTRODUCTION
Discriminant analysis [1] is well known and widely used in many fields of pattern recognition. Generally, it helps us to determine which variables discriminate between two or more naturally occurring groups, so it can be treated as a feature selection technique. In practice, however, this approach is represented mostly by the Fisher criterion, which can be applied directly to pattern classification. The idea of the Fisher classifier is to find a vector d such that the patterns belonging to opposite classes are optimally separated after being projected onto d. The basic form of the Fisher criterion applies to the two-class case and can be expressed with the following formula [2]:
$$F(\mathbf{d}) = \frac{\mathbf{d}^T \mathbf{B}\,\mathbf{d}}{\mathbf{d}^T \boldsymbol{\Sigma}\,\mathbf{d}} \qquad (1)$$

where
$\mathbf{B}$ – between-class scatter matrix, $\mathbf{B} = \boldsymbol{\Delta}\boldsymbol{\Delta}^T$, $\boldsymbol{\Delta} = \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2$,
$\boldsymbol{\mu}_i$ – mean vector of class $c_i$, $i = 1, 2$,
$\boldsymbol{\Sigma} = P(c_1)\boldsymbol{\Sigma}_1 + P(c_2)\boldsymbol{\Sigma}_2$,
$\boldsymbol{\Sigma}_i$ – covariance matrix of class $c_i$,
$P(c_i)$ – a priori probability of class $c_i$.
The optimal Fisher discriminant vector $\mathbf{d}_{opt}$ maximises the value of F. To find it we calculate the first derivative of F and solve the equation F′(d) = 0, which leads to the straightforward solution
$$\mathbf{d}_{opt} = \boldsymbol{\Sigma}^{-1}\boldsymbol{\Delta} \qquad (2)$$
In order to perform the classification we project the conditional densities $p(\mathbf{y}\,|\,c_i)$ of each class onto $\mathbf{d}_{opt}$ and find the point $\alpha$ where $p(\mathbf{d}_{opt}^T\mathbf{y}\,|\,c_1) = p(\mathbf{d}_{opt}^T\mathbf{y}\,|\,c_2)$. Then the decision rule for an unknown pattern $\mathbf{y}$ becomes very simple:

$$\mathbf{y} \in c_1 \iff \mathbf{d}_{opt}^T\mathbf{y} \le \alpha, \qquad \mathbf{y} \in c_2 \iff \mathbf{d}_{opt}^T\mathbf{y} > \alpha \qquad (3)$$
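To make the two-class procedure concrete, the following NumPy sketch trains a Fisher classifier according to Eqs. (1)-(3). It is a minimal illustration, not the original implementation: it assumes equal a priori probabilities and, instead of locating the crossing point of the projected densities, places the threshold α at the midpoint of the projected class means; all function names are ours.

```python
import numpy as np

def fisher_train(X1, X2):
    """Two-class Fisher training, Eqs. (1)-(2).

    X1, X2: (n_samples, N) arrays of training vectors for classes c1, c2.
    Equal a priori probabilities P(c1) = P(c2) = 1/2 are assumed here.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled covariance Sigma = P(c1)Sigma1 + P(c2)Sigma2; it must be
    # non-singular, which requires enough linearly independent samples.
    sigma = 0.5 * (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False))
    d = np.linalg.solve(sigma, mu1 - mu2)   # Eq. (2): d = Sigma^-1 Delta
    m1, m2 = d @ mu1, d @ mu2
    if m1 > m2:                             # orient d so that c1 falls on
        d, m1, m2 = -d, -m1, -m2            # the "<= alpha" side of rule (3)
    alpha = 0.5 * (m1 + m2)                 # midpoint used as a stand-in for
    return d, alpha                         # the density-crossing point

def fisher_classify(y, d, alpha):
    """Decision rule (3): class 1 if the projection is below alpha."""
    return 1 if d @ y <= alpha else 2
```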
As we can see, the Fisher criterion is easy to apply and provides linear solutions. This is why it has been employed in many pattern recognition systems (e.g. [3, 4]) and improved versions of it have been proposed [5, 6]. Nevertheless, its obvious drawback is the necessity to invert the covariance matrix Σ. Consequently, if every pattern consists of N elements (N feature values), we have to collect at least N + 1 linearly independent training samples to make Σ non-singular.
In practical applications the value of N can be quite large, especially if we want to use the intensity of each pixel in a digital image as a separate feature. In this case the arrays of pixel values (the bitmaps) must first be converted to vectors. This means concatenating the columns of each array, so that an original N×N image becomes a vector with N² elements.
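As an aside, this column concatenation is a one-liner in, e.g., NumPy (the array here is just an illustration):

```python
import numpy as np

img = np.random.rand(128, 128)    # an N x N bitmap, N = 128
vec = img.flatten(order="F")      # stack columns -> vector of N^2 = 16384 elements
```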
Treating each pixel intensity as a feature value seems attractive because it allows us to classify the bitmaps directly and eliminates the need for a specialised feature extraction unit. This makes the recognition system versatile: its application depends only on the contents of the training set. Unfortunately, if the original image resolution is 128×128 pixels, each vector has 16384 elements and the covariance matrix has dimensions 16384×16384, i.e. 268435456 elements. If we assume that the elements of the covariance matrix are represented by double precision (8-byte) values, the total memory necessary to store the matrix is 2 gigabytes. It is still difficult to operate efficiently on such large matrices and in practice it is impossible to invert them.
In order to overcome the problems mentioned above we proposed a two-stage Fisher classifier [7]. It uses matrices as the input data structure and does not require column concatenation. As a result, the covariance matrix has the same dimensions as the original image. In our previous publication [7] we presented preliminary results, but our experiments were limited to two-class problems only. In the following sections we describe the complete pattern recognition system based on the two-stage Fisher classifier.
SYSTEM ARCHITECTURE
One of the simplest ways to build a multi-class system from a two-class algorithm is to construct a sequential classifier that separates only one class from the others at a time. The class chosen at the current stage is not taken into consideration in the following steps, and the training process ends when the last two classes get separated. This way we reduce a multi-class problem to a set of two-class decisions. The concept is simple, but in addition to the classification method we have to define a measure describing the quality of discrimination, i.e. the separability of the classes. In the case of the Fisher criterion the quality of classification is expressed by means of the so-called Fisher distance
$$D_F(1, 2) = \frac{(m_1 - m_2)^2}{\sigma_1^2 + \sigma_2^2} \qquad (4)$$

where $m_i = \mathbf{d}^T\boldsymbol{\mu}_i$ and $\sigma_i^2 = \mathbf{d}^T\boldsymbol{\Sigma}_i\,\mathbf{d}$.
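Eq. (4) translates directly into code; a minimal NumPy transcription (argument names are ours):

```python
import numpy as np

def fisher_distance(d, mu1, mu2, sigma1, sigma2):
    """Fisher distance D_F(1, 2) of Eq. (4) measured along direction d."""
    m1, m2 = d @ mu1, d @ mu2    # projected class means m_i = d^T mu_i
    s1 = d @ sigma1 @ d          # projected variances sigma_i^2 = d^T Sigma_i d
    s2 = d @ sigma2 @ d
    return (m1 - m2) ** 2 / (s1 + s2)
```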
Thanks to the above formula the implementation of the sequential Fisher classifier is fairly easy. The basic block diagram of our training procedure is shown in Fig. 1.
Fig. 1. Block diagram of the training procedure in sequential pattern recognition system
The core of the system is encapsulated in the module that selects the most distinctive class, i.e. the class that can be separated from the others with the lowest possible error. Let us assume that we have already separated n classes, so that L − n are left (L is the total number of classes). In order to separate the next class we must carry out the following steps (a sketch of this selection loop follows the list):
1. choose a class index i from the set Ln containing the indices of the classes that have not been separated yet,
2. construct a new class cx that includes the patterns from all the classes indicated by Ln except ci,
3. build the two-stage Fisher classifier to discriminate ci and cx,
4. calculate the Fisher distance DF(i, x) (4) for ci and cx,
5. repeat steps 1 to 4 for all the class indices contained in Ln,
6. select the class ci for which DF(i, x) is the highest.
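The selection procedure can be summarised in a few lines of Python; here `train_two_stage` is a hypothetical stand-in for the two-stage training of step 3, assumed to return a classifier object that reports its Fisher distance (4):

```python
def select_most_distinctive(remaining, samples, train_two_stage):
    """One round of steps 1-6: pick the class in L_n that can be
    separated best from the union of all the other remaining classes.

    remaining -- set L_n of indices of not-yet-separated classes
    samples   -- dict: class index -> list of training patterns
    """
    best = None
    for i in remaining:                                 # step 1
        rest = [p for j in remaining if j != i          # step 2: build c_x
                for p in samples[j]]
        clf = train_two_stage(samples[i], rest)         # step 3
        dist = clf.fisher_distance                      # step 4, Eq. (4)
        if best is None or dist > best[1]:              # steps 5-6
            best = (i, dist, clf)
    return best                                         # (index, D_F, classifier)
```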
We would like to point out that the classifier mentioned in step 3 is a two-stage classifier. At the first stage it creates the mean matrices $\bar{\mathbf{A}}^{(i)}$ and $\bar{\mathbf{A}}^{(x)}$ of both classes and then calculates the optimal vector $\mathbf{d}_M$ that ensures maximal distance between $\bar{\mathbf{A}}^{(i)}$ and $\bar{\mathbf{A}}^{(x)}$ projected onto it. Thus the projection of the training images $\mathbf{A}_k^{(i)}$ onto $\mathbf{d}_M$ can be treated as a sort of "discriminant feature" extraction process

$$\mathbf{y}_k^{(i)} = \left(\mathbf{A}_k^{(i)}\right)^T \mathbf{d}_M \qquad (5)$$

where $\mathbf{y}_k^{(i)}$ is the "discriminant feature" vector describing the k-th image of the i-th class.

Having calculated the $\mathbf{y}_k^{(i)}$ feature vectors, we can use the standard form of the Fisher criterion (1) to create the classifier and find the discriminant vector d.
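In code, the first-stage projection of Eq. (5) is a single matrix-vector product per image (a sketch only; the computation of $\mathbf{d}_M$ itself follows [7] and is not reproduced here):

```python
def extract_features(images, d_M):
    """Eq. (5): y_k = A_k^T d_M for each N x N image A_k of one class.

    images -- list of N x N NumPy matrices A_k
    d_M    -- first-stage discriminant vector of length N
    Returns the length-N "discriminant feature" vectors y_k that feed
    the standard Fisher criterion (1) at the second stage.
    """
    return [A.T @ d_M for A in images]
```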
The final result of the training process is a decision tree that serves as the basis of the classification algorithm. It is a binary tree with at least one leaf attached to each node (Fig. 2). The classification is performed by checking the decision rules (3) stored in the subsequent nodes until a leaf is reached. Each step involves two projections: the first produces the "discriminant feature" vector y (5) and the second is necessary to calculate $\mathbf{d}_{opt}^T\mathbf{y}$ (3). The decision tree always contains L − 1 nodes, so in the worst case we have to make 2(L − 1) projections. Because the dimensions of the matrices and vectors involved in these projections are relatively small, for a reasonable number of classes the classification takes very little time.
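Classification is then a walk down the tree; the node layout below is our assumption for illustration, not the structure of the original C++ implementation:

```python
def classify_tree(node, A):
    """Apply rule (3) at each node until a leaf (class label) is reached.

    Each inner node is assumed to hold d_M, d_opt, alpha and its two
    children; each step performs the two projections described above.
    """
    while not node.is_leaf:
        y = A.T @ node.d_M                 # first projection, Eq. (5)
        node = node.left if node.d_opt @ y <= node.alpha else node.right
    return node.label                      # leaf stores the class label
```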
Fig. 2. An example of the decision tree produced by the sequential classifier
EXPERIMENTAL RESULTS
The system described above was implemented as a C++ Windows application. Our experiments were carried out on a standard PC equipped with an Intel Pentium IV 1.5 GHz processor; the code was not optimised for maximum performance. We tested our sequential classifier on the well-known NIST database containing binary images of handwritten digits. These images are normalised and their resolution is 32×32 pixels (Fig. 3).
Fig. 3. Some of the training examples used in experiments
Our training set consisted of 10 classes (digits from 0 to 9) and each class was represented by 40 images. The testing set was created from 400 images not included in the training set. Figure 4 illustrates the classifiers obtained for each node of the decision tree created as a result of the training procedure. The classes were separated in the following sequence: 6/-, 0/-, 4/-, 1/-, 7/-, 9/-, 3/-, 5/-, 2/8 (Fig. 2).
In order to compare our two-stage approach with a standard method, we ran the same test using a sequential classifier based on the typical Fisher algorithm (1). In this case, however, we had to replace the covariance matrix Σ with the identity matrix I. This was necessary because the 32×32 images formed feature vectors containing 1024 elements, so the resulting covariance matrix was singular. If we wanted to use formula (2), we would have to collect at least 1024 training images for every class, which is practically impossible.
We carried out two experiments using the data described above. In the second one the original testing set was used for training and vice versa. The results are summarised in Table I. It is noticeable that our two-stage classifier outperforms the traditional Fisher criterion in terms of speed. The recognition rates are similar and satisfactory for both methods. Two examples of wrong decisions made by the two-stage algorithm are presented in Fig. 5.
Fig. 4. The classifiers obtained for the subsequent nodes of the decision tree (two-stage method, 1st experiment); panels a)–i) show the classes in the order of separation: "6", "0", "4", "1", "7", "9", "3", "5", "2"
Fig. 5. Two examples of mistakes made by the two-stage classifier
Obviously, much more accurate results could be achieved if a better version of the multi-class algorithm were used; for example, a highly effective sequential classifier was proposed in [8]. The aim of this work, however, was only to prove that the two-stage Fisher classifier can be successfully applied to realistic multi-class recognition problems. It is evident that the system presented here could be further optimised to produce better results.
Table I. Experimental results

                          Two-stage classifier    Standard Fisher criterion (1)
                                                  with identity matrix
Recognition rates
  1st experiment
    Training set          87%                     84%
    Testing set           73%                     77%
    Average               80%                     80.5%
  2nd experiment
    Training set          88%                     84%
    Testing set           77%                     81%
    Average               82.5%                   82.5%
Time
  Training                392 s                   1041 s
  Testing                 5 s                     50 s
CONCLUSION
The results listed in Table I indicate that the training of the new classifier is about three times faster than training with the standard Fisher criterion, and that the recognition time is reduced from 50 s to 5 s. After projection onto $\mathbf{d}_{opt}$ the patterns are closely packed and their distribution is almost Gaussian (Fig. 4), so the use of the Fisher criterion is fully justified.
Our experiments showed that the two-stage Fisher classifier can be successfully used as an efficient basis for a versatile pattern recognition system. We are aware of the drawbacks of the approach presented in this paper, so in our future work we will concentrate on developing a more effective implementation of the proposed classifier.
LITERATURE
[1] G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, John Wiley & Sons, Inc., 1992.
[2] J. Sammon, An Optimal Discriminant Plane, IEEE Transactions on Computers, vol. C-19, pp. 826-829, 1970.
[3] Ch. Liu, H. Wechsler, A Shape- and Texture-Based Enhanced Fisher Classifier for Face Recognition, IEEE Transactions on Image Processing, vol. 10, no. 4, 2001.
[4] P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, 1997.
[5] W. Malina, On an Extended Fisher Criterion for Feature Selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 3, no. 5, pp. 611-614, 1981.
[6] T. Okada, S. Tomita, An Extended Fisher Criterion for Feature Extraction – Malina's Method and Its Problems, Electronics and Communications in Japan, vol. 67-A, no. 6, pp. 10-17, 1984.
[7] M. Smiatacz, W. Malina, Modifying the Input Data Structure for Fisher Classifier, 2nd Conference on Computer Recognition Systems (KOSYR'2001), pp. 363-367, 2001.
[8] A. Kołakowska, W. Malina, Application of Fisher Sequential Classifier to Digit Recognition, Proceedings of the Sixth International Conference on Pattern Recognition and Information Processing (PRIP'2001), pp. 213-217, 2001.