ATTRIBUTE SELECTION THROUGH DECISION RULES CONSTRUCTION
(Algorithm FRiS-GRAD) 1
N. Zagoruiko 2, I. Borisova 2, V. Dyubanov 2, O. Kutnenko 2
2 Sobolev Institute of Mathematics, SB RAS
zag@math.nsc.ru, biamia@mail.ru, vladimir.dyubanov@gmail.com, olga@math.nsc.ru
This paper presents FRiS-GRAD, a new algorithm for simultaneous selection of informative attributes and construction of a decision rule. The algorithm is based on the Function of Rival Similarity (FRiS-function), which measures the similarity of objects to the standards of rival patterns. Standards, or representatives, are the objects of the training dataset that are most similar to all other objects of their patterns. The quality of attribute subsystems is estimated by the average value of the FRiS-function over all training objects. Results of applying algorithm FRiS-GRAD to real physical and genetic tasks prove its usefulness and efficiency.
Introduction
The main ideas of using the function of rival similarity (FRiS-function) for constructing pattern recognition algorithms were presented at the PRIA-8 conference and published in [1]. In this paper we present the results of our latest research concerning the problems of attribute selection and of estimating the suitability of selected attribute systems.
There are three complex questions that need to be answered in attribute selection:
1. How should search through the large
number of variants of attribute subsystems be
organized?
2. How should the quality (informativeness)
of attribute subsystem be calculated?
3. Is the subsystem found on the training dataset suitable for recognition of control objects, or is it accidental and not fit for further use?
We answer all these questions with the help of the Function of Rival Similarity. Let us briefly recall its definition.
Human perception of similarity is relative in nature and depends on the peculiarities of a competitive environment. To estimate the measure of similarity of an object Z with rival patterns A and B on a ratio scale we use the following values:
FA/B = (rB - rA)/(rA + rB) - similarity of Z with A,
FB/A = (rA - rB)/(rA + rB) - similarity of Z with B.
Here rA and rB are the distances from Z to patterns A and B, respectively.
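For illustration, here is a minimal sketch of this definition in Python (not the authors' code; Euclidean distance and a single representative point per pattern are assumed, and the name fris is ours):

```python
import numpy as np

def fris(z, stolp_a, stolp_b):
    """Rival similarity F_{A/B} of object z with pattern A against rival B.

    Returns a value in [-1, 1]: close to +1 when z lies near the
    representative of A, close to -1 when it lies near that of B,
    and 0 when the two distances are equal.
    """
    r_a = np.linalg.norm(z - stolp_a)  # distance r_A from z to pattern A
    r_b = np.linalg.norm(z - stolp_b)  # distance r_B from z to pattern B
    return (r_b - r_a) / (r_a + r_b)
```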
1 This work was supported by the Russian Foundation for Basic Research (RFBR), grant No. 08-01-00040.
The algorithms used in algorithm FRiS-GRAD
Algorithm FRiS-Stolp [1] is used for building decision rules. As a result of its work, a subset of standard objects (stolps) is selected for every pattern in the dataset. This subset is a compact description of the whole dataset by a small number of typical objects and can be used for classifying (recognizing) control objects. The algorithm is designed to select the minimal number of stolps that protect all training samples from incorrect classification during cross-validation.
The average value of rival similarity Fs over all objects of the training sample not selected as stolps is used as a measure of training quality. At the same time, this value estimates the informativeness of the attribute system in which algorithm FRiS-Stolp was run.
To classify a new object Z, the two nearest stolps of rival patterns are found. Object Z is classified as an object of the pattern whose stolp yields the maximum rival similarity F.
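A sketch of this classification step follows, assuming Euclidean distances, at least two patterns, and stolps already chosen by FRiS-Stolp (the helper and its names are our illustration, not the authors' code):

```python
import numpy as np

def classify_by_stolps(z, stolps, labels):
    """Assign z to the pattern whose nearest stolp wins by rival similarity.

    stolps: (n, d) array of stolp coordinates; labels: pattern label of
    each stolp. For every pattern we keep the distance to its nearest
    stolp, then score each pattern against its nearest rival.
    """
    dists = np.linalg.norm(stolps - z, axis=1)
    nearest = {}  # pattern -> distance to its nearest stolp
    for d, lab in zip(dists, labels):
        if lab not in nearest or d < nearest[lab]:
            nearest[lab] = d
    scores = {}
    for lab, r_own in nearest.items():
        r_rival = min(d for l, d in nearest.items() if l != lab)
        scores[lab] = (r_rival - r_own) / (r_own + r_rival)
    return max(scores, key=scores.get)  # pattern with maximal F
```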
To select the most informative attribute subsystem we use the main ideas of the basic greedy approaches (forward and backward searches). Forward selection (algorithm Addition) tries to increase the attribute subsystem quality Q as much as possible with each inclusion of attributes, and backward elimination (algorithm Deletion) tries to achieve the same with each deletion of attributes.
Algorithm AdDel uses the following combination of these two approaches: first, n1 informative attributes are selected by method Add. Then the n2 worst of them (n2 < n1) are eliminated by method Del, so after these two steps the selected subset contains (n1 - n2) attributes. This sequence of actions (algorithms Add and Del) is repeated until the quality of the selected attributes reaches its maximum.
In algorithm GRAD [2] we add and eliminate
“granules” (sets consisting of several
attributes) instead of single attributes.
Granules are formed from attributes with high
individual informativeness.
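The following sketch shows the Add-Del scheme lifted to granules, as we read it from the description above; quality stands for the FRiS-quality Fs of a candidate subset, and the names, default parameters, and stopping details are our assumptions:

```python
def addel(granules, quality, n1=5, n2=3):
    """Alternate Add (include n1 granules) and Del (drop the n2 worst)
    until the quality of the selected subset stops improving."""
    selected, best_q, best_sel = [], float("-inf"), []
    while True:
        # Add: greedily include n1 granules maximizing subset quality
        for _ in range(n1):
            remaining = [g for g in granules if g not in selected]
            if not remaining:
                break
            selected.append(max(remaining,
                                key=lambda g: quality(selected + [g])))
        # Del: drop the n2 granules whose removal hurts quality the least
        for _ in range(n2):
            selected.remove(max(selected,
                                key=lambda g: quality([s for s in selected
                                                       if s is not g])))
        q = quality(selected)
        if q <= best_q:
            return best_sel, best_q  # quality stopped improving
        best_q, best_sel = q, list(selected)
```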
Algorithm FRiS-GRAD
To find the best subsystem of attributes we use the directed search procedure offered in algorithm GRAD. At each step, some variant of an attribute subsystem is formed, and then algorithm FRiS-Stolp is started to construct the set of stolps and to calculate the training quality Fs in this subsystem. The value Fs is then used as a measure of the informativeness of the subsystem.
As a result, algorithm FRiS-GRAD selects the most informative attribute subspace and gives us the rule for classifying new objects in this subspace.
The output of FRiS-GRAD contains descriptions of the required number of informative subsystems together with their FRiS-quality values Fs. Every subsystem i with quality Fs(i) provides its own point of view on the training dataset. Because of this, the result of classifying a new object Z can change from subsystem to subsystem and is characterized by different values of the FRiS-function. To make a collective decision, we estimate the values of rival similarity Fj(i) of the recognized object Z with pattern j in informative subspace i. Z is classified as an object of the pattern Sj with the maximum value Fs(i)*Fj(i).
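A sketch of this collective decision, assuming each subsystem is represented by its quality Fs(i) and a function returning the rival similarities Fj(i) of Z with each pattern j (this interface is our assumption):

```python
def collective_decision(z, subsystems):
    """subsystems: list of (fs_i, similarities_i) pairs, where fs_i is
    the FRiS-quality of subsystem i and similarities_i(z) returns a
    dict {pattern j: F_j(i)}. Z goes to the pattern maximizing
    Fs(i) * F_j(i) over all subsystems and patterns."""
    best_label, best_score = None, float("-inf")
    for fs_i, similarities_i in subsystems:
        for label, f_ji in similarities_i(z).items():
            if fs_i * f_ji > best_score:
                best_label, best_score = label, fs_i * f_ji
    return best_label
```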
The same attribute can occur in different granules forming the selected informative subsystem. We assume that such an attribute is more important than attributes contained in a single granule. To take this into account, we assign each informative attribute a weight proportional to the
number of granules in the selected subsystem that include this attribute. Thus algorithm FRiS-GRAD determines the optimal number of attributes in the informative subsystem, selects the required number of most informative subsystems, estimates the significance of each attribute (calculates weights), and allows making a collective decision for the pattern recognition task.
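The weighting itself reduces to counting granule membership; a minimal sketch (the normalization to a unit sum is our choice):

```python
from collections import Counter

def attribute_weights(granules):
    """Weight of each attribute, proportional to the number of granules
    of the selected subsystem that contain it."""
    counts = Counter(a for g in granules for a in g)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

# e.g. attribute_weights([(3, 7), (7, 12), (7,)]) gives attribute 7
# weight 0.6, since it occurs in all three granules.
```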
Informativeness and suitability
In the majority of existing methods, the share U of training objects correctly recognized by the kNN rule, estimated by cross-validation, is used as the attribute subsystem quality.
Another criterion, which presumably allows selecting informative subsystems of attributes, is based on Fisher's idea of estimating informativeness through the ratio Q = |μ1 - μ2|/(δ1 + δ2), where μ1, μ2 are the sample means of the patterns and δ1, δ2 are their variances.
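A one-attribute sketch of this criterion, under our reading of the garbled original formula as Q = |μ1 - μ2|/(δ1 + δ2):

```python
import numpy as np

def fisher_quality(x_a, x_b):
    """Fisher-type informativeness of a single attribute: distance
    between class means relative to the sum of class variances."""
    return abs(np.mean(x_a) - np.mean(x_b)) / (np.var(x_a) + np.var(x_b))
```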
The next method is described in the family of training-sample algorithms entitled RELIEF [4]. For each object a, the value W(a) = (r2 - r1)/(rmax - rmin) is calculated. Here r1(a) is the distance from a to the nearest object of "one's own" pattern, and r2(a) to the nearest object of a "competitor" pattern. The average value of W is used as a measure of the informativeness of the attribute system.
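A sketch of this averaged RELIEF-style weight (Euclidean distances; taking rmax and rmin over all pairwise distances is our assumption):

```python
import numpy as np

def relief_weight(X, y):
    """Average W(a) = (r2 - r1)/(rmax - rmin) over all objects a, where
    r1 is the distance to the nearest object of a's own pattern and r2
    to the nearest object of a rival pattern."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)            # exclude the object itself
    finite = D[np.isfinite(D)]
    rmax, rmin = finite.max(), finite.min()
    w = [(D[i, y != y[i]].min() - D[i, y == y[i]].min()) / (rmax - rmin)
         for i in range(len(X))]
    return float(np.mean(w))
```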
To estimate the FRiS-informativeness of some subspace of attributes, we can use a simplified version of the FRiS-criterion Fs, calculated under the assumption that all objects of the training dataset are stolps of their patterns. This criterion, F1s, is used in FRiS-GRAD at the stage of granule formation to speed up the algorithm.
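Under that assumption, F1s is simply the average rival similarity with every training object acting as a stolp; a sketch (Euclidean distance assumed):

```python
import numpy as np

def f1s(X, y):
    """Simplified FRiS-criterion: every training object is treated as a
    stolp of its pattern; returns the average rival similarity."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    vals = []
    for i in range(len(X)):
        r_own = D[i, y == y[i]].min()      # nearest stolp of own pattern
        r_rival = D[i, y != y[i]].min()    # nearest rival stolp
        vals.append((r_rival - r_own) / (r_own + r_rival))
    return float(np.mean(vals))
```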
In [1] an experiment comparing these four criteria (U, Q, W, and F1s) is described. It was shown that criterion F1s allowed us to estimate the informativeness of attributes more objectively than the others. Informativeness estimates obtained with the most popular criterion, U, appeared to be too optimistic.
If the number of attributes N is larger than the number of objects M in the dataset, the probability of selecting irrelevant attributes because of a random successful combination of values becomes high. A. N. Kolmogorov drew attention to this problem in 1933 [5]. So we should be able to answer the question: is the subsystem found on the training dataset suitable for recognition of control objects, or is it accidental and not fit for further use?
We offer a method to estimate the "non-randomness" of a selected subsystem, realized in algorithm FRiS-Test.
1) On the sample of M training objects we carry out the cross-validation procedure V times: at each step, 10 feature subsets are found on M(V-1)/V randomly chosen objects, and the remaining M/V objects are recognized on them. As a result we estimate the training quality (the average value of the FRiS-function, Fs), the share of correctly recognized objects P, and the similarity Ft of test objects to the nearest stolps. For suitable attribute subsystems the values Fs and Ft are close to each other. The difference between them is designated by the value G = |1 - Fs/Ft|.
2) We randomly mix the values of each attribute in the dataset, breaking all the dependencies between the describing and target attributes. For such a table we repeat the sequence of procedures from step 1, and during cross-validation fix the values Fs*, P*, Ft*, and G* = |1 - Fs*/Ft*|.
Our experiments show that the probability of finding a subsystem with simultaneously high values of Fs* and Ft* on a random dataset is negligible. So, to check an attribute subsystem, we use the following rule: for a suitable attribute system the value Fs is higher than Fs* and G << G*.
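A sketch of the whole FRiS-Test loop; evaluate stands for the cross-validation of step 1 and must return (Fs, Ft) — its existence and signature are our assumptions:

```python
import numpy as np

def fris_test(X, y, evaluate, rng=None):
    """Compare Fs, G on the real table against Fs*, G* on a table with
    each attribute column shuffled independently (all dependencies
    between describing and target attributes destroyed)."""
    rng = np.random.default_rng(0) if rng is None else rng
    fs, ft = evaluate(X, y)
    g = abs(1 - fs / ft)
    X_rand = X.copy()
    for j in range(X.shape[1]):            # mix values of each attribute
        X_rand[:, j] = rng.permutation(X_rand[:, j])
    fs_rand, ft_rand = evaluate(X_rand, y)
    g_rand = abs(1 - fs_rand / ft_rand)
    # suitable subsystem: Fs clearly above Fs*, and G much smaller than G*
    return fs > fs_rand and g < g_rand
```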
Solving real tasks
We used algorithm FRiS-GRAD for solving real physical and genetic tasks.
The dataset analyzed in the physical task consists of spectra of microparticles of finely dispersed substances, obtained with an X-ray analyzer. Each spectrum is characterized by the values of its activity (reflection amplitude) in 981 spectral bands. According to their chemical composition, all analyzed substances are divided into two patterns, A and B. Spectra of microparticles of the same substance can differ from each other because of the heterogeneity of the substance and of the experimental conditions. The training and test datasets each consisted of 55 objects of the two classes.
We tested algorithm FRiS-GRAD on the problem of selecting a small-dimension subset of attributes (spectral bands) simultaneously with decision rule construction. We also used this task to find the best settings of the algorithm.
Recognition quality in the whole attribute space was high enough: P = 105/110, Fs = 0.376, Ft = 0.281, G = 0.338. These results are much better than the results on the random dataset: P* = (45-65)/110, Fs* = 0.298, Ft* = 0.043, and G* = 5.93.
Running algorithm FRiS-GRAD, we selected the 10 most informative attribute subsystems, consisting of from 2 to 5 attributes. All 7 attributes forming these 10 systems came from spectral bands 368 to 374. The value P increased to 107/110. The other training characteristics were the following: Fs = 0.752, Ft = 0.588, and G = 0.279.
In the genetic task, the analyzed dataset consists of a matrix of gene expression vectors obtained from DNA microarrays [6] for a number of patients with two different types of leukemia (ALL and AML). The initial number of genes is N = 7129. The training set consists of 38 samples (27 ALL and 11 AML). The test set has 34 samples (20 ALL and 14 AML).
This task was interesting for us because the results of solving it by different researchers have been published, and we could compare the effectiveness of our algorithm with competitors. In [6] two attributes (genes X95735 and HG1612) are shown which allow recognizing all test objects without any mistakes, but no explanation is given of how to select these two genes using only the training dataset. The best two-dimensional subsystem correctly found in that work by the method SVM in combination with algorithm RFE (a sort of backward elimination) has a misrecognition rate of 12% on the test data. Other attribute subsystems (of 4 and 128 attributes), allowing recognition of 31 and 33 objects respectively, are presented in that work, but no instructions are given on how to select the best one among them without using the test dataset. The program took 3 hours to run.
Our results were the following. Recognition quality in the whole attribute space: P = 23/34 (67.6%), Fs = 0.332, Ft = 0.114, and G = 1.912. For the random dataset the results are quite similar: P* = (16-25)/34 (47.0% - 73.5%), Fs* = 0.310, Ft* = 0.018, and G* = 16.22. This means that recognition on all attributes is nonrandom, but the relative closeness of the FRiS values shows its weakness.
Algorithm FRiS-GRAD selected 10 informative attribute subsystems. The dimension of the selected subsystems varied from 2 to 4 (14 different attributes in all subsystems). The number of correctly recognized test objects varied from 30 to 32. The collective decision gives P = 33/34 (97.1%), Fs = 0.526, Ft = 0.548, and G = 0.040.
If we take into account the hundreds of informative subsystems selected while we tested the settings of our algorithm, the number of informative genes was 58 (out of 7129). Learning on this subset of attributes allowed algorithm FRiS-GRAD to select 10 informative attribute subsystems. The dimension of the selected subsystems varied from 2 to 4. In these subsystems the number of correctly recognized test objects varied from 30 to 33. The collective decision rule recognized all objects correctly, with quality Fs = 0.582, Ft = 0.605, and G = 0.038.
All selected subsystems consist of 14 different genes: 759 [D88422_at], 1778 [M19507_at], 1881 [M27891_at], 2014 [M54995_at], 2232 [M77142_at], 2287 [M84526_at], 2641 [U05259_rna1_at], 3257 [U46751_at], 4049 [X03934_at], 4249 [X53586_rna1_at], 4278 [X55668_at], 4679 [X82240_rna1_s_at], 6846 [X95735_at], 6199 [M28130_rna1_s_at], 6217 [M27783_s_at].
The running time of algorithm FRiS-GRAD on the leukemia task (38 x 7129) was 25 seconds on a single-processor computer with a clock speed of 2.21 GHz.
From these data one can conclude that the leukemia diagnostic results obtained using all 7129 attributes differ little from the results obtained on the random table and do not deserve much trust. The feature subsystems chosen by algorithm FRiS-GRAD can be considered suitable for preliminary
use. However, conclusions about the practical acceptability of these results, obtained on data from so small a number of patients, could be drawn only after their analysis by genetics experts. The "concentrated" subset of attributes found by us can also be of interest for research purposes.
The results on the leukemia task were compared with the published results of other authors and appeared to be competitive. They corroborate that algorithm FRiS-GRAD combines high efficiency with a high rate of work, and that the FRiS-function is usable for solving different Data Mining problems.
References
1. N.G. Zagoruiko, I.A. Borisova, V.V. Dyubanov, and O.A. Kutnenko. Methods of Recognition Based on the Function of Rival Similarity // Pattern Recognition and Image Analysis, 2008, Vol. 18, No. 1, pp. 1-6.
2. N.G. Zagoruiko, O.A. Kutnenko, and A.A. Ptitsin. Algorithm GRAD for Selection of Informative Genetic Features // Proc. Int. Conf. on Computational Molecular Biology, Moscow, 2005, pp. 8-9.
3. N.G. Zagoruiko. Applied Methods of Data and Knowledge Mining // Pub. IM SB RAS, Novosibirsk, 1999, 273 p. [In Russian].
4. K. Kira and L. Rendell. The Feature Selection Problem: Traditional Methods and a New Algorithm // Proc. 10th Nat'l Conf. on Artificial Intelligence (AAAI-92), 1992, pp. 129-134.
5. A.N. Kolmogorov. On the Suitability of Statistically Obtained Forecast Formulas // Zavodskaya Laboratoriya, No. 1, 1933, pp. 164-167 [In Russian].
6. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene Selection for Cancer Classification using Support Vector Machines // Machine Learning, 2002, Vol. 46, No. 1-3, pp. 389-422.