ATTRIBUTE SELECTION THROUGH DECISION RULE CONSTRUCTION (ALGORITHM FRiS-GRAD) 1

N. Zagoruiko, I. Borisova, V. Dyubanov, O. Kutnenko
Sobolev Institute of Mathematics, SB RAS
zag@math.nsc.ru, biamia@mail.ru, vladimir.dyubanov@gmail.com, olga@math.nsc.ru

1 This work was supported by the Russian Foundation for Basic Research, grant No. 08-01-00040.

This paper presents FRiS-GRAD, a new algorithm for simultaneous selection of informative attributes and construction of decision rules. The algorithm is based on the Function of Rival Similarity (FRiS-function), which measures the similarity of objects to the standards of rival patterns. Standards, or representatives, are the objects of the training dataset that are the most similar to all other objects of their patterns. The quality of an attribute subsystem is estimated by the average value of the FRiS-function over all training objects. The results of applying FRiS-GRAD to real physical and genetic tasks prove its usefulness and efficiency.

Introduction

The main ideas of using the function of rival similarity (FRiS-function) for constructing pattern recognition algorithms were presented at the PRIA-8 conference and published in [1]. In this paper we present the results of our recent research on attribute selection and on estimating the suitability of the selected attribute systems. Three difficult questions need to be answered during attribute selection:
1. How should the search through the large number of candidate attribute subsystems be organized?
2. How should the quality (informativeness) of an attribute subsystem be calculated?
3. Is the subsystem found on the training dataset suitable for recognizing control objects, or is it accidental and unfit for further use?
We have answered all these questions with the help of the Function of Rival Similarity. Let us briefly recall its definition. Human perception of similarity is relative in nature and depends on the peculiarities of the competitive environment. To estimate the similarity of an object Z to rival patterns A and B on a ratio scale, we use the following values:

F_{A/B} = (r_B - r_A) / (r_A + r_B) - similarity of Z to A,
F_{B/A} = (r_A - r_B) / (r_A + r_B) - similarity of Z to B,

where r_A and r_B are the distances from Z to patterns A and B, respectively.

The algorithms used in FRiS-GRAD

Algorithm FRiS-Stolp [1] is used for building decision rules. As a result of its work, a subset of standard objects (stolps) is selected for every pattern in the dataset. This subset is a compact description of the whole dataset by a small number of typical objects and can be used for classifying (recognizing) control objects. The algorithm is designed to select the minimal number of stolps that protect all training samples from incorrect classification during cross-validation. The average value Fs of the rival similarity over all training objects not selected as stolps is used as a measure of the training quality. At the same time, this value estimates the informativeness of the attribute system in which algorithm FRiS-Stolp was run. To classify a new object Z, the two nearest stolps of rival patterns are found, and Z is assigned to the pattern whose stolp gives the maximum rival similarity F.

To select the most informative attribute subsystem, we use the main ideas of the basic greedy approaches (forward and backward search). Forward selection (algorithm Addition) tries to increase the quality Q of the attribute subsystem as much as possible with each inclusion of an attribute, while backward elimination (algorithm Deletion) tries to achieve the same with each deletion of an attribute.
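The following Python sketch illustrates the two definitions above: the rival similarity of an object given its distances to the nearest "own" and rival stolps, and the FRiS-Stolp classification rule. It is a minimal illustration under our own assumptions (Euclidean distances, the names fris and classify_by_stolps, one array of stolps per pattern), not the authors' implementation.

```python
import numpy as np

def fris(r_own: float, r_rival: float) -> float:
    """Rival similarity of an object with its 'own' pattern:
    +1 when the object coincides with the own stolp,
    -1 when it coincides with the rival stolp, 0 on the boundary."""
    return (r_rival - r_own) / (r_rival + r_own)

def classify_by_stolps(z, stolps):
    """Assign z to the pattern whose nearest stolp gives the maximum
    rival similarity F. `stolps` maps a pattern label to an array of
    its stolp coordinates (one row per stolp)."""
    # distance from z to the nearest stolp of each pattern
    nearest = {label: np.linalg.norm(s - z, axis=1).min()
               for label, s in stolps.items()}
    best_label, best_f = None, -np.inf
    for label, r_own in nearest.items():
        # the rival distance is to the nearest stolp of any other pattern
        r_rival = min(r for lab, r in nearest.items() if lab != label)
        f = fris(r_own, r_rival)
        if f > best_f:
            best_label, best_f = label, f
    return best_label, best_f

# toy usage: two patterns, one stolp each
stolps = {"A": np.array([[0.0, 0.0]]), "B": np.array([[4.0, 0.0]])}
print(classify_by_stolps(np.array([1.0, 0.0]), stolps))  # ('A', 0.5)
```

In the toy call, Z lies three times closer to the stolp of A than to that of B, so its similarity to A is (3 - 1)/(3 + 1) = 0.5.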
In algorithm AdDel, the following combination of these two approaches is used: first, n1 informative attributes are selected by the method Add; then the n2 worst of them (n2 < n1) are eliminated by the method Del, so that after these two steps the selected subset contains (n1 - n2) attributes. This sequence of actions (algorithms Add and Del) is repeated until the quality of the selected attribute set reaches its maximum. In algorithm GRAD [2] we add and eliminate "granules" (sets consisting of several attributes) instead of single attributes. Granules are formed from attributes with high individual informativeness.

Algorithm FRiS-GRAD

To find the best subsystem of attributes, we use the directed search procedure offered in algorithm GRAD. At each step, some variant of an attribute subsystem is formed, and then algorithm FRiS-Stolp is started to construct the set of stolps and to calculate the training quality Fs in this subsystem. The value Fs is then used as a measure of the informativeness of this subsystem. As a result, algorithm FRiS-GRAD selects the most informative attribute subspace and gives us the rule for classifying new objects in this subspace. The output of FRiS-GRAD contains descriptions of the required number of informative subsystems together with their FRiS-quality values Fs. Every subsystem i with quality Fs(i) provides its own point of view on the training dataset. Because of this, the result of classifying a new object Z can change from subsystem to subsystem and is characterized by different values of the FRiS-function. To make a collective decision, we estimate the rival similarity Fj(i) of the recognized object Z to pattern j in each informative subspace i; Z is classified as an object of the pattern Sj with the maximum value of Fs(i)*Fj(i).

The same attribute can occur in several granules of the selected informative subsystem. We assume that such an attribute is more important than an attribute occurring in a single granule. To take this into account, we assign to each informative attribute a weight proportional to the number of granules of the selected subsystem that include this attribute. Thus, algorithm FRiS-GRAD determines the optimal number of attributes in the informative subsystem, selects the required number of the most informative subsystems, estimates the significance of each attribute (calculates weights), and allows making a collective decision in the pattern recognition task.

Informativeness and suitability

In the majority of existing methods, the share U of training objects correctly recognized by the kNN rule, estimated by cross-validation, is used as the quality of an attribute subsystem. Another criterion, which presumably allows selecting informative attribute subsystems, is based on Fisher's idea of estimating informativeness through the ratio

Q = |μ1 - μ2| / (δ1 + δ2),

where μ1, μ2 are the sample means of the patterns and δ1, δ2 are their variances.

The next method is implemented in the family of algorithms entitled RELIEF [4]. For each object a of the training sample, the value W(a) = (r2 - r1)/(rmax - rmin) is calculated, where r1(a) is the distance from a to the nearest object of its "own" pattern and r2(a) is the distance to the nearest object of a "competitor" pattern. The average value of W is used as a measure of the informativeness of the attribute system.

To estimate the FRiS-informativeness of some subspace of attributes, we can use a simplified version of the FRiS-criterion Fs, calculated under the assumption that all objects of the training dataset are stolps of their patterns. This criterion, F1s, is used in FRiS-GRAD at the stage of granule formation to speed up the work of the algorithm.
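A minimal Python sketch of this simplified criterion, and of one forward (Add) step that uses it to score candidate subsystems, is given below. The data layout (an object-by-attribute matrix X with labels y), the Euclidean metric, and the names f1s and add_step are our illustrative assumptions, not the published implementation.

```python
import numpy as np

def f1s(X: np.ndarray, y: np.ndarray, subset) -> float:
    """Simplified FRiS-criterion: every training object is treated as a
    stolp. For each object, r1 is the distance to the nearest object of
    its own pattern, r2 to the nearest rival object; F1s is the mean of
    (r2 - r1) / (r2 + r1) over the training set."""
    Xs = X[:, list(subset)]
    total = 0.0
    for i in range(len(Xs)):
        d = np.linalg.norm(Xs - Xs[i], axis=1)
        d[i] = np.inf                   # exclude the object itself
        r1 = d[y == y[i]].min()         # nearest "own" object
        r2 = d[y != y[i]].min()         # nearest rival object
        total += (r2 - r1) / (r2 + r1)
    return total / len(Xs)

def add_step(X, y, selected, candidates):
    """One forward (Add) step: include the attribute (or granule member)
    that maximizes F1s of the enlarged subsystem."""
    best = max(candidates - selected,
               key=lambda j: f1s(X, y, selected | {j}))
    return selected | {best}
```

In FRiS-GRAD the same scoring is applied to granules rather than single attributes, and the full FRiS-Stolp run replaces f1s once the candidate list is short.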
In [1], an experiment comparing these four criteria (U, Q, W, and F1s) is described. It was shown that criterion F1s allows estimating the informativeness of attributes more objectively than the others. The informativeness estimates obtained with the most popular criterion, U, appeared to be too optimistic. If the number of attributes N is larger than the number of objects M in the dataset, the probability of selecting irrelevant attributes because of a random successful combination of their values becomes high. A.N. Kolmogorov drew attention to this problem as early as 1933 [5]. So we should be able to answer the question: is the subsystem found on the training dataset suitable for recognizing control objects, or is it accidental and unfit for further use? We offer a method for estimating the "non-randomness" of the selected subsystem, realized in algorithm FRiS-Test (a sketch is given at the end of this section):

1) On a sample of M training objects we carry out the cross-validation procedure V times: at each step, 10 feature subsets are found on M(V-1)/V randomly chosen objects, and the remaining M/V objects are recognized in them. As a result we estimate the training quality (the average value Fs of the FRiS-function), the share P of correctly recognized objects, and the similarity Ft of the test objects to the nearest stolps. For suitable attribute subsystems, the values Fs and Ft are close to each other; the difference between them is measured by the value G = |1 - Fs/Ft|.

2) We randomly shuffle the values of each attribute in the dataset, which breaks all dependencies between the describing attributes and the target attribute. For such a table we repeat the sequence of procedures from step 1 and record the values Fs*, P*, Ft*, and G* = |1 - Fs*/Ft*| obtained during cross-validation.

Our experiments show that the probability of finding a subsystem with simultaneously high values of Fs* and Ft* on a random dataset is negligible. So, to check an attribute subsystem, we use the following rule: for a suitable attribute system, the value Fs is noticeably higher than Fs* and G << G*.

Solving real tasks

We used algorithm FRiS-GRAD for solving real physical and genetic tasks. The dataset analyzed in the physical task consists of spectra of microparticles of fine-dispersed substances obtained with an X-ray analyzer. Each spectrum is characterized by the values of its activity (reflection amplitude) in 981 spectral bands. According to their chemical composition, all the analyzed substances are divided into two patterns, A and B. Spectra of microparticles of the same substance can differ from each other because of the heterogeneity of the substance and of the experimental conditions. The training and test datasets consisted of 55 objects each.

We tested algorithm FRiS-GRAD on the problem of selecting a small-dimensional subset of attributes (spectral bands) simultaneously with decision rule construction; we also used this task to find the best settings of the algorithm. Recognition quality in the whole attribute space was high enough: P = 105/110, Fs = 0.376, Ft = 0.281, G = 0.338. These results are much better than the results on the random dataset: P* = (45-65)/110, Fs* = 0.298, Ft* = 0.043, and G* = 5.93. Running algorithm FRiS-GRAD, we selected the 10 most informative attribute subsystems, consisting of 2 to 5 attributes each. All 7 attributes forming these 10 subsystems belonged to spectral bands 368 to 374. The value of P increased to 107/110; the other characteristics of the training were Fs = 0.752, Ft = 0.588, and G = 0.279.
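The following Python sketch makes the FRiS-Test procedure concrete. The hook select_and_evaluate, assumed to run the cross-validated FRiS-GRAD search and return (Fs, P, Ft), is a hypothetical placeholder for the procedure described above; the final sanity check reproduces the G values reported for the physical task.

```python
import numpy as np

rng = np.random.default_rng(0)

def g_value(fs: float, ft: float) -> float:
    """Divergence between training quality Fs and test similarity Ft."""
    return abs(1.0 - fs / ft)

def fris_test(X, y, select_and_evaluate, n_folds=5):
    """Compare Fs, Ft, G on the real table with Fs*, Ft*, G* on a copy
    in which every attribute column is shuffled independently, which
    destroys all attribute-target dependencies.
    `select_and_evaluate(X, y, n_folds)` is an assumed hook that runs
    the cross-validated FRiS-GRAD search and returns (Fs, P, Ft)."""
    fs, p, ft = select_and_evaluate(X, y, n_folds)
    X_rand = np.column_stack([rng.permutation(col) for col in X.T])
    fs_r, p_r, ft_r = select_and_evaluate(X_rand, y, n_folds)
    g, g_r = g_value(fs, ft), g_value(fs_r, ft_r)
    # suitability rule: Fs clearly above Fs*, and G much smaller than G*
    return {"suitable": fs > fs_r and g < g_r, "G": g, "G*": g_r}

# sanity check against the physical-task figures reported above:
print(round(g_value(0.376, 0.281), 3))  # 0.338
print(round(g_value(0.298, 0.043), 2))  # 5.93
```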
In the genetic task, the analyzed dataset consists of a matrix of gene expression vectors obtained from DNA microarrays [6] for a number of patients with two different types of leukemia (ALL and AML). The initial number of genes is N = 7129. The training set consists of 38 samples (27 ALL and 11 AML); the test set has 34 samples (20 ALL and 14 AML). This task was interesting for us because the results of solving it by different researchers have been published, so we could compare the effectiveness of our algorithm with that of competitors.

In work [6], two attributes (genes X95735 and HG1612) are shown that allow recognizing all test objects without any mistakes, but no explanation is given of how to select these two genes using only the training dataset. The best two-dimensional subsystem actually found in that work by the SVM method in combination with the RFE algorithm (a sort of backward elimination) has a misrecognition rate of 12% on the test data. Other attribute subsystems (of 4 and 128 attributes), allowing 31 and 33 objects, respectively, to be recognized correctly, are also presented in that work, but no instructions are given on how to select the best one among them without using the test dataset. The program took 3 hours of running time.

Our results were the following. Recognition quality in the whole attribute space: P = 23/34 (67.6%), Fs = 0.332, Ft = 0.114, and G = 1.912. For the random dataset the results are rather similar: P* = (16-25)/34 (47.0%-73.5%), Fs* = 0.310, Ft* = 0.018, and G* = 16.22. This means that recognition on all attributes is non-random, but the relative closeness of the FRiS values shows its weakness. Algorithm FRiS-GRAD selected 10 informative attribute subsystems; their dimension varied from 2 to 4 (14 different attributes in all the subsystems). The number of correctly recognized test objects varied from 30 to 32; the collective decision gives P = 33/34 (97.1%), Fs = 0.526, Ft = 0.548, and G = 0.040.

If we take into account the hundreds of informative subsystems selected while we were testing the settings of our algorithm, the number of informative genes was 58 (out of 7129). Learning on this subset of attributes allowed algorithm FRiS-GRAD to select 10 informative attribute subsystems of dimension 2 to 4. In these subsystems the number of correctly recognized test objects varied from 30 to 33. The collective decision rule recognized all objects correctly, with quality Fs = 0.582, Ft = 0.605, and G = 0.038. All the selected subsystems consist of 14 different genes (759[D88422_at], 1778[M19507_at], 1881[M27891_at], 2014[M54995_at], 2232[M77142_at], 2287[M84526_at], 2641[U05259_rna1_at], 3257[U46751_at], 4049[X03934_at], 4249[X53586_rna1_at], 4278[X55668_at], 4679[X82240_rna1_s_at], 6846[X95735_at], 6199[M28130_rna1_s_at], 6217[M27783_s_at]).

The running time of algorithm FRiS-GRAD on the leukemia task (38 x 7129) was 25 seconds on a uniprocessor computer with a clock speed of 2.21 GHz.

From these data it is possible to conclude that the results of leukemia diagnostics obtained using all 7129 attributes differ little from the results obtained on the random table and do not deserve much trust. The feature subsystems chosen by algorithm FRiS-GRAD can be considered suitable for preliminary use. However, conclusions about the practical acceptability of these results, obtained on data from such a small number of patients, could be drawn only after their analysis by expert geneticists. The "concentrated" subset of attributes found by us can also be of interest for research purposes. The results on the leukemia task were compared with the published results of other authors and appeared to be competitive.
They corroborate that algorithm FRiS-GRAD combines high efficiency with a high speed of work, and that the FRiS-function is usable for solving various Data Mining problems.

References

1. N.G. Zagoruiko, I.A. Borisova, V.V. Dyubanov, O.A. Kutnenko. Methods of Recognition Based on the Function of Rival Similarity // Pattern Recognition and Image Analysis, 2008, Vol. 18, No. 1, pp. 1-6.
2. N.G. Zagoruiko, O.A. Kutnenko, A.A. Ptitsin. Algorithm GRAD for Selection of Informative Genetic Features // Proc. Int. Conf. on Computational Molecular Biology, Moscow, 2005, pp. 8-9.
3. N.G. Zagoruiko. Applied Methods of Data and Knowledge Mining. Novosibirsk: Pub. IM SB RAS, 1999, 273 p. [In Russian].
4. K. Kira, L. Rendell. The Feature Selection Problem: Traditional Methods and a New Algorithm // Proc. 10th Nat'l Conf. on Artificial Intelligence (AAAI-92), 1992, pp. 129-134.
5. A.N. Kolmogorov. On the Suitability of Statistically Obtained Forecast Formulas // Zavodskaya Laboratoriya, 1933, No. 1, pp. 164-167 [In Russian].
6. I. Guyon, J. Weston, S. Barnhill, V. Vapnik. Gene Selection for Cancer Classification Using Support Vector Machines // Machine Learning, 2002, Vol. 46, No. 1-3, pp. 389-422.