Lapko A., Lapko V.



COLLECTIVE OF NONPARAMETRIC ALGORITHMS OF PATTERN RECOGNITION, BASED ON RANDOMIZED METHOD OF THEIR OPTIMIZATION 1

A.V. Lapko 2 , V.A. Lapko 2

2 Institute of Computational Modelling, Siberian Branch of the Russian Academy of Sciences; 660036, Krasnoyarsk, Akademgorodok 50, ctr.44, E-mail: lapko@icm.krasn.ru

Nonparametric pattern recognition algorithms based on a randomized method of their optimization are proposed. The idea of the method is to treat the bandwidths (diffuseness factors) of the kernel functions as random variables and to choose the parameters of their distribution law when optimizing the nonparametric decision rules. The properties of the resulting classifiers are investigated, and the results of their comparison with traditional nonparametric pattern recognition algorithms are analyzed.

Introduction

A paradox of traditional methods for identifying stochastic models is that a finite random sample of observations of the variables of the investigated objects is matched with one specific set of model parameters that is optimal under certain conditions. An essentially new randomized approach is proposed for determining the bandwidths (diffuseness factors) of nonparametric pattern recognition algorithms based on the Rosenblatt-Parzen kernel estimate of probability density.

The technique of choosing kernel bandwidths at random when constructing a nonparametric estimate of probability density was first proposed in 1975 by T. Wagner [1]. When the probability density $p(x)$ is estimated, the random sequence of bandwidths is formed from the sample of distances between the initial observations $x^i$, $i = \overline{1, n}$, and their $k$ nearest neighbors. Despite the apparent simplicity of the approach, the problem of choosing the value of $k$ and of justifying the consequences of that choice remains unsolved.
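The nearest-neighbor construction can be sketched as follows. This is a minimal illustration that assumes the bandwidth attached to each observation is its distance to the $k$-th nearest neighbor in a one-dimensional sample; the function name `knn_bandwidths` and the data are hypothetical, and the precise rule used in [1] may differ.

```python
def knn_bandwidths(sample, k):
    """Distance from each observation to its k-th nearest neighbour (1-D sample)."""
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    bandwidths = []
    for i, xi in enumerate(sample):
        # Distances to every other observation, smallest first.
        distances = sorted(abs(xi - xj) for j, xj in enumerate(sample) if j != i)
        bandwidths.append(distances[k - 1])
    return bandwidths


# Made-up observations; a larger k yields larger (smoother) bandwidths.
observations = [0.0, 1.0, 3.0, 7.0]
print(knn_bandwidths(observations, 1))
print(knn_bandwidths(observations, 2))
```

The sketch makes the open question visible: the whole bandwidth sequence changes with $k$, and nothing in the construction itself says which $k$ to prefer.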

In this work, an analysis of the asymptotic properties of the nonparametric estimate $\bar p(x)$ of the probability density $p(x)$ is used to show that a rational distribution law $p_h(c)$, $c \in [0; h]$, of the bandwidths $c$ can be found in the class of power functions. The results obtained are used to synthesize nonparametric pattern recognition algorithms with random values of the kernel bandwidths.

Randomized method of optimization

The justification of the randomized method for optimizing nonparametric pattern recognition algorithms is examined using the example of probability density estimation. Let $V = (x^i,\ i = \overline{1, n})$ be a sample of $n$ statistically independent observations of a random variable $x \in R^1$ with a probability density $p(x)$ of unknown form. We assume that $p(x)$ is bounded and continuous together with all its derivatives up to the second order inclusive. As an approximation of the required probability density $p(x)$ from the empirical data $V$, we take a statistic of the Rosenblatt-Parzen type [3]

$$\bar p(x) = n^{-1} \sum_{i=1}^{n} c_i^{-1}\, \Phi\!\left(\frac{x - x^i}{c_i}\right), \qquad (1)$$

where $\Phi(u)$ are kernel functions satisfying the conditions of positivity, symmetry and normalization. The bandwidths $c_i$ are random variables characterized by the probability density

$$p_h(c) = \frac{(t+1)\, c^{t}}{h^{t+1}}, \quad c \in [0; h]. \qquad (2)$$

_______________________________________________________________________

1 This work was carried out under grant №07-01-00006а of the Russian Foundation for Basic Research.
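A kernel estimate of this type, with an individual bandwidth attached to each observation, can be sketched as follows. The Gaussian kernel, the function names and the data are assumptions made for illustration only; the text requires nothing more of the kernel than positivity, symmetry and normalization.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: positive, symmetric, integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def density_estimate(x, sample, bandwidths, kernel=gaussian_kernel):
    """Average of kernels centred at the observations, each with its own bandwidth."""
    if len(sample) != len(bandwidths):
        raise ValueError("one bandwidth per observation is required")
    total = 0.0
    for xi, ci in zip(sample, bandwidths):
        total += kernel((x - xi) / ci) / ci
    return total / len(sample)


# Made-up data: three observations with individual bandwidths.
sample = [-1.0, 0.2, 1.5]
bandwidths = [0.4, 0.9, 0.6]

# Crude Riemann sum over [-10, 10]: the estimate should integrate to about one.
step = 0.01
area = sum(density_estimate(-10.0 + j * step, sample, bandwidths) * step
           for j in range(2001))
print(round(area, 3))
```

Because every term is a properly normalized kernel, the estimate remains a density whatever positive bandwidths are supplied, which is what allows the bandwidths to be drawn at random.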

The left boundary of the interval of variation of $c$ follows from the condition of asymptotic unbiasedness of $\bar p(x)$: $c(n) \to 0$ as the volume of initial data $n$ increases.

For random values of the kernel bandwidths in the nonparametric estimate of the probability density $p(x)$, the asymptotic expression for the mean square deviation of $\bar p(x)$ from $p(x)$, averaged over the distribution (2) of the bandwidths, is given as

$$W(\bar p) \sim \frac{(t+1)\, \|\Phi\|^2}{n\, t\, h} + \frac{(t+1)\, h^4\, \|p^{(2)}\|^2}{4\,(t+5)}, \qquad (3)$$

where $\|\cdot\|^2$ denotes the integral of the squared function and $p^{(2)}(x)$ is the second derivative of $p(x)$, and the bias is given as

$$M\big(\bar p(x) - p(x)\big) \sim \frac{(t+1)\, h^2}{2\,(t+3)}\, p^{(2)}(x). \qquad (4)$$

Analysis of expressions (3) and (4) shows that the nonparametric estimate of probability density with random bandwidths (1) is asymptotically unbiased and consistent. Compared with the traditional kernel estimate of probability density, it is characterized by a lower bias (4) and a slightly larger mean square deviation (3). The potential efficiency of the estimate (1) should therefore be expected to show itself at finite volumes of statistical data.
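The comparison with a fixed bandwidth $h$ can be made concrete by computing moments of the bandwidth. The sketch below assumes the power-law density $(t+1)\,c^{t}/h^{t+1}$ on $[0; h]$, for which $M(c^m) = h^m (t+1)/(t+m+1)$, and assumes that the constants in (3) and (4) arise as such moments; the helper `moment_factor` is hypothetical.

```python
from fractions import Fraction


def moment_factor(t, m):
    """Return M(c**m) / h**m for the power-law density; needs t + m + 1 > 0."""
    t = Fraction(t)
    if t + m + 1 <= 0:
        raise ValueError("the moment does not exist for these t and m")
    return (t + 1) / (t + m + 1)


for t in (1, 2, 4, 10):
    bias_factor = moment_factor(t, 2)       # multiplies h**2 * p''(x) / 2
    variance_factor = moment_factor(t, -1)  # multiplies ||Phi||**2 / (n * h)
    print(t, bias_factor, variance_factor)
```

Under these assumptions the bias factor stays below one and the variance factor stays above one for every $t > 0$, and both tend to one as $t$ grows, when the distribution of $c$ concentrates near $h$.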

The modified nonparametric algorithm of classification

Let us define the kernel bandwidths in the form $c_v = c\, \sigma_v$, where $\sigma_v$ is an estimate of the standard deviation of the feature $x_v$, $v = \overline{1, k}$, of the classified objects, and $c$ is a random variable with probability density $p_h(c)$, $c \in [0; h]$.

Let us adopt the following procedure for generating the sequence of parameters $c$:

$$c = h\, \varepsilon^{\frac{1}{t+1}}, \qquad (5)$$

on the basis of a random variable $\varepsilon$ with the uniform distribution law on $[0; 1]$.
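Such a generation procedure amounts to inverse-transform sampling: for the power-law density $(t+1)\,c^{t}/h^{t+1}$ on $[0; h]$ the distribution function is $(c/h)^{t+1}$, so a uniform draw raised to the power $1/(t+1)$ and scaled by $h$ has the required law. A minimal sketch, with a hypothetical function name and illustrative values of $h$ and $t$:

```python
import random


def draw_bandwidths(n, h, t, rng=None):
    """Draw n values with density (t + 1) * c**t / h**(t + 1) on (0, h]."""
    if h <= 0 or t <= 0:
        raise ValueError("h and t are assumed positive in this sketch")
    rng = rng or random.Random()
    # 1 - rng.random() lies in (0, 1], which keeps every bandwidth strictly positive.
    return [h * (1.0 - rng.random()) ** (1.0 / (t + 1)) for _ in range(n)]


values = draw_bandwidths(20000, h=1.5, t=2.0, rng=random.Random(0))
sample_mean = sum(values) / len(values)
theoretical_mean = 1.5 * (2.0 + 1.0) / (2.0 + 2.0)  # h * (t + 1) / (t + 2)
print(round(sample_mean, 3), theoretical_mean)
```

Excluding the value zero is a practical choice of the sketch: a zero bandwidth would make the kernel term undefined.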

Let us generate the sequence of bandwidths by procedure (5) and assign its elements at random to the kernel functions in the nonparametric estimates of probability density that enter the Bayes equation of the separating surface corresponding to the maximum a posteriori probability criterion [2]. The nonparametric estimate of the separating surface equation with random kernel bandwidths for a two-class pattern recognition problem can then be written as

$$\tilde f_{12}(x) = \left(n \prod_{v=1}^{k} \sigma_v\right)^{-1} \sum_{i=1}^{n} \sigma(i) \prod_{v=1}^{k} \frac{1}{c_i}\, \Phi\!\left(\frac{x_v - x_v^i}{c_i\, \sigma_v}\right),$$

where $\sigma(i)$ are the «instructions of a teacher» (class labels) from the training sample $V = (x^i,\ \sigma(i),\ i = \overline{1, n})$:

$$\sigma(i) = \begin{cases} 1, & x^i \in \Omega_1, \\ -1, & x^i \in \Omega_2. \end{cases}$$

The nonparametric pattern recognition algorithm

$$m_{12}(x): \begin{cases} x \in \Omega_1, & \text{if } \tilde f_{12}(x) \ge 0, \\ x \in \Omega_2, & \text{if } \tilde f_{12}(x) < 0, \end{cases} \qquad (6)$$

is optimized with respect to the right boundary $h$ of the domain of the probability density $p_h(c)$ by minimizing the empirical classification error estimated by the «sliding examination» (leave-one-out) method.
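The rule and the choice of $h$ by «sliding examination» can be sketched as follows. Everything specific here is an assumption made for illustration: the Gaussian kernel, labels coded as +1 and -1, a finite grid of candidate values of $h$, reuse of the same uniform draws for every candidate, and the made-up training data.

```python
import math
import random
import statistics


def kernel(u):
    """Gaussian kernel (an assumed choice of a positive, symmetric, normalized kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def decision_function(x, points, labels, bandwidths, sigmas, skip=None):
    """Signed kernel sum at x; the training point with index `skip` is left out."""
    total = 0.0
    for i, (xi, label, ci) in enumerate(zip(points, labels, bandwidths)):
        if i == skip:
            continue
        term = float(label)
        for xv, xiv, sv in zip(x, xi, sigmas):
            term *= kernel((xv - xiv) / (ci * sv)) / ci
        total += term
    # The positive normalizing constant does not change the sign of the result.
    return total / (len(points) * math.prod(sigmas))


def classify(x, points, labels, bandwidths, sigmas, skip=None):
    """Class 1 when the estimate is non-negative, otherwise class 2."""
    value = decision_function(x, points, labels, bandwidths, sigmas, skip)
    return 1 if value >= 0.0 else 2


def sliding_exam_error(points, labels, bandwidths, sigmas):
    """Share of training points misclassified when each is left out in turn."""
    mistakes = 0
    for i, (xi, label) in enumerate(zip(points, labels)):
        true_class = 1 if label == 1 else 2
        if classify(xi, points, labels, bandwidths, sigmas, skip=i) != true_class:
            mistakes += 1
    return mistakes / len(points)


def select_h(points, labels, t, candidates, rng):
    """Pick the candidate h with the smallest sliding-examination error."""
    sigmas = [statistics.stdev(column) for column in zip(*points)]
    # The same uniform draws are reused for every h, so candidates differ only in scale.
    uniforms = [1.0 - rng.random() for _ in points]
    best = None
    for h in candidates:
        bandwidths = [h * e ** (1.0 / (t + 1)) for e in uniforms]
        error = sliding_exam_error(points, labels, bandwidths, sigmas)
        if best is None or error < best[1]:
            best = (h, error, bandwidths)
    return best


# Made-up training sample: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
h_best, error_best, bandwidths_best = select_h(
    points, labels, t=2.0, candidates=[0.25, 0.5, 1.0, 2.0], rng=random.Random(1))
print(h_best, error_best)
```

Only the single parameter $h$ is tuned; the individual bandwidths follow from it through the random draws, which is why the search stays one-dimensional whatever the sample size.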

Collective of nonparametric classification algorithms

To increase the efficiency of nonparametric pattern recognition algorithms with random values of the kernel bandwidths, let us use the principles of synthesizing a collective of decision rules. Let $\tilde m_{12}^{j}(x)$, $j = \overline{1, M}$, be nonparametric decision rules for a two-class pattern recognition problem, constructed on the same training sample $V = (x^i,\ \sigma(i),\ i = \overline{1, n})$ according to the technique described above. The decision rules share the same optimal parameter $h$, the right boundary of the domain of the probability density $p_h(c)$ of the bandwidths, but differ in their random sequences $c_i^{j}$, $i = \overline{1, n}$, $j = \overline{1, M}$.

Let us use one of the well-known approaches to collective estimation [4], for example the method of «voting», and construct the decision rule

$$m_{12}(x): \begin{cases} x \in \Omega_1, & \text{if } \dfrac{M_1}{M} \ge \dfrac{M_2}{M}, \\[2mm] x \in \Omega_2, & \text{if } \dfrac{M_2}{M} > \dfrac{M_1}{M}, \end{cases} \qquad (7)$$

where $M_j$, $j = 1, 2$, is the number of «decisions» made by the members of the collective that assign the object with the feature set $x$ to the $j$-th class.

In the multi-class formulation of the pattern recognition problem, each member of the collective $\tilde m_{12}^{j}(x)$, $j = \overline{1, M}$, uses a decision rule of type (6). The final decision, for example $x \in \Omega_t$, is made if the frequency of decisions of the collective members in favour of the $t$-th class is the maximal one.
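The voting step itself is simple to express. The sketch below takes the class labels returned by the $M$ members for one object and returns the most frequent one; breaking ties in favour of the smallest class index is an assumption of the sketch, and the function name is hypothetical.

```python
from collections import Counter


def collective_decision(decisions):
    """Return the class chosen most often by the members of the collective."""
    counts = Counter(decisions)
    if not counts:
        raise ValueError("at least one member decision is required")
    top = max(counts.values())
    # Ties are resolved in favour of the smallest class index (an assumption).
    return min(label for label, count in counts.items() if count == top)


print(collective_decision([1, 2, 1]))        # two-class case
print(collective_decision([3, 1, 3, 2, 3]))  # multi-class case
```

With two classes this reduces to comparing the numbers of votes for each class, so the same function serves both formulations.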

Application of the collective (7) increases the reliability of the decisions made under random values of the bandwidths of the nonparametric algorithms.

Results of computing experiment

The study was carried out on a two-class pattern recognition problem in a $k$-dimensional feature space.

Analysis of the results of the computational experiments shows that the statistical estimates of the distribution laws of the empirical recognition errors of the traditional nonparametric algorithm and of its modification (6) with random bandwidths do not differ significantly. However, possible inaccuracy in determining the estimates of the optimal bandwidths lowers the efficiency of the traditional nonparametric pattern recognition algorithm in comparison with its modification.

This can be explained by the substantially greater stability of the efficiency of the nonparametric algorithm with random bandwidths under changes of the parameter $h$.

The potential of the randomized optimization method is realized most fully when a collective (7) of nonparametric pattern recognition algorithms (6) is used.

A significant advantage of the decision rule (7) over the traditional nonparametric classifier, over a collective of such classifiers, and over the pattern recognition algorithm (6) with random kernel bandwidths is observed when the number of its members $M \ge 10$.

When the dimension $k$ of the feature space of the classified objects grows at a fixed training sample size $n$, no decrease in the efficiency of the compared nonparametric pattern recognition algorithms is observed. However, the advantage of the collective of nonparametric decision rules with random bandwidths is retained and is especially pronounced for large values $k > 10$.

If the values of the parameter $t$ of the distribution law $p_h(c)$ exceed 4, the error estimates of the modified algorithm (6), of the traditional nonparametric pattern recognition algorithm and of its collective do not differ significantly for $k \in [2; 20]$.

The advantage of the collective of algorithms (7) with random bandwidths over a collective of traditional nonparametric classifiers is observed for all examined $t$ and $k \in [2; 20]$. Its efficiency in comparison with the traditional nonparametric decision rule and its collective can be explained by the use of variable kernel measures of closeness between points in the feature space of the classified objects, which are determined by the distribution law of the bandwidths.

Applying the principles of collective estimation makes the variable measures of closeness, which are characteristic of the modified nonparametric algorithms with random bandwidths, more stable.


Conclusion

A randomized method for optimizing nonparametric pattern recognition algorithms is proposed. It is based on choosing the parameters of the distribution law of the random kernel bandwidths from the condition of a minimum of the empirical classification error. The resulting decision rules are highly stable with respect to errors in determining their optimal parameters.

Using a collective of nonparametric algorithms that correspond to different sequences of random bandwidth values provides a significant decrease in the pattern recognition error in comparison with traditional nonparametric classifiers.

A promising direction for further research is the development of algorithmic tools for confidence estimation of the nonparametric estimate of the separating surface equation and of its bandwidths.

References

1. Devroye L., Györfi L. Nonparametric density estimation (L1 view). - Moscow: Mir, 1988. - 407 p. (in Russian).

2. Lapko A.V., Lapko V.A., Sokolov M.I., Chentcov S.V. Nonparametric systems of classification. - Novosibirsk: Nauka, 2000. - 240 p. (in Russian).

3. Parzen E. On the estimation of a probability density function and mode // Ann. Math. Statist., 1962. - P. 1065.

4. Rastrigin L.A. Hybrid recognition // Automatics and telemechanics, 1993. - №4. - P. 3-20 (in Russian).
