A discrete transform for pattern recognition of immune type based on the tree structure of data *1

By C. Karanikas and G. Proios

Department of Mathematics, Aristotle University of Thessaloniki, Thessaloniki 54006, Greece
Department of Maritime and Enterprising Services, Aegean University, Chios, Greece

Abstract

It is shown, by means of an invertible discrete transform, that any finite sequence or any collection of strings of any length can be represented as a random walk on trees. More generally, any positive continuous probability measure can be written as a random walk on trees and as an infinite product of wavelet-type p-adic Haar polynomials, with coefficients related to the random walk. These transforms provide the mathematical background needed for coding any discrete information and for exploring its local variability and diversity. Relying on the underlying computational algorithms, and with several examples and applications, we propose that these transforms can be used efficiently for pattern recognition of immune type. In other words, we propose a mathematical platform for detecting self and non-self strings over any alphabet, based on negative selection algorithms, for scouting the periodicity and self-similarity of data, and for measuring the diversity of chaotic strings with fractal dimension methods. In particular, we reliably estimate the entropy and the ratio of chaotic data with self-similarity.

1. Introduction

Recall that any collection S of strings of length N written in an alphabet $\{a_1,\dots,a_p\}$ of p letters (p > 1), with a correspondence $a_i \mapsto p-i$, $i = 1,\dots,p$, between letters and digits, can be considered as a set of integers

$$x = \sum_{i=1}^{N} \varepsilon_i\, p^{\,i-1}, \qquad (1)$$

where $\varepsilon_i \in \{0,\dots,p-1\}$, $i = 1,\dots,N$; clearly $x < p^N$. Similarly, any collection S of integers in $[0, p^N-1]$ can be written as a collection of strings given by their digits $\varepsilon_i \in \{0,\dots,p-1\}$, $i = 1,\dots,N$, in base p, as in (1).
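The correspondence (1) between strings and integers can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the letter-to-digit assignment used here (the index of the letter in the alphabet) is an assumption, since any fixed bijection onto {0, ..., p-1} serves the same purpose.

```python
def string_to_int(s, alphabet):
    """Encode s as x = sum_i eps_i * p**(i-1), following (1).

    Assumption: the digit of a letter is its index in `alphabet`.
    The first letter of s is the least significant digit.
    """
    p = len(alphabet)
    digit = {letter: d for d, letter in enumerate(alphabet)}
    return sum(digit[ch] * p**i for i, ch in enumerate(s))


def int_to_string(x, alphabet, N):
    """Decode an integer x in [0, p**N - 1] back to its string of length N."""
    p = len(alphabet)
    return "".join(alphabet[(x // p**i) % p] for i in range(N))


if __name__ == "__main__":
    alphabet = "ACGT"                      # p = 4 letters
    x = string_to_int("AC", alphabet)      # digits (0, 1) -> 0*1 + 1*4
    print(x, int_to_string(x, alphabet, 2))
```

Every string of length N maps to a distinct integer below p**N, so a collection of strings and the corresponding set of integers carry the same information.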
In both cases S can be considered as a set of $p^N$ observations having value 1 on any element of S and 0 otherwise. In other terms, S can be considered as a sequence of $p^N$ elements

$$T = \{t_n,\ n = 0,\dots,p^N-1 :\ t_n = 1 \text{ if } n \in S \text{ and } t_n = 0 \text{ otherwise}\}. \qquad (2)$$

If we call S the self set and its complement the non-self set, the main immune-type pattern recognition problem is to recognize whether a string of length N written in an alphabet of p letters is self or non-self. In addition, we have the problem of determining detectors for the possible periodicity or self-similarity of S and, moreover, of measuring its diversity, in order to enlarge S or replace it with another collection of strings having different diversity. Recall that in immunology the effectiveness of a set of antibodies against a collection of viruses depends on the diversity of the antibodies. Since antibodies can be represented as a chaotic set of strings, a natural measure of their diversity is the information entropy.

*1 Research supported by the EU Project IST-2000-26016 IMCOMP.

The aim of this paper is to create the mathematical background for coding the information of a self set on a p-adic tree, in order to construct immune-type detectors, negative selection algorithms (see [DA], [DFH]) and measures for the diversity of self sets (see [FMT]) using information entropy and chaotic dimension methods.

Recall that a tree is a standard data structure used in computer science and elsewhere for organizing information. Information in a tree is stored in nodes, starting with a root node and ending with terminal nodes called leaves. Nodes are linked to other nodes through branches. Leaves are nodes without any branches. The most common type of tree is the binary (or 2-adic) tree, in which each node has no more than two branches.

We now list some simple cases of self sets as in (2):

Case 1. Suppose that the self set S is concentrated on a subinterval of $[0, p^N-1]$, e.g. S is the left part of it.
Then a simple pattern rule for detecting whether a random integer is in S is to identify the digits of the maximum and minimum elements of S. Note that the diversity of this set is apparently very small.

Case 2. Suppose that the self set S has self-similarity; e.g. consider the set of all integers $x = \sum_{i=1}^{N} \varepsilon_i 3^{\,i-1}$ in base 3 described by the rule $\varepsilon_i \in \{0,2\}$, $i = 1,\dots,N$. This set forms the usual (middle-third) Cantor set (see [E]), and a simple detector for a string to be outside of S is to check whether $\varepsilon_i = 1$ for some $i = 1,\dots,N$. The diversity of this set is obviously larger than the diversity of the set discussed in the previous case. This diversity can be estimated by the information entropy (which is log(2)/log(3), see [E]).

Case 3. Suppose that S has cardinality c and is randomly distributed on the interval $[0, p^N-1]$. Then on any large subinterval of length $p^m$ we get around $[c\,p^{\,m-N}]$ self elements, where [x] is the upper integer part of a number x, i.e. the smallest integer $\ge x$. But on any small subinterval of $[0, p^N-1]$ it is hard to decide whether the interval contains part of the self set, whatever probabilistic or computational method we use. Moreover, in this case the diversity of S is large.

To formulate the detector problem for self sets S and to distinguish cases such as the above, we propose a non-linear transform of the corresponding sequence (2). This invertible transform can be considered as a random walk on a p-adic tree. Among its advantages are the ability to detect local variability and self-similarity and to measure the information entropy.

In section 2 we define the non-linear tree transform of any data of $p^N$ (p > 1) non-negative observations. We show that this transform is invertible and we examine its properties.

In section 3 we describe a set of M observations, where M has the prime factorization $M = p_1^{N_1} \cdots p_r^{N_r}$, $N_i \ge 1$, $i = 1,\dots,r$, as a compound tree, i.e.
a tree having $N_1+\dots+N_r$ generations, such that for $N_i$ generations each node has $p_i$ branches ($i = 1,\dots,r$). Since any self set S can be embedded, as in (2), in a set with cardinality M, one can obtain a desired tree structure by choosing M with a suitable prime factorization.

In section 4 we use the results of section 2 to show that any continuous probability measure can be written as an infinite product of p-adic Haar polynomials. For p = 2 the construction is based on the usual Haar wavelet system, and our p-adic infinite product is in fact a generalization of known results, see [FKP] and [K]. Similar infinite products have been used to estimate the Hausdorff dimension of certain fractals (see [BK1], [BK2], [BKM] and [BKP]). Since the diversity of self-similar sets is estimated by the information entropy or the fractal dimension, we believe that the proposed tree structure of self sets is a natural tool for computational applications of immune type.

In the last section we give several applications, examples and problems for immune-type computing, as follows. We determine a set of detectors for the self sets of cases 1 and 2 above, and we discuss negative selection algorithms, based on the tree transform, for the detection of random strings. Moreover, we discuss the problem of detecting self sets with simple periodicity. For self-similar strings with constant ratio r, expressed as a set of integers (i.e. as in (2)), we give an algorithm (see application c) to detect this ratio. This algorithm, based on the null walks of the tree transform, works perfectly whenever the denominator of r is the base of the digit expansion, and quite successfully whenever the base is different. For collections of strings (for example antibodies) we examine the problem of measuring their diversity.
The diversity of a collection of strings S is estimated by the formula $H_2(S)$ (see application d). This formula is derived from the Hausdorff dimension of measures (see [B]) for the particular case of non-ergodic Markovian processes, as in previous works ([BK2], [BKM] and [BKP]). Moreover, we compare this Hausdorff dimension method of measuring diversity with the usual information entropy formula (see formula H(S) in application d), used in [FMT] for the diversity of antibodies. In particular, we show that for several known chaotic data sets S with self-similarity, the formula $H_2(S)$ gives approximately the dimension of the corresponding chaotic system, even with unknown similarities and fractal structure. In example 6 we estimate the entropy of three known fractals (Cantor, Sierpinski and a fractal in base 5) and obtain estimates with an error of less than 0.02. We also discuss other applications of the tree transform, such as signal denoising and edge detection.

Finally, we highlight the main advantages of the tree transform: the convenience for constructing computational algorithms, the multiresolution structure, the benefit of analyzing local properties of data and, since the tree transform is a random walk, its particular suitability for examining fractal sets and chaos.

2. The Tree Transform

Let p, N > 1 be positive integers and let $T = \{t_1,\dots,t_{p^N}\}$ be non-negative data. We define a family of vector valued functions $R_n = (R_{n,k})_{k=1}^{p^n} : \mathbb{R}^{p^N} \to \mathbb{R}^{p^n}$, $n = 0,1,\dots,N$, by

$$R_{n,k}(T) := \sum_{s=(k-1)p^{N-n}+1}^{k\,p^{N-n}} t_s, \qquad k = 1,\dots,p^n;$$

notice that $R_{0,1}(T) = \sum_{s=1}^{p^N} t_s$ and that $R_{N,k}(T) = t_k$, $k = 1,\dots,p^N$.

This family of vector valued functions has a tree structure (p-adic tree structure) with N+1 generations, such that: $R_{0,1}(T)$ corresponds to the initial node of the tree; each $R_{n,k}(T)$, $n = 1,\dots,N-1$, corresponds to the k-th node of the n-th generation; and $R_{N,k}(T)$ is the k-th branch (or leaf) of the last generation.
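The block sums $R_{n,k}(T)$ just defined can be computed directly. The following Python sketch is illustrative only; the function name `block_sums` is hypothetical, and the list is indexed from 0, so `block_sums(t, p, n)[k-1]` plays the role of $R_{n,k}(T)$.

```python
def block_sums(t, p, n):
    """Return [R_{n,1}(T), ..., R_{n,p^n}(T)] for data t of length p**N.

    Generation n splits t into p**n consecutive blocks of equal
    length p**(N-n); each entry is the sum over one block.
    """
    blocks = p**n
    if len(t) % blocks != 0:
        raise ValueError("len(t) must be divisible by p**n")
    size = len(t) // blocks
    return [sum(t[k * size:(k + 1) * size]) for k in range(blocks)]


if __name__ == "__main__":
    t = [1, 0, 2, 3]                 # p = 2, N = 2
    for n in range(3):
        print(n, block_sums(t, 2, n))
```

Generation 0 returns the total sum, and generation N returns the data itself, in agreement with the two remarks following the definition.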
For any $n = 1,\dots,N$ and $k = 1,\dots,p^n$ we call p-adic walks of T the real numbers

$$a_{n,k}(T) := \begin{cases} 0, & R_{n-1,[k/p]}(T) = 0,\\[4pt] \dfrac{R_{n,k}(T)}{R_{n-1,[k/p]}(T)}, & R_{n-1,[k/p]}(T) \ne 0. \end{cases}$$

We call tree transform of T the map $T \to \{a_{n,k}(T) : n = 1,\dots,N;\ k = 1,\dots,p^n\}$. It is straightforward to see that this map is not linear. We examine below the properties of the tree transform of T.

Lemma 1. Let $T = \{t_1,\dots,t_{p^N}\}$. Then:
1. For all n, k, $a_{n,k}(T)$ lies in the interval [0,1].
2. All numbers $a_{n,k}(T)$ are invariant under dilation, i.e. $a_{n,k}(b\cdot T) = a_{n,k}(T)$, where b > 0 and $b\cdot T = \{bt_1,\dots,bt_{p^N}\}$.
3. For any $n = 2,\dots,N$, if $a_{n-1,s}(T) \ne 0$ for some $s = 1,\dots,p^{n-1}$, we have $\sum_{[k/p]=s} a_{n,k}(T) = 1$.
4. If $a_{n,k}(T) = 0$ for some pair n, k with n < N and $k = 1,\dots,p^n$, then $a_{n+1,s}(T) = 0$ for all s such that [s/p] = k.
5. For any $n = 1,\dots,N$, if [k/p] = k/p and $a_{n,k}(T) \ne 0$, we have $a_{n,k}(T) = 1 - \sum_{[s/p]=k/p,\ s\ne k} a_{n,s}(T)$.
6. If T and S are as in (2), then $R_{0,1}(T)$ is the cardinality of S.

The proofs of the statements of Lemma 1 are straightforward.

Proposition 1. For any integer p > 1 and any non-negative data $T = \{t_1,\dots,t_{p^N}\}$ we have

$$t_k = a_{N,k}(T)\, a_{N-1,[k/p]}(T) \cdots a_{1,[k/p^{N-1}]}(T)\cdot R_{0,1}(T), \qquad k = 1,\dots,p^N.$$

Thus the tree transform is invertible and we write $T \leftrightarrow \{a_{n,k}(T) : n = 1,\dots,N;\ k = 1,\dots,p^n\}$. Moreover, to reconstruct T one requires the $p^N-1$ walks $a_{n,k}(T)$, $n = 1,\dots,N$, with k such that $[k/p] \ne k/p$, together with the sum $R_{0,1}(T)$.

Proof. If $t_k = 0$ then $a_{N,k} = 0$, and so the equation is true. Suppose that $t_k \ne 0$. Then from the definition of the $a_{n,k}$ it is easy to see that

$$t_k = \frac{R_{N,k}}{R_{N-1,[k/p]}}\cdot\frac{R_{N-1,[k/p]}}{R_{N-2,[k/p^2]}}\cdots\frac{R_{1,[k/p^{N-1}]}}{R_{0,[k/p^N]}}\cdot R_{0,1} = a_{N,k}\,a_{N-1,[k/p]}\cdots a_{1,[k/p^{N-1}]}\cdot R_{0,1};$$

in fact we observe that $t_k = R_{N,k}$ and $R_{0,1} = R_{0,[k/p^N]}$. Finally, by item 5 of Lemma 1, the reconstruction formula requires only the walks $a_{n,k}$ such that $[k/p] \ne k/p$; there are $p^N-1$ of these in total.

3. Random walks on compound trees

Now we consider a non-negative data series $T = \{t_1,\dots,t_M\}$ such that $M = p_1^{N_1}\cdots p_r^{N_r}$, $N_i \ge 1$, $i = 1,\dots,r$, where $p_1 < p_2 < \dots < p_r$ are primes. Let $N = N_1+\dots+N_r$ and let v(n), $n = 1,\dots,N$, be the n-th element of the sequence

$$p_1,\ \dots,\ p_1^{N_1},\ p_1^{N_1}p_2,\ \dots,\ p_1^{N_1}p_2^{N_2},\ \dots,\ p_1^{N_1}p_2^{N_2}\cdots p_{r-1}^{N_{r-1}}p_r,\ \dots,\ M;$$

assume that v(0) = 1. For $n = 1,2,\dots,N$ we define

$$R_{n,k}(T) := \sum_{s=(k-1)\frac{M}{v(n)}+1}^{k\frac{M}{v(n)}} t_s, \qquad k = 1,\dots,v(n);$$

notice that $R_{N,k}(T) = t_k$, $k = 1,\dots,v(N)$, and set $R_{0,1}(T) := \sum_{s=1}^{M} t_s$.

As in section 2 we denote

$$a_{n,k}(T) := \begin{cases} 0, & R_{n-1,[k\,v(n-1)/v(n)]}(T) = 0,\\[4pt] \dfrac{R_{n,k}(T)}{R_{n-1,[k\,v(n-1)/v(n)]}(T)}, & R_{n-1,[k\,v(n-1)/v(n)]}(T) \ne 0, \end{cases}$$

where $n = 1,\dots,N$ and $k = 1,\dots,v(n)$; here $[k\,v(n-1)/v(n)]$ is the index of the parent of node k of generation n (for a leaf, n = N, this is $[k\,v(N-1)/M]$). Clearly we get random walks on this compound tree with N generations, such that for $N_i$ generations each node has $p_i$ branches ($i = 1,\dots,r$). It is easy to check the inversion formula

$$t_k = \frac{R_{N,k}}{R_{N-1,[kv(N-1)/M]}}\cdot\frac{R_{N-1,[kv(N-1)/M]}}{R_{N-2,[kv(N-2)/M]}}\cdots\frac{R_{1,[kv(1)/M]}}{R_{0,1}}\cdot R_{0,1} = a_{N,k}\,a_{N-1,[kv(N-1)/M]}\cdots a_{1,[kv(1)/M]}\cdot R_{0,1},$$

and the analogue of Lemma 1 for the compound tree transform.

4. p-adic Haar-Riesz Products

In this section we express any non-negative continuous probability measure $\mu$ on [0,1] as a random walk on an infinite p-adic tree and, furthermore, we write it as a weak* limit of p-adic Haar polynomials. For the theory of measures we refer to [S]; for Haar-Riesz products see [K] and [BK2].

We call p-adic Haar functions the step functions defined as follows:

$$h^j_{n,k}(x) := \begin{cases} 0, & [xp^{\,n-1}] \ne k,\\ 1, & [xp^{\,n-1}] = k \text{ and } (\mathrm{mod}([xp^n]-1,p) \ne j-1 \text{ or } j = p),\\ -(p-1), & [xp^{\,n-1}] = k,\ \mathrm{mod}([xp^n]-1,p) = j-1 \text{ and } j \ne p, \end{cases}$$

where $x \in [0,1]$, $n = 1,2,\dots$, $k = 1,\dots,p^{\,n-1}$, $j = 1,\dots,p$, and mod(x,p) is x modulo p.

Lemma 3. (1) For p = 2, $\{h^1_{n,k}(x),\ n = 1,2,\dots,\ k = 1,\dots,p^{\,n-1}\}$ is the usual Haar system. (2) For any p = 2,3,… and for fixed $j = 1,\dots,p-1$, the system $\{h^j_{n,k}(x),\ n = 1,2,\dots,\ k = 1,\dots,p^{\,n-1}\}$ is orthogonal (in the $L^2$ sense, i.e.
$\int_0^1 h^j_{n,k}(x)\,h^j_{m,t}(x)\,dx = 0$ for $k \ne t$ or $n \ne m$). (3) For any p = 2,3,… and $j, i = 1,\dots,p-1$,

$$\int_0^1 h^j_{n,k}(x)\,h^i_{n,k}(x)\,dx = 0, \qquad i \ne j,\ n = 1,2,\dots,\ k = 1,\dots,p^{\,n-1}.$$

The proof can easily be seen from the graphs of the corresponding functions.

Definition 1. For any p > 1 and any non-negative continuous probability measure $\mu$ on [0,1], we call walks of $\mu$ on the infinite p-adic tree the sequence $\{a_{n,k}(\mu),\ n = 0,1,2,\dots;\ k = 1,\dots,p^n\}$ determined as follows. For any integer N > 1, if

$$T_N = \{t_1, t_2, \dots, t_{p^N}\}, \quad\text{where } t_k := \int_{(k-1)/p^N}^{k/p^N} d\mu, \quad k = 1,\dots,p^N,\ N = 1,2,\dots, \qquad (3)$$

we write $a_{n,k}(\mu) := a_{n,k}(T_N)$, $n = 0,1,2,\dots,N$; $k = 1,\dots,p^n$.

Definition 2. We call p-adic Haar-Riesz product a sequence of p-adic Haar polynomials

$$\prod_{n=1}^{N}\Big(\sum_{k=1}^{p^{\,n-1}}\sum_{j=1}^{p} c^j_{n,k}\,h^j_{n,k}(x)\Big), \qquad N = 1,2,\dots.$$

We say that the p-adic Haar-Riesz product converges weak* to a measure $\mu$ on [0,1] if for any continuous function f on [0,1] we have

$$\int_0^1 f(x)\,d\mu(x) = \lim_{N\to\infty}\int_0^1 f(x)\prod_{n=1}^{N}\Big(\sum_{k=1}^{p^{\,n-1}}\sum_{j=1}^{p} c^j_{n,k}\,h^j_{n,k}(x)\Big)dx.$$

Proposition 2. Any non-negative continuous probability measure $\mu$ on [0,1] is the weak* limit of a p-adic Haar-Riesz polynomial

$$\prod_{n=1}^{N}\Big(\sum_{k=1}^{p^{\,n-1}}\sum_{j=1}^{p} c^j_{n,k}\,h^j_{n,k}(x)\Big), \qquad N = 1,2,\dots, \qquad (4)$$

for any p > 1, where, writing $k_j := p(k-1)+j$ for the j-th node of generation n below node k of generation n-1,

$$c^j_{n,k} = \frac{1}{p}\Big(1 - a_{n,k_j}(\mu) - \sum_{s=1}^{p-1} a_{n,k_s}(\mu)\Big),\quad j = 1,\dots,p-1; \qquad c^p_{n,k} = \frac{1}{p}; \qquad (5)$$

here $\{a_{n,k}(\mu),\ n = 0,1,2,\dots;\ k = 1,\dots,p^n\}$ are the p-adic infinite walks of $\mu$.

Proof. Let p > 1 and let $T_N = \{t_1, t_2, \dots, t_{p^N}\}$ be as in (3), N = 1,2,…. If for any $x \in \big[\tfrac{s-1}{p^N}, \tfrac{s}{p^N}\big)$,

$$\prod_{n=1}^{N}\Big(\sum_{k=1}^{p^{\,n-1}}\sum_{j=1}^{p} c^j_{n,k}\,h^j_{n,k}(x)\Big) = t_s, \qquad s = 1,\dots,p^N, \qquad (6)$$

then one can use standard arguments of mathematical analysis to get that the sequence (4) converges weak* to $\mu$.
To determine the coefficients (5) of the Haar-Riesz product in terms of the p-adic random walks, we consider the following. Since the measure $\mu$ is a probability measure, from Proposition 1 we have

$$t_s = a_{N,s}\,a_{N-1,[s/p]}\cdots a_{1,[s/p^{N-1}]}, \qquad s = 1,\dots,p^N,$$

thus it suffices to determine the coefficients from the equations

$$\sum_{j=1}^{p} c^j_{n,k}\,h^j_{n,k}(x) = a_{n,m}(\mu), \qquad m = p(k-1) + \mathrm{mod}([xp^n]-1,p) + 1,$$

where $n = 1,2,\dots$ and $k = 1,\dots,p^{\,n-1}$. In order to obtain (5) one can easily solve the following system:

$$\begin{aligned}
-(p-1)c^1_{n,k} + c^2_{n,k} + \dots + c^{p-1}_{n,k} + c^p_{n,k} &= a_{n,pk-p+1}(\mu)\\
c^1_{n,k} - (p-1)c^2_{n,k} + \dots + c^{p-1}_{n,k} + c^p_{n,k} &= a_{n,pk-p+2}(\mu)\\
&\ \ \vdots\\
c^1_{n,k} + c^2_{n,k} + \dots - (p-1)c^{p-1}_{n,k} + c^p_{n,k} &= a_{n,pk-1}(\mu)\\
c^1_{n,k} + c^2_{n,k} + \dots + c^{p-1}_{n,k} + c^p_{n,k} &= 1 - \sum_{s=1}^{p-1} a_{n,pk-s}(\mu)
\end{aligned}$$

5. Immune type computational applications

In this section we argue that the tree transform of a set of strings is a natural mathematical tool for immune-type pattern recognition and computational applications.

(a) Detectors determined by zero walks on the tree

We consider a non-empty self set S with cardinality c and its p-adic tree transform. Throughout this section we denote by b(r) the number of zero walks in generation r, $1 \le r \le N$; clearly $0 \le b(r) < p^r$. One can define the following set of detectors. If $b(1) \ne 0$, let (without loss of generality) $a_{1,1}(S) = \dots = a_{1,b(1)}(S) = 0$; then any string in base p with first digit from the set $I(1) = \{\varepsilon_1,\dots,\varepsilon_{b(1)}\}$ of the corresponding digits is outside of S. It is trivial to see that, once the strings detected by I(1) are excluded, a random string in base p of length N is in S with probability $c/((p-b(1))\,p^{N-1})$.

In general, if $b(k) \ne 0$, 1 < k, then the $b(k) - b(k-1)p$ zero walks that do not descend from zero walks of generation k-1 provide "new" detectors for the k-th generation of the p-adic tree. Denote by I(k) the set of new detectors; I(k) consists of strings of length k, such that any string whose first k digits equal a string in I(k) is outside of S.
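The zero walks that define these detectors are easy to count numerically. The sketch below is a minimal illustration, not the authors' implementation: the names `tree_transform`, `inverse_transform` and `zero_walk_counts` are hypothetical, indices are 0-based (so the parent of node k is `k // p`), and exact fractions are used to avoid rounding. It uses the self set S = {1, ..., 30} on a 5-adic tree with 4 generations as test data, and also checks the reconstruction formula of Proposition 1.

```python
from fractions import Fraction


def tree_transform(t, p):
    """Return (R_{0,1}, walks) with walks[n][k-1] = a_{n,k}(T), n = 1..N."""
    N = 0
    while p**N < len(t):
        N += 1
    if p**N != len(t):
        raise ValueError("len(t) must be a power of p")
    sums = [None] * (N + 1)          # sums[n][k-1] = R_{n,k}(T)
    sums[N] = list(t)
    for n in range(N - 1, -1, -1):
        child = sums[n + 1]
        sums[n] = [sum(child[p * k:p * (k + 1)]) for k in range(p**n)]
    walks = {}
    for n in range(1, N + 1):
        walks[n] = [
            Fraction(sums[n][k], sums[n - 1][k // p]) if sums[n - 1][k // p] else Fraction(0)
            for k in range(p**n)
        ]
    return sums[0][0], walks


def inverse_transform(total, walks, p):
    """Proposition 1: t_k is the product of the walks on the path to leaf k, times R_{0,1}."""
    N = len(walks)
    data = []
    for k in range(p**N):
        value, idx = Fraction(total), k
        for n in range(N, 0, -1):
            value *= walks[n][idx]
            idx //= p
        data.append(value)
    return data


def zero_walk_counts(walks):
    """Return [b(1), ..., b(N)], the number of zero walks per generation."""
    return [sum(1 for a in walks[n] if a == 0) for n in sorted(walks)]


if __name__ == "__main__":
    p, N, c = 5, 4, 30
    t = [1] * c + [0] * (p**N - c)   # t_1 = ... = t_30 = 1, all other entries 0
    total, walks = tree_transform(t, p)
    b = zero_walk_counts(walks)
    print(b)
    # share of self strings among those not excluded by detectors up to generation k
    for k in (1, 2):
        print(k, Fraction(c, (p**k - b[k - 1]) * p**(N - k)))
    assert inverse_transform(total, walks, p) == t
```

The number of new detectors in generation k is then `b[k-1] - p * b[k-2]`, the zero walks whose parent is not already a zero walk.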
The set of detectors $\bigcup_{n=1}^{k} I(n)$ provides detection for the self set with probability $c/((p^k-b(k))\,p^{N-k})$.

Example 1. Consider the set S = {1,…,30} in a 5-adic tree with 4 generations. It is not difficult to see that b(1) = 4, b(2) = 23, b(3) = 119. With the b(1) digits we can check 500 (= 4·125) non-self elements. In the second generation we have b(2) - b(1)·5 = 3 new zero walks. With these 3 new detectors we can check another 75 (= 3·25) non-self elements. Finally, in the third generation we have 4 (= 119 - 5·23) new zero walks, and so we can check the remaining 20 (= 4·5) non-self elements. Note that the probability that a random string not excluded by the detectors is in S is 30/125 if one uses the detectors of the first generation, and 3/5 if one uses those of the first and second generations.

(b) The problem of detecting periodicity

Detecting the periodicity of data is a central problem in signal processing and in mathematical analysis. Since our tree transform is invertible, it contains all the information of the data, and so one can detect the periodicity on the transform. Next we examine some detectors for the periodicity of some simple periodic self sets. We shall present our results on the general problem in future publications.

Let S be a self set as in (2) having periodicity r, i.e. $t_{n+r} = t_n$, $n = 1,2,\dots,p^N-r$, and suppose that the cardinality of S is $c < p^N$. We observe that the numbers of zero walks b(n), $n = 1,\dots,N$, depend on the cardinality c, on the number of elements of S in a period (i.e. the number of elements of S in the segment [1, r]) and on the distribution of the elements of S within each period. Now we consider a periodic self set with two frequencies r and q as below:

$$T = \Big\{t_n :\ t_n = \begin{cases} 1, & n \equiv i \bmod r \ \text{ or }\ n \equiv j \bmod q,\\ 0, & \text{otherwise,}\end{cases}\quad n = 1,\dots,p^N\Big\}, \qquad i < r,\ j < q.$$

To find the frequencies r and q, we use the following formula:

$$b(n) = p^n - \Big[\frac{p^n}{p^{\,n-N}\,r}\Big] - \Big[\frac{p^n}{p^{\,n-N}\,q}\Big], \qquad n = 2,3,\dots,N.$$

Example 2. Let p = 2, N = 6, r = 17, q = 23, i = 1 and j = 8.
In this case it is easy to see that b(6) = 57, b(5) = 25, b(4) = 9 and b(3) = 3. Note that the last formula is satisfied exactly for n = 4, 5, 6. Moreover, using Mathematica or MatLab one can easily recover r = 17 and q = 23 from a table of pairs (r,q) satisfying the equations above for n = 6, 5, 4.

(c) Detecting self-similar sets

Since a great variety of self-similar fractal sets (see [E]) is unclassified, the problem of detecting similarities by a standard method seems intractable. In this application we examine detectors for fractal-type sets of constant ratio. Roughly speaking, whenever we have a self-similar set, its walks on a p-adic tree provide some indications of similarities. To locate the similarities in this case, we have to examine the integer digits of the set in a suitable base.

In fractal theory we call thin sets with constant ratio r, 0 < r < 1, the Cantor-type sets on [0,1] constructed by the following iterated process. From the initial set $J_0 = [0,1]$ we eliminate an open interval (or intervals) of total length r, and we denote by $J_1$ the remaining collection of subintervals of $J_0$; clearly $(|J_0| - |J_1|)/|J_0| = r$. From each subinterval of $J_1$ we eliminate a subinterval of ratio r to obtain a set $J_2$ such that $(|J_1| - |J_2|)/|J_1| = r$. With this process we construct a sequence $\{J_n,\ n = 0,1,\dots\}$ of closed sets whose intersection is an infinite set; moreover, note that parts of $J_{n+1}$ are dilations of $J_n$. As an example, recall that the usual Cantor set is constructed by eliminating the middle third of each of its subintervals, i.e. r = 1/3, $J_1 = [0,1/3] \cup [2/3,1]$, $J_2 = [0,1/9] \cup [2/9,1/3] \cup [2/3,7/9] \cup [8/9,1]$, …. There are several other ways to present the middle-third Cantor set, e.g. it is the set of all x in [0,1] such that $x = \sum_{j=1}^{\infty} \varepsilon_j 3^{-j}$, $\varepsilon_j \in \{0,2\}$, $j = 1,2,\dots$, or the set of all infinite strings $\{\varepsilon_1, \varepsilon_2, \dots, \varepsilon_j, \dots\}$, $\varepsilon_j \ne 1$, $j = 1,2,\dots$, in base 3.
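The equivalence between the two descriptions of the Cantor set (middle-third removal and digits different from 1 in base 3) can be checked on a finite integer analogue. The sketch below is an illustration under stated assumptions: it works on the integers 0, ..., 3^N - 1 rather than on [0,1], and the function names are hypothetical.

```python
def cantor_by_digits(N):
    """Integers in [0, 3**N) whose N base-3 digits all lie in {0, 2}."""
    return [x for x in range(3**N)
            if all((x // 3**i) % 3 != 1 for i in range(N))]


def cantor_by_removal(N):
    """Same set, built by removing the middle third of each integer interval N times."""
    intervals = [(0, 3**N)]                      # half-open intervals [lo, hi)
    for _ in range(N):
        refined = []
        for lo, hi in intervals:
            third = (hi - lo) // 3
            refined.append((lo, lo + third))     # keep the left third
            refined.append((hi - third, hi))     # keep the right third
        intervals = refined
    return sorted(lo for lo, hi in intervals)    # every interval now has length 1


if __name__ == "__main__":
    print(cantor_by_digits(2))
    print(cantor_by_removal(2))
```

After N removal steps each surviving interval contains a single integer, and there are 2^N of them, which is the finite counterpart of the constant ratio r = 1/3.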
On our set of integers $[1, p^N]$ we call self sets with constant ratio r, 0 < r < 1, the sets constructed by the following finite iterated process. From the initial set $J_0 = [1, p^N]$ we eliminate a set (or sets) of successive integers (called an integer interval) of cardinality $\langle r\,p^N\rangle$, and we denote by $J_1$ the remaining collection of integer subintervals of $J_0$. From each integer subinterval of $J_1$ we eliminate integer subintervals of ratio r to obtain a set $J_2$. With this process we construct a sequence $\{J_n,\ n = 0,1,\dots,N\}$ of collections of integer intervals. We denote by $S = J_N$ the self set produced by this process.

We shall examine whether a set S of integers (or strings) on a p-adic tree is a set of constant ratio. Moreover, we shall check possible self-similarities in order to determine S. For the notion of similarity and self-similarity on the (topological) space of (infinite) strings on an alphabet of p letters, see [E]. Our approach is based on the following three observations for the case of a self set S with constant ratio r.

Observation 1. The number of distinct walks on any p-adic tree is restricted (see the examples below). This observation provides an indication of the fractal-type structure of S.

Observation 2. To detect the similarities of S one has to examine the integer digits of its elements in a suitable base (usually the denominator of the rational number r).

Observation 3. There is a natural relation between the ratio r and the numbers of zero walks b(n), $n = 1,\dots,N$.

The equations. The following relations are satisfied:

$$\begin{aligned}
b(1) &= \langle pr\rangle,\\
b(2) &= \langle p^2 r\rangle - b(1)\langle pr\rangle + b(1)\,p,\\
b(n) &= \langle p^n r\rangle - b(n-1)\langle pr\rangle + b(n-1)\,p,
\end{aligned}$$

where n = 3,4,…. Notice that these equations, with r as the variable, are step functions, and it is easy to find an approximate solution for r (e.g. by exploring their graphs).
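One reading of these relations is the recursion b(1) = ⟨pr⟩, b(n) = ⟨p^n r⟩ - b(n-1)⟨pr⟩ + b(n-1)p: the zero walks of generation n are the p children of each zero walk of generation n-1, plus a fraction r of the children of each nonzero node. The sketch below checks this reading on two digit-defined sets. It is an illustration under assumptions: ⟨x⟩ is taken to be the upper integer part, integers are indexed from 0, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import ceil


def digit_defined_set(p, N, allowed):
    """Integers in [0, p**N) whose N base-p digits all belong to `allowed`."""
    return [sum(d * p**i for i, d in enumerate(digits))
            for digits in product(allowed, repeat=N)]


def zero_walk_counts(S, p, N):
    """b(n) = p**n minus the number of generation-n nodes that contain an element of S."""
    return [p**n - len({x // p**(N - n) for x in S}) for n in range(1, N + 1)]


def predicted_counts(p, r, N):
    """b(n) from the recursion, with <x> read as the upper integer part (assumption)."""
    u = ceil(p * r)
    b = [u]
    for n in range(2, N + 1):
        b.append(ceil(p**n * r) - b[-1] * u + b[-1] * p)
    return b


if __name__ == "__main__":
    cantor = digit_defined_set(3, 6, (0, 2))          # ratio r = 1/3
    print(zero_walk_counts(cantor, 3, 6), predicted_counts(3, Fraction(1, 3), 6))
    base5 = digit_defined_set(5, 4, (0, 2, 4))        # ratio r = 2/5
    print(zero_walk_counts(base5, 5, 4), predicted_counts(5, Fraction(2, 5), 4))
```

Given measured counts b(n), one can scan candidate ratios r and keep those for which `predicted_counts` reproduces the data, which is the numerical counterpart of exploring the graphs of the step functions.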
According to these observations and the equations, given a set S and its walks on a p-adic tree, we check the following:
- whether the p-adic analysis of the self set has a restricted number of distinct random walks;
- whether, for some ratio r, the numbers b(n) satisfy the equations of observation 3. To get an approximate estimate of r, one can use several simple numerical methods.

Provided that r = k/m, where k, m are natural numbers, it is natural to express each number in S by its integer digit expansion in base m. Thus one has to check whether some integer digits of S in base m do not occur; e.g. in the case of the usual Cantor set all integer digits in base 3 are $\varepsilon_i \in \{0,2\}$, $i = 1,2,\dots$.

Next we define some self sets of constant ratio. We shall see that, using observations 1-3 and the equations above, one can check their fractal-type structure.

Example 3. The usual 3-adic Cantor set on a 3-adic tree. Let p = 3 and N = 6, so that $p^N = 729$; let S correspond to the usual 3-adic Cantor set, i.e. $S = \{n \leftrightarrow (\varepsilon_1,\dots,\varepsilon_6) : \varepsilon_i \ne 1,\ i = 1,\dots,6\}$, where $\varepsilon_i \in \{0,1,2\}$, $i = 1,\dots,6$, are the integer digits in base 3. We observe that all nonzero walks are $a_{n,k} = 1/2$, $n = 1,\dots,6$, $k = 1,\dots,3^n$. The numbers of zero walks are b(1) = 1, b(2) = 5, b(3) = 19, b(4) = 65, b(5) = 211 and b(6) = 665. The equations are satisfied with r = 1/3. Thus the natural base for the integer digit expansion of S is 3. It is easy to detect that all strings $(\varepsilon_1,\dots,\varepsilon_6)$ of S satisfy the condition $\varepsilon_i \ne 1$, $i = 1,\dots,6$; moreover, the similarities of S are the shift operators (see [E]).

Example 4. The usual Cantor set on a 5-adic tree. Let p = 5 and N = 4, so that $p^N = 625$; let $S_1 = S \cap \{1,2,\dots,625\}$, where S is as in example 3. We observe that all nonzero walks are $a_{4,k} = 1/2$ or 1, $k = 1,\dots,5^4$. The numbers of zero walks are b(1) = 1, b(2) = 13, b(3) = 94 and b(4) = 577. The equation for n = 3 is satisfied with $r \approx 1/3$. Thus the natural base of the integer digit expansion is 3, and we follow example 3 for further analysis.

Example 5.
A set with constant ratio on a 5-adic tree. Let p = 5 and N = 4, so that $p^N = 625$; let S correspond to a thin set with constant ratio such that $S = \{n : \varepsilon_i(n) \ne 1,\ \varepsilon_i(n) \ne 3,\ i = 1,\dots,4\}$, where $\varepsilon_i(n) \in \{0,1,2,3,4\}$, $i = 1,\dots,4$, are the integer digits of n = 1,…,625 in base 5. All nonzero random walks are 1/3. We have b(1) = 2, b(2) = 16, b(3) = 98 and b(4) = 544. The equations are satisfied with r = 2/5. We expand S in base 5 and easily determine its detectors and similarities.

(d) Entropy of information for immune type applications

Let S be a collection of strings of length N in an alphabet of p letters. As we explained in the introduction, a significant immune-type problem is the measurement of the diversity of S (within the set of all strings of length N). A standard method for this measurement is the usual information entropy formula (see below and [B]). As we shall see, this formula does not work for sets S described by an alphabet of $p_1$ letters, where $p_1 < p$ and $p_1$ does not divide p. For this reason we propose an entropy formula similar to the formulae used in previous works ([BK1], [BK2] and [BKM]) for fractals described by non-homogeneous Markovian processes.

Note that in a recent work [FMT] on immune-type algorithms, the authors proposed a method for estimating the diversity of a collection of antibodies S written in an alphabet of p letters and having length N. This method is based on the usual information entropy formula H(S), defined as follows:

$$H(S) := \frac{1}{N}\sum_{m=1}^{N} H_m(S), \qquad\text{where } H_m(S) := -\sum_{j=0}^{p-1} p_{m,j}\log_p(p_{m,j}),\quad m = 1,\dots,N,$$

and $p_{m,j}$ is the probability of occurrence of the digit j in generation m of the strings of S. Note that, in terms of our tree transform analysis of the data S, $p_{m,j}$ is the number of nonzero walks $a_{m,k}$ with $k \equiv j \bmod p$, divided by $p^m - b(m)$, where b(m) is as above, $j = 1,\dots,p$; $m = 1,\dots,N$.
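The formula H(S) can be evaluated directly from digit frequencies. The sketch below is illustrative: it estimates $p_{m,j}$ as the relative frequency of the digit j in position m among the elements of S (written as integers in base p), the function name `digit_entropy` is hypothetical, and the test data (integers with six base-3 digits in {0, 2}) are chosen here for illustration.

```python
from collections import Counter
from math import log


def digit_entropy(S, p, N):
    """H(S) = (1/N) * sum_m H_m(S), with H_m the base-p entropy of digit m over S."""
    S = list(S)
    c = len(S)
    total = 0.0
    for m in range(N):
        counts = Counter((x // p**m) % p for x in S)     # digit m of every element
        total += -sum((k / c) * log(k / c, p) for k in counts.values())
    return total / N


if __name__ == "__main__":
    cantor = [x for x in range(3**6)
              if all((x // 3**i) % 3 != 1 for i in range(6))]
    print(digit_entropy(cantor, 3, 6), log(2) / log(3))
    print(digit_entropy(range(3**3), 3, 3))              # all strings: maximal diversity
```

Digits that never occur contribute nothing to the sum, in line with the convention 0 log 0 = 0.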
This formula works perfectly for the Cantor set S of example 3, where p = 3, $p_{m,j} = 1/2$ for j = 0, 2 and $p_{m,1} = 0$, $m = 1,\dots,N$, so that H(S) = log(2)/log(3); but it does not work if we consider a part of S (as in example 4) in a p-adic tree such that p is not a multiple of 3. In this case we get $p_{m,j} \approx 1/p$ (for large m) for all $j = 0,\dots,p-1$, and so $H_m(S) \approx 1$. See example 4, where p = 5 and $p_{4,j} = 9/47$ or $10/47$, $j = 0,\dots,4$, i.e. almost 1/5. One can verify this argument for several Cantor-type sets in base $p_1$ embedded in p-adic trees, where $p_1$ does not divide p.

Therefore the usual entropy formula can measure the diversity of self sets S with ratio k/p, provided that S is examined through its p-adic digits. But when S is expressed in a different base, this formula is not useful, because H(S) is approximately 1 for any S.

Following previous works [BK1], [BK2] and [BKM] on the entropy of non-homogeneous Markovian processes, we propose a new entropy formula for a collection of strings as follows. Let S be a collection of strings of length M, written in an alphabet of p letters, p > 1. As in (1) we express S as a collection of integers. Let N be an integer such that the maximum element of S is less than $2^N$; we may consider $T = \{t_n,\ n = 0,\dots,2^N-1 :\ t_n = 1 \text{ if } n \in S \text{ and } t_n = 0 \text{ otherwise}\}$ and the tree transform $\{a_{n,k}(T),\ n = 1,\dots,N;\ k = 1,\dots,2^n\}$.

We define the entropy $H_2(S)$ of S in the frame of the binary tree transform by

$$H_2(S) := -\frac{1}{c\,N\log(2)}\sum_{m=1}^{2^N}\sum_{n=1}^{N}\Big(a_{n,h(m,n)}(T)\log a_{n,h(m,n)}(T) + \big(1-a_{n,h(m,n)}(T)\big)\log\big(1-a_{n,h(m,n)}(T)\big)\Big),$$

where c is the cardinality of S and $h(m,n) := 2\big[\tfrac{m}{2^{N-n+1}}\big] - 1$, with the assumption that 0 log(0) = 0.

Examples 6. Since the ratio of example 5 is 2/5, it is well known that its Hausdorff dimension or entropy is log(3)/log(5) ≈ 0.682606. We apply the entropy $H_2(S)$ in the frame of a binary tree and we get $H_2(S) \approx 0.667259$, i.e. an absolute error of less than 0.02 (about 2%).
Since the ratio of the usual Cantor set (example 3) is 1/3, it is well known that its entropy is log(2)/log(3) ≈ 0.63093. The entropy in the frame of a binary tree is $H_2(S) \approx 0.61885$, i.e. we have an error of less than 2%.

The one-dimensional analogue of the Sierpinski triangle is expressed by the digits in a 4-adic tree (see [E]) as follows: $S = \{n : \varepsilon_i(n) \ne 3,\ \varepsilon_i(n) \in \{0,1,2,3\},\ i = 1,2,\dots\}$. Its ratio is 1/4 and its entropy is log(3)/log(4) ≈ 0.792481. It is remarkable that our estimate in the frame of the binary tree gives the same number, i.e. $H_2(S) = 0.792481$.

Note that in the forthcoming work [BK3] we give a variety of entropy formulas for pattern recognition of chaotic data.

Other types of problems

(e) Non-linear denoising filter. Let $T = \{t_n,\ n = 1,\dots,p^N\}$ be non-negative data, and let $W = \{w_n,\ n = 1,\dots,p^N\}$ be white noise added to T. If the variability of T is c, i.e. $\max\{|t_n - t_{n+1}| : n = 1,\dots,p^N-1\} = c$, then the walks of the tree transform of T are related to c. If we call self data all data with variability less than c, then replacing some of the walks of the tree transform of T+W by other walks could denoise T+W, making it self data. The authors are preparing a work on signal denoising using the tree transform.

(f) Edge detection of images. As an application of local variability in two dimensions, one can consider an image of $3^n \times 3^n$ pixels as a 3-adic tree. Using the multiresolution structure of the tree transform, one can easily perform edge detection on images by erasing the internal square of the nine squares appearing in this 3-adic division of the image.

REFERENCES

[B] P. Billingsley. "Ergodic Theory and Information". Wiley, New York, 1965.
[BK1] A. Bisbas and C. Karanikas. "On the Hausdorff dimension of Rademacher Riesz products". Monatshefte für Mathematik 110, 15-21 (1990).
[BK2] A. Bisbas and C. Karanikas. "Dimension and entropy of a non ergodic Markovian process and its relation to Rademacher Riesz products". Monatshefte für Mathematik 118, 21-32 (1994).
[BK3] A. Bisbas and C. Karanikas. "Dimension and entropy for pattern recognition of chaotic data" (in preparation).
[BKM] A. Bisbas, C. Karanikas and W. Moran. "Tameness for the distribution of sums of Markov random variables". Math. Proc. Cambridge Phil. Soc. 121 (1), 115-128 (1997).
[BKP] A. Bisbas, C. Karanikas and G. Proios. "On the distribution of digits in dyadic expansions". Results Math. (1998), no. 3-4, 330-341.
[E] G.A. Edgar. "Measure, Topology and Fractal Geometry". Springer-Verlag, New York, 1990.
[DA] D. Dasgupta and N. Attoh-Okine. "Immunity-Based Systems: A Survey". In Proceedings of the IEEE Conference on Systems, Man and Cybernetics, Orlando, 1997.
[DFH] P. D'haeseleer, S. Forrest and P. Helman. "An immunological approach to change detection: algorithms, analysis, and implications". In Proceedings of the IEEE Symposium on Research in Security and Privacy, Oakland, CA, May 1996.
[FKP] R. Fefferman, C. Kenig and J. Pipher. "The theory of weights and the Dirichlet problem for elliptic equations". Annals of Math. 134 (1991), 65-124.
[FMT] T. Fukuda, K. Mori and M. Tsukiyama. "Parallel Search for Multi-Modal Function Optimization with Diversity and Learning of Immune Algorithm", pp. 210-219. In Artificial Immune Systems and Their Applications, D. Dasgupta (Editor), Springer, 1998.
[K] C. Karanikas. "The Hausdorff dimension of fractals with very weak self-similarity". Chaos Solitons Fractals 11 (2000), no. 1-3, 275-280.
[S] A. Shiryayev. "Probability". Springer, Berlin-Heidelberg-New York, 1984.