A discrete transform for pattern recognition of immune type based
on the tree structure of data *1
By
C. Karanikas and G. Proios
Department of Mathematics, Aristotle University of Thessaloniki, Thessaloniki 54006, Greece
Department of Maritime and Enterprising Services, Aegean University, Chios, Greece
Abstract
It is shown, by means of an invertible discrete transform, that any finite sequence or any
collection of strings of any length can be represented as a random walk on trees. More
generally, any positive continuous probability measure can be written as a random
walk on trees and as an infinite product of wavelet-type p-adic Haar polynomials, with
coefficients related to the random walk. These transforms provide the mathematical
background needed for coding any discrete information and for exploring the
local variability and diversity of the information. Relying on the underlying
computational algorithms, and with several examples and applications, we propose that
these transforms can be used efficiently for pattern recognition of immune type. In
other words, we propose a mathematical platform for detecting self and non-self
strings over any alphabet, based on negative selection algorithms, for scouting the
periodicity and self-similarity of data, and for measuring the diversity of chaotic strings
with fractal dimension methods. In particular, we reliably estimate the entropy and the
ratio of chaotic data with self-similarity.
1. Introduction
Recall that any collection S of strings of length N written in an alphabet $\{a_1,\dots,a_p\}$
of p letters (p>1), with a correspondence $a_i \mapsto p-i$, $i=1,\dots,p$, can be considered as
a set of integers
$$x = \sum_{i=1}^{N} \varepsilon_i\, p^{\,i-1}, \tag{1}$$
where $\varepsilon_i \in \{0,\dots,p-1\}$, $i=1,\dots,N$; clearly x is less than $p^N$. Similarly, any
collection S of integers in $[0, p^N-1]$ can be written as a collection of strings expressed by
their digits $\varepsilon_i \in \{0,\dots,p-1\}$, $i=1,\dots,N$, in the base p, as in (1). In both
cases S can be considered to be a set of $p^N$ observations having value 1 on any
element of S and 0 otherwise. In other terms, S can be considered as a
sequence of $p^N$ elements such that:
$$T = \{t_n,\ n = 0,\dots,p^N-1 :\ t_n = 1 \text{ if } n \in S \text{ and } t_n = 0 \text{ otherwise}\}. \tag{2}$$
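The encoding in (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `string_to_int` and `indicator_sequence` are hypothetical, the letter a_i is assumed to receive the digit p - i, and the first letter of a string is assumed to be the digit of weight p^0.

```python
def string_to_int(s, alphabet):
    """Map a string to the integer x of (1); the letter a_i gets the digit p - i."""
    p = len(alphabet)
    digit = {letter: p - i for i, letter in enumerate(alphabet, start=1)}
    # Assumption: the first letter of the string is epsilon_1 and weights p**0.
    return sum(digit[ch] * p ** pos for pos, ch in enumerate(s))


def indicator_sequence(strings, alphabet):
    """Return T of (2): a list of length p**N with t_n = 1 exactly when n is in S."""
    strings = list(strings)
    p, N = len(alphabet), len(strings[0])
    T = [0] * p ** N
    for s in strings:
        T[string_to_int(s, alphabet)] = 1
    return T


alphabet = "abc"                 # hypothetical 3-letter alphabet, so p = 3
S = ["abc", "ccc", "aaa"]        # a small self set of strings of length N = 3
codes = [string_to_int(s, alphabet) for s in S]
T = indicator_sequence(S, alphabet)
```

Here `T` has 27 entries, one per possible string, and exactly three of them equal 1.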
If we call S the self set and its complement the non-self set, the main immune-type pattern
recognition problem is to recognize whether a string of length N written in an alphabet
of p letters is self or non-self. In addition, we have the problem of determining detectors
for the possible periodicity or self-similarity of S, and moreover of measuring its
diversity, in order to enlarge S or replace it with another collection of strings having
different diversity. Recall that in immunology the effectiveness of a set of antibodies
against a collection of viruses depends on the diversity of the antibodies. Since antibodies
can be represented as a chaotic set of strings, a natural measure of their diversity is the
information entropy.
*1 Research supported by the EU Project IST-2000-26016 IMCOMP.
The aim of this paper is to create the mathematical background for coding the
information of a self set on a p-adic tree in order to construct immune-type detectors,
negative selection algorithms (see [DA], [DFH] ) and measures for the diversity of
self sets (see [FMT]) using information entropy and chaotic dimensions methods.
Recall now that a tree is a standard data structure used in computer science and
elsewhere for organizing information. Information in a tree is stored in nodes, starting
with a root node and ending with terminal nodes called leaves. Nodes are linked to
other nodes through branches. Leaves are nodes without any branches. The most
common type of tree is the binary (or 2-adic) tree, in which each node has no more
than two branches.
Now we list some simple cases of self sets as in (2) :
Case 1. Suppose that the self set S is concentrated on a subinterval of $[0, p^N-1]$, e.g. S
is the left part of it. Then a simple pattern rule for detecting whether a random integer
is in S is to identify the digits of the maximum and minimum elements of S. Note that
the diversity of this set is apparently very small.
Case 2. Suppose that the self set S has self-similarity, e.g. consider the set of all
integers $x = \sum_{i=1}^{N} \varepsilon_i 3^{\,i-1}$ in base 3 described by the rule
$\varepsilon_i \in \{0,2\}$, $i=1,\dots,N$. Note that
this set forms the usual (middle-third) Cantor set (see [E]), and a simple detector
for a string to be outside of S is to check whether $\varepsilon_i = 1$ for some $i = 1,\dots,N$. The
diversity of this set is obviously larger than the diversity of the set discussed in the
previous case. This diversity can be estimated by the information entropy (which is
log(2)/log(3), see [E]).
Case 3. Suppose that S has cardinality c and is randomly distributed on the interval
$[0, p^N-1]$. Then on any large subinterval of it having length $p^m$ we get around
$[c\,p^{m-N}]$ self elements, where [x] is the upper integer part of a number x, i.e. the
smallest integer ≥ x. But on any small subinterval of $[0, p^N-1]$ it is hard to decide
whether this interval contains part of the self set, whatever probabilistic or
computational method we use. Moreover, in this case the diversity of S is large.
To formulate the detector problem for self sets S and to distinguish cases such as those
above, we propose a non-linear transform of the corresponding sequence (2). This invertible
transform can be considered as a random walk on a p-adic tree. Among the
advantages of this transform are the ability to detect local variability and self-similarity
and to measure the information entropy.
In section 2 of this work we define the non-linear tree transform of any data of
$p^N$ (p>1) non-negative observations. We show that this transform is invertible
and we examine its properties.
In section 3, we describe a set of M observations, where M has a prime number
factorization $M = p_1^{N_1}\cdots p_r^{N_r}$, $N_i \ge 1$, $i=1,\dots,r$, as a compound tree, i.e. a tree having
$N_1+\dots+N_r$ generations, such that for $N_i$ generations each node has $p_i$ branches (i =
1,…,r). Notice that since any self set S can be embedded, as in (2), in a set with
cardinality M, one can obtain a desired tree structure by choosing M with a suitable prime
number factorization.
In section 4 we use the results of section 2 to show that any continuous probability
measure can be written as an infinite product of p-adic Haar polynomials. Notice that
for p = 2 the construction is based on the usual Haar wavelet system and our p-adic
infinite product is in fact a generalization of known results, see [FKP] and [K]. We
should note here that similar infinite products have been used to estimate the
Hausdorff dimension of certain fractals (see [BK1], [BK2], [BKM] and [BKP]). Since
the diversity of self-similar sets is estimated by the information entropy or the fractal
dimension, we believe that the proposed tree structure of self sets is a natural tool for
computational applications of immune type.
In the last section, we give several applications, examples and problems for immune
type computing as follows:
We determine a set of detectors for the self sets, as in cases 1 and 2 above, and we
discuss negative selection algorithms, based on the tree transform, for the detection of
random strings. Moreover we discuss the problem of detection of self sets with simple
periodicity.
For self-similar strings with constant ratio r, expressed as a set of integers (i.e. as in
(2)), we give an algorithm (see application c) to detect this ratio. This
algorithm, based on the null walks of the tree transform, works perfectly
whenever the denominator of r is the base of the digit expansion and quite successfully
whenever we have a different base.
For collections of strings (for example antibodies) we examine the problem of
measuring their diversities. The diversity of a collection of strings S is estimated by
the formula H2(S) (see application d). The formula H2(S) is derived from the Hausdorff
dimension of measures (see [B]) for the particular case of non-ergodic Markovian
processes, as in previous works ([BK2], [BKM] and [BKP]). Moreover, we compare
this Hausdorff dimension method of measuring the diversity with the usual
information entropy formula (see formula H(S) in application d), used in [FMT] for
the diversity of antibodies.
In particular we show that for several known chaotic data S with self-similarity, the
formula H2(S) gives approximately the dimension of the corresponding chaotic
system, even with unknown similarities and fractal structure. In example
6, we estimate the entropy of three known fractals (Cantor, Sierpinski and a fractal in
base 5) and we obtain estimates with an error of about 2% or less.
We moreover discuss other applications of the tree transform, such as signal denoising and
edge detection.
Finally we highlight the main advantages of the tree transform: its convenience for
constructing computational algorithms, its multiresolution structure, and its ability
to analyze local properties of data; moreover, since the tree transform is a random
walk, it is especially suitable for examining fractal sets and chaos.
2. The Tree Transform
Let p, N > 1 be positive integers and let $T = \{t_1,\dots,t_{p^N}\}$ be non-negative data. We
define a family of vector valued functions $R_{n,k} : \mathbb{R}^{p^N} \to \mathbb{R}^{p^n}$, n = 0,1,…,N, k = 1,…,$p^n$:
$$R_{n,k}(T) := \sum_{s=(k-1)p^{N-n}+1}^{k\,p^{N-n}} t_s, \qquad k = 1,\dots,p^n;$$
notice that $R_{0,1}(T) = \sum_{s=1}^{p^N} t_s$ and that $R_{N,k}(T) = t_k$, $k = 1,\dots,p^N$.
This family of vector valued functions has a tree structure (p-adic tree structure)
having N+1 generations such that: $R_{0,1}(T)$ corresponds to the initial node of the tree;
each $R_{n,k}(T)$, n = 1,…,N-1, corresponds to the k-th node of the n-th generation; and $R_{N,k}(T)$
is the k-th branch (or leaf) of the last generation.
For any n = 1,…,N and k = 1,…,$p^n$ we call p-adic walks of T the real numbers
$$a_{n,k}(T) := \begin{cases} 0, & R_{n-1,[k/p]}(T) = 0,\\[4pt] \dfrac{R_{n,k}(T)}{R_{n-1,[k/p]}(T)}, & R_{n-1,[k/p]}(T) \neq 0. \end{cases}$$
We call tree transform of T the map $T \mapsto \{a_{n,k}(T) : n = 1,\dots,N;\ k = 1,\dots,p^n\}$. It is
straightforward to see that this map is not linear. We examine below the properties of
the tree transform of T.
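As a minimal sketch of the definitions of this section, the node sums R_{n,k}(T) and the walks a_{n,k}(T) can be computed bottom-up. The function name `tree_transform` and the 0-based indexing (node k of the text is index k - 1, so its parent is index `k // p`) are choices made here for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def tree_transform(t, p):
    """Return (R, a) for data t of length p**N.

    R[n][k-1] is the node sum R_{n,k}(T), n = 0..N.
    a[n][k-1] is the walk a_{n,k}(T), n = 1..N (a[0] is unused).
    """
    N, size = 0, 1
    while size < len(t):
        size *= p
        N += 1
    if size != len(t):
        raise ValueError("len(t) must be a power of p")
    R = [None] * (N + 1)
    R[N] = [Fraction(v) for v in t]
    for n in range(N - 1, -1, -1):
        # each node of generation n is the sum of its p children
        R[n] = [sum(R[n + 1][p * k:p * (k + 1)]) for k in range(p ** n)]
    a = [None] * (N + 1)
    for n in range(1, N + 1):
        a[n] = [R[n][k] / R[n - 1][k // p] if R[n - 1][k // p] != 0 else Fraction(0)
                for k in range(p ** n)]
    return R, a


R, a = tree_transform([1, 0, 2, 1], p=2)
```

For the data {1, 0, 2, 1} on a binary tree this gives the root sum 4, the first-generation walks 1/4 and 3/4, and the second-generation walks 1, 0, 2/3, 1/3.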
Lemma 1. Let $T = \{t_1,\dots,t_{p^N}\}$; then we have:
1. For all n, k, $a_{n,k}(T)$ lies in the interval [0,1].
2. All numbers $a_{n,k}(T)$ are invariant under dilation, i.e. $a_{n,k}(bT) = a_{n,k}(T)$, where
b > 0 and $bT = \{bt_1,\dots,bt_{p^N}\}$.
3. For any n = 2,…,N, if $a_{n-1,s} \neq 0$ for some s = 1,…,$p^{n-1}$, we have
$\sum_{[k/p]=s} a_{n,k}(T) = 1$.
4. If for some pair n, k, with n < N and k = 1,…,$p^n$, $a_{n,k}(T) = 0$, then for all s such that
[s/p] = k, $a_{n+1,s}(T) = 0$.
5. For any n = 1,…,N, if [k/p] = k/p and $a_{n,k} \neq 0$, we have
$a_{n,k}(T) = 1 - \sum_{[s/p]=k/p,\ s \neq k} a_{n,s}(T)$.
6. If T and S are as in (2), then $R_{0,1}(T)$ is the cardinality of S.
The proofs of the statements of Lemma 1 are straightforward.
Proposition 1. For any integer p>1 and any non-negative data $T = \{t_1,\dots,t_{p^N}\}$, we
have:
$$t_k = a_{N,k}(T)\,a_{N-1,[k/p]}(T)\cdots a_{1,[k/p^{N-1}]}(T)\cdot R_{0,1}(T),$$
where k = 1,…,$p^N$.
Thus the tree transform is invertible and we write:
$T \leftrightarrow \{a_{n,k}(T) : n = 1,\dots,N;\ k = 1,\dots,p^n\}$.
Moreover, to reconstruct T one requires the $p^N - 1$ walks $a_{n,k}(T)$, n = 1,…,N and k such
that [k/p] ≠ k/p, and the sum $R_{0,1}(T)$.
Proof
If $t_k = 0$ then $a_{N,k} = 0$ and so the equation is true. Suppose that $t_k \neq 0$.
Then from the definition of the $a_{n,k}$ it is easy to see that
$$t_k = \frac{R_{N,k}}{R_{N-1,[k/p]}}\cdot\frac{R_{N-1,[k/p]}}{R_{N-2,[k/p^2]}}\cdots\frac{R_{1,[k/p^{N-1}]}}{R_{0,[k/p^N]}}\cdot R_{0,1} = a_{N,k}\,a_{N-1,[k/p]}\cdots a_{1,[k/p^{N-1}]}\cdot R_{0,1};$$
in fact we observe that $t_k = R_{N,k}$ and $R_{0,1} = R_{0,[k/p^N]}$.
Finally, by 5 of Lemma 1, the reconstruction formula requires all walks $a_{n,k}$ such
that [k/p] ≠ k/p; these are $p^N - 1$ in total.
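The inversion formula of Proposition 1 can be checked numerically. The sketch below assumes 0-based indices (the ancestor of leaf k in generation n is `k // p**(N-n)`); the helper names `walks` and `inverse` are hypothetical and exact fractions are used so that the round trip is exact.

```python
from fractions import Fraction


def walks(t, p, N):
    """Return (R_{0,1}(T), a) where a[n][k-1] = a_{n,k}(T) for n = 1..N."""
    R = [None] * (N + 1)
    R[N] = [Fraction(v) for v in t]
    for n in range(N - 1, -1, -1):
        R[n] = [sum(R[n + 1][p * k:p * (k + 1)]) for k in range(p ** n)]
    a = [None] + [
        [R[n][k] / R[n - 1][k // p] if R[n - 1][k // p] else Fraction(0)
         for k in range(p ** n)]
        for n in range(1, N + 1)
    ]
    return R[0][0], a


def inverse(total, a, p, N):
    """Rebuild T: each t_k is the root sum times the walks along the path to leaf k."""
    out = []
    for k in range(p ** N):
        value = total
        for n in range(1, N + 1):
            value *= a[n][k // p ** (N - n)]
        out.append(value)
    return out


t = [3, 0, 0, 5, 1, 1, 0, 2, 4]          # p = 3, N = 2
total, a = walks(t, 3, 2)
recovered = inverse(total, a, 3, 2)
# walks whose (1-based) index is not a multiple of p: these suffice for reconstruction
free_walks = sum(1 for n in range(1, 3) for k in range(3 ** n) if (k + 1) % 3 != 0)
```

The count `free_walks` equals 3^2 - 1 = 8, in agreement with the last statement of the proposition.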
3. Random walks on compound trees
Now we consider a non-negative data series $T = \{t_1,\dots,t_M\}$ such that
$M = p_1^{N_1}\cdots p_r^{N_r}$, $N_i \ge 1$, $i=1,\dots,r$, where $p_1 < p_2 < \dots < p_r$ are primes.
Let $N = N_1+\dots+N_r$ and let v(n), n = 0,1,…,N, be the n-th element of the following
sequence:
$$p_1,\ \dots,\ p_1^{N_1},\ p_1^{N_1}p_2,\ \dots,\ p_1^{N_1}p_2^{N_2},\ \dots,\ p_1^{N_1}p_2^{N_2}\cdots p_{r-1}^{N_{r-1}},\ \dots,\ p_1^{N_1}p_2^{N_2}\cdots p_{r-1}^{N_{r-1}}p_r^{k},\ \dots,\ M;$$
assume that v(0) = 1. For n = 1,2,…,N, k = 1,…,v(n) we define:
$$R_{n,k}(T) := \sum_{s=(k-1)M/v(n)+1}^{kM/v(n)} t_s, \qquad k = 1,\dots,v(n);$$
notice that $R_{N,k}(T) = t_k$, k = 1,…,v(N), and $R_{0,1}(T) := \sum_{s=1}^{M} t_s$.
As in section 2 we denote:
$$a_{n,k}(T) := \begin{cases} 0, & R_{n-1,[k\,v(n-1)/v(n)]}(T) = 0,\\[4pt] \dfrac{R_{n,k}(T)}{R_{n-1,[k\,v(n-1)/v(n)]}(T)}, & R_{n-1,[k\,v(n-1)/v(n)]}(T) \neq 0, \end{cases}$$
where n = 1,…,N and k = 1,…,v(n). Clearly we get random walks on this compound
tree having N generations, such that for $N_i$ generations each node has $p_i$ branches (i
= 1,…,r).
It is easy to check the inversion formula:
$$t_k = \frac{R_{N,k}}{R_{N-1,[kv(N-1)/M]}}\cdot\frac{R_{N-1,[kv(N-1)/M]}}{R_{N-2,[kv(N-2)/M]}}\cdots\frac{R_{1,[kv(1)/M]}}{R_{0,1}}\cdot R_{0,1} = a_{N,k}\,a_{N-1,[kv(N-1)/M]}\cdots a_{1,[kv(1)/M]}\cdot R_{0,1},$$
and the corresponding Lemma 1 for the compound tree transform.
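A compound tree can be sketched in the same way as the p-adic one, with a branching number that changes from generation to generation. The example below is an illustration for M = 12 = 2^2 · 3; the function names and the representation of the factorization as a list of (prime, exponent) pairs are assumptions made here, and the parent of a node is taken to be the node of the previous generation that contains it.

```python
from fractions import Fraction


def compound_tree_transform(t, factors):
    """factors = [(p_1, N_1), ..., (p_r, N_r)]; returns (v, R, a)."""
    branching = [p for p, N_i in factors for _ in range(N_i)]
    v = [1]
    for b in branching:
        v.append(v[-1] * b)              # v(n): number of nodes in generation n
    if v[-1] != len(t):
        raise ValueError("len(t) must equal the product of the prime powers")
    N = len(branching)
    R = [None] * (N + 1)
    R[N] = [Fraction(x) for x in t]
    for n in range(N - 1, -1, -1):
        b = branching[n]                 # children per node of generation n
        R[n] = [sum(R[n + 1][b * k:b * (k + 1)]) for k in range(v[n])]
    a = [None] * (N + 1)
    for n in range(1, N + 1):
        b = branching[n - 1]
        a[n] = [R[n][k] / R[n - 1][k // b] if R[n - 1][k // b] != 0 else Fraction(0)
                for k in range(v[n])]
    return v, R, a


def compound_inverse(v, total, a):
    """Inversion formula: multiply the walks along the path from the root to leaf k."""
    M, N = v[-1], len(v) - 1
    out = []
    for k in range(M):
        value = total
        for n in range(1, N + 1):
            value *= a[n][k // (M // v[n])]
        out.append(value)
    return out


t = list(range(1, 13))
v, R, a = compound_tree_transform(t, [(2, 2), (3, 1)])
recovered = compound_inverse(v, R[0][0], a)
```

Choosing a different ordering or a different M changes only the `factors` argument, which is the flexibility mentioned in the introduction.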
4. p-adic Haar-Riesz Products
In this section we express any non-negative continuous probability measure on [0,1]
as a random walk on an infinite p-adic tree, and furthermore we write it as a weak* limit of
p-adic Haar polynomials. For the theory of measures we refer to [S]; for Haar-Riesz
products see [K] and [BK2].
We call p-adic Haar functions the sequence of step functions defined as follows:
$$h^j_{n,k}(x) := \begin{cases} 0, & [xp^{n-1}] \neq k,\\ -1, & \operatorname{mod}([xp^n]-1,\,p) \neq j-1 \ \text{or}\ j = p,\\ p-1, & \operatorname{mod}([xp^n]-1,\,p) = j-1,\ j \neq p, \end{cases}$$
x∈[0,1], n = 1,2,…, k = 1,…,$p^{n-1}$, j = 1,…,p, where mod(x,p) is the remainder of x modulo p.
Lemma 3
(1) For p = 2, $\{h^1_{n,k}(x),\ n = 1,2,\dots,\ k = 1,\dots,p^{n-1}\}$ is the usual Haar system.
(2) For any p = 2,3,… and for fixed j = 1,…,p-1, the system
$\{h^j_{n,k}(x),\ n = 1,2,\dots,\ k = 1,\dots,p^{n-1}\}$ is orthogonal (in the $L^2$ sense, i.e.
$\int_0^1 h^j_{n,k}(x)\,h^j_{m,t}(x)\,dx = 0$ for k ≠ t or n ≠ m).
(3) For any p = 2,3,… and j, i = 1,…,p-1,
$\int_0^1 h^j_{n,k}(x)\,h^i_{n,k}(x)\,dx = 0$, i ≠ j, n = 1,2,…, k = 1,…,$p^{n-1}$.
The proof can be easily seen from the graphs of the corresponding functions.
Definition 1. For any p>1 and any non-negative continuous probability measure μ on
[0,1], we call walks of μ on the infinite p-adic tree the sequence
$\{a_{n,k}(\mu),\ n = 0,1,2,\dots;\ k = 1,\dots,p^n\}$ determined as follows:
For any integer N = 1,2,…, if
$$T_N = \{t_1,t_2,\dots,t_{p^N}\}, \quad \text{where}\quad t_k := \int_{(k-1)/p^N}^{k/p^N} d\mu,\quad k = 1,\dots,p^N, \tag{3}$$
we shall write $a_{n,k}(\mu) := a_{n,k}(T_N)$, n = 0,1,2,…,N; k = 1,…,$p^n$.
Definition 2. We call p-adic Haar-Riesz product a sequence of p-adic Haar
polynomials
$$\prod_{n=1}^{N}\ \sum_{k=1}^{p^{n-1}}\Big(\sum_{j=1}^{p} c^j_{n,k}\,h^j_{n,k}(x)\Big), \qquad N = 1,2,\dots.$$
We say that the p-adic Haar-Riesz product converges weak* to a measure μ on [0,1] if for
any continuous function f on [0,1] we have:
$$\int_0^1 f(x)\,d\mu(x) = \lim_{N\to\infty}\int_0^1 f(x)\,\prod_{n=1}^{N}\ \sum_{k=1}^{p^{n-1}}\Big(\sum_{j=1}^{p} c^j_{n,k}\,h^j_{n,k}(x)\Big)\,dx.$$
Proposition 2
Any non-negative continuous probability measure μ on [0,1] is the weak* limit of a
p-adic Haar-Riesz polynomial
$$\prod_{n=1}^{N}\ \sum_{k=1}^{p^{n-1}}\Big(\sum_{j=1}^{p} c^j_{n,k}\,h^j_{n,k}(x)\Big), \qquad N = 1,2,\dots \tag{4}$$
for any p > 1, where
$$c^j_{n,k} = \frac{1}{p}\Big(-1 + a_{n,k+j}(\mu) + \sum_{s=1}^{p-1} a_{n,k+s}(\mu)\Big),\quad j = 1,\dots,p-1; \qquad c^p_{n,k} = -\frac{1}{p}; \tag{5}$$
here $\{a_{n,k}(\mu),\ n = 0,1,2,\dots;\ k = 1,\dots,p^n\}$ are its p-adic infinite walks.
Proof
Let p > 1 and let $T_N = \{t_1,t_2,\dots,t_{p^N}\}$ be as in (3), N = 1,2,…. If for any
$x \in \big[\tfrac{s-1}{p^N}, \tfrac{s}{p^N}\big)$,
$$\prod_{n=1}^{N}\ \sum_{k=1}^{p^{n-1}}\Big(\sum_{j=1}^{p} c^j_{n,k}\,h^j_{n,k}(x)\Big) = t_s, \qquad s = 1,\dots,p^N, \tag{6}$$
then one can use standard mathematical analysis arguments to get that the sequence
(4) converges weak* to μ.
To determine the coefficients of the Haar-Riesz product (5) in terms of the p-adic
random walks we consider the following: since the measure μ is a probability measure,
from Proposition 1 we have
$t_s = a_{N,s}\,a_{N-1,[s/p]}\cdots a_{1,[s/p^{N-1}]}$, s = 1,…,$p^N$; thus it
suffices to determine the coefficients from the following equations:
$$\sum_{j=1}^{p} c^j_{n,k}\,h^j_{n,k}(x) = a_{n,m}(\mu),$$
where m is $\operatorname{mod}([xp^n]-1,\,p)$, n = 1,2,…, k = 1,…,$p^{n-1}$. In
order to obtain (5) one can easily solve the following system:
$$\begin{aligned}
(p-1)c^1_{n,k} - c^2_{n,k} - \dots - c^{p-1}_{n,k} - c^p_{n,k} &= a_{n+1,pk+1}(\mu)\\
&\ \ \vdots\\
-c^1_{n,k} - c^2_{n,k} - \dots + (p-1)c^{p-1}_{n,k} - c^p_{n,k} &= a_{n+1,pk+p-1}(\mu)\\
-c^1_{n,k} - c^2_{n,k} - \dots - c^{p-1}_{n,k} - c^p_{n,k} &= 1 - \sum_{s=1}^{p-1} a_{n+1,pk+s}(\mu)
\end{aligned}$$
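The solution of the linear system above can be checked on a small case. The sketch below is an illustration only: it takes the walks A_1,…,A_p of the p children of a single node (assumed to sum to 1), forms the coefficients with the closed form read from (5), that is c^j = (-1 + A_j + A_1 + … + A_{p-1})/p for j < p and c^p = -1/p, and verifies that the Haar combination returns each A_i on the i-th subinterval. The names `haar_riesz_coefficients` and `haar_value` are hypothetical.

```python
from fractions import Fraction


def haar_riesz_coefficients(A):
    """A: walks of the p children of one node (summing to 1). Returns [c^1, ..., c^p]."""
    p = len(A)
    partial = sum(A[:p - 1])                      # A_1 + ... + A_{p-1}
    c = [(A[j] + partial - 1) / p for j in range(p - 1)]
    c.append(Fraction(-1, p))
    return c


def haar_value(j, i, p):
    """Value of h^j on the i-th of the p subintervals of its support (1-based j, i)."""
    return p - 1 if (j == i and j != p) else -1


A = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # example walks, p = 3
p = len(A)
c = haar_riesz_coefficients(A)
rebuilt = [sum(c[j - 1] * haar_value(j, i, p) for j in range(1, p + 1))
           for i in range(1, p + 1)]
```

With these inputs the coefficients are 1/9, 1/18 and -1/3, and `rebuilt` reproduces the three walks.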
5. Immune type computational applications
In this section we provide arguments that the tree transform on a set of strings is a
natural mathematical tool for immune type pattern recognition and computational
applications.
(a) Detectors determined by zero walks on the tree
We consider a non-empty self set S with cardinality c and its p-adic tree transform.
Throughout this section we denote by b(r) the number of zero walks in generation
r, 1 ≤ r ≤ N; clearly 0 ≤ b(r) < $p^r$. One can define the following set of detectors:
If b(1) ≠ 0, let (without loss of generality) $a_{1,1}(S) = \dots = a_{1,b(1)}(S) = 0$; then any string
in base p with first digit from the set $I(1) = \{\varepsilon_1,\dots,\varepsilon_{b(1)}\}$ is outside of S. It is trivial to
see that this set I(1) of detectors provides, with probability $c/\big((p-b(1))\,p^{N-1}\big)$, a detection
for any random string in base p and length N to be in S.
In general, if b(k) ≠ 0, 1 < k, then the b(k) - b(k-1)p zero walks provide "new"
detectors for the k-th generation of the p-adic tree. Denote by I(k) the set of new
detectors; I(k) consists of strings of length k, such that any string with its first k
digits equal to a string in I(k) is outside of S. The set of detectors
$\bigcup_{n=1}^{k} I(n)$ provides
detection for the self set with probability $c/\big((p^k-b(k))\,p^{N-k}\big)$.
Example 1. Consider the set S = {1,…,30} in a 5-adic tree with 4 generations. It is
not difficult to see that b(1) = 4, b(2) = 23, b(3) = 119. Then with the b(1) digits we can
check 500 (= 4·125) non-self elements. In the second generation we have b(2) -
b(1)·5 = 3 new zero walks. With these 3 new detectors we may check another 75 (=
3·25) non-self elements. Finally, in the third generation we have 4 (= 119 - 5·23) new
zero walks and so we can check the other 20 (= 4·5) non-self elements. Note that the
probability to check whether a random string is in S is 30/125 and 3/5 when one uses
the detectors of the first and the second generation (respectively).
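The numbers of Example 1 can be reproduced with a short computation. The sketch below uses the fact that a walk is zero exactly when its node contains no element of S, so b(n) is the number of empty nodes in generation n; the helper name `zero_walk_counts` is hypothetical and the elements of S are assumed to be numbered from 1 to p^N.

```python
from fractions import Fraction


def zero_walk_counts(S, p, N):
    """b(n) for n = 1..N: number of generation-n nodes containing no element of S."""
    counts = []
    for n in range(1, N + 1):
        block = p ** (N - n)                       # leaves below one generation-n node
        occupied = {(x - 1) // block for x in S}
        counts.append(p ** n - len(occupied))
    return counts


p, N = 5, 4
S = set(range(1, 31))
b = zero_walk_counts(S, p, N)
# new detectors in generation k: b(k) - p * b(k-1), with b(0) = 0
new_detectors = [b[0]] + [b[k] - p * b[k - 1] for k in range(1, N)]
# detection probability c / ((p^k - b(k)) p^(N-k)) for k = 1, 2, 3
prob = [Fraction(len(S), (p ** k - b[k - 1]) * p ** (N - k)) for k in range(1, N)]
```

The probabilities 30/125 and 3/5 of the example appear as the first two entries of `prob` (in lowest terms, 6/25 and 3/5); with the third generation included the detection is certain.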
(b) The problem of detecting periodicity
The problem of detecting the periodicity of data is a central problem in signal
processing and in mathematical analysis. Since our tree transform is invertible, it retains
all the information in the data, and so one could detect the periodicity from this transform.
Next we examine some detectors for the periodicity of some simple periodic self sets. We shall
present our results on the general problem in future publications.
Let S be a self set as in (2) having periodicity r, i.e. $t_{n+r} = t_n$, n = 1,2,…,$p^N - r$; we
suppose that the cardinality of S is c < $p^N$. We observe that the numbers of zero walks
b(n), n = 1,…,N, depend on the cardinality c, on the number of elements of S in a
period (i.e. the number of elements of S in the segment [1, r]) and on the distribution of
the elements of S in each period.
Now we consider a periodic self set with two frequencies r and q as below:
$$T = \Big\{t_n :\ t_n = \begin{cases} 1, & n \equiv i \ \mathrm{mod}(r)\ \text{or}\ n \equiv j \ \mathrm{mod}(q),\\ 0, & \text{otherwise,} \end{cases}\quad n = 1,\dots,p^N,\ i < r,\ j < q\Big\}.$$
To find the frequencies r and q, we use the following formula:
$$b(n) = p^n - \left[\frac{p^n}{\big[\,r/p^{N-n}\,\big]}\right] - \left[\frac{p^n}{\big[\,q/p^{N-n}\,\big]}\right], \qquad n = 2,3,\dots,N.$$
Example 2
Let p = 2, N = 6, r = 17, q = 23, i = 1 and j = 8.
In this case it is easy to see that b(6) = 57, b(5) = 25, b(4) = 9 and b(3) = 3.
Note that the last formula is satisfied exactly for n = 4,5,6. Moreover, using Mathematica
or MATLAB one can easily get r = 17 and q = 23 from a table of pairs (r,q) satisfying
the equations above for n = 6,5,4.
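The zero-walk counts quoted in Example 2 can be recomputed directly from the periodic set, without using any closed formula. The sketch assumes that the second offset of the example is j = 8 and that positions are numbered from 1 to 2^6 = 64; the helper name `zero_walk_counts` is hypothetical.

```python
def zero_walk_counts(S, p, N):
    """b(n) for n = 1..N: number of generation-n nodes containing no element of S."""
    counts = []
    for n in range(1, N + 1):
        block = p ** (N - n)
        occupied = {(x - 1) // block for x in S}
        counts.append(p ** n - len(occupied))
    return counts


p, N = 2, 6
r, q, i, j = 17, 23, 1, 8
# positions n in 1..64 with n = i mod r or n = j mod q
S = {n for n in range(1, p ** N + 1) if n % r == i or n % q == j}
b = zero_walk_counts(S, p, N)
```

The list `b` holds b(1),…,b(6); its last four entries are the values 3, 9, 25 and 57 stated in the example.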
(c ) Detecting self-similar sets
Since a great variety of self-similar fractal sets (see [E]) is unclassified, the problem
of detecting similarities by a standard method seems intractable. In this application
we examine detectors for fractal-type sets of constant ratio. Roughly speaking,
whenever we have a self-similar set, its walks on a p-adic tree provide some indications
of similarities. In this case, to locate similarities we have to examine the integer
digits of this set in a suitable base.
In fractal theory we call thin sets with constant ratio r, 0 < r < 1, the Cantor-type sets on
[0,1] constructed by the following iterated process: From the initial set $J_0$ = [0,1] we
eliminate an open interval (or intervals) of length r; we denote by $J_1$ the remaining
collection of subintervals of $J_0$; clearly $(J_0 - J_1)/J_0 = r$. From each subinterval of $J_1$
we eliminate a subinterval of ratio r to obtain a set $J_2$ such that $(J_1 - J_2)/J_1 = r$. With
this process we may construct a sequence $\{J_n,\ n = 0,1,\dots\}$ of closed sets whose
intersection is an infinite set; moreover, note that parts of $J_{n+1}$ are dilations of $J_n$.
As an example recall that the usual Cantor set is constructed by eliminating the
middle third of each of its subintervals, i.e. r = 1/3, $J_1$ = [0,1/3] ∪ [2/3,1], $J_2$ =
[0,1/9] ∪ [2/9,1/3] ∪ [2/3,7/9] ∪ [8/9,1], …. There are several other ways to present the
middle-third Cantor set, e.g. it is the set of all x in [0,1] such that
$x = \sum_{j=1}^{\infty} \varepsilon_j 3^{-j}$, $\varepsilon_j \in \{0,2\}$, j = 1,2,…, or the set of all infinite strings
$\{\varepsilon_1,\varepsilon_2,\dots,\varepsilon_j,\dots\}$, $\varepsilon_j \neq 1$, j = 1,2,…, in base 3.
On our set of integers [1, $p^N$] we call self sets with constant ratio r, 0 < r < 1, the sets
constructed by the following finite iterated process: From the initial set $J_0$ = [1, $p^N$] we
eliminate a set (or sets) of successive integers (called an integer interval) of cardinality
$\langle r\,p^N \rangle$; we denote by $J_1$ the remaining collection of integer subintervals of $J_0$. From
each integer subinterval of $J_1$ we eliminate integer subintervals of ratio r to obtain
a set $J_2$. With this process we may construct a sequence $\{J_n,\ n = 0,1,\dots,N\}$ of integer
intervals. We denote by S = $J_N$ the self set produced by this process.
We shall examine whether a set S of integers (or strings) on a p-adic tree is a set of
constant ratio. Moreover, we shall check possible self-similarities to determine S. For
the notion of similarity and self-similarity on the (topological) space of (infinite)
strings on an alphabet of p letters, see [E].
Our approach is based on the following 3 observations for the case of a self set S with
constant ratio r:
Observation 1. The number of distinct walks on any p-adic tree is restricted (see the
examples below). This observation provides an indication of the fractal-type
structure of S.
Observation 2. To detect the similarities of S one has to search the integer digits of
its elements in a suitable base (usually the denominator of the rational number r).
Observation 3. There is a natural relation between the ratio r and the numbers of zero
walks b(n), n = 1,…,N.
The equations. We found that the following relations are satisfied:
$$\begin{aligned}
b(1) &= \langle pr \rangle,\\
b(2) &= \langle p^2 r - b(1)\,pr \rangle + b(1)\,p,\\
&\ \ \vdots\\
b(n) &= \langle p^n r - b(n-1)\,pr \rangle + b(n-1)\,p = \Big\langle n\,p^n r - b(n-1)\,pr - r\sum_{k=1}^{n-2} b(k)\,p^{\,n-k} \Big\rangle,
\end{aligned}$$
where n = 3,4,…. Notice that these equations, with r as the variable, are step functions, and
it is easy to find an approximate solution for r (e.g. by exploring their graphs).
According to these observations and the equations, given a set S and its walks on a
p-adic tree we check the following:
• Whether the p-adic analysis of the self set has a restricted number of distinct
random walks.
• Whether for some ratio r the numbers b(n) satisfy the equations in
observation 3. Note that to get an approximate estimate of r, one can use
several simple numerical methods.
• Provided that r = k/m, where k, m are natural numbers, it is natural to
express each number in S by its integer digit expansion in base m. Thus one
has to check whether some integer digits of S in base m do not occur; e.g. in
the case of the usual Cantor set all integer digits in base 3 are
$\varepsilon_i \in \{0,2\}$, i = 1,2,….
Next we define some self sets of constant ratio. We shall see that using the
observations 1-3 and the equations above one can check their fractal-type structure.
Example 3. The usual 3-adic Cantor set on a 3-adic tree
Let p = 3 and N = 6, then $p^N$ = 729; let S correspond to the usual 3-adic Cantor set, i.e.
$S = \{n = (\varepsilon_1,\varepsilon_2,\dots,\varepsilon_6) : \varepsilon_i \neq 1,\ i = 1,\dots,6\}$, where $\varepsilon_i \in \{0,2\}$, i = 1,…,6, are the
integer digits in base 3.
We observe that all non-zero walks are $a_{n,k} = 1/2$, n = 1,…,6, k = 1,…,$3^n$.
The numbers of zero walks are: b(1) = 1, b(2) = 5, b(3) = 19, b(4) = 65, b(5) = 211 and
b(6) = 665.
The equations are satisfied with r = 1/3. Thus the natural base for the integer digit
expansion of S is 3. It is easy to detect that all strings $(\varepsilon_1,\varepsilon_2,\dots,\varepsilon_6)$ of S satisfy the
condition $\varepsilon_i \neq 1$, i = 1,…,6; moreover, the similarities of S are the shift operators (see
[E]).
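The figures of Example 3 can be verified by building the Cantor-type set explicitly. The sketch below numbers the 729 leaves from 0 (so the element with digits ε_1,…,ε_6 sits at position Σ ε_i 3^(i-1)), counts the empty nodes in each generation, collects the non-zero walks, and compares the counts with the recursion b(n) = p^n r - b(n-1) p r + b(n-1) p for r = 1/3. That recursion is assumed here to be the first form of the equations above, started from b(0) = 0; the name `node_sums` is hypothetical.

```python
from fractions import Fraction
from itertools import product

p, N = 3, 6
# integers whose base-3 digits all lie in {0, 2}
S = {sum(d * p ** i for i, d in enumerate(digits))
     for digits in product((0, 2), repeat=N)}


def node_sums(S, p, N, n):
    """Number of elements of S below each node of generation n."""
    block = p ** (N - n)
    sums = [0] * p ** n
    for x in S:
        sums[x // block] += 1
    return sums


R = [node_sums(S, p, N, n) for n in range(N + 1)]
b = [R[n].count(0) for n in range(1, N + 1)]
nonzero_walks = {Fraction(R[n][k], R[n - 1][k // p])
                 for n in range(1, N + 1) for k in range(p ** n) if R[n][k]}

r = Fraction(1, 3)
predicted, prev = [], 0
for n in range(1, N + 1):
    prev = p ** n * r - prev * p * r + prev * p    # assumed recursion for b(n)
    predicted.append(prev)
```

Both `b` and `predicted` give 1, 5, 19, 65, 211, 665, and the only non-zero walk value is 1/2.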
Example 4. The usual Cantor set on a 5-adic tree
Let p = 5 and N = 4, then $p^N$ = 625; let $S_1 = S \cap \{1,2,\dots,625\}$, where S is as in
example 3.
We observe that all non-zero walks are $a_{4,k} = 1/2$ or 1, k = 1,…,$5^4$.
The numbers of zero walks are: b(1) = 1, b(2) = 13, b(3) = 94 and b(4) = 577.
The equation for n = 3 is satisfied with r ≈ 1/3. Thus the natural base of the integer digit
expansion is 3 and we follow example 3 for further analysis.
Example 5. A set with constant ratio on a 5-adic tree
Let p = 5 and N = 4, then $p^N$ = 625; let S correspond to a thin set with constant ratio
such that $S = \{n : \varepsilon_i(n) \neq 1,\ \varepsilon_i(n) \neq 3,\ i = 1,\dots,4\}$, where
$\varepsilon_i(n) \in \{0,1,2,3,4\}$, i = 1,…,4, are the integer digits of n = 1,…,625 in base 5.
All non-zero random walks are 1/3.
We have that b(1) = 2, b(2) = 16, b(3) = 98 and b(4) = 544. The equations are satisfied
with r = 2/5.
We expand S in base 5 and we easily determine its detectors and similarities.
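For Example 5 the ratio can be recovered by a simple scan over candidate values. The sketch below is an illustration under assumptions: leaves are numbered from 0 so that every position has exactly four base-5 digits, the recursion b(n) = p^n r - b(n-1) p r + b(n-1) p with b(0) = 0 is taken as the form of the equations, and only ratios k/p with k = 1,…,p-1 are tried.

```python
from fractions import Fraction
from itertools import product

p, N = 5, 4
# positions whose base-5 digits avoid 1 and 3
S = {sum(d * p ** i for i, d in enumerate(digits))
     for digits in product((0, 2, 4), repeat=N)}

b = []
for n in range(1, N + 1):
    block = p ** (N - n)
    b.append(p ** n - len({x // block for x in S}))   # empty nodes in generation n


def satisfies(r):
    """True when the assumed recursion reproduces every b(n) for this ratio r."""
    prev = 0
    for n in range(1, N + 1):
        prev = p ** n * r - prev * p * r + prev * p
        if prev != b[n - 1]:
            return False
    return True


candidates = [Fraction(k, p) for k in range(1, p) if satisfies(Fraction(k, p))]
```

The scan leaves a single candidate, r = 2/5, whose denominator 5 is the base in which the digit rule of S becomes visible.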
(d) Entropy of information for immune type applications
Let S be a collection of strings of length N in an alphabet of p letters. As we
explained in the introduction, a significant immune-type problem is the measurement of
the diversity of S (in the set of all strings of length N). A standard method for this
measurement is the usual information entropy formula (see below
and [B]). As we shall see, this formula does not work for sets S described by
an alphabet of $p_1$ letters, where $p_1 < p$ and $p_1$ does not divide p. For this reason we
propose an entropy formula like the formulae used in previous works ([BK1], [BK2] and
[BKM]) for fractals described by non-homogeneous Markovian processes.
Note that in a recent work [FMT] on immune-type algorithms, the authors proposed a
method for estimating the diversity of a collection of antibodies S written in an
alphabet of p letters and having length N. This method is based on the usual
information entropy formula H(S), defined as follows:
$$H(S) := \frac{1}{N}\sum_{m=1}^{N} H_m(S),$$
where $H_m(S) := -\sum_{j=0}^{p-1} p_{m,j}\log_p(p_{m,j})$, m = 1,…,N, and $p_{m,j}$ is the probability of
occurrence of the digit j in generation m of the strings of S.
Note that in terms of our tree transform analysis of the data S, $p_{m,j}$ is the number of
non-zero walks $a_{m,k}$ divided by ($p^m$ - b(j)), where k = j mod(p) and b(j) is as above,
j = 1,…,p; m = 1,…,N.
This formula works perfectly for the Cantor set S in example 3, where p = 3, $p_{m,j}$ = 1/2
for j = 0,2 and $p_{m,1}$ = 0, m = 1,…,N, thus H(S) = log(2)/log(3), but it does
not work when we consider a part of S (as in example 4) in a p-adic tree such that p
is not a multiple of 3. In this case we get $p_{m,j}$ ≈ 1/p (for large m) for all j = 0,…,p-1 and so
$H_m(S)$ ≈ 1. See example 4, where p = 5 and $p_{4,j}$ = 9/47 or 10/47, j = 0,…,4, i.e. almost 1/5.
One can verify this argument for several Cantor-type sets in base $p_1$ embedded in
p-adic trees, where $p_1$ does not divide p.
Therefore the usual entropy formula can measure the diversity of self
sets S with ratio k/p, provided that S is examined by its p-adic digits. But when
S is expressed in a different base, this formula is not useful, because H(S) is
approximately 1 for any S.
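The digit-frequency entropy H(S) is easy to evaluate for strings given as tuples of digits. The following sketch (with the hypothetical function name `digit_entropy`) computes it for the six-digit Cantor strings of example 3 in base 3, and, for comparison, for the full set of all base-3 strings of the same length.

```python
import math
from itertools import product


def digit_entropy(strings, p):
    """H(S): average over positions m of -sum_j p_{m,j} log_p(p_{m,j})."""
    strings = list(strings)
    N = len(strings[0])
    total = 0.0
    for m in range(N):
        counts = [0] * p
        for s in strings:
            counts[s[m]] += 1
        # digits that never occur contribute 0, using the convention 0 log 0 = 0
        probs = [cnt / len(strings) for cnt in counts if cnt]
        total += -sum(q * math.log(q, p) for q in probs)
    return total / N


cantor = list(product((0, 2), repeat=6))       # digits in {0, 2}, base 3
H = digit_entropy(cantor, 3)
full = list(product(range(3), repeat=6))       # every base-3 string of length 6
H_full = digit_entropy(full, 3)
```

For the Cantor strings the value is log(2)/log(3), while the full set of strings gives the maximal value 1.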
Following previous works [BK1], [BK2] and [BKM] on the entropy of non-homogeneous
Markovian processes, we propose a new entropy formula for a collection
of strings as follows.
Let S be a collection of strings of length M, written in an alphabet of p letters, p>1. As
in (1) we express S as a collection of integers; let N be an integer such that the
maximum element of S is less than $2^N$. We may consider
$T = \{t_n,\ n = 0,\dots,2^N-1 :\ t_n = 1 \text{ if } n \in S \text{ and } t_n = 0 \text{ otherwise}\}$
and the tree transform $\{a_{n,k}(T),\ n = 1,\dots,N;\ k = 1,\dots,2^n\}$.
We define the entropy H2(S) of S in the frame of the binary tree transform by:
$$H_2(S) := -\frac{1}{N\log(2^c)}\sum_{m=1}^{2^N}\sum_{n=1}^{N}\Big(a_{n,h(m,k)}(T)\log a_{n,h(m,k)}(T) + \big(1-a_{n,h(m,k)}(T)\big)\log\big(1-a_{n,h(m,k)}(T)\big)\Big),$$
where c is the cardinality of S and $h(m,k) := 2\left[\dfrac{m}{2^{N-k+1}}\right] - 1$, with the assumption that
0 log(0) = 0.
Examples 6
Since the ratio of example 5 is 2/5, it is well known that its Hausdorff dimension or
entropy is log(3)/log(5) ≈ 0.682606. We apply the entropy H2(S) in the frame of a
binary tree and we get H2(S) ≈ 0.667259, i.e. we have an error of about 2%.
Since the ratio of the usual Cantor set (example 3) is 1/3, it is well known that its
entropy is log(2)/log(3) ≈ 0.63093. The entropy in the frame of a binary tree is H2(S)
≈ 0.61885, i.e. we have an error of less than 2%.
The analogue of the Sierpinski triangle in one dimension is expressed by the
digits in a 4-adic tree (see [E]) as follows:
$S = \{n : \varepsilon_i(n) \neq 3,\ \varepsilon_i(n) \in \{0,1,2,3\},\ i = 1,2,\dots\}$.
Its ratio is 1/4 and its entropy is log(3)/log(4) ≈ 0.792481. It is remarkable that our
estimate H2(S) in the frame of a binary tree gives the same number, i.e.
H2(S) = 0.792481.
Note that in the forthcoming work [BK3] we give a variety of entropy formulas for
pattern recognition of chaotic data.
Other-type problems
(e) Non-linear denoising filter. Let T = {$t_n$, n = 1,…,$p^N$} be non-negative data, and
W = {$w_n$, n = 1,…,$p^N$} be a white noise added to T. If the variability of T is c, i.e.
max{|$t_n$ - $t_{n+1}$| : n = 1,…,$p^N$ - 1} = c, then the walks of the tree
transform of T are related to c. If we call self data all data with variability less than c,
a replacement of some of the walks of the tree transform of T+W by other walks
could denoise T+W, making it self data. The authors are preparing a work on signal
denoising using the tree transform.
(f) Edge detection of images
As an application of local variability in 2 dimensions, one could consider an
image of $3^n$ x $3^n$ pixels as a 3-adic tree. Using the multiresolution structure of the tree
transform, one could easily perform edge detection on images by erasing the internal
square of the nine squares appearing in this 3-adic division of an image.
REFERENCES
[B] P. Billingsley. "Ergodic Theory and Information". Wiley, New York, 1965.
[BK1] A. Bisbas and C. Karanikas. "On the Hausdorff dimension of Rademacher
Riesz products". Monatshefte für Mathematik 110, 15-21 (1990).
[BK2] A. Bisbas and C. Karanikas. "Dimension and entropy of a non-ergodic
Markovian process and its relation to Rademacher Riesz products". Monatshefte für
Mathematik 118, 21-32 (1994).
[BK3] A. Bisbas and C. Karanikas. "Dimension and entropy for pattern recognition
of chaotic data" (in preparation).
[BKM] A. Bisbas, C. Karanikas and W. Moran. "Tameness for the distribution of
sums of Markov random variables". Math. Proc. Cambridge Phil. Soc. 121 (1),
115-128 (1997).
[BKP] A. Bisbas, C. Karanikas and G. Proios. "On the distribution of digits dyadic
expansions". Results Math. (1998), no. 3-4, 330-341.
[E] G.A. Edgar. "Measure, Topology and Fractal Geometry". Springer-Verlag, New
York, 1990.
[DA] D. Dasgupta and N. Attoh-Okine. "Immunity-Based Systems: A Survey". In
Proceedings of the IEEE Conference, Man and Cybernetics, Orlando, 1997.
[DFH] P. D'haeseleer, S. Forrest and P. Helman. "An immunological approach to
change detection: algorithms, analysis, and implications". In Proceedings of the
IEEE Symposium on Research in Security and Privacy, Oakland, CA, May 1996.
[FKP] R. Fefferman, C. Kenig and J. Pipher. "The theory of weights and the
Dirichlet problem for elliptic equations". Annals of Math. 134 (1991), 65-124.
[FMT] T. Fukuda, K. Mori and M. Tsukiyama. "Parallel Search for Multi-Modal
Function Optimization with Diversity and Learning of Immune Algorithm", pp. 210-219.
In Artificial Immune Systems and Their Applications, D. Dasgupta (Editor),
Springer, 1998.
[K] C. Karanikas. "The Hausdorff dimension of fractals with very weak
self-similarity". Chaos Solitons Fractals 11 (2000), no. 1-3, 275-280.
[S] A. Shiryayev. "Probability". Springer, Berlin-Heidelberg-New York, 1984.