ORTHOGONAL ARRAYS APPLICATION TO PSEUDORANDOM NUMBERS
GENERATION AND OPTIMIZATION PROBLEMS
A.G.Chefranov †‡, T.A.Mazurova ‡, I.D.Sidorov ‡, T.S.Letia
†Eastern Mediterranean University, Gazimagusa, North Cyprus
‡Taganrog State University of Radio-Engineering, Taganrog, Russia
Technical University of Cluj-Napoca, Cluj-Napoca, Romania
Abstract: Algorithms for pseudorandom number generation based on orthogonal arrays are proposed, and results of their comparative testing are presented. An efficient enumeration of permutations used in their implementation is introduced. The uniformity of the distribution of orthogonal array rows is estimated. Optimization algorithms based on orthogonal arrays are proposed and tested.
Keywords: orthogonal arrays, pseudorandom number generators, optimization
1. INTRODUCTION
Orthogonal arrays (OA) are widely used in many areas (Hedayat, et al., 1999). In (Mazurova and Chefranov, 2001; Mazurova and Chefranov, 2002; Chefranov and Mazurova, 2001) an algorithmic approach was formulated for generating orthogonal arrays of arbitrary strength, which requires the number of possible values of their elements (the number of levels L) to be prime. In an OA of strength t with L levels, having $L^t$ rows and L+1 columns with elements from {0,..,L-1}, each combination of t columns contains all $L^t$ combinations of numbers from {0,..,L-1} without repetition. In (Chefranov, et al., 2002) approaches were suggested for generating pseudorandom sequences by enumerating combinations of columns and writing out their rows one after another. This yields very long non-repeating sequences of numbers even for rather small values of L and t. Thus, for L=7 and t=4 the period length is equal to 107537. This approach requires generation of the full OA; in the case just mentioned it contains $7^4$ rows, each of 7 numbers from {0,..,6}. For efficient enumeration of permutations in this algorithm, a special partially factorial number representation is introduced. As a memory-saving alternative, pseudorandom numbers may be generated by applying the formula for OA elements directly, but in some user-defined sequence. The pseudorandom sequence may be obtained as a linear sequence of OA rows, with the sequence of rows or the first row serving as a key for such generation.
Section 2 contains the general formula for building OA elements, which is used in the proposed OA-based pseudorandom number generators, a description of the algorithms, and results of comparative testing of these approaches and other generators. An efficient enumeration of permutations based on the special partially factorial number representation is introduced there for the implementation of OA-based generators. Section 3 presents results on the estimation of distances between OA rows; the uniformity of the distribution of these vectors is shown, which may serve as a basis for search enumeration processes in optimization problems, and results of applying OA-based optimization algorithms are considered. The Conclusion contains a discussion of the work done.
2. OA-BASED PSEUDORANDOM NUMBERS
GENERATORS
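Let us assume that the number of levels L used in the OA is a prime number. Elements of an OA with $L^t$ rows and L columns may be obtained as follows:
$$N = \sum_{j=0}^{t-1} N_j L^j, \qquad OA_{L,t}(N,k) = \Big( \sum_{j=0}^{t-1} N_j k^j \Big) \bmod L, \qquad N = \overline{0, L^t-1}, \quad k = \overline{0, L-1}, \qquad (1)$$
where $N_j$ are the digits of the row number N written in base L.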
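Formula (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `oa_element`, `build_oa` and `has_strength` are hypothetical, and the strength check is a brute-force test that only suits small L and t.

```python
from itertools import combinations

def oa_element(N, k, L, t):
    """Element (N, k) by formula (1): the base-L digits of N are the
    coefficients of a polynomial evaluated at k modulo L."""
    value, power = 0, 1              # power holds k**j mod L, starting at k**0
    for _ in range(t):
        N, digit = divmod(N, L)      # digit is N_j, lowest digit first
        value = (value + digit * power) % L
        power = (power * k) % L
    return value

def build_oa(L, t):
    """Full array with L**t rows and L columns."""
    return [[oa_element(N, k, L, t) for k in range(L)] for N in range(L ** t)]

def has_strength(oa, L, t):
    """True if every choice of t columns shows each t-tuple exactly once."""
    for cols in combinations(range(len(oa[0])), t):
        seen = {tuple(row[c] for c in cols) for row in oa}
        if len(seen) != L ** t:      # L**t rows, so L**t distinct tuples means no repeats
            return False
    return True

oa = build_oa(5, 2)
print(len(oa), len(oa[0]))       # 25 5
print(has_strength(oa, 5, 2))    # True
```

The check relies on the array having exactly $L^t$ rows, so seeing $L^t$ distinct tuples in t columns is the same as seeing each tuple once.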
It may be shown that the rows of the array (1) are unique, and that each combination of t columns contains all $L^t$ possible combinations of values from {0,..,L-1} without repetition. Let us introduce the OA-based pseudorandom number generating algorithms OA-PRNGA1 and OA-PRNGA2.
OA-PRNGA1:
1. Choose L, t, some permutation of the OA rows, and a permutation of some t columns as the seed of the algorithm.
2. Run through the OA rows inside the selected columns, outputting OA elements as pseudorandom numbers. To improve the statistical characteristics of the generated sequence, some transformation may be applied to it (for example, 0-s are deleted from the sequence, the values are taken modulo 2, the resulting bit sequence is split into pairs of neighboring bits, 00-s and 11-s are deleted, 01-s and 10-s are converted into 0-s and 1-s respectively, and then neighboring bytes are XORed).
3. Take the next permutation of these t columns, or the next combination of t columns, and repeat step 2.
4. Take the next permutation of the OA rows, and repeat steps 2 and 3.
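The example transformation mentioned in step 2 can be sketched as follows. This is one possible reading of the description rather than the authors' code: the helper names are hypothetical, bits are packed most significant bit first, and the XOR is taken over overlapping pairs of neighboring bytes, all of which are assumptions.

```python
def debias(values):
    """Drop zeros, reduce modulo 2, then map bit pairs: 01 -> 0, 10 -> 1,
    while 00 and 11 are discarded."""
    bits = [v % 2 for v in values if v != 0]
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):   # non-overlapping pairs
        if a != b:
            out.append(a)                      # the first bit of the pair is kept
    return out

def pack_bytes(bits):
    """Pack bits into bytes, most significant bit first (assumed order);
    an incomplete trailing group is dropped."""
    usable = len(bits) - len(bits) % 8
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, usable, 8)
    )

def xor_neighbours(data):
    """XOR each byte with its right-hand neighbour (assumed to be overlapping pairs)."""
    return bytes(x ^ y for x, y in zip(data, data[1:]))

sample = [1, 2, 0, 2, 1, 3, 3, 4, 4]
print(debias(sample))    # [1, 0]
```

Discarding equal pairs shortens the stream considerably, so a generator built this way has to consume many more OA elements than it outputs.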
Algorithm OA-PRNGA1 assumes that the OA is built beforehand and kept fully in memory. This restricts the algorithm to moderate values of L and t and calls for an efficient way of enumerating permutations. For this purpose a special partially factorial number representation may be used:
$$x = \sum_{i=0}^{n-1} \Big( x_i \prod_{j=i+2}^{n} j \Big), \qquad 0 \le x_i \le i, \qquad \prod_{j=n+1}^{n} j = 1, \qquad (2)$$
where n is the length of the permutation and $x_i$, $i = \overline{0, n-1}$, are numbers which can be uniquely obtained from a permutation and which uniquely define the respective permutation. For example, a permutation is obtained by the following recurrent algorithm, which enumerates all permutations of n elements once all permutations of n-1 elements are built. An iteration of this algorithm is as follows: let us have a permutation $a_1 a_2 \ldots a_{n-1}$. Inserting n in turn at every possible position (numbered 0,..,n-1) of this initial permutation, we get n new permutations of n elements: $n a_1 a_2 \ldots a_{n-1}$, $a_1 n a_2 \ldots a_{n-1}$, ..., $a_1 a_2 \ldots n a_{n-1}$, $a_1 a_2 \ldots a_{n-1} n$. For example, let us enumerate all permutations of the 3 elements 0, 1, 2: 0 (all permutations of 1 element are ready); 10, 01 (all permutations of 2 elements are obtained); 210, 120, 102 (obtained by inserting 2 into the 1st permutation of size 2); 201, 021, 012 (all permutations are obtained). The value $x_i$ may be defined as the insertion position chosen in this algorithm for the i-th element when a particular permutation is built. For example, for the permutation 012 we have $x_0 = 0$, $x_1 = 1$, $x_2 = 2$, and according to (2) $x = 3 x_1 + 1 \cdot x_2 = 5$; for the permutation 201 we get $x_0 = 0$, $x_1 = 1$, $x_2 = 0$, $x = 3 x_1 + 1 \cdot x_2 = 3$. The value of $x_0$ is always equal to 0.
The next algorithm does not require the full OA to be computed beforehand; instead, formula (1) is applied each time the next number is produced.
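The correspondence between the number x of representation (2) and a permutation can be sketched as below. The function names are hypothetical; the code follows the insertion rule of the worked example, where $x_i$ is the position at which element i is inserted.

```python
def index_to_digits(x, n):
    """Split x into digits x_0..x_{n-1} of representation (2), with 0 <= x_i <= i.
    The digit x_{n-1} has weight 1, so it is extracted first."""
    digits = [0] * n
    for i in range(n - 1, 0, -1):
        x, digits[i] = divmod(x, i + 1)
    return digits

def digits_to_index(digits):
    """Inverse of index_to_digits, evaluating (2) by Horner's rule."""
    x = 0
    for i in range(1, len(digits)):
        x = x * (i + 1) + digits[i]
    return x

def permutation(x, n):
    """Permutation number x of the elements 0..n-1: element i is inserted
    at position x_i of the permutation built so far."""
    perm = []
    for element, position in enumerate(index_to_digits(x, n)):
        perm.insert(position, element)
    return perm

for x in range(6):
    print(x, permutation(x, 3))
# 0 [2, 1, 0]
# 1 [1, 2, 0]
# 2 [1, 0, 2]
# 3 [2, 0, 1]
# 4 [0, 2, 1]
# 5 [0, 1, 2]
```

The printed order matches the enumeration 210, 120, 102, 201, 021, 012 of the example, and any single permutation can be produced from its number without building the preceding ones.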
OA-PRNGA2:
1. Choose L, t, some OA row number N, and some combination of 2L numbers from {0,..,L-1}, mix[2,L], as the seed.
2. Calculate the L elements of the next OA row and mix them with the array mix, producing L numbers taken as L outputs of the generator. The mixing may be taken, for example, as
$$out[i] = (OArow[i] \cdot mix[1,i] + mix[2,i]) \bmod L, \qquad i = \overline{0, L-1},$$
where out[i] stands for the i-th output pseudorandom number obtained from the current OA row, the i-th element of which is denoted as OArow[i].
3. Take the next row, cyclically modulo $L^t$, and repeat step 2.
This leads to sequences with a period estimated as $L^t (L+1)$, with the possible number of keys equal to $L^t (2L)^L$.
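A minimal sketch of OA-PRNGA2 under the example mixing rule is given below. The generator name `oa_prnga2`, the helper `oa_row`, and the zero-based layout of `mix` (two lists of length L standing for mix[1,·] and mix[2,·]) are assumptions made for illustration.

```python
def oa_row(N, L, t):
    """Row N of the array (1), computed on demand without storing the OA."""
    digits = []
    for _ in range(t):
        N, d = divmod(N, L)
        digits.append(d)                 # N_0 first
    row = []
    for k in range(L):
        value = 0
        for d in reversed(digits):       # Horner evaluation of sum N_j * k**j
            value = (value * k + d) % L
        row.append(value)
    return row

def oa_prnga2(L, t, N, mix, rows):
    """Yield L mixed outputs for each of `rows` consecutive OA rows,
    starting from row N and wrapping around modulo L**t."""
    for _ in range(rows):
        row = oa_row(N, L, t)
        for i in range(L):
            yield (row[i] * mix[0][i] + mix[1][i]) % L
        N = (N + 1) % (L ** t)

identity_mix = [[1] * 5, [0] * 5]        # leaves the OA elements unchanged
print(list(oa_prnga2(5, 2, 24, identity_mix, 2)))
# [4, 3, 2, 1, 0, 0, 0, 0, 0, 0]
```

Only the current row number and the mix array are kept in memory, which is what allows large parameters such as those used for OA-PRNGA2 in Table 1.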
These algorithms, together with the widely used alleged RC4 generator (Pudovkina, 2002) and the standard pseudorandom generator implemented in Delphi 5.0, were tested using the following tests (Varfolomeev, et al., 2000):
1) number of 1-s in the generated bit sequence (length >= 20000 bits); the statistic is
$$F_1 = \frac{1}{n} \sum_{s=0}^{k-1} \frac{Q_s^2}{p_s} - n,$$
where k=2, $p_s$ is the estimated probability of appearance of the s-th value, and $Q_s$ is the actually counted number of appearances of the s-th value in the sequence of n bits, $s = \overline{0,1}$; it must have the asymptotic $\chi^2$ distribution with k-1 degrees of freedom;
2) number of m-bit vectors; the statistic is
$$F_2 = \frac{2^m}{k} \sum_{i=0}^{2^m-1} N_i^2 - k,$$
where $N_i$ is the frequency of appearance of the i-th vector in the sequence of k vectors; it must have the asymptotic $\chi^2$ distribution with $2^m - 1$ degrees of freedom;
3) number of sequences (series) of 1-s and 0-s; the statistic is
$$F_3 = \sum_{i=1}^{k} \frac{(B_i - E_i)^2}{E_i} + \sum_{i=1}^{k} \frac{(G_i - E_i)^2}{E_i},$$
where $B_i$ is the number of series of 0-s of length i, $G_i$ is the number of series of 1-s of length i, $E_i = (n-i+3)/2^{i+2}$ is the expected number of series, and k is the maximal integer i for which $E_i > 5$; it must have the asymptotic $\chi^2$ distribution with 2k-2 degrees of freedom;
4) coefficient of sequential correlation, showing the dependency of the next symbol on the previous one, calculated as
$$F_4 = \frac{n \Big( \sum_{i=0}^{n-2} U_i U_{i+1} + U_{n-1} U_0 \Big) - \Big( \sum_{i=0}^{n-1} U_i \Big)^2}{n \sum_{i=0}^{n-1} U_i^2 - \Big( \sum_{i=0}^{n-1} U_i \Big)^2};$$
the value of this statistic must be in the range $[\mu_n - 2\sigma_n, \mu_n + 2\sigma_n]$ in 95% of cases, where
$$\mu_n = -\frac{1}{n-1}, \qquad \sigma_n = \frac{1}{n-1} \sqrt{\frac{n(n-3)}{n+1}}, \qquad n > 2;$$
5) size Sz after compression of the sequence by the archive utilities WinZip and WinRar, as a ratio of the resulting size to the initial one, measured in percent. It shows the level of predictability of the sequence. A good sequence must not be compressible by such utilities.
Results of testing on a Pentium 1.7 GHz with 256 MB of memory are given in Table 1. No compression by the archive utilities was achieved in the given examples, although there occurred situations with compression of up to 1/3. It follows from Table 1 that the characteristics of the investigated algorithms are practically the same, but the suggested OA-based algorithms guarantee a very long period of the generated sequences.
3. UNIFORMITY OF OA ROWS DISTRIBUTION
AND OA APPLICATION TO OPTIMIZATION
PROBLEMS
Let us consider orthogonal arrays of strength t=2, having L+1 columns and $L^2$ rows, the elements of which according to (1) may be calculated as
$$oa_{ij} = (i_1 j + i_0) \bmod L, \qquad i = i_1 L + i_0, \qquad i = \overline{0, L^2-1}, \quad j = \overline{-1, L-1}. \qquad (3)$$
For this particular case of t=2 an additional column (-1) may be used without violating the property of the resulting array to be an OA. Let us investigate Hamming-like distances between the rows of array (1). The distance $|u-v|_L$ for two n-dimensional vectors with integer components from $S_L$ = {0,..,L-1} is introduced as
$$|u - v|_L = \sum_{j=1}^{n} d_L(u_j, v_j), \qquad (4)$$
$$d_L(a, b) = \min\{ abs(a-b), \; L - abs(a-b) \}, \qquad a, b \in S_L. \qquad (5)$$
Distance (5) between two numbers reflects the circular character of the considered set $S_L$. From (3)-(5) we can write:
$$|oa_i - oa_k|_L = d_L(i_1, k_1) + \sum_{j=0}^{L-1} d_L\big( (i_1 j + i_0) \bmod L, \; (k_1 j + k_0) \bmod L \big) = d_L(i_1, k_1) + \sum_{j=0}^{L-1} \min\{ |((i_1 - k_1) j + i_0 - k_0) \bmod L|, \; L - |((i_1 - k_1) j + i_0 - k_0) \bmod L| \}, \qquad (6)$$
where $i = i_1 L + i_0$, $k = k_1 L + k_0$ are OA row numbers. The first term, $d_L(i_1, k_1)$, is the contribution of the additional column (-1), which therefore holds the value $i_1$ in row i. In (6) we have expressions of the form (aj+b) mod L, where |a|, |b| are from $S_L$, and j runs from 0 to L-1.
Statement 1. (aj+b) mod L, where |a|, |b| are from $S_L$, a is not equal to 0, and j runs from 0 to L-1, takes all possible values from $S_L$.
Proof. Let us assume the contrary, that there exist $j_1 \ne j_2$ such that $(a j_1 + b) \bmod L = (a j_2 + b) \bmod L$. Then we obtain that
$$a j_1 + b = c_1 L + r, \qquad a j_2 + b = c_2 L + r, \qquad (7)$$
where $c_1$, $c_2$ are some integers, and |a|, |b|, $j_1$, $j_2$, r are from $S_L$. Subtracting the second equality in (7) from the first one, we get
$$a (j_1 - j_2) = c L, \qquad (8)$$
where $c = c_1 - c_2$. Expression (8) means that a product of two nonzero values, each of which is less in absolute value than the prime L, is divisible by L, which is impossible. The obtained contradiction proves Statement 1: the L values of j give L distinct values from $S_L$.
Let us determine the possible values of $d_L(a, b)$.
Statement 2. If a, b are from $S_L$, then $d_L(a,b)$ is from {0,1,..,(L-1)/2}.
Proof. If |a-b| > L/2, then by (5) $d_L(a,b) = L - |a-b|$. L is a prime number greater than 2, hence it is odd, and |a-b| > L/2 means that |a-b| is from {(L-1)/2+1,..,L-1}. For such a, b we have that $d_L(a,b)$ belongs to {1,..,(L-1)/2}. Otherwise |a-b| is from {0,..,(L-1)/2} and $d_L(a,b) = |a-b|$. So in both cases, |a-b| > L/2 and not greater, the values of $d_L(a,b)$ are from {0,..,(L-1)/2}.
Corollary. If |a-b| runs through 0,..,L-1, then $d_L(a,b)$ runs through 0,1,..,(L-1)/2,(L-1)/2,..,1.
Proof. At first the distance grows with the growth of the absolute value of the difference, and then it falls back down to 1.
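Statements 1 and 2 and the Corollary can be checked by brute force for a small prime, and the same helpers tabulate the distances between OA rows that the Theorem below describes in closed form. This is an illustrative sketch: the function names are hypothetical, and the additional column (-1) is assumed to hold $i_1$, as the first term of (6) suggests.

```python
def d_L(a, b, L):
    """Circular distance (5) between two levels a, b from S_L."""
    diff = abs(a - b)
    return min(diff, L - diff)

def oa2_row(i, L):
    """Row i of the strength-2 array (3); column -1 is assumed to hold i1."""
    i1, i0 = divmod(i, L)
    return [i1] + [(i1 * j + i0) % L for j in range(L)]

def row_distance(i, k, L):
    """Hamming-like distance (4) between rows i and k."""
    return sum(d_L(u, v, L) for u, v in zip(oa2_row(i, L), oa2_row(k, L)))

L = 7
# Statement 1: for a != 0, (a*j + b) mod L visits every value of S_L.
statement1 = all(
    sorted((a * j + b) % L for j in range(L)) == list(range(L))
    for a in range(1, L) for b in range(L)
)
# Statement 2 and Corollary: distances rise to (L-1)/2 and fall back to 1.
profile = [d_L(r, 0, L) for r in range(L)]
# Distinct distances between different rows of the array.
distances = sorted({row_distance(i, k, L) for i in range(L * L) for k in range(i)})

print(statement1)   # True
print(profile)      # [0, 1, 2, 3, 3, 2, 1]
print(distances)    # [7, 13, 14, 15, 21]
```

For L=7 only five distinct distances occur between the 49 rows, which illustrates how regularly the rows are spaced.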
Theorem. The distance $|oa_i - oa_k|_L$ between any two rows of an OA with L levels, L prime, L+1 columns, and strength t=2, where $i = i_1 L + i_0$, $k = k_1 L + k_0$, is
$$|oa_i - oa_k|_L = \begin{cases} d_L(i_1, k_1) + C_L, & i_1 \ne k_1, \\ L \, d_L(i_0, k_0), & i_1 = k_1, \end{cases} \qquad C_L = 2\Big(1 + .. + \frac{L-1}{2}\Big) = \frac{L^2-1}{4}. \qquad (9)$$
Proof. Case $i_1 \ne k_1$. From (4), Statements 1, 2 and the Corollary we obtain that
$$|oa_i - oa_k|_L = d_L(i_1, k_1) + \sum_{j=0}^{L-1} d_L\big( (i_1 j + i_0) \bmod L, \; (k_1 j + k_0) \bmod L \big) = d_L(i_1, k_1) + 2\Big(1 + .. + \frac{L-1}{2}\Big) = d_L(i_1, k_1) + \frac{L^2-1}{4}. \qquad (10)$$
Case $i_1 = k_1$. From (4) we get
$$|oa_i - oa_k|_L = \sum_{j=0}^{L-1} d_L(i_0, k_0) = L \, d_L(i_0, k_0). \qquad (11)$$
So, the theorem is proved for both cases.
Formula (9) shows that the vectors represented by OA rows form a regular structure in (L+1)-dimensional space. Rows with the same value of (i div L), where i is the row number and div is integer division, form a ring of vectors (call them 1st order rings) with the Hamming-like distance (4) between neighbors equal to L (see (11)). These rings form a ring of the 2nd order: the distance between any 2 points of adjacent 1st order rings is equal to $1 + C_L$ (see (10)). So, the whole structure represented by the OA may be viewed as a torus-like structure having L rings of the 1st order, each of them comprised of L vectors, uniformly distributed in (L+1)-dimensional space.
The obtained results (9) explain why OA are applicable in optimization and statistical experiments: their rows are points scattered uniformly in the investigated space as a torus-like structure. Such estimations allow choosing an appropriate number of OA levels, taking into account the maximal frequency of spatial oscillations of the function being investigated.
OA-based optimization algorithms were applied to the optimization of such functions as
$$f_1(x) = \sum_{i=1}^{n} x_i^2, \qquad f_2(x) = 100 (x_1^2 - x_2)^2 + (1 - x_1)^2$$
("Rosenbrock's Saddle"), and its generalization
$$f_{21}(x) = (1 - x_1)^2 + 100 \sum_{i=1}^{n-1} (x_i^2 - x_{i+1})^2,$$
$$f_3(x) = \sum_{i=1}^{n} \|x_i\|,$$
where $\|x_i\|$ represents the greatest integer less than or equal to $x_i$,
$$\frac{1}{f_4(x)} = \frac{1}{500} + \sum_{j=1}^{25} \frac{1}{j + \sum_{i=1}^{2} (x_i - a_{ij})^6},$$
where $a_{1j} = -32 + ((j-1) \bmod 5) \cdot 16$, $a_{2j} = -32 + ((j-1) \,\mathrm{div}\, 5) \cdot 16$, $j = \overline{1, 25}$, which were used in (Karaboga, et al., 2001), and
$$f_5(x) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$$
("Himmelblau's Function") (Gold, 1996), and its generalizations, the symmetrical
$$f_{51}(x) = \sum_{i=1}^{n} (x_i^2 + x_{i \bmod n + 1} - a_{(i-1) \bmod 2 + 1})^2, \qquad a_1 = 11, \; a_2 = 7,$$
and the asymmetrical
$$f_{52}(x) = \sum_{i=1}^{n} \big( (x_i^2 + x_{i \bmod n + 1} - a_{(i-1) \bmod 2 + 1})^2 \ldots (-1)^i \ldots a_{(i-1) \bmod 2 + 1} \big).$$
The essence of these algorithms, as in (Gold, 1996), is in mapping the initial range of each variable onto the set of levels, calculating the function at the points given by the respective columns of the OA, selecting the best point, and remapping the range, shrunk by a factor of two, onto the set of levels. Different approaches may be used to select the next point: 1) by averaging the function values along each dimension separately and choosing the optimum as the combination of the separate optimums (AvgOA), or by finding the best function value and choosing the corresponding set of variables as the next optimum (DirOA); 2) for remapping, the components of the next optimal point may retain their relative positions as in the previous mapping (AvgOA, DirOA), or they may become the center of the new mapped range (CenOA). These variants were tested together with the Monte-Carlo method (MCM), which, after calculating Nmc randomly chosen function values in the current search range, selects the best point, takes this point as the center of a range shrunk by a factor of two, and repeats this until the required precision of the variables is achieved. The number of runs in the given range is $L^2$ for OA-based algorithms and Nmc for MCM. Results of applying the above algorithms to the respective tasks are given in Table 2 (Pentium 1.7 GHz, 256 MB). They show that for a small number of variables the OA-based optimization algorithms have nearly the same performance as MCM, but for a larger number of variables (5-19) the OA-based algorithms give significantly better results both in quality and in timing. Among the OA-based algorithms, DirOA showed the best results in situations with many variables and asymmetrical behavior of the optimized function. An approach using OA of strength greater than 2 for optimization was also tried, but it showed significantly worse speed without a significant gain in solution quality.
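The range-shrinking search described above can be sketched as follows. This is a simplified illustration and not the authors' AvgOA, DirOA or CenOA code: it picks the best evaluated point directly and centers the halved range on it, maps level l to the point lo + l(hi - lo)/(L-1), and uses only columns 0..n-1 of the array (3). All of these choices, and the names `oa2` and `oa_minimize`, are assumptions.

```python
def oa2(L, n_cols):
    """Columns j = 0..n_cols-1 of the strength-2 array (3), L*L rows."""
    return [[(i1 * j + i0) % L for j in range(n_cols)]
            for i1 in range(L) for i0 in range(L)]

def oa_minimize(f, bounds, L, tol=1e-6):
    """Evaluate f at the L*L points given by the OA rows, keep the best point,
    halve every range around it, and repeat until all ranges are below tol."""
    n = len(bounds)
    if n > L:
        raise ValueError("this sketch uses one OA column per variable, so n <= L")
    rows = oa2(L, n)
    lo = [b[0] for b in bounds]
    hi = [b[1] for b in bounds]
    best_x, best_f = None, float("inf")
    while max(h - l for l, h in zip(lo, hi)) > tol:
        for row in rows:
            # Map each level onto the current range of its variable.
            x = [l + lev * (h - l) / (L - 1) for lev, l, h in zip(row, lo, hi)]
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x, fx
        # Halve each range and centre it on the best point found so far.
        quarter = [(h - l) / 4 for l, h in zip(lo, hi)]
        lo = [c - q for c, q in zip(best_x, quarter)]
        hi = [c + q for c, q in zip(best_x, quarter)]
    return best_x, best_f

shifted_sphere = lambda x: sum((v - 3) ** 2 for v in x)
x, fx = oa_minimize(shifted_sphere, [(0, 10), (0, 10)], L=3)
print([round(v, 4) for v in x])   # [3.0, 3.0]
```

Each pass costs $L^2$ function evaluations regardless of the number of variables, which is the property that makes the approach attractive for larger n; for functions with many local minima the sketch may settle on a local optimum.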
CONCLUSION
The applicability of the two proposed OA-based algorithms for generating pseudorandom numbers with extremely large periods is shown by comparison with well-known algorithms. A special partially factorial number representation was introduced for efficient enumeration of permutations for the generated pseudorandom sequences. Such sequences may be used in stream ciphers for encryption in networks and for random number generation in modeling. The uniformity of scattering of the points represented by OA rows is estimated. OA-based optimization algorithms that iteratively reduce the scope of search with the help of OA are considered. Their relatively low computational complexity and sufficient accuracy, especially in the case of a large number of variables, allow using them for optimization in control processes.
The work was supported by grants 03-07-90075-в and 01-07-90211-ск of the Russian Fund for Fundamental Research.
REFERENCES
Chefranov, A.G., and T.A. Mazurova (2001). Support facilities of optimization of technological and organizational processes. Proc. 3rd Int. Workshop on Computer Science and Information Technologies, Ufa, Yangantau, Russia, Sept. 21-26, 2001, Ufa: USATU, 2, p. 45-49.
Chefranov, A.G., T.A. Mazurova and L.K. Babenko (2002). About application of orthogonal arrays for generating of pseudorandom sequences. Proceedings of the 4th International Workshop on Computer Science and Information Technologies CSIT'2002, Patras, Greece.
Gold, S. (1996). Comparison of global optimization methods, http://www.ecs.umass.edu/mie/labs/mda/mechanism/steve/global/paper.html
Hedayat, A.S., N.J.A. Sloane and J. Stufken (1999). Orthogonal arrays: theory and applications, 405 p., Springer-Verlag, New York.
Karaboga, N., A. Kalinli, and D. Karaboga (2001). An immune algorithm for numeric function optimization. In: Proc. 10th Turkish Symposium on Artificial Intelligence and Neural Networking, Gazimagusa, TRNC, p. 111-119.
Mazurova, T.A., and A.G. Chefranov (2001). Algorithm for generation of orthogonal arrays of strength 3 and its backing. In: Proc. 4th all-Russia Sci. Conf. "New information technologies, development and aspects of application", p. 120-122, TSURE, Taganrog (in Russian).
Mazurova, T.A., and A.G. Chefranov (2002). About orthogonal arrays of arbitrary strength generating. Izvestiya TRTU, Special issue, Taganrog, 1, p. 101-103 (in Russian).
Pudovkina, M. (2002). Statistical weaknesses in the alleged RC4 keystream generator. Proceedings of the 4th International Workshop on Computer Science and Information Technologies CSIT'2002, Patras, Greece.
Varfolomeev, A.A., A.E. Zukov and M.A. Pudovkina (2000). Stream cryptosystems. The basic properties and methods of cryptanalysis resistance. PAIMS, Moscow (in Russian).
Table 1. Results of comparative testing of pseudorandom number generators
Algorithm | Number of bytes | Number of 1s; F1; probability of such value appearance | F2; value of m; probabilities | F3; value of k; probabilities | F4; acceptable range | Sz, % | Duration, sec
OA-PRNGA1 | 2500 | 10070; 0.98; [0.5;0.75] | 26.5; 4; [0.75;0.95] | 22.95; 9; [0.80;0.90] | 0.004; [-1.49e-2;1.41e-2] | 100 | 0.04
OA-PRNGA1 | 250000 | 1001057; 1.117; [0.75;0.95] | 265.4; 8; [0.5;0.75] | 39.4; 16; [0.75;0.95] | 4e-4; [-1.4e-3;1.4e-3] | 100 | 2.73
OA-PRNGA2 (L=257, t=43) | 2500 | 10165; 5.45; [0.95;0.98] | 301.7; 8; [0.95;0.99] | 17.59; 9; [0.20;0.80] | 0.009; [-1.49e-2;1.41e-2] | 102 | 0.01
OA-PRNGA2 (L=257, t=43) | 250000 | 0.996e6; 32.9; [0.999;1] | 6566; 8; [0.99;1] | 406; 16; [0.99;1] | -1e-3; [-1.4e-3;1.4e-3] | 100 | 0.741
RC4 | 2500 | 9995; 0.05; [0.05;0.25] | 235.5; 8; [0.05;0.25] | 21.55; 9; [0.80;0.90] | -9e-3; [-1.49e-2;1.41e-2] | 102 | 0.01
RC4 | 250000 | 0.996e6; 25.3; [0.999;1] | 249; 8; [0.25;0.5] | 36; 16; [0.75;0.95] | 1e-4; [-1.4e-3;1.4e-3] | 100 | 0.671
Delphi | 2500 | 9995; 0.05; [0.05;0.25] | 291.8; 8; [0.75;0.95] | 10.57; 9; [0.10;0.20] | 3e-3; [-1.49e-2;1.41e-2] | 102 | 0.01
Delphi | 250000 | 1.011e6; 236.3; [0.999;1] | 1231; 8; [0.99;1] | 371.8; 16; [0.99;1] | -5e-4; [-1.4e-3;1.4e-3] | 100 | 0.671
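Two of the statistics reported in Table 1, $F_1$ and $F_4$, can be sketched as follows. This is an illustrative reading of the formulas of Section 2, not the test program used for the table; the function names are hypothetical and $p_s = 1/2$ is assumed for both bit values.

```python
import math

def f1_statistic(bits):
    """Monobit statistic F1 with k = 2 and p_s = 1/2 for both values."""
    n = len(bits)
    ones = sum(bits)
    counts = (n - ones, ones)
    return sum(q * q / 0.5 for q in counts) / n - n

def f4_statistic(u):
    """Serial correlation coefficient F4 with cyclic wrap-around U_{n-1} * U_0."""
    n = len(u)
    total = sum(u)
    squares = sum(v * v for v in u)
    cross = sum(u[i] * u[(i + 1) % n] for i in range(n))
    return (n * cross - total * total) / (n * squares - total * total)

def f4_range(n):
    """Interval [mu_n - 2*sigma_n, mu_n + 2*sigma_n] for the F4 test, n > 2."""
    mu = -1 / (n - 1)
    sigma = math.sqrt(n * (n - 3) / (n + 1)) / (n - 1)
    return mu - 2 * sigma, mu + 2 * sigma

# 20000 bits containing 10070 ones, as in the first row of Table 1.
bits = [1] * 10070 + [0] * 9930
print(round(f1_statistic(bits), 2))   # 0.98
print(f4_statistic([1, 2, 3, 4]))     # -0.2
```

The first printed value reproduces the $F_1$ entry 0.98 that Table 1 gives for 10070 ones in 20000 bits.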
Table 2. Results of comparative testing of OA-based optimization algorithms. Each cell has tuples of the form (obtained optimum; initial interval; number of levels for OA-based algorithms / number of runs per iteration for MCM; duration in seconds) for the respective algorithm and function(n), where n is the number of function arguments. Precision is taken as $10^{-6}$; initial intervals: i1=[0,10[, i2=[-100,100]; mep = m·10^p.
Function(n) | AvgOA | DirOA | CenOA | MCM
f1(2) | (0;i1;3;0) | (0;i1;3;0) | (0;i1;3;0) | (5e-15;i1;1e2;0), (3e-3;i1;10;0)
f1(19) | (0;i1;19;1e-2) | (0;i1;19;1e-2) | (0;i1;19;5e-2) | (32.3;i1;1e6;39.5), (25;i1;1e5;3.99), (141;i1;1e3;3e-2)
f2(2) | (3.3;i1;3;0) | (0.21;i1;3;0) | (0.18;i1;3;0) | (4e-8;i1;1e3;1e-2), (5e-2;i1;1e2;0)
f21(19) | (1.3e3;i1;19;2e-2) | (1;i1;19;1e-2) | (1;i1;19;5e-2) | (4.3e2;i1;1e6;40.5), (1.6e4;i1;1e5;4.05)
f3(2) | (0;i1;3;0) | (0;i1;3;0) | (0;i1;3;0) | (0;i1;1e2;0), (1;i1;10;0)
f3(19) | (0;i1;19;3e-2) | (0;i1;19;2e-2) | (0;i1;19;6e-2) | (14;i1;1e6;70.5), (18;i1;1e5;7)
f4(2) | (12.7;i2;3;0) | (12.7;i2;3;0) | (12.7;i2;3;0) | (1;i2;1e6;83.5), (1;i2;1e4;0.84), (12.7;i2;10;0)
f51(3) | (0.37;i1;3;0) | (0.16;i1;3;0) | (9e-2;i1;3;0) | (3e-6;i1;100;0), (354;i1;10;0)
f51(19) | (0.96;i1;19;0.02) | (44.3;i1;19;1e-2) | (8.3e2;i1;19;1e-2) | (9.1;i1;1e6;52.6), (182.3;i1;1e5;5.25), (70.94;i1;1e4;0.5)
f52(3) | (6.3;i1;3;0) | (78.7;i1;3;0) | (6.1;i1;3;0) | (5e-2;i1;100;0), (721;i1;10;0)
f52(19) | (52.4;i1;19;1e-2) | (3.3;i1;19;2e-2) | (1.1e4;i1;19;8e-2) | (163;i1;1e6;64.9), (640;i1;1e5;6.51), (1.3e4;i1;1e3;6e-2)