ORTHOGONAL ARRAYS APPLICATION TO PSEUDORANDOM NUMBERS GENERATION AND OPTIMIZATION PROBLEMS

A.G. Chefranov †‡, T.A. Mazurova ‡, I.D. Sidorov ‡, T.S. Letia

† Eastern Mediterranean University, Gazimagusa, North Cyprus
‡ Taganrog State University of Radio-Engineering, Taganrog, Russia
Technical University of Cluj-Napoca, Cluj-Napoca, Romania

Abstract: Algorithms for generating pseudorandom numbers based on orthogonal arrays are proposed, and results of comparative testing are presented. An efficient enumeration of permutations used in their implementation is introduced. The uniformity of the distribution of orthogonal array rows is estimated. Optimization algorithms based on orthogonal arrays are proposed and tested.

Keywords: orthogonal arrays, pseudorandom numbers generators, optimization

1. INTRODUCTION

Orthogonal arrays (OA) are widely used in many areas (Hedayat, et al., 1999). An algorithmic approach to generating orthogonal arrays of arbitrary strength was formulated in (Mazurova and Chefranov, 2001; Mazurova and Chefranov, 2002; Chefranov and Mazurova, 2001); it requires the number of possible values of the array elements (the number of levels $L$) to be prime. An OA of strength $t$ with $L$ levels has $L^t$ rows and $L+1$ columns with elements from {0,..,L-1}, and each combination of $t$ columns contains every one of the $L^t$ combinations of numbers from {0,..,L-1} exactly once. In (Chefranov, et al., 2002), pseudorandom sequences were generated by enumerating combinations of columns and writing out their rows one after another. This gives very long non-repeating sequences of numbers even for rather small values of $L$ and $t$: for L=7 and t=4 the period length is $10^{7537}$. This approach requires generating the full OA; in the case just mentioned it contains $7^4$ rows, each of 7 numbers from {0,..,6}. To enumerate permutations efficiently in this algorithm, a special partially factorial number representation is introduced.
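The defining property stated above can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation. It assumes the polynomial construction given as formula (1) in Section 2, $OA_{L,t}(N,k) = (\sum_j N_j k^j) \bmod L$, where $N_j$ are the base-$L$ digits of the row number $N$. The helper names `oa_element`, `build_oa` and `has_strength` are hypothetical, and the optional extra column holding the leading digit $N_{t-1}$ is an assumption about how the $(L+1)$-th column is formed.

```python
from itertools import combinations, product


def oa_element(N, k, L, t):
    """Element in row N, column k: (sum_j N_j * k**j) mod L, N_j = base-L digits of N."""
    total = 0
    for j in range(t):
        digit = (N // L**j) % L      # j-th base-L digit of the row number
        total += digit * pow(k, j, L)
    return total % L


def build_oa(L, t, extra_column=False):
    """All L**t rows over columns k = 0..L-1 (L is assumed to be prime).

    With extra_column=True an (L+1)-th column holding the leading digit
    N_{t-1} is appended; this form of the extra column is an assumption.
    """
    rows = []
    for N in range(L**t):
        row = [oa_element(N, k, L, t) for k in range(L)]
        if extra_column:
            row.append((N // L**(t - 1)) % L)
        rows.append(row)
    return rows


def has_strength(oa, L, t):
    """True if every choice of t columns shows each of the L**t tuples exactly once."""
    all_tuples = set(product(range(L), repeat=t))
    for cols in combinations(range(len(oa[0])), t):
        seen = [tuple(row[c] for c in cols) for row in oa]
        if len(seen) != len(all_tuples) or set(seen) != all_tuples:
            return False
    return True


oa = build_oa(3, 2)
print(oa)                        # 9 rows of 3 elements from {0, 1, 2}
print(has_strength(oa, 3, 2))
```

The check is exhaustive over column subsets, so it is only practical for small $L$ and $t$; it is meant for confirming the property, not for generating long sequences.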
As an alternative that saves memory, pseudorandom numbers may be generated by applying the formula for OA elements directly, in a sequence defined by the user. The pseudorandom sequence may be obtained as a linear sequence of OA rows, with the sequence of rows or the first row serving as the key for such generation. Section 2 contains the general formula for building OA elements, which is used in the proposed OA-based pseudorandom number generators, a description of the algorithms, and results of comparative testing of these approaches against other generators. An efficient enumeration of permutations based on a special partially factorial number representation is introduced there for the implementation of the OA-based generators. Section 3 presents estimates of the distances between OA rows and shows that these vectors are uniformly distributed, which may serve as a basis for search enumeration processes in optimization problems; results of applying OA-based optimization algorithms are also considered. The Conclusion discusses the work done.

2. OA-BASED PSEUDORANDOM NUMBERS GENERATORS

Let's assume that the number of levels $L$ used in the OA is a prime number. Elements of an OA with $L^t$ rows and $L$ columns may be obtained as follows:

$$OA_{L,t}(N,k) = \Big(\sum_{j=0}^{t-1} N_j k^j\Big) \bmod L, \quad N = \sum_{j=0}^{t-1} N_j L^j, \quad N = 0,\ldots,L^t-1, \quad k = 0,\ldots,L-1. \qquad (1)$$

It may be shown that the rows of this matrix are unique, and that each combination of $t$ columns contains all $L^t$ possible combinations of values from {0,..,L-1} without repetition.

Let's introduce the OA-based pseudorandom number generating algorithms OA-PRNGA1 and OA-PRNGA2.

OA-PRNGA1:
1. Choose L, t, some permutation of OA rows, and a permutation of some t columns as the seed of the algorithm.
2. Run through the OA rows inside the selected columns, outputting OA elements as pseudorandom numbers.
To improve the statistical characteristics of the generated sequence, some transformation may be applied to it (for example: zeros are deleted from the sequence; values are taken modulo 2; the resulting bit sequence is grouped into pairs of neighboring bits, pairs 00 and 11 are deleted, and pairs 01 and 10 are converted into 0 and 1 respectively; finally, neighboring bytes are XORed).
3. Take the next combination of these or the next t columns and repeat step 2.
4. Take the next permutation of OA rows and repeat steps 2 and 3.

Algorithm OA-PRNGA1 assumes that the OA is built in advance and kept fully in memory. This restricts the algorithm to moderate values of L and t and requires an efficient way of enumerating permutations. For this purpose, a special partially factorial number representation may be used:

$$x = \sum_{i=0}^{n-1} \Big( x_i \prod_{j=i+2}^{n} j \Big), \quad 0 \le x_i \le i, \qquad (2)$$

where $n$ is the length of the permutation and $x_i$, $i = 0,\ldots,n-1$, are numbers that can be uniquely obtained from a permutation and that uniquely define the respective permutation (a product over an empty range is taken as 1).

For example, a permutation is obtained by the following recurrent algorithm, which enumerates all permutations of n elements once all permutations of n-1 elements have been built. One iteration is as follows. Let a1a2…an-1 be a permutation. Inserting n in turn at every possible position (numbered 0,..,n-1) of this initial permutation gives n new permutations of n elements: na1a2…an-1, a1na2…an-1, …, a1a2…nan-1, a1a2…an-1n. For example, all permutations of the 3 elements 0,1,2 are enumerated as: 0 (all permutations of 1 element are ready); 10, 01 (all permutations of 2 elements are obtained); 210, 120, 102 (obtained by inserting 2 into the 1st permutation of size 2); 201, 021, 012 (all permutations are obtained). The value $x_i$ is the insertion position chosen in this algorithm for the i-th element while building a particular permutation. For example, for permutation 012 we have x0 = 0, x1 = 1, x2 = 2, and according to (2) x = 3*x1 + 1*x2 = 5; for permutation 201 we get x0 = 0, x1 = 1, x2 = 0, and x = 3*x1 + 1*x2 = 3. The value of x0 is always 0.

The next algorithm does not compute the full OA beforehand; instead, formula (1) is applied each time the next number is produced.

OA-PRNGA2:
1. Choose L, t, some OA row number N, and some combination of 2L numbers from {0,..,L-1}, mix[2,L], as a seed.
2. Calculate the L numbers of the next OA row and mix them with the array mix, giving L numbers that are taken as L outputs of the generator. This mixture may be taken, for example, as

$$out[i] = (OArow[i] * mix[1,i] + mix[2,i]) \bmod L, \quad i = 0,\ldots,L-1,$$

where out[i] stands for the i-th output pseudorandom number obtained from the current OA row, whose i-th element is denoted OArow[i].
3. Take the next row, cyclically modulo $L^t$, and repeat step 2.

This leads to sequences with a period estimated as $L^t (L+1)$, with the possible number of keys equal to $L^t (2L)^L$.

These algorithms, together with the widely used alleged RC4 generator (Pudovkina, 2002) and the standard pseudorandom generator implemented in Delphi 5.0, were tested using the following tests (Varfolomeev, et al., 2000):

1) the number of 1-s in the generated bit sequence (length >= 20000 bits); the statistic is

$$F_1 = \frac{1}{n}\sum_{s=1}^{k}\frac{Q_s^2}{p_s} - n,$$

where k=2, $p_s$ is the estimated probability of appearance of the s-th value, and $Q_s$ is the actually counted number of appearances of the s-th value in the sequence of n bits, s = 0,1; it must be asymptotically distributed as $\chi^2$ with k-1 degrees of freedom;

2) the number of m-bit vectors; the statistic is

$$F_2 = \frac{2^m}{k}\sum_{i=0}^{2^m-1} N_i^2 - k,$$

where $N_i$ is the frequency of appearance of the i-th vector in the sequence of k vectors; it must be asymptotically distributed as $\chi^2$ with $2^m-1$ degrees of freedom;

3) the number of series of 1-s and 0-s; the statistic is

$$F_3 = \sum_{i=1}^{k}\frac{(B_i-E_i)^2}{E_i} + \sum_{i=1}^{k}\frac{(G_i-E_i)^2}{E_i},$$

where $B_i$ is the number of series of 0-s of length i, $G_i$ is the number of series of 1-s of length i, $E_i = (n-i+3)/2^{i+2}$ is the expected number of series, and k is the maximal integer i for which $E_i > 5$; it must be asymptotically distributed as $\chi^2$ with 2k-2 degrees of freedom;

4) the coefficient of sequential correlation, showing the dependency of the next symbol on the previous one, calculated as

$$F_4 = \frac{n\Big(\sum_{i=0}^{n-2} U_i U_{i+1} + U_{n-1}U_0\Big) - \Big(\sum_{i=0}^{n-1} U_i\Big)^2}{n\sum_{i=0}^{n-1} U_i^2 - \Big(\sum_{i=0}^{n-1} U_i\Big)^2}.$$

The value of this statistic must be in the range $[\mu_n - 2\sigma_n,\ \mu_n + 2\sigma_n]$ in 95% of cases, where $\mu_n = -\frac{1}{n-1}$, $\sigma_n = \frac{1}{n-1}\sqrt{\frac{n(n-3)}{n+1}}$, n>2;

5) the size Sz after compression of the sequence by the archive utilities WinZip and WinRar, as the ratio of the resulting size to the initial one, measured in percent. It shows the level of predictability of the sequence. A good sequence must not be compressible by such utilities.

Results of testing on a Pentium 1.7 GHz with 256 Mb are given in Table 1. No compression by the archive utilities was achieved in the given examples, but there occurred situations with up to 1/3 compression. From Table 1 it follows that the characteristics of the investigated algorithms are practically the same, but the suggested OA-based algorithms guarantee a very long period of the generated sequences.

3. UNIFORMITY OF OA ROWS DISTRIBUTION AND OA APPLICATION TO OPTIMIZATION PROBLEMS

Let's consider orthogonal arrays OA of strength t=2, having L+1 columns and $L^2$ rows, whose elements according to (1) may be calculated as

$$oa_{ij} = (i_1 j + i_0) \bmod L, \quad i = i_1 L + i_0, \quad i = 0,\ldots,L^2-1, \quad j = -1,\ldots,L-1. \qquad (3)$$

For this particular case of t=2, an additional column (-1) may be used without violating the OA property of the resulting array; the first term of (6) below corresponds to this column holding $i_1$.

Let's investigate Hamming-like distances between rows of array (1). The distance $\|u-v\|_L$ between two n-dimensional vectors with integer components from $S_L$ = {0,..,L-1} is introduced as

$$\|u-v\|_L = \sum_{j=0}^{n-1} d_L(u_j, v_j), \qquad (4)$$

with

$$d_L(a,b) = \min\{abs(a-b),\ L - abs(a-b)\}, \quad a,b \in S_L. \qquad (5)$$

Distance (5) between two numbers reflects the circular character of the considered set $S_L$. From (3)-(5) we can write:

$$\|oa_i - oa_k\|_L = d_L(i_1,k_1) + \sum_{j=0}^{L-1} d_L\big((i_1 j + i_0) \bmod L,\ (k_1 j + k_0) \bmod L\big) = d_L(i_1,k_1) + \sum_{j=0}^{L-1} \min\{|((i_1-k_1) j + i_0 - k_0) \bmod L|,\ L - |((i_1-k_1) j + i_0 - k_0) \bmod L|\}, \qquad (6)$$

where $i = i_1 L + i_0$, $k = k_1 L + k_0$ are OA row numbers. In (6) we have expressions of the form (aj+b) mod L, where |a|,|b| are from $S_L$, and j runs from 0 to L-1.

Statement 1. (aj+b) mod L, where |a|,|b| are from $S_L$, a is nonzero, and j runs from 0 to L-1, takes all possible values from $S_L$.

Proof. Assume the contrary: there exist j1<>j2 such that (aj1+b) mod L = (aj2+b) mod L. Then we obtain that

aj1+b = c1L+r,
aj2+b = c2L+r, (7)

where c1, c2 are some integers, and |a|, |b|, j1, j2, r are from $S_L$. Subtracting the second equality from the first one in (7), we get

a(j1-j2) = cL, (8)

where c = c1-c2. Expression (8) means that a product of two nonzero values, each of which is less than the prime L in absolute value, is divisible by L, which is impossible. The obtained contradiction proves Statement 1.

Let's determine the possible values of $d_L(a,b)$.

Statement 2. If a,b are from $S_L$, then $d_L(a,b)$ is from {0,1,..,(L-1)/2}.

Proof. If |a-b|>L/2 then by (5) $d_L(a,b)$ = L-|a-b|. L is a prime number, hence it is odd, and |a-b|>L/2 means that |a-b| is from {(L-1)/2+1,..,L-1}. For such a,b we have that $d_L(a,b)$ belongs to {1,..,(L-1)/2}. Otherwise |a-b| is from {0,..,(L-1)/2} and $d_L(a,b)$ = |a-b|. So in both cases, |a-b|>L/2 and not greater, the values of $d_L(a,b)$ are from {0,..,(L-1)/2}.

Corollary. If |a-b| runs through 0,..,L-1 then $d_L(a,b)$ runs through 0,1,..,(L-1)/2,(L-1)/2,..,1.

Proof. At first the distance grows with the absolute value of the difference, and then it falls down to 1.

Theorem. The distance $\|oa_i - oa_k\|_L$ between any two rows of an OA with L levels (L is a prime), L+1 columns, and strength t=2, where $i = i_1 L + i_0$, $k = k_1 L + k_0$, is

$$\|oa_i - oa_k\|_L = \begin{cases} d_L(i_1,k_1) + C_L, & i_1 \ne k_1, \\ L\, d_L(i_0,k_0), & i_1 = k_1, \end{cases} \qquad (9)$$

where $C_L = 2\big(1 + \ldots + \frac{L-1}{2}\big) = \frac{L^2-1}{4}$.

Proof. Case $i_1 \ne k_1$. From (4), Statements 1, 2 and the Corollary we obtain that

$$\|oa_i - oa_k\|_L = d_L(i_1,k_1) + \sum_{j=0}^{L-1} d_L\big((i_1 j + i_0) \bmod L,\ (k_1 j + k_0) \bmod L\big) = d_L(i_1,k_1) + 2\Big(1 + \ldots + \frac{L-1}{2}\Big) = d_L(i_1,k_1) + C_L. \qquad (10)$$

Case $i_1 = k_1$. From (4) we get

$$\|oa_i - oa_k\|_L = \sum_{j=0}^{L-1} d_L(i_0,k_0) = L\, d_L(i_0,k_0). \qquad (11)$$

So, the theorem is proved for both cases.

Formula (9) shows that the vectors represented by OA rows form a regular structure in (L+1)-dimensional space. Rows with the same value of (i div L), where i is the row number and div is integer division, form a ring of vectors (call them 1st order rings) with Hamming-like distance (4) between neighbors equal to L (see (11)). These rings form a ring of the 2nd order: the distance between any 2 points of adjacent 1st order rings is equal to 1+$C_L$ (see (10)). So, the whole structure represented by the OA may be viewed as a torus-like structure having L rings of the 1st order, each of them comprised of L vectors, uniformly distributed in (L+1)-dimensional space.

The obtained results (9) explain why OA are applicable in optimization and statistical experiments: their rows are points scattered uniformly in the investigated space as a torus-like structure. Such estimations allow choosing an appropriate number of OA levels, taking into account the maximal frequency of spatial oscillations of the function being investigated.

OA-based optimization algorithms were applied to the optimization of such functions as

$$f_1(x) = \sum_{i=1}^{n} x_i^2,$$

$$f_2(x) = 100(x_1^2 - x_2)^2 + (1 - x_1)^2$$

("Rosenbrock's Saddle"), and its generalization

$$f_{21}(x) = (1 - x_1)^2 + 100 \sum_{i=1}^{n-1} (x_i^2 - x_{i+1})^2,$$

$$f_3(x) = \sum_{i=1}^{n} \|x_i\|,$$

where $\|x_i\|$ represents the greatest integer less than or equal to $x_i$,

$$\frac{1}{f_4(x)} = \frac{1}{500} + \sum_{j=1}^{25} \frac{1}{j + \sum_{i=1}^{2} (x_i - a_{ij})^6},$$

where $a_{1j} = -32 + ((j-1) \bmod 5) * 16$, $a_{2j} = -32 + ((j-1) \,\mathrm{div}\, 5) * 16$, $j = 1,\ldots,25$, which were used in (Karaboga, et al., 2001), and

$$f_5(x) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$$

("Himmelblau's Function") (Gold, 1996), and its generalizations, the symmetrical one

$$f_{51}(x) = \sum_{i=1}^{n} (x_i^2 + x_{i \bmod n + 1} - a_{(i-1) \bmod 2 + 1})^2, \quad a_1 = 11, \quad a_2 = 7,$$

and the asymmetrical one

$$f_{52}(x) = \sum_{i=1}^{n} \Big( (x_i^2 + x_{i \bmod n + 1} - a_{(i-1) \bmod 2 + 1})^2 \;\ldots\; (-1)^i \;\ldots\; a_{(i-1) \bmod 2 + 1} \Big).$$

The essence of these algorithms, as in (Gold, 1996), is in mapping the initial range of each variable onto the set of levels, calculating the function at the points given by the respective columns of the OA, selecting the best point, and remapping a range shrunk by a factor of two onto the set of levels. Different approaches may be used to select the next point: 1) by averaging the function's values along each dimension separately and choosing the optimal point as the combination of the separate optimums (AvgOA), or by finding the best function value and choosing the corresponding set of variables as the next optimum (DirOA); 2) for remapping, the components of the next optimal point may retain their relative positions as in the previous act of mapping (AvgOA, DirOA), or they may become the center of the new mapped range (CenOA). These variants were tested together with the Monte-Carlo method (MCM), which, after calculating $N_{mc}$ function values at randomly chosen points of the current search range, selects the best point, takes this point as the centre of a range shrunk by a factor of two, and repeats this until the required precision of the variables is achieved. The number of runs in a given range is $L^2$ for the OA-based algorithms and $N_{mc}$ for MCM.

Results of applying the above-mentioned algorithms to the respective tasks are given in Table 2 (Pentium 1.7 GHz, 256 Mb).
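The shrink-and-remap loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it selects the best OA point directly (as in DirOA) and centres the halved range on it (as in CenOA). The linear mapping of levels onto the range, the stopping rule, and the names `oa_rows_strength2` and `oa_minimize` are hypothetical, and the range is allowed to move outside the initial interval.

```python
def oa_rows_strength2(L, n):
    """First n columns (n <= L) of the strength-2 array oa_ij = (i1*j + i0) mod L."""
    return [[(i1 * j + i0) % L for j in range(n)]
            for i1 in range(L) for i0 in range(L)]


def oa_minimize(f, lo, hi, L, eps=1e-6):
    """Minimize f over a box by repeated OA sampling, L**2 evaluations per pass.

    Assumes L is prime with L >= 3 and that the number of variables is at most L.
    """
    n = len(lo)
    rows = oa_rows_strength2(L, n)
    lo, hi = list(lo), list(hi)
    best_x, best_val = None, float("inf")
    while max(h - l for l, h in zip(lo, hi)) > eps:
        for row in rows:
            # Map level 0..L-1 of each variable linearly onto its current range.
            x = [l + (h - l) * lev / (L - 1) for l, h, lev in zip(lo, hi, row)]
            val = f(x)
            if val < best_val:
                best_x, best_val = x, val
        # Halve every range and centre it on the best point found so far.
        for d in range(n):
            quarter = (hi[d] - lo[d]) / 4
            lo[d], hi[d] = best_x[d] - quarter, best_x[d] + quarter
    return best_x, best_val


def sphere(x):
    """Sum of squares, the simplest of the test functions."""
    return sum(v * v for v in x)


x_best, f_best = oa_minimize(sphere, [0.0, 0.0], [10.0, 10.0], L=3)
print(x_best, f_best)
```

Each pass costs $L^2$ function evaluations regardless of the number of variables, which is the property the comparison with MCM relies on.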
The results in Table 2 show that for a small number of variables the OA-based optimization algorithms perform about as well as MCM, but for a larger number of variables (5-19) the OA-based algorithms give significantly better results both in quality and in timing. The OA-based algorithms work best in situations with many variables and asymmetrical behavior of the optimized function; the best results were shown by DirOA. Using OA of strength greater than 2 for optimization was also tried, but it was significantly slower without a significant gain in solution quality.

CONCLUSION

The applicability of the two proposed OA-based algorithms for generating pseudorandom numbers with extremely large periods is shown by comparison with well known algorithms. A special partially factorial number representation was introduced for efficient enumeration of permutations for the generated pseudorandom sequences. Such sequences may be used as stream ciphers for encryption in networks and for random number generation in modeling. The uniformity of scattering of the points represented by OA rows is estimated. OA-based optimization algorithms that iteratively reduce the scope of search with the help of OA are considered. Their relatively low computational complexity and sufficient accuracy, especially in the case of a large number of variables, allow using them for optimization in control processes.

Work was supported by grants 03-07-90075-в, 0107-90211-ск of Russian Fund for Fundamental Research.

REFERENCES

Chefranov, A.G., and T.A. Mazurova (2001). Support facilities of optimization of technological and organizational processes. Proc. 3rd Int. Workshop on Computer Science and Information Technologies, Ufa, Yangantau, Russia, Sept. 21-26, 2001, Ufa: USATU, 2, p. 45-49.

Chefranov, A.G., T.A. Mazurova and L.K. Babenko (2002). About application of orthogonal arrays for generating of pseudorandom sequences. Proceedings of the 4th International Workshop on Computer Science and Information Technologies CSIT’2002, Patras, Greece.

Gold, S. (1996). Comparison of global optimization methods, http://www.ecs.umass.edu/mie/labs/mda/mechanism/steve/global/paper.html

Hedayat, A.S., N.J.A. Sloane and J. Stufken (1999). Orthogonal arrays: theory and applications, 405 p., Springer-Verlag, New York.

Karaboga, N., A. Kalinli, and D. Karaboga (2001). An immune algorithm for numeric function optimization. In: Proc. 10th Turkish Symposium on Artificial Intelligence and Neural Networking, Gazimagusa, TRNC, p. 111-119.

Mazurova, T.A., and A.G. Chefranov (2001). Algorithm for generation of orthogonal arrays of strength 3 and its backing. In: Proc. 4th all-Russia Sci. Conf. “New information technologies, development and aspects of application”, p.120-122, TSURE, Taganrog (in Russian).

Mazurova, T.A., and A.G. Chefranov (2002). About orthogonal arrays of arbitrary strength generating. Izvestiya TRTU, Special issue, Taganrog, 1, p.101-103 (in Russian).

Pudovkina, M. (2002). Statistical weaknesses in the alleged RC4 keystream generator. Proceedings of the 4th International Workshop on Computer Science and Information Technologies CSIT’2002, Patras, Greece.

Varfolomeev, A.A., A.E. Zukov and M.A. Pudovkina (2000). Stream cryptosystems. The basic properties and methods of cryptanalysis resistance. PAIMS, Moscow (in Russian).

Table 1.
Results of comparative testing of pseudorandom numbers generators

| Characteristics \ Algorithms | OA-PRNGA1 | OA-PRNGA1 | OA-PRNGA2 (L=257, t=43) | OA-PRNGA2 (L=257, t=43) | RC4 | RC4 | Delphi | Delphi |
|---|---|---|---|---|---|---|---|---|
| Number of bytes | 2500 | 250000 | 2500 | 250000 | 2500 | 250000 | 2500 | 250000 |
| Number of 1s; F1; probability of such value appearance | 10070; 0.98; [0.5;0.75] | 1001057; 1.117; [0.75;0.95] | 10165; 5.45; [0.95;0.98] | 0.996e6; 32.9; [0.999;1] | 9995; 0.05; [0.05;0.25] | 0.996e6; 25.3; [0.999;1] | 9995; 0.05; [0.05;0.25] | 1.011e6; 236.3; [0.999;1] |
| F2; value of m; probabilities | 26.5; 4; [0.75;0.95] | 265.4; 8; [0.5;0.75] | 301.7; 8; [0.95;0.99] | 6566; 8; [0.99;1] | 235.5; 8; [0.05;0.25] | 249; 8; [0.25;0.5] | 291.8; 8; [0.75;0.95] | 1231; 8; [0.99;1] |
| F3; value of k; probabilities | 22.95; 9; [0.80;0.90] | 39.4; 16; [0.75;0.95] | 17.59; 9; [0.20;0.80] | 406; 16; [0.99;1] | 21.55; 9; [0.80;0.90] | 36; 16; [0.75;0.95] | 10.57; 9; [0.10;0.20] | 371.8; 16; [0.99;1] |
| F4; acceptable range | 0.004; [-1.49e-2; 1.41e-2] | 4e-4; [-1.4e-3; 1.4e-3] | 0.009; [-1.49e-2; 1.41e-2] | -1e-3; [-1.4e-3; 1.4e-3] | -9e-3; [-1.49e-2; 1.41e-2] | 1e-4; [-1.4e-3; 1.4e-3] | 3e-3; [-1.49e-2; 1.41e-2] | -5e-4; [-1.4e-3; 1.4e-3] |
| Sz, % | 100 | 100 | 102 | 100 | 102 | 100 | 102 | 100 |
| Duration, sec | 0.04 | 2.73 | 0.01 | 0.741 | 0.01 | 0.671 | 0.01 | 0.671 |

Table 2. Results of comparative testing of OA-based optimization algorithms. Each cell has tuples of the form (obtained optimum, initial interval, number of levels for OA-based algorithms / number of runs per iteration for MCM, duration in seconds) for the respective algorithm and function(n), where n is the number of the function's arguments.
Precision is taken as $10^{-6}$; initial intervals: i1=[0,10[, i2=[-100,100]; mep = m·$10^p$.

| Function(n) \ Algorithm | AvgOA | DirOA | CenOA | MCM |
|---|---|---|---|---|
| f1(2) | (0;i1;3;0) | (0;i1;3;0) | (0;i1;3;0) | (5e-15;i1;1e2;0) (3e-3;i1;10;0) |
| f1(19) | (0;i1;19;1e-2) | (0;i1;19;1e-2) | (0;i1;19;5e-2) | (32.3;i1;1e6;39.5) (25;i1;1e5;3.99) (141;i1;1e3;3e-2) |
| f2(2) | (3.3;i1;3;0) | (0.21;i1;3;0) | (0.18;i1;3;0) | (4e-8;i1;1e3;1e-2) (5e-2;i1;1e2;0) |
| f21(19) | (1.3e3;i1;19;2e-2) | (1;i1;19;1e-2) | (1;i1;19;5e-2) | (4.3e2;i1;1e6;40.5) (1.6e4;i1;1e5;4.05) |
| f3(2) | (0;i1;3;0) | (0;i1;3;0) | (0;i1;3;0) | (0;i1;1e2;0) (1;i1;10;0) |
| f3(19) | (0;i1;19;3e-2) | (0;i1;19;2e-2) | (0;i1;19;6e-2) | (14;i1;1e6;70.5) (18;i1;1e5;7) |
| f4(2) | (12.7;i2;3;0) | (12.7;i2;3;0) | (12.7;i2;3;0) | (1;i2;1e6;83.5) (1;i2;1e4;0.84) (12.7;i2;10;0) |
| f51(3) | (0.37;i1;3;0) | (0.16;i1;3;0) | (9e-2;i1;3;0) | (3e-6;i1;100;0) (354;i1;10;0) |
| f51(19) | (0.96;i1;19;0.02) | (44.3;i1;19;1e-2) | (8.3e2;i1;19;1e-2) | (9.1;i1;1e6;52.6) (182.3;i1;1e5;5.25) (70.94;i1;1e4;0.5) |
| f52(3) | (6.3;i1;3;0) | (78.7;i1;3;0) | (6.1;i1;3;0) | (5e-2;i1;100;0) (721;i1;10;0) |
| f52(19) | (52.4;i1;19;1e-2) | (3.3;i1;19;2e-2) | (1.1e4;i1;19;8e-2) | (163;i1;1e6;64.9) (640;i1;1e5;6.51) (1.3e4;i1;1e3;6e-2) |
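For comparison with the MCM column, the Monte-Carlo procedure described in Section 3 can be sketched as below. This is an illustrative reading of that description, not the authors' program: the fixed seed, the use of Python's `random` module, and the name `mcm_minimize` are assumptions, and the timings in Table 2 cannot be reproduced from it.

```python
import random


def mcm_minimize(f, lo, hi, n_mc, eps=1e-6, seed=0):
    """Monte-Carlo search: n_mc random points per pass, then halve the box around the best."""
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    lo, hi = list(lo), list(hi)
    best_x, best_val = None, float("inf")
    while max(h - l for l, h in zip(lo, hi)) > eps:
        for _ in range(n_mc):
            # Draw one point uniformly from the current search box.
            x = [rng.uniform(l, h) for l, h in zip(lo, hi)]
            val = f(x)
            if val < best_val:
                best_x, best_val = x, val
        # Take the best point as the centre of a box with half the previous width.
        for d in range(len(lo)):
            quarter = (hi[d] - lo[d]) / 4
            lo[d], hi[d] = best_x[d] - quarter, best_x[d] + quarter
    return best_x, best_val


def sphere(x):
    """Sum of squares (f1 in Table 2)."""
    return sum(v * v for v in x)


x_best, f_best = mcm_minimize(sphere, [0.0, 0.0], [10.0, 10.0], n_mc=1000)
print(x_best, f_best)
```

The cost per pass is `n_mc` evaluations, so the third element of each MCM tuple in Table 2 plays the same role as $L^2$ does for the OA-based algorithms.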