Balázs SZIKLAI - Central European University


MS Research Proposal

On the symmetry of finite binary pseudorandom sequences

Sziklai Balázs

Supervisor: Katalin Gyarmati

June, 2009

Central European University, Budapest

Department of Mathematics and its Applications

Motivation

The generation of pseudorandom numbers plays an important role in many fields of mathematics, in particular in cryptography and numerical analysis. Without aiming at completeness, here are a few examples where pseudorandom numbers are used:

a) Simulation. When a computer is used to simulate natural phenomena, random numbers are required to make things realistic.

b) Sampling. It is often impractical to examine all possible cases, but a random sample will provide insight into what constitutes ‘typical’ behavior.

c) Numerical analysis. Ingenious techniques for solving complicated numerical problems have been devised using random numbers.

d) Computer programming. Random values make a good source of data for testing the effectiveness of computer algorithms.

e) Cryptography (see below).

An example

The one-time pad¹ (OTP) is one of the most secure encryption schemes in cryptography. It was invented by Gilbert Vernam and Joseph Mauborgne in 1917 and was widely used in the Second World War and the Cold War [1]. The OTP encrypts the message by combining each symbol of the original plaintext with a random one. To do this, one converts the plaintext to a binary sequence, generates a random sequence of the same length (called the key-stream) and adds the two together bitwise modulo 2 (the XOR function). With the help of the key-stream it is very easy to find out what the original message was; without it, this is impossible. Shannon proved in 1949 that even with infinite computing capacity one cannot decipher the text. Why is that so? Let us see an example! For convenience I will use decimal numbers and addition modulo 26. Suppose Anita would like to send the message ‘HELLO’ to Barnabás.

¹ The ‘pad’ part of the name comes from early implementations where the key material was distributed as a pad of paper, so the top sheet could be easily torn off and destroyed after use.

A – 1, B – 2, ..., X – 24, Y – 25, Z – 26

So HELLO looks like this: 8 5 12 12 15, and our randomly generated key-stream is 23 7 21 20 12 (WGUTL in letters). The result of the encryption, with every sum taken modulo 26, is 8 + 23 = 5 (E); 5 + 7 = 12 (L); 12 + 21 = 7 (G); 12 + 20 = 6 (F); 15 + 12 = 1 (A). So Barnabás gets ELGFA, and with the key-stream he can successfully decipher the message. If Cecilia intercepts the encrypted message, she will not find out the original message even with the help of infinite computational power.

Surely after a few seconds she would find that the key-stream WGUTL would produce the plaintext HELLO, but she would also find that LWONB would produce SORRY! In fact, it is possible to "decrypt" out of the ciphertext any message whatsoever with the same number of characters, simply by using a different key, and there is no information in the ciphertext which would allow Cecilia to choose among the various possible readings of the ciphertext. However, this nice property relies entirely on the fact that the key was generated randomly, and that is where the history of random number generators (RNG) begins.
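The example above can be reproduced with a few lines of Python. This is only a minimal sketch under the conventions of this section (A = 1, ..., Z = 26, addition modulo 26, so the residue 0 is read as Z). The function names `encrypt`, `decrypt` and `key_for` are illustrative choices, not part of any library.

```python
import string

ALPHABET = string.ascii_uppercase  # 'A'..'Z'

def to_number(letter):
    """A -> 1, ..., Z -> 26, as in the table of this section."""
    return ALPHABET.index(letter) + 1

def to_letter(number):
    """Inverse of to_number modulo 26; the residue 0 is read as Z (26)."""
    return ALPHABET[(number - 1) % 26]

def encrypt(plaintext, key):
    """Add plaintext and key letter by letter modulo 26."""
    return "".join(to_letter(to_number(p) + to_number(k))
                   for p, k in zip(plaintext, key))

def decrypt(ciphertext, key):
    """Subtract the key from the ciphertext letter by letter modulo 26."""
    return "".join(to_letter(to_number(c) - to_number(k))
                   for c, k in zip(ciphertext, key))

def key_for(ciphertext, wanted_plaintext):
    """Key under which `ciphertext` decrypts to `wanted_plaintext`."""
    return decrypt(ciphertext, wanted_plaintext)

print(encrypt("HELLO", "WGUTL"))    # ELGFA
print(decrypt("ELGFA", "WGUTL"))    # HELLO
print(key_for("ELGFA", "SORRY"))    # a key under which ELGFA reads SORRY
```

Since `key_for` succeeds for every five-letter target, the ciphertext alone does not single out any plaintext.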


History

The idea of the very first RNG came from John von Neumann, one of the pioneers of the modern computer. He suggested the following process to produce random integers of n digits:

1) Take any integer of 2n digits

2) Cut out the middle part (which is an integer of n digits)

3) Square it (the result has at most 2n digits)

4) Repeat from 2)

The algorithm yields n-digit integers which are more or less independent of each other. However, the procedure is quite unreliable. There are two basic problems with it: first, it can easily degenerate to zero; second, it often ends up in a (short) cycle.
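The four steps can be sketched in Python as follows. This is an illustrative sketch only: the name `middle_square` and the choice to pad short numbers with leading zeros are assumptions, and n is assumed to be even so that the middle part is well defined.

```python
def middle_square(start, n, count):
    """Return `count` n-digit values produced from a 2n-digit `start`."""
    values = []
    current = start
    for _ in range(count):
        digits = str(current).zfill(2 * n)          # pad to exactly 2n digits
        middle = int(digits[n // 2 : n // 2 + n])   # step 2: cut out the middle
        values.append(middle)
        current = middle * middle                   # step 3: square it
    return values

print(middle_square(1522756, 4, 3))   # [5227, 3215, 3362]
print(middle_square(6250000, 4, 3))   # [2500, 2500, 2500]  (a cycle of length 1)
print(middle_square(9, 4, 2))         # [0, 0]              (degenerates to zero)
```

The last two calls illustrate the two problems mentioned above: the value 2500 reproduces itself, and once the middle part is zero the process never leaves zero.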

In 1949 Lehmer introduced the linear congruential method (LCM). The random numbers are constructed via multiplication modulo a large modulus [2]. The generalized model looks like the following:

$X_{n+1} = a X_n + c \pmod{m}, \quad n = 0, 1, 2, 3, \dots$

where $X_0$ is the starting value, a is the multiplier, c is called the increment and m is the modulus. The process depends entirely on these four parameters. Hull and Dobell proved that the method reaches the longest possible period if one chooses the parameters carefully, i.e.

THM (Hull & Dobell 1962): The LCM has period length m if and only if

i) c is relatively prime to m,

ii) a-1 is a multiple of p for every prime p dividing m,

iii) a-1 is a multiple of 4 if m is a multiple of 4.

Every algorithm that produces pseudorandom numbers from a finite state gets into a loop sooner or later. The above theorem guarantees, however, that the loop of the LCM can be made as long as desired, since the period equals m and m can be chosen arbitrarily large.
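The three conditions of the theorem are easy to check by machine. The sketch below tests them and, for comparison, measures the actual cycle length by brute force. The function names and the small parameters (m = 16) are illustrative assumptions chosen so that the cycle can be enumerated.

```python
from math import gcd

def prime_factors(m):
    """Set of primes dividing m (trial division, fine for small m)."""
    factors = set()
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors.add(p)
            m //= p
        p += 1
    if m > 1:
        factors.add(m)
    return factors

def has_full_period(a, c, m):
    """Check conditions i)-iii) of the Hull-Dobell theorem."""
    if gcd(c, m) != 1:                                      # i)
        return False
    if any((a - 1) % p != 0 for p in prime_factors(m)):     # ii)
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:                     # iii)
        return False
    return True

def cycle_length(a, c, m, x0):
    """Length of the cycle that the sequence X_{n+1} = a X_n + c (mod m) enters."""
    seen = {}
    x, step = x0 % m, 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - seen[x]

print(has_full_period(5, 3, 16), cycle_length(5, 3, 16, 0))   # True 16
print(has_full_period(3, 3, 16), cycle_length(3, 3, 16, 0))   # False 8
```

With a = 3 condition iii) fails (a-1 = 2 is not a multiple of 4) and the period drops to 8.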


Another widespread method is the linear feedback shift register (LFSR). It is a shift register whose input bit is a linear function of its previous state, where the linear function is built from exclusive-ors of selected bits. It was developed for the binary number system. For instance, let us produce 4-digit binary numbers with the following LFSR:

1) Combine the 4th and 3rd bits (counting positions from the right) by exclusive-or

2) The register is shifted one step to the left (so the 4th bit disappears)

3) The result of the exclusive-or is entered into the first position

So if we had the starting value 0110, we would get 1101, then 1010 and so on. This particular LFSR has period 15, which is maximal for a 4-bit register: of the 16 possible states, the zero state cannot appear anywhere in the process, and if we set zero as the starting value then we would get the all-zero sequence. This example shows that one can achieve long periods with an LFSR, which is a basic requirement for pseudorandom number generators.
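The register described above can be simulated directly. In this sketch the state is stored as a 4-bit integer and bit positions are counted from the right, which is the reading under which 0110 is followed by 1101. The names `lfsr_step` and `lfsr_period` are illustrative.

```python
def lfsr_step(state):
    """One step of the 4-bit LFSR: XOR of the 4th and 3rd bits is fed into position 1."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1   # 4th bit XOR 3rd bit
    return ((state << 1) & 0b1111) | feedback       # shift left, drop the 4th bit

def lfsr_period(start):
    """Number of steps until the register returns to `start`."""
    state, steps = lfsr_step(start), 1
    while state != start:
        state = lfsr_step(state)
        steps += 1
    return steps

state = 0b0110
for _ in range(3):
    print(format(state, "04b"))    # 0110, 1101, 1010
    state = lfsr_step(state)

print(lfsr_period(0b0110))         # 15
print(lfsr_period(0b0000))         # 1 (the zero state maps to itself)
```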


Open questions

Mathematicians keep developing new methods to produce pseudorandom numbers. There are several ways to compare these methods. For practical purposes one may use statistical tests such as the ‘frequency test’ or the ‘poker test’ [3]. Testing of this type is called a posteriori testing by Knuth [2]. From a theoretical point of view, one needs a definition to classify RNGs.
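As an illustration of such a statistical test, the following sketch computes the statistic of a simple frequency (monobit) test, (n0 - n1)^2 / n, where n0 and n1 are the numbers of zeros and ones in a sequence of length n. The formula is the standard one for this test and is not taken from the references of this proposal. The function name is an assumption.

```python
def frequency_statistic(bits):
    """Frequency (monobit) statistic (n0 - n1)^2 / n for a list of 0/1 values."""
    n = len(bits)
    ones = sum(bits)
    zeros = n - ones
    return (zeros - ones) ** 2 / n

print(frequency_statistic([0, 1, 0, 1]))   # 0.0  (perfectly balanced)
print(frequency_statistic([1, 1, 1, 1]))   # 4.0  (maximally unbalanced)
```

For long sequences the statistic is usually compared with a chi-square threshold with one degree of freedom (about 3.84 at the 5% level). A balanced sequence such as 0101... passes this test although it is clearly not random, which is why several tests are used together.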

Definition 1: A pseudorandom bit generator is said to pass all polynomial time statistical tests if no polynomial time algorithm can correctly distinguish between an output sequence of the generator and a ‘truly’ random sequence of the same length with probability significantly greater than 1/2.

This definition could be disputed. For example, when we would like to generate a finite pseudorandom binary sequence, say, of length N, this definition cannot be used: it says nothing about the polynomial in the ‘polynomial time algorithm’; there is no restriction on the degree or the coefficients of the polynomial. How do they depend on N? Another problem with this definition is that the criterion measures only the quality of the pseudorandom bit generator, but not that of the sequences constructed. Finally, the nonexistence of such a polynomial time algorithm has never been shown. Consequently, no unconditional proof of the cryptographic security of a pseudorandom bit generator has been given yet.

Kolmogorov and Chaitin introduced the notion of the so-called Turing-Kolmogorov-Chaitin complexity [4]. While this complexity measure is of theoretical interest, there is no algorithm known for computing it, hence it has no apparent practical significance. More interesting is:

Definition 2: The linear complexity of a finite binary sequence $E_N$, denoted by $L(E_N)$, is the length of the shortest linear feedback shift register that generates a sequence having $E_N$ as its first N terms.
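Unlike the Turing-Kolmogorov-Chaitin complexity, the linear complexity can be computed efficiently. The sketch below uses the Berlekamp-Massey algorithm, a standard method that is not discussed in this proposal. The function name is an assumption. The test sequence is the bit stream of a 4-bit LFSR with recurrence s_k = s_{k-4} XOR s_{k-3}, started from 0, 1, 1, 0.

```python
def linear_complexity(bits):
    """Length of the shortest LFSR generating `bits` (Berlekamp-Massey over GF(2))."""
    n = len(bits)
    c = [0] * (n + 1)   # current connection polynomial
    b = [0] * (n + 1)   # connection polynomial before the last length change
    c[0] = b[0] = 1
    length, m = 0, -1   # current LFSR length, index of the last length change
    for i in range(n):
        # discrepancy between bits[i] and the prediction of the current LFSR
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length, m, b = i + 1 - length, i, previous
    return length

m_sequence = [0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0]
print(linear_complexity(m_sequence))     # 4
print(linear_complexity([0, 0, 0, 1]))   # 4
print(linear_complexity([1, 1, 1, 1]))   # 1
```

The second example shows that even a very simple sequence can have linear complexity equal to its length, so this measure alone does not characterize pseudorandomness.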

Recently Mauduit and Sárközy [4] proposed new measures of pseudorandomness of binary sequences (called the well-distribution measure and the correlation measure). It was shown that the Legendre symbols form a good pseudorandom sequence. Numerous other binary sequences have been tested for pseudorandomness by J. Cassaigne, S. Ferenczi, C. Mauduit, J. Rivat and A. Sárközy. However, these constructions produce only ‘few’ pseudorandom sequences, while in many applications, e.g. in cryptography, one needs ‘large’ families of ‘good’ pseudorandom sequences.
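For a concrete picture of the Legendre symbol construction, the sequence of Legendre symbols modulo an odd prime p can be generated with Euler's criterion, which says that n^((p-1)/2) is congruent to the Legendre symbol of n modulo p. This is a minimal sketch: the function names are assumptions, p is assumed to be an odd prime, and the values are kept as +1 and -1 rather than mapped to bits.

```python
def legendre_symbol(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n, (p - 1) // 2, p)      # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def legendre_sequence(p):
    """The sequence (1/p), (2/p), ..., ((p-1)/p) of +1 and -1 values."""
    return [legendre_symbol(n, p) for n in range(1, p)]

print(legendre_sequence(7))   # [1, 1, -1, 1, -1, -1]
```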

Pseudorandom number generation is a young field in computer science, but as the above examples show, it has many applications. A possible research goal could be to construct another large family of pseudorandom sequences which satisfy the criteria established by Mauduit and Sárközy, i.e. they have small well-distribution and correlation measures.


References

[1] Wikipedia – One-Time Pad article

[2] Donald E. Knuth: The Art of Computer Programming, Vol. 2. Addison-Wesley Publishing Company (1969)

[3] George Marsaglia: Monkey Tests for Random Number Generators; Computers & Mathematics with Applications (1993)

[4] Rudolf Ahlswede, Levon Khachatrian, Christian Mauduit, and András Sárközy: A complexity measure for families of binary sequences; Period. Math. Hungar. 46 (2003)
