A Survey on Random Number Testing Algorithms and Software

Shuaiyuan Zhou

Abstract

This paper surveys random number testing algorithms and software. Marsaglia's Diehard tests are discussed in detail, including a description of all 12 testing algorithms. In addition, the TestU01 library is studied and some experiments with it are reported.

Keywords

Randomness testing, Diehard Tests, TestU01

1. Introduction

Random numbers are widely used in many applications, so a great deal of work has been done on random number generation. Various random number generators have been proposed over the years by researchers such as Knuth [1]. Another concern in the study of random number generation is how to evaluate a given generator, that is, how to decide whether the sequence it produces is random. This is called randomness testing, and a family of algorithms has been developed and implemented for it.

This paper surveys the random number testing algorithms and software that I studied in my term project. It is organized as follows: Section 2 gives an overview of random number testing algorithms and some concrete examples of testing methods; Section 3 focuses on the Diehard tests developed by Marsaglia; Section 4 summarizes the software currently available for testing random number generators; Section 5 discusses one software library, TestU01, in detail; and Section 6 concludes the paper.

2. Random Number Testing Methods

Randomness tests, as discussed in [2], are used in data evaluation to analyze the distribution pattern of a set of data. A practical way to measure randomness is to apply statistical tests, which are based on the notion of statistical randomness. According to [3], a numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities but follows a certain probability distribution. As one would expect, statistical randomness is in many cases only pseudo-randomness.
According to [3], a statistical test is a method of deciding whether data are random by looking at the behavior of the experimental data or, more specifically, at their distribution. The basic procedure of a random number testing algorithm can be described as follows: given a random number sequence produced by the generator under test, we compute some quantity from it and examine whether the result follows the distribution expected for random input. If it does, we may claim that the randomness of the sequence is good enough, and thus the generator passes the test.

Researchers have proposed a variety of algorithms for randomness testing. The first tests for random numbers were published by M. G. Kendall and Bernard Babington Smith in 1938; they are listed in Table 1 and discussed in [3]. In 1995, George Marsaglia published his Diehard tests, which are discussed in detail later in this paper. In 2001, the National Institute of Standards and Technology (NIST) published another set of tests [4], which are listed in Table 2.

Table 1 Early Days Tests

1   Frequency test
2   Serial test
3   Poker test
4   Gap test

Table 2 NIST Tests

1   Frequency (monobit)
2   Frequency (block)
3   Runs test
4   Longest run of ones in a block
5   Binary matrix rank
6   Discrete Fourier transform (spectral)
7   Non-overlapping template matching
8   Overlapping template matching
9   Maurer's universal statistical test
10  Linear complexity
11  Serial
12  Approximate entropy
13  Cumulative sums
14  Random excursions
15  Random excursions variant

3. Diehard Tests

The Diehard tests are a battery of statistical tests for measuring the quality of a set of random numbers. They were developed by George Marsaglia and first published in 1995 on a CD-ROM. I suspect that Marsaglia named his work "diehard" because these tests were more stringent than the ones existing at the time they were proposed, and thus provided a better measurement of randomness.
Over years of development, Marsaglia adopted ideas for randomness testing from previous research, designed some tests himself, and assembled them into the Diehard battery. Diehard includes 12 tests, which are summarized in Table 3 and discussed in detail below.

Table 3 Diehard Tests

1   Birthday spacing
2   Overlapping permutation
3   Ranks of matrices
4   Monkey test
5   Count the 1s
6   Parking lot test
7   Minimum distance test
8   Random sphere test
9   The squeeze test
10  Overlapping sums test
11  Runs test
12  The craps test

3.1 Birthday spacing

Birthday spacing is the discrete version of the Iterated-Spacing Test, which is described in detail in [5]. Marsaglia named this test "birthday spacing" because it is related to the famous birthday paradox in probability theory, which states that in a group of at least 23 randomly chosen people there is a probability of more than 50% that some pair of them have the same birthday, and that for 57 or more people this probability exceeds 99%. In [5], Marsaglia stated that a test based on duplicate birthdays is not stringent enough, so he used duplicate spacings instead.

The idea is as follows. First, choose m integers I(1), I(2), …, I(m) in the range 1 to n from a random sequence produced by the RNG under test, and treat them as m birthdays in a year of n days. Second, sort these numbers so that I(1) ≤ I(2) ≤ … ≤ I(m), and calculate the m spacings between neighbours: I(2) - I(1), I(3) - I(2), …, I(m) - I(m-1), I(1) + n - I(m). Third, count the duplicate spacings, that is, the values that appear more than once among the spacings. The test then checks this count against its expected behavior: asymptotically it should follow a Poisson distribution with parameter $\lambda = m^3/(4n)$.

3.2 Overlapping permutation

The overlapping permutation test is an improvement on the overlapping m-tuple tests discussed in [5].
The motivation is to adapt the overlapping m-tuple tests to sequences whose elements are not independent. Since the elements are not independent, we take into consideration the states of overlapping tuples in the sequence, and use the newly formed sequence of states as the test bed, because the occurrences of different states are independent.

The procedure for the overlapping permutation test can be illustrated by a concrete example. Let $u_1, u_2, \ldots, u_n$ be uniform variables produced by an RNG; then each of the overlapping 3-tuples $(u_1, u_2, u_3)$, $(u_2, u_3, u_4)$, … can be in one of six possible states (the 3! = 6 possible orderings of its three values). Thus, the overlapping triples of u's lead to a sequence of states whose elements are chosen from {1, 2, …, 6}. An overlapping m-tuple test is then applied to this sequence of states. More specifically, let $w_{i,j,k}$ be the number of times the triple of states i, j, k appears in the state sequence, let $\mu_{i,j,k}$ be the mean of $w_{i,j,k}$, let C be the covariance matrix of the counts and let $C^{-}$ be a weak inverse of C. The test is based on the fact that the quadratic form $(w - \mu)^{T} C^{-} (w - \mu)$, where w and $\mu$ are the vectors of the $w_{i,j,k}$ and $\mu_{i,j,k}$, has a $\chi^2$ distribution.

3.3 Ranks of matrices

If a matrix is viewed as a collection of row vectors, then the rank of the matrix is defined as the number of linearly independent row vectors. The rank of an N × N matrix can vary from 0 to N, and the matrix is said to be of full rank if its rank is N.

The binary rank test for matrices is discussed in [6]. Given the sequence produced by a random number generator, a binary matrix is formed by using the leftmost 32 bits of 32 variates as row vectors. In practice, the generated variables can be viewed as a sequence of computer words, and the matrix is constructed by choosing 32 distinct words. After the matrices are constructed, their ranks are calculated and counted. Although the rank of such a matrix can lie anywhere in the range [0, 32], theory shows that a matrix with rank of 29 or less appears rarely.
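The rank computation for one 32 × 32 binary matrix can be sketched in C as follows. This is a minimal illustration rather than Diehard's actual code: the function name `binary_rank32` and the representation of each row as one `uint32_t` word are assumptions made here. The rank is taken over GF(2), so row addition is a bitwise XOR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rank over GF(2) of a 32x32 binary matrix whose
   rows are stored as 32-bit words. The array is modified in place by
   Gaussian elimination, with XOR playing the role of row addition. */
int binary_rank32(uint32_t rows[32])
{
    assert(rows != NULL);
    int rank = 0;
    for (int bit = 31; bit >= 0 && rank < 32; bit--) {
        uint32_t mask = (uint32_t)1 << bit;

        /* Look for a pivot row, at position `rank` or below, with this bit set. */
        int pivot = -1;
        for (int r = rank; r < 32; r++) {
            if (rows[r] & mask) { pivot = r; break; }
        }
        if (pivot < 0)
            continue;                 /* no pivot in this column */

        /* Move the pivot row up to position `rank`. */
        uint32_t tmp = rows[pivot];
        rows[pivot] = rows[rank];
        rows[rank] = tmp;

        /* Clear this bit from every row below the pivot. */
        for (int r = rank + 1; r < 32; r++) {
            if (rows[r] & mask)
                rows[r] ^= rows[rank];
        }
        rank++;
    }
    return rank;
}
```

In a full test, this function would be called once per matrix and the returned ranks tallied into the categories used by the chi-squared test.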
An experiment in [6] showed that, of 40,000 matrices, the expected number with rank of 29 or less is only 211.5, or merely 0.529%. Given this observation, when the matrix ranks are counted, the matrices with rank less than or equal to 29 are pooled; in other words, we obtain counts for matrices with ranks of 32, 31, 30, and 29 or less. A chi-squared test is then performed on these counts to judge the randomness of the generated sequence and to evaluate the generator.

3.4 Monkey test

According to [7], the name of this test comes from the infinite monkey theorem, which states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare.

The monkey test is also called the overlapping m-tuple test and is described in detail in [5]. The basic idea of the test is as follows: treat m consecutive elements in the output of a random number generator as a "word"; record the overlapping words in a stream; the number of "words" that do not appear should then follow a certain distribution.

In [8], three kinds of monkey test are discussed: the overlapping-pairs-sparse-occupancy (OPSO) test, the overlapping-quadruples-sparse-occupancy (OQSO) test and the DNA test. For demonstration, we take the OPSO test as an example. In this test, two-letter "words" are formed from an alphabet of $2^{10} = 1024$ letters. Each letter is determined by a specified 10 bits of a 32-bit integer in the output of the random number generator. We then count the number of missing "words", that is, the letter combinations that do not appear in the entire sequence being tested. Finally, we expect this count to have an asymptotic Poisson distribution.

The OQSO and DNA tests are similar to the OPSO test; the difference lies in the composition of the alphabet and in the way words are formed.
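The counting step of the OPSO test can be sketched in C as follows. This is a minimal illustration and not the Diehard implementation: the function name `opso_missing_words`, the array-based interface and the `shift` parameter (which selects the 10 bits used as a letter) are assumptions made here. Comparing the returned count with its reference distribution is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: count the 2-letter words, over an alphabet of
   2^10 letters, that never occur among the overlapping pairs formed
   from n consecutive 32-bit outputs x[0..n-1]. `shift` selects which
   10 bits of each integer are used as the letter.
   Returns -1 if memory cannot be allocated. */
long opso_missing_words(const uint32_t *x, size_t n, unsigned shift)
{
    assert(shift <= 22);                     /* 10 bits must fit in 32 */
    const size_t words = (size_t)1 << 20;    /* 1024 * 1024 possible words */
    unsigned char *seen = calloc(words, 1);
    if (seen == NULL)
        return -1;

    /* Each pair of neighbouring letters (overlapping) forms one word. */
    for (size_t i = 0; i + 1 < n; i++) {
        uint32_t a = (x[i] >> shift) & 0x3FFu;
        uint32_t b = (x[i + 1] >> shift) & 0x3FFu;
        seen[((size_t)a << 10) | b] = 1;
    }

    /* Count the words that were never observed. */
    long missing = 0;
    for (size_t w = 0; w < words; w++) {
        if (!seen[w])
            missing++;
    }
    free(seen);
    return missing;
}
```

The same occupancy table of $2^{20}$ entries would also serve for words built from four 5-bit letters or ten 2-bit letters; only the extraction of letters changes.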
More specifically, in the OQSO test we form 4-letter words from an alphabet of $2^5 = 32$ letters; in the DNA test the alphabet has 4 letters, loosely resembling the structure of DNA, and the words are composed of 10 letters. In [8], the authors state, based on experiments applying various monkey tests to many different random number generators, that the OPSO, OQSO and DNA tests are the most effective of the monkey tests.

3.5 Count the 1s

Two kinds of count-the-1s tests are included in Diehard, namely count the 1's in a stream of bytes and count the 1's in specific bytes. The difference is how the sequence under test is sampled: by taking successive bytes, or by jumping around with a predefined phase. The process of this test is discussed in [5] and is summarized here. First, the number of 1's in each byte taken from the sequence is counted, and the counts are grouped into 5 sets, {0, 1, 2}, {3}, {4}, {5} and {6, 7, 8}, so that each byte yields one of five letters. Overlapping words of length 5 are then formed from successive letters. Finally, the number of occurrences of each of the $5^5 = 3125$ possible words is obtained, and a chi-squared test is performed on the counts to assess the randomness of the generator.

3.6 Parking lot test

This test is designed to assess the uniformity of points in m-space when the coordinates of the points are taken from successive elements of the sequence produced by the random number generator under test. In this test, the numbers produced by the generator are treated as a point in m-space, which is also the center of a cubic "car" of specified size. Given the cube that forms the parking lot, and supposing that we "park by ear", the test is done by parking randomly in the lot: we try to park a car at each new point, and the attempt succeeds only if the car does not hit any of those already parked. With this parking strategy, out of n attempts we obtain a list of k cars that parked successfully.
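The parking procedure can be sketched in C as follows. This is a minimal illustration under assumptions that are not taken from the source: the lot is two-dimensional, the cars are axis-aligned squares of side `car`, the candidate centres are passed in as two coordinate arrays, and the function name `parking_lot` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Absolute value without pulling in the math library. */
static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* Hypothetical helper: attempt to park n_attempts square cars of side
   `car`, the i-th centred at (xs[i], ys[i]). An attempt succeeds only
   if the new car overlaps none of the cars already parked.
   Returns the number k of parked cars, or -1 if allocation fails. */
long parking_lot(const double *xs, const double *ys,
                 size_t n_attempts, double car)
{
    assert(car > 0.0);
    if (n_attempts == 0)
        return 0;

    double *px = malloc(n_attempts * sizeof *px);
    double *py = malloc(n_attempts * sizeof *py);
    if (px == NULL || py == NULL) {
        free(px);
        free(py);
        return -1;
    }

    size_t k = 0;                          /* cars parked so far */
    for (size_t i = 0; i < n_attempts; i++) {
        int crash = 0;
        for (size_t j = 0; j < k && !crash; j++) {
            /* Two squares of side `car` overlap when their centres are
               closer than `car` along both axes. */
            if (abs_diff(xs[i], px[j]) < car &&
                abs_diff(ys[i], py[j]) < car)
                crash = 1;
        }
        if (!crash) {
            px[k] = xs[i];
            py[k] = ys[i];
            k++;
        }
    }
    free(px);
    free(py);
    return (long)k;
}
```

The quadratic scan over parked cars keeps the sketch short; a grid of cells would be the usual choice for large numbers of attempts.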
In [5], it is stated that after a relatively large number of attempts, the number of successfully parked cars follows a certain normal distribution. It is also mentioned that the dimension of the space does not influence the performance, meaning that the only things that need to be considered are the size and shape of the "cars".

3.7 Minimum distance test

In this test, each variate of a generated random sequence is viewed as a point in a square of a certain size, and the quantity tested is the distance between points. Let us look at a concrete example. From a sequence generated by the random number generator under test, we pick 8,000 items, treat them as points and place them in a square. Then the minimum distance over all pairs of points is calculated; the test is based on the observation that the square of this minimum distance should be exponentially distributed with a certain mean. One relevant question is which distance measure to use; the Euclidean distance is probably the most widely used.

3.8 Random sphere test

This test is similar to the minimum distance test discussed above. The difference is that the quantity considered is the volume of spheres rather than the minimum distance between pairs of points. We treat the elements of the sequence under test as points, and choose 4,000 points in a cube of edge 1,000. A sphere is then centered on each point, with radius equal to the minimum distance from the center to another point. For the test, we rank the volumes of the spheres and use the smallest one. According to [9], the volume of the smallest sphere should be exponentially distributed. The question of which distance measure to use, discussed in the previous section, also needs to be taken into consideration here when choosing the radius of each sphere.

3.9 The squeeze test

The squeeze test is designed for random number generators that produce random floats on [0,1).
For generators whose sequences are integers, we can apply a mapping, or normalization, to project the sequence under test onto floats on [0,1). This linear operation does not affect the correctness of the result.

The basic idea of this test is the "squeezing" in its name: a starting value is repeatedly reduced until it reaches 1. More specifically, starting with $k = 2^{31}$, the test finds j, the number of iterations necessary to reduce k to 1 using the reduction $k \leftarrow \lceil kU \rceil$, with each U taken from the sequence under test. Such j's are found 100,000 times, and the numbers of times j was ≤ 6, 7, …, 47, ≥ 48 are counted. The j's that are small (6 or fewer) or large (48 or more) are pooled, since experience shows that j falls into these ranges relatively rarely. A chi-squared test is then performed on these counts to indicate the fitness of the generator under test.

3.10 Overlapping sums test

The overlapping sums test, discussed in [6] and [10], takes floats on [0, 1] as input, meaning that the sequence under test first needs to be normalized to obtain uniform, U[0,1], variables $U_1, U_2, \ldots, U_N$. Overlapping sums are then constructed such that $S_1 = U_1 + U_2 + \cdots + U_{100}$, $S_2 = U_2 + U_3 + \cdots + U_{101}$, and in general $S_i = U_i + U_{i+1} + \cdots + U_{i+99}$. By the central limit theorem, these S's are approximately multivariate normal. The next step is to take 100 of the S's and convert them into independent standard normal variables by multiplying $(S - \mu)$ by a matrix A, where $\mu$ is the mean of S and A is chosen so that the result has identity covariance, that is, $A C A^{T} = I$ with C the covariance matrix of the S's. These standard normal variates are then converted into uniform, U[0,1], variables by applying the standard normal cumulative distribution function. Finally, we carry out a chi-squared test on these uniform variables.
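The construction of the overlapping sums can be sketched in C as follows. This is a minimal illustration of the first step only, and the function name `overlapping_sums` and its interface are assumptions made here. It uses a sliding window, so each new sum is obtained from the previous one by adding the entering value and subtracting the leaving one; the transformation by the matrix A and the chi-squared test are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill s[0..n_sums-1] with overlapping sums of
   `window` consecutive values,
       s[i] = u[i] + u[i+1] + ... + u[i+window-1].
   The caller must supply at least n_sums + window - 1 values in u. */
void overlapping_sums(const double *u, size_t window,
                      size_t n_sums, double *s)
{
    assert(window > 0);
    if (n_sums == 0)
        return;

    /* First sum: add up the first `window` values directly. */
    double acc = 0.0;
    for (size_t j = 0; j < window; j++)
        acc += u[j];
    s[0] = acc;

    /* Later sums: slide the window by one position at a time. */
    for (size_t i = 1; i < n_sums; i++) {
        acc += u[i + window - 1] - u[i - 1];
        s[i] = acc;
    }
}
```

With `window` set to 100 this produces the sums described above; for U[0,1] inputs each sum has mean 50. Rounding error accumulates slowly in the sliding update, so a careful implementation might recompute the sum from scratch at intervals.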
3.11 Runs test

The runs test, also called the Wald-Wolfowitz test in [11], is similar to the longest run within a block test discussed in [10]. The focus of this test is the total number of runs in the sequence generated by the random number generator under test, where the sequence is viewed as being composed of 0's and 1's. A run, as in [11], is defined as a maximal non-empty segment of the sequence consisting of adjacent equal elements; in other words, an uninterrupted sequence of identical bits bounded before and after by a bit of the opposite value. In a sequence of length N, there are $N_0$ occurrences of 0 and $N_1$ occurrences of 1 (so $N = N_0 + N_1$). The test is done by checking that the number of runs is approximately normally distributed with mean $\mu = \frac{2 N_0 N_1}{N} + 1$ and variance $\sigma^2 = \frac{2 N_0 N_1 (2 N_0 N_1 - N)}{N^2 (N - 1)}$.

3.12 The craps test

This test comes from the game of craps, in which players place wagers on the outcome of a roll, or a series of rolls, of two dice. In this test, we play 200,000 games of craps and count the wins and the number of throws per game. We then perform a chi-squared test on each count.

4. Software for Testing RNGs

Quite a few software suites for testing random number generators have been developed by various research groups, by implementing the testing algorithms discussed above, by modifying them for special applications, or by incorporating new testing algorithms. Some widely used testing suites are TestU01, DIEHARD, SPRNG, NIST, ENT and Crypt-X. In this section, we give an overview of all these test suites; TestU01 is examined in detail in the next section.

4.1 TestU01

TestU01 is a software library, implemented in the ANSI C language by Pierre L'Ecuyer and Richard Simard [13] at the Université de Montréal, that offers a collection of utilities for the empirical statistical testing of uniform random number generators. TestU01 is free to download, and the latest version as well as the user's guide can be found in [12].

4.2 DIEHARD

DIEHARD is the implementation of Marsaglia's battery of Diehard tests discussed in Section 3.
More information regarding this software can be found in [14]. The author provided the source code in both C and Fortran versions, the compiled program and a user's guide on the CD-ROM.

4.3 SPRNG

The SPRNG library [15] is another public-domain software package; it implements the classical tests for random number generators given by Knuth in his book, plus a few others. Although the tests proposed by Knuth are now seen as quite mild, allowing known "bad" generators to pass, SPRNG is still of value, since the quality of a test depends on the application and in certain cases Knuth's tests perform fairly well.

4.4 NIST Statistical Test Suite (NIST)

The NIST Statistical Test Suite [4] is a statistical package consisting of 16 tests released by the National Institute of Standards and Technology in 2001. The testing algorithms are the NIST tests mentioned in Section 2. Since it is the result of a collaboration between the Computer Security Division and the Statistical Engineering Division at NIST, this test suite is especially suitable for testing the randomness of arbitrarily long binary sequences produced by either hardware-based or software-based cryptographic random or pseudo-random number generators. Thus, the NIST test suite is usually used to test and certify random number generators used in cryptographic applications.

4.5 ENT

The ENT program, developed by John Walker in 1998 and discussed in [16], is a set of statistical tests for detecting non-randomness in sequences. It is described as being useful for those evaluating pseudo-random number generators for encryption and statistical sampling applications, compression algorithms, and other applications where the information density of a file is of interest.

4.6 Crypt-X

The Crypt-X suite of statistical tests was developed by researchers at the Information Security Research Centre at Queensland University of Technology in Australia and is a commercial software package. Crypt-X supports stream ciphers, block ciphers and key stream generators.
More information can be found in [17].

5. TestU01

In this section, we focus on TestU01, a software library that implements a series of classical statistical testing algorithms as well as several types of random number generators. TestU01 started in 1981 as a Pascal program implementing the tests suggested by Knuth, and has kept developing over the years through the addition of new testing algorithms and new generators. The latest version, TestU01-1.2.3, written in the ANSI C language, was released on August 18th, 2009. In the following, the organization (or architecture) of the library is discussed, and some practical issues such as installation and usage of the library are addressed.

5.1 Organization

According to the user's guide [13], the software tools of TestU01 are organized in four classes of modules: those implementing random number generators, those implementing statistical tests, those implementing predefined batteries of tests, and those implementing tools for applying tests to entire families of generators. The module classes are named u, s, b and f respectively, and within each module the types, variables and functions are prefixed by the name of the module, which begins with the corresponding letter.

5.1.1 The generator implementations

The u modules provide the basic tools for defining and implementing uniform random number generators. A set of uniform generators is already programmed in this library, such as the linear congruential generators in module ulcg, the multiple recursive generators in umrg, the inversive generators in uinv, etc. Besides these built-in generators, functions to combine two or more generators are also provided, such as unif01_CreateCombAdd2() and unif01_CreateCombXor2().
In addition, generators can be manipulated with various functions: for example, printing the current state of a generator with unif01_WriteState(), filtering the output in different ways, such as combining successive values in the sequence with unif01_CreateDoubleGen(), and measuring the speed of a particular generator by timing it with unif01_TimerGen() and then printing the result with unif01_WriteTimerRec().

5.1.2 The statistical tests

The statistical tests are implemented in the s modules. To apply a test to a particular generator, the generator must first be created by the proper function in a u module, and then passed as a parameter to the function implementing the required test in an s module. The test results are printed automatically to the standard output.

The various tests are organized in different modules. For example, module sknuth implements the classical statistical tests for random number generators described in Knuth's book, and module smarsa implements the Diehard tests discussed in Section 3. The other test modules likewise group tests together according to their similarity.

5.1.3 Batteries of tests

The b modules contain some predefined suites of standard statistical tests with fixed parameters. These modules are convenient for users, since different types of tests are incorporated in the test suites and all the user needs to do is pass the generator to be tested as a parameter to a suite. A number of predefined batteries of tests are available in TestU01, such as the Pseudo-DIEHARD battery, which applies most of the tests in Marsaglia's Diehard suite, and the FIPS_140_2 battery, which implements the small suite of tests of the NIST standard.
5.1.4 Tools for testing families of generators

The f modules provide a set of tools that are built on top of the modules implementing the generator families and the tests. They are designed to perform systematic studies of the interaction between certain types of tests and the structure of the point sets produced by given families of random number generators. For each family to be tested, a random number generator with period length near $2^i$ has been selected for all integers i in some interval. Typically, a given random number generator starts to fail a test as the sample size increases. Given this fact, the study is done by finding the sample size at which the test starts to reject the random number generator, as a function of the period length of the generator being tested.

5.2 Installation

Both the source code and the compiled binary files are provided in [12]. An installation guide is also available, covering compilation and installation on different platforms. Based on my experience in this term project, I would like to add something about installation under Microsoft Windows.

In my project, I used TestU01 under Microsoft Windows in Cygwin, which provides a complete UNIX-like environment, specifically the GCC compiler, the Bourne shell and many other tools. Although I used the precompiled binary files containing the compiled TestU01 library, I found that the GCC compiler is still needed to link executables against the library. Since I used the compiled version, installation simply consists of unzipping the library in the root directory, which can be done with the following commands:

    cd /
    unzip bin-cygwin.zip

After successful execution of the above commands, the files are extracted into /usr/include, /usr/lib and /usr/share/TestU01.

Another thing worth mentioning is the setting of environment variables. Before running TestU01, we need to set the PATH in Cygwin with the following command:

    export PATH=/usr/bin:${PATH}
5.3 Experiments and results

As an experiment, I demonstrate the usage of the TestU01 library with an existing example of the birthday spacing test. This sample code can be found in the examples directory as birth1.c and is reproduced here:

    #include "unif01.h"
    #include "ulcg.h"
    #include "smarsa.h"
    #include <stddef.h>

    int main (void)
    {
       unif01_Gen *gen;

       /* Create the LCG, run the test twice, then delete the generator. */
       gen = ulcg_CreateLCG (2147483647, 397204094, 0, 12345);
       smarsa_BirthdaySpacings (gen, NULL, 1, 1000, 0, 10000, 2, 1);
       smarsa_BirthdaySpacings (gen, NULL, 1, 10000, 0, 1000000, 2, 1);
       ulcg_DeleteGen (gen);
       return 0;
    }

As we can see, the library is used simply by calling some functions. In this particular example, functions from a u module and an s module are used. The two functions from the u module handle the creation and deletion of a particular linear congruential generator, whose modulus, multiplier, additive constant and initial state are specified as parameters to the creation function. The birthday spacing test itself is done by the function smarsa_BirthdaySpacings().
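To connect this example with Section 3.1, the statistic and its Poisson mean can be sketched in C as follows. This is a minimal illustration and not TestU01's implementation: the function names `birthday_lambda` and `duplicate_spacings` are hypothetical, and the sketch assumes that the n points of the test play the role of the birthdays and the $d^t$ cells play the role of the days in the year.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint64_t values for qsort. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: Poisson mean for `points` birthdays in a year
   of `cells` days, lambda = points^3 / (4 * cells). */
double birthday_lambda(double points, double cells)
{
    assert(cells > 0.0);
    return points * points * points / (4.0 * cells);
}

/* Hypothetical helper: count the spacing values that occur more than
   once. The birthdays b[0..points-1] are sorted in place; the spacings
   include the wrap-around one, b[0] + cells - b[points-1].
   Returns -1 if memory cannot be allocated. */
long duplicate_spacings(uint64_t *b, size_t points, uint64_t cells)
{
    if (points == 0)
        return 0;
    uint64_t *sp = malloc(points * sizeof *sp);
    if (sp == NULL)
        return -1;

    qsort(b, points, sizeof *b, cmp_u64);
    for (size_t i = 0; i + 1 < points; i++)
        sp[i] = b[i + 1] - b[i];
    sp[points - 1] = b[0] + cells - b[points - 1];

    /* After sorting, equal spacings are adjacent; count each repeated
       value once, at the start of its run. */
    qsort(sp, points, sizeof *sp, cmp_u64);
    long dup = 0;
    for (size_t i = 1; i < points; i++) {
        int same_as_prev = (sp[i] == sp[i - 1]);
        int run_started = (i >= 2 && sp[i - 1] == sp[i - 2]);
        if (same_as_prev && !run_started)
            dup++;
    }
    free(sp);
    return dup;
}
```

Under these assumptions, `birthday_lambda(1000, 1e8)` evaluates to 2.5 and `birthday_lambda(10000, 1e12)` to 0.25, the parameter values used in the two calls of the example.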
Two tests were carried out on the same generator with different parameters, and the difference can be seen from the output of the program:

    *****************************************************************
    ulcg_CreateLCG: m = 2147483647, a = 397204094, c = 0, s = 12345

    smarsa_BirthdaySpacings test:
    -----------------------------------------------------------------
    N = 1, n = 1000, r = 0, d = 10000, t = 2, Order = 1

    Number of cells = d^t = 100000000
    Lambda = Poisson mean = 2.5000

    -----------------------------------------------------------------
    Total expected number = N*Lambda    : 2.50
    Total observed number               : 6
    Significance level of test          : 0.04

    -----------------------------------------------------------------
    CPU time used                       : 00:00:00.00

    Generator state:
    s = 1858647048

    *****************************************************************
    ulcg_CreateLCG: m = 2147483647, a = 397204094, c = 0, s = 12345

    smarsa_BirthdaySpacings test:
    -----------------------------------------------------------------
    N = 1, n = 10000, r = 0, d = 1000000, t = 2, Order = 1

    Number of cells = d^t = 1000000000000
    Lambda = Poisson mean = 0.2500

    -----------------------------------------------------------------
    Total expected number = N*Lambda    : 0.25
    Total observed number               : 44
    Significance level of test          : eps *****

    -----------------------------------------------------------------
    CPU time used                       : 00:00:00.01

    Generator state:
    s = 731506484

From the output, we can see that in both tests N = 1, r = 0 and t = 2, but the sample size n and the number of cells differ. When $n = 10^3$, the expected number of collisions λ is 2.5, and when $n = 10^4$ this number is 0.25. In the first case, 6 collisions were observed against an expected 2.5, giving a significance level of 0.04, so the result is not decisively inconsistent with a Poisson distribution with mean λ. In the second case, 44 collisions were observed against an expected 0.25, the significance level is reported as eps, and the generator clearly fails this birthday spacing test.

6. Conclusion

This paper summarized random number testing algorithms as well as software, and examined the Diehard tests and TestU01 more thoroughly.
We could not find a single test that performs well for all random number generators; thus, various tests are needed for different generators. As computer hardware develops, the criteria for randomness may become more stringent, and new, well-designed test methods will be needed.

References

[1] Knuth, D. E. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 2nd Edition, Addison-Wesley, Reading, Mass., 1981.
[2] Randomness test: http://en.wikipedia.org/wiki/Randomness_tests
[3] Statistical randomness: http://en.wikipedia.org/wiki/Statistical_randomness
[4] NIST: http://csrc.nist.gov/rng/
[5] George Marsaglia. A Current View of Random Number Generators, Computer Science and Statistics, 1985.
[6] The Distributed Systems Group, Computer Science Department, TCD. Analysis of an On-line Random Number Generator, April 2001.
[7] Infinite monkey theorem: http://en.wikipedia.org/wiki/Infinite_monkey_theorem
[8] Monkey Tests for Random Number Generators, Computers and Mathematics with Applications, 9, 1-10, 1993.
[9] James E. Gentle. Random Number Generation and Monte Carlo Methods, Chapter 2: Quality of Random Number Generators, 2nd Edition, Springer, 1998.
[10] The Distributed Systems Group, Computer Science Department, TCD. Random Number Generators: An Evaluation and Comparison of Random.org and Some Commonly Used Generators, April 2005.
[11] Runs test: http://en.wikipedia.org/wiki/Wald%E2%80%93Wolfowitz_runs_test
[12] TestU01: http://www.iro.umontreal.ca/~simardr/testu01/tu01.html
[13] Pierre L'Ecuyer and Richard Simard. TestU01: A Software Library in ANSI C for Empirical Testing of Random Number Generators.
[14] DIEHARD: http://stat.fsu.edu/~geo/diehard.html
[15] SPRNG: http://sprng.cs.fsu.edu/
[16] ENT: http://www.fourmilab.ch/random/
[17] Crypt-X: http://www.isrc.qut.edu.au/resource/cryptx/