Brownian Motion Testing Random Numbers: Theory and Practice Prof. Michael Mascagni Applied and Computational Mathematics Division, Information Technology Laboratory National Institute of Standards and Technology, Gaithersburg, MD 20899-8910 USA AND Department of Computer Science Department of Mathematics Department of Scientific Computing Graduate Program in Molecular Biophysics Florida State University, Tallahassee, FL 32306 USA E-mail: mascagni@fsu.eduor mascagni@nist.gov URL: http://www.cs.fsu.edu/∼mascagni April 13, 2016 Brownian Motion Overview Chi-Square Test The Kolmogorov - Smirnov Test Empirical Tests Equidistribution Test (Frequecy Test) Serial Test Gap Test Poker Test Coupon Collector’s Test Permutation Test Run Test Maximum of t Test Collision Test Serial Correlation Test The Spectral Test Brownian Motion Chi-Square Test Chi-Square Test Eg. "Throwing 2 dice" s ps : : Value of the sum of the 2 dice. Probability. s ps 2 3 4 5 6 7 8 9 10 11 12 1 36 1 18 1 12 1 9 5 36 1 6 5 36 1 9 1 12 1 18 1 36 4 10 12 5 12 16 If we throw dice 144 times: s Observed: Ys Expected: nps 2 2 4 3 4 8 6 22 20 7 29 24 8 21 20 9 15 16 10 14 12 11 9 8 12 6 4 Brownian Motion Chi-Square Test Chi-Square Test Is a pair of dice loaded? We can’t make a definite yes/no statement, but we can give a probabilistic answer. We can form the Chi-Square Statistic. X (Ys − nps )2 nps 1≤s≤k 2 Ys 1 X = −n n ps V= 1≤s≤k V=k-1: degree of freedom k: n: Number of categories Number of observances Brownian Motion Chi-Square Test Table of Chi-Square Distribution Entry in row ν under column p is x, which means “The quantity V will be less than or equal to x, with approximate probability p, if n is large enough.” Example: Value of s Experiment 1, Ys Experiment 2, Ys 2 4 3 V1 = 29 3 10 7 59 120 4 10 11 5 13 15 6 20 19 7 18 24 8 18 21 V2 = 1 9 11 17 11 120 10 13 13 11 14 9 12 13 5 Brownian Motion Chi-Square Test Table of Chi-Square Distribution V1 = 29 59 120 V2 = 1 11 120 Discussion: V 1 is too high, V 0.1% of the time. V 2 is too low, V 0.01% of the time. Both represent x with a significant departure from randomness. To use Chi-Square distribution table, n should be large. How large should n be? Rule of thumb: n should be large enough to make each nps be 5 or greater. Brownian Motion Chi-Square Test Chi-Square Test 1. Large number n of independent observations. 2. Count the number of observations on k categories. 3. Compute V. 4. Look up Chi-Square distribution table. V<1% or V>99% 1%<V<5% or 95%<V<99% 5%<V<10% or 90%<V<95% otherwise reject suspect almost suspect accept Brownian Motion The Kolmogorov - Smirnov Test The Kolmogorov - Smirnov Test Chi-Square Test Kolmogorov - Smirnov Test : : for discrete random data for continuous random data Def: F(x) = probability that (X ≤ x) n independent observations of the random quantity X X1 , X2 , . . . , Xn Def: Empirical distribution function Fn (x) Fn (x) = numbers of X1 , X2 , . . . , Xn that ≤ x n Brownian Motion The Kolmogorov - Smirnov Test The Kolmogorov - Smirnov Test The Kolmogorov - Smirnov Test is based on the difference between F(x) and Fn (x). √ Kn+ = n max (Fn (x) − F (x)) −∞<x<+∞ maximum deviation when Fn is greater than F. √ Kn− = n max (F (x) − Fn (x)) −∞<x<+∞ maximum deviation when Fn is less than F. We get a table similar to the Chi-Square to find the percentile. Unlike χ2 , the table fits any size of n. Brownian Motion The Kolmogorov - Smirnov Test The Kolmogorov - Smirnov Test Simple procedure to obtain Kn + , Kn − . 1. Obtain observations X1 , X2 , . . . , Xn . 2. Rearrange into ascending order. X1 ≤ X2 ≤ . . . ≤ Xn 3. Calculate Kn + , Kn − Kn+ = Kn− = √ √ n max 1≤j≤n J − F (Xj ) n j −1 n max F (Xj ) − n 1≤j≤n Brownian Motion The Kolmogorov - Smirnov Test The Kolmogorov - Smirnov Test Dilemma: We need a large n to differentiate Fn and F. Large n will average out local random behavior. Compromise: Consider a moderate size for n, say 1000. Make a fairly large number of K+ 1000 on different parts of the random sequence K+ 1000 (1), K+ 1000 (2), . . ., K+ 1000 (r). Apply another KS Test. The distribution of Kn + is approximated. F∞ (x) = 1 − e−2x 2 Significance: Detects both local and global random behavior. Brownian Motion Empirical Tests Empirical Tests Empirical Tests: 10 tests Test of real number sequence < Un >= U0 , U1 , U2 . . . Test of integer number sequence < Yn >= Y0 , Y1 , Y2 . . . Yn = bdUn c Yn : integers[0, d − 1] Brownian Motion Empirical Tests Equidistribution Test (Frequecy Test) A. Equidistribution Test (Frequecy Test) Two ways: 1. Use χ2 test Figure: * d intervals Count the number of sequence <Y n >= Y 0 , Y 1 , Y 2 , . . . falling into each interval k=d ps = d1 2. Use KS Test Test <U n >= U 0 , U 1 , U 2 , . . . F(x) = x for 0 ≤ x ≤ 1 Brownian Motion Empirical Tests Serial Test B. Serial Test I I Pairs of successive numbers to be uniformly distributed. d2 intervals are used. Figure: * I I k = d2 , ps = 1/d2 Serial Test can be regarded as 2-D frequency test. Can be generalized to triples, quadruples, . . . Brownian Motion Empirical Tests Gap Test C. Gap Test Examine the length of “gaps” between occurrences of U j in a certain range 0≤α<β≤1. gap: Length of consecutive subsequences U j , U j+1 , . . ., U j+r lies between α and β. Algorithm: 1. Initialize: j← -1, s ← 0 2. r ← 0 3. if (α ≤ U j ≤ β), j ← j+1 else goto 5. 4. r ← r+1, goto 3. 5. record gap length. if r ≥ t, COUN T [t] ← COU N T [t]+1 else COUN T [r] ← COU N T [r]+1 6. Repeat until n gaps are found. Brownian Motion Empirical Tests Gap Test C. Gap Test COU N T [0], COUN T [1], . . ., COUN T [t] should have the following probability: I p0 =p, p1 =p(1-p), p2 =p(1-p)2 , . . ., pt−1 =p(1-p)t−1 , pt =p(1-p)t I p=β-α Now, we can apply the χ2 test. Special cases: I (α,β) = (0, 12 ) ← runs above the mean I (α,β) = ( 12 ,1) ← runs below the mean Brownian Motion Empirical Tests Poker Test D. Poker Test Consider 5 successive integers (Y sj , Y sj+1 , Y sj+2 , Y sj+3 , Y sj+4 ) Pattern All different One Pair Two Pairs Three of a kind Example abcde aabcd aabbc aaabc Pattern Full house Four of a kind Five of a kind Example aaabb aaaab aaaaa Simplify: 5 different 4 different 3 different 2 different 5 same numbers all different one pair two pairs or three of a kind full house or four of a kind five of a kind Brownian Motion Empirical Tests Poker Test D. Poker Test Generalized: n groups of k successive numbers (k - tuples) with r different values. d(d − 1) . . . (d − r + 1) k pr = r dk d = number of categories Then, the χ2 test can be applied. Brownian Motion Empirical Tests Poker Test Stirling Numbers of the Second Kind I I I I I Notation: S(n, k ) or { kn } Definition: count the number of ways to partition a set of n labelled objects into k nonempty unlabelled subsets Equivalently, they count the number of different equivalence relations with precisely k equivalence classes that can be defined on an n element set. In fact, there is a bijection between the set of partitions and the set of equivalence relations on a given set. n Obviously, n = 1 and for n ≥ 1, n1 = 1: the only way to partition an ”n”-element set into ”n” parts is to put each element of the set into its own part, and the only way to partition a nonempty set into one part is to put all of the elements in the same part. They can be calculated using the following explicit formula: k nno 1 X k −j k n Brownian Motion Empirical Tests Coupon Collector’s Test E.Coupon Collector’s Test In the sequence Y 0 , Y 1 , . . ., the lengths of the segments Y j+1 , Y j+2 , . . ., Y j+r are collected to get a complete set of integers from 0 to d-1. Algorithm: 1. Initialize j ← -1, s ← 0, COUN T [r] ← 0 for d ≤ r <t. 2. q ← r ← 0, OCCURS[k] ← 0 for 0 ≤ k <d. 3. r ← r+1, j ← j+1 4. Complete Set? OCCU RS[ Y j ] ← 1 and q ← q+1 if q=d, a complete set q <d, goto 3. 5. Record the length. if r ≥ t, COUN T [t] ← COUN T [t]+1 else COUN T [r] ← COU N T [r]+1 6. Repeat until n values are found. Brownian Motion Empirical Tests Coupon Collector’s Test E. Coupon Collector’s Test Chi-Square Test can be applied to COUN T [d], COU N T [d+1], . . ., COU N T [t] d! r − 1 pr = r ,d ≤ r < t d d −1 d! t − 1 pt = 1 − t−1 d d Brownian Motion Empirical Tests Permutation Test F. Permutation Test A t-tuple (U jt , U jt+1 , . . . , U jt+t−1 ) can have t! possible relative orderings. For Example: t=3 There should be 3! = 6 categories 1 <2 <3 1 <3 <2 k = t! We can apply χ2 test now. 2 <1 <3 3 <1 <2 2 <3 <1 3 <2 <1 ps = 1 t! Brownian Motion Empirical Tests Run Test G. Run Test Examine the length of monotone subsequences. “Runs up”: increasing “Runs down”: decreasing For Run i, the length of the run is COUN T [i]. 8 | |{z} 5 |3 6 7} | |{z} 0 4| |1 2 9} | |{z} | {z | {z 3 1 1 3 2 Note: χ2 test cannot be directly applied because of lack of independence (each segment depends on previous segment). Then, we need to calculate 1 X V= (COU N T [i] − nbi ) COUN T [j] − nbj aij n 1≤i,j≤6 Brownian Motion Empirical Tests Run Test G. Run Test a11 a21 a31 a41 a51 a61 a12 a22 a32 a42 a52 a62 a13 a23 a33 a43 a53 a63 a14 a24 a34 a44 a54 a64 a15 a25 a35 a45 a55 a65 a16 4529.4 9044.9 13568 18091 22615 2789 a26 9044.9 18097 27139 36187 45234 5578 a36 = 13568 27139 40721 54281 67852 8368 a46 10891 36187 54281 72414 90470 11158 a56 22615 45234 67852 90470 113262 13947 a66 27892 55789 83685 111580 139476 17286 b1 b2 b3 b4 b5 b6 = 1 6 5 24 11 120 19 720 Then, V should have the same χ2 distribution with degree 6. 29 5040 1 890 Brownian Motion Empirical Tests Maximum of t Test H. Maximum-of-t Test Examine the maximum value. Let V j = max(U tj , U tj+1 , . . ., U tj+t−1 ). The distribution is F(x) = Xt Then, we can apply the Kolmogorov - Smirnov Test here. Brownian Motion Empirical Tests Collision Test I. Collision Test Suppose we have m urns and n balls, m «n. Most of the balls will fall in an empty urn. If a ball falls in an urn that already has a ball, we call it a “collision”. A generator passes the collision test only if it doesn’t induce too many or too few collisions. Probability of c collisions occurring: m(m − 1) . . . (m − n + c + 1) n mn n−c Brownian Motion Empirical Tests Serial Correlation Test J. Serial Correlation Test Consider the observations (U 0 , U 1 , . . ., U n−1 ) and (U 1 , . . ., U n−1 , U 0 ) Test the correlation between these two tuples. We compute: C= n(U0 U1 + U1 U2 + . . . + Un−2 Un−1 + Un−1 U0 ) − (U0 + U1 + . . . + Un−1 )2 2 ) − (U + U + . . . + U 2 n(U02 + U12 + . . . + Un−1 0 1 n−1 ) A “good” C should be between µn - 2δ n and µn + 2δ n . r −1 1 n(n − 3) µn = , δn = n−1 n−1 n+1 , n>2 Brownian Motion The Spectral Test The Spectral Test Idea underlying the test: Congruential Generators generate random numbers in grids! In t-dimensional space, {(U n , U n+1 , . . ., U n+t−1 )} Compute the distance between lines (2D), planes (3D), parallel hyperplanes (>3D). 1/V 2 : 1/V 3 : 1/V t : Maximum distance between lines. Two dimensional accuracy. Maximum distance between planes. Three dimensional accuracy. Maximum distance between hyperplanes. t - dimensional accuracy. Brownian Motion The Spectral Test The Spectral Test Differentiate between truly random sequences and periodic sequences. Truly random sequences: Periodic sequences: accuracy remains same in all dimensions accuracy decreases as t increases Spectral Test is by far the most powerful test. I All “good” generators pass it. I All known “bad” generators fail it. Brownian Motion The Spectral Test Summary 1. Basic idea of empirical tests: The combination of random numbers is expected to conform to a specific distribution. 1.1 Build the combination. 1.2 Use χ2 or KS test to test the deviation from the expected distribution. 2. We can perform an infinite number of tests. 3. We might be able to construct a test to “kill” a specific generator. Brownian Motion The Spectral Test Other resources for RNG Testing 1. FFT, Metropolis, Wolfgang Tests (spectrum). 2. Diehard (http://www.stat.fsu.edu/pub/diehard) 3. SPRNG (implements most of the empirical tests and spectrum tests). http://sprng.cs.fsu.edu