MRG8


TTI 2008 @ Tuscany, Italy.

Some New Random Number Generators

(Multi Recursive Generator)

Tested on CASINO

Ryo Maezono

Japan Advanced Institute of Science and Technology,

Kanazawa, Japan.


Collaborators…


Dr. Kenta Hongo

(Maezono group)

Prof. Ken-ichi Miura

(National Institute of Informatics)


Summary

- A new Random Number Generator (RNG), “MRG8”, developed by L’Ecuyer and Miura.

- Much simpler (only 33 lines!) than recent, more elaborate RNGs.

Theoretically clear as well.

It shows good performance...

- Results appear equivalent to Ranlux4, at a lower cost than Ranlux3.


Random Number Generator

(RNG)

- A kernel component of QMC.

- What is the point for us?

- Auto-correlation length and bias, mostly.

No “serious” RNG is required, such as

- Physical RNG (based on thermal noise)

- Cryptographic RNG (based on thermal noise)

... These matter only in cryptography, gambling, etc.

© Ryo Maezono, Kenta Hongo 2008

Points for us

- Auto-correlation Length

A better RNG has less correlation.

→ Shorter blocking length.

→ Effectively cheaper accumulation of statistics.

- Bias

A worse RNG has a less homogeneous distribution

(sparse lattice structure)

(3N-dim. in our simulation)

→ Sampling would be biased.

→ Biased results.
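The link between auto-correlation and blocking length can be illustrated with a small reblocking analysis. This is only a sketch and not CASINO's implementation: the function names `blocked_std_error` and `correlated_series`, and the AR(1) test series itself, are assumptions introduced for this example.

```python
import random
import statistics


def blocked_std_error(samples, block_len):
    """Standard error of the mean, estimated from averages over blocks."""
    n_blocks = len(samples) // block_len
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    block_means = [
        sum(samples[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]
    return statistics.stdev(block_means) / n_blocks ** 0.5


def correlated_series(n, rho, seed=0):
    """Hypothetical test data: AR(1) series x_t = rho * x_(t-1) + noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    data = correlated_series(1 << 16, rho=0.9)
    # The estimate grows with block length until the blocks are longer
    # than the auto-correlation length, then it levels off.
    for block_len in (1, 4, 16, 64, 256):
        print(block_len, blocked_std_error(data, block_len))
```

A sequence with shorter auto-correlation reaches the plateau at a smaller block length, which is the sense in which a better RNG gives a "shorter blocking length".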


Representative RNG

(pseudo RNG)

- Linear Congruential Method

$X_n = a_1 X_{n-1} \bmod p$

:-) Simple, cheap and fast.

:-( Sparse hyperplanes → biased sampling

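1) The recursion is only 1st order.

2) The famous story of “RANDU” on the IBM 360/370

... made worse by a bad choice of the multiplier $a_1$ (RANDU used $a_1 = 65539$ with modulus $2^{31}$, whereas $a_1 = 16807$ with $p = 2^{31} - 1$ is the well-known “minimal standard” choice).

(N.B. this is not an intrinsic drawback of the congruential method in general.)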

- Feedback Shift Register Method (not the target of the study here, brief overview)

... using more than two previous terms $X_{n-k}$

Generalization to Higher Order (topic of the study here)

... using more than two previous terms $X_{n-k}$
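The sparse-hyperplane problem of a 1st-order linear congruential generator can be checked directly. The sketch below assumes the commonly documented RANDU parameters (multiplier 65539, modulus 2^31), which are not given on the slides; the helper names `lcg` and `take` are introduced only for this example.

```python
def lcg(a, p, seed):
    """Yield X_n = a * X_(n-1) mod p indefinitely."""
    x = seed
    while True:
        x = (a * x) % p
        yield x


def take(gen, n):
    """Collect the next n values from a generator."""
    return [next(gen) for _ in range(n)]


if __name__ == "__main__":
    # Assumed RANDU parameters: a = 65539 = 2**16 + 3, modulus 2**31, odd seed.
    m = 2 ** 31
    xs = take(lcg(65539, m, seed=1), 1000)
    # Since a**2 = 6*a - 9 (mod 2**31), every consecutive triple obeys
    # x[n+2] = 6*x[n+1] - 9*x[n] (mod 2**31): 3-D points built from the
    # sequence are confined to a small number of parallel planes.
    on_planes = all(
        (xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % m == 0
        for i in range(len(xs) - 2)
    )
    print(on_planes)
```

The linear relation between three consecutive outputs is what "sparse hyperplane" means in practice: a poor multiplier turns a nominally random 3-D point cloud into a few planes.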

Further developments (1)

on the Feedback Shift Register Method

... using more than two previous terms $X_{n-k}$

A long-period sequence with a practically easy implementation.

In practical situations, however, ...

only part of the sequence is used.

A wrong “period” appears...

Further improved GFSRs (Generalized Feedback Shift Registers) include the “Mersenne Twister”

(Matsumoto & Nishimura, 1996)


Further developments(2)

on higher-order congruential-based methods

... using more than two previous terms $X_{n-k}$

Include...

- “Fibonacci-based”

- “Subtract-with-Borrow” and its relatives

Further ‘tuned-up’ → “Ranlux” (implemented in CASINO)

- “Multiple Recursive Generator (MRG)” (main target of the study here)

$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$

It has long been known as a simple and powerful RNG, but how to choose the coefficients $\{a_k\}$ (Knuth’s criteria) had been the practical obstacle.


Desired properties of RNG

The “Multiple Recursive Generator” has several desired properties:

- Cheap and fast.

N.B.) “Mersenne Twister” is said to be ‘good’ as well.

- Theoretically simple.

N.B.) Non-linear congruential methods have no firm theoretical background.

- Easy to accelerate (vector/parallel, discussed later).


Choice of coefficients

“Knuth’s criteria”

$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$

Choose the coefficients $\{a_k\}$ so that the generator has the longest possible period,

$P = p^k - 1$

That is, choose $\{a_k\}$ so that the characteristic polynomial

$f(z) = z^k - a_1 z^{k-1} - \cdots - a_{k-1} z - a_k$

is a primitive polynomial over the Galois field GF($p$); whether this holds is decided by a factorization test (mod $p$).

Random search for $\{a_k\}$ (Pierre L’Ecuyer @ Univ. Montreal)

~ requires the factorization of $r = (p^k - 1)/(p - 1)$
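For a very small modulus the full-period condition can be verified by brute force, which makes the statement $P = p^k - 1$ concrete. The sketch below is only an illustration: it enumerates periods directly instead of applying Knuth's criteria or L'Ecuyer's random search, and the names `mrg_period` and `full_period_coeffs` as well as the toy values $p = 5$, $k = 2$ are assumptions made for this example.

```python
from itertools import product


def mrg_period(coeffs, p, seed):
    """Period of X_n = sum_i a_i * X_(n-i) mod p, starting from state `seed`.

    Assumes the last coefficient a_k is nonzero mod p, so the state map is
    invertible and the orbit returns to the starting state.
    """
    k = len(coeffs)
    start = tuple(seed)              # state = (X_(n-1), ..., X_(n-k))
    state, steps = start, 0
    while True:
        new = sum(a * x for a, x in zip(coeffs, state)) % p
        state = (new,) + state[:k - 1]
        steps += 1
        if state == start:
            return steps


def full_period_coeffs(p, k):
    """All (a_1, ..., a_k) whose MRG attains the maximal period p**k - 1."""
    seed = (1,) + (0,) * (k - 1)
    return [
        c for c in product(range(p), repeat=k)
        if c[-1] != 0 and mrg_period(c, p, seed) == p ** k - 1
    ]


if __name__ == "__main__":
    # Only a few of the 25 coefficient pairs reach the maximal period 24.
    print(full_period_coeffs(5, 2))
```

For $p = 2^{31} - 1$ and $k = 8$ such an enumeration is impossible, which is why the primitivity test and the factorization of $r$ matter.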


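The key points are ...

- Numerical computation on a computer is, so to speak, the manipulation of elements of a finite field.

- Pseudo-random number generation means realizing, among the recurrence relations that map elements of a finite field to elements of that field, one whose period is tremendously long.

- Its essence is therefore determined by the algebraic structure of the given finite field.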

MRG8

$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$

A good choice was obtained for $k = 8$ and $p = 2^{31} - 1$:

a1 = 1089656042
a2 = 1906537547
a3 = 1764115693
a4 = 1304127872
a5 = 189748160
a6 = 1984088114
a7 = 626062218
a8 = 1927846343

(Found/chosen by P. L’Ecuyer @ Univ. Montreal)

An RNG named “MRG8” (implemented/tested by Prof. Miura)

$P = (2^{31} - 1)^8 - 1 \approx 4.5 \times 10^{74}$

Only 33 lines!
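The recurrence with these coefficients fits in a few lines. The following is a minimal sketch and not Prof. Miura's 33-line implementation: the class layout, the seeding rule (eight user-supplied integers, not all zero) and the method names `next_int` and `next_float` are assumptions made for illustration.

```python
from collections import deque

P = 2 ** 31 - 1
A = (1089656042, 1906537547, 1764115693, 1304127872,
     189748160, 1984088114, 626062218, 1927846343)


class MRG8:
    """X_n = (a_1 X_(n-1) + ... + a_8 X_(n-8)) mod (2**31 - 1)."""

    def __init__(self, seed_state):
        state = [int(x) % P for x in seed_state]
        if len(state) != 8 or not any(state):
            raise ValueError("need 8 seed values, not all zero mod p")
        # state[0] holds X_(n-1), ..., state[7] holds X_(n-8)
        self.state = deque(state, maxlen=8)

    def next_int(self):
        x = sum(a * s for a, s in zip(A, self.state)) % P
        self.state.appendleft(x)     # the oldest value drops off the right
        return x

    def next_float(self):
        """Uniform deviate in [0, 1)."""
        return self.next_int() / P


if __name__ == "__main__":
    rng = MRG8([1, 0, 0, 0, 0, 0, 0, 0])
    print([rng.next_int() for _ in range(3)])
```

Python's arbitrary-precision integers hide the overflow handling that a Fortran or C version has to do explicitly.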


Point (1)

The quality of an RNG depends critically on the choice of coefficients.

(Sparse lattice structure)

a1 = 1089656042, a2 = 1906537547, a3 = 1764115693, a4 = 1304127872, a5 = 189748160, a6 = 1984088114, a7 = 626062218, a8 = 1927846343

L’Ecuyer’s group carefully chose and tested them.

Again...

The famous horror story of “RANDU” on the IBM 360/370:

- The recursion is only 1st order.

- Bad choice of coefficients.


Point (2)

$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$

On a 64-bit architecture, the choice $p = 2^{31} - 1$ makes the implementation quite simple and fast!

(No division operation required)

Because...

Write the binary representation of an intermediate value $z$ on a 64-bit architecture as

$z = (Z_{63} \cdots Z_{32}\, Z_{31} \cdots Z_2 Z_1)_2 =: z_1 \cdot 2^{31} + z_2$

then

$z = z_1 \cdot 2^{31} - z_1 + z_1 + z_2 = z_1 (2^{31} - 1) + (z_1 + z_2)$

$\therefore\ z \equiv z_1 + z_2 \pmod{2^{31} - 1}$

where

$z_1 = (0 \cdots 0\, Z_{63} \cdots Z_{32})_2$ is $z$ shifted right by 31 bits,

and

$z_2 = (0 \cdots 0\, Z_{31} \cdots Z_2 Z_1)_2$ is the bitwise AND of $z$ with $(0 \cdots 0\, 1 \cdots 1)_2 = 2^{31} - 1$.

Both are obtained with a shift and a mask (no division operation required).
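The shift-and-mask reduction can be written out as follows. This is a sketch under stated assumptions: Python integers stand in for a 64-bit register, the helper names `fold` and `mod_mersenne31` are hypothetical, and the second folding step plus the final conditional subtraction are added here to bring $z_1 + z_2$ into the range $[0, p)$, a detail the slide does not spell out.

```python
P = (1 << 31) - 1            # Mersenne prime 2**31 - 1


def fold(z):
    """One folding step: z1 + z2, where z = z1 * 2**31 + z2."""
    return (z >> 31) + (z & P)


def mod_mersenne31(z):
    """Return z mod (2**31 - 1) for 0 <= z < 2**64 without any division."""
    if not 0 <= z < (1 << 64):
        raise ValueError("z must fit in an unsigned 64-bit word")
    z = fold(z)              # now below 2**33 + 2**31
    z = fold(z)              # now at most P + 4
    return z - P if z >= P else z


if __name__ == "__main__":
    z = 1089656042 * 2147483646      # a product of two 31-bit values
    print(mod_mersenne31(z) == z % P)
```

Each fold relies only on $2^{31} \equiv 1 \pmod{2^{31} - 1}$, so it can be repeated until the value is small enough for a single comparison.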


Statistical Tests

Done by Miura or L’Ecuyer…

(To be completed)


QMC Test

(combined with CASINO-v1.8)

Tests using G1 set

[Figure: SO2 (VMC), # of steps = 300 million. Ground-state energy (a.u.), axis ticks from -548.209 to -548.205, for MRG8 and RANLUX-0 to RANLUX-4. The blocking length of each run is given in parentheses (values between 1 and 16).]

Notes on Ranlux

“Tuned-up” version of the SwB (Subtract with Borrow) algorithm (M. Lüscher, 1993)

Plucking of the sequence (discarding part of it) to reduce auto-correlation

“RANLUX-0” = “Subtract with Borrow”

“RANLUX-1”, “RANLUX-2”, “RANLUX-3”: more plucking, better performance in the spectral test

N.B.) “RANLUX-0” = “Subtract with Borrow”

~ “1st-order linear congruential method”

(an effective implementation with a very large prime number)

(Tezuka & L’Ecuyer, 1992)

Consistent with D.P. Landau’s work by Monte Carlo.
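The "plucking" idea can be sketched independently of the underlying generator: deliver a few values, then throw away the rest of each block. The code below is an illustration and not the RANLUX source; the function name `pluck` is hypothetical, and the default block length 223 (with 24 values kept) is the figure commonly quoted for luxury level 3, not a number taken from these slides.

```python
from itertools import islice


def pluck(source, keep=24, block=223):
    """Yield `keep` values from each run of `block` values and drop the rest."""
    if not 0 < keep <= block:
        raise ValueError("need 0 < keep <= block")
    it = iter(source)
    while True:
        chunk = list(islice(it, block))
        yield from chunk[:keep]
        if len(chunk) < block:       # source exhausted
            return


if __name__ == "__main__":
    # Toy input: keep 2 values out of every 5.
    print(list(pluck(range(10), keep=2, block=5)))
```

More plucking means more discarded values per delivered number, which matches the growth in cost from RANLUX-0 to RANLUX-4.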


Tests using G1 set

[Figure repeated: SO2 (VMC), # of steps = 300 million, ground-state energy (a.u.) for MRG8 and RANLUX-0 to RANLUX-4, blocking lengths in parentheses.]

... RANLUX-0 can be viewed as a 1st-order linear congruential RNG as well as the ‘Subtract with Borrow’ RNG.

[Figure: SO2 (DMC), # of steps = 40,000, # of configs = 10,000. Axis ticks from -548.566 to -548.560, for MRG8 and RANLUX-0 to RANLUX-4. The blocking length of each run is given in parentheses (values between 256 and 8192).]

Source of Different Bias in DMC/VMC

1) Generating the random walk (VMC / DMC)

2) Metropolis reject/accept (VMC / DMC)

3) Branching (DMC)

Timing Info

SO2, DMC. # of steps = 1,000, # of configs = 10,000. Times in seconds.

RNG         User       System     Total CPU   % CPU
MRG8        176.0448   189.3554   365.6764    0.42
RANLUX-0    101.2084   190.2169   290.9649    0.33
RANLUX-1    127.7751   192.9557   321.1352    0.36
RANLUX-2    176.5535   183.0453   359.7706    0.42
RANLUX-3    304.1154   182.4181   486.7478    0.57
RANLUX-4    470.0464   182.1101   652.6842    0.76

He & PH2 (VMC)

[Figure: He (VMC), # of steps = 1000 million. Axis ticks from -2.903692 to -2.903686, for MRG8 and RANLUX-0 to RANLUX-4. Blocking lengths in parentheses (values 1 or 2).]

[Figure: PH2 (VMC), # of steps = 150 million. Axis ticks from -342.240 to -342.235, for MRG8 and RANLUX-0 to RANLUX-4. Blocking lengths in parentheses (values between 1 and 16).]

He & PH2 (DMC)

[Figure: He (DMC), # of steps = 50,000, # of configs = 10,000. Axis ticks from -2.90374 to -2.90371, for MRG8 and RANLUX-0 to RANLUX-4. Blocking lengths in parentheses (values between 512 and 4096).]

[Figure: PH2 (DMC), # of steps = 30,000, # of configs = 10,000. Axis ticks from -342.478 to -342.473, for MRG8 and RANLUX-0 to RANLUX-4. Blocking lengths in parentheses (values between 512 and 2048).]

System dependence

- (Bias of results) ~ (homogeneity of RNG) gets worse as the dimension of the sampling space grows…

(3N-dim. in our case)

Systems with a larger # of electrons are therefore interesting.

- The sampling space has a nodal structure.

“All-electron systems” and “pseudopotential systems” differ in character.

Both should be examined.

SH4 (VMC/DMC, pseudopotential calc.)

[Figure: SH4 (VMC), # of steps = 10 million. Axis ticks from -6.28984 to -6.28928, for MRG8 and RANLUX-0 to RANLUX-4. Blocking lengths in parentheses (values between 16 and 64).]

[Figure: SH4 (DMC), # of steps = 30,000, # of configs = 10,000. Axis ticks from -6.3085 to -6.3055, for MRG8 and RANLUX-0 to RANLUX-4. Blocking lengths in parentheses (values between 512 and 2048).]

QMC Test

(combined with CASINO-v1.8)

“A Research Background”

RNG researchers are interested in such tests.

c.f.) D.P. Landau’s test of RNGs by Monte Carlo (Ising model)

An RNG that does better in spectral tests does not always perform better in applications.

RNG researchers have started to consider the

“harmony of RNG” with each application.

Discussions

1) MRG8 always gives the same answer as Ranlux4.

2) MRG8 costs less than Ranlux3.

3) MRG8 gives no significant improvement in blocking length.

4) A difference in bias appears between DMC and VMC. Possible sources:

   1) Generating the random walk (VMC / DMC)

   2) Metropolis reject/accept (VMC / DMC)

   3) Branching (DMC)

Acceleration of RNG

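$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$

can be written as

$$
\begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_{n-k+1} \end{pmatrix}
=
\begin{pmatrix}
a_1 & a_2 & \cdots & a_{k-1} & a_k \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} X_{n-1} \\ X_{n-2} \\ \vdots \\ X_{n-k} \end{pmatrix}
\bmod p
$$

where the $k \times k$ matrix is denoted $A$.

Generating sequence: each new value is computed from the preceding eight,

$(X_1, \dots, X_8) \to X_9$, $(X_2, \dots, X_9) \to X_{10}$, $(X_3, \dots, X_{10}) \to X_{11}$, ...

i.e.,

$(X_8, X_7, \dots, X_1)^T \xrightarrow{A} X_9 \xrightarrow{A} X_{10} \xrightarrow{A} \cdots$

(Normal sequence)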

Acceleration (cont’d)

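$$
\begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_{n-k+1} \end{pmatrix}
=
\begin{pmatrix}
a_1 & a_2 & \cdots & a_{k-1} & a_k \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} X_{n-1} \\ X_{n-2} \\ \vdots \\ X_{n-k} \end{pmatrix}
\bmod p
$$

where the $k \times k$ matrix is denoted $A$.

Having evaluated $A^n$ in advance...

$$
\begin{pmatrix} X_{10} \\ X_9 \\ \vdots \\ X_3 \end{pmatrix}
= A \begin{pmatrix} X_9 \\ X_8 \\ \vdots \\ X_2 \end{pmatrix}
= A^2 \begin{pmatrix} X_8 \\ X_7 \\ \vdots \\ X_1 \end{pmatrix}
$$

and in general

$$
\begin{pmatrix} X_{8+n} \\ X_{8+n-1} \\ \vdots \\ X_{1+n} \end{pmatrix}
= A^n \begin{pmatrix} X_8 \\ X_7 \\ \vdots \\ X_1 \end{pmatrix}
$$

(all products taken mod $p$).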

Acceleration (cont’d)

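Evaluation of the products with $A^n$ can be made faster by ...

- Contemporary compilers (vectorization)

- Intra-node parallelization (“thread parallel”)

Then...

Normal ‘serial’ generation: each value needs the previous one,

$(X_8, X_7, \dots, X_1)^T \xrightarrow{A} X_9 \xrightarrow{A} X_{10} \xrightarrow{A} X_{11} \xrightarrow{A} X_{12} \to \cdots$

Accelerated generation: $A, A^2, A^3, A^4$ are applied to the same state at once,

$(X_8, \dots, X_1)^T \xrightarrow{A,\ A^2,\ A^3,\ A^4} (X_9, X_{10}, X_{11}, X_{12}) \xrightarrow{A,\ A^2,\ A^3,\ A^4} (X_{13}, X_{14}, X_{15}, X_{16}) \xrightarrow{A,\ A^2,\ A^3,\ A^4} (X_{17}, X_{18}, X_{19}, X_{20}) \to \cdots$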
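The serial and accelerated schemes can be compared directly in a few lines. The sketch below uses plain Python lists, so it shows the data dependencies rather than any real speed-up; the function names (`companion`, `mat_mul`, `mat_vec`, `serial`, `blocked`) and the block size of 4 are assumptions chosen to mirror the diagram, not the authors' code.

```python
P = 2 ** 31 - 1
A_COEFFS = (1089656042, 1906537547, 1764115693, 1304127872,
            189748160, 1984088114, 626062218, 1927846343)


def companion(coeffs):
    """Matrix A: first row holds the coefficients, the rest shifts the state."""
    k = len(coeffs)
    rows = [list(coeffs)]
    for i in range(k - 1):
        rows.append([1 if j == i else 0 for j in range(k)])
    return rows


def mat_mul(a, b, p=P):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) % p for j in range(n)]
            for i in range(n)]


def mat_vec(a, v, p=P):
    return [sum(x * y for x, y in zip(row, v)) % p for row in a]


def serial(state, n):
    """n outputs, one recurrence step at a time; state = [X_8, ..., X_1]."""
    state, out = list(state), []
    for _ in range(n):
        x = sum(a * s for a, s in zip(A_COEFFS, state)) % P
        state = [x] + state[:-1]
        out.append(x)
    return out


def blocked(state, n_blocks, m=4):
    """Same outputs, m per block, using precomputed A, A^2, ..., A^m."""
    a = companion(A_COEFFS)
    mats = [a]
    for _ in range(m - 1):
        mats.append(mat_mul(mats[-1], a))
    state, out = list(state), []
    for _ in range(n_blocks):
        # The m products are independent of each other, so they could be
        # evaluated in parallel (vector lanes or threads).
        new_states = [mat_vec(mat, state) for mat in mats]
        out.extend(s[0] for s in new_states)
        state = new_states[-1]
    return out


if __name__ == "__main__":
    seed = [8, 7, 6, 5, 4, 3, 2, 1]
    print(serial(seed, 12) == blocked(seed, 3, m=4))
```

Within a block no output depends on another output of the same block, which is what allows vectorization or thread parallelism.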
