TTI 2008@ Tuscany, Italy.
(Multiple Recursive Generator)
Ryo Maezono
Japan Advanced Institute of Science and Technology,
Kanazawa, Japan.
Dr. Kenta Hongo
(Maezono group)
Prof. Ken-ichi Miura
(National Institute of Informatics)
- A new random number generator (RNG), “MRG8”, developed by L’Ecuyer and Miura.
- Much simpler (only 33 lines!) than recent elaborate RNGs, and theoretically transparent as well.
It shows good performance...
- It appears to give results equivalent to RANLUX-4, at lower cost than RANLUX-3.
- The RNG is a kernel component of QMC.
- What matters for us?
- Mostly the auto-correlation length and the bias.
No “serious” RNG is required, such as
- physical RNGs (based on thermal noise)
- cryptographic RNGs
... these matter only for cryptography, gambling, etc.
© Ryo Maezono, Kenta Hongo 2008
A better RNG has less correlation.
→ Shorter blocking length.
→ Statistics accumulate at effectively no extra cost.
A worse RNG has a less homogeneous distribution
(sparse lattice structure)
in the sampling space (3N-dimensional in our simulation).
→ Sampling would be biased.
→ Biased results.
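The link between correlation and blocking length can be illustrated with a standard blocking (reblocking) estimate of the error bar. The following is a minimal sketch under stated assumptions: non-overlapping blocks, and a toy AR(1) series standing in for QMC local energies. `blocked_error` and `ar1_series` are hypothetical names, not CASINO routines.

```python
import math
import random


def blocked_error(samples, block_len):
    """Standard error of the mean from non-overlapping blocks of length block_len."""
    n_blocks = len(samples) // block_len
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    means = [sum(samples[i * block_len:(i + 1) * block_len]) / block_len
             for i in range(n_blocks)]
    grand = sum(means) / n_blocks
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return math.sqrt(var / n_blocks)


def ar1_series(n, phi, seed=1):
    """Toy correlated series x_t = phi * x_{t-1} + Gaussian noise (hypothetical stand-in data)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    data = ar1_series(2**16, phi=0.9)
    for b in (1, 4, 16, 64, 256, 1024):
        print(f"block length {b:5d}: error bar {blocked_error(data, b):.4f}")
```

The estimated error bar grows with block length and levels off once the blocks are longer than the correlation time. The block length at which it plateaus is the blocking length: less correlated input reaches the plateau sooner, so fewer samples are "wasted" on decorrelation.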
(Pseudo RNGs)
- Linear Congruential Method
$$X_n = a_1 X_{n-1} \bmod p$$
:-) Simple, cheap, and fast.
:-( Sparse hyperplanes → biased sampling
1) The recursion is only first order.
2) The famous story of “RANDU” on the IBM 360/370,
... made worse by a bad choice of $a_1$ (RANDU used $a_1 = 65539$ with modulus $2^{31}$; cf. $a_1 = 16807$)
(N.B., not a genuine drawback of the congruential method in general)
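The RANDU defect can be checked directly. RANDU used $a_1 = 65539 = 2^{16} + 3$ with modulus $2^{31}$, so $a_1^2 \equiv 6a_1 - 9 \pmod{2^{31}}$. Hence every triple of consecutive outputs satisfies $X_{n+2} - 6X_{n+1} + 9X_n \equiv 0$, which confines all points in 3D to only 15 parallel planes. The sketch below (function names are illustrative) verifies that identity on a generated stream.

```python
def randu(seed, n):
    """RANDU: X_n = 65539 * X_{n-1} mod 2**31 (the seed should be odd)."""
    x, out = seed, []
    for _ in range(n):
        x = (65539 * x) % 2**31
        out.append(x)
    return out


def hyperplane_residuals(xs):
    """Residuals of X_{n+2} - 6 X_{n+1} + 9 X_n mod 2**31; RANDU makes every one 0."""
    return [(xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % 2**31
            for i in range(len(xs) - 2)]


if __name__ == "__main__":
    seq = randu(seed=1, n=10_000)
    print(set(hyperplane_residuals(seq)))   # {0}
```

This is the "sparse hyperplane" problem in its most extreme form: the defect comes from the specific multiplier, not from first-order congruential recursion as such.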
- Feedback Shift Register Method
... using $X_{n-k}$ with more than two $k$
Generalization to higher order
... using $X_{n-k}$ with more than two $k$ (topic of the study here)
(Not the target of the study here; a brief overview)
On the Feedback Shift Register Method
... using $X_{n-k}$ with more than two $k$
Long-period sequences with a practically easy implementation.
In practice, however, ...
only part of the sequence is used,
and a wrong “period” appears...
Further improved GFSRs
(Generalized Feedback Shift Registers) include the “Mersenne Twister”
(Matsumoto & Nishimura, 1996).
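For orientation only, a GFSR applies a shift-register recurrence wordwise: $X_n = X_{n-p} \oplus X_{n-q}$, so each bit column is a linear recurrence over GF(2). The sketch below uses the lags $p = 250$, $q = 103$ of the R250 generator. The seeding through Python's `random` module is an assumption for illustration; a real GFSR needs a seed whose bit columns are linearly independent.

```python
import random


class LaggedXorGFSR:
    """Toy GFSR: X_n = X_{n-p} XOR X_{n-q} on 32-bit words (R250-style lags).

    Seeding via Python's random module is an illustrative assumption.
    """

    def __init__(self, seed, p=250, q=103):
        self.p, self.q = p, q
        init = random.Random(seed)
        # buf[0] holds X_{n-p}, buf[-1] holds X_{n-1}
        self.buf = [init.getrandbits(32) for _ in range(p)]

    def next(self):
        x = self.buf[-self.p] ^ self.buf[-self.q]   # X_{n-p} XOR X_{n-q}
        self.buf.append(x)
        self.buf.pop(0)                              # keep a window of the last p words
        return x


if __name__ == "__main__":
    g = LaggedXorGFSR(seed=42)
    print([g.next() for _ in range(3)])
```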
On higher-order congruential-based methods.
These include ... using $X_{n-k}$ with more than two $k$:
- “Fibonacci-based”
- “Subtract-with-Borrow” and its relatives
Further ‘tuned up’ → “RANLUX”
(implemented in CASINO)
(Main target of the study here)
- “Multiple Recursive Generator (MRG)”
$$X_n = a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k} \bmod p$$
It has long been known as a simple and powerful RNG, but...
how to choose the coefficients $a_k$? (Knuth’s criteria) This had been the practical obstacle.
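The MRG recurrence itself is only a few lines. Here is a minimal, unoptimized sketch for arbitrary order $k$; the class name `MRG` is hypothetical. Python's unbounded integers sidestep overflow, so this shows the recurrence only, not a fast implementation. With $k = 1$ it reduces to the linear congruential method, and with $a_1 = a_2 = 1$ to the Fibonacci recurrence.

```python
from collections import deque


class MRG:
    """Multiple recursive generator X_n = a_1 X_{n-1} + ... + a_k X_{n-k} mod p."""

    def __init__(self, coeffs, p, seed_state):
        if len(seed_state) != len(coeffs):
            raise ValueError("need k initial values")
        if all(s % p == 0 for s in seed_state):
            raise ValueError("the all-zero state is a fixed point")
        self.a = list(coeffs)   # a_1 ... a_k
        self.p = p
        # hist[0] = X_{n-1}, ..., hist[k-1] = X_{n-k}
        self.hist = deque((s % p for s in seed_state), maxlen=len(coeffs))

    def next(self):
        x = sum(a * h for a, h in zip(self.a, self.hist)) % self.p
        self.hist.appendleft(x)   # newest first; the oldest value X_{n-k} drops off
        return x


if __name__ == "__main__":
    lcg = MRG([16807], 2**31 - 1, [1])          # k = 1: a linear congruential generator
    print([lcg.next() for _ in range(3)])
```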
The “Multiple Recursive Generator” has several
desired properties:
- Cheap and fast.
N.B.) The “Mersenne Twister” is said to be ‘good’ as well.
- Theoretically simple.
N.B.) Non-linear congruential methods have no firm theoretical background.
- Easy to accelerate (vector/parallel, discussed later).
“Knuth’s criteria”
$$X_n = a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k} \bmod p$$
Choose the coefficients $a_k$ so that the generator has the longest possible period
$$P = p^k - 1.$$
To do so, choose them so that the characteristic polynomial
$$f(z) = z^k - a_1 z^{k-1} - \cdots - a_{k-1} z - a_k$$
is a primitive polynomial over the Galois field GF(p) (checked via its factorization properties mod p).
Random search for the coefficients (Pierre L’Ecuyer @ Univ. Montreal)
~ requires the factorization of
$$r = \frac{p^k - 1}{p - 1}.$$
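For tiny toy values of $p$ and $k$ (nothing like MRG8's), the criterion can be checked by brute force: count the coefficient vectors whose state sequence reaches the maximal period $p^k - 1$. Theory predicts exactly as many as there are primitive polynomials of degree $k$ over GF(p), i.e. $\varphi(p^k - 1)/k$, which for $p = 7$, $k = 2$ is $\varphi(48)/2 = 8$. The function names below are hypothetical. For $p = 2^{31} - 1$ brute force is hopeless, which is why the factorization of $r$ is needed.

```python
from itertools import product


def mrg_period(coeffs, p, state):
    """Brute-force period of the state (X_{n-1}, ..., X_{n-k}); None if it never recurs."""
    start = tuple(state)
    s = start
    for steps in range(1, p ** len(coeffs) + 1):
        x = sum(a * h for a, h in zip(coeffs, s)) % p
        s = (x,) + s[:-1]          # shift in the new value, drop X_{n-k}
        if s == start:
            return steps
    return None                    # possible only when a_k = 0 (non-invertible map)


def full_period_coeffs(p, k):
    """All (a_1, ..., a_k) whose MRG reaches the maximal period p**k - 1."""
    target = p ** k - 1
    seed = (1,) + (0,) * (k - 1)   # any nonzero seed works
    return [c for c in product(range(p), repeat=k)
            if mrg_period(c, p, seed) == target]


if __name__ == "__main__":
    good = full_period_coeffs(7, 2)
    print(len(good), good)
```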
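The key point is...
- Numerical computation on a computer is, so to speak, the manipulation of elements of a finite field.
- Pseudo-random number generation means realizing, among the recurrences that map element to element of a finite field,
one whose period is enormously long.
- Hence its essence is determined by the algebraic structure of the given finite field.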
$$X_n = a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k} \bmod p$$
A good choice obtained for $k = 8$ and $p = 2^{31} - 1$:
a1 = 1089656042, a2 = 1906537547, a3 = 1764115693, a4 = 1304127872, a5 = 189748160, a6 = 1984088114, a7 = 626062218, a8 = 1927846343
(Found/chosen by P. L’Ecuyer @ Univ. Montreal)
An RNG named “MRG8” (implemented/tested by Prof. Miura)
$$P = (2^{31} - 1)^8 - 1 \approx 4.5 \times 10^{74}$$
Only 33 lines!
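Plugging the slide's coefficients into the recurrence gives a working, if slow, MRG8 stream. This is a sketch, not Miura's 33-line implementation: the seeding convention, the output scaling in `to_unit`, and the names `mrg8_stream` and `to_unit` are all assumptions. Big Python integers are used instead of the 64-bit tricks a fast version needs.

```python
P = 2**31 - 1
# a_1 ... a_8 as listed on the slide
COEFFS = (1089656042, 1906537547, 1764115693, 1304127872,
          189748160, 1984088114, 626062218, 1927846343)


def mrg8_stream(seed_state, n):
    """Return n values of X_n = a_1 X_{n-1} + ... + a_8 X_{n-8} mod (2^31 - 1).

    seed_state lists [X_{n-1}, ..., X_{n-8}] and must not be all zero mod P.
    """
    if len(seed_state) != 8 or not any(s % P for s in seed_state):
        raise ValueError("need 8 initial values, not all zero")
    hist = [s % P for s in seed_state]
    out = []
    for _ in range(n):
        x = sum(a * h for a, h in zip(COEFFS, hist)) % P
        hist = [x] + hist[:-1]      # x becomes X_{n-1}; the oldest value drops off
        out.append(x)
    return out


def to_unit(x):
    """Map an integer in [0, P) to a float in [0, 1) (one possible convention)."""
    return x / P


if __name__ == "__main__":
    xs = mrg8_stream([12345, 0, 0, 0, 0, 0, 0, 1], 5)
    print([round(to_unit(x), 6) for x in xs])
    print(f"period (2^31-1)^8 - 1 ~ {P**8 - 1:.2e}")
```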
The quality of an RNG depends critically on the choice of coefficients
(sparse lattice structure): the $a_1, \dots, a_8$ of MRG8
were carefully chosen and tested by L’Ecuyer’s group.
Again...
the famous horror story of “RANDU” on the IBM 360/370:
- the recursion is only first order;
- the coefficient was badly chosen.
$$X_n = a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k} \bmod p$$
On a 64-bit architecture,
the choice $p = 2^{31} - 1$ makes the implementation quite simple and fast!
(No division operation required.)
Because... write $z$ in binary on a 64-bit architecture and split it at bit 31:
$$z = (Z_{63} \cdots Z_{32}\, Z_{31} \cdots Z_2 Z_1)_2 =: z_1 \cdot 2^{31} + z_2,$$
then
$$z = z_1 (2^{31} - 1) + z_1 + z_2 \qquad (\because\ 2^{31} \equiv 1 \pmod{2^{31} - 1})$$
$$\therefore\ z \equiv z_1 + z_2 \pmod{2^{31} - 1},$$
where
$$z_1 = (0 \cdots 0\, Z_{63} \cdots Z_{32})_2, \qquad z_2 = (0 \cdots 0\, Z_{31} \cdots Z_1)_2,$$
i.e. $z_1$ is $z$ shifted right by 31 bits, and $z_2$ is $z$ AND the mask $(0 \cdots 0\, 1 \cdots 1)_2 = 2^{31} - 1$.
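The shift-and-mask argument translates directly into code. Below is a minimal sketch, assuming $0 \le z < 2^{63}$ (e.g. a single product $a_i X_{n-i}$ of two values below $2^{31}$). One fold gives $z_1 + z_2 < 2^{33}$, which may still exceed $p$, so a second fold and one conditional subtraction finish the reduction. How MRG8 itself accumulates the eight products without 64-bit overflow is not shown on the slide; `mod_mersenne31` and `mul_mod` are hypothetical names.

```python
P = 2**31 - 1
MASK = 2**31 - 1        # (0...0 1...1)_2 with 31 ones, numerically equal to p


def mod_mersenne31(z):
    """z mod (2^31 - 1) for 0 <= z < 2^63 using only shifts, ANDs and adds."""
    z = (z >> 31) + (z & MASK)      # z1 + z2, congruent to z because 2^31 = 1 (mod p)
    z = (z >> 31) + (z & MASK)      # second fold: now z <= 2^31 + 2
    return z - P if z >= P else z   # final conditional subtraction


def mul_mod(a, x):
    """a * x mod p for 0 <= a, x < p (the product is below 2^62)."""
    return mod_mersenne31(a * x)


if __name__ == "__main__":
    z = 1089656042 * 2147483646
    print(mod_mersenne31(z), z % P)   # the two values agree
```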
Done by Miura or L’Ecuyer…
(To be completed)
(combined with CASINO-v1.8)
[Figure: VMC ground-state energy (a.u.) of SO2 (300 million steps) with RANLUX-0 to RANLUX-4 and MRG8; blocking lengths in parentheses]
RANLUX: a “tuned-up” version of the SwB algorithm (Martin Lüscher, 1993)
Decimation (“plucking”) of the sequence to reduce auto-correlation:
“RANLUX-0” = plain “Subtract with Borrow”
“RANLUX-1”, “RANLUX-2”, “RANLUX-3”: more plucking, better performance in the spectral test
N.B.) “RANLUX-0” (“Subtract with Borrow”)
~ a “1st-order linear congruential method”
(an efficient implementation with a very large prime modulus)
(Tezuka & L’Ecuyer, 1992).
Consistent with D.P. Landau’s Monte Carlo work.
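The plucking idea is easy to see in code. The toy below implements the subtract-with-borrow recurrence $x_n = x_{n-10} - x_{n-24} - c_{n-1} \pmod{2^{24}}$ and, per luxury level, keeps 24 numbers out of each block of $p = 24, 48, 97, 223, 389$ (levels 0 to 4). These follow the published RANLUX parameters. The seeding through Python's `random` module is a hypothetical simplification, and `ToyRanlux` is not the CASINO or CERNLIB routine.

```python
import random

B = 2**24                 # base b of the SWB recurrence (24-bit integers)
R, S = 24, 10             # long and short lags
# Block size p per luxury level: keep 24 numbers, discard p - 24
LUX_P = {0: 24, 1: 48, 2: 97, 3: 223, 4: 389}


class ToyRanlux:
    """SWB x_n = x_{n-10} - x_{n-24} - c_{n-1} (mod 2^24) with RANLUX-style decimation.

    Seeding via Python's random module is a hypothetical simplification.
    """

    def __init__(self, seed, level=3):
        init = random.Random(seed)
        self.x = [init.randrange(B) for _ in range(R)]   # x[0] = x_{n-24}, x[-1] = x_{n-1}
        self.c = 0                                       # borrow bit
        self.p = LUX_P[level]

    def _swb(self):
        d = self.x[-S] - self.x[-R] - self.c             # x_{n-10} - x_{n-24} - c_{n-1}
        if d < 0:
            d += B
            self.c = 1
        else:
            self.c = 0
        self.x.append(d)
        self.x.pop(0)
        return d

    def next24(self):
        """Return 24 consecutive SWB outputs, then discard the next p - 24 ('plucking')."""
        block = [self._swb() for _ in range(R)]
        for _ in range(self.p - R):
            self._swb()
        return block

    def uniform24(self):
        """The same block scaled to floats in [0, 1)."""
        return [v / B for v in self.next24()]
```

Level 0 uses every SWB output (plain subtract-with-borrow). Higher levels spend more CPU time on discarded numbers, which is consistent with the cost ordering RANLUX-0 < ... < RANLUX-4 in the timings.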
RANLUX-0 can be viewed as a 1st-order linear congruential RNG as well as a ‘Subtract with Borrow’ RNG.
[Figure: DMC ground-state energy (a.u.) of SO2 (40,000 steps, 10,000 configurations) with RANLUX-0 to RANLUX-4 and MRG8; blocking lengths in parentheses]
1) Generating random walks (VMC/DMC)
2) Metropolis reject/accept (VMC/DMC)
3) Branching (DMC)
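The first two uses of the RNG can be seen in a toy VMC run: a 1D harmonic oscillator, $H = -\tfrac12 \frac{d^2}{dx^2} + \tfrac12 x^2$, with trial function $\psi = e^{-\alpha x^2}$ and local energy $E_L = \alpha + x^2(\tfrac12 - 2\alpha^2)$. This is a minimal sketch, not a CASINO workflow: Python's `random.Random` stands in for MRG8/RANLUX, `vmc_harmonic` is a hypothetical name, and DMC branching (use 3) is not shown.

```python
import math
import random


def vmc_harmonic(alpha, n_steps, step=1.0, seed=0):
    """Toy VMC for H = -1/2 d^2/dx^2 + x^2/2 with trial psi(x) = exp(-alpha x^2).

    Returns (mean local energy, acceptance ratio). Exact ground state: alpha = 0.5, E = 0.5.
    """
    rng = random.Random(seed)        # stand-in for MRG8 / RANLUX
    x, e_sum, accepted = 0.0, 0.0, 0
    for _ in range(n_steps):
        # (1) random-walk proposal
        x_new = x + step * (rng.random() - 0.5)
        # (2) Metropolis accept/reject with ratio |psi(x_new)|^2 / |psi(x)|^2
        log_ratio = -2.0 * alpha * (x_new * x_new - x * x)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = x_new
            accepted += 1
        # local energy E_L = alpha + x^2 (1/2 - 2 alpha^2)
        e_sum += alpha + x * x * (0.5 - 2.0 * alpha * alpha)
    return e_sum / n_steps, accepted / n_steps


if __name__ == "__main__":
    print(vmc_harmonic(0.5, 10_000))              # exact trial function: E = 0.5
    print(vmc_harmonic(0.4, 200_000, step=2.0))   # analytic mean: alpha/2 + 1/(8 alpha) = 0.5125
```

With the exact trial function the local energy is constant, so any RNG gives $E = 0.5$. The RNG only matters through the sampled positions when $\alpha \neq 0.5$, which is where correlation and bias of the generator can show up.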
SO2, DMC (# of steps = 1,000; # of configs = 10,000); CPU time in seconds
RNG         User       System     Total CPU   % CPU
MRG8        176.0448   189.3554   365.6764    0.42
RANLUX-0    101.2084   190.2169   290.9649    0.33
RANLUX-1    127.7751   192.9557   321.1352    0.36
RANLUX-2    176.5535   183.0453   359.7706    0.42
RANLUX-3    304.1154   182.4181   486.7478    0.57
RANLUX-4    470.0464   182.1101   652.6842    0.76
[Figure: VMC ground-state energies (a.u.) of He (1,000 million steps) and PH2 (150 million steps) with MRG8 and RANLUX-0 to RANLUX-4; blocking lengths in parentheses]
[Figure: DMC ground-state energies (a.u.) of He (50,000 steps) and PH2 (30,000 steps), 10,000 configurations each, with MRG8 and RANLUX-0 to RANLUX-4; blocking lengths in parentheses]
- The bias of the results ~ the homogeneity of the RNG, which gets worse in higher dimensions of the sampling space
(3N-dimensional in our case),
so systems with larger numbers of electrons are of interest.
- The sampling space has a nodal structure.
“All-electron systems” and “pseudopotential systems” differ in character,
so both should be examined.
[Figure: SH4, VMC/DMC with pseudopotentials: energies (a.u.) for MRG8 and RANLUX-0 to RANLUX-4, VMC (10 million steps) and DMC (30,000 steps, 10,000 configurations); blocking lengths in parentheses]
(combined with CASINO-v1.8)
“A Research Background”
RNG researchers are interested in this.
c.f.) D.P. Landau’s test of RNGs by Monte Carlo (Ising model):
an RNG that does better in spectral tests does not always perform better in applications.
RNG researchers have started to consider
the “harmony” between an RNG and its application.
1) MRG8 always gives the same answer as RANLUX-4.
2) MRG8 is cheaper than RANLUX-3.
3) MRG8 gives no significant improvement in blocking length.
4) Differences in bias appeared in DMC/VMC, where the RNG is used for
- generating random walks (VMC/DMC)
- Metropolis reject/accept (VMC/DMC)
- branching (DMC)
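The recurrence $X_n = a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k} \bmod p$ can be written as
$$\begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_{n-k+1} \end{pmatrix}
= \begin{pmatrix} a_1 & a_2 & \cdots & a_{k-1} & a_k \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}
\begin{pmatrix} X_{n-1} \\ X_{n-2} \\ \vdots \\ X_{n-k} \end{pmatrix} \bmod p
=: A \begin{pmatrix} X_{n-1} \\ X_{n-2} \\ \vdots \\ X_{n-k} \end{pmatrix}$$
Generating sequence:
$$(X_1, \dots, X_8) \to (X_2, \dots, X_9) \to (X_3, \dots, X_{10}) \to \cdots$$
i.e.,
$$\begin{pmatrix} X_8 \\ X_7 \\ \vdots \\ X_1 \end{pmatrix} \xrightarrow{A} X_9 \xrightarrow{A} X_{10} \xrightarrow{A} \cdots \qquad \text{(normal sequence)}$$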
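One multiplication by the companion matrix is exactly one step of the scalar recurrence: the new top entry is $X_n$ and the rest of the vector shifts down by one. A small sketch with the MRG8 coefficients checks this. `companion` and `matvec_mod` are hypothetical helper names, and the state values are arbitrary toy seeds.

```python
P = 2**31 - 1
COEFFS = (1089656042, 1906537547, 1764115693, 1304127872,
          189748160, 1984088114, 626062218, 1927846343)


def companion(coeffs):
    """k x k companion matrix: first row a_1 ... a_k, ones on the subdiagonal."""
    k = len(coeffs)
    A = [[0] * k for _ in range(k)]
    A[0] = list(coeffs)
    for i in range(1, k):
        A[i][i - 1] = 1
    return A


def matvec_mod(A, v, p):
    """(A v) mod p for integer lists."""
    return [sum(a * x for a, x in zip(row, v)) % p for row in A]


if __name__ == "__main__":
    A = companion(COEFFS)
    v = [8, 7, 6, 5, 4, 3, 2, 1]      # (X_8, ..., X_1), arbitrary toy seeds
    w = matvec_mod(A, v, P)            # (X_9, X_8, ..., X_2)
    print(w[0], w[1:] == v[:-1])
```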
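With the $k \times k$ companion matrix $A$ (first row $a_1, \dots, a_k$, ones on the subdiagonal), mod $p$:
$$\begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_{n-k+1} \end{pmatrix} = A \begin{pmatrix} X_{n-1} \\ X_{n-2} \\ \vdots \\ X_{n-k} \end{pmatrix}$$
Having evaluated $A^n$ in advance...
$$\begin{pmatrix} X_{10} \\ X_9 \\ \vdots \\ X_3 \end{pmatrix} = A \begin{pmatrix} X_9 \\ X_8 \\ \vdots \\ X_2 \end{pmatrix} = A^2 \begin{pmatrix} X_8 \\ X_7 \\ \vdots \\ X_1 \end{pmatrix},
\qquad
\begin{pmatrix} X_{8+n} \\ X_{7+n} \\ \vdots \\ X_{1+n} \end{pmatrix} = A^n \begin{pmatrix} X_8 \\ X_7 \\ \vdots \\ X_1 \end{pmatrix}$$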
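Evaluating $A^n$ "in advance" can be done with $O(\log n)$ matrix products by binary exponentiation. The sketch below checks that a jump by $A^{1000}$ lands on the same state as 1000 serial steps. It assumes plain Python lists rather than a vectorized library, and all function names are hypothetical. The same jump-ahead is commonly used to give independent streams widely separated starting points, though the slide itself does not say how MRG8 is seeded in parallel.

```python
P = 2**31 - 1
COEFFS = (1089656042, 1906537547, 1764115693, 1304127872,
          189748160, 1984088114, 626062218, 1927846343)


def companion(coeffs):
    """k x k companion matrix: first row a_1 ... a_k, ones on the subdiagonal."""
    k = len(coeffs)
    A = [[0] * k for _ in range(k)]
    A[0] = list(coeffs)
    for i in range(1, k):
        A[i][i - 1] = 1
    return A


def matvec_mod(A, v, p):
    """(A v) mod p."""
    return [sum(a * x for a, x in zip(row, v)) % p for row in A]


def matmul_mod(X, Y, p):
    """(X Y) mod p for square integer matrices."""
    k = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(k)) % p for j in range(k)]
            for i in range(k)]


def matpow_mod(A, e, p):
    """A^e mod p by binary exponentiation (O(log e) matrix products)."""
    k = len(A)
    result = [[int(i == j) for j in range(k)] for i in range(k)]   # identity
    base = A
    while e > 0:
        if e & 1:
            result = matmul_mod(result, base, p)
        base = matmul_mod(base, base, p)
        e >>= 1
    return result


def step_serial(A, v, n, p):
    """Apply A to v n times (the normal sequence)."""
    for _ in range(n):
        v = matvec_mod(A, v, p)
    return v


if __name__ == "__main__":
    A = companion(COEFFS)
    v = [8, 7, 6, 5, 4, 3, 2, 1]                      # (X_8, ..., X_1)
    print(matvec_mod(matpow_mod(A, 10**30, P), v, P)[0])   # X_{8 + 10^30}
```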
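Evaluation of $A^n$ can be made faster by ...
- contemporary compilers (vectorization)
- intra-node parallelization (“thread parallel”)
Then...
$$\begin{pmatrix} X_8 \\ X_7 \\ \vdots \\ X_1 \end{pmatrix} \xrightarrow{A} X_9 \xrightarrow{A} X_{10} \xrightarrow{A} X_{11} \xrightarrow{A} X_{12} \cdots \qquad \text{(normal ‘serial’ generation)}$$
$$\begin{pmatrix} X_8 \\ \vdots \\ X_1 \end{pmatrix} \xrightarrow{A,\,A^2,\,A^3,\,A^4} (X_9, X_{10}, X_{11}, X_{12}) \xrightarrow{A,\,A^2,\,A^3,\,A^4} (X_{13}, X_{14}, X_{15}, X_{16}) \xrightarrow{A,\,A^2,\,A^3,\,A^4} (X_{17}, X_{18}, X_{19}, X_{20}) \cdots \qquad \text{(accelerated generation)}$$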
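The accelerated scheme needs only the first rows of $A, A^2, \dots, A^m$. Applied to the current state $(X_n, \dots, X_{n-k+1})$, they give $X_{n+1}, \dots, X_{n+m}$ as $m$ independent dot products, which is the part a compiler can vectorize or threads can share. The next state is assembled from the new values plus the older part of the old state. The sketch below (hypothetical names; $m = 4$ as on the slide) checks that block generation reproduces the serial sequence exactly. Plain Python loops stand in for the vector/thread parallelism.

```python
P = 2**31 - 1
COEFFS = (1089656042, 1906537547, 1764115693, 1304127872,
          189748160, 1984088114, 626062218, 1927846343)


def companion(coeffs):
    """k x k companion matrix: first row a_1 ... a_k, ones on the subdiagonal."""
    k = len(coeffs)
    A = [[0] * k for _ in range(k)]
    A[0] = list(coeffs)
    for i in range(1, k):
        A[i][i - 1] = 1
    return A


def serial_generate(coeffs, p, state, n):
    """Normal serial generation from state = [X_n, ..., X_{n-k+1}]."""
    v, out = list(state), []
    for _ in range(n):
        x = sum(a * h for a, h in zip(coeffs, v)) % p
        out.append(x)
        v = [x] + v[:-1]
    return out


def block_generate(coeffs, p, state, n_blocks, m=4):
    """Accelerated generation: m values per block from first rows of A, A^2, ..., A^m."""
    k = len(coeffs)
    if not 1 <= m <= k:
        raise ValueError("need 1 <= m <= k")
    A = companion(coeffs)
    rows, r = [], [1] + [0] * (k - 1)            # r = e_1
    for _ in range(m):
        r = [sum(r[i] * A[i][c] for i in range(k)) % p for c in range(k)]  # r <- r A
        rows.append(r)                            # first row of A^j
    v, out = list(state), []
    for _ in range(n_blocks):
        # these m dot products are mutually independent (vectorizable / parallel)
        new = [sum(a * x for a, x in zip(row, v)) % p for row in rows]
        out.extend(new)
        v = new[::-1] + v[:k - m]                 # [X_{n+m}, ..., X_{n+1}, X_n, ...]
    return out


if __name__ == "__main__":
    state = [8, 7, 6, 5, 4, 3, 2, 1]              # (X_8, ..., X_1)
    print(block_generate(COEFFS, P, state, n_blocks=3)[:4])   # X_9 ... X_12
```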