[#ROOT-4826] TRandom3 poorly seeded from TUUID

Created: 18/Dec/12
Updated: 21/Dec/12
Resolved: 18/Dec/12
Status: Closed
Project: ROOT
Component/s: Core Libraries
Affects Version/s: None
Fix Version/s: None
Type: Bug
Priority: High
Reporter: None (Inactive)
Assignee: Lorenzo Moneta
Resolution: Fixed
Votes: 0
Environment: All
External issue ID: bugs99516
External issue URL: https://savannah.cern.ch/bugs/?99516
Description
The macro below draws 100 integers in the range 0-999 from a newly created
TRandom3, then checks whether it has seen this sequence before.
The chance of any given pair of sequences colliding should be 1 in 1000^100. Even taking
into account that there are ~N^2 chances to collide, this should never happen. Instead, I get a
collision after a few thousand trials.
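As a rough check on that estimate, here is a birthday-bound calculation. It assumes each of the 100 draws is uniform over 1000 values and that separately seeded generators are independent, which is exactly the assumption the bug violates:

```latex
P(\text{collision among } N \text{ trials})
  \approx \binom{N}{2} \cdot \frac{1}{1000^{100}}
  \approx \frac{N^{2}}{2 \cdot 10^{300}},
\qquad
N = 10^{7} \;\Rightarrow\; P \approx 5 \times 10^{-287}.
```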
Debugging reveals that TUUID does create a unique seed each time, and TRandom3 does turn
that into a unique state, but the seeding algorithm means that there are only small differences in
the early part of the state. (The UUID only changes in the part corresponding to the time, and
TRandom3's extension to the rest of the state depends only on the 8th value, which doesn't
change.)
Without enough cycles of the Mersenne Twister, that small difference doesn't get fully mixed
in.
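The effect described above can be illustrated with a small stand-alone model. This is a sketch under assumptions, not ROOT's code: the pairwise byte packing, the names `model_seed` and `count_differing_words`, and the choice to extend the state from the 8th word alone are modelled on the description in this report. The recurrence is the standard MT19937 initialisation step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kStateSize = 624;  // Mersenne Twister state words
constexpr std::size_t kSeedWords = 8;    // 16 UUID bytes packed as 8 words

// Hypothetical model: pack the UUID bytes pairwise into the first 8 words,
// then extend the rest of the state from the 8th word only.
std::vector<std::uint32_t> model_seed(const std::vector<std::uint8_t>& uuid)
{
    assert(uuid.size() == 2 * kSeedWords);
    std::vector<std::uint32_t> mt(kStateSize);
    for (std::size_t i = 0; i < kSeedWords; ++i)
        mt[i] = static_cast<std::uint32_t>(uuid[2 * i]) * 256u + uuid[2 * i + 1];
    // Standard MT19937 initialisation recurrence.
    for (std::size_t i = kSeedWords; i < kStateSize; ++i)
        mt[i] = 1812433253u * (mt[i - 1] ^ (mt[i - 1] >> 30))
                + static_cast<std::uint32_t>(i);
    return mt;
}

// Number of positions at which two equally sized states differ.
std::size_t count_differing_words(const std::vector<std::uint32_t>& a,
                                  const std::vector<std::uint32_t>& b)
{
    assert(a.size() == b.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++n;
    return n;
}
```

In this model, two UUIDs that differ only in their first four (time) bytes give states that differ in 2 of the 624 words. The other 622 words are identical, so the generators start almost in lockstep.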
The expectation is that a newly created generator should be giving good quality, unique,
random numbers immediately.
I suggest adding this to the end of TRandom3::SetSeed():
if(seed == 0){
  // XOR all state words into one mask, then XOR that mask into every word
  UInt_t mask = fMt[0];
  for(int i = 1; i < 624; ++i) mask ^= fMt[i];
  for(int i = 0; i < 624; ++i) fMt[i] ^= mask;
}
This ensures that every part of the UUID contributes to every part of the state.
With this change I've run several million cycles without collision.
I haven't looked at the seeding of TRandom or TRandom2. Perhaps they have similar problems?
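To see what the proposed fold does to two nearly identical states, here is a stand-alone sketch that works on plain vectors rather than on TRandom3's internal fMt array. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// XOR every word of the state into one mask, then XOR the mask back into
// every word (the same operation as the proposed patch, on a plain vector).
void xor_fold(std::vector<std::uint32_t>& state)
{
    assert(!state.empty());
    std::uint32_t mask = state[0];
    for (std::size_t i = 1; i < state.size(); ++i) mask ^= state[i];
    for (std::size_t i = 0; i < state.size(); ++i) state[i] ^= mask;
}

// Number of positions at which two equally sized states differ.
std::size_t count_differing_words(const std::vector<std::uint32_t>& a,
                                  const std::vector<std::uint32_t>& b)
{
    assert(a.size() == b.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++n;
    return n;
}
```

If two 624-word states differ only in their first two words, and those two differences are not equal to each other, then after `xor_fold` the states differ in all 624 words. In this sketch, words that were equal before the fold all differ by the same constant afterwards, so the difference is spread across the state rather than scrambled.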
#include <iostream>
#include <set>
#include <vector>
#include "TRandom3.h"

void seeding()
{
  // Every 100-number sequence seen so far
  std::set<std::vector<int> > seen;
  while(true){
    TRandom3 rnd(0); // zero uses UUID seed
    std::vector<int> vec;
    for(int n = 0; n < 100; ++n) vec.push_back(rnd.Integer(1000));
    if(seen.find(vec) != seen.end()){
      // Collision: print the new sequence and the stored one it matches
      std::cout << "Failed after " << seen.size() << std::endl;
      for(int n = 0; n < 100; ++n) std::cout << vec[n] << " ";
      std::cout << std::endl;
      for(int n = 0; n < 100; ++n) std::cout << (*seen.find(vec))[n] << " ";
      std::cout << std::endl;
      return;
    }
    seen.insert(vec);
    if(seen.size()%10000==0) std::cout << seen.size() << " OK" << std::endl;
  }
}
Comments
Comment by Lorenzo Moneta [ 18/Dec/12 ]
Hi,
Thank you for reporting this problem. I can reproduce it, and as you suggested it is caused by
the UUIDs being too similar. It is also known that if two states of TRandom3
(Mersenne Twister) are very close, the sequences will overlap.
Generating independent sequences is tricky, and the correct solution is probably to use
appropriate generators (parallel random number generators).
Your patch seems to fix the issue, but it should be tested well. How did you come up with this
patch?
The same test of course fails for TRandom, and also for TRandom1 (RANLUX), while it works
for TRandom2. I think the reason is that in that case I warm up the generator after seeding it.
Best Regards
Lorenzo
Comment by Christopher Backhouse [ 18/Dec/12 ]
Hi,
I know that if I move the TRandom3 initialization outside the loop, for example, then I can
avoid the problem. But it's often easier to create the generator just before use. I know there's a
lot of code out there that looks like my test case (I wrote some yesterday, which is how I found
this).
My patch is just ensuring that every byte of the seed has an effect on every byte of the state, in a
pretty dumb way. Empirically it seems to avoid the collisions, as you saw.
A better approach might be to run your seed expansion function on each entry in the seed
individually, and then XOR all the resulting sequences together to get the initial state.
Chris
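The alternative suggested in the comment above (expand each seed entry separately, then XOR the sequences) might look roughly like the following. This is only a sketch: it assumes the "seed expansion function" is the standard MT19937 initialisation recurrence, and `expand_word` and `combined_state` are hypothetical names, not ROOT functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kStateSize = 624;  // Mersenne Twister state words

// Expand a single seed word into a full-length sequence using the standard
// MT19937 initialisation recurrence (assumed to be the expansion function).
std::vector<std::uint32_t> expand_word(std::uint32_t seed_word)
{
    std::vector<std::uint32_t> out(kStateSize);
    out[0] = seed_word;
    for (std::size_t i = 1; i < kStateSize; ++i)
        out[i] = 1812433253u * (out[i - 1] ^ (out[i - 1] >> 30))
                 + static_cast<std::uint32_t>(i);
    return out;
}

// XOR together the expansions of all seed words to form the initial state.
std::vector<std::uint32_t> combined_state(const std::vector<std::uint32_t>& seed_words)
{
    assert(!seed_words.empty());
    std::vector<std::uint32_t> state(kStateSize, 0u);
    for (std::uint32_t w : seed_words) {
        const std::vector<std::uint32_t> seq = expand_word(w);
        for (std::size_t i = 0; i < kStateSize; ++i) state[i] ^= seq[i];
    }
    return state;
}
```

Each step of the recurrence is a bijection on 32-bit words, so changing a single seed word changes every word of the combined state. One caveat of the sketch as written: XOR is commutative, so reordering the seed words gives the same state, and two equal seed words cancel each other out. Mixing the entry index into each expansion would avoid that.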
Comment by Lorenzo Moneta [ 21/Dec/12 ]
Hi,
I have now fixed this issue in both the trunk and the 5.34 patches. I now use the UUID to generate 3
independent seeds for TRandom2. Then I use TRandom2 to generate the 624 seed values for
seeding TRandom3. I have also changed TRandom1 in a similar way.
The fix is http://root.cern.ch/viewvc?view=rev&revision=48161
All the generators now seem to pass your test, for the maximum time I can run it.
Thanks for reporting this important bug.
Best Regards
Lorenzo
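The idea behind the fix described above (pass the UUID through a separate mixing step and let that produce all 624 state words) has a close analogue in the C++ standard library. The sketch below uses `std::seed_seq` and `std::mt19937` as stand-ins for TRandom2 and TRandom3. It is not the code from revision 48161, and `make_generator` and `first_draws` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Build a Mersenne Twister whose whole 624-word state is derived from the
// seed words via std::seed_seq (stand-in for the TRandom2 step).
std::mt19937 make_generator(const std::vector<std::uint32_t>& uuid_words)
{
    assert(!uuid_words.empty());
    std::seed_seq seq(uuid_words.begin(), uuid_words.end());
    return std::mt19937(seq);
}

// Collect the first n outputs of a generator (taken by value, so the
// caller's generator is left untouched).
std::vector<std::uint32_t> first_draws(std::mt19937 gen, std::size_t n)
{
    std::vector<std::uint32_t> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(static_cast<std::uint32_t>(gen()));
    return out;
}
```

With this construction, two seed vectors that differ in a single word give generators whose first 100 draws differ, while identical seed vectors reproduce the same sequence.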
Comment by Christopher Backhouse [ 21/Dec/12 ]
Looks good. Thanks.
Generated at Tue Feb 09 22:18:06 CET 2016 using JIRA 6.4.9#64024sha1:1f1084e06c9893c77549621edbccfecfaa68be5d.