transcript

advertisement

Being Random

[Title Slide] What does it mean to be random? How do you do it on a computer.

[Slide 1] Randomness is a weird kind of thing because it's more defined by what it isn't than what it is. If you see something happening and it's got some kind of pattern to it, then you say "That's not random". If you could predict what's happening next, if you could even just better than chance what's going to happen next and say well that's not random. So something that's random is supposed to not have a pattern, not be predictable. Well that's a real problem for a computer because the entire way that we have designed, architected and built computers is so that they will be deterministic. Serial determinism, typical computer the way it's designed. It does one thing at a time step by step. Each step is guaranteed as good a quality as the physical computer we can make. To be completely predictable. To be completely determined, the inputs determine the outputs. So in a computer program, if you can get all of the same inputs you're going to get the same output and it might not seem that way these days when you go to your social networking and you click on it and everything's constantly changing. You hit reload and it changes.

That's because there's all kinds of inputs going into that program that's producing that output that we're not necessarily aware of. If we can actually control all of the inputs to the program it would have to produce the same output and if it didn't, the computer is broke. If that's what computing is like, step by step, same input, same output, then how could we ever get a computer to be random? How could we ever get it so that it would be unpredictable? If we know the program, if we know the inputs, we could totally predict. We can run another computer with the same program, the same inputs. We have to get the

[00:02:00] same output. In computers, we end up using what's called pseudorandomness. It's not really random in the sense that if you know all of the inputs, you can predict the output. The unpredictability part is unclear, but the idea that there's no pattern is what pseudorandomness tries to get at. Let's look at a demo. [Slide 2] All right. Here's a machine. I submit to you its amazing

Magical Random Number Generator Machine. We've got a button down here that we can ...

All right, we feed in the spark of execution and there. It produced a number. 90.

Just a random number. What it says ... It's a random number. Let's make another one. Actually, let's just lock the switch down so it will make a bunch of them. The first thing is ... Did you expect ... ? 53. Did you expect 90 to be a random number as opposed to like heads or 6 or a 52-card or something like that? You can't even talk about randomness unless you have some universe of possible events, some set of things that might happen. You can't even ask give me a uniformly selected random number between 0 and infinity. It doesn't make sense. You already have to have some set of events in mind. This looks like 2-digit decimal numbers

Being_Random_Dave_Ackley Page 1 of 6

perhaps. It might not be what you need. If you really just wanted a coin flip, heads or tails, what are you going to do here? You could say "I'll call even numbers heads and odd numbers tails" or something like that. It's not up to me to use the whole number that the thing has given me.

If it's really random, if it's really unpredictable, if it's really got no pattern ... 90, that's even, that would be heads. 53, that's odd, that would be tails. Heads, tails,

74 is even. 81 is odd [00:04:00]. 50 is even. Wait a minute. 73 odd. 14 even.

These are just alternating. Even, odd, even, odd. How can that be random? It's a pattern. This isn't random. Big surprise. It's running in a computer. Let's pop the hood. [Slide 3] What we actually have under is not a random number generator in a real randomness sense, whatever that means. What we have is a pseudo random number generator (PRNG). This particular one is called a linear congruential generator and this specific specific one is actually a particularly bad linear congruential generator, but really the LCGs are important, at least historically, because they've been used for decades. They were the basis of much of what has been done for computer simulations of all sorts of things that could have led to policy decisions, could have led to people thinking some things work certain ways under the assumption that these were decent random numbers.

Wait a minute. 90, 53, 74. Look at this. The first number was 90. Here's another

90. Second number 53, 74, 74, 50. Where's the 50? 73. The next one has got to be 14. There it is. The next one has got to be 61. This has not only got a pattern to it, it's predictable. I know the sequence that's coming here and that's why this is such a terrible random number generator. Still, let's pop the hood again and actually see what's going on inside here. [Slide 4] All right. Some single stepping it now [00:06:00], so this is confusing looking, but it's really very simple. This red star says what's going to happen next. The first thing is we're going to do a multiplication. What do we multiply? This number 4567 times this number 93.

The result is going to be ... I don't know what it's going to be. The result is

124,731. Okay? Whatever. Then move onto the next step. This is serial determinism, step by step by step. The next step is to add this number, 23, to that number. That's going to give me 424,754, right? There it is. Execution moves on. Next step. Do clock arithmetic with 100 o'clock here. Modulus operation with

... This number is going to end up taking the last 2 digits, so it's going to give us a

54.

That 54 gets written back to this state box and that's key. What we have here is a cycle. The state gets read out, multiplied, added, modded and written back. Then as the final step, that number gets output as our random number. That state variable did not get reset between each cycle of producing a random number and that's how it works. A computer is deterministic, same inputs, same outputs.

We thought this program didn't have any inputs. We thought we just sort of made it go and then it produced an endless stream of numbers, but really the

Being_Random_Dave_Ackley Page 2 of 6

way we need to think about it is that that state variable is an additional input that it writes and then reads back in [00:08:00]. As a result, we get a sequence of numbers that changes over time, although in this case, it's not a very random sequence of numbers. Number one, it alternates between even and odd. If we had different multipliers and increments, we could get that to be different, like we could get them to all be even, as if that's better. In fact, it loops. It has a period.

No matter where it starts, we can put it back at the beginning. That's what this reset button does, which shows by the way that the only thing that this thing has got going for it is that state. Once we reset it, it's called reseeding, putting a seed because the beginning number that starts this whole process is called the seed.

Setting the seed back to a known quantity is called reseeding. Now we're getting the same thing: 90, 53 and so on. We could make this a little bit better if we increased this limit. Something like that. Because the state was limited to only 2 digits, 0 to 99, before ... Now it can go up to at least 1,000. Normally, when we make a linear congruential generator, we let the modulus be something like in the billions, so we get these big things out that look "Woo, it must be really random", but then they have these same flaws, the last bit of it even, odd, even, odd and so on. These are not great. Let's turn this off. These random numbers, these are terrible, so let's get rid of them.

[Slide 5] The state of the art in pseudorandomness number generation has moved on quite a bit beyond the linear congruential generator and today, you really do not want to be using one [00:10:00]. What do we use numbers for?

There's a least 2 whole categories of uses of random numbers in computers.

They're quite different and we need to not confuse them. One group is for models and simulations, computer games, shuffling a deck, Solitaire, making a simulation of traffic flow, where when you come to a light, is the light going to be green or red? It depends on ... We can throw a random number. It's red 60% of the time, green 40% of the time. Whatever it is. In that case, we're using random numbers to express essentially our ignorance. If we had made a more detailed model of that traffic simulation, whether the light was red or green is not actually random. It depends on its circuitry, the timing and who pressed the buttons and so on. We could build a more complicated model that would not have used randomness there, but in fact that's the way our brains work. When we have stuff that's sort of extremely large, we tend to assume it's constant.

When we have stuff that's incredibly tiny and rapidly changing relative to what we care about, we tend to think it's random. Then this stuff in the middle, about the size and the time scales that we're interested in, we tend to think there's causes for it. One thing causes the next. The big and slow stuff, constant small fast stuff, random and then causality in the middle. Randomness is deeply tied to how we look at the world. When we build models in computers, using

Being_Random_Dave_Ackley Page 3 of 6

pseudorandomn numbers, we're doing essentially that same process. We're deciding to say "Okay, well, this is causal here. I walked down to the corner because I'm walking to work. This is the light. Whether the light is walk or wait is random because I'm not really modeling that in detail and so on". The second category of uses of randomness is for [00:12:00] secrecy, for encryption. In that case, we have a very different set of needs. When we're dealing with a deck of cards, the stakes are low. When playing Solitaire on the screen. On the other hand, when we're playing Solitaire in a video poker game in a casino, it's very different. If we could predict what was going to happen, we might be able to make money; the casino wouldn't like that.

Similarly, if we're making keys for our encryption, to send messages to secure our documents, if anybody can predict what those keys are going to be, they can break our encryption. In essence, there's 2 views of randomness. One view says a random sequence must be completely unpredictable. If it can be predicted, it can be broken. The casino wants unpredictable. Encryption wants unpredictable.

Then the bigger use, the more common use, all we really care about is that whatever determines the sequence of numbers, it be uncorrelated with the purpose that we're putting the numbers to. Even though there were patterns in that LCG that we looked at. It wasn't a good one. If we use those numbers to cause the simulated traffic lights to turn red and green according to some odds, it's not clear that that would actually cause our simulation to be inaccurate in the long run on average.

This is the whole crazy business of pseudorandomness. It's not my words. Van

[inaudible 00:13:42] said that essentially if you ever try to produce random numbers in software, you're in a state of sin. What did he mean? He meant ...

Because the whole point of software is to be deterministic, is to be not random.

Pseudorandomness is all about that hidden state [00:14:00]. [Slide 6] One of the reasons why this random number generator is so bad, this pseudorandom number generator, is because it has a very little bit of state. Once the state has repeated itself, the whole sequence has to be repeating because that's the only extra component it's got. [Slide 7] Here's a picture of a more modern pseudorandom number generator called the Mersenne Twister. It's been around for 20 years or something now and it's pretty good. I'm not going to show the whole step by step of the algorithm because it's a little involved, but let's just look at how much state it's got. We'll pop the hood here.

[Slide 8] This is the state of the Mersenne Twister, represented with a white square for a 0 bit and a black square for a 1 bit. Here, we're producing these numbers. You might see this colors moving through here. Those are saying stuff about how the algorithm is working internally. The weird part is we're getting all these numbers, 2 billion minus 100 billion or whatever it is ... Not 100 billion; that's too big. It doesn't look like the state is actually changing anywhere except

Being_Random_Dave_Ackley Page 4 of 6

if you look up at the very beginning here. You can see some stuff changing. What really goes on is that's a counter up there and all of this state is 625 random number that the Mersenne Twister has precomputed, so when you ask for the next number, it doesn't actually change any of the state. We're about to get to the end. When the green bar gets to the end there, boom, then the whole state changes. It's all being bashed up. What that allows us to do ... It makes it a little weird because when you call for the random numbers [00:16:00], you get 624 of them very rapidly and then all of a sudden bluh and there's this big wait while it goes through and mixes all the bits.

The reason it keeps so much state is when it's mixing the bits, we want to be able to mix them from far apart, not just use the previous value of the state. I'm going to use the current number with the number 397 calls to the random number generator ago or in the future in the batch. One of the things we can do here is we can kind of cheat. Again, this is a pseudorandom number generator. If we set the seed to a known quantity and get an output 1.7bn minus 12m. If we set the seed back to the beginning, 1.7bn minus 12m and so on, so reseed the

[inaudible 00:17:03] behavior is completely predictable. We can cheat the

Mersenne Twister is designed so that you can't really set the seed to anything that isn't like this, sort of a hash or black and white, but I went in and I cheated.

[Slide 9] Here's where I initalized the entire state erray to be all zeros except for one 1 right in middle here. Now, again, regular Mersenne Twister will never do this, but we abused it and now we're getting this output which is 0,0,0,0, which is obviously not very random at all.

The reason I do this is ... Let's speed it up. When it rebuilds itself, it gradually starts to smear out the bits and it takes quite a while, but let's speed it up still more. If you look at this, you might be able to see [00:18:00]. There's this pattern where it looks like it's kind of moving up into the right. Can you see it? That's because what the Mersenne Twister is essentially doing it's using all of those nearly 20,000 bits as like a giant shift register where the bits move out of this guy, up into the next one, up to the top of the last column, down to the first and so on. It's like a giant multiplication where it takes a bunch of stuff and adds it in.

It really is in some loose sense quite similar to the multiply and the add of the simple linear congruential generator. It just has much, much more state and more carefully thought out state as well. [Slide 10] That's it. Randomness is this weird thing because it's about what it isn't. When you're using random number generators, you've got to avoid the bug of seeding the number generator over and over again. The whole point of seeding the generator is to get the initial value of the state and then you ask for numbers and let the state inside the random number generator keep track.

There's also random number generators I haven't talked about that are in computers that claim to be real random number generators. The way they work

Being_Random_Dave_Ackley Page 5 of 6

is by using timings of mouse clicks and keyboard types and packets arriving and stuff coming from the disc. They mix it all together under the assumption that there's no pattern to all of that stuff put together. It's probably true. It may be true, but who knows. That's just saying there's state in the outside world, there's state in the user's head about the timing of when they do things that we can exploit to make numbers that are unpredictable. Finally, the bottom line: if you're in a position where you're building a model of some sort and you're using random numbers, make sure you've the Mersenne Twister or at least something reasonably modern [00:20:00]. There are still bad random number generators out there. Don't use them. Okay. That's it.

Being_Random_Dave_Ackley Page 6 of 6

Download