Lesson 7. Does God play the dice?

Objective: Coin tossing and dice rolling are the familiar means to simulate events occurring randomly. They are, however, not a practical means to generate a long sequence of random events. Hence, a pseudo-random number generator is used to quickly simulate several thousands of coin tosses (Prog#7a), dice rolls (Prog#7b – 7d), and random tree branchings (Prog#7e – 7g). For one die roll, the chances for any side to face up are the same and equal to one out of six, so the outcomes have a uniform distribution. But the sums of the faced-up pips of two dice favor the median value 7 and obey a triangle distribution. Moreover, the distribution of the sums of three dice is rounded in the middle and falls off at the edges. As more and more dice are rolled, the distribution of their sums becomes a bell-shaped curve. This is the Central Limit Theorem. The branch scaling factor and spreading angle control the growth of the binary tree introduced in Lesson 1. When we assign only the scaling factor randomly, the binary trees grow into all different sizes. But when both the scaling factor and the spreading angle are chosen randomly, there appear more middle-sized binary trees and fewer of the extremely small and large ones, as suggested by the Central Limit Theorem.

Scary words: Randomizer, pseudo-random number generator, random walk, event, sample space, probability distribution, uniform distribution, bell-shaped (normal) distribution, random variable, Central Limit Theorem.

Last update: 8 Jan 2003

Lesson

Randomizer

We encounter the laws of chance in our daily life; for example, tossing a coin at the beginning of a football game, throwing dice for a board game, picking a winning lottery ticket, and forecasting the probability of rain. Of these, coin tossing is the simplest, with two possible outcomes of head (H) and tail (T). It is the randomizer that generates either H or T in an unpredictable manner. Given a fair coin, our intuition tells us the chances for H or T are the same and hence equal to one in two, i.e., 1/2. After 10, 50, and 100 throws of a coin, you record the numbers of H and T in table 1 and then compute the ratio of H or T to the total number of throws. Are the ratios close to 1/2? Do they approach 1/2 as the number of coin throws increases?

Table 1. Coin tossing

Throws   Number of H   Number of T   Ratio of H   Ratio of T
10                                    /10 =        /10 =
50                                    /50 =        /50 =
100                                   /100 =       /100 =

It must be noted that the long-term probability of 1/2 is only one of many tests needed to guarantee a random coin tossing. For instance, {H, T, H, T, H, T, ...} is obviously not random, though the probability for H or T turns out to be exactly 1/2.

Now, a die throw generates one out of six numbers {1, 2, 3, 4, 5, 6}. To test the randomness of 60 die throws, you record in table 2 the number of occurrences of each pip (1, 2, 3, 4, 5, 6) and then compute the ratio of occurrences to the total number of throws. In theory, for a true die the odds for any side to face up are one out of six, i.e., 1/6. But are the ratios equal to 1/6 in table 2?

Table 2. 60 die throws

Pip                     1       2       3       4       5       6
Number of occurrences
Compute the ratio       /60 =   /60 =   /60 =   /60 =   /60 =   /60 =
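If filling in tables 1 and 2 by hand becomes tedious, a pseudo-random number generator can do the counting for you. The short sketch below is not one of the lesson's Prog#7 programs; it is a minimal Python example assuming only the standard random module, and the function names toss_ratios and die_ratios are our own.

    import random

    def toss_ratios(throws):
        # Toss a fair coin `throws` times; return the ratios of H and T.
        heads = sum(1 for _ in range(throws) if random.random() < 0.5)
        return heads / throws, (throws - heads) / throws

    def die_ratios(throws):
        # Roll one die `throws` times; return the ratio of each pip 1..6.
        counts = [0] * 6
        for _ in range(throws):
            counts[random.randint(1, 6) - 1] += 1
        return [c / throws for c in counts]

    for n in (10, 50, 100):
        h, t = toss_ratios(n)
        print(f"{n} tosses: H ratio = {h:.3f}, T ratio = {t:.3f}")

    print("60 die throws, pip ratios:", [round(r, 3) for r in die_ratios(60)])

Each run gives slightly different ratios, just as repeated hand experiments would, but they hover around 1/2 and 1/6 as the number of throws grows.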
Although coin tossing and dice throwing are the familiar randomizers, it is difficult to visualize how a winning ticket can be chosen randomly out of millions of lottery tickets. Perhaps randomness is seriously lacking if one imagines a large metal cage containing millions of lottery tickets being tumbled prior to pulling out a winning ticket. Moreover, the uncertainties in weather forecasting are due to the random variations in geophysical flow computations over several days, for which there are no simple randomizers to simulate the meteorological fluctuations. For this reason, real-world randomizers are based on a pseudo-random number generator. First of all, a random number generator is a recipe for coming up with a number that is not related in any obvious manner to all the numbers it has previously generated. Since it is difficult to prove that such numbers are truly random, we simply call them pseudo-random numbers.

Random walk by coin tossing

Suppose that you are standing in the middle of a room facing north, as indicated in figure 1 by the green head with a nose pointing north. If the coin shows a head, you turn right 90° and take a step forward; if a tail, you turn left 90° and take a step forward, as denoted by the orange heads marked H or T, respectively. Figure 1 shows the path of {H, T, T, H, T}, and you can extend it for the next 10 steps by coin tossing. Did you return to the starting position in figure 1? How about after 20, 30, or more steps?

Figure 1. Five steps by {H, T, T, H, T}

This is called a random walk, and it simulates the zigzag path of a drunken person. When a ray of sun pierces through a window, we can sometimes see dust particles floating in the air, and their path of movement resembles a random walk. A pseudo-random number generator coded into Prog#7a simulates the coin tossing. To facilitate visualization, Prog#7a displays the first 25% of the random walk path in red, the second 25% in blue, the third 25% in green, and the last 25% in red. Also, the final location of the random walk is indicated by a red marker. You record the final positions after 1,000, 2,000, 3,000, 4,000, and 5,000 steps of the random walk. Are you getting closer to or moving away from the starting point as the number of random steps increases?
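Prog#7a itself is not reproduced here, but the walk is easy to sketch. The following minimal Python example (our own illustration, not the original program) applies the turning rule above with a unit step length and reports only the final position; drawing the colored path is left out.

    import random

    def random_walk(steps):
        # Coin-tossing walk: head turns right 90 degrees, tail turns left,
        # then one unit step forward. Start at the origin, facing north.
        directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W
        heading = 0                                      # index, 0 = north
        x, y = 0, 0
        for _ in range(steps):
            if random.random() < 0.5:        # head: turn right
                heading = (heading + 1) % 4
            else:                            # tail: turn left
                heading = (heading - 1) % 4
            dx, dy = directions[heading]
            x, y = x + dx, y + dy
        return x, y

    for steps in (1000, 2000, 3000, 4000, 5000):
        x, y = random_walk(steps)
        print(f"{steps} steps: final position ({x}, {y}), "
              f"distance from start = {(x * x + y * y) ** 0.5:.1f}")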
Uniform distribution

For one die roll, the chances for any one of the six sides to face up are the same and equal to 1/6 ≈ 0.167. This is indicated in figure 2 by six bars of equal height 1/6. It is called the probability distribution, indicating how the probabilities of the events are distributed over the sample space of outcomes {1, 2, 3, 4, 5, 6}. It is a uniform distribution of equal probabilities. You may also indicate in figure 2 the six ratios of table 2 by vertical bars for {1, 2, 3, 4, 5, 6}. Do they obey a uniform probability distribution after 60 die throws?

Figure 2. One die roll

A pseudo-random number generator simulates the die roll in Prog#7b, so that you can quickly generate thousands of die throws. For instance, beginning from 500 die throws, you can test for the equal probability of 1/6 by incrementing the number of die rolls by 500 with Prog#7b.

Bell-shaped distribution

Two dice roll: We now roll two dice, one white and one blue, and add up the pips of the faced-up sides. The smallest sum '2' is given by (1, 1) and the largest sum is 12 when (6, 6), so that {2, 3, 4, ..., 11, 12} is the sample space. But sum '3' can result from two events, (1, 2) and (2, 1), and similarly (5, 6) and (6, 5) give sum '11.' In table 3, we list all possible events for the sums {2, 3, 4, ..., 11, 12} of a two dice roll, writing each event as (white pip, blue pip).

Table 3. Sum of the facing-up pips of a two dice roll

Sum   Events
2     (1,1)
3     (1,2), (2,1)
4     (1,3), (2,2), (3,1)
5     (1,4), (2,3), (3,2), (4,1)
6     (1,5), (2,4), (3,3), (4,2), (5,1)
7     (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
8     (2,6), (3,5), (4,4), (5,3), (6,2)
9     (3,6), (4,5), (5,4), (6,3)
10    (4,6), (5,5), (6,4)
11    (5,6), (6,5)
12    (6,6)

Since there are in all 36 events in table 3, the probabilities of sums '2' and '12' are 1/36, of sums '3' and '11' are 2/36, of sums '4' and '10' are 3/36, and so on, as they appear in the bar graph of figure 3 over the sample space {2, 3, 4, ..., 11, 12}. Here comes a surprise. The uniform distribution (figure 2) of one die roll has become a triangle distribution (figure 3) when two dice are rolled and the faced-up pips are added up. That is, you are more likely to get sum '7' than any other sum in a two dice throw. With Prog#7c you can check how closely a triangle distribution is realized by increasing the number of dice throws up to 5,000.

Figure 3. Two dice throw

Three dice roll: Let's now roll three dice, one white, one blue, and one red. The smallest sum '3' is given by (1, 1, 1) and the largest sum '18' occurs when (6, 6, 6). Hence, the sums of the three faced-up pips have the sample space {3, 4, 5, ..., 17, 18}. Similar to table 3, we can list the events of a three dice roll. For instance, (1, 1, 2), (1, 2, 1), and (2, 1, 1) give sum '4.' However, since it gets tedious to list all the events, we summarize in table 4 the number of events of a three dice roll, the total of which is 216. Then the probabilities for sums '3' and '18' are 1/216, for sums '4' and '17' are 3/216, and finally for sums '10' and '11' are 27/216 = 1/8, as they are shown in figure 4 by vertical bars over the sample space {3, 4, 5, ..., 17, 18}. We notice that the distribution of figure 4 is more rounded in the middle and falls off more quickly at the low and high ends than the triangle distribution of figure 3. It begins to look like what is commonly called the bell-shaped distribution curve. You can also experiment with Prog#7d to see how closely the probability distribution of figure 4 is observed as the number of dice throws becomes large.

Table 4. Three dice roll

Pip sum            3 or 18   4 or 17   5 or 16   6 or 15   7 or 14   8 or 13   9 or 12   10 or 11
Number of events   1         3         6         10        15        21        25        27

Figure 4. Three dice roll

To sum up, the probability distributions of figures 2, 3, and 4 carry the following message. Let us begin by defining a random variable, which varies from one instance to another in an unpredictable way, like the outcome of a coin toss or a die roll. Other examples are measurement errors in an experiment, wind speed and temperature variations in weather forecasting, daily stock price fluctuations, etc. Although a single random variable obeys a uniform distribution (figure 2), the sum of two random variables follows a quite different distribution of triangle shape (figure 3). Now, for the sum of three random variables we begin to see a bell-shaped distribution (figure 4). According to the Central Limit Theorem, the distribution becomes truly bell-shaped, or normal, as the number of random variables becomes very large.
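The progression from figure 2 to figure 4 can also be checked without the original programs. The sketch below is a minimal Python stand-in for Prog#7b – 7d (the function dice_sum_distribution and the text-bar histogram are our own choices, not the lesson's code); increasing num_dice beyond 3 shows the Central Limit Theorem at work.

    import random
    from collections import Counter

    def dice_sum_distribution(num_dice, throws):
        # Roll `num_dice` dice `throws` times; return the observed
        # relative frequency of every possible pip sum.
        counts = Counter(sum(random.randint(1, 6) for _ in range(num_dice))
                         for _ in range(throws))
        return {s: counts[s] / throws
                for s in range(num_dice, 6 * num_dice + 1)}

    # Uniform (1 die), triangle (2 dice), nearly bell-shaped (3 dice)
    for num_dice in (1, 2, 3):
        dist = dice_sum_distribution(num_dice, 5000)
        print(f"{num_dice} dice, 5000 throws:")
        for s, p in dist.items():
            print(f"  sum {s:2d}: {'#' * round(100 * p)} {p:.3f}")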
Binary tree

In Lesson 1, the growth of the binary tree depends on two parameters, the scaling factor a and the spreading angle θ. As shown in figure 5, the main trunk of, say, 1 meter gives rise to two shorter branches of 1/a meter at the first branching, since a has a value in the range (1.2, 1.8). Also, the tree height is controlled by the angle θ; that is, the smaller θ, the taller the binary tree. Then there are four branches of (1/a)(1/a) = 1/a² meter at the second branching, eight branches of (1/a)(1/a)(1/a) = 1/a³ meter at the third branching, and so on. Here, we assign the values of a and θ randomly in a given range.

Figure 5. A binary tree

One random variable: We first fix the spreading angle at θ = 40°, but let a vary randomly at each stage of branching. This is coded into Prog#7e by a pseudo-random number generator, which picks a value for a randomly in the range (1.2, 1.8). After running Prog#7e a dozen times or more, you record in table 5 the number of small, medium, and large binary trees that you observe. Be as objective as possible, though no quantitative measures are provided here for the classification of binary trees.

Table 5. Binary trees with random scaling factor

Size                     Small   Medium   Large
Number of observations

Two random variables: In addition to the random variation of a in the range (1.2, 1.8), Prog#7f also assigns the angle θ randomly in the range (20°, 60°). Again, after running Prog#7f a dozen times or more, you record in table 6 the number of small, medium, and large binary trees that you observe.

Table 6. Binary trees with random scaling factor and spreading angle

Size                     Small   Medium   Large
Number of observations

Comparing tables 5 and 6, you may notice that there are more or less the same numbers of small, medium, and large binary trees in table 5, whereas more medium binary trees are observed in table 6 than either the small or the large ones. This is because randomizing both a and θ suppresses the small and large binary trees, following the triangle distribution of figure 3.

Pythagoras tree (Modification #2)

The Pythagoras tree of modification #2 introduced in Lesson 3 has a motif that involves three parameters, the branching ratio r and the two angles θ1 and θ2, as shown in figure 6. Here, we assign the three parameters randomly at each stage of branching. That is, we choose r in the range (0.3, 0.5) and both θ1 and θ2 in the range (5°, 35°), but the angle θ1 is always smaller than θ2. All this has been coded into Prog#7g. It appears that Prog#7g generates a more natural-looking tree than the original program Prog#3f, because the extreme small and large variations are suppressed by the bell-shaped distribution of figure 4. This is a testament to the Central Limit Theorem, favoring the median over the extremes.

Figure 6. Motif for Pythagoras tree (modification #2)
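Prog#7e – 7g are not reproduced here, but the randomized branching is easy to experiment with. The minimal turtle-graphics sketch below (our own illustration, not the original programs; it needs a graphical display) draws a binary tree in which each branching picks its scaling factor a from (1.2, 1.8) and its spreading angle from (20°, 60°). The function name branch, the recursion depth, and the pixel scale are arbitrary choices. Running it repeatedly gives mostly middle-sized trees, as in table 6.

    import random
    import turtle

    def branch(t, length, depth):
        # Draw one branch, then recurse into two shorter branches whose
        # scaling factor a and spreading angle are picked at random.
        if depth == 0:
            return
        a = random.uniform(1.2, 1.8)      # scaling factor, as in the lesson
        angle = random.uniform(20, 60)    # spreading angle in degrees
        t.forward(length)
        t.left(angle)
        branch(t, length / a, depth - 1)  # left branch
        t.right(2 * angle)
        branch(t, length / a, depth - 1)  # right branch
        t.left(angle)                     # restore the original heading
        t.backward(length)                # walk back to the branching point

    t = turtle.Turtle()
    t.speed(0)
    t.left(90)               # the 1-metre trunk points north, drawn as 100 pixels
    branch(t, 100, 7)
    turtle.done()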