Physics 301 Computer Assignment 1: The Random Walk Winter 2012 A “classic” problem in statistical physics is the random walk problem. It is directly related to many real world problem ranging from physics (diffusion of gases, electron motion in solids) to biology (various problems in genetics and ecology) to economics (price fluctuations). Although the basic random walk problem can be completely and exactly analyzed using probability theory, we’re going to study the problem “empirically” with the use of random numbers. Many physics problems that lack an exact solution can nonetheless be well understood by looking at the statistics of many randomly generated events. This statistical approach to understanding complex situations is called the Monte Carlo method, named for the Monte Carlo Casino in Monaco where fortunes are won and loss in games of chance. For those of you who return for Physics 401, we’ll use the Monte Carlo method to understand ferromagnetism. Part 1: The random walk in one dimension The traditional statement of the random walk problem refers to a drunk trying to make his way down a narrow street. He’s so inebriated that he’s equally likely to take a step to the right (one way along the street) or the left (the opposite way). If he starts underneath the lamp post, what can we say about where he’s likely to end up? (Other than “in the gutter.”) Let’s state the problem more precisely. Consider a particle that can move only on the x-axis, starting at x = 0. It can move only in steps of length 1, and for each step it has equal probability (50-50) of moving in the positive or negative direction. Where is it after N steps? Clearly x0 = 0 before any step is taken (N = 0). After N = 1, the particle has a 50% chance of being at x1 = +1 and a 50% chance of being at x1 = –1. From each of these, it has a 50% chance of increasing x by 1 and a 50% chance of decreasing x by 1. The particle at x1 = +1 can go to +2 or 0; the particle at x1 = –1 can go to –2 or 0. You should convince yourself that x2 can be only +2, 0, or –2, and the probabilities for these are 25%, 50%, and 25%, respectively. Note: Make sure you notice that even values of N allow only even values of xN. That is x100 might be 0, 2, 4, 6, ... but cannot be an odd number such as 7. While it’s straightforward to use pencil and paper to track the possibilities for the first few steps, that quickly becomes unfeasible as N increases. Hence the role of the computer. We will attack the random walk program using Excel. The basic computation task is to generate a series of xi values, i = 0 to N, which we’ll place in one column of the spreadsheet from row 1 (i = 0) to row N+1 (i = N). Each cell is calculated from the cell above it using (assuming Column B) B(i) = B(i–1) ±1 where a +1 and a –1 each have a 50% chance of occurring. 1 It’s easy to generate random numbers in Excel using the function RAND(). You have to have the parentheses, because this is a function, but there’s nothing inside the parentheses. Now RAND() generates a random number between 0 and 1. We need a random number that is +1 50% of the time and –1 50% of the time. First, note that 50% of the numbers generated by RAND are in the range 0.0–0.5, and 50% are between 0.5–1.0. Second, the function ROUND(x, 0) rounds whatever x is to the nearest integer (0 means zero decimal places – i.e., an integer). With a little thought, you should now be able use a combination of functions (that is, everything fits in 1 cell in the spreadsheet) to generate ±1 with 50% probability. You’ll want to generate something like 1000 random numbers this way and check that you really are generating equal numbers of 1 and –1. Now let’s do something with it. Generate the series of integers 0 to 1000 in Column A. (One way is to use the Fill/Series command, under the Edit menu). Put a 0 in B1 as the starting point of the random walk, then insert a formula in cells B2 to B1001 to randomly add ±1 to the cell above it. Now you have a 1000-step random walk. But looking at 1001 numbers isn’t easy. Further, this is just one of 21000 possible 1000-step walks. That’s ≈1000100 walks, so we can’t look at all of them. But we can look at a random sample of random walks. Copy Column B to the next 9 columns. Now you’ve got ten 1000-step random walks. And a nice way to look at the results is with a graph. So make an xy-graph with Column A as the x-axis and with 10 graphs displayed. It should look something like As a general rule of thumb for making Excel graphs for physics: Turn OFF the display of the legend; we hardly ever need it. Turn OFF the horizontal grid lines. Format the plot area to No Fill rather than the default gray background. For this graph, set the horizontal axis to end at 1000 and the vertical axis to go –100 to 100. 2 There’s one more thing to learn: How to recalculate the worksheet. With the calculations you might normally do in Excel, there’s only one answer. But with random numbers, you can get entirely different answers with different random numbers. You would like to recalculate the random walks with different random numbers, but without having to change anything in the worksheet. The command for doing that is Manually Calculate, which calculates a new random number every time it encounters RAND(). On a Mac, you recalculate the worksheet by pressing COMMAND + =. That is, pressing = while holding down COMMAND (the key with the 4-leaf-clover symbol). I don’t know on a PC, but find out by searching “manually calculate” in Excel Help. After finding the Manually Calculate key, keep recalculating while watching the graphs. The goal is to get some awareness of how much variability there is. What’s changing every time? What features are always present? Q1: Looking at the 10 graphs you get each time, and hitting Manually Calculate several times, do you see any graphs that end exactly at x1000 = 0? If so, does this occur frequently or rarely? Q2: Looking at the 10 graphs you get each time, what would you estimate the average value of x1000 is? Give your reasoning. (Note: Your reasoning should be based on your data, not on what you think “ought” to happen.) Can you also explain why this “ought” to be the average? Q3: As N increases, does the spread of the random walks – that is, the variation from most negative to most positive – increase, decrease, or stay about the same? Q4: Does the spread increase linearly with N? Faster than linear? Less than linear? To answer this question, compare the spread – over many hits of Manually Calculate – at N = 1000 to the spread at N = 100. Because there’s no spread at N = 0, a linear increase in spread would mean that the N = 1000 spread is 10 times the N = 100 spread. Note: Q1–Q4 are asking for your conclusions based simply on LOOKING at the graphs. You shouldn’t be doing any calculations. Drawing conclusions from graphs is an important skill. For Part 1, turn in 1 graph showing the ten random walks and the answers to questions Q1 – Q4. Part 2: The spread of random walks in one dimension In Part 1 you should have found that the average value of x, denoted x , is zero, which is not surprising, but that individual particles wander farther and farther from zero as N increases. This is the process of diffusion, where a substance initially highly concentrated at one point “spreads out” as time passes. What can we learn about diffusion from studying the random walk? To answer this question, we’re going to need lots more data. You’ve already used 10,000 cells (1000 x 10) to track 10 particles. It’s clearly not feasible to extend this to hundreds or thousands of particles. Fortunately, Excel has some powerful tools that will let us substantially increase the power of our calculations – and help make you a “power user” of Excel. 3 The calculation in Excel of a series of integers with B(i) = B(i–1) + 1 is an iteration, the same calculation done over and over. The random walk calculation of Part 1, B(i) = B(i–1) ±1, was also an iteration. In normal spreadsheet calculations, each iteration – each turn of the crank – gets its own cell. Thus very large iterative calculations require very large – often unfeasible – numbers of cells. Suppose we don’t want to know all the intermediate values, only the very last value of a series of iterations. As an example, suppose we want to know the sum of all the integers up to 50: 1 + 2 + 3 + ... + 50? You could easily do this in Excel, but taking up quite a few rows when all you really want to know is the very last value. Summing integers to 10,000 would be quite a pain. Open a new spreadsheet, and in cell A2 type the formula = A2 + 1. This instantly generates an error message that you have a circular reference, that the formula in A2 refers to A2. Interestingly, we can get around this restriction. I’ll explain the procedure for a Mac; a PC will be similar, but you PC users will have to figure it our. On a Mac, go to Excel Preferences (under the Excel menu) and select Calculation. You’ll see that Excel, by default, is set to Automatic, meaning that the spreadsheet automatically recomputes each time you change anything. Change this to Manual calculation, meaning that Excel will now not do any calculations until you press the Manually Calculate command you learned about in Part 1. Also check the Iteration box in the same window, and set Maximum Iterations to 50. Close the Preferences window. Back to your spreadsheet. First, put a 0 in cell A1. Then change the cell A2 formula to = IF(A1=0, 0 , A2+1) Cell A1 is now the switch. If A1=0, cell A2 will be initialized to 0. If A1=1 (or any other value), cell A2 executes “A2+1”. With the Iterate box checked, what was a circular reference will now be calculated as “take what’s in A2, add 1 to it, but the result back into A2 as a new or updated value.” It will do this for the number of times you’ve set as Maximum Iterations. To “run” this procedure, first type 0 in A1, then issue the Manually Calculate command. A2 should be 0. Change A1 to 1, and Manually Calculate again. By iterating, you’ve added 1 to the initial 0 fifty times, so A2 should now be 50. You’ve made an Iteration Counter! Change Maximum Iterations to 500, reset the counter with A1=0 and Calculate Now, then run it with A1=1 and Calculate Now. A2 will count to 500 faster than you can see. Note: If you know a programming language, you’ll recognize that we’re doing the same thing you would do in a DO loop, FOR loop, WHILE loop, or some other loop structure. Excel is not a language, but the Iterate procedure allows you to do some of the things you might do with a loop. To sum integers, we want to add the current value of the iteration to an existing sum. We need a third cell. In A3, enter the formula = IF(A1=0, 0 , A3+A2) The A1 switch initializes A3 to 0. Then as A2 increases 1, 2, 3, ..., that value is added to the growing sum in A3. Verify that the sum 0 + 1 + 2 + ... + 50 = 1275. 4 Hint 1: To better see what’s happening, set Maximum Iterations to 1. Initialize the counter, then set A1=1 and press Manually Calculate. It will do 1 iteration, adding 0 + 1 to get 1. Without reinitializing, press Manually Calculate again. Excel will do one more iteration, increasing A2 to 2 and adding it to the 1 that was already in A3, giving 3. Another Manually Calculate will increase A2 to 3 and A3 to 6, and so on. When building a spreadsheet that uses iteration, it’s often good to iterate one step at a time to verify that things are behaving properly. Hint 2: If you type something wrong, things can go horribly wrong in an iteration, sometimes getting stuck inside the iteration. To “break out” and halt the iterations on a Mac, type COMMAND + .. That is, hit the period key . while holding down COMMAND. It will be something similar on a PC, but you might want to experiment to find out Q5: Build a 3-cell spreadsheet to calculate factorials. Report the formula you enter in each cell. Calculate 50! and 120!. You can check the first on your calculator, but the second is probably a larger number than your calculator can handle. What does this have to do with random walks? The random walk calculation that took a 1001row-long column in Part 1 can be collapsed into 1 cell that reports the final value x1000! Or, by simply changing the value of Maximum Iterations to N, the final value xN for any N. Build a spreadsheet with Cell A1 being the 0 (to initialize) or 1 (to run) switch. Cell A2 being an iteration counter, as above. Cells B1 to B1000 calculating 1000 random walks by adding ±1 to the previous value. That is, each cell uses the power of iteration to compute the final position of an N-step random walk. The 1000 cells will compute xN for 1000 independent random walks. Cell A3 calculating x as AVERAGE(B1:B1000). Set A3 to show 2 decimal places, the other cells to show 0 decimal places (integers). Now there are so many calculation being performed, slowing down the spreadsheet, that you’ll be able to see the iteration counter increasing. The x cell is a check that things are OK; the average of 1000 random walks should never stray very far from 0.00. If it does, something is wrong. Check with 100 steps, then 1000 steps that everything is OK. Note that 1000 1000-step random walks would have taken 1 million cells – 1000 x 1000 – as you were doing things in Part 1! (1000 1000-step walks takes ≈1 minute on my Mac.) What we’d like to find out is how much the random walkers spread out – how much they diffuse – as N increases. Again, this is best done graphically. Now, with lots of data, it will be best to make a histogram to summarize results rather than looking at all 1000 data points. An Excel spreadsheet can have several individual worksheets, accessed with the Sheet 1, Sheet 2, etc. tabs at the bottom of the sheets. You’ve written your random walk program on Sheet 1. Now go to 5 Sheet 2, and in Column A put the numbers –50, –48, –46, ... 48, 50. (Recall that x has only even values if N is an even number.) Of the 1000 random walkers on Sheet 1, we want to find how many end at xN = –50, how many end at xN = –48, and so on up to xN = +50. This is a good use for the COUNTIF function, which you can use Excel Help to read about. Use COUNTIF in Column B, with the range of cells to be counted referring back to B1:B1000 on Sheet 1 while the condition for being counted is the neighboring value in Column A. In cell B53, put a formula to add all the values B1 to B51. You have 1000 walkers on Sheet 1, each should have been counted once, so all the counts added together should total 1000. Note: Something about the order in which Excel does things during iterations seems to require putting this “data analysis” on a separate sheet from the random walk calculations. I tried putting it all on one sheet, but couldn’t get it to work right. But the spreadsheet probably calculates faster with the analysis on a separate sheet. Now to test it. Go to Sheet 1, set Maximum Iterations to 100, initialize the sheet, then run it. Stay on Sheet 1 until the iteration counter reaches 100. Then switch over to sheet 2. The first few rows of Column B should be 0, then values should start increasing until reaching a maximum at or near the row in which Column A is zero (i.e., having x100 = 0), then decrease until the final few rows in Column B are 0. The counter beneath Column B should be exactly 1000. Assuming all is OK, it’s time to graph. Make an xy-graph of Columns A and B, this time displaying points only (make them big!), no connecting lines. You’ll need to manually set the xaxis to go from –60 to 60 or –40 to 40 or something like that. You want to make sure 0 gets a label. The graph should look something like: I used a tick mark of 4, so I wanted to end the axis at a multiple of 4. I used ±44 here. 6 What are we looking at? The horizontal axis is xN, the final position at the end of an N-step random walk. The vertical axis is how many of the 1000 random walks have that value of xN. For example, we see that 16 of the 1000 100-step walks ended at x100 = 20. You can see that there’s a bell-shaped distribution of the end points of the random walk process. Note: If you have a fast enough computer, or are patient enough, you can get “smoother” curves by using more random walkers. But you have to make big changes to see much difference. This is 1000 walkers. The curve will look better and smoother if you use 2000, but an increase from 1000 to, say, 1100 would hardly be noticeable. 4000 walkers would reduce the “fluctuations” by a factor of 2 to make an even better curve. Remember that all the walkers started at x = 0. The “spread” of the distribution, as seen in the graph, tells us how much the walkers have “diffused” after N steps. For N = 100, the area –20 < x < 20 seems to have been “hit” by the diffusers, but none have yet reached as far as x = ±50. We want to learn how the spread varies with N, but we need to decide how to measure “spread.” If the number of walkers ∞, it can be shown that the bell-shaped distribution seen in the graph becomes a Gaussian curve. For a Gaussian, the appropriate way to measure width or spread is to measure the point corresponding to e–0.5 ≈ 0.60. In other words, the spread, which we’ll call ∆x, is found by reading the value of x at the point where the curve has dropped to 60% of its peak value. Now that’s theory. In reality, you’ve got a “noisy” curve, and it’s not exactly clear what the peak value is. But you can estimate. Imagine a smooth curve through the points in the graph above. I’d say the peak of a smooth curve would be ≈58. 60% of that is ≈35. On the left side, a smooth curve passes through 35 at x ≈ –14. On the right, at x ≈ 12. Averaging these two, we could say that ∆x ≈ 13 for this curve. A bell-shaped curve doesn’t have a sharp edge, but ∆x (on both sides of center) seems a reasonable measure of how much diffusion has occurred. Q6: Using at least 1000 random walkers (2000 or 4000 if you can), measure ∆x using the method just described for N = 50. Repeat 4 times, getting 4 different graphs. Repeat, 4 times each, for N = 100, 400, and 900. (The last one will be really noisy and hard to measure, but do your best.) Put your results in a table with 4 columns (labeled 50, 100, 400, 900) and 1 row for each ∆x. Then, at the bottom of the columns, average the 4 values of ∆x for each N. Submit one graph for each N (a total of 4 graphs) and the table of measurements. Q7: Propose a hypothesis of how ∆x depends on N. That is, a specific mathematical equation by which you could predict what the spread ∆x should be for a particular N. (Don’t forget that your ∆x values have some uncertainty because they’re a statistical sample; don’t treat them as exactly known quantities.) Be sure that your equation makes the correct prediction for N = 0. Then use your hypothesis to predict ∆x after 10,000 steps. Q8: In the diffusion of a gas from a point source, the number of “steps” in the random walk is essentially the number of collisions a molecule has with other molecules. At a given pressure and temperature, the number of collisions is proportional to the time t since the molecules were released. That is, N µ t, where µ is the proportionality symbol. Based on this, how should the 7 spread of a diffusing cloud of gas atoms depend on t? Give your answer as ∆ x µ ____ where some function of t goes in the blank. Explain your reasoning. For Part 2, turn in 4 graphs (with Q6) and the answers to questions Q5 – Q8. Part 3: The random walk in two dimensions – EXTRA CREDIT The one-dimensional random walk could move only right ( = 0°) or left ( = 180°), with step size 1. Now consider a particle that still has step size 1 but can move randomly at any angle between 0° and 360°, or, equivalently, between 0 and 2π rad. Because RAND() generates a random number between 0 and 1, 2*PI()*RAND()generates a random angle between 0 and 2π rad. With a step size of 1, a particle at (xi, yi) moves to (xi+1, yi+1) = (xi + cos, yi + sin), where is the randomly generated angle. Devise a method to generate and plot five 2000-step two-dimensional random walks. (My spreadsheet is 30 columns across, 2000 columns down.) It will look something like Set both axes to span –80 to 80, and adjust the horizontal and vertical size to make the over-all graph as close to square as you can. You can plot more than 5 if you want, but they’re often so overlapped that more than 5 gets to be a huge mess! Here, much more than in one dimension, you really begin to sense the idea of diffusion. For Part 3, turn in 1 graph showing the paths of 5 or more random walkers. 8