Numerical programming in C Mervyn Roy January 28, 2015 1 Introduction In this workshop you will gain experience in writing C code to solve numerical problems. You will develop simple numerical algorithms and learn about numerical convergence, accuracy and stability. 1.1 Algorithms To investigate physical problems we develop mathematical models and then attempt to solve the equations. Often this cannot be done analytically and instead we must use a computer. But how do we use a computer programme to solve a mathematical problem? First we must develop an algorithm: a set of instructions the computer can follow to perform the numerical task we are interested in. Next we need to develop a programme to implement the algorithm. Importantly, we must then test our programme, and finally we can run the programme to generate the results we need. A good piece of advice when writing a new piece of code from scratch is to start from the basics. Break the algorithm into steps then programme and test each step in turn. The smaller the piece of code involved, the easier it is to find problems. Making the code compile is the first and easiest step. It is then more difficult (and important) to check the numerical accuracy of the code. Although it may compile and run, is the code actually doing what you want it to? It is important to be absolutely confident that your code is working as you would expect before using it to generate new results. 1 1.2 Existing software There are many existing numerical software packages that can help when solving numerical problems computationally. These range from low level subroutines and functions, like the blas and lapack linear algebra routines, to sophisticated software packages such as MatLab, Maple or IDL that come with canned routines for solving standard problems. Most of these packages are excellent and their use can make life much easier. However, you should never use these routines just as a ’black box’. One rule of thumb is that the more sophisticated a package, the more cautious you should be of using it without at least some understanding of the underlying numerical methods that it employs. 1.3 Further reading This workshop introduces numerical methods, but this is an enormous field and, even in the areas covered here, we will barely scratch the surface. For further information look at the excellent Numerical Recipes in C by Press, Teukolsky, Vetterling and Flannery. This book covers important numerical techniques in a broad range of subjects and also has a comprehensive list of references. 2 Numerical integration Imagine we want to find the definite integral Z b A= f (x)dx. (1) a Perhaps the integral cannot be performed analytically, or perhaps f (x) is some numerical data stored in a table. Then we will need to calculate A using a computer programme. First, we need an algorithm. A is just the area under the curve f (x) from x = a to x = b. To find the area, we could draw f (x) on graph paper and count the squares underneath the curve. Here we will develop a computer programme which will essentially do this for us. The key to the method is to split up the unknown area A into smaller sections of area that we can calculate. One way to do this is illustrated in Fig. 1. Here, the area A is split into trapeziums. In Fig. 1 we divide the range of the integral (a ≤ x ≤ b) into 2 f2 f3 f1 f0 A1 A2 A3 h x0=a x1=a+h x2=a+2h x3=b Figure 1: Trapezium rule algorithm on a uniform grid. 3 equal sections. If we know the value of f (x) at each of the points x0 = a, x1 = a + h, x2 = a + 2h and x3 = b we can then calculate the area of each trapezium section, for example A1 = 0.5h(f0 + f1 ), and sum the areas of all of the trapeziums to get an approximation to the integral, A. 1 1 1 h(f0 + f1 ) + h(f1 + f2 ) + h(f2 + f3 ) 2 2 2 1 = h (f0 + f3 ) + f1 + f2 . (2) 2 A ≈ A1 + A2 + A3 = Obviously, A1 + A2 + A3 is not a very accurate estimate for A, but the approximation can easily be improved by splitting the range of the integral into smaller sections. The trapezium rule algorithm works by approximating the function f (x) to first order within each small range, h. For example, close to x = a, df δ 2 d2 f f (a + δ) = f (a) + δ + + ···, dx a 2 dx2 a (3) where 0 ≤ δ < h. As the range is split into smaller sections, h becomes smaller and the second order (and other higher order) corrections become less important. From Eqs. (2) and (3) it is clear that the trapezium rule is accurate to within a constant multiplied by h3 f 00 . In mathematical text books this is usually denoted by 0(h3 f 00 ). There are many higher order algorithms for numerical integration but, to paraphrase ref. [1], ’higher order does not necessarily mean higher accuracy’. The trapezium rule is usually good for most purposes, but another important algorithm is Simpson’s rule which, through a lucky cancellation of higher order terms, is accurate to 0(h5 f (4) ). We should also briefly mention open integration formulae. The trapezium rule is a closed formula: it explicitly includes the end points f (a) and f (b) from Eq. 2. It is also possible to develop algorithms which do not include these points and this can 3 be advantageous in some situations. For example if f (a) is a singular point. See ref. [1] for more information. To develop a general trapezium rule algorithm to calculate A first split the range b − a into N equal sections so x0 = a, x1 = a + h, · · · , xN −1 = b where h = (b − a)/(N − 1). Then Z b A= a 1 f (x)dx = h (f0 + fN −1 ) + f1 + f2 + · · · + fN −2 + 0(h2 ) 2 N −2 X 1 = h (fo + fN −1 ) + fi . 2 i=1 " # (4) This is easily coded with a single for loop in a C programme. 2.0.1 Task: calculate Rπ 1 2xdx using the trapezium rule • First code the algorithm. Split it into steps then programme and test each step separately. First define the number of steps N. Then calculate the step size, h=(pi-1.0)/(N-1.0). • Use a for loop to count over the N points. For each point calculate the required value of x and f=2.0*x. An example section of code to calculate and print values of x from 0 to (N − 1)h is for (i=0;i<N;i++){ x=h*i; printf("%e\n",x); } • Output x and f to a file and plot the results (see the appendix on plotting with gnuplot) to check that the code is working. • Sum each of the values of f with the appropriate weighting to calculate A. An example section of code to calculate the sum of all the integers between 1 and N − 1 is A=0; for (i=1;i<N;i++){ A=A+i; } printf("Sum = %e\n",A); The important line is A=A+i: each time around the loop the new value of A is set to be equal to the old value of A plus the current value of i. So, the first time around the loop the old value of A is 0 and the new value is then set to be 0+1 = 1. The second time around the loop the 4 old value of A is now 1, the value of i is 2 and so the new value of A is set to be 1+2 = 3. • The trapezium rule should be exact for this integral, independent of the number of points. Check that this is the case. 2.0.2 Task: calculate R 3π/4 π/3 sin(x)dx and examine the convergence R 3π/4 • Modify the code to calculate π/3 sin(x)dx. First generate and plot x and f=sin(x) between π/3 and 3π/4. • Calculate and plot the integral as a function of the step size h (or, alternatively as a function of the number of points, N ). Compare the numerical result to the analytical result and see how the numerical result converges to the exact value as h → 0 (or N → ∞). 2.0.3 Task: calculate R4 10−4 ln(x) sin(exp(−x))dx The first few tasks simulate the type of testing procedure you should go through with any new piece of code. Once the code has been tested you can then be more confident in applying it to unknown cases. 4 • Investigate the convergence of 10 −4 ln(x) sin(exp(−x))dx as a function of the number of points in the integral. R 2.1 Nonlinear grids The problem with the function in the previous task is that it varies rapidly near to the origin and slowly everywhere else. A small step size is needed to get accurate values to the integrand but, away from the origin, the small steps are unnecessary and by calculating them we are wasting computer time. One solution is to break up the range of the integral and use different uniform step sizes in each range. Another solution is to use a continuously varying step size. This type of grid is often useful in physical problems where the length scale varies continuously. For example, exponential grids are often used in atomic physics problems to deal with the rapid variation of the potential near to the nucleus. 5 2.1.1 Task: generalise the trapezium rule to non-uniform step size • Generalise the trapezium rule algorithm to the case of non-uniform step size. For each step you will need to calculate the area of the associated trapezium (A1 , A2 and A3 from Fig. 1) and then sum the areas to get the final result. First, keep the step size as h and test the new version of the code against the previous version. • Generate an appropriate non-uniform grid. An example section of code to do this is xmult=...; x=1.e-4/xmult; for (i=0;i<N;i++){ x=x*xmult; if (i>0) h=x-xold; /* h is the step size */ xold=x; } • Compare the convergence of the new algorithm with the uniform step size algorithm by plotting the result for the integral as a function of N . What other types of non-uniform grid could you use? 3 Series Computers are fantastic tools for performing simple numerical tasks again and again. This means that sequences and series can be very easy to evaluate using a computer programme. 3.1 3.1.1 Power series Task: evaluate the Maclaurin series for exp(x) • Use a single loop to evaluate the Maclaurin series for exp(x) with nmax terms in the expansion and for a given value of x. nb. you can calculate the factorial with a similar code fragment to that used in task 2.1.1 and you can sum up the series in a similar way to the example in task 2.0.1. Hint: Think about the largest number you can represent as an integer and try calculating a range of factorials as both integers and doubles. Can you explain what you see? 6 • Calculate and plot the Maclaurin series result for e10 and e−10 as a function of the number of terms in the series. What is the smallest fractional difference you can obtain between the exact and series values? What limits this difference, and why is the result different for e10 and e−10 ? 3.2 Fourier series As another example let us examine the Fourier sine series for f (x) = 1 with 0 < x < l. This is ∞ X nπx , (5) f (x) = bn sin l n=1 where ( bn = 4/(nπ) 0 if n is odd, otherwise. (6) As usual, to calculate f (x) we first need an algorithm. Imagine that we would like to calculate f (x) for a particular value of x (say x = 2). Once we have chosen a value of x we must evaluate the sum in Eq. 5 but, as before, we cannot compute an infinite number of terms. Instead we have to cut off the series at some maximum number of terms (nmax ) and later we will use our computer program to see how the series converges as the number of terms increases. To calculate f (x) = nX max n=1 bn sin nπx l (7) we can use a fragment of code that looks something like the following: x = 2.0; f = 0.0; for (n=1; n<=nmax; n++) { bn = ...; f = f + bn * sin(n*pi*x/L); } On the first line we tell the computer what value of x we are interested in (x=2.0). Then, for each different value of n we calculate bn using the result from Eq. 6. The next line sums up each term in the series. This uses exactly the same method as in task 2.0.1: the new value of f equals the old value of f plus bn sin(nπx/l). To produce a plot of f (x) against x we need to calculate the series for a range of different x values, and we can do this easily in our computer 7 program with another loop. If we want to plot a graph with ixmax points between some minimum (xmin) and maximum (xmax), x-values we must calculate the Fourier series (as above) at each of the ixmax points. This can be done with the following code fragment: for (ix=0; ix<ixmax; ix++){ x = ... f = 0.0; for (n=1; n<=nmax; n++) { bn = ...; f = f + bn * sin(n*pi*x/L); } printf("%e\t%e\n",x,f); } 3.2.1 Task: Fourier series of a square wave • Code the algorithm to calculate the Fourier series of the square wave. As usual, test each part of the code as you write it. Investigate the appropriate number of x points ixmax that you will need to generate smooth curves. Then plot the Fourier series between −l and 3l. Explain the shape of the waves that you see. • Examine the convergence of the series then investigate (and plot) the Gibbs oscillations as a function of nmax. 3.2.2 Task: Fourier series of more complex functions • Calculate the Fourier sine series for f (x) = 1/(2 − e−x ) for 0 ≤ x < π. In this case we do not know the analytical result for bn . Instead we R must calculate it numerically from bn = (2/π) 0π f (t) sin(nt)dt and this can be done with the code produced in task 2.0.1. Think about the most efficient way to do this. You will need three loops: one loop over x, one loop over n and one extra loop over t to perform the integral to evaluate bn . But, because bn does not depend on x, it would be very inefficient to simply nest the t loop inside the existing code: we only need to calculate each bn once, not ixmax times. One solution is to calculate and store each bn in an array variable at the start of the code, before the loop over x. Think of another efficient way to rearrange the loops. Programme one method and compare the Fourier series result for f (x) to the exact function. n.b. remember that array variables are defined by, for example, double b[10]. This will reserve 8 space for 10 double precision numbers which can be accessed by b[0], b[1], etc. 4 Finding the roots of an equation Often we may want to solve an equation f (x) = 0 where an exact analytical solution is not possible. We could try to approximate the solutions, or we could attempt to solve the equation graphically or numerically with a computer programme. The way that most numerical equation solvers work is as follows: first we guess a value for the root, say x = x1. Then we use some information about the function or its derivatives at x = x1 to generate a new, (hopefully) better, guess for the root at x = x2 . We can then repeat this procedure and move iteratively toward the real root (x = x0 ) until, after N iterations, we decide our guess x = xN ≈ x0 is ’good enough’. i.e. when some set convergence condition is reached, either on the value of the function or on the uncertainty in the position of the root. This idea of moving toward some goal through successive iterations is a very common one in numerical methods - essentially because it is very easy to code. To start most root finding algorithms we must first bracket the root: if f (x) is continuous and f (a) and f (b) have opposite signs then there must be at least one root in the interval (a, b), and we say that the root is bracketed by a and b (see Fig. 2). f(x) a xo b x Figure 2: Bracketing a root. The root at x0 is in the interval (a, b). The vertical dotted line shows the new limit of the interval at (a + b)/2 which will be imposed at the next iteration of the bisection method. 9 4.1 Bisection method Here, we will investigate the bisection method. This is a simple and extremely robust method for finding roots in 1D. In fact, once a root is correctly bracketed, the bisection method cannot fail. It works like this: the root is bracketed with a lower limit xl = a and an upper limit xu = b. We know the root lies somewhere between these two bounds, so our initial guess for the root, x0 , is x1 = (xl + xu )/2 with an error of ±(xl − xu )/2. At each successive iteration we refine our guess for the root by reducing the size of the interval in which we know the root must lie. We bisect the interval and test the value of the function at (xu + xl )/2. If the function has the same sign as the function at xl then the lower limit is updated and we know the root now lies between (xu + xl )/2 and xu . If the function has the same sign as the value of the function at the upper limit (this is the case in Fig. 2) then the upper limit must be updated and we know that xl < x0 < (xu + xl )/2. There are a few special cases where you may have to take care. The most obvious is at a repeated root - or indeed any double root where, although the interval (a, b) may contain a root, there is not necessarily a sign change in f (x) between x = a and x = b. Also, if there are many roots between a and b the bisection method will only find one of them and, interestingly, if there is a singularity but no roots in the initial interval the bisection method will converge to the point of the singularity. The bisection method is usually quite good in 1D (given an appropriate initial interval in which to search). However, in more than one dimension things get much more complicated very quickly. The simplest algorithm which can work is the Newton-Raphson method, where both the function and its derivative at the initial guess for a root are used to generate the next guess. There are also many other advanced methods. See ref. [1] and references therein for more details. 4.1.1 Task: Code the algorithm • Write the code to implement the bisection method given an initial interval (a, b). You will need to loop over iterations, bisecting the interval at each iteration until some convergence criteria is reached. Think about how to implement this stopping condition. You will also need to think about how to orient the search - how to update the appropriate bound to the interval at each iteration depending on the sign of the function at each of the bounds and at the midpoint. • Test the code by solving x3 − 7x2 + 14x − 8 = 0 and comparing the numerical with the analytical solution. Investigate the effect of chang10 ing the initial bounds a and b. What happens if these do not bracket a root? • The ground state energy, E, of an electron in a finite square well of depth V0 is found by solving the transcendental equation 1 ka tan(ka) = (k02 a2 − k 2 a2 ) 2 , (8) where k 2 = 2mE/h̄2 and k02 = 2mV0 /h̄2 . Use your code to find the ground state energy in an InAs quantum well in GaAs with effective mass 0.027, well width 2a = 0.6 nm, and V0 = 0.5 eV. Check your solution graphically. There is a useful discussion on bracketing the root in this case in ref. [2]. 5 Numerical differentiation It is common to have to differentiate a data set numerically. If we have a set of tabulated data fi on a uniform grid xi = ih, where h is the uniform step size, then an obvious first guess for the numerical derivative is df ∆fi fi+1 − fi xi ≈ = . dx ∆xi h (9) This is a forward difference and, although it is a reasonable estimate for the derivative, it is only accurate to 0(hf 00 ), so the error will reduce very slowly (linearly) with the step size, h. This is not very satisfactory, especially if we cannot control the step size in the data - for example if our data set is pre-generated from some experiment. However, we can do better than this by thinking about the Taylor series again. Near to xi we can expand f (xi + h) and f (xi − h) as Taylor series, df h2 d2 f h3 d3 f + + + ···, dx xi 2 dx2 xi 6 dx3 xi f (xi + h) = f (xi ) + h df h2 d2 f h3 d3 f f (xi − h) = f (xi ) − h + − + ···, dx xi 2 dx2 xi 6 dx3 xi (10) where f (xi +h) = fi+1 and f (xi −h) = fi−1 . Clearly, if we take the difference between fi+1 and fi−1 the second order term vanishes and we can write a new (centred difference) approximation to the first derivative, fi+1 − fi−1 df = + 0(h2 f (3) ). dx xi 2h 11 (11) 5.0.2 Task: calculate the first derivative q • ψ(r) = exp(−r/a0 )/ πa30 is the wavefunction of the Hydrogen ground state, where a0 is the Bohr radius. Write a C-function, pr, to calculate the Hydrogen 1s probability density (P (r) = 4πr2 ψ 2 ). Calculate P (r) on a uniform grid and plot it. • Write the code to calculate the first derivative using the centered difference formula and your probability density function from the previous task. You will need to supply a step size h, and make two calls to pr. Plot and examine the first derivative. • Combine this code with your root finding code from section 4 to calculate the most probable value for the radius of an electron in a Hydrogen 1s orbital. Check that this agrees with the analytical value. 5.0.3 Task: calculate the second derivative • Derive the classical 3 point formula for the second derivative by eliminating first and third order terms in Eqs. 10. Write some code to calculate the second difference of P (r) using this formula and a function to evaluate P (r). • Plot P (r) along with the first and second derivative and explain what you see. 6 Multi-dimensional numerical integration Many numerical problems are relatively easy in 1D but tend to get difficult very quickly in more than one dimension. This is unfortunately the case with numerical integration. There are two problems that we come up against. The first and often more serious problem is that the limits of integration in N dimensions can be a complicated. The second problem is that the number of function evaluations needed to perform an N dimensional integral tends to rise as the N th power. So if you need 1000 points in 1D, you will need 106 points in 2D and 1000N points in N -dimensions. However, if the number of dimensions is small, the cost of evaluating our function is cheap and, most importantly, the limits run along co-ordinate axes (or are easily specified in closed form) it is easy to generalise the 1D integration code in section 2 to deal with N dimensions. For more complicated situations there are many special methods that have been developed, for example Monte Carlo integration. 12 6.1 2D integrals with the trapezium rule Before we look at Monte-Carlo integration we will first generalise our trapezium rule integration code to perform a simple 2D integral. Imagine we want to calculate the volume V under a surface g(x, y), Z x2 Z y2 g(x, y)dydx. V = x1 (12) y1 As before, we can calculate our function on a uniform grid. Then evaluating R y2 y1 g(x, y)dy for each of the x = xi points on our grid between x1 and x2 is easy. For each separate value of xi we can calculate the area A(xi ) under the curve, g(xi , y), with our standard trapezium rule code. Fig. 3 shows a surface g(x, y) as a function of x and y. So, for example, the line along x = x2 is essentially a plot of g(x2 , y) as a function of y and the area under this line is simply A(x2 ). We could calculate and store each of the y R y2 R x2 integrals at each xi point, so A(x) = y1 g(x, y)dy. Then V = x1 A(x)dx. And, obviously, this integral can also be calculated using our 1D trapezium code. 28 24 20 16 12 8 g(x,y) 20 A(x) 15 y2 x1 10 x2 y1 x1 x2 Figure 3: Left: surface g(x, y). For each fixed xi we can calculate A(xi ), the area under the R y curve g(xi , y) between y1 and y2 . This is shown in the right hand curve: A(x) = y 2 g(x, y)dx. 1 So to calculate the 2D integral we need to perform a 1D integral over y for each required value of x on our grid, store the result as an array variable, then perform another 1D integration over x. In fact, things are simpler than this. We do not need to explicitly store A(x) at all, we just have to nest the y integration loop inside the x integration loop. If xi = ihx , 0 ≤ i < N , yj = jhy , 0 ≤ j < M and f (xi , yj ) = gi,j then, from the trapezium rule, " V = hx N −2 X 1 (A1 + AN −1 ) + Ai , 2 i=2 # 13 (13) where 1 Ai = hy (gi,1 + gi,M −1 ) + 2 6.1.1 M −2 X gi,j . (14) j=2 Task: Write the code to calculate V = y 2 + y)dxdy R π R π/2 0 1 y exp(−x2 − • Modify your code to calculate V using Eqs. 13 and 14. First make sure you have tested your code on a 2D function that you can integrate analytically. Then calculate the integral above. 6.2 Monte Carlo integration Unsurprisingly (given the gambling connotations) the ’Monte Carlo’ method works with random numbers. If we randomly pick N points at position xi within a multi-dimensional volume V , then the integral of f (xi ) over the volume is given by Z f dV = V hf i ± V where hf i = hf 2 i − hf i2 N −1 1 NX f (xi ). N i=0 !1/2 , (15) (16) Monte Carlo methods are very useful for integrals with complicated √ boundaries but, because the error in the integral only falls off like 1/ N , it can be very difficult to obtain high accuracy. So, how does it work? First, let us look at what happens in 1 dimension. Eq. 15 becomes Z b −1 b − a NX f (x)dx = f (xi ), (17) N i=0 a and you can see that this is very similar to the trapezium rule result from Eq. 4 without the end point corrections (on a uniform grid, h = (b−a)/(N −1)). In the Monte Carlo algorithm we randomly select N points between a and b, weight each point by the value of the function at that point, then sum the weights to get the approximation to the integral. The Monte Carlo algorithm comes into its own for higher dimensional integrals, especially those with complicated limits. Fig. 4 shows an example 2D function g(x, y) which we would like to integrate over the region s. First we 14 1 7 6 5 4 3 2 1 0 Area, A g(x,y) 0.5 1 s -0.5 Region, s 0.5 0 0.5 0 -1 -0.5 0 0.5 1 0 Figure 4: Left: surface g(x, y). Right: region, s, over which we would like to integrate the function. pick N random points (xi , yi ) over the area A (in our example this is the rectangle from −1 < x < 1 and 0 < y < 1). If the point is inside the region s we weight it by the value of the function g(xi , yi ) at this point. If it is outside the region s we set its weight to 0. Then our approximation to the integral is just the sum of the weights multiplied by the area A and divided by the number of points N . This method can be explained with the following analogy. Imagine throwing stones at random into a field of area A in which there is a pond (region s). To find the area of the pond, simply compare the number of stones, p, that fall into the pond with the total number of stones N that are thrown. The area of the pond is then simply Ap/N (and calculating the area of the region s is equivalent to integrating the function g(x, y) = 1 over this region). So the key steps of a Monte Carlo integration are simple: 1. generate a random number. 2. convert that to a position inside a given fixed area or N -dimensional volume. 3. test if the position is inside the region s of the integral. 4. weight the point appropriately. To generate a pseudo-random number you will need to call the C-function rand() which will return an integer between 0 and the system variable RAND MAX. The pseudo-random number generator is often seeded with a call to srand(time(0)) and so you will need to include the <time.h> header file in your code. 6.2.1 Task: Use the Monte Carlo method to calculate p r = x2 + y 2 RR r≤5 e −(x2 +y 2 ) dxdy, • As usual, break your code into steps and test each step independently. Think about how to choose the area A to maximise the efficiency of the code - obviously it must include all of s, but be as close to s as possible so fewer points are wasted. Calculate the Monte Carlo error 15 and show that the numerical result converges to the expected result as N is increased. 6.2.2 Task: a 3D integral • This example is taken directly from numerical recipes. We wish to find the mass of a component of complex shape: the component is a section p cut from a torus defined by the conditions z 2 + ( x2 + y 2 − 3)2 ≤ 1 m2 , x ≥ 1 m, and y ≥ −3 m, and the density is ρ(x, y, z) = (10 + x3/2 ) kg/m3 . Calculate the mass using the Monte Carlo algorithm. Hint: First think about the extremes of the values of x, y and z that are allowed. You should be able to work out the bounds on x, y and z and use these to set the size of the Monte Carlo sampling volume. 7 7.1 Numerical solution of differential equations Euler integration The key step to solving a differential equation numerically is to discretise the equation - to write it as a difference equation on a grid. As an example, let us investigate the second order differential equation governing the motion of a particle in 1 dimension, d2 x 1 = F (v, x, t), 2 dt m (18) where m is the mass of the particle and F (x, v, t) is the force acting on the particle. We first write our second order differential equation as two coupled first order differential equations. If v = dx/dt, then dv dt dx dt = 1 F (v, x, t) m = v, (19) is exactly equivalent to Eq. 18. These coupled differential equations are then easily discretised: to first order, we can use the forward differencing scheme from section 5 to give, 1 = vi + ∆ t F (vi , xi , ti ) m = xi + ∆ t vi . vi+1 xi+1 16 (20) (21) This is the Euler integration method and, as we saw in section 5, its accuracy is strongly dependent on the step size, ∆t . However it is conceptually simple, and easy to program. To turn Eqs. 20 and 21 into a numerical scheme we think of them as update equations. In an initial value problem, we know x0 , v0 and F0 , the values of the position, velocity and force at t = t0 . But, then, from Eqs. 20 and 21 we can calculate v1 and x1 at time t1 = t0 + ∆t . So, we increment the time by ∆t and update the velocity and position. At the next time step, we then know x1 , v1 and F1 so we can use the update equations to generate v2 and x2 , and so on. 7.1.1 Task: code the Euler method and numerically solve the equations for a harmonic oscillator • For harmonic motion, F = −kx − r(v, x). Begin with an undamped oscillator with r = 0, m = 1 kg, k = 2.5 N/m, and dt = 0.1 s. Start the particle from x = 0 with an initial velocity and investigate the motion over a few cycles of oscillation. Compare with the analytical solution. What is going wrong? How small does dt have to be to give reasonable agreement between the numerical and analytical solutions? 7.2 Runge-Kutta integration The problem with the Euler method is that it is only accurate to first order and if ∆t is large, the method is unstable. In each of the coupled difference equations (Eq. 20 and Eq. 21), we have ignored terms proportional to ∆2t . If these are not negligible, our difference equations are not equivalent to the second order differential equation (Eq. 18), and this means we are getting the physics wrong. For example, in Eq. 21 we are missing a term proportional to ∆2t d2 x/dt2 which is equivalent to adding an additional nonphysical acceleration to our particle. This sort of problem is particularly significant in cases where the general solution contains both an exponentially decaying and an exponentially growing part. In physical problems we usually want the part of the solution that decays with time. But, any numerical error will start to mix in a small proportion of the exponentially growing solution and, after a long enough time, this erroneous part to the solution will come to dominate our numerical result. However, the good news is that we can easily do better than the Euler method by using the centred difference approximation from section 5. Then, 17 our update equations become 1 F (vi , xi , ti ) m = xi−1 + 2∆t vi . vi+1 = vi−1 + 2∆t xi+1 (22) (23) This is a second order scheme with error 0(∆3t ) called the midpoint or second order Runge-Kutta method. In this scheme we calculate the position and velocity at time steps 2∆t , and we use some additional information, the position at velocity at an intermediate step ∆t , to improve the accuracy of our update equations. So, how does this work numerically? We start with the initial conditions, vi−1 = v0 and xi−1 = x0 at ti−1 = t0 . Then we make an estimate of the position and velocity at the intermediate time step: vi = vi−1 + ∆t Fi−1 /m and xi = xi−1 + ∆t vi−1 . Finally, these estimates are used to generate the updated position and velocity at t0 + 2∆t using Eqs. 22, and 23. Because this method works on a time step of 2∆t , in text books you will often see a slightly different notation: a full time step of ∆ = 2∆t and an intermediate time step of ∆/2 = ∆t . It is also relatively easy to use the Taylor series (Eq. 3) to generate higher order Runge-Kutta schemes and fourth order Runge-Kutta is often used for physical problems. 7.2.1 Task: code the midpoint method and investigate the undamped oscillator in task 7.1.1 • Start with 2∆t = 0.1. How large can you make 2∆t before the results start to become unphysical? 7.2.2 Task: investigate a damped driven oscillator • Include some frictional force r = βv n in the equation of motion and drive the oscillator with a forcing function A sin(ωt). Start the oscillator from rest away from x = 0 and investigate the transient and steady state parts of the motion. Plot curves to demonstrate the key features of the motion for different values of ω and β. 7.2.3 Task: Lissajous figures • Plot the position against velocity for a damped driven oscillator. Investigate the motion for different values of β and A, and for different initial conditions. Explain what you see. 18 8 Partial differential equations One way to solve a partial differential equation numerically is to discretise it and rewrite it as a finite difference equation. We can do this easily on a regular grid using the Taylor expansion again. For example, take the Laplace equation for the electric potential with no sources of charge, ∇2 v = 0, (24) ∂2v ∂2v + = 0. ∂x2 dy 2 (25) or, in 2 dimensions, To discretise this equation on a uniform grid first Taylor expand vi,j = v(x, y). If vi+1,j = v(x + h, y), vi−1,j = v(x − h, y), vi,j+1 = v(x, y + h), and vi,j−1 = v(x, y − h), then vi+1,j = vi,j + h ∂v h2 ∂ 2 v h3 ∂ 3 v + + + ···, ∂x 2 ∂x2 6 ∂x3 (26) vi−1,j = vi,j − h ∂v h2 ∂ 2 v h3 ∂ 3 v + − + ···, 2 ∂x 2 ∂x 6 ∂x3 (27) vi,j+1 = vi,j + h ∂v h2 ∂ 2 v h3 ∂ 3 v + + + ··· ∂y 2 ∂y 2 6 ∂y 3 (28) vi,j−1 = vi,j − h ∂v h2 ∂ 2 v h3 ∂ 3 v + − + ···, ∂y 2 ∂y 2 6 ∂y 3 (29) If we sum Eqs. 26, 27, 28 and 29 we find, vi+1,j + vi−1,j + vi,j+1 + vi,j−1 = 4vi,j + h 2 ∂2v ∂2v + ∂x2 ∂y 2 ! + 0(h4 f (4) ), (30) and, because ∇2 v = 0, we have vi,j = 1 (vi+1,j + vi−1,j + vi,j+1 + vi,j−1 ) . 4 (31) This is the finite difference version of the Laplace equation (Eq. 25) on a uniform grid with hx = hy . You should also be able to derive the equivalent finite difference equation if hx 6= hy . We can solve this equation using Jacobi’s method. Although this is slow and has been superseded by many other faster methods it is a good place to start. In Jacobi’s method we start our solution off from some arbitrary initial state and let it relax toward the actual solution iteratively. 19 First define a uniform grid, xi = ih, yj = jh, with i = 0 · · · N − 1, and j = 0 · · · N − 1. Fix the values of the potential at the edge of the grid according to the physical boundary conditions of the problem then relax the solution over a number of iterations. At each successive iteration the potential at grid point i, j is set to be equal to to the average of the potential on the surrounding grid points at the previous iteration. So, for iteration n + 1, 1 n n+1 n n n vi,j = vi+1,j + vi−1,j + vi,j+1 + vi,j−1 . (32) 4 As n → ∞ the changes to vi,j become negligible and vi,j converges to the true solution of the finite difference Laplace equation, Eq. 31. A small tweak to Jacobi’s method gives the Gauss-Sidel method which is even easier to code. In Gauss-Sidel we use the updated values of the potential on the right hand side of Eq. 32 as soon as they become available. This avoids the need to store the values of the potential at both the n and the n + 1 iterations. So, how would we go about following this procedure computationally? First define the grid as a 2D array (double v[N][N]) then set the boundary elements of the array accordingly. Next, loop over iterations, n, until some stopping criteria is reached (you must decide when the solution is ’good enough’). For each iteration, loop over all the i and j elements of the 2D array updating v[i][j] according to Eq. 31. In physical problems, there are two common types of boundary condition. In the first type, or Dirichlet boundary condition, the values of the solution vi,j are fixed at the boundaries. This is the type of boundary condition we will deal with here. Alternatively, when Neumann boundary conditions are imposed, the values of the derivative of the solution are fixed at each boundary. L L v=0 2 L v=2 v=0 v=0 1.5 1 0.5 v=sin(πx/L) 0 0 0 L 0 0 L 0 0 L Figure 5: Solution to Laplace equation in 2D for an example system. Left: arrangement of gates. Centre: Grey-scale map of the calculated potential v(x, y) on a 100×100 grid. Right: equipotential lines. 20 8.0.4 Task: write a Gauss-Sidel code to solve the Laplace equation in 2D with Dirichlet boundary conditions • Think about the stopping criterion and how to decide when your solution is acceptable. 8.0.5 Task: calculate the potential in a 2D gated system • Use the following boundary conditions: v(0, y) = 0, v(L, y) = 0, v(x, L) = 0, v(x, 0) = 2 sin(πx/L): the system is grounded on 3 edges and the potential on the bottom gate is set to vary sinusoidally. Calculate the potential on a few different N × N grids (eq. 10 × 10, 50 × 50, and 100 × 100). Demonstrate that the number of iterations needed to converge the solution varies as N 2 . (This slow convergence is one of the reasons that Gauss-Sidel is no longer used for realistic problems, although a minor change - over relaxation with Chebyshev acceleration - makes the Gauss-Sidel scheme competitive for small scale calculations). Examine your converged 2D solutions, v(x, y), with gnuplot. • Solve the Laplace equation analytically for the above case. Compare your numerical and analytical solutions. How would you modify your code to improve the agreement? • Investigate a more complex geometry of gates. Produce some figures to show the key features of the potential. An example with a square terminal of fixed potential in the middle of the system is shown in Fig. 5. A Plotting data with gnuplot Your code will be used to generate data and you will often need to analyse this. For example, you may wish to calculate, output and plot a graph of some function φ(x) as a function of x. There are many ways to accomplish this (for example by using R or IDL) but one relatively easy way is to use a plotting package called gnuplot. This is widely used free software that is available for both Windows and Linux. A.1 Simple plots You will need to format the data you wish to plot as an ascii or text file containing columns of numbers separated by tabs (\t) or by spaces. Once 21 in ascii format your data can be plotted by a whole range of packages, for example you could import your data into Microsoft Origin or even Excel on CFS (although Excel plots look very poor in publications). On the University’s spectre system gnuplot will do the job of plotting your data. gnuplot can be used to generate sophisticated publication quality graphs, but its main advantage is that it makes it extremely easy to produce quick and simple plots which will help you to analyse data as you go along. To run gnuplot, open a terminal window, change directory to the place where your data files are stored, then simply type ’gnuplot’ at the command line. This will give you the gnuplot command line prompt (gnuplot> ). Now, imagine that your ascii data file (data.txt) contains three columns, first x, then f (x), then φ(x). To plot φ(x) against x you simply need to type gnuplot> plot ’data.txt’ using 1:3 with line This plots the data in file ’data.txt’ using columns 1 and 3 with a line. gnuplot understands some abreviations so, for example ’with line’ can be replaced by ’w l’ and ’using’ by ’us’. To plot with points just use ’w p’ instead of ’w l’ and to plot column 1 against column 2 simply replace ’1:3’ with ’1:2’. To overlay two curves on top of each other the syntax is also simple. For example gnuplot> plot ’data.txt’ us 1:2 w l, ’data.txt’ us 1:3 w l plots two lines on the same graph, f (x) (column 2) against x (column 1) and φ(x) (column 3) against x. gnuplot can also perform operations on the data in the columns, for example gnuplot> plot ’data.txt’ us (\$1*\$1):(\$2/3.0) w l would plot f (x)/3 against x2 . You might also want to change the x range (xra) or y range (yra) of a plot. The syntax is simple, for example, set xra[0:10] will set the x range of the plot from 0 to 10. gnuplot can perform many more sophisticated operations like curve fitting or surface and contour plots. It is fairly intuitive software and it comes with a comprehensive help. To access the help type ’ ?’ at the command prompt or look on-line for one of the many gnuplot tutorials or reference manuals. 22 A.2 Functions gnuplot will generate and plot standard and user defined functions. This can be very useful if you want to compare your data to a function, or if you just wish to quickly look at the behaviour of a given function. The syntax to define a function is simple. For example, gnuplot> f(x)=sin(x) gnuplot> g(x,y)=exp(0.1*(x**2+y**2)) f (x) and g(x, y) can then be plotted in 1D or 2D, gnuplot> plot f(x) gnuplot> splot g(x,y) To change the number of points gnuplot uses to plot a function in 1D you will need to change the ’samples’ variable. For example ’set samples 50’ will tell gnuplot to use 50 points in the x direction. The splot command above is short for ’surface plot’ and is a nice way to represent 2D data. You can use a variety of styles, for example the ’w pm3d’ style gives filled shaded surfaces and the ’set view map’ will give a top down view that is often useful. To change the number of points in the y direction when plotting a function in 2D you will need to set the ’isosamples’ variable. gnuplot> set samples 50 ; set isosamples 50 gnuplot> set view map; set pm3d gnuplot> splot g(x,y) w pm3d A.3 Plotting 2D data As well as plotting surfaces for functions, gnuplot can also be used to visualise a 2D data set. For surface plots, gnuplot expects at least 3 columns. For example, a column containing x, a column containing y, and a third column containing the result, g(x, y), for that particular x and y. The best way to organise the data is in blocks: the first block should contain a specific value of x and all the values of y and g(x, y) at that particular value of x. Next, gnuplot will expect a blank line to separate blocks. Then, the next block should contain the next value of x and all the values of y and g(x, y) in the data set at that particular x. And so on. For example, a 2D data set with g(x, y) = exp(xy), 0 ≤ x < 2 and 0 ≤ y < 3 with just 2 points in x and 3 points in y would look like: 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 23 1.000000e+00 1.000000e+00 0.000000e+00 2.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 0.000000e+00 1.000000e+00 2.000000e+00 1.000000e+00 2.718282e+00 7.389056e+00 When the data is in this format all of the various gnuplot surface options like pm3d can be used. A.4 gnuplot printing By default, gnuplot is set to plot to the ’X11 terminal’, or maybe the ’wxt’ or ’qt’ terminals (all refer to the screen). You can change the terminal to many different types (type ’ ? set term’ to get a complete list). Perhaps the most useful types for generating hard copies are the postscript or png types. Postscript is used by most journals for publication quality plots, but png might also be useful for inserting into PowerPoint. To create a postscript file of a plot of column 1 vs 3 in ’data.txt’, type gnuplot> set term post gnuplot> set out ’plot.ps’ gnuplot> plot ’data.txt’ us 1:3 w l This will write the postscript output to a file called ’plot.ps’. After you type ’set term post’ gnuplot will display a list of the default postscript options. You can change these. For example, type ’set term post color’ to generate colour output rather than the default monochrome. Try ’help set term post’ for information on the different postscript options that are available. On spectre you can view postscript files with a viewer called ’gv’, or with the General Image Manipulation Programme (GIMP). To view the file plot.ps simply type ’gv plot.ps’ at the command line. A.4.1 Conversion to pdf You can easily convert postscript output to other formats. For example, to convert plot.ps to a pdf file simply type ’epstopdf plot.ps’ at the command line. 24 A.4.2 Publication quality plots In gnuplot there are many different options available for enhancing the look and style of your plots. Try googling, or searching the gnuplot help for information on axis labels, the key and plot styles. References [1] Numerical Recipes in C, W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, 2nd. Ed. Cambridge University Press. [2] Quantum Mechanics A. I. M. Rae, 5th. Ed., Taylor & Francis. 25