Stochastic Models
Simulating Markov Chains in discrete time

1. The transition matrix

A general discrete-time Markov chain has transition probabilities which are likely to vary depending on the current state. It is not too hard to write a macro in Visual Basic to carry out the transitions, but it is more challenging, and in some ways more transparent, to arrange a simulation mechanism using only the functions which are built into Excel. We shall use look-up tables to accomplish our goals.

Load up the workbook from the previous session and insert a new worksheet (see footnote 1). Give it some suitable name by double-clicking the tab labelled “Sheet4” (or whatever) and overtyping the new name. Enter a title for the sheet somewhere on row 1.

We are going to simulate a no-claims discount scheme with 4 states: no discount, small discount, medium discount and large discount. The transition mechanism is: if there are no claims during a year, the policyholder moves to the next higher level of discount (unless already receiving the large discount); if there are one or more claims during a year, the policyholder moves to the next lower discount level (unless receiving no discount).

We need a parameter: type “Claim frequency” into cell A3, with some value like 0.4 in B3: this represents the average number of claims per year. The Poisson distribution will be used, so the probability of no claims, q, is given by the formula =EXP(-B3): enter this formula in B4, with the label q in A4, and name the cell q (using Insert | Name | Define).

Now enter the transition matrix into a blank area on the worksheet, starting from column E. The layout given below contains labels in column D: they are not necessary, but serve to remind us which state is which. The procedure outlined here only works if the states are numbered with integers starting from 0. Don’t forget that every entry must be non-negative and that the row sums must be 1: it’s probably best to make the last column contain 1 - (sum of previous columns).
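For readers who want to check the worksheet against something outside Excel, the same matrix can be sketched in Python. This is only an illustration under stated assumptions: the function name transition_matrix is our own invention, the claim frequency 0.4 is the example value suggested for B3, and the last column is computed as 1 minus the sum of the others, mirroring the advice above.

```python
import math


def transition_matrix(claim_frequency):
    """Build the 4-state no-claims-discount matrix (hypothetical helper)."""
    q = math.exp(-claim_frequency)  # Poisson probability of no claims in a year
    p = 1.0 - q                     # probability of one or more claims
    # Only the first three columns are written out; the last is 1 - (row sum),
    # in the spirit of the worksheet formula =1-SUM(F3:H3).
    first_three = [
        [p, q, 0.0],    # state 0: claim -> stay at 0, no claim -> 1
        [p, 0.0, q],    # state 1: claim -> 0, no claim -> 2
        [0.0, p, 0.0],  # state 2: claim -> 1, no claim -> 3 (last column)
        [0.0, 0.0, p],  # state 3: claim -> 2, no claim -> stay at 3 (last column)
    ]
    return [row + [1.0 - sum(row)] for row in first_three]


P = transition_matrix(0.4)
for row in P:
    print([round(x, 4) for x in row])
```

Each printed row should sum to 1, which is the same check you should make on the worksheet.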
      D            E    F       G       H       I
3     No discount  0    =1-q    =q      0       =1-SUM(F3:H3)
4     Small disc   1    =1-q    0       =q      =1-SUM(F4:H4)
5     Med disc     2    0       =1-q    0       =1-SUM(F5:H5)
6     Large disc   3    0       0       =1-q    =1-SUM(F6:H6)

The lookup function needs a cumulative transition matrix: the table shown below will calculate the cumulative probabilities in the form required. Use Insert | Name | Define to name the cumulative matrix (in this example, F8:J11) CumuMatrix, and the vector of states (F12:J12) StateVector.

      D            E    F      G      H          I          J
8     No discount  0    0      =F3    =G8+G3     =H8+H3     =I8+I3
9     Small disc   1    0      =F4    =G9+G4     =H9+H4     =I9+I4
10    Med disc     2    0      =F5    =G10+G5    =H10+H5    =I10+I5
11    Large disc   3    0      =F6    =G11+G6    =H11+H6    =I11+I6
12                      =E8    =E9    =E10       =E11       0

The final entry of StateVector (J12) is only a placeholder: column J of the cumulative matrix always holds 1, and a random number below 1 never reaches it.

Footnote 1: The reason for reloading the workbook from session 1 is that we want to re-use the random numbers which we generated last time. If you no longer have session 1's workbook, start a new one and use the formula =RAND() to generate a column of 200 random numbers on one of the sheets: name this column Random. See the Appendix.

Stochastic Models 200405 Session 2, page 1

2. The LOOKUP function

The LOOKUP function has syntax LOOKUP(lookup_value, lookup_vector, result_vector). Here the lookup_value is going to be a random number between 0 and 1. We tell Excel which row of the cumulative transition matrix to consult (the row corresponding to the current state, i). When it finds the largest entry in that row which does not exceed the lookup_value, it notes how many columns it had to search, then sends back the corresponding entry from the result_vector: this will be the state j to which the chain jumps next. If this is hard to follow, type in the formulas below and see if the practical application makes it clearer.

Type column labels n in A6, Xn in B6, then 0 in A7 and some initial state (e.g. 0) in B7. Row 8 should contain =1+A7 in A8 and =LOOKUP(Random, OFFSET(CumuMatrix,B7,0,1,5), StateVector) in B8. It is this last entry which is doing all the work.
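If the mechanics of the lookup are still unclear, it may help to see the same idea written out step by step. The sketch below is an assumption-laden Python imitation, not part of the Excel procedure: the helper names cumulative_rows, lookup and simulate are hypothetical, Python's bisect module stands in for LOOKUP, and the seed 2024 is an arbitrary choice made only so that the run can be repeated.

```python
import bisect
import math
import random


def cumulative_rows(P):
    """Turn each row (p0, p1, ...) into (0, p0, p0+p1, ..., 1), like CumuMatrix."""
    rows = []
    for row in P:
        cum = [0.0]
        for prob in row:
            cum.append(cum[-1] + prob)
        cum[-1] = 1.0  # guard against rounding so the last entry is exactly 1
        rows.append(cum)
    return rows


def lookup(value, lookup_vector, result_vector):
    """Imitate LOOKUP on a sorted vector: use the last entry that is <= value."""
    index = bisect.bisect_right(lookup_vector, value) - 1
    return result_vector[index]


def simulate(P, uniforms, start=0):
    """Return the path X_0, X_1, ... driven by uniform numbers in [0, 1)."""
    cumu = cumulative_rows(P)
    state_vector = list(range(len(P))) + [0]  # trailing 0 is a placeholder
    path = [start]
    for u in uniforms:
        current_row = cumu[path[-1]]  # the row picked out by the current state
        path.append(lookup(u, current_row, state_vector))
    return path


q = math.exp(-0.4)  # probability of no claims, as in cell B4
p = 1.0 - q
P = [
    [p, q, 0.0, 0.0],
    [p, 0.0, q, 0.0],
    [0.0, p, 0.0, q],
    [0.0, 0.0, p, q],
]
rng = random.Random(2024)  # hypothetical fixed seed, so the run is repeatable
uniforms = [rng.random() for _ in range(200)]
path = simulate(P, uniforms)
print(path[:20])
```

Selecting the row of cumu indexed by the current state is the job that OFFSET does in the worksheet formula, and the list uniforms plays the part of the named column Random.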
The OFFSET function directs the search to the appropriate row of the matrix CumuMatrix (B7 contains the current state, which is the row number, where the top row is always numbered 0); the parameters 1 and 5 refer to the number of rows (1) and the number of columns (5) which are to be looked at. Once LOOKUP has found the appropriate entry it returns the corresponding value from StateVector.

A8:B8 can be copied and pasted into as many rows as you like to simulate a sequence of consecutive values of the chain. Obtain a plot of the path of the Markov chain using the Line option of Chart Wizard. Give the column of observed values of the Markov chain (B7:Bwhatever) the name Chain.

3. Observed average discount

We are going to find the observed proportion of time spent in states 0, 1, 2 and 3. Type 0 in F14, 1 in G14, 2 in H14, 3 in I14. Then in F15 enter the formula =COUNTIF(Chain,F14)/COUNT(Chain) to find the observed proportion of time spent in state 0. Copy the formula into G15:I15.

Now think up some figures for the percentage discount at each discount level, starting from discount 0 in state 0, and enter them in F16:I16. To calculate the average discount achieved by the simulated policyholder over the period of the simulation, all we need is the SUMPRODUCT function, which essentially does a dot product of two vectors. Thus =SUMPRODUCT(F15:I15,F16:I16) gives the required answer for the observed average discount.

4. Array functions

Excel has three built-in matrix functions, which can speed up a matrix-type calculation immensely but are a little tricky to type in. The functions are MDETERM, MINVERSE and MMULT. We are going to start off this section by using MMULT to produce successive calculations of the probability that the chain is in state i at time n starting from state 0. Theoretically, if $v_n$ is the vector of probabilities given by $v_{n,i} = P(X_n = i)$, then $v_n$ satisfies the recurrence relation $v_n^T = v_{n-1}^T P$.
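To see what the recurrence does numerically, here is a minimal Python sketch. It assumes the example claim frequency 0.4 and a start in state 0; the helper name row_times_matrix is hypothetical and simply carries out the row-vector-times-matrix product by hand.

```python
import math


def row_times_matrix(v, P):
    """Multiply a row vector by a matrix: (v^T P)_j = sum_i v_i P_ij."""
    return [sum(v[i] * P[i][j] for i in range(len(v))) for j in range(len(P[0]))]


q = math.exp(-0.4)  # probability of no claims for claim frequency 0.4
p = 1.0 - q
P = [
    [p, q, 0.0, 0.0],
    [p, 0.0, q, 0.0],
    [0.0, p, 0.0, q],
    [0.0, 0.0, p, q],
]

v = [1.0, 0.0, 0.0, 0.0]  # start in state 0 with certainty
history = [v]
for n in range(1, 51):
    v = row_times_matrix(v, P)  # one application of the recurrence
    history.append(v)

for n in (0, 1, 2, 5, 50):
    print(n, [round(x, 4) for x in history[n]])
```

Each entry of history is a probability vector, so its components should sum to 1 at every step, and the later rows should change very little from one step to the next.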
We are going to need to multiply a row vector $v^T$ by the transition matrix $P$.

Select the whole transition matrix (F3:I6) and give it a name, such as Transition_matrix. Now type in the initial vector of probabilities. Assuming that the motorist starts with no discount, the vector will be (1 0 0 0): enter these four values into cells F18:I18. We have to multiply this by the matrix $P$ to get the probabilities of being in the various states at the next time point. So select the cells F19:I19 to store the results of the calculation, type in the formula =MMULT(F18:I18,Transition_matrix) but DO NOT PRESS ENTER: press Ctrl-Shift-Enter instead. All cells of the new array (F19:I19) will be filled with the same formula, shown inside braces (curly brackets).

The Ctrl-Shift-Enter method is the way all arrays are entered. If you make a mistake in entering the formula you will have to clear the whole array at once, then reselect the whole area to have another go. It is not possible to change just one element of an array.

Copy this formula down into several more blank rows by dragging the drag-handle at the bottom right corner of the selected area. Watch the probabilities converge to their long-run limits $\pi$.

5. Matrix inversion

Let us try to automate the procedure used to calculate $\pi$. The first thing to bear in mind is that we need to solve $\pi^T P = \pi^T$, or in other words $\pi^T (I - P) = 0^T$. To start with, then, let us get the identity matrix into the worksheet. Choose a 5 by 5 area such as L3:P7. Enter the names of the states (0, 1, 2, 3 in this case) into the first column (L3:L6) and the last row (M7:P7). We can fill the matrix using a single array function: select the matrix area M3:P6, type in the formula =IF(L3:L6=M7:P7,1,0) and press Ctrl-Shift-Enter. (This function returns a value of 1 if the row label is equal to the column label, 0 otherwise.)
Below this, in M9:P12, we can now enter the matrix $I - P$: just use a simple subtraction formula like =M3-F3 and copy it across the whole of M9:P12. (Don't use Ctrl-Shift-Enter here, as we will want to change the last column later on and Excel will not let you change just part of an array.)

We know, however, that the matrix $I - P$ is not invertible, that there are multiple solutions to the equation $\pi^T (I - P) = 0^T$, that the last equation in the set is redundant and that the solution we want has $\sum_i \pi_i = 1$. This means that we need to replace the last column of the matrix $I - P$ by a column of 1's, resulting in a matrix $M$, say, and then solve the equation $\pi^T M = (0\ 0\ 0\ 1)$, which is to say $\pi^T = (0\ 0\ 0\ 1) M^{-1}$. So overwrite the last column of the matrix $I - P$ (P9:P12) with a column of 1's.

Now select the region M14:P17, type in the formula =MINVERSE(M9:P12) and press Ctrl-Shift-Enter: as if by magic, the inverse of the matrix $M$ appears in the selected area. You are probably expecting that we will now have to type in (0 0 0 1) and perform a matrix multiplication to obtain the equilibrium probability vector $\pi$. In fact, though, a little thought will reveal that $(0\ 0\ 0\ 1) M^{-1}$ is simply the bottom row of the matrix $M^{-1}$, so the equilibrium distribution is now sitting in M17:P17.

6. Other things to do

The expected long-run discount enjoyed by a policyholder whose expected claim frequency is as specified in B3 is given by =SUMPRODUCT(F16:I16,M17:P17). Store this formula in P19, say.

We are now going to use a Data Table to calculate the expected long-run discount for various different values of the claim frequency. Enter values from 0.1 to 1.0 in a column, say from L21 to L30. Then in M20 use the formula =P19 to get a copy of the expected percentage discount. Now select the whole area L20:M30, use Data | Table and in the window labelled "Column input cell" type in B3 (or the address of whichever cell you have used to store your claim frequency parameter).
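As an independent check on the values the Data Table should produce, the calculation of sections 5 and 6 can be sketched in Python. Everything here is illustrative: the function names are hypothetical, the discount percentages (0, 10, 25, 40) are made-up figures of the kind you were asked to think up for F16:I16, and a small hand-written Gauss-Jordan routine stands in for MINVERSE.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def transition_matrix(claim_frequency):
    """The 4-state no-claims-discount matrix for a given claim frequency."""
    q = math.exp(-claim_frequency)
    p = 1.0 - q
    return [[p, q, 0.0, 0.0], [p, 0.0, q, 0.0], [0.0, p, 0.0, q], [0.0, 0.0, p, q]]


def equilibrium(P):
    """Form I - P, overwrite its last column with 1's, return the bottom row of the inverse."""
    n = len(P)
    M = [[float(i == j) - P[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        M[i][n - 1] = 1.0
    return invert(M)[n - 1]


def expected_discount(claim_frequency, discounts):
    """Dot product of the discounts with the equilibrium distribution."""
    pi = equilibrium(transition_matrix(claim_frequency))
    return sum(d * w for d, w in zip(discounts, pi))


discounts = [0.0, 10.0, 25.0, 40.0]  # hypothetical percentages for states 0 to 3
frequencies = [k / 10 for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0
table = [(f, expected_discount(f, discounts)) for f in frequencies]
for f, value in table:
    print(f"{f:.1f}  {value:.2f}")
```

With discounts that increase with the state number, the expected discount should fall as the claim frequency rises; if your Data Table shows the opposite, check the input cell.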
Press Enter and the whole table fills up with the required values. Make a chart out of these values (use an XY scatter plot, with points joined by lines) and label the axes. Now change the discounts on offer (in F16:I16) and see how the chart changes in response.

Think how you could use Excel to ensure that your discount scheme makes a profit for your company both from policyholders who have very few claims per year and from those who have a large average claim frequency.

Appendix
Problems with using =RAND() for random numbers

The Excel pseudo-random number generator produces acceptable results for our purposes, in that the sequence of numbers generated passes many of the standard tests for randomness. The problem with using it in our worksheets is that Excel generates an entire new set of random numbers any time it calculates anything. So every time you enter a formula in the worksheet all the random numbers are generated all over again. This plays havoc with any idea that you might want a repeatable sequence of random numbers.

The only way to get around this is to generate the numbers just once and then fix them; to do this, replace the formulas by the values (select the list of random numbers, choose Edit | Copy and then Edit | Paste Special | Values). You are left with a column of numbers, which does not require constant recalculation; now, however, the drawback is that you can't just generate a whole new set of random numbers by changing the seed. All in all, it is easier to write your own random number generator.
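To give an idea of what writing your own generator might involve, here is a minimal sketch of a seeded linear congruential generator in Python. It is an assumption on our part, not a method prescribed in these notes: the class name is hypothetical, and the multiplier and increment are the well-known "quick and dirty" constants quoted in Numerical Recipes. It is good enough to illustrate repeatability, but not a generator to rely on for serious work.

```python
class LinearCongruential:
    """Minimal seeded generator: the same seed always gives the same sequence."""

    MODULUS = 2 ** 32
    MULTIPLIER = 1664525    # illustrative constants, not taken from these notes
    INCREMENT = 1013904223

    def __init__(self, seed):
        self.state = seed % self.MODULUS

    def next_uniform(self):
        """Advance the state and return a number in [0, 1)."""
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state / self.MODULUS


def uniform_column(seed, count=200):
    """Generate a repeatable column of uniforms, like the named range Random."""
    generator = LinearCongruential(seed)
    return [generator.next_uniform() for _ in range(count)]


first_run = uniform_column(seed=12345)
second_run = uniform_column(seed=12345)  # same seed: identical numbers
other_run = uniform_column(seed=54321)   # new seed: a fresh set of numbers
print(first_run[:3])
print(first_run == second_run, first_run == other_run)
```

Changing the seed gives a whole new set of numbers, and keeping it fixed gives the repeatable sequence that =RAND() cannot provide.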