Stochastic Models
Simulating Markov Chains in discrete time
1. The transition matrix
A general discrete-time Markov chain has transition probabilities which are likely to vary
depending on the current state. It is not too hard to write a macro in Visual Basic to carry out
the transitions, but it is more challenging, and in some ways more transparent, to try to
arrange a simulation mechanism using only the functions which are built into Excel. We shall
use look-up tables to accomplish our goals.
Load up the workbook from the previous session and insert a new worksheet [1]. Give it some
suitable name by double-clicking the tab labelled “Sheet4” (or whatever) and overtyping the
new name. Enter a title for the sheet somewhere on row 1.
We are going to simulate a no-claims discount scheme with 4 states: No discount, small
discount, medium discount and large discount. The transition mechanism is: if there are no
claims during a year, the policyholder moves to the next higher level of discount (unless
already receiving the large discount); if there are one or more claims during a year, the
policyholder moves to the next lower discount level (unless receiving no discount).
We need a parameter: type “Claim frequency” into cell A3, with some value like 0.4 in B3:
this represents the average number of claims per year. The Poisson distribution will be used,
so the probability of no claims, q, is given by the formula =EXP(-B3): enter this formula in
B4, with the label q in A4, and name the cell q (using Insert | Name | Define).
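For reference, the formula in B4 follows from the Poisson probability mass function. The symbols $\lambda$ (the claim frequency in B3) and $N$ (the number of claims in a year) are introduced here only for illustration:

```latex
P(N = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \qquad
q = P(N = 0) = e^{-\lambda}, \qquad
\lambda = 0.4 \;\Rightarrow\; q = e^{-0.4} \approx 0.6703 .
```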
Now enter the transition matrix into a blank area on the worksheet, starting from column E.
The diagram below contains labels in column D: they are not necessary, but serve to remind
us which state is which. The procedure outlined here only works if the states are numbered
with integers starting from 0. Don't forget that every entry must be non-negative and that the
row sums must be 1: it's probably best to make the last column contain 1 - (sum of previous
columns).
Row  D            E  F     G     H     I
3    No discount  0  =1-q  =q    0     =1-SUM(F3:H3)
4    Small disc   1  =1-q  0     =q    =1-SUM(F4:H4)
5    Med disc     2  0     =1-q  0     =1-SUM(F5:H5)
6    Large disc   3  0     0     =1-q  =1-SUM(F6:H6)
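As a cross-check outside Excel, the same matrix can be sketched in Python. This is only an illustration: the function name `transition_matrix` is made up here, and the last entry of each row is written out directly instead of as 1 minus the sum of the others.

```python
import math

def transition_matrix(claim_frequency):
    """Return the 4x4 no-claims-discount transition matrix as a list of rows."""
    q = math.exp(-claim_frequency)  # probability of a claim-free year (cell B4)
    p = 1.0 - q                     # probability of one or more claims
    return [
        [p, q, 0.0, 0.0],    # state 0: no discount
        [p, 0.0, q, 0.0],    # state 1: small discount
        [0.0, p, 0.0, q],    # state 2: medium discount
        [0.0, 0.0, p, q],    # state 3: large discount
    ]

P = transition_matrix(0.4)
row_sums = [sum(row) for row in P]  # each should equal 1
```

The two checks stated in the text (non-negative entries, rows summing to 1) can then be applied to `P` and `row_sums`.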
The lookup function needs a cumulative transition matrix: the table shown below will
calculate these in the form required. Use Insert | Name | Define to name the cumulative
matrix (in this example, F8:J11) CumuMatrix, and the vector of states (F12:J12) StateVector.
Row  D            E  F    G    H        I        J
7
8    No discount  0  0    =F3  =G8+G3   =H8+H3   =I8+I3
9    Small disc   1  0    =F4  =G9+G4   =H9+H4   =I9+I4
10   Med disc     2  0    =F5  =G10+G5  =H10+H5  =I10+I5
11   Large disc   3  0    =F6  =G11+G6  =H11+H6  =I11+I6
12                   =E8  =E9  =E10     =E11     0
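The cumulative layout can be mimicked in a few lines of Python. This is a sketch under the assumption that the claim frequency is 0.4; the helper name `cumulative_rows` is hypothetical. Each row starts at 0 and then holds the running sums of the transition probabilities, so it has one more column than the transition matrix.

```python
import math
from itertools import accumulate

def cumulative_rows(P):
    """Prefix each row's running sums with 0, as in columns F to J."""
    return [[0.0] + list(accumulate(row)) for row in P]

q = math.exp(-0.4)  # assumed claim frequency of 0.4
p = 1.0 - q
P = [
    [p, q, 0.0, 0.0],
    [p, 0.0, q, 0.0],
    [0.0, p, 0.0, q],
    [0.0, 0.0, p, q],
]
cumu_matrix = cumulative_rows(P)
```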
[1] The reason for reloading the workbook from session 1 is that we want to re-use the random
numbers which we generated last time. If you no longer have session 1's workbook, start a new
one and use the formula =RAND() to generate a column of 200 random numbers on one of the
sheets: name this column Random. See the Appendix.
Stochastic Models 200405
Session 2, page 1
2. The LOOKUP function
The LOOKUP function has syntax LOOKUP(lookup_value, lookup_vector, result_vector).
Here the lookup_value is going to be a random number between 0 and 1. We tell Excel which
row of the cumulative transition matrix to consult (the row corresponding to the current state,
i). When it finds the largest entry in that row which does not exceed the lookup_value it notes
how many columns it had to search, then sends back the corresponding entry from the
result_vector: this will be the state j to which the chain jumps next. If this is hard to follow,
type in the formulas below and see if the practical application makes it clearer.
Type column labels n in A6, Xn in B6, then 0 in A7 and some initial state (e.g. 0) in B7. Row
8 should contain =1+A7 in A8 and
=LOOKUP(Random, OFFSET(CumuMatrix,B7,0,1,5), StateVector) in B8.
It is this last entry which is doing all the work. The OFFSET function directs the search to
the appropriate row of the matrix CumuMatrix (B7 contains the current state, which is the
row number, where the top row is always numbered 0); the parameters 1 and 5 refer to the
number of rows (1) and the number of columns (5) which are to be looked at. Once the
function has found the appropriate entry it returns the corresponding value from StateVector.
A8:B8 can be copied and pasted into as many rows as you like to simulate a sequence of
consecutive values of the chain. Obtain a plot of the path of the Markov chain using the Line
option of Chart Wizard. Give the column of observed values of the Markov chain
(B7:Bwhatever) the name Chain.
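The LOOKUP/OFFSET mechanism can be sketched in Python as follows. This is an illustration, not a translation of Excel's internals: `bisect_right` is used here as a stand-in for LOOKUP's "largest entry not exceeding the value" rule, and the names `simulate_chain`, `uniforms` and the seed 2005 are assumptions.

```python
import bisect
import math
import random
from itertools import accumulate

def simulate_chain(P, initial_state, uniforms):
    """Simulate one path, consuming one uniform random number per transition."""
    cum = [[0.0] + list(accumulate(row)) for row in P]  # cumulative matrix
    path = [initial_state]
    for u in uniforms:
        row = cum[path[-1]]                  # like OFFSET: row of the current state
        j = bisect.bisect_right(row, u) - 1  # last entry <= u, like LOOKUP
        path.append(min(j, len(P) - 1))      # guard against rounding in the final sum
    return path

q = math.exp(-0.4)  # assumed claim frequency of 0.4
p = 1.0 - q
P = [
    [p, q, 0.0, 0.0],
    [p, 0.0, q, 0.0],
    [0.0, p, 0.0, q],
    [0.0, 0.0, p, q],
]
rng = random.Random(2005)                       # arbitrary seed
uniforms = [rng.random() for _ in range(200)]   # plays the role of the Random column
chain = simulate_chain(P, 0, uniforms)
```

Where a cumulative row contains repeated values (from zero transition probabilities), taking the last matching position is what sends the chain to the correct state.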
3. Observed average discount
We are going to find the observed proportion of time spent in states 0, 1, 2 and 3. Type 0 in
F14, 1 in G14, 2 in H14, 3 in I14. Then in F15 enter the formula
=COUNTIF(Chain,F14)/COUNT(Chain) to find the observed proportion of time spent
in state 0. Copy the formula into G15:I15.
Now think up some figures for the percentage discount at each discount level, starting from
discount 0 in state 0, and enter them in F16:I16. To calculate the average discount achieved
by the simulated policyholder over the period of the simulation, all we need is the
SUMPRODUCT function, which essentially does a dot product of two vectors. Thus
=SUMPRODUCT(F15:I15,F16:I16) gives the required answer for the observed average
discount.
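The COUNTIF and SUMPRODUCT steps amount to the following small calculation. The path and the discount percentages below are invented purely for illustration, and the function names are hypothetical.

```python
from collections import Counter

def state_proportions(chain, n_states):
    """Proportion of time spent in each state (COUNTIF / COUNT)."""
    counts = Counter(chain)
    return [counts[s] / len(chain) for s in range(n_states)]

def average_discount(proportions, discounts):
    """Dot product of proportions and discounts (SUMPRODUCT)."""
    return sum(w * d for w, d in zip(proportions, discounts))

chain = [0, 1, 2, 1, 2, 3, 3, 2, 3, 3]  # short made-up path
discounts = [0, 10, 20, 30]              # made-up percentage discounts
proportions = state_proportions(chain, 4)
observed = average_discount(proportions, discounts)
```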
4. Array functions
Excel has three built-in matrix functions, which can speed up a matrix-type calculation
immensely but are a little tricky to type in. The functions are MDETERM, MINVERSE and
MMULT. We are going to start off this section by using MMULT to produce successive
calculations of the probability that the chain is in state i at time n starting from state 0.
Theoretically, if $v_n$ is the vector of probabilities given by $v_{n,i} = P(X_n = i)$, then
$v_n$ satisfies the recurrence relation $v_n^T = v_{n-1}^T P$. We are going to need to multiply
a row vector $v^T$ by the transition matrix $P$.
Stochastic Models 200405
Session 2, page 2
Select the whole transition matrix (F3:I6) and give it a name, such as Transition_matrix. Now
type in the initial vector of probabilities. Assuming that the motorist starts with no discount,
the vector will be (1 0 0 0): enter these four values into cells F18:I18.
We have to multiply this by the matrix P to get the probabilities of being in the various states
at the next time point. So
- select the cells F19:I19 to store the results of the calculation,
- type in the formula =MMULT(F18:I18,Transition_matrix) but DO NOT PRESS ENTER,
- press Ctrl-Shift-Enter instead.
All cells of the new array (F19:I19) will be filled with the same formula, shown inside braces
(curly brackets).
The Ctrl-Shift-Enter method is the way all arrays are entered. If you make a mistake in
entering the formula you will have to clear the whole array at once, then reselect the whole
area to have another go. It is not possible to change just one element of an array.
Copy this formula down into several more blank rows by dragging the drag-handle at the
bottom right corner of the selected area. Watch the probabilities converge to their long-run
limits $\pi$.
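The repeated MMULT rows can be imitated in Python to see the convergence numerically. This is a sketch assuming a claim frequency of 0.4; the helper name `step` and the choice of 200 rows are arbitrary.

```python
import math

def step(v, P):
    """One application of v_n^T = v_{n-1}^T P (row vector times matrix)."""
    n = len(P)
    return [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]

q = math.exp(-0.4)  # assumed claim frequency of 0.4
p = 1.0 - q
P = [
    [p, q, 0.0, 0.0],
    [p, 0.0, q, 0.0],
    [0.0, p, 0.0, q],
    [0.0, 0.0, p, q],
]
v = [1.0, 0.0, 0.0, 0.0]  # start with no discount
history = [v]
for _ in range(200):      # each pass corresponds to one more spreadsheet row
    v = step(v, P)
    history.append(v)
```

Each entry of `history` corresponds to one row of the worksheet; the later rows barely change from one to the next.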
5. Matrix inversion
Let us try to automate the procedure used to calculate $\pi$. The first thing to bear in mind is
that we need to solve $\pi^T P = \pi^T$, or in other words $\pi^T (I - P) = 0^T$. To start with,
then, let us get the identity matrix into the worksheet.
Choose a 5 by 5 area such as L3:P7. Enter the names of the states (0, 1, 2, 3 in this case) into
the first column (L3:L6) and the last row (M7:P7). We can fill the matrix using a single array
function: select the matrix area M3:P6, type in the formula =IF(L3:L6=M7:P7,1,0) and
press Ctrl-Shift-Enter. (This function returns a value of 1 if the row is equal to the column, 0
otherwise.)
Below this, in M9:P12, we can now enter the matrix $I - P$: just use a simple subtraction
formula like =M3-F3 and copy it across the whole of M9:P12. (Don't use Ctrl-Shift-Enter
here as we will want to change the last column later on and Excel will not let you change just
part of an array.)
We know, however, that the matrix $I - P$ is not invertible, that there are multiple solutions to
the equation $\pi^T (I - P) = 0^T$, that the last equation in the set is redundant and that the
solution we want has $\sum_i \pi_i = 1$. This means that we need to replace the last column of
the matrix $I - P$ by a column of 1's, resulting in a matrix $M$, say, and then solve the
equation $\pi^T M = (0\ 0\ 0\ 1)$, which is to say $\pi^T = (0\ 0\ 0\ 1)\, M^{-1}$.
So you need to replace the last column of the matrix $I - P$ (P9:P12) by a column of 1's. Now
select the region M14:P17, type in the formula =MINVERSE(M9:P12) and press
Ctrl-Shift-Enter: as if by magic, the inverse of the matrix $M$ appears in the selected area.
You are probably expecting that we will now have to type in (0 0 0 1) and perform a matrix
multiplication to obtain the equilibrium probability vector $\pi$. In fact, though, a little
thought will reveal that $(0\ 0\ 0\ 1)\, M^{-1}$ is simply the bottom row of the matrix $M^{-1}$,
so in fact the equilibrium distribution is now sitting in M17:P17.
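The same construction can be checked in Python. This sketch swaps MINVERSE for a small Gauss-Jordan solver (a substitution made here, not the worksheet's method) and solves the transposed system $M^T \pi = (0\ 0\ 0\ 1)^T$, which is equivalent to $\pi^T M = (0\ 0\ 0\ 1)$. The function names are hypothetical and the claim frequency of 0.4 is assumed.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def equilibrium(P):
    """Equilibrium distribution via I - P with its last column replaced by 1's."""
    n = len(P)
    M = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        M[i][n - 1] = 1.0                      # normalisation: probabilities sum to 1
    Mt = [[M[i][j] for i in range(n)] for j in range(n)]  # transpose of M
    rhs = [0.0] * (n - 1) + [1.0]
    return solve_linear(Mt, rhs)

q = math.exp(-0.4)  # assumed claim frequency of 0.4
p = 1.0 - q
P = [
    [p, q, 0.0, 0.0],
    [p, 0.0, q, 0.0],
    [0.0, p, 0.0, q],
    [0.0, 0.0, p, q],
]
pi = equilibrium(P)
```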
Stochastic Models 200405
Session 2, page 3
6. Other things to do
The expected long-run discount enjoyed by a policyholder whose expected claim frequency is
as specified in B3 is given by =SUMPRODUCT(F16:I16,M17:P17). Store this formula
in P19, say. We are now going to use a Data Table to calculate the expected long-run
discount for various different values of the claim frequency.
Enter values from 0.1 to 1.0 in a column, say from L21 to L30. Then in M20 use the formula
=P19 to get a copy of the expected percentage discount. Now select the whole area
L20:M30, use Data | Table and in the window labelled "Column input cell" type in B3 (or
the address of whichever cell you have used to store your claim frequency parameter). Press
Enter and the whole table fills up with the required values.
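What the Data Table does can be imitated by recomputing the long-run discount for each claim frequency in turn. In this sketch the equilibrium distribution is approximated by repeated multiplication (as in section 4) instead of matrix inversion, and the discount percentages are made up; the function name `long_run_discount` is hypothetical.

```python
import math

def long_run_discount(claim_frequency, discounts, n_steps=500):
    """Approximate expected long-run discount for a given claim frequency."""
    q = math.exp(-claim_frequency)
    p = 1.0 - q
    P = [
        [p, q, 0.0, 0.0],
        [p, 0.0, q, 0.0],
        [0.0, p, 0.0, q],
        [0.0, 0.0, p, q],
    ]
    v = [1.0, 0.0, 0.0, 0.0]
    for _ in range(n_steps):  # iterate v^T P until it has settled down
        v = [sum(v[i] * P[i][j] for i in range(4)) for j in range(4)]
    return sum(w * d for w, d in zip(v, discounts))

discounts = [0, 10, 20, 30]                  # made-up percentage discounts
frequencies = [k / 10 for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0
table = [(f, long_run_discount(f, discounts)) for f in frequencies]
```

Each pair in `table` corresponds to one row of the Data Table: the claim frequency and the resulting expected discount.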
Make a chart out of these values (use an XY scatter plot, with points joined by lines). Now
change the discounts on offer (in F16:I16) and see how the chart changes in response. Label
the axes.
Think how you could use Excel to ensure that your discount scheme makes a profit for your
company both from policyholders who have very few claims per year and from those who
have a large average claim frequency.
Appendix
Problems with using =RAND() for random numbers
The Excel pseudo-random number generator produces acceptable results for our purposes, in
that the sequence of numbers generated passes many of the standard tests for randomness.
The problem with using it in our worksheets is that Excel generates an entire new set of
random numbers any time it calculates anything. So every time you enter a formula in the
worksheet all the random numbers are generated all over again. This plays havoc with any
idea that you might want a repeatable sequence of random numbers.
The only way to get around this is to generate the numbers just once and then fix them; to do
this, replace the formulas by the values (select the list of random numbers, choose Edit |
Copy and then Edit | Paste Special | Values). You are left with a column of numbers,
which does not require constant recalculation; now, however, the drawback is that you can't
just generate a whole new set of random numbers by changing the seed.
All in all, it is easier to write your own random number generator.
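Outside Excel, repeatability is usually obtained by seeding a generator. The sketch below uses Python's library generator in place of a hand-written one (a substitution made here for brevity); the function name `random_column` and the seeds are arbitrary.

```python
import random

def random_column(seed, n=200):
    """Return n uniform random numbers in [0, 1), reproducible for a given seed."""
    rng = random.Random(seed)  # independent generator with its own seed
    return [rng.random() for _ in range(n)]

first = random_column(1)
again = random_column(1)   # same seed: identical sequence
other = random_column(2)   # different seed: a fresh sequence
```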
Stochastic Models 200405
Session 2, page 4
Download