Management Science (G12 2001/02) Part I (Michaelmas): Uncertainty Part II (Lent): Choice Today’s plan What is Management Science and what’s the purpose of this course? • Spreadsheets as a modelling platform • Introduction to Monte Carlo Simulation • How can a computer roll a die? Management Science • … uses computer technology to help managers make good decisions and optimise organisational processes • Operations Research and Decision Science are other names • Professional bodies – International: Institute for Operations Research and the Management Sciences (INFORMS) (www.informs.org) – UK: Operational Research Society (www.orsoc.org.uk) Core issues of Management Science • Problem areas – decision making – Organisational process design and control • Core issues – complexity – uncertainty – choice (optimality) • Methodology: – Modelling (graphical, computer, or mathematical models) Some important side issues • Models as communication devices – Presentation of quantitative information • Quantitative (“hard”) versus qualitative (“soft”) modelling – The “human factor” • Management information systems – Availability and reliability of data Aim of the course • Make you aware of the potential (and limitations) of the Management Science approach • Provide you with some hands-on experience in modelling (coursework) • Discuss theoretical underpinning of some important modelling templates (exam) Michaelmas term: Uncertainty • Computer models of uncertainty – Monte Carlo simulation • Mathematical models of uncertainty – Probability theory and stochastic processes • Modelling templates – Queuing systems – Markov chains – Others as time permits • Analysing dependencies between uncertainties – Linear regression • Forecasting the future Supervisions and coursework • Supervision 1: Simulation in spreadsheets • Supervision 2: Stochastic processes • Supervision 3: Regression and forecasting • Coursework: Analysis of a business case using spreadsheet simulations (more later…) Today’s plan • What is Management Science and what’s the purpose of this course? Spreadsheets as a modelling platform • Introduction to Monte Carlo Simulation • How can a computer roll a die? Why spreadsheets? • A spreadsheet is a tool that allows you to – – – – store and present quantitative information process quantitative information perform what-if analyses do much more… • Spreadsheets have many disadvantages – – – – Limited data structure (2-dimensional array) Difficult to validate and document Inflexible Unreliable numerical routines The Cons And the Pros Background Information • I assume that you are familiar with basic spreadsheet programming • If not, go through a free tutorial on the web, e.g. – http://www.compusmart.ab.ca/alummis/excel/exceltutorial.h tml – http://www.usd.edu/trio/tut/excel/ – http://www.jcu.edu/infoservice/training/excel/start.htm • More advanced material can be found in B.V. Liengme, A guide to Microsoft Excel for scientists and engineers (CUED Lib.) A tip for spreadsheet modelling • Clearly separate – – – – Data (input to the model that is not under your control) Design parameters (input that is under your control) The actual model (logical description of the model) Model output ( basis for decision, often includes graphical elements) • For larger models use separate worksheets • Ideally, no cell in the logical model section contains a number – These cells only contain formulas and references to other cells, e.g. in the data or parameter section The five stages of computer modelling (Donald Knuth) 1. Decide what you want the model to do 2. Decide how to build the model 3. Build the model 4. Debug the model 5. Trash stages 1 to 4 and start again, now that you know what you really wanted in the first place Don’t get frustrated: A modelling process is a learning process The main benefit of building a (computer) model to analyse a problem is not the quantitative information obtained as output of the model but the enhanced understanding of the problem gained during the modelling process Today’s plan • What is Management Science and what’s the purpose of this course? • Spreadsheets as a modelling platform Introduction to Monte Carlo Simulation • How can a computer roll a die? Example: A product launch • Main criterion: profitability Profit = sales*(unit price- unit costs)-fixed costs • Suppose fixed costs known • Price is a decision variable and influences sales – Pricing decision depends, among other things, on the level of competition and the reaction of the competitors to the launch (uncertain) • Unit cost is uncertain and depends on prices for raw materials, energy, etc. • Let’s look at a spreadsheet model… The Flaw of the Average • Plugging average values into uncertain cells can lead you astray • The resulting bottom line (e.g. profit) is often not the average profit • Mathematical Reason: E(f(X))=f(E(X)) for a random variable X holds ONLY if f is linear What do we want to achieve with a simulation model? • A single number, e.g. an average, gives very little information if model input is uncertain • Manual what-if analysis is cumbersome and biased • We want to estimate the distribution of output cells – Give a graphical representation of this distribution • cumulative distribution function • histogram Example: Value at risk • Given a distribution function for profit, we can read off the loss x such that there is an α% chance that the loss is at least £ x • That number is called the α% value at risk (α% VAR) Profit Distribution 100.0% 90.0% 80.0% 70.0% 60.0% 50.0% 40.0% 30.0% 20.0% 10.0% 0.0% -£1,500,000 -£1,000,000 -£500,000 £0 £500,000 £1,000,000 £1,500,000 £2,000,000 £2,500,000 10% VAR is roughly £500,000 5% VAR is roughly £800,000 Main steps of a simulation project • Understand the problem • Programme and validate a deterministic model • Determine the distribution of the uncertain inputs – Collect relevant data • Run the simulation experiments • Analyse the model output • Communicate the model and its output Building the model… • Essential questions before you model: – What are the questions that the model is to address? – What are the interesting outputs / performance measures? – What is the appropriate level of detail? • Discuss these questions with all stakeholders in the decision situation / process • Validate the logic of your model before you enter uncertainty Incorporating uncertainty • Estimate a probability distribution on the basis of “hard” data whenever possible – Re-sampling from historic data (see product launch example) is a simple and valid way of generating numbers for uncertain cells • Sometimes you need subjective probabilities – Get estimates from many independent experts (Delphi method) – Check sensitivity of outputs w.r.t. changing probabilities • A triangular distribution is often a good starting point – Defined by lowest, highest and most likely value Running a simulation • Important Rule: do more replications than you expect necessary – at least several hundred • Check that running averages of outputs have settled in a steady state – E.g. record after each replication the average profit over all past replications • Do several runs and compare the results Analysing the output • Use visual aids (histograms, distribution functions, scatter diagrams, etc.) • Use statistics (confidence intervals, hypothesis tests) • What are the implications of the results – for your model world? – for the real world? Communicating the model and the results • IMPORTANT: Communicate regularly with all stakeholders in the decision situation or process you are modelling – Build credibility for your model – Model becomes a “language” – Forces you to be as simple as possible – Forces you to be as relevant as possible A word about simulation platforms • Spreadsheets are useful for the simulation of many day-to-day decision problems • They are NOT suitable for the simulation of complex processes, not least because they are difficult to validate • Professional platforms are available that facilitate the programming and validation of complex models, e.g. through graphical interfaces • Back to spreadsheets… Preparing a spreadsheet model for simulation • Write a model as if all inputs (data) were certain • Mark clearly all uncertain input cells (colour them) • Feed the input cells with appropriate randomly chosen numbers – The F9 key (recalculation) now produces one scenario after the other • Set up a worksheet for replications of your model, using the data table command – More specifics can be found in Ragsdale, chapter 12.4-12.8 Feeding uncertain cells • Suppose cell x is known to be uniformly distributed on the interval [0,1] – Put “=rand()” into cell x – Pressing F9 is equivalent to sampling from a uniform distribution and putting the number into cell x • randbetween(a,b) samples integers between a and b, including the integers a and b, with equal probability 1/(b-a+1) • Analysis Tool Pack needs to be loaded for this to work (Tools –> add-ins) More general distributions • Will see later how to generate more general distributions, using the rand() function (inverse transform method) • Example: norminv(rand(),a,b) samples from a normal distribution with mean a and standard deviation b • Alternative random variable generators are provided with the add-ins in the books by Ragsdale and Savage Today’s plan • What is Management Science and what’s the purpose of this course? • Spreadsheets as a modelling platform • Introduction to Monte Carlo Simulation How can a computer roll a die? Issues to be addressed • How can a computer roll a die? • How can we use past data? • What if random cells are statistically dependent (e.g. annual demands over the next five years, stock prices of BMW and Daimler-Chrysler) Random Number Generation: Can a Computer Roll a Die? • Computers can only perform arithmetic operations which by their very nature give deterministic and not random results. • There is no such thing as a true random number generator on a digital computer. Random numbers generated by a computer are therefore sometimes called PSEUDORANDOM NUMBERS. John von Neumann (1951) Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin. For...there is no such thing as a random number - there are only methods to produce random numbers, and a strict arithmetic procedure of course is not such a method.... We are dealing here with mere ‘‘cooking recipes’’ for making digits... BUT… John von Neumann goes on by saying that these recipes ...probably...can not be justified, but should merely be judged by their results. Some statistical study of the digits generated by a given recipe should be made, but exhaustive tests are impractical. If the digits work well on one problem, they seem usually to be successful with others of the same type. What can we hope for? • An arithmetic method (‘recipe’) that generates a sequence of numbers which appear as if they were randomly chosen in the sense that they pass certain statistical tests • Best understood and widely used are linear congruential RNGs possibly enhanced by a “shuffling technique” Linear Congruential Methods - Choose initial num ber r0 (Seed) - Recursion form ula: ri ari 1 c (mod m) In words: divide ari 1 c by m and let ri be the rem ainderof this division - Need four integer parametersfor the method: m, a, c, r0 modulus m 0, mutiplier 0 < a < m, increment0 c < m and seed 0 < r0 < m Random number streams • A linear congruential method produces a sequence of numbers r0 ,r1 ,r2 ,... • All numbers ri lie between 0 and m-1. • The conversion formula ui=(ri+0.5)/m gives a sequence of numbers u1 , u2,... which lie between 0 and 1. • We call the finite sequence u0 ,u1 ,u2 ,..., un a random number stream generated by a linear congruential RNG Good Versus Bad Random Number Streams • A random number stream u1,u2,u3,...,un should resemble a sequence of n independent samples from a uniform distribution on the interval [0,1]. • Whether a linear congruential RNG has this property depends a lot on the choice of the parameters a,c,m. Example • What is the cycle length of the linear congruential generator with modulus 15, multiplier 4 and increment 0? • Try out various seeds until all numbers between 1 and 14 have shown up: • 1,4,1… • 2,8,2… • 3,12,3… • 5,5… • etc. • Why is this a bad RNG? Cycle Length • Want the sequence of random numbers to successively fill the whole interval [0,1] without leaving large gaps. • Since all numbers ri lie between 0 and m-1, the sequence r0 ,r1 ,r2 ,.... will repeat after at most m -1 iterations – It may, however, start repeating much earlier • The cycle length (or period) of a linear congruential RNG is the minimal length n of a sequence r0 ,r1 ,...,rn with rn = r0 • The RNG is said to have maximal cycle length if its cycle length is m-1 Built-in RNGs • Be suspicious of built-in RNGs on your computer – You can assume that it has maximal cycle length but that does not guarantee good statistical properties • If possible, find out which generator is used and whether it has been tested in the literature • If this information is not available (quite likely), you should at least perform some statistical tests before you use it for simulations Some Popular RNGs • Most RNGs are purely multiplicative(c=0) – m=231-1, a=75, c=0 (Learmouth and Lewis 1973) – m=231-1, a=630,360,016, c=0 (Payne et al. 1969) • A comparison of various multipliers for the modulus m=231-1 has been done in a series of papers by Fishman and Moore (1981,1982,1986). – They found that the statistical performance of the Payne et al. RNG is better than that of the Learmouth Lewis RNG Statistical Testing of RNGs Generate a random number stream u1,...,un and use statistical tests to see how closely the stream resembles a sample of size n drawn from a uniform distribution on the interval [0,1]. Setting up the test • We want to check how good a sequence produced by an RNG ‘fits’ the uniform distribution • Divide the interval [0,1] into k subintervals of equal size. (Typically 100<k<n/5) • Determine the number fi of values in the random number stream that fall in the i-th subinterval What do we expect? • If the ui’s are drawn from a uniform distribution over [0,1] then we expect that fi is approximately n/k • Chi-square Statistic k k n 2 ( fi ) n i 1 k 2 • Mathematical result: If the frequencies fi are obtained from a uniform distribution (and n>5k) then the distribution of the random variable is close to a chi-square distribution with k-1 degrees of freedom The goodness of fit test • Suppose RNG is uniform (hypothesis) and let x be the observed value of the test statistic • The hypothesis is (statistically) inconsistent with the observation if it is unlikely that the test statistic assumes a value as large as x – Reject hypothesis if P( x) is small, e.g. below 5% • P( x) is called the p-value of the test – Reject if p-value is below the significance level (5%) Higher dimensional goodness of fit tests • d-dimensional vectors (u1 ,...,ud ), (ud+1 ,...,u2d ), ... should be uniformly distributed in the d-dimensional cube [0,1]d • A division of [0,1] into k subintervals of equal size gives a division of the d-cube [0,1]d into dk subcubes of equal volume • Generate n d-vectors U1 ,...,Un (each requiring the generation of d random numbers) and let fi1...id be the number of vectors having their j th component in the ij th subinterval • Chi-squared test can be appropriately modified Shuffling • One can enhance the performance of an RNG by using another RNG to shuffle a set of random numbers generated by the first one. • Start with an array U=(u1 ,...,un ) of n random numbers generated by RNG 1. Then repeat the following procedure: – Use RNG 2 to generate a random integer M between 1 and n. – Select uM as your next random number and replace uM in the array U by a new random number generated by RNG 1 Further Tests • Want consecutive draws to be independent • Test of independence can be done – by estimating lag correlation – by examining the random number stream for unbroken sub-sequences of maximal length within which the random numbers increase monotonically (run-ups or run-downs). Standard reference: D.E. Knuth “The Art of Computer Programming”, Volume 2. Serial correlation • Main disadvantage of congruential RNGs is that they are not free of serial correlation – Example: Steel manufacturer simulates sales • Main uncertainty drivers aare time between orders and order quantity • Uncertain quantities are generated in pairs (time to next order, order quantity) during the simulation • There should be no correlation between the two variables • More sophisticated RNGs avoid this but they are computationally more expensive – There is generally a trade-off between statistical performance and computing time Where are we now? • You know that RNGs do not really generate random numbers but rather pseudo-random numbers • A lot of software has a built-in uniform RNG – You have seen a popular “recipe” for uniform RNGs – You have seen how uniform RNGs can be tested statistically • How can we generate other random variables with a uniform RNG? The Inverse Transform Method (discrete RVs) • Given a stream of uniform random numbers ui on [0,1], generate a stream of realizations xi of a discrete RV X with probability mass function pi=P(X=i) for i=0,1,2,... • If U is a uniform RV then P(p1+...+ps-1 U p1+...+ps-1+ps) = ps • Hence, given ui we set xi=i if p1+...+ps-1 ui p1+...+ps-1+ps The Inverse Transform Method (continuous RVs) • Task: Given a stream of uniform random numbers ui,…,un on [0,1], generate a sample x1,…, xn from a continuous distribution with cumulative distribution function F • Assumption for the method: F increases strictly on the range of values x with 0<F(x)<1. – Then F(x)=z has a unique solution x=F-1(z) for every z with 0<z<1 – and F-1(z) is strictly increasing as well The Inverse Transform Method (continuous RVs, cont.) • If U is uniform RV on [0,1] then F(x) = P(U F(x)) = P(F-1 (U) F-1(F(x))) = P(F-1 (U) x) • Hence the RV F-1 (U) has the same distribution as X • Calculate xi from ui by setting xi = F-1 (ui ), i.e. by solving F(xi )= ui – Example Norminv(rand(),a,b) – Applies to exponential RV with mean m (F(x)=1-e-x/m) What if you don’t know the distribution? • Many distributions (normal, exponential, etc.) are defined by a small set of parameters (mean, variance, etc.) – If you know the parameters you know the distribution • Sometimes you don’t know the specific type of distribution but are instead faced with a set of historical company data for the uncertain cell – e.g. order quantities and times between orders for a particular product Resampling • In this case, you can create a historical distribution function and sample from this • There an easy way of doing this – Suppose you have a list of n data values (in a spreadsheet) – Choose a number x between 1 and n at random (using randbetween(.,.) function) – Select the x-th data in your list and plug it into the cell (using the vlookup or hlookup functions) Statistical dependence • If uncertain cells are statistically dependent (e.g. temperature and ice cream sales) then sampling independently from a list of temperatures for the temp-cell and from a list of ice cream sales for the sales cell gives a wrong result – try this with the product launch model with price and sales • Instead, set up a list of associated (temperature,sales) pairs and then resample from the list of pairs and assign the temperature and sales from the same pair to the respective cells What you have learned? • Management Sciene is about using computers to help managers make good decisions and optimise their operational processes • Spreadsheets – are used and accepted by managers everywhere in the world – are good fun to play with – have considerable drawbacks and should not be used for complex models • Monte Carlo Simulation is a good way of analysing uncertainty • You know how a computer samples from a distribution – You can perform a simple test for a built-in RNG – You can use uniform RNGs to generate samples from more general distributions Homework • Test the Excel rand() function using a 1dimensional goodness of fit test with k=100 and n=500 – use chidist function (function wizard fx -> statistical) to compute the tail probability of the chi-square distribution • Consider the linear congruential RNG with m=64,a=21, c=1. – Use Excel to show that it has maximal cycle length. – Plot the overlapping pairs (u1,u2),(u2,u3),(u3,u4),.... Is the emerging pattern a pattern that you would expect to be generated by a good RNG? Could this RNG cause problems in a simulation? Homework • Generate a binomial distribution with parameters 10 (# trials) and 0.2 (winning probability) in a spreadsheet and generate a histogram and a cumulative distribution function by running a simulation • Set up the Excel spreadsheet for the product launch and look at the differences in the simulation results if you sample prices and sales independently rather than dependently