CIS 2033 based on Dekking et al. A Modern Introduction to Probability and Statistics. 2007 Michael Baron. Probability and Statistics for Computer Scientists, CRC 2006 Instructor Longin Jan Latecki Ch. 6 Simulations What is a simulation? One uses a model to create specific situations in order to study the response of the model to them and then interprets this in terms of what would happen to the system “in the real world”. Models for such systems involve random variables, and we speak of probabilistic or stochastic models, and simulating them is stochastic simulation. Stochastic simulation of a system means generating values for all the random variables in the model, according to their specified distributions, and recording and analyzing what happens We refer to the generated values as realizations of the random variables Quick exercise 6.1 Describe how you can simulate a coin toss when instead of a coin you have a die. Any ideas on how to simulate a roll of a die if you only have a coin? Quick exercise 6.1 Describe how you can simulate a coin toss when instead of a coin you have a die. Any ideas on how to simulate a roll of a die if you only have a coin? Repeat the coin tosses if you get TTH or TTT. Quick exercise 6.2 A random variable Y has outcomes 1, 3, and 4 with the following probabilities: P(Y = 1) = 3/5, P(Y = 3) = 1/5, and P(Y = 4) = 1/5. Describe how to construct Y from a U(0, 1) random variable. 6.2 Generating realizations: Bernoulli Simulations are almost always done by computer, which usually have one or more socalled (pseudo) random number generators. Which mimics U(0,1). Bernoulli random Variables Suppose U has a U(0,1) distribution. To construct a Ber(p) random variable for some 0 < p < 1. so that A Matlab code: U = rand; X = (U<p) Binomial Binomial Random Variables We can obtain a binomial RV as a sum of n independent Bernoulli RVs. For this we start with n uniform RVs, for example: A Matlab code: n = 20; p = 0.68; U = rand(n,1); X = sum(U<p) Geometric Geometric Random Variables A while loop of Bernoulli trials generates a geometric RV. We run the loop until the first success occurs. Variable X in the while loop counts the number of failures. A Matlab code: X=1; while rand > p X = X + 1; end; X %at least one trail %continue while there are failures %stop at the first success Arbitrary discrete distribution Arbitrary Discrete Random Variable X that takes values x0, x1, … with probabilities pi = P(X=xi) such that p0+p1+…=1. Algorithm 1. Divide interval [0, 1] into subintervals of lengths pi A0 = [0, p0) A1 = [p0, p0+p1) A2 = [p0+p1, p0+p1+p2), … 2. Obtain U from Uniform(0, 1), the standard uniform distribution 3. Set X = xi if U belongs to interval Ai A0 0 X has the desired distribution, since P( X xi ) P(U Ai ) pi A1 A2 A3 1 Continuous Random Variables GaussianEx1.m plots histogram of samples drawn from a Gaussian with mean zero and std one and plots the Gaussian with mean 1 and std 1. Values of histogram are normalized so that the sum of the area of the histogram bars (rectangles) is equal to 1. This way both plots are directly comparable, since the area under the Gaussian curve is also one. x = -5:0.1:5; g=normpdf(x,0,1); samples=randn(1000,1); %samples = normrnd(0,1,1000,1); [hs, hx] = hist(samples,x); hs=(hs/1000)*(101/10); figure; bar(x,hs,'r'); hold on plot(x,g,'LineWidth',3); Here we drew samples from the target distribution, which is usually impossible. Continuous Random Variables A simple yet surprising fact: Regardless of the initial distribution of RV X, the cdf of X, FX(X), has uniform distribution: Let X be a continues RV with a strictly increasing cdf FX(x). Define an RV as U = FX(X). Then the distribution of U is Uniform(0, 1). Proof: Since FX(x) is a cdf, 0 ≤ FX(x) ≤ 1 for all x. Hence values of U lie in [0, 1]. We show that the cdf of U is FU(u)=u, hence U is Uniform(0, 1): FU (u ) P (U u ) P ( FX ( X ) u ) P ( X F inv X (u )) FX ( F u inv X (u )) by definition of cdf Generating Continuous Random Variables In order to generate a continues RV X with cdf FX, let us revert the formula U = FX(X). Then X can be obtained from the standard uniform variable U as X F inv X (U ) Proof: P( X x) P( F inv X (U ) x) P( FX ( F inv X (U )) FX ( x)) P(U FX ( x)) since U is Uniform(0, 1) FX ( x) Algorithm 1. Obtain a sample u from the standard uniform RV U 2. Compute a sample from X as x F inv X (u) Exponential random variables Applying this method to exponential distribution F(x) = 1 – e-λx F(x) = u ↔ x = ln (1- u) = Finv(u) if U has a U(0,1) distribution, then the random variable X is defined by X = Finv(U) = ln (1- U) = ln (U) So we can simulate a RV with Exp(λ) using a Uniform(0,1) RV SamplingEx1.m plots histogram of samples drawn from exponential distribution and plots the exponential distribution Values of histogram are normalized so that the sum of the area of the histogram bars (rectangles) is equal to one. This way both plots are directly comparable. x = 0:0.1:10; lambda=1; f=lambda*exp(-lambda*x); %sample from uniform dist. U(0,1) samples=rand(1000,1); %samples = normrnd(0,1,1000,1); samples=-(1/lambda)*log(1-samples); [hs, hx] = hist(samples,x); hs=(hs/1000)*(101/10); figure; bar(x,hs,'r'); hold on plot(x,f,'LineWidth',3); Quick exercise 6.3 A distribution function F is 0 for x < 1 and 1 for x > 3, and F(x) = 1/4 (x − 1)2 if 1 ≤ x ≤ 3. Let U be a U(0, 1) random variable. Construct a random variable with distribution F from U. Quick exercise 6.3 A distribution function F is 0 for x < 1 and 1 for x > 3, and F(x) = 1/4 (x − 1)2 if 1 ≤ x ≤ 3. Let U be a U(0, 1) random variable. Construct a random variable with distribution F from U. 6.4 A single server queue There is a well and people want to determine how powerful of a pump they should buy. The more power would mean less wait times but also increased cost Let Ti represent the time between consecutive customers, called interarrival time, e.g., the time between customer 1 and 2 is T2. Let Si be the length of time that customer i needs to use the pump The pump capacity v (liters per minute) is a model parameter that we wish to determine. If customer i requires Ri liters of water, then Si = Ri / v To complete the model description, we need to specify the distribution of T and R Interarrival times: every Ti has an Exp(0.5) distribution (minutes) Service requirement: every Ri has U(2,5) distribution (liters) 6.4 cont Wi denotes the waiting time of customer i, for customer W1 = 0 since he doesn’t have to wait. The waiting time of customer i depend on customer i-1: Wi = max{Wi-1 + Si-1 - Ti, 0} We are interested in average waiting time: n= The average wait time for v = 2 turns out to be around 2 minutes The average wait time for v = 3 turns out to be around .5 minutes plot of (n, n) for customers n = 1,2,… 6.4 Work in system One approach to determining how busy the pump is at any one time is to record at every moment how much work there is in the system. For example if I am halfway through filling my 4 liter container and 3 people are waiting who require 2, 3, and 5 liters, then there are 12 liters to go; at v = 2, there are 6 minutes of work in the system and at v = 3 there is just 4. The amount of work in the system just before a customer arrives equals the waiting time of the customer, this is also called virtual waiting time. Figure 6.8 illustrates the work in the system for v = 2 and v = 3