Slide set 18
Stat 330 (Spring 2015)
Last update: February 16, 2015

Stochastic Processes

Review: What is a random variable?

Definition: A stochastic process is a set of random variables indexed by some index set, typically time $t$, and is usually denoted by $X(t)$.

Some remarks:
1. A stochastic process is a mathematical model of reality.
2. Modeling usually requires specifying the joint distribution of $(X(t_1), \dots, X(t_k))$, i.e. $P(X_1 \in A_1, \dots, X_k \in A_k)$.
3. Values of $X(t)$ are called states; the set of all possible values of $X(t)$ is called the state space.

The example about 'hits on a webpage' is a typical example of a stochastic process, and it has a special name: Poisson process.

Poisson Process

Review: What are the Exponential and Poisson distributions?
1. Exponential: $P(T \le t) = 1 - e^{-\lambda t}$ for all $t \ge 0$, where $T$ is the waiting time until a rare event happens (once).
2. Poisson: $P(X = k) = e^{-\lambda} \lambda^k / k!$ for $k = 0, 1, 2, \dots$, where $X$ is the number of occurrences of a rare event during a certain time period (or region of space).
3. pdf of the Exponential distribution: $f_T(t) = \lambda e^{-\lambda t}$ for $t \ge 0$, where $\lambda$ is the rate (units: 1/time). What are $E(T)$ and $Var(T)$? What are $E(X)$ and $Var(X)$?
4. Lack-of-memory property of the Exponential: $P(T > t + s \mid T > t) = P(T > s)$ (this is key for the Poisson process later).
5. Exponential race: $P(\min(S, T) > t) = P(S > t, T > t) = e^{-(\lambda + \mu) t}$ if $T \sim Exp_\lambda$ and $S \sim Exp_\mu$ are independent. What about $P(\min(T_1, \dots, T_n) > t)$?

Poisson process

Definition: A stochastic process $X(t)$ is called a homogeneous Poisson process with rate $\lambda$ if
1. for $t > 0$, $X(t)$ takes values in $\{0, 1, 2, 3, \dots\}$;
2. the distribution of an increment depends only on the length of the interval: for any $0 \le t_1 < t_2$, $X(t_2) - X(t_1) \sim Po_{\lambda (t_2 - t_1)}$;
3. increments over non-overlapping intervals are independent: for any $0 \le t_1 < t_2 \le t_3 < t_4$, $X(t_2) - X(t_1)$ is independent of $X(t_4) - X(t_3)$.

Jargon: $X(t)$ is a "counting process" with independent Poisson increments.
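The review properties and condition 2 of the definition can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names (`exp_survival`, `poisson_pmf`, `increment_pmf`) and the numbers `lam`, `mu`, `t`, `s` are illustrative assumptions. It evaluates the closed-form expressions and confirms the lack-of-memory identity, the two-variable exponential race, and the fact that the increment distribution depends only on the interval length.

```python
import math

def exp_survival(t, rate):
    """P(T > t) for T ~ Exponential(rate)."""
    return math.exp(-rate * t)

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def increment_pmf(k, rate, t1, t2):
    """P(X(t2) - X(t1) = k) for a homogeneous Poisson process with the given rate."""
    return poisson_pmf(k, rate * (t2 - t1))

# Illustrative values (assumptions, not taken from the slides).
lam, mu = 2.0, 3.0
t, s = 0.7, 0.4

# Lack of memory: P(T > t + s | T > t) = P(T > t + s) / P(T > t) should equal P(T > s).
conditional = exp_survival(t + s, lam) / exp_survival(t, lam)
print("memoryless:", conditional, exp_survival(s, lam))

# Exponential race: for independent S and T, P(min(S, T) > t) = P(S > t) * P(T > t).
race = exp_survival(t, lam) * exp_survival(t, mu)
print("race:", race, exp_survival(t, lam + mu))

# Increments over intervals of equal length have the same distribution.
print("increments:", increment_pmf(3, lam, 0.0, 1.5), increment_pmf(3, lam, 4.0, 5.5))
```

Each printed pair should agree up to floating-point rounding.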
Example

♣ A counter of the number of hits on our webpage is an example of a Poisson process with rate $\lambda = 2$/min.
♥ Here the inter-arrival times are generated from $Exp(2)$, and $X(t)$ counts the number of hits up to time $t$ (in minutes).
♦ For example, we may find that $X(t) = 3$ for $t \in [5, 8]$ minutes; i.e., only 3 hits up to any time between 5 and 8 minutes.

Example (cont'd)

Remarks:
1. $X(t)$ can be thought of as the number of occurrences up to time $t$.
2. Similarly, $X(t_2) - X(t_1)$ is the number of occurrences in the interval $(t_1, t_2]$.
3. By the same argument, $X(0) = 0$ - ALWAYS!
4. The distribution of $X(t)$ is Poisson with parameter $\lambda t$, since $X(t) = X(t) - X(0) \sim Po_{\lambda (t - 0)}$.

Example (cont'd)

Based on the last example: for a given Poisson process $X(t)$ we define the occurrence times
$O_0 = 0$, $O_j$ = time of the $j$th occurrence = the first $t$ for which $X(t) \ge j$,
and the inter-arrival times between successive hits:
$I_j = O_j - O_{j-1}$ for $j = 1, 2, \dots$
The time until the $k$th hit, $O_k$, is therefore the sum of inter-arrival times: $O_k = I_1 + \dots + I_k$.

Equivalence theorem

Equivalence theorem: $X(t)$ is a Poisson process with rate $\lambda$ iff the inter-arrival times $I_1, I_2, \dots$ are i.i.d. $Exp_\lambda$.

Corollary: The time until the $k$th hit, $O_k$, is an $Erlang_{k,\lambda}$ distributed variable $\iff$ $X(t)$ is a Poisson process with rate $\lambda$.

Note: This theorem is very important! It links the Poisson, Exponential, and Erlang distributions tightly together.

Some thoughts:
• Why is the Poisson process so important?
• We mention the homogeneous Poisson process; what is meant by homogeneous?
• What is a nonhomogeneous process?

Example

Hits on a website: Hits on a popular Web page occur according to a Poisson process with a rate of 10 hits/min. One begins observation at exactly noon.
1. Evaluate the probability of 2 or fewer hits in the first minute.
   Let $X$ be the number of hits in the first minute; then $X$ is a Poisson variable with $\lambda = 10$:
   $P(X \le 2) = Po_{10}(2) = e^{-10} + 10 e^{-10} + \frac{10^2}{2} e^{-10} = 0.0028$.
   (You may also check the Poisson cdf table.)
2. Evaluate the probability that the time until the first hit exceeds 10 seconds.
   Let $Y$ be the time until the first hit; then $Y$ has an Exponential distribution with parameter $\lambda = 10$ per minute, or $\lambda = 1/6$ per second.
   $P(Y \ge 10) = 1 - P(Y \le 10) = 1 - (1 - e^{-10 \cdot 1/6}) = e^{-5/3} = 0.1889$.
3. Evaluate the mean and the variance of the time until the 4th hit.
   Let $T$ be the time until the 4th hit. Then $T$ has an Erlang distribution with stage parameter $k = 4$ and $\lambda = 10$ per minute.
   $E[T] = k/\lambda = 4/10 = 0.4$ minutes,
   $Var[T] = k/\lambda^2 = 4/100 = 0.04$ minutes$^2$.
4. Evaluate the probability that the time until the 4th hit exceeds 24 seconds.
   We need $P(T > 24/60)$, where $T \sim Erlang(4, 10)$ and $T$ is in minutes, so we use the Gamma-Poisson formula:
   $P(T > 0.4) = P(X < 4)$ where $X \sim Poi(\lambda \cdot t)$
   $= P(X \le 3)$ where $X \sim Poi(10 \cdot 0.4)$
   $= Po_4(3) = 0.433$ (Website table, p. 786, or Baron p. 384).
5. The number of hits in the first hour is Poisson with mean 600. You would like to know the probability of more than 650 hits. Exact calculation isn't really feasible, so approximate this probability and justify your approximation.
   Recall that a Poisson distribution with large rate $\lambda$ can be approximated by a normal distribution with mean $\mu = \lambda$ and variance $\sigma^2 = \lambda$.
   Then $X \overset{approx}{\sim} N(600, 600)$, so $Z := \frac{X - 600}{\sqrt{600}} \overset{approx}{\sim} N(0, 1)$. Then:
   $P(X > 650) = 1 - P(X \le 650) = 1 - P\left(Z \le \frac{650 - 600}{\sqrt{600}}\right) \approx 1 - \Phi(2.04) = 1 - 0.9793 = 0.0207$ (Webpage table, p. 789, or Baron p. 386).

Poisson Process: Conditioning

A Poisson process possesses an interesting property that is consistent with thinking of it as "random occurrences" in time, which leads to the conditioning theorem.

Theorem: Let $X(t)$ be a Poisson process.
Given that $X(T) = k$, the conditional distribution of the times of the $k$ occurrences $O_1, \dots, O_k$ is the same as the distribution of $T \cdot U_{(1)}, T \cdot U_{(2)}, \dots, T \cdot U_{(k)}$, where $U_{(1)} \le U_{(2)} \le \dots \le U_{(k)}$ are $k$ ordered independent standard uniform variables.

♣ In other words, given that there were $k$ arrivals in $[0, T]$, the set of arrival times is distributed like the locations of $k$ darts thrown at random on the interval $[0, T]$.
♠ This gives us a way to simulate a Poisson process with rate $\lambda$ on the interval $(0, T)$.

Simulating a Poisson Process

• First, draw a Poisson value $w$ from $Po_{\lambda T}$. (This tells us how many uniform values $U_i$ we need to simulate.)
• Second, generate $w$ standard uniform values $u_1, \dots, u_w$.
• Third, define $o_i = T \cdot u_{(i)}$, where $u_{(i)}$ is the $i$th smallest value among $u_1, \dots, u_w$.

♥ The above theorem tells us that, if we pick $k$ values at random from an interval $(0, t)$ and order them, we can treat the distance between two successive values as (approximately) exponentially distributed with rate $\lambda = k/t$.
♦ So far, we have looked only at arrivals of events. Besides that, we could, for example, look at the number of surfers who are on our web site at the same time.
♣ There, we have departures as well and, related to that, the time each surfer stays, which we will call service time (from the perspective of the web server).
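The three simulation steps listed under "Simulating a Poisson Process" can be sketched in Python as follows. This is an illustration under stated assumptions rather than code from the course: the function names, the seed, and the values `rate = 2.0` and `horizon = 10.0` are hypothetical, and the Poisson draw uses Knuth's multiplication method only because Python's standard library has no Poisson sampler (it is suitable for moderate values of $\lambda T$).

```python
import math
import random

def draw_poisson(mean, rng):
    """Draw one Poisson(mean) value via Knuth's multiplication method.

    Swapped-in helper (an assumption, not from the slides); fine for moderate means.
    """
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_poisson_process(rate, horizon, rng):
    """Return the ordered occurrence times of a Poisson process on (0, horizon)."""
    w = draw_poisson(rate * horizon, rng)          # step 1: number of occurrences
    uniforms = [rng.random() for _ in range(w)]    # step 2: w standard uniforms
    return sorted(horizon * u for u in uniforms)   # step 3: o_i = T * u_(i)

rng = random.Random(330)      # fixed seed so the run is reproducible
rate, horizon = 2.0, 10.0     # illustrative: 2 hits/min observed for 10 minutes

occurrences = simulate_poisson_process(rate, horizon, rng)
print("number of hits:", len(occurrences))
print("first few occurrence times:", occurrences[:3])

# Sanity check: over many runs the average count should be close to rate * horizon.
counts = [len(simulate_poisson_process(rate, horizon, rng)) for _ in range(2000)]
mean_count = sum(counts) / len(counts)
print("average count over 2000 runs:", mean_count)
```

Sorting the scaled uniforms yields the order statistics $T \cdot u_{(1)}, \dots, T \cdot u_{(w)}$ directly, so no separate ranking step is needed.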