Slide set 14
Stat 330 (Spring 2015)
Last update: February 3, 2015

Gamma Example (Baron 4.7)

Compilation of a computer program consists of 3 blocks that are processed sequentially, one after the other. Each block takes Exponential time with a mean of 5 minutes, independently of the other blocks.

(a) Compute the expectation and variance of the total compilation time.

The total time T is the sum of three independent Exp(1/5) times, so T is a Gamma random variable with α = 3 and λ = 1/5:

E(T) = α/λ = 3/(1/5) = 15 (min) and Var(T) = α/λ² = 3/(1/5)² = 75 (min²)

(b) Compute the probability for the entire program to be compiled in less than 12 minutes.

This can be done using repeated integration by parts (see Baron p. 87).

Gamma Example (cont'd)

However, we will use the Gamma-Poisson formula: for T ∼ Gamma(α, λ) with α an integer and X ∼ Po(λt),

P(T > t) = P(X < α) and P(T ≤ t) = P(X ≥ α)

We need P(T < 12) where T ∼ Gamma(3, 1/5). Here t = 12, so X ∼ Po(12/5), i.e., X ∼ Po(2.4). From the Gamma-Poisson formula,

P(T < 12) = P(T ≤ 12) = P(X ≥ 3) = 1 − Po_{2.4}(2) = 1 − 0.5697 = 0.4303

Erlang distribution

Hits on a web page. Recall: we modeled the waiting time until the first hit as Exp(2). How long do we have to wait for the second hit?

To calculate the waiting time for the second hit, we add the waiting time until the first hit and the time between the first and the second hit.

Let Y_1 = the waiting time until the first hit. Then Y_1 ∼ Exponential with λ = 2.
Let Y_2 = the time between the first and the second hit. By the memoryless property of the exponential distribution, Y_2 has the same distribution as the waiting time for the first hit. That is, Y_2 ∼ Exponential with λ = 2.

We want the total time until the second hit, X := Y_1 + Y_2. This is the sum of two independent exponential random variables.

Erlang distribution (cont'd)

If Y_1, ..., Y_k are k independent exponential random variables with parameter λ, their sum

X := Σ_{i=1}^{k} Y_i

has an Erlang(k, λ) distribution. The Erlang density is

f_{k,λ}(x) = λ^k x^{k−1} e^{−λx} / (k−1)!  for x ≥ 0,

where k is called the stage parameter and λ is the rate parameter. Note this is the same density as that of a Gamma(α, λ) with α = k an integer.

Erlang distribution (cont'd)

The expected value and variance of an Erlang-distributed variable X can be computed using the properties of expected value and variance for sums of independent random variables:

E[X] = E[Σ_{i=1}^{k} Y_i] = Σ_{i=1}^{k} E[Y_i] = k · (1/λ)
Var[X] = Var[Σ_{i=1}^{k} Y_i] = Σ_{i=1}^{k} Var[Y_i] = k · (1/λ²)

Alternatively, we can use the formulas for the expectation and variance of a Gamma(k, λ) random variable. This is so because the Erlang(k, λ) distribution is the same as a Gamma(k, λ) distribution where k is an integer.

Erlang distribution (cont'd)

Thus, in order to compute the distribution function, we can use the Gamma-Poisson formula. We need the cdf of the Erlang random variable X, denoted Erlang_{k,λ}(t) = P(X ≤ t). In order to use the Gamma-Poisson formula, we now view X as X ∼ Gamma(k, λ). From the Gamma-Poisson formula, P(X ≤ t) = P(Y ≥ k) where Y ∼ Po(λt). Thus

Erlang_{k,λ}(t) = P(Y ≥ k) = 1 − P(Y ≤ k − 1) = 1 − Po_{λt}(k − 1)
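The Gamma-Poisson formula is easy to check numerically. Below is a minimal sketch, not part of the original slides, assuming Python with SciPy; note that SciPy parameterizes the Gamma distribution by a shape a = α and a scale equal to 1/λ, and the helper name erlang_cdf is introduced here purely for illustration.

```python
from scipy.stats import gamma, poisson

# Compilation example: T ~ Gamma(alpha = 3, lambda = 1/5).
alpha, lam, t = 3, 1/5, 12
print(gamma.cdf(t, a=alpha, scale=1/lam))    # P(T <= 12) by direct integration: ~0.4303
print(1 - poisson.cdf(alpha - 1, lam * t))   # 1 - Po_{2.4}(2), the Gamma-Poisson route

def erlang_cdf(t, k, lam):
    """Erlang_{k,lam}(t) = 1 - Po_{lam*t}(k - 1), by the Gamma-Poisson formula."""
    return 1 - poisson.cdf(k - 1, lam * t)
```

Both print statements give the same value, which is the point of the identity: a Gamma tail probability trades for a Poisson cdf evaluation.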
Erlang distribution: Example

Hits on a web page (continued)

1. What is the density of the waiting time until the second hit?

We previously defined X as the sum of two exponential variables, each with rate λ = 2. Thus X has an Erlang distribution with stage parameter 2, and the density of X is

f_X(x) = f_{2,2}(x) = 4x e^{−2x}  for x ≥ 0

2. Find the probability that we have to wait more than 1 minute for the 3rd hit.

Z := the waiting time until the third hit has an Erlang(3, 2) distribution. Thus

P(Z > 1) = 1 − Erlang_{3,2}(1) = 1 − (1 − Po_{2·1}(3 − 1)) = Po_2(2) = 0.677

We will come across the Erlang distribution again when modelling the waiting times in queueing systems, where customers arrive at a Poisson rate and need exponential time to be served.

Normal distribution

The normal density is a "bell-shaped" density. It has two parameters, µ and σ², and is

f_{µ,σ²}(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}  for −∞ < x < ∞

The expected value and variance of a normally distributed r.v. X are:

E[X] = ∫_{−∞}^{∞} x f_{µ,σ²}(x) dx = ... = µ
Var[X] = ∫_{−∞}^{∞} (x − µ)² f_{µ,σ²}(x) dx = ... = σ²

Thus, the parameters µ and σ² are actually the mean and the variance of the N(µ, σ²) distribution.

Normal densities for several parameters

[Figure: normal density curves for several choices of µ and σ².] µ determines the location of the peak on the x-axis; σ² determines the "width" of the bell.

Normal distribution (cont'd)

The cumulative distribution function (cdf) of X is

N_{µ,σ²}(t) := F_{µ,σ²}(t) = ∫_{−∞}^{t} f_{µ,σ²}(x) dx

Unfortunately, there does not exist a closed form for this integral. However, to get probabilities we need to evaluate this integral. Fortunately, tables of the cdf of the standard normal distribution N(0, 1), the normal distribution that has mean 0 and variance 1, are available. We can use these tables to compute the cdf of the normal distribution N(µ, σ²) for any values of µ and σ. How? We use the fact that X ∼ N(µ, σ²) can be standardized to obtain a random variable Z ∼ N(0, 1) as follows:

Z = (X − µ)/σ

Standard Normal distribution

If X ∼ N(µ, σ²), then Z = (X − µ)/σ ∼ N(0, 1). Thus

E[Z] = (1/σ)(E[X] − µ) = 0 and Var[Z] = (1/σ²) Var[X] = 1

It is common practice to denote the cdf N_{0,1}(t) by Φ(t) (more commonly written Φ(z)). The values of Φ(z) are tabulated in tables usually called standard normal tables (or Z tables); however, these tables are sometimes only available for positive values of z. This is sufficient because Φ(−z) = 1 − Φ(z), as f_{0,1} is symmetric around 0.

Standard Normal distribution (cont'd)

Recall that the area under the density to the left of a vertical line at z represents the probability P(Z < z). It is easy to see that the area in the left tail is equal to the area in the right tail: P(Z ≤ −z) = P(Z ≥ +z). This is true because P(Z ≥ +z) = 1 − P(Z ≤ z) and, by symmetry, P(Z ≤ −z) = Φ(−z) = 1 − Φ(z), which proves the above statement.

Using the Z-table

Suppose Z is a standard normal random variable.

• P(Z < 1) = Φ(1) = 0.8413. (straight look-up)
• P(0 < Z < 1) = P(Z < 1) − P(Z < 0) = Φ(1) − Φ(0) = 0.8413 − 0.5 = 0.3413.
• P(Z < −2.31) = 1 − Φ(2.31) = 1 − 0.9896 = 0.0104, or P(Z < −2.31) = Φ(−2.31) = 0.0104. (look-up)
• P(|Z| > 2) = P(Z < −2) + P(Z > 2) = 2(1 − Φ(2)) = 2(1 − 0.9772) = 0.0456, or P(|Z| > 2) = 2Φ(−2) = 2 × 0.0228 = 0.0456. (look-up)
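These look-ups can also be reproduced with software in place of a Z table. A minimal sketch, not part of the original slides, assuming Python with SciPy, where norm.cdf plays the role of Φ:

```python
from scipy.stats import norm   # norm.cdf is the standard normal cdf Phi

print(norm.cdf(1))                 # P(Z < 1)      ~ 0.8413
print(norm.cdf(1) - norm.cdf(0))   # P(0 < Z < 1)  ~ 0.3413
print(norm.cdf(-2.31))             # P(Z < -2.31)  ~ 0.0104
print(2 * norm.cdf(-2))            # P(|Z| > 2)    ~ 0.0455 (the table's 0.0456 reflects rounding Phi(2) to 0.9772)
```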
Using the Z-table (cont'd)

Suppose X ∼ N(1, 2) and we need to calculate P(1 < X < 2). Standardizing X gives Z := (X − 1)/√2. Thus:

P(1 < X < 2) = P((1 − 1)/√2 < (X − 1)/√2 < (2 − 1)/√2) = P(0 < Z < 0.5√2) ≈ P(0 < Z < 0.71) = Φ(0.71) − Φ(0) = 0.7611 − 0.5 = 0.2611

Note that the standard normal table only shows probabilities for z < 3.99. This is all we need, though, since P(Z ≥ 4) ≤ 0.0001.

Review Examples 4.10, 4.11, and 4.12 from Baron.
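As a numerical check of the N(1, 2) example above, here is a minimal sketch, not part of the original slides, assuming Python with SciPy; note that SciPy's scale argument is the standard deviation σ = √2, not the variance σ² = 2. The exact answer is about 0.2602; the slide's 0.2611 comes from rounding z = 0.5√2 to 0.71 for the table look-up.

```python
from math import sqrt
from scipy.stats import norm

mu, sigma = 1, sqrt(2)   # X ~ N(mu = 1, sigma^2 = 2)
print(norm.cdf(2, loc=mu, scale=sigma) - norm.cdf(1, loc=mu, scale=sigma))  # ~0.2602
print(norm.cdf((2 - 1) / sigma) - norm.cdf(0))   # same answer via standardization
```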