Stat 330 (Spring 2015): Slide set 10
Last update: January 28, 2015

Geometric distribution

Review: X = number of repetitions of the experiment until we have the first success in a Bernoulli experiment.
1. The pmf is: p_X(k) = P(X = k) = (1 − p)^{k−1} · p   (k − 1 failures, then a success!)
2. Expectation: E[X] = 1/p
3. The cdf is: F_X(t) = P(X ≤ t) = 1 − (1 − p)^t

Example 1: Examine the following programming statement:
    Repeat S until B
Solution: Assume P(B = true) = 0.1 and let X be the number of times S is executed. Then X has a geometric distribution with pmf
    P(X = k) = p_X(k) = 0.9^{k−1} · 0.1
How often is S executed on average, i.e. what is E[X]? Since E[X] = 1/p, S is executed on average 1/0.1 = 10 times.

Geometric distribution Example 2

Example 2: Watch the input queue at the alpha farm for a job that times out. The probability that a job times out is 0.05. Let Y be the index of the first job to time out; then Y ∼ Geo_{0.05}. What, then, is the probability that
• the third job times out?
    P(Y = 3) = 0.95^2 · 0.05 = 0.045
• Y is less than 3?
    P(Y < 3) = P(Y ≤ 2) = 1 − 0.95^2 = 0.0975
• the first job to time out is between the third and the seventh?
    P(3 ≤ Y ≤ 7) = P(Y ≤ 7) − P(Y ≤ 2) = (1 − 0.95^7) − (1 − 0.95^2) = 0.204

Geometric distribution Example 2 (cont'd)

What are the expected value and the variance of Y?
Expectation: E[Y] = 1/p
Variance: Var[Y] = (1 − p)/p^2
Plugging p = 0.05 into the above formulas gives us:
E[Y] = 1/p = 20: we expect the 20th job to be the first to time out.
Var[Y] = (1 − p)/p^2 = 380: very spread out!

Interesting property of the Geometric distribution

If X ∼ Geo_p, then P(X ≥ i + j | X ≥ i) = P(X ≥ j) for i, j = 0, 1, 2, . . .
That is, X is memoryless: it "does not remember that it has already counted up to i"!

Poisson distribution

Situation: The Poisson distribution follows from a certain set of assumptions about the occurrence of "rare" events in time or space.
Examples:
X = # of alpha particles emitted from a polonium bar in an 8 minute period.
Y = # of flaws on a standard size piece of manufactured product (e.g., 100 m coaxial cable, 100 sq. meter plastic sheeting).
Z = # of hits on a web page in a 24h period.
Definition: The Poisson probability mass function (pmf) is defined as
    p(x) = e^{−λ} λ^x / x!   for x = 0, 1, 2, 3, . . .
λ is called the rate parameter. We denote the cdf by Po_λ(t).

Poisson pmf (cont'd)

Check that p(x) defined above is actually a probability mass function. How?
1. Obviously, all values of p(x) ≥ 0 for x ≥ 0.
2. Do all probabilities sum to 1?
    Σ_{x=0}^∞ e^{−λ} λ^x / x! = e^{−λ} · Σ_{x=0}^∞ λ^x / x! = e^{−λ} e^{λ} = 1

Expected value and variance of X ∼ Po_λ are:
• E[X] = Σ_{x=0}^∞ x e^{−λ} λ^x / x! = 0 + e^{−λ} Σ_{x=1}^∞ λ^x / (x − 1)! = e^{−λ} λ Σ_{x=1}^∞ λ^{x−1} / (x − 1)! = e^{−λ} λ Σ_{y=0}^∞ λ^y / y! = e^{−λ} λ e^{λ} = λ
• Var[X] = . . . = λ (left as an exercise)

Poisson distribution: Example 3.22 (Baron)

New Accounts: Customers of an internet service provider initiate new accounts at the average rate of 10 accounts per day.
Part (a): What is the probability that more than 8 new accounts will be initiated today?
The number of initiations per day X has a Poisson distribution with parameter λ = 10. (This assumes that account initiations are rare events within the time period of one day, because no two customers can open an account at the same time.) Then we have
    P(X > 8) = 1 − Po_10(8) = 1 − 0.333 = 0.667

Poisson distribution: Example 3.22 (Baron) (cont'd)

Part (b): What is the probability that more than 16 new accounts will be initiated in two days?
The number of initiations in a two-day period Y has a Poisson distribution with parameter λ = 20. (Note carefully that the average number of initiations for a two-day period is 20.) Then we have
    P(Y > 16) = 1 − Po_20(16) = 1 − 0.221 = 0.779
Note that X and Y are random variables with different Poisson distributions, because the events they represent occur during different time intervals. This is a key step in solving problems involving the Poisson distribution.
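The numbers in Example 2 and Example 3.22 can be checked in a few lines of code. The following is a small Python sketch, not part of the original slides, that assumes the scipy.stats module is available; it reproduces the Geometric probabilities for the alpha-farm jobs and the Poisson tail probabilities for the new accounts.

from scipy.stats import geom, poisson

# Example 2: Y ~ Geo_{0.05}, the index of the first job that times out.
# scipy's geom uses the same convention as the slides: number of trials
# until the first success, with support 1, 2, 3, ...
p = 0.05
Y = geom(p)
print(Y.pmf(3))              # P(Y = 3)  = 0.95^2 * 0.05           ~ 0.045
print(Y.cdf(2))              # P(Y < 3)  = 1 - 0.95^2              = 0.0975
print(Y.cdf(7) - Y.cdf(2))   # P(3 <= Y <= 7)                      ~ 0.204
print(Y.mean(), Y.var())     # E[Y] = 1/p = 20, Var[Y] = (1-p)/p^2 = 380

# Example 3.22: accounts per day X ~ Po_10, accounts in two days ~ Po_20.
X = poisson(10)
Y2 = poisson(20)
print(X.sf(8))               # P(X > 8)  = 1 - Po_10(8)            ~ 0.667
print(Y2.sf(16))             # P(Y > 16) = 1 - Po_20(16)           ~ 0.779

The survival function sf(k) is simply 1 minus the cdf, which matches the "1 − Po_λ(k)" pattern used on the slides.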
Poisson distribution: Another Example

How do we choose λ in an example? Look at the expected value!
Example: A manufacturer of chips produces 1% defectives. What is the probability that in a box of 100 chips no defective is found?
Solution: Let X be the number of defective chips found in the box. Model X as a Binomial variable with distribution B_{100, 0.01}. Then
    P(X = 0) = \binom{100}{0} · 0.99^{100} · 0.01^0 = 0.366.

Poisson distribution: Example (cont'd)

Approximation: On the other hand, a defective chip can be considered a rare event, since p is small (p = 0.01). So, approximate X by a Poisson variable. We need to obtain a value for λ!
Note that we expect 100 · 0.01 = 1 chip out of the box to be defective. We know that the expected value of X is λ, so in this example we take λ = 1. Then
    P(X = 0) = e^{−1} · 1^0 / 0! = 0.3679.

Poisson to approximate Binomial

Ramification: For larger k, however, the binomial coefficient \binom{n}{k} becomes hard to compute, and it is easier to use the Poisson distribution instead of the Binomial distribution.
Result (not a theorem): For large n, the Binomial distribution can be approximated by the Poisson distribution, where λ is taken as np:
    \binom{n}{k} p^k (1 − p)^{n−k} ≈ e^{−np} (np)^k / k!
Rule of thumb: use the Poisson approximation if n ≥ 20 and (at the same time) p ≤ 0.05.
Theorem: If {X_n} is a sequence of random variables such that X_n ∼ Bin(N_n, p_n) with N_n → ∞, p_n → 0 and N_n p_n → λ ∈ (0, ∞), then
    X_n → X ∼ Poisson(λ)   in distribution.
Such a beautiful result requires very delicate mathematics.

Poisson to approximate Binomial (example)

Example (Typos): Imagine you are supposed to proofread a paper. Let us assume that there are on average 2 typos on a page and a page has 1000 words. This gives a probability of 0.002 for each word to contain a typo. The number of typos on a page X is then a Binomial random variable, i.e. X ∼ B_{1000, 0.002}.
The probability of no typo on a page is P(X = 0), i.e.
    P(X = 0) = (1 − 0.002)^{1000} = 0.998^{1000} = 0.13506
Alternatively,
    P(X = 0) = (1 − 2/1000)^{1000} ≈ e^{−2} = 0.13534,   since (1 − x/n)^n → e^{−x}.

Poisson to approximate Binomial (example cont'd)

The probability of one typo on a page is
    P(X = 1) = \binom{1000}{1} · 0.002 · 0.998^{999} = 1000 · (2/1000) · (1 − 2/1000)^{999} ≈ 2 · e^{−2} = 0.27067.
So basically, we are calculating this probability using the Poisson pmf with λ = 1000 · 0.002 = 2. That is, we use P(X = x) = e^{−λ} λ^x / x! to calculate
    P(X = 1) ≈ e^{−2} · 2^1 / 1! = 2 · e^{−2} = 0.27067.
The probability of two typos on a page is P(X = 2), i.e.
    P(X = 2) = \binom{1000}{2} · 0.002^2 · (1 − 0.002)^{998} = 0.27094;
alternatively, using X ≈ Po_2,
    P(X = 2) ≈ e^{−2} · 2^2 / 2! = 0.27067.
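To see the quality of the approximation numerically, here is a short Python sketch (not part of the original slides; again it assumes scipy.stats is available) that compares the exact Binomial probabilities with their Poisson approximations for both the defective-chips example and the typos example.

from scipy.stats import binom, poisson

# Chips: X ~ B_{100, 0.01}, approximated by Po_1 (lambda = n*p = 1).
print(binom.pmf(0, 100, 0.01))   # exact:       0.99^100   ~ 0.366
print(poisson.pmf(0, 1))         # approximate: e^{-1}     ~ 0.368

# Typos: X ~ B_{1000, 0.002}, approximated by Po_2 (lambda = n*p = 2).
for k in range(3):
    exact = binom.pmf(k, 1000, 0.002)
    approx = poisson.pmf(k, 2)
    print(k, round(exact, 5), round(approx, 5))
# k = 0: 0.13506 vs 0.13534;  k = 1: 0.27067 vs 0.27067;  k = 2: 0.27094 vs 0.27067

The two columns agree to two or three decimal places, which is what the rule of thumb predicts: in both examples n ≥ 20 and p ≤ 0.05, so the Poisson pmf with λ = np is a convenient stand-in for the harder-to-compute Binomial pmf.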