Slide set 10
Stat 330 (Spring 2015)
Last update: January 28, 2015

Geometric distribution

Review: X = number of repetitions of the experiment until we have the first success in a Bernoulli experiment.

1. The pmf is: $p_X(k) = P(X = k) = (1-p)^{k-1} \cdot p$  (k - 1 failures, then one success)
2. Expectation: $E[X] = \frac{1}{p}$;  Variance: $Var[X] = \frac{1-p}{p^2}$
3. The cdf is: $F_X(t) = P(X \le t) = 1 - (1-p)^{\lfloor t \rfloor}$

Example 1: Examine the following programming statement:

    Repeat S until B

Solution: Assume P(B = true) = 0.1 and let X be the number of times S is executed. Then X has a geometric distribution with pmf
$P(X = k) = p_X(k) = 0.9^{k-1} \cdot 0.1$
How often is S executed on average? That is, what is E[X]?

Geometric distribution: Example 2

Example 2. Watch the input queue at the alpha farm for a job that times out. The probability that a job times out is 0.05. Let Y be the index of the first job to time out; then $Y \sim Geo_{0.05}$. What then is the probability that
• the third job times out?  $P(Y = 3) = 0.95^2 \cdot 0.05 = 0.045$
• Y is less than 3?  $P(Y < 3) = P(Y \le 2) = 1 - 0.95^2 = 0.0975$
• the first job to time out is between the third and the seventh?
  $P(3 \le Y \le 7) = P(Y \le 7) - P(Y \le 2) = (1 - 0.95^7) - (1 - 0.95^2) = 0.204$

Geometric distribution: Example 2 (cont'd)

What is the expected value of Y, and what is Var[Y]? Plugging p = 0.05 into the formulas above gives:
$E[Y] = \frac{1}{p} = 20$: we expect the 20th job to be the first to time out.
$Var[Y] = \frac{1-p}{p^2} = 380$: very spread out!

Interesting property of the Geometric distribution: If $X \sim Geo_p$, then
$P(X > i + j \mid X > i) = P(X > j)$ for $i, j = 0, 1, 2, \ldots$
That is, X is memoryless: it "does not remember that it has already counted up to i"!

Poisson distribution

Situation: The Poisson distribution follows from a certain set of assumptions about the occurrence of "rare" events in time or space.

Examples:
X = # of alpha particles emitted from a polonium bar in an 8-minute period.
Y = # of flaws on a standard-size piece of manufactured product (e.g., 100 m coaxial cable, 100 sq. meter plastic sheeting).
Z = # of hits on a web page in a 24-hour period.

Definition: The Poisson probability mass function (pmf) is defined as
$p(x) = e^{-\lambda} \frac{\lambda^x}{x!}$ for $x = 0, 1, 2, 3, \ldots$
$\lambda$ is called the rate parameter. We denote the cdf by $Po_\lambda(t)$.

Poisson pmf (cont'd)

Check that p(x) defined above is actually a probability mass function. How?
1. Obviously, $p(x) \ge 0$ for all $x \ge 0$.
2. Do all probabilities sum to 1?
   $\sum_{x=0}^{\infty} p(x) = \sum_{x=0}^{\infty} e^{-\lambda} \frac{\lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1$

Expected value and variance of $X \sim Po_\lambda$:
• $E[X] = \sum_{x=0}^{\infty} x\, e^{-\lambda} \frac{\lambda^x}{x!} = 0 + e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^x}{(x-1)!} = e^{-\lambda} \lambda \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = e^{-\lambda} \lambda \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} = e^{-\lambda} \lambda\, e^{\lambda} = \lambda$
• $Var[X] = \ldots = \lambda$ (left as an exercise)

Poisson distribution: Example 3.22 (Baron), New Accounts

Customers of an internet service provider initiate new accounts at the average rate of 10 accounts per day.

Part (a): What is the probability that more than 8 new accounts will be initiated today?
The number of initiations per day, X, has a Poisson distribution with parameter $\lambda = 10$. (The above assumes that an account initiation is a rare event within the time period of one day, because no two customers open an account at exactly the same time.)
Then we have $P(X > 8) = 1 - Po_{10}(8) = 1 - 0.333 = 0.667$.
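A quick numerical check of the calculations so far, as a minimal sketch in Python. It assumes scipy.stats is available (not part of the original slides); scipy's geom counts the number of trials up to and including the first success, matching the convention on these slides.

    # Check of the geometric results in Examples 1 and 2, the memoryless
    # property, and the Poisson probability in Example 3.22(a).
    from scipy.stats import geom, poisson

    # Example 1: X ~ Geo_{0.1}, expected number of executions of S
    print(geom.mean(0.1))                    # 10.0  (= 1/p)

    # Example 2: Y ~ Geo_{0.05}
    p = 0.05
    print(geom.pmf(3, p))                    # P(Y = 3)             ~ 0.045
    print(geom.cdf(2, p))                    # P(Y < 3) = P(Y <= 2) ~ 0.0975
    print(geom.cdf(7, p) - geom.cdf(2, p))   # P(3 <= Y <= 7)       ~ 0.204
    print(geom.mean(p), geom.var(p))         # E[Y] = 20, Var[Y] = 380

    # Memoryless property: P(Y > i + j | Y > i) = P(Y > j)
    i, j = 4, 6
    print(geom.sf(i + j, p) / geom.sf(i, p)) # P(Y > i + j | Y > i)
    print(geom.sf(j, p))                     # P(Y > j), same value

    # Example 3.22(a): X ~ Poisson(10), P(X > 8) = 1 - Po_10(8)
    print(1 - poisson.cdf(8, 10))            # ~ 0.667

Here geom.sf(k, p) returns $P(Y > k) = (1-p)^k$, which is why the memoryless check compares ratios of survival probabilities.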
Poisson distribution: Example 3.22 (Baron) (cont'd)

Part (b): What is the probability that more than 16 new accounts will be initiated in two days?
The number of initiations in a two-day period, Y, has a Poisson distribution with parameter $\lambda = 20$. (Note carefully that the average number of initiations for a two-day period is 20.)
Then we have $P(Y > 16) = 1 - Po_{20}(16) = 1 - 0.221 = 0.779$.
Note that X and Y are random variables with different Poisson distributions because the events they represent occur during different time intervals. Identifying the right $\lambda$ for the interval in question is a key step in solving Poisson distribution problems.

Poisson distribution: Another Example

How do we choose $\lambda$ in an example? Look at the expected value!

Example: A manufacturer of chips produces 1% defectives. What is the probability that in a box of 100 chips no defective is found?

Solution: Let X be the number of defective chips found in the box. Model X as a Binomial variable with distribution $B_{100, 0.01}$. Then
$P(X = 0) = \binom{100}{0} 0.99^{100} \, 0.01^0 = 0.366$.

Approximation: On the other hand, a defective chip can be considered a rare event, since p is small (p = 0.01). So, approximate X by a Poisson variable. We need to obtain a value for $\lambda$!

Poisson distribution: Example (cont'd)

Note that we expect $100 \cdot 0.01 = 1$ chip in the box to be defective. We know that the expected value of X is $\lambda$. In this example, therefore, we take $\lambda = 1$. Then
$P(X = 0) = \frac{e^{-1} 1^0}{0!} = 0.3679$.

Ramification: For larger k, the binomial coefficient $\binom{n}{k}$ becomes hard to compute, and it is easier to use the Poisson distribution instead of the Binomial distribution.

Poisson to approximate Binomial

Result (not a theorem): For large n, the Binomial distribution can be approximated by the Poisson distribution, where $\lambda$ is taken as np:
$\binom{n}{k} p^k (1-p)^{n-k} \approx e^{-np} \frac{(np)^k}{k!}$

Rule of thumb: use the Poisson approximation if $n \ge 20$ and (at the same time) $p \le 0.05$.

Theorem: If $\{X_n\}$ is a sequence of random variables such that $X_n \sim Bin(N_n, p_n)$ with $N_n \to \infty$, $p_n \to 0$ and $N_n p_n \to \lambda \in (0, \infty)$, then $X_n \to X \sim Poisson(\lambda)$ in distribution. Such a beautiful result requires very delicate mathematics.

Poisson to approximate Binomial (example)

Example (Typos): Imagine you are supposed to proofread a paper. Assume that there are on average 2 typos on a page and a page has 1000 words. This gives a probability of 0.002 for each word to contain a typo. The number of typos on a page, X, is then a Binomial random variable, i.e. $X \sim B_{1000, 0.002}$. The probability of no typo on a page is P(X = 0), i.e.
$P(X = 0) = (1 - 0.002)^{1000} = 0.998^{1000} = 0.13506$
Alternatively,
$P(X = 0) = \left(1 - \frac{2}{1000}\right)^{1000} \approx e^{-2} = 0.13534$
since $(1 - x/n)^n \to e^{-x}$.

The probability of one typo on a page is
$P(X = 1) = \binom{1000}{1} 0.002 \cdot 0.998^{999} = 0.27067$
and
$P(X = 1) = 1000 \cdot \frac{2}{1000} \left(1 - \frac{2}{1000}\right)^{999} \approx 2 e^{-2} = 0.27067$!
So essentially we are calculating this probability using the Poisson pmf with $\lambda = 1000 \cdot 0.002 = 2$. That is, use
$P(X = x) = e^{-\lambda} \frac{\lambda^x}{x!}$
to calculate
$P(X = 1) \approx \frac{e^{-2} 2^1}{1!} = 2 e^{-2} = 0.27067$.

Poisson to approximate Binomial (example cont'd)

The probability of two typos on a page is P(X = 2), i.e.
$P(X = 2) = \binom{1000}{2} (1 - 0.002)^{998} \, 0.002^2 = 0.27094$
Alternatively, using $X \approx Po_2$,
$P(X = 2) \approx \frac{e^{-2} 2^2}{2!} = 0.27067$.
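A minimal sketch of these comparisons, again assuming Python with scipy.stats (an addition, not part of the original slides). It recomputes Part (b) of Example 3.22 and places the exact Binomial probabilities of the chip and typo examples next to their Poisson approximations.

    # Exact Binomial probabilities vs. Poisson approximations with lambda = np
    from scipy.stats import binom, poisson

    # Chips: X ~ B_{100, 0.01}, approximated by Poisson(lambda = 100 * 0.01 = 1)
    print(binom.pmf(0, 100, 0.01))    # exact:  0.99^100  ~ 0.3660
    print(poisson.pmf(0, 1))          # approx: e^{-1}    ~ 0.3679

    # Typos: X ~ B_{1000, 0.002}, approximated by Poisson(lambda = 1000 * 0.002 = 2)
    n, p = 1000, 0.002
    lam = n * p
    for k in range(3):
        print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))
    # k = 0: 0.13506 vs 0.13534
    # k = 1: 0.27067 vs 0.27067
    # k = 2: 0.27094 vs 0.27067

    # Example 3.22(b): Y ~ Poisson(20), P(Y > 16) = 1 - Po_20(16)
    print(1 - poisson.cdf(16, 20))    # ~ 0.779

The agreement improves as n grows and p shrinks, in line with the rule of thumb ($n \ge 20$ and $p \le 0.05$) stated above.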