IE 265
Discrete Distributions – Part 2
• Hypergeometric distribution
• Poisson distribution

Hypergeometric distribution
• Suppose we have $N$ items, $k$ of which are labeled as success.
• Suppose $n$ items are randomly selected from the $N$ items without replacement.
• Let $X$ be the # of successes in our sample.
• Then, $X$ is a hypergeometric random variable.
• In general,
$$p(x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \qquad x = 0, 1, 2, \ldots, \min\{n, k\}$$
  (with the convention $\binom{a}{b} = 0$ for $b > a$, so any $x < n - (N-k)$ has probability 0)
$$X \sim Hypgeo(N, k, n)$$

Hypergeometric vs Binomial
• If the selection is without replacement, then
  • trials are dependent and
  • we use the hypergeometric distribution
• If the selection is with replacement, then
  • trials are independent and
  • we use the binomial distribution

Hypergeometric vs Binomial
• r.v. $X \sim Hypgeo(100, 90, 15)$
  • success probability in trial 1: 90/100.
  • success probability in trial 2: 89/99 for S (in the previous trial), 90/99 for F (in the previous trial).
  • success probability in trial 3: 88/98 for SS, 89/98 for SF and FS, 90/98 for FF.
  • dependence of a trial on the previous trials
• r.v. $Y \sim Bin(15, 90/100)$
  • success probability: (constant) 90/100 for all trials.
  • independence of the trials

Hypergeometric distribution
Mean and variance
$$E(X) = n\,\frac{k}{N}$$
similar to binomial where $E(X) = \mu = np$
$$Var(X) = n\,\frac{k}{N}\left(1 - \frac{k}{N}\right)\frac{N-n}{N-1}$$
similar to binomial where $Var(X) = np(1-p)$, but has a correction factor $\frac{N-n}{N-1}$ for dependent trials

Hypergeometric distribution
• Ex: A warehouse contains 10 printing machines, four of which are defective. Five machines are selected for purchase.
  a) What is the probability that all five of these are nondefective?
  b) What is the probability that the third defective is found in the fifth trial?

Binomial Approximation to Hypergeometric
• If the sampling fraction $n/N$ is small (say < 0.1), then $Hypgeo(N, k, n) \approx Bin(n, k/N)$.
• For example, if we draw a sample of 9 parts from a batch of 1000, we can ignore the dependence caused by selecting without replacement, and assume $p$ is constant at $k/N$ for every draw.
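The hypergeometric pmf and the binomial approximation can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `hypergeom_pmf` and `binom_pmf` are illustrative, and the value $k = 100$ in the approximation check is an assumed number chosen for the demonstration (the slides do not give $k$ for the batch of 1000). The reading of part (b) of the warehouse example as "exactly two defectives in the first four draws, then a defective on the fifth" is one interpretation, not necessarily the intended solution.

```python
from math import comb

def hypergeom_pmf(x, N, k, n):
    """P(X = x) for X ~ Hypgeo(N, k, n): x successes in n draws without replacement."""
    if x < 0 or x > n:
        return 0.0
    # math.comb(a, b) returns 0 when b > a, so impossible values of x get probability 0.
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """P(Y = x) for Y ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Warehouse example, part (a): treat a nondefective machine as a "success".
# N = 10 machines, k = 6 nondefective, n = 5 selected.
p_all_good = hypergeom_pmf(5, N=10, k=6, n=5)   # C(6,5) * C(4,0) / C(10,5) = 6/252

# Part (b), under the interpretation stated above: treat a defective as a "success".
# Exactly 2 defectives in the first 4 draws, then 1 of the 2 remaining defectives
# is drawn from the 6 remaining machines.
p_third_on_fifth = hypergeom_pmf(2, N=10, k=4, n=4) * (4 - 2) / (10 - 4)

# Binomial approximation with sampling fraction n/N = 9/1000.
# k = 100 is an assumed value for illustration only.
N, k, n = 1000, 100, 9
max_gap = max(
    abs(hypergeom_pmf(x, N, k, n) - binom_pmf(x, n, k / N))
    for x in range(n + 1)
)

print(p_all_good, p_third_on_fifth, max_gap)
```

With a small sampling fraction, `max_gap` (the largest pointwise difference between the two pmfs) should be small, which is what justifies treating $p$ as constant at $k/N$.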

Poisson distribution
• The Poisson distribution is used to model the occurrence of rare events.
• It is a limiting distribution for the binomial distribution as $n \to \infty$ and $p \to 0$ such that $np$ remains constant, i.e. we take the limit under the restriction that the mean of the binomial distribution, $np$, remains constant at a value $\lambda$.
• Let $\lambda$ be the average # of occurrences per unit time, e.g. the average # of accidents at a particular highway intersection during a year (rare compared to all the vehicles crossing the intersection).

• Suppose we divide the unit time interval (e.g. one year) into $n$ small subintervals such that:
  1. P(one occurrence in a subinterval) = $p$
  2. P(no occurrence in a subinterval) = $1 - p$
  3. P(more than one occurrence in a subinterval) = 0
• Let
$$X_i = \begin{cases} 1, & \text{if there is an occurrence in subinterval } i \\ 0, & \text{otherwise} \end{cases}$$
• $X = \sum_{i=1}^{n} X_i$ gives the total # of occurrences per unit time interval, e.g. # of accidents per year
• Then, $X \sim Bin(n, p)$ where $E(X) = np = \lambda$ and $p = \lambda/n$

• We want to investigate the behavior of $X$ as $n \to \infty$, $p \to 0$, and $np$ remains constant, i.e. we wish to find
$$\lim_{\substack{n \to \infty \\ p \to 0 \\ np = \lambda}} P(X = x) = \lim_{\substack{n \to \infty \\ p \to 0 \\ np = \lambda}} \binom{n}{x} p^x (1-p)^{n-x}, \qquad \text{where } p = \frac{\lambda}{n}$$

• Substituting $p = \lambda/n$:
$$\lim_{\substack{n \to \infty \\ p \to 0 \\ np = \lambda}} P(X = x) = \lim_{n \to \infty} \frac{n!}{(n-x)!\,x!} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x}$$
$$= \lim_{n \to \infty} \frac{n(n-1)(n-2)\cdots(n-x+1)}{n^x} \cdot \frac{\lambda^x}{x!} \left(1 - \frac{\lambda}{n}\right)^{n-x}$$
  where
$$\frac{n(n-1)(n-2)\cdots(n-x+1)}{n^x} = 1 \cdot \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right) \quad (\text{each term} \to 1 \text{ as } n \to \infty)$$

• Therefore
$$\lim_{\substack{n \to \infty \\ p \to 0 \\ np = \lambda}} P(X = x) = \lim_{n \to \infty} \frac{\lambda^x}{x!} \left(1 - \frac{\lambda}{n}\right)^{n} \cdot \frac{1}{\left(1 - \frac{\lambda}{n}\right)\left(1 - \frac{\lambda}{n}\right)\cdots\left(1 - \frac{\lambda}{n}\right)}$$
  (the denominator has $x$ factors; each term $\to 1$ as $n \to \infty$)
$$= \frac{\lambda^x}{x!} \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = \frac{e^{-\lambda}\lambda^x}{x!} \quad \text{(Poisson pmf)}, \qquad \text{given } \lim_{n \to \infty}\left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}$$

• In general,
$$p(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots$$
$$X \sim Poisson(\lambda)$$
• If $\lambda$ is the average (mean) # of occurrences per unit time, then $p(x)$ is the probability of $x$ occurrences per unit time.
• Validity of $p(x)$:
  i. $p(x) \ge 0, \ \forall x$
  ii. $\sum_{x=0}^{\infty} \frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right) = e^{-\lambda} e^{\lambda} = 1$

[Figure: $p(x)$ and $F(x)$ plotted against $x = 0, \ldots, 20$ for Poisson(1), Poisson(4), and Poisson(10)]

Poisson distribution
Mean and variance
$$E(X) = \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^x}{x!} = \sum_{x=1}^{\infty} x\,\frac{e^{-\lambda}\lambda^x}{x!} = \sum_{x=1}^{\infty} \frac{e^{-\lambda}\lambda^x}{(x-1)!}$$
$$= e^{-\lambda}\lambda \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = e^{-\lambda}\lambda \sum_{y=0}^{\infty} \frac{\lambda^{y}}{y!} = e^{-\lambda}\lambda\, e^{\lambda} = \lambda$$
$$Var(X) = \lambda$$

• Ex: Assume that, during the course of the semester, there are 1000 lectures in the engineering school. Each lecture has a probability $1/10^6$ of having a stranger walk in.
  a) What is the probability that a stranger walks into one lecture in a semester?
  b) In an academic year (ignoring the summer school)? (see the next slide before working on this part)

• What if $\lambda$ is given as the average # of occurrences per unit time, but $X$ is the # of occurrences in $t$ consecutive time units? Then, $X \sim Poisson(\lambda t)$.
• Let the random variables $Y_m \sim Bin(n, p)$ be independent, with large $n$ and small $p$ for each $m$.
• Approximately, $Y_m \sim Poisson(np)$ where $np = \lambda$.
• For $t = 2$:
  • exact distribution: $X = Y_1 + Y_2$, $X \sim Bin(2n, p)$
  • approximate distribution: $X = Y_1 + Y_2$, $X \sim Poisson(2np)$

• Ex: The # of accidents per day occurring on a highway is distributed as Poisson with a mean rate of three accidents per day.
  a) What is the probability that three or more accidents occur today?
  b) What is the probability that at least three accidents occur in two days?
  c) What is the probability that three or more accidents will occur today given that at least one accident occurred today?

• Ex: What is the probability that a machine used in production will break down five times in a year, if its mean time between breakdowns is four months?

Poisson Approximation to Binomial
• For small $p$, large $n$, and relatively constant $\lambda = np$: $Bin(n, p) \approx Poisson(\lambda)$.
• In general, the approximation works well when $n \ge 100$ and $p \le 0.1$ (criteria vary a bit from reference to reference).
• Ex: A process is known to produce 5% defectives. What is the probability that there will be more than 10 defectives in the next batch of 1000 parts?
  $X$: the # of defectives in the next batch of 1000 parts
  Since $p = 0.05$ is small and $n = 1000$ is large, the binomial distribution can be approximated by Poisson with $\lambda = np = 50$.
$$P(X > 10) = 1 - \sum_{x=0}^{10} \frac{e^{-50}\, 50^x}{x!}$$
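
The sum in the defectives example, and the $Poisson(\lambda t)$ scaling used in the highway accidents example, can be evaluated directly. This is a minimal sketch using only the Python standard library; the helper names `poisson_pmf`, `poisson_cdf`, and `binom_cdf` are illustrative and not part of the course material. The exact binomial value is computed alongside the Poisson value only to show how close the approximation is in this case.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """p(x) = e^(-lam) * lam^x / x! for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))

def binom_cdf(x, n, p):
    """P(Y <= x) for Y ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

# Defectives example: n = 1000, p = 0.05, so lam = np = 50.
n, p = 1000, 0.05
lam = n * p
p_poisson = 1 - poisson_cdf(10, lam)   # Poisson approximation to P(X > 10)
p_exact = 1 - binom_cdf(10, n, p)      # exact binomial value, for comparison

# Highway accidents example: mean rate of 3 accidents per day.
rate = 3
p_today = 1 - poisson_cdf(2, rate)           # part (a): P(X >= 3) in one day
p_two_days = 1 - poisson_cdf(2, rate * 2)    # part (b): X ~ Poisson(lam * t), t = 2

print(p_poisson, p_exact, p_today, p_two_days)
```

Because the mean number of defectives is 50, both `p_poisson` and `p_exact` come out extremely close to 1: seeing 10 or fewer defectives would be far below the mean.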