Hypergeometric Random Variables

Sampling without replacement
• When sampling with replacement, each trial remains independent. For example, …
• If balls are replaced, P(red ball on 2nd draw) = P(red ball on 2nd draw | first ball was red).
• If balls are not replaced, then given that the first ball is red, a red ball is less likely on the 2nd draw, though for a large population of balls the effect may be minimal.

n trials, y red balls
• Suppose there are r red balls and N − r other balls.
• Consider Y, the number of red balls in n selections, where now the trials may be dependent (sampling without replacement, when the sample size is significant relative to the population).
• The probability that y of the n selected balls are red is
$$p(y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}$$

Hypergeometric R. V.
• A random variable has a hypergeometric distribution with parameters N, n, and r if its probability function is given by
$$p(y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}$$
where y is an integer with 0 ≤ y ≤ min(n, r) and n − y ≤ N − r.

Hypergeometric mean, variance
• If Y is a hypergeometric random variable with parameters N, n, and r, the expected value and variance of Y are given by
$$E(Y) = \frac{nr}{N} \quad\text{and}\quad V(Y) = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right)$$
(The proof is not as easy as for the previous distributions and is not given at this time.)

Sounds like…
• If we let p = r/N and q = 1 − p = (N − r)/N, then the hypergeometric measures
$$E(Y) = \frac{nr}{N} = np \quad\text{and}\quad V(Y) = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right) = npq\left(\frac{N-n}{N-1}\right)$$
look quite similar to the expressions for the binomial distribution, E(Y) = np and V(Y) = npq.

Rule of Thumb
• For cases when n / N < 0.05, it may be reasonable to approximate the hypergeometric probabilities using a binomial distribution.
• Suppose each hour, 1000 bottles are filled by a machine and on average 10% are “underfilled”.
• Each hour, 20 of the bottles are randomly selected. Find the probability that at least 3 of the 20 are underfilled.
• Since 20/1000 = 0.02, perhaps we could use the binomial distribution to approximate the answer.

Easy binomial probability?
• Let p = 0.10, the probability of “success” (a bottle being underfilled).
• P(at least 3 underfilled) = 1 − P(0, 1, or 2 underfilled) = 1 − [P(Y = 0) + P(Y = 1) + P(Y = 2)]
• This is approximately equal to 1 − binomialcdf(20, 0.10, 2) = 0.32307. How close is this to the actual hypergeometric probability?

A hypergeometric probability
• Taking r = 100 of the N = 1000 bottles as underfilled, P(at least 3 underfilled) = 1 − P(0, 1, or 2 underfilled) = 1 − [P(Y = 0) + P(Y = 1) + P(Y = 2)]
$$= 1 - \left[\frac{\binom{100}{0}\binom{900}{20}}{\binom{1000}{20}} + \frac{\binom{100}{1}\binom{900}{19}}{\binom{1000}{20}} + \frac{\binom{100}{2}\binom{900}{18}}{\binom{1000}{20}}\right] \approx 0.3228$$
• Compare this with 0.32307 using the binomial approximation.

The Binomial Approximation
[Figure: the hypergeometric distribution and a very similar binomial distribution]

As population increases
• Let N get large while n and p = r/N remain constant, and we would see that
$$\lim_{N \to \infty} \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}} = \binom{n}{y} p^{y} q^{n-y}$$
• Hypergeometric probabilities converge to the binomial probabilities, as the events become “almost independent”.

Proof?
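Short of a proof, the claims above can at least be checked numerically. The following is a minimal sketch in Python using only the standard library. The function names `hypergeom_pmf` and `binom_pmf` are illustrative and do not come from the slides, and the bottle example is assumed to have exactly r = 100 underfilled bottles among N = 1000. The sketch recomputes both tail probabilities, compares the mean and variance formulas with direct sums, and shows the largest gap between the two probability functions shrinking as N grows with n and p fixed.

```python
from math import comb


def hypergeom_pmf(y, N, n, r):
    """P(Y = y): y red balls in n draws without replacement from N balls, r red."""
    # Outside the support the probability is zero.
    if y < max(0, n - (N - r)) or y > min(n, r):
        return 0.0
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)


def binom_pmf(y, n, p):
    """P(Y = y) for a binomial random variable with n trials, success probability p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


# Bottle example (assumption: exactly 100 of the 1000 bottles are underfilled).
N, n, r = 1000, 20, 100
p = r / N

hyper_tail = 1 - sum(hypergeom_pmf(y, N, n, r) for y in range(3))
binom_tail = 1 - sum(binom_pmf(y, n, p) for y in range(3))
print(f"hypergeometric P(Y >= 3): {hyper_tail:.4f}")  # slides report 0.3228
print(f"binomial       P(Y >= 3): {binom_tail:.5f}")  # slides report 0.32307

# Mean and variance from the definition, next to the closed-form expressions.
mean = sum(y * hypergeom_pmf(y, N, n, r) for y in range(n + 1))
var = sum((y - mean) ** 2 * hypergeom_pmf(y, N, n, r) for y in range(n + 1))
print(f"E(Y): {mean:.6f} vs nr/N = {n * r / N:.6f}")
print(f"V(Y): {var:.6f} vs npq(N-n)/(N-1) = {n * p * (1 - p) * (N - n) / (N - 1):.6f}")

# Convergence: hold n = 20 and p = 0.10 fixed and let N grow.
gaps = []
for big_N in (100, 1000, 10_000, 100_000):
    big_r = big_N // 10  # keeps p = r/N = 0.10
    gap = max(
        abs(hypergeom_pmf(y, big_N, n, big_r) - binom_pmf(y, n, 0.10))
        for y in range(n + 1)
    )
    gaps.append(gap)
    print(f"N = {big_N:>6}: largest |hypergeometric - binomial| = {gap:.6f}")
```

Exact integer binomial coefficients from `math.comb` are used rather than factorials of floats, so the only rounding happens in the final division.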