Some Common Discrete Random Variables

Binomial Random Variables

Binomial experiment
• A sequence of n trials (called Bernoulli trials), each of which results in either a “success” or a “failure”.
• The trials are independent, and the probability of success, p, remains the same for each trial.
• Define a random variable Y as the number of successes observed during the n trials.
• What is the probability p(y), for y = 0, 1, …, n?
• How many successes may we expect? E(Y) = ?

Returning Students
• Suppose the retention rate for a school indicates the probability a freshman returns for their sophomore year is 0.65. Among 12 randomly selected freshmen, what is the probability 8 of them return to school next year?
Each student either returns or doesn’t. Think of each selected student as a trial, so n = 12. If we consider “student returns” to be a success, then p = 0.65.

12 trials, 8 successes
• To find the probability of this event, consider the probability for just one sample point in the event.
• For example, the probability the first 8 students return and the last 4 don’t.
• Since the trials are independent, we just multiply the probabilities (here $R_i$ is the event that student i returns):
$$P((S,S,S,S,S,S,S,S,F,F,F,F)) = P(R_1 \cap R_2 \cap \cdots \cap R_8 \cap \overline{R_9} \cap \overline{R_{10}} \cap \overline{R_{11}} \cap \overline{R_{12}})$$
$$= P(R_1)\,P(R_2) \cdots P(R_8)\,P(\overline{R_9}) \cdots P(\overline{R_{12}}) = (0.65)^8 (1 - 0.65)^4$$

12 trials, 8 successes
• For the probability of this event, we sum the probabilities for each sample point in the event.
• How many sample points are in this event?
• How many ways can 8 successes and 4 failures occur? $C^{12}_{8}\, C^{4}_{4}$, or simply $C^{12}_{8}$.
• Each of these sample points has the same probability.
• Hence, summing these probabilities yields
$$P(8 \text{ successes in } 12 \text{ trials}) = C^{12}_{8}\,(0.65)^8 (0.35)^4 \approx 0.237$$

Binomial Probability Function
• A random variable has a binomial distribution with parameters n and p if its probability function is given by
$$p(y) = C^{n}_{y}\, p^y (1-p)^{n-y}, \quad y = 0, 1, \ldots, n$$

Rats!
• In a research study, rats are injected with a drug.
The probability that a rat will die from the drug before the experiment is over is 0.16. Ten rats are injected with the drug. What is the probability that at least 8 will survive? Would you be surprised if at least 5 died during the experiment?

Quality Control
• For parts machined by a particular lathe, on average, 95% of the parts are within the acceptable tolerance.
• If 20 parts are checked, what is the probability that at least 18 are acceptable?
• If 20 parts are checked, what is the probability that at most 18 are acceptable?

Binomial Theorem
• As we saw in our Discrete class, the Binomial Theorem allows us to expand
$$(p + q)^n = \sum_{y=0}^{n} C^{n}_{y}\, p^y q^{n-y}$$
• As a result, summing the binomial probabilities, where q = 1 − p is the probability of a failure,
$$\sum_{y=0}^{n} P(Y = y) = \sum_{y=0}^{n} C^{n}_{y}\, p^y (1-p)^{n-y} = (p + (1-p))^n = 1$$

Mean and Variance
• If Y is a binomial random variable with parameters n and p, the expected value and variance for Y are given by
$$E(Y) = np \quad \text{and} \quad V(Y) = np(1-p)$$

Rats!
• In a research study, rats are injected with a drug. The probability that a rat will die from the drug before the experiment is over is 0.16. Ten rats are injected with the drug.
• How many of the rats are expected to survive?
• Find the variance for the number of survivors.

Geometric Random Variables

Your 1st Success
• Similar to the binomial experiment, we consider:
• A sequence of independent Bernoulli trials.
• The probability of “success” equals p on each trial.
• Define a random variable Y as the number of the trial on which the 1st success occurs. (Stop the trials after the first success occurs.)
• What is the probability p(y), for y = 1, 2, …?
• On which trial is the first success expected?

• Consider the values of Y (S = success, F = failure):
y = 1: (S)
y = 2: (F, S)
y = 3: (F, F, S)
y = 4: (F, F, F, S)
and so on…
[Figure: tree diagram of the outcomes (S), (F, S), (F, F, S), (F, F, F, S)]
• The corresponding probabilities, with q = 1 − p:
$p(1) = p$
$p(2) = q\,p$
$p(3) = q^2 p$
$p(4) = q^3 p$, and so on.

Geometric Probability Function
• A random variable has a geometric distribution with parameter p if its probability function is given by
$$p(y) = q^{y-1} p, \quad \text{where } q = 1 - p, \text{ for } y = 1, 2, \ldots$$

Success?
• Of course, you need to be clear on what you consider a “success”.
• For example, the 1st success might mean finding the 1st defective item! With D = defective and G = not defective, the outcomes are (D), (G, D), (G, G, D), …

Geometric Mean, Variance
• If Y is a geometric random variable with parameter p, the expected value and variance for Y are given by
$$E(Y) = \frac{1}{p} \quad \text{and} \quad V(Y) = \frac{1-p}{p^2}$$

At least ‘a’ trials? (#3.55)
• For a geometric random variable and a positive integer a, show $P(Y > a) = q^a$.
• Consider
$$P(Y > a) = 1 - P(Y \le a) = 1 - p\,(1 + q + q^2 + \cdots + q^{a-1}) = 1 - p \cdot \frac{1 - q^a}{1 - q} = q^a,$$
based on the sum of a geometric series and the fact that 1 − q = p.

“Memoryless Property”
• For the geometric distribution, $P(Y > a + b \mid Y > a) = q^b = P(Y > b)$.
• “At least 5 more trials?” We note $P(Y > 7 \mid Y > 2) = q^5 = P(Y > 5)$. That is, “knowing the first two trials were failures, the probability a success won’t occur on the next 5 trials” is identical to “just starting the trials and a success won’t occur on the first 5 trials”.

Negative Binomial Distribution
• Again, considering independent Bernoulli trials with probability of “success” p on each trial…
• Instead of watching for the 1st success, let Y be the number of the trial on which the rth success occurs. (Stop the trials after the rth success occurs.)
• For a given value r, the probability p(y) is
$$p(y) = C_{y-1,\,r-1}\; p^r (1-p)^{y-r}, \quad y = r, r+1, \ldots$$
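The geometric and negative binomial probability functions above can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names and the value p = 0.3 are illustrative assumptions. It checks that P(Y > a) = q^a, that the memoryless property holds for a = 2 and b = 5, and that the negative binomial with r = 1 reduces to the geometric distribution.

```python
from math import comb

def geometric_pmf(y, p):
    """p(y) = q^(y-1) * p for y = 1, 2, ... (trial of the 1st success)."""
    return (1 - p) ** (y - 1) * p if y >= 1 else 0.0

def negative_binomial_pmf(y, r, p):
    """p(y) = C(y-1, r-1) * p^r * (1-p)^(y-r) for y = r, r+1, ..."""
    if y < r:
        return 0.0
    return comb(y - 1, r - 1) * p**r * (1 - p) ** (y - r)

def geometric_tail(a, p):
    """P(Y > a), computed as 1 - P(Y <= a) by summing the pmf."""
    return 1 - sum(geometric_pmf(y, p) for y in range(1, a + 1))

p = 0.3      # illustrative success probability (an assumption)
q = 1 - p

# P(Y > a) = q^a, here with a = 5
tail_matches = abs(geometric_tail(5, p) - q**5) < 1e-12

# Memoryless property: P(Y > 7 | Y > 2) = P(Y > 5)
conditional = geometric_tail(7, p) / geometric_tail(2, p)
memoryless_holds = abs(conditional - geometric_tail(5, p)) < 1e-12

# With r = 1 the negative binomial is the geometric distribution
reduces_to_geometric = all(
    abs(negative_binomial_pmf(y, 1, p) - geometric_pmf(y, p)) < 1e-12
    for y in range(1, 20)
)

print(tail_matches, memoryless_holds, reduces_to_geometric)
```

Summing the probability function rather than using the closed form keeps the check independent of the result being verified.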
Negative Binomial
• To determine the probability the 4th success occurs on the 7th trial, we compute
$$p(7) = C_{6,3}\; p^4 (1-p)^3$$
• Note this is actually just the binomial probability of 3 successes during the first 6 trials, followed by one more success (the 4th success, on the last trial):
$$p(7) = \left[ C_{6,3}\; p^3 (1-p)^3 \right] \cdot p$$

Negative Binomial
• For the negative binomial distribution, we have
$$E(Y) = \frac{r}{p} \quad \text{and} \quad V(Y) = \frac{r(1-p)}{p^2}$$
• For example, if a success occurs 10% of the time (i.e., p = 0.1), then to find the 4th success, we expect to require 40 trials on average:
$$E(Y) = \frac{4}{0.1} = 40$$
Intuitively, wouldn’t you expect 40 trials?

Poisson Random Variables

Number of occurrences
• Let Y represent the number of occurrences of an event in an interval of size s.
• Here we may be referring to an interval of time, distance, space, etc.
• For example, we may be interested in the number of customers Y arriving during a given time interval.
• We call Y a Poisson random variable.

Poisson R. V.
• A random variable has a Poisson distribution with parameter $\lambda$ if its probability function is given by
$$p(y) = \frac{\lambda^y}{y!}\, e^{-\lambda}, \quad \text{where } y = 0, 1, 2, \ldots$$
We’ll see that $\lambda$ is the “average rate” at which the events occur. That is, $E(Y) = \lambda$.

Queries
• If the number of database queries processed by a computer in a time interval is a Poisson random variable with an average of 6 queries per minute, find the probability that 4 queries occur in a one-minute interval.
$$p(4) = \frac{6^4 e^{-6}}{4!} \approx 0.13385$$

Fewer Queries
• As before, for the Poisson random variable with an average of 6 queries per minute…
• find the probability there are fewer than 6 queries in a one-minute interval:
$$P(Y < 6) = P(Y \le 5) = \text{poissoncdf}(6, 5) \approx 0.44568$$

Some Poisson Variables
• Number of incoming telephone calls to a switchboard within a given time interval;
• Number of errors (incorrect bits) received by a modem during a given time interval;
• Number of chocolate chips in one of Dr.
Vestal’s chocolate chip cookies;
• Number of claims processed by a particular insurance company on a single day;
• Number of white blood cells in a drop of blood;
• Number of dead deer along a mile of highway.

Poisson mean, variance
• If Y is a Poisson random variable with parameter $\lambda$, the expected value and variance for Y are given by
$$E(Y) = \lambda \quad \text{and} \quad V(Y) = \lambda$$

Hypergeometric Random Variables

Sampling without replacement
• When sampling with replacement, each trial remains independent. For example,…
• If balls are replaced, P(red ball on 2nd draw) = P(red ball on 2nd draw | first ball was red).
• If balls are not replaced, then given the first ball is red, there is less chance of a red ball on the 2nd draw. For a large population of balls, though, the effect may be minimal.

n trials, y red balls
• Suppose there are r red balls, and N − r other balls.
• Consider Y, the number of red balls in n selections, where now the trials may be dependent (for sampling without replacement, when the sample size is significant relative to the population).
• The probability that y of the n selected balls are red is
$$p(y) = \frac{C^{r}_{y}\; C^{N-r}_{n-y}}{C^{N}_{n}}$$

Hypergeometric R. V.
• A random variable has a hypergeometric distribution with parameters N, n, and r if its probability function is given by
$$p(y) = \frac{C^{r}_{y}\; C^{N-r}_{n-y}}{C^{N}_{n}}, \quad \text{where } 0 \le y \le \min(n, r)$$

Hypergeometric mean, variance
• If Y is a hypergeometric random variable with parameters N, n, and r, the expected value and variance for Y are given by
$$E(Y) = \frac{nr}{N} \quad \text{and} \quad V(Y) = n \left(\frac{r}{N}\right) \left(\frac{N-r}{N}\right) \left(\frac{N-n}{N-1}\right)$$

Sample of 20
Suppose among a supply of 5000 parts produced during a given week, there are 100 that don’t meet the required quality standard. Twenty of the parts are randomly selected and checked to see if they meet the standard. Let Y be the number in the sample that don’t meet the standard.
a) Compute the probability exactly 2 of the sampled parts fail to meet the quality standard.
b) Determine the mean, E(Y).
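
One way to set up the “Sample of 20” computation is sketched below in Python. This is an illustrative sketch rather than the slides’ own method; the function name `hypergeometric_pmf` is an assumption, and the parameter values N = 5000, r = 100, n = 20 are read off the problem statement. It evaluates the hypergeometric probability function directly and compares the formula E(Y) = nr/N with the mean obtained by summing y · p(y).

```python
from math import comb

def hypergeometric_pmf(y, N, n, r):
    """p(y) = C(r, y) * C(N - r, n - y) / C(N, n)."""
    # Outside the support the probability is zero.
    if y < 0 or y > min(n, r) or n - y > N - r:
        return 0.0
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

# Values taken from the "Sample of 20" problem statement.
N, n, r = 5000, 20, 100

# (a) probability that exactly 2 sampled parts fail the standard
prob_two = hypergeometric_pmf(2, N, n, r)

# (b) mean from the formula, plus the variance formula for reference
mean = n * r / N
variance = n * (r / N) * ((N - r) / N) * ((N - n) / (N - 1))

# Cross-check: the mean computed directly from the probability function
mean_from_pmf = sum(y * hypergeometric_pmf(y, N, n, r) for y in range(n + 1))

print(f"P(Y = 2) = {prob_two:.5f}")
print(f"E(Y) = {mean}, from pmf: {mean_from_pmf:.6f}")
print(f"V(Y) = {variance:.6f}")
```

Because the sample (20) is small relative to the population (5000), the result for part (a) should be close to the binomial probability with n = 20 and p = 100/5000 = 0.02, which matches the earlier remark that the effect of not replacing is minimal for a large population.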