Unit 4
CHAPTER 17: PROBABILITY MODELS
AP Statistics

BERNOULLI TRIALS
The basis for the probability models we will examine in this chapter is the Bernoulli (Ber-Noo-Lee) trial.
We have Bernoulli trials if:
- there are two possible outcomes (success and failure).
- the probability of success, p, is constant.
- the trials are independent.

THE GEOMETRIC MODEL
A single Bernoulli trial is usually not all that interesting.
A Geometric probability model tells us the probability for a random variable that counts the number of Bernoulli trials until the first success.
Geometric models are completely specified by one parameter, p, the probability of success, and are denoted Geom(p).

THE GEOMETRIC SETTING
- Each observation is in one of two categories: success or failure.
- The probability of success is the same for each observation.
- Observations are independent. (Knowing the result of one observation tells you nothing about the other observations.)
- The variable of interest is the number of trials required to obtain the first success.

THE GEOMETRIC MODEL EXAMPLE
A new sales gimmick is to sell bags of candy that have 30% of M&M’s covered with speckles. These “groovy” candies are mixed randomly with the normal candies as they are put into the bags for distribution and sale. You buy a bag and remove candies one at a time looking for the speckles.
Is a geometric probability model appropriate here?

THE GEOMETRIC MODEL EXAMPLE (CONT.)
What’s the probability that the first speckled one we see is the fourth candy we get?
Note that the skills to answer this question come from the very first day of the probability unit.

THE GEOMETRIC MODEL EXAMPLE (CONT.)
What’s the probability that the first speckled one is the tenth one?
Write a general formula.
What’s the probability that the first speckled candy is one of the first three we look at?
How many do we expect to have to check, on average, to find a speckled one?

THE GEOMETRIC MODEL FORMULAS
Geometric probability model for Bernoulli trials: Geom(p)
p = probability of success
q = 1 – p = probability of failure
X = number of trials until the first success occurs
$P(X = x) = q^{x-1} p$
$E(X) = \mu = \frac{1}{p}$
$\sigma = \sqrt{\frac{q}{p^2}}$

THE GEOMETRIC MODEL EXAMPLE 2
Postini is a global company specializing in communications security. The company monitors over 1 billion Internet messages per day and recently reported that 91% of emails are spam. Let’s assume that your emails are typical: 91% spam. We’ll also assume that you aren’t using a spam filter, so every message goes to your inbox. And, since spam comes from many different sources, we’ll consider your messages to be independent.
Overnight your inbox collects email. When you first check your email the next day, about how many spam emails should you expect to have to wade through and discard before you find a real message?
What’s the probability that the 4th message in your inbox is the first one that isn’t spam?

INDEPENDENCE
One of the important requirements for Bernoulli trials is that the trials be independent.
When we don’t have an infinite population, the trials are not independent. But there is a rule that allows us to pretend we have independent trials:
The 10% condition: Bernoulli trials must be independent. If that assumption is violated, it is still okay to proceed as long as the sample is smaller than 10% of the population.

THE GEOMETRIC MODEL EXAMPLE 3
People with O-negative blood are “universal donors.” Only about 6% of people have O-negative blood.
1. If donors line up at random for a blood drive, how many do you expect to examine before you find someone who has O-negative blood?
2. What’s the probability that the first O-negative donor found is one of the four people in line?
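
The geometric formulas above can be checked with a short Python sketch. This is only an illustration, not part of the original slides: the function names (geom_pmf, geom_cdf, geom_mean, geom_sd) are hypothetical helpers, and the inputs are the values of p from the three examples (0.30 speckled candies, 0.09 real emails, 0.06 O-negative donors).

```python
import math

# Hypothetical helper names, chosen for illustration only.

def geom_pmf(p, x):
    """P(X = x) = q^(x-1) * p: the first success occurs on trial x."""
    q = 1 - p
    return q ** (x - 1) * p

def geom_cdf(p, x):
    """P(X <= x) = 1 - q^x: the first success occurs on or before trial x."""
    q = 1 - p
    return 1 - q ** x

def geom_mean(p):
    """E(X) = 1 / p: expected number of trials until the first success."""
    return 1 / p

def geom_sd(p):
    """SD(X) = sqrt(q / p^2)."""
    q = 1 - p
    return math.sqrt(q / p ** 2)

if __name__ == "__main__":
    # Candy example, p = 0.30
    print("First speckled candy is the 4th:", geom_pmf(0.30, 4))
    print("First speckled candy is the 10th:", geom_pmf(0.30, 10))
    print("First speckled candy within the first 3:", geom_cdf(0.30, 3))
    print("Expected candies checked:", geom_mean(0.30))

    # Spam example, success = real message, p = 0.09
    print("Expected messages until a real one:", geom_mean(0.09))
    print("4th message is the first real one:", geom_pmf(0.09, 4))

    # Blood drive example, p = 0.06
    print("Expected donors examined:", geom_mean(0.06))
    print("First O-negative donor within 4 people:", geom_cdf(0.06, 4))
```

The cumulative version uses the complement: the first success comes on or before trial x unless the first x trials are all failures, which has probability q^x.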

GEOMETRIC PROBABILITIES USING CALCULATOR
2nd DISTR geometpdf(
- Note the pdf for Probability Density Function
- Used to find any individual outcome
- Format: geometpdf(p,x)
2nd DISTR geometcdf(
- Note the cdf for Cumulative Distribution Function
- Used to find the probability that the first success occurs on or before the xth trial
- Format: geometcdf(p,x)
Try the last example using the calculator! Much easier…

PERMUTATIONS VS. COMBINATIONS
Permutations: the number of ways r items can be selected from n available items (without replacement) when the order matters.
$_nP_r = \frac{n!}{(n - r)!}$
Calculate the following permutations:
$_{10}P_3$, $_7P_2$, $_{24}P_3$, $_{15}P_3$

PERMUTATIONS VS. COMBINATIONS (CONT.)
Example: Forty-three sprinters race in a 5K. How many ways can they finish first, second, and third?

PERMUTATIONS VS. COMBINATIONS (CONT.)
You are picking 3 different flavors to put on your banana split. You can choose from 25 different flavors. How many ways can this be done?
Does the order matter here?

PERMUTATIONS VS. COMBINATIONS (CONT.)
Combination Rule: the number of ways (combinations) r items can be selected from n different items when order does not matter.
$_nC_r = \frac{n!}{(n - r)!\, r!}$
RECAP: When different orderings of the same items are counted separately, we have a permutation problem, but when different orderings of the same items are not counted separately, we have a combination problem.
Calculate the following combinations:
$_{16}C_4$, $_{12}C_2$, $_{25}C_3$, $_{10}C_5$

PERMUTATIONS VS. COMBINATIONS (CONT.)
Example: You are picking 3 different flavors to put on your banana split. You can choose from 25 different flavors. How many ways can this be done?
Example: You want to buy three different CDs from a selection of 5 CDs. How many ways can you make your selection?

THE BINOMIAL MODEL
Day 2
The geometric model counts the number of trials until the first success.
A Binomial model tells us the probability for a random variable that counts the number of successes in a fixed number of Bernoulli trials.
Two parameters define the Binomial model: n, the number of trials; and p, the probability of success. We denote this Binom(n, p).

THE BINOMIAL MODEL (CONT.)
In n trials, there are
$_nC_k = \frac{n!}{k!\,(n - k)!}$
ways to have k successes.
Read $_nC_k$ as “n choose k.”
Note: $n! = n \times (n - 1) \times \dots \times 2 \times 1$. We’re not overly excited about n; n! is read as “n factorial.”

THE BINOMIAL MODEL (CONT.)
Binomial probability model for Bernoulli trials: Binom(n, p)
n = number of trials
p = probability of success
q = 1 – p = probability of failure
X = number of successes in n trials
$P(X = x) = {}_nC_x\, p^x q^{n-x}$
$\mu = np$
$\sigma = \sqrt{npq}$

BINOMIAL MODEL EXAMPLE
Recap: The communications monitoring company has reported that 91% of email messages are spam. Suppose your inbox contains 25 messages.
What are the mean and standard deviation of the number of real messages you should expect to find in your inbox?
What is the probability that you will find only 1 or 2 real messages?

BINOMIAL PROBABILITY ON CALCULATOR
2nd DISTR binompdf(
- Note the pdf for Probability Density Function
- Used to find any individual outcome
- Format: binompdf(n,p,x)
2nd DISTR binomcdf(
- Note the cdf for Cumulative Distribution Function
- Used for getting x or fewer successes among n trials
- Format: binomcdf(n,p,x)
Note: binomcdf only gives the probability of x or fewer successes. To find the probability of more than x successes, use the complement rule: all possible probabilities in the model add up to 1.

BINOMIAL MODEL EXAMPLE 2
20 donors come to a blood drive. Recall that 6% of people are “universal donors.”
What are the mean and standard deviation of the number of universal donors among them?
What is the probability that there are 2 or 3 universal donors?

THE NORMAL MODEL TO THE RESCUE!
When dealing with a large number of trials in a Binomial situation, making direct calculations of the probabilities becomes tedious (or outright impossible).
Fortunately, the Normal model comes to the rescue…

THE NORMAL MODEL TO THE RESCUE! (CONT.)
As long as the Success/Failure Condition holds, we can use the Normal model to approximate Binomial probabilities.
Success/Failure Condition: A Binomial model is approximately Normal if we expect at least 10 successes and 10 failures:
$np \ge 10$ and $nq \ge 10$

NORMAL MODEL EXAMPLE
Recall that the communications monitoring company Postini has reported that 91% of email messages are spam. Recently, you installed a spam filter. You observe that over the past week it okayed only 151 of 1422 emails you received, classifying the rest as junk. Should you worry the filtering is too aggressive?
What’s the probability that no more than 151 of 1422 emails are real messages?

CONTINUOUS RANDOM VARIABLES
When we use the Normal model to approximate the Binomial model, we are using a continuous random variable to approximate a discrete random variable.
So, when we use the Normal model, we no longer calculate the probability that the random variable equals a particular value, but only that it lies between two values.

WHAT CAN GO WRONG?
- Be sure you have Bernoulli trials. You need two outcomes per trial, a constant probability of success, and independence. Remember that the 10% Condition provides a reasonable substitute for independence.
- Don’t confuse Geometric and Binomial models.
- Don’t use the Normal approximation with small n. You need at least 10 successes and 10 failures to use the Normal approximation.

RECAP
Bernoulli trials show up in lots of places.
Depending on the random variable of interest, we might be dealing with a
- Geometric model
- Binomial model
- Normal model

RECAP (CONT.)
- Geometric model: When we’re interested in the number of Bernoulli trials until the next success.
- Binomial model: When we’re interested in the number of successes in a certain number of Bernoulli trials.
- Normal model: To approximate a Binomial model when we expect at least 10 successes and 10 failures.

ASSIGNMENTS: PP. 401 – 404
Day 1: # 1, 3, 9 – 15 ODD
Day 2: # 2, 5, 10, 12, 17, 19, 21, 29, 32
Day 3: # 14 – 22 EVEN, 23, 25, 27, 37
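
For checking answers, the counting rules, the Binomial formulas, and the Normal approximation from this chapter can be sketched in Python. This is a minimal illustration under stated assumptions, not the calculator's method: the helper names (binom_pmf, binom_cdf, binom_mean_sd, success_failure_ok, normal_approx_cdf) are hypothetical, the direct formula is only suitable for moderate n, and the Normal approximation here applies no continuity correction.

```python
import math

# Hypothetical helper names, chosen for illustration only.

def binom_pmf(n, p, x):
    """P(X = x) = nCx * p^x * q^(n-x) for X ~ Binom(n, p)."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

def binom_cdf(n, p, x):
    """P(X <= x): probability of x or fewer successes in n trials."""
    return sum(binom_pmf(n, p, k) for k in range(x + 1))

def binom_mean_sd(n, p):
    """Return (mu, sigma) = (np, sqrt(npq))."""
    q = 1 - p
    return n * p, math.sqrt(n * p * q)

def success_failure_ok(n, p):
    """Success/Failure Condition: np >= 10 and nq >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def normal_approx_cdf(n, p, x):
    """Approximate P(X <= x) with a Normal model N(np, sqrt(npq)).

    Assumption: no continuity correction is applied.
    """
    mu, sigma = binom_mean_sd(n, p)
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

if __name__ == "__main__":
    # Counting: order matters (permutation) vs. order does not (combination)
    print("43 sprinters, top 3 finishes:", math.perm(43, 3))
    print("3 flavors chosen from 25:", math.comb(25, 3))
    print("3 CDs chosen from 5:", math.comb(5, 3))

    # Inbox of 25 messages, success = real message, p = 0.09
    print("Mean, SD of real messages:", binom_mean_sd(25, 0.09))
    print("P(1 or 2 real):", binom_pmf(25, 0.09, 1) + binom_pmf(25, 0.09, 2))

    # 20 donors, p = 0.06
    print("Mean, SD of universal donors:", binom_mean_sd(20, 0.06))
    print("P(2 or 3 donors):", binom_pmf(20, 0.06, 2) + binom_pmf(20, 0.06, 3))

    # Spam filter: at most 151 real messages out of 1422, p = 0.09
    print("Success/Failure Condition met:", success_failure_ok(1422, 0.09))
    print("Normal approximation:", normal_approx_cdf(1422, 0.09, 151))
    print("Exact Binomial sum:", binom_cdf(1422, 0.09, 151))
```

Comparing the last two printed values shows how close the Normal model comes to the exact Binomial sum when the Success/Failure Condition holds.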