Chapter 16: Random Variables

A ___________________________ is a numerical variable whose value depends on the outcome of a chance experiment.

A random variable is ____________ if its set of possible values is a collection of isolated points on the number line.

A random variable is _______________ if its set of possible values includes an entire interval on the number line.

Example: shoe size vs. foot length
Example: age vs. number of birthdays

A ________________________________, or ___________________________, of a discrete random variable X lists the possible values of X and their probabilities. This is usually done in a table, probability histogram, or formula.

Let X = number of heads in two flips of a coin. Then, the probability distribution of X is:

table:

histogram:

The probabilities in a distribution give information about the _______________ behavior of the variable. That is, probabilities are __________________________________. For example, if P(X = 2) = .25, then after many observations of X, the value X = 2 will occur about 25% of the time, on average.

The ______ of a random variable X, denoted \(\mu_x\), describes where the probability distribution of X is centered.

Let X = number of license plates on a randomly selected car. The probability distribution is:

X       0     1     2
P(X)   .05   .25   .7

Thus, to find the mean of a discrete random variable X, multiply each value of X by its probability and add the products together.

\[ \mu_x = \sum x \cdot P(x) \]

The term ______________________, E(x), is sometimes used in place of mean value, \(\mu_x\). The expected (mean) value will not necessarily be equal to one of the values in the distribution. Instead, we interpret the expected value as a long-term average. For example, if we were to select many cars, we would expect there to be 1.65 license plates per car, on average.

Suppose I ask you to play a betting game with me. You bet $5 and if you draw an ace of hearts, I will pay you $100.
If you get any other ace, I will pay you $10 and if you get any other heart, I will give you your money back. If you get any other card, I keep your money.

1. Create a probability model for the amount you will win.
2. Should you play this game?

HW #1: Read 368-370, Problems 16.1-16.7 (Note: answer to 5a is wrong)

Chapter 16: SD of a Random Variable

Now that we can find out where the distribution of a random variable is centered, we would also like to know how far the values are from the center, on average. This, of course, is the ___________________.

Using the license plate distribution from yesterday, if we had a sample of 100 cars, we would expect 5 0’s, 25 1’s, and 70 2’s.

Note: We divide by n instead of n - 1 and use the symbol \(\sigma_x\) instead of \(s_x\) since we are not calculating the standard deviation of a sample. There is no uncertainty about the values in the distribution or their probabilities.

Note: It is calculated in much the same way as the SD of a data set, except that we weight each value with its probability, since the values that show up more often should have more influence in the calculation.

Thus, the standard deviation of a discrete random variable is:

\[ \sigma_x = \sqrt{\sum (x - \mu_x)^2 \, P(x)} \]

The ______________ of a discrete random variable is the square of the standard deviation:

\[ \sigma_x^2 = \sum (x - \mu_x)^2 \, P(x) \]

Find and interpret the SD for the betting game from yesterday’s notes.

In a certain class, there are 15 girls and 10 boys. Suppose that you are randomly selecting 2 students for a committee. Create a probability model for the number of girls in the sample. Then, calculate and interpret the mean and SD of the number of girls.

HW #2: Read 370-372, Problems 16.15-21 odd

Chapter 16: Transforming + Combining RVs

Now that we know how to find the mean and standard deviation of a random variable, we would also like to know the mean and standard deviation of a function of that random variable.

If x = the length of a taxi ride in miles, suppose that \(\mu_x\) = 5.2 miles and \(\sigma_x\) = 2.8 miles.
Two taxi companies have different fare schedules: company 1 charges $2.50 per mile and company 2 charges $2 per mile plus an initial fee of $5. Thus, if we are interested in the total fare of a taxi ride (y), the functions are:

How can we find the mean and standard deviation for y in each case?

Recall from chapter 4 that adding a constant to every value in a data set changes the measures of position (mean, median, min, max, Q1, Q3, etc.) BUT NOT the measures of spread (standard deviation, range, IQR). However, multiplying every value in a data set by a constant changes both the measures of position AND the measures of spread.

In addition to paying the fare, the passenger usually gives the driver a tip (t). Assume that \(\mu_t\) = 1.50, \(\sigma_t\) = .70, and that the tip does not depend on the fare (that is, t and y are independent). What are the mean and standard deviation of the new variable c, the total cost of a trip, including tip?

The mean and variance of a combination of random variables:

If x and y are random variables with means \(\mu_x\) and \(\mu_y\) and standard deviations \(\sigma_x\) and \(\sigma_y\), then:

\[ \mu_{x+y} = \mu_x + \mu_y \quad\text{and}\quad \mu_{x-y} = \mu_x - \mu_y \]

Note: the mean formulas work regardless of whether x and y are independent.

\[ \sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 \quad\text{and}\quad \sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 \]

thus

\[ \sigma_{x+y} = \sqrt{\sigma_x^2 + \sigma_y^2} \quad\text{and}\quad \sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2} \]

Note: the variance (SD) formulas work ONLY when x and y are independent.
Note: Always remember that the variances add, not the standard deviations.
Note: Even when we are subtracting the random variables, we add their variances. More variables always means more variability!
Note: All 4 of these rules work for combining more than 2 random variables as well.

Find \(\mu_c\) and \(\sigma_c\) for company 2.

HW #3: Read 373-375, Problems 16.4, 18, 23, 24, 27, 31

Chapter 16: Combining Normal Random Variables

Any linear combination of normally distributed random variables is also normally distributed. If x and y are normally distributed then (x + y), (x - y), (2x + 3y), etc. are also normally distributed.
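As a rough numerical check of the rules for transforming and combining random variables, here is a minimal Python sketch. The helper name `combine` is hypothetical (it is not part of these notes or of any library), and it assumes the two variables are independent, as the variance rule requires. It is applied to company 1's fare schedule (y = 2.50x) plus the tip.

```python
from math import sqrt

def combine(a, mean_x, sd_x, b, mean_y, sd_y, constant=0.0):
    """Mean and SD of a*x + b*y + constant, assuming x and y are independent."""
    # Means combine linearly; the added constant shifts the mean only.
    mean = a * mean_x + b * mean_y + constant
    # Variances add (never the SDs); a negative coefficient is squared away.
    sd = sqrt((a * sd_x) ** 2 + (b * sd_y) ** 2)
    return mean, sd

# Company 1: total cost c = 2.50x + t, with the values given in the notes.
mean_c, sd_c = combine(2.50, 5.2, 2.8, 1, 1.50, 0.70)
print(round(mean_c, 2), round(sd_c, 2))   # 14.5 7.03

# Subtracting instead of adding changes the mean but not the SD.
mean_d, sd_d = combine(2.50, 5.2, 2.8, -1, 1.50, 0.70)
print(round(mean_d, 2), round(sd_d, 2))   # 11.5 7.03
```

The `constant` argument is there so the same helper could handle a fare schedule with an initial fee; the fee would move the mean and leave the SD alone.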
If men’s weights (in pounds) are N(170, 40), women’s weights are N(135, 30), and a man and woman are selected at random, what is the probability that the sum of their weights is at least 300 pounds? What assumptions must you make to do this problem?

What is the probability that the woman is heavier than the man?

If you were to select 3 men at random, what is the probability that the sum of their weights is at most 500 pounds?

HW #4: Read 376-380, Problems 16.33-39 odd

Chapter 16: Calculator Hints

Suppose the probability model for grades on the AP Stats test (x) at LSHS is given below:

x       1     2     3     4     5
P(x)   .03   .04   .24   .43   .26

Find the mean and SD of x:

Find the mean and SD for the sum of 3 randomly selected scores:

Suppose that the weight of a candy bar has a mean of 4 ounces with a standard deviation of 0.15 ounces. Also, the King size version of the candy bar has a mean weight of 7.5 ounces with a SD of 0.4 ounces. Assuming both distributions are approximately normal, what is the probability that a randomly selected King size bar weighs more than 2 randomly selected regular bars combined?

HW #5: Problems: 16.2, 10, 22, 28, 34, 36
Review chapter 16
HW #6: Problems: 16.15, 25, 29, 32, 38

Chapter 17: The Geometric Model

There are a few other commonly used probability models besides the Normal model. Two of these, the Binomial and Geometric, are based on __________________________.

Bernoulli trials have 3 important characteristics:

1. There are only two possible outcomes on each trial (success or failure).
2. The probability of success, p, is the same on every trial.
3. The trials are independent. That is, knowing the results of previous trials doesn’t change the probability of future trials.

Note: whenever we sample without replacement, the independence condition is violated. However, as long as the sample is not more than 10% of the population, the changes in the probabilities are too small to worry about.
Some examples of Bernoulli trials include:

The _________________________ tells us the probability that it takes a specific number of Bernoulli trials to achieve a success.

The geometric random variable is always: x = the number of trials required to get the first success

The model is completely determined by one parameter (p) and is denoted: G(p).

For example, suppose I am rolling a die until I get a “5”. Is this Geometric?

What is the probability that I get a “5” on my first roll?
What is the probability that I get my first “5” on the 2nd roll?
What is the probability that I get my first “5” on the 3rd roll?
What is the probability that I get my first “5” on the 4th roll?
What is the probability that I get my first “5” on the kth roll?

For geometric models in general,

\[ P(X = x) = q^{x-1} p \quad\text{where } q = 1 - p = \text{probability of failure} \]

Expected Value: \(\mu = \dfrac{1}{p}\)

Standard Deviation: \(\sigma = \sqrt{\dfrac{q}{p^2}}\)

Suppose that 30% of LSHS students have a driver’s license (DL). On average, how many students will you need to randomly select to find one with a DL? What is the probability that it will take fewer students than expected?

HW #7: Read 386-390, Problems: 17.1, 7, 9, 11

Chapter 17: The Binomial Model

Instead of asking how many trials it will take to achieve one success, we can also ask how many successes we expect to see in a fixed number of Bernoulli trials.

The __________________________ tells us the probability that we will achieve a specific number of successes in a given number of trials.

The Binomial RV is always: x = number of successes

The model is determined by two parameters: the number of trials (n) and the probability of success (p) and is denoted: B(n, p).

Suppose that 70% of people in the mall will make a purchase and 3 mall visitors are randomly selected. Use a tree diagram to identify the 8 possible outcomes and their probabilities.

What is the probability that exactly 2 of the 3 shoppers will make a purchase?
Write out the complete probability model:

X      0     1     2     3
P(X)

This problem wasn’t too difficult, but what if the question asked about the probability that exactly 200 of 300 shoppers made a purchase? There would be way too many outcomes to display in a tree diagram. Is there an easier way? Notice the patterns below:

\[ P(X = 0) = 1(.7)^0(.3)^3 \]
\[ P(X = 1) = 3(.7)^1(.3)^2 \]
\[ P(X = 2) = 3(.7)^2(.3)^1 \]
\[ P(X = 3) = 1(.7)^3(.3)^0 \]

If x ~ B(n, p) and x = # of successes:

\[ P(X = x) = \binom{n}{x} p^x q^{n-x} \quad\text{where } q = 1 - p \]

Note: \(\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}\) = “n choose x” = nCx (TI83: Math: Prb) = an entry in the nth row of Pascal’s triangle

Suppose you rolled a die 4 times and let x = number of sixes. Find the probability distribution of x and the probability that you get 2 or more sixes.

HW #8: Read: 390-393, Problems: 17.2, 8, 10, 13, 19

Chapter 17: More on the Binomial Model

Suppose that you are a telemarketer and historically only 15% of people you call will remain on the phone for at least 1 minute. What is the probability that 1 of the next 5 people you call will remain on the phone for at least 1 minute?

Using the TI-83 to do binomial calculations:

Shortcuts on TI83: Distribution Menu (DISTR = 2nd VARS)

Assume x ~ B(5, .15)

Dist: binompdf(n, p, k) gives the probability that there will be exactly k successes in n trials.
ex: P(x = 1) = binompdf(5, .15, 1) =
Note: pdf stands for probability density function

Dist: binompdf(n, p) gives the probability of each possible value of k (0, 1, …, n)
ex: binompdf(5, .15) =

Dist: binomcdf(n, p, k) gives the probability that the number of successes is at most k (≤ k)
ex: P(x ≤ 1) = binomcdf(5, .15, 1) =
Note: cdf stands for cumulative distribution function (cumulative means added up)

Dist: binomcdf(n, p) gives the cumulative probabilities for each possible value of k (0, 1, …, n)
ex: binomcdf(5, .15) =

If I roll a die 30 times, what is the probability that I get at least 5 sixes?

What is the probability I get at least 2 sixes and at most 7 sixes?
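For anyone without a TI-83 at hand, the binomial formula can be sketched in a few lines of Python. The function names below simply mirror the calculator commands; they are hypothetical helpers written for these notes, not functions from a library. The check uses the coin example from the start of the chapter (two flips, so X ~ B(2, 0.5) and P(X = 2) = .25).

```python
from math import comb

def binompdf(n, p, k):
    """P(X = k) for X ~ B(n, p): (n choose k) * p^k * q^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomcdf(n, p, k):
    """P(X <= k) for X ~ B(n, p): add up the pdf values from 0 through k."""
    return sum(binompdf(n, p, i) for i in range(k + 1))

def prob_between(n, p, a, b):
    """P(a <= X <= b), written as a difference of two cumulative values."""
    lower = binomcdf(n, p, a - 1) if a > 0 else 0.0
    return binomcdf(n, p, b) - lower

# Coin example: X = number of heads in two flips.
print(binompdf(2, 0.5, 2))         # 0.25
print(binomcdf(2, 0.5, 1))         # 0.75
print(prob_between(2, 0.5, 1, 2))  # 0.75
```

"At least" questions use the complement: P(X ≥ a) = 1 - binomcdf(n, p, a - 1), which is the same as prob_between(n, p, a, n).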
The mean and standard deviation of a binomial random variable:

If x is a binomially distributed random variable with n trials and probability of success p, then

\[ \mu_x = np \quad\text{and}\quad \sigma_x = \sqrt{npq} \quad\text{where } q = 1 - p \]

Suppose that 75% of all customers at a gas station choose 87 octane gas. If you observe the next 50 customers,

a. How many customers do you expect to choose 87 octane gas?
b. What is the standard deviation of the number of customers that choose 87 octane gas?
c. What is the probability that the number of customers who choose 87 octane gas is within 2 standard deviations of the mean?

HW #9: Problems 17.14-17

Chapter 17: The Normal Model to the Rescue

Suppose we were to roll a die one million times. What is the probability that we would get more than 167000 sixes?

In many cases, it is useful to approximate a discrete distribution with a continuous distribution. In particular, we often use the normal distribution to approximate the binomial distribution. Historically, this is because we didn’t have calculators with binomcdf. Even now, some problems are too formidable even for our TIs!

Fortunately, the normal distribution approximates the binomial quite well in certain circumstances. The approximation works best when p is close to 0.5 and n is large.

Success/Failure Condition: If x ~ B(n, p) and \(np \ge 10\) and \(nq \ge 10\), then x is approximately \(N(np, \sqrt{npq})\). That is, it is OK to use the normal approximation whenever the expected number of successes and the expected number of failures are both at least 10.

Why \(np \ge 10\)? If we want to use the normal approximation when p is close to 0, we should be able to go at least 3 SD below the mean and still be in the domain of x (that is, we shouldn’t be in negative territory).

\[ \mu - 3\sigma > 0 \]
\[ np - 3\sqrt{npq} > 0 \]
\[ np > 3\sqrt{npq} \]
\[ n^2 p^2 > 9npq \]
\[ np > 9q \]
\[ np > 9 \quad\text{(roughly, since q is close to 1)} \]

Therefore, to be safe, \(np \ge 10\).

Use the normal approximation to answer the original question.
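The Success/Failure check and the approximation itself can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, the example X ~ B(100, 0.5) is made up for the comparison (it is not one of the problems in these notes), and no continuity correction is applied, since the notes do not use one.

```python
from math import comb, sqrt
from statistics import NormalDist

def success_failure_ok(n, p):
    """True when both np >= 10 and nq >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def normal_approx(n, p):
    """Normal model N(np, sqrt(npq)) used to approximate X ~ B(n, p)."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

def exact_binomcdf(n, p, k):
    """Exact P(X <= k), for comparison with the approximation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical example: X ~ B(100, 0.5), so mean = 50 and SD = 5.
n, p, k = 100, 0.5, 55
print(success_failure_ok(n, p))              # True
print(round(normal_approx(n, p).cdf(k), 4))  # 0.8413 (z = 1)
print(exact_binomcdf(n, p, k))               # exact value, for comparison
```

For "more than k" questions, subtract the cumulative value from 1, just as with binomcdf.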
Even though our calculator can handle any reasonable sample size, we will need to use the normal approximation to the binomial when we do the inference procedures for proportions in a few chapters.

Suppose that 51% of US residents are women. In a random sample of size 1000, what is the probability that fewer than 500 women are selected?

In a sample of size 10, what is the probability that fewer than 5 women are selected?

HW #10: Read 394-397, Problems 17.20, 21, 23, 24

Chapter 17 Review

HW #11: Problems 17.22, 25, 27, 29