© Math 141, Spring 2014, Benjamin Aurispa

8.1 Distributions of Random Variables

A random variable is a rule that assigns a number to each outcome of an experiment. We usually denote a random variable by X. There are 3 types of random variables: finite discrete, infinite discrete, and continuous. We will define each of them below.

We can find probability distributions of random variables the same way we did with experiments. List all the values that are possible for X and calculate their probabilities using any method we know.

Example: A coin is tossed three times and the sequence of heads and tails is observed.
• List the outcomes of the experiment.
• Let the random variable X denote the number of tails that occur. What are the possible values of X?
• Find the probability distribution of X.
• What is P(1 ≤ X ≤ 2)?
• What is P(X > 0)?

The above random variable is called finite discrete because it takes on only finitely many values.

Consider the experiment of rolling 2 fair 5-sided dice. We know that the sample space of this experiment is

S = { (1, 1), (2, 1), (3, 1), (4, 1), (5, 1),
      (1, 2), (2, 2), (3, 2), (4, 2), (5, 2),
      (1, 3), (2, 3), (3, 3), (4, 3), (5, 3),
      (1, 4), (2, 4), (3, 4), (4, 4), (5, 4),
      (1, 5), (2, 5), (3, 5), (4, 5), (5, 5) }

Let X be the sum of the numbers rolled.
• What are the possible values that X can take on?
Since X can take on only a finite number of values, X is finite discrete.
• Find the probability distribution of X.

Example: A carton contains a dozen eggs, of which 5 are cracked. An experiment consists of randomly selecting 3 eggs from the carton. Let X be the number of cracked eggs that are selected.
• Find the probability distribution for X.
• What is the probability that more than 1 cracked egg will be selected?

Example: A survey was conducted of families to determine the distribution of families by size.
The results are:

Family Size    Number of Families
2              29
3              16
4              24
5              11
6              8

Let the random variable X be the number of people in a randomly chosen family. Find the probability distribution for X.

Example: A die is rolled. Let the random variable X denote the number of times the die is rolled until a 5 appears. What are the values that X may assume?

This random variable is called infinite discrete because there are an infinite number of possible values and they can be arranged in a sequence of separated numbers. (We can list them all in a certain order even though there are infinitely many of them.)

Example: Suppose I have a random variable X = the distance a person travels to work in miles. What values may X take on?

This random variable is called continuous because there are infinitely many values that make up an interval of numbers. Continuous random variables often deal with time, distance, and measurements.

Examples: List the possible values of X and determine whether the random variable is finite discrete, infinite discrete, or continuous.
1. Let X be the number of times a 3 is rolled in 5 rolls of a die.
2. Marbles are pulled from a jar with 3 red, 2 blue, and 3 green marbles one at a time with replacement until a blue marble is drawn. Let X be the number of pulls needed.
3. Four marbles are pulled one at a time with replacement from a jar with 3 red, 2 blue, and 3 green marbles. Let X be the number of times a blue marble is pulled.
4. Same experiment as above but without replacement.
5. Marbles are pulled one at a time without replacement from a jar with 3 red, 2 blue, and 3 green marbles until a green marble is drawn. Let X be the number of pulls needed.
6. Marbles are pulled one at a time without replacement from a jar with 3 red, 2 blue, and 3 green marbles until all the green marbles are drawn. Let X be the number of pulls needed.
7.
Let X be the amount of time in minutes a student works on homework on a given day.

A histogram is a graphical way of displaying the probability distribution of a random variable. Plot the values of the random variable along the x-axis. Then draw a rectangle of width 1 centered above each value with height equal to the probability of the random variable attaining this value. (So, each rectangle has area equal to the probability.) Remember that probability distribution values must add up to 1, which means the heights of all the rectangles (and also their areas) must add up to 1.

Draw a histogram for the probability distribution below. Shade the area representing P(0 < X ≤ 3) and calculate its value.

X    Probability
0    0.2
1    0.15
2    0.3
3    0.1
4    0.05
5    0.2

8.2 Expected Value and Other Measures of Central Tendency

The average, or mean, of a set of numbers \(x_1, x_2, \ldots, x_n\), denoted by \(\bar{x}\) or \(\mu\), is

\[ \mu = \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

The median of a set of numbers is the middle number when they are arranged in increasing order. If there are an even number of entries, there is no middle number, so we let the median be the average of the middle two numbers.

The mode of a set of numbers is the number that occurs most frequently.

What are the mean, median, and mode of the set of numbers 1, 4, 4, 5, 9, 9, 9?

What are the mean, median, and mode of the set of numbers 1, 1, 4, 5, 5, 9?

You can find mean and median on your calculator as follows: (Note: The mode is not given on the calculator.)
• Press STAT, 1:Edit; enter the list of values under L1.
• Press STAT, cursor right to CALC, and then press 1: 1-Var Stats.
• Then, you need to add L1 (2nd-1), so that the home screen reads “1-Var Stats L1”.
• Press ENTER. x̄ is the mean or average. Scroll down to find the median, which is Med.
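
The same three measures can be cross-checked away from the calculator. The sketch below is only an illustration, assuming Python 3.8 or later and its standard-library statistics module; the helper name summarize is hypothetical and is not part of the course materials. It uses multimode so that a data set with more than one mode reports all of them.

```python
from statistics import mean, median, multimode


def summarize(values):
    """Return (mean, median, list of modes) for a list of numbers."""
    return mean(values), median(values), multimode(values)


# The two small data sets posed in the notes above.
first = [1, 4, 4, 5, 9, 9, 9]
second = [1, 1, 4, 5, 5, 9]

for data in (first, second):
    avg, med, modes = summarize(data)
    print(f"data={data}  mean={avg:.4f}  median={med}  mode(s)={modes}")
```

Unlike 1-Var Stats, this also reports the mode, and the second data set shows why a list is returned: two values tie for most frequent.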
Example: The daily average temperatures were recorded for the city of College Station during a span of two weeks in the month of January. The results were: (I made these up.)

70, 67, 75, 77, 77, 69, 68, 68, 75, 68, 42, 55, 65, 69

Find the mean, median, and mode for this data.

If data values are listed with frequencies, put the data values under L1 and their frequencies under L2. Then, type “1-Var Stats L1, L2” and press ENTER. By default, if you just type “1-Var Stats,” then the calculator assumes you mean “1-Var Stats L1.”

Example: A certain puzzle manufacturer makes and sells puzzles. The data below shows how many puzzles this company makes with different numbers of pieces.

Number of Pieces    Number of Puzzles
500                 400
1000                200
750                 350
2000                150
300                 275
1500                300
3000                75

What is the average number of pieces in a puzzle made by this company? What are the median and mode for the number of pieces in a puzzle?

In the language of probability, we use the term expected value instead of average or mean, although the concepts are fairly similar. With expected value, we are essentially finding the “weighted average” of a random variable X where each value is weighted by its probability.

The expected value of a random variable X, denoted E(X), is given by

\[ E(X) = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n \]

where \(x_1, x_2, \ldots, x_n\) are the values that X may assume, and \(p_1, p_2, \ldots, p_n\) are the probabilities of each of these values.

However, you can find expected value on the calculator by using the exact same method above where the values of X are typed in L1 and their probabilities are typed in L2.

Example: Find the expected value (by hand and using the calculator) of a random variable X given the histogram below. Also find the mode(s).

[Histogram of X: horizontal axis X with values 0, 1, 2, 3, 4, 5; vertical axis Probability with tick marks at 1/16, 2/16, 3/16, 4/16]

The expected value of a random variable X can be thought of graphically as the value along the x-axis where the histogram of X would be balanced.
The mode is the x-value that has the tallest rectangle. The median is not as intuitive visually in a histogram, but in general it is where the area of the histogram is cut in half.

Two cards are selected at random from a standard deck of cards. What is the expected number of hearts?

Expected values are often used in games to determine whether the game is “fair.” In a game situation, we let X be the net winnings (profit) of the player. A game is considered “fair” when the expected net winnings are 0, i.e., when E(X) = 0.

Ben is playing a game at a carnival. The game costs $1. Ben selects a card from a standard deck. If the card is an ace, then he wins $3. If the card is a face card, he wins $2. Find the expected net winnings.

Example: A raffle is held. 2000 people buy a ticket for $3 each. First prize is $1500. Second prize is $750. Then, four $100 consolation prizes will be given. What are the expected net winnings for one person who buys a $3 ticket?

A game consists of rolling a pair of fair 6-sided dice. The game costs $4 to play. If you roll the same number on both dice (a double), you win $a. Otherwise, you get nothing. What value of a would make this game fair?

Example: A man purchased a $25,000 life insurance policy from his employer for $200/yr. The probability that he lives another year is 0.9935. What is the life insurance company’s expected gain?

If the probability that the man lives drops to 0.98, what is the minimum amount of money, $a, he can expect to pay for his policy?

If you are given that the odds in favor of an event E occurring are a to b, then the probability of E occurring is

\[ P(E) = \frac{a}{a + b} \]

Example: The odds of a certain horse winning the Kentucky Derby are 5 to 3. What is the probability that this horse will win? What is the probability that the horse will NOT win?
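
The odds-to-probability rule can be checked with exact fractions. This is a minimal sketch, assuming Python's standard-library fractions module; the function name odds_to_probability is a hypothetical helper chosen for illustration. It applies the rule to the 5-to-3 odds from the horse example and uses the complement for the probability of not winning.

```python
from fractions import Fraction


def odds_to_probability(a, b):
    """Odds in favor of E are a to b, so P(E) = a / (a + b)."""
    return Fraction(a, a + b)


p_win = odds_to_probability(5, 3)   # odds of 5 to 3 in favor of winning
p_lose = 1 - p_win                  # complement: the horse does NOT win
print(p_win, p_lose)                # prints: 5/8 3/8
```

Using Fraction rather than floating point keeps the answer in the reduced-fraction form that odds problems usually ask for.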
If you are given a probability and asked to find odds, first convert the probability to a reduced fraction \(\frac{m}{n}\). Then, the odds in favor of the event occurring are m to n − m.

Example: The probability that it will rain Friday is 0.6. The probability it rains Saturday is 0.35. The probability it rains both days is 0.2.
• What are the odds that it will rain on Saturday?
• What are the odds that it rains at least one of the two days?
• What are the odds it rains only on Saturday?

8.3 Variance and Standard Deviation

The mean, or expected value, of a random variable measures the central tendency of the variable (where the histogram would be balanced). However, this gives us no indication of how spread out the distribution is.

The variance of a random variable is a measure of how spread out the distribution is. If the variance is large, then the distribution has a larger spread about the mean. If the variance is small, then the distribution is more clustered around the mean.

Another measure of spread is the standard deviation of a random variable. Again, the larger the standard deviation, the more spread out the distribution is from the mean.

[Figure: two histograms, each drawn over the values 1 through 5]

Notation: Standard deviation is often written as the Greek letter sigma: σ. Variance is written as Var(X).

The standard deviation of a random variable is defined to be the square root of the variance:

\[ \sigma = \sqrt{\mathrm{Var}(X)} \]

which means that \(\sigma^2 = \mathrm{Var}(X)\).

Standard deviation has the same units as the values of the random variable, whereas variance would have these units squared. (So if X is in cm, then standard deviation is in cm, whereas variance would be in cm².)

Standard deviation is also found using 1-Var Stats. Standard deviation is the 5th entry down, σx. Then, just square the standard deviation to find variance.

Example: Find the variance and standard deviation of the random variable X, whose probability distribution is given.
X     Probability
−4    0.1
−2    0.2
0     0.3
2     0.1
4     0.3

Example: The number of jelly beans in each of 150 bags of jelly beans was counted. The results are given below.

Number of Bags    Number of Jelly Beans
30                25
40                32
65                28
15                34

Find the mean, standard deviation, and variance for the number of jelly beans in a bag.

8.4 The Binomial Distribution

Experiments with exactly two outcomes are called Bernoulli trials or binomial trials. A sequence of these binomial trials is called a binomial experiment. It's important to recognize when an experiment is binomial because there is a quick way to calculate binomial probabilities.

Properties of Binomial Experiments:
1. The number of trials in the experiment is fixed.
2. There are two outcomes in the experiment: “success” and “failure.” (Note: Defining “success” and “failure” varies according to each problem and what we are observing.)
3. The probability of success in each trial is the same.
4. The trials are independent of each other.

For example, if I toss a coin, a “success” could be getting a head, and then a “failure” would be getting a tail. The probability of success in this example would be 1/2.

Another example: If an experiment consists of rolling a die and observing if a 6 is rolled, I could say that I have a “success” if I roll a 6, in which case a “failure” would be rolling anything other than a 6. So, the probability of a success in this example would be 1/6.

Examples: Determine whether the following experiments are binomial experiments. If not, why not?
• Rolling a die until a 5 is rolled.
• Rolling a die 3 times and observing the number rolled each time.
• Selecting 7 cards one at a time with replacement out of a standard deck and observing if each is black or red.
• Selecting 4 cards one at a time without replacement and observing if each card is black or red.
• Pulling 5 marbles one at a time with replacement from a jar consisting of 6 red, 5 blue, and 4 yellow marbles, and observing if the marble is red or not each time.

A binomial distribution is a distribution of a random variable X, where X is the number of successes in a binomial experiment. X is called a binomial random variable and it is finite discrete.

To find a binomial probability, we only need to know 3 things:
n = total number of trials, p = probability of success, r = number of desired successes

Then, the probability of getting exactly r successes in n trials, P(X = r), is:

\[ P(X = r) = C(n, r)\, p^r q^{n-r} \]

where n, p, and r are as defined above, and q is the probability of failure. Since the probability of success is p, the probability of failure, q, is 1 − p.

To find the probability of exactly r successes in a binomial experiment, we can use the binompdf command on the calculator instead of the formula.
1. Press 2nd VARS (which is DISTR).
2. Choose binompdf.
3. Then, the probability of exactly r successes in n trials is: P(X = r) = binompdf(n, p, r)

Example: The probability a basketball player makes a free throw is 0.72. If he shoots 12 free throws in a game, what is the probability he will make exactly 7 of them?

What if we want to find the probability that he makes at most 7 free throws? We could add up P(X = 0) + P(X = 1) + · · · + P(X = 7), but there is another command on the calculator that will do this for us: binomcdf. This command is found on the same menu as binompdf and is listed as binomcdf. In general, the probability of at most r successes in n trials is:

P(X ≤ r) = binomcdf(n, p, r)

What if I want to find the probability that he makes at least 7 free throws, P(X ≥ 7)? There is no specific command on the calculator for this, but we can use the idea of complements to solve using the binomcdf command: P(X ≥ 7) = 1 − P(X ≤ 6).

More Examples:
1.
A quiz consists of 10 multiple choice questions. Each question has five possible answers. In order to pass the quiz, a student must get at least 6 correct. If a student guesses at each question, what is the probability that they will pass the quiz?
2. Suppose that 30% of the restaurants in a certain part of a town are open past 10pm. If 5 restaurants are randomly selected, what is the probability that
(a) Exactly 1 or exactly 2 of the restaurants are open past 10pm?
(b) No more than 3 of the restaurants are open past 10pm?
3. A fair 6-sided die is cast 9 times. What is the probability of rolling a 4 more than 5 times?
4. Suppose that 6 out of 10 people who take a certain medication do not experience side effects. What is the probability that among 200 people who are taking the medication...
(a) Fewer than 75 have side effects?
(b) More than 80 but fewer than 100 have side effects?
5. The probability that a DVD player produced by a company is defective is estimated to be 0.09. If a sample of 14 DVD players is selected at random, what is the probability that the sample contains between 2 and 10 defectives inclusive, that is, at least 2 but no more than 10?

Binomial Statistics

For a binomial distribution, we can use the following formulas to find the expected value (mean), standard deviation, and variance.

\[ \mu = E(X) = np \]
\[ \sigma = \sqrt{npq} \]
\[ \mathrm{Var}(X) = \sigma^2 = npq \]

Reminder: q is the probability of a failure: q = 1 − p.

Example: At a certain university the probability that an entering freshman will graduate within 4 yr is 0.6. From an incoming class of 2000 freshmen, find
• The expected number of students who will graduate within 4 yr.
• The standard deviation and variance of the number of students who will graduate within 4 yr.
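
The binomial formulas in this section can also be evaluated directly from their definitions. The sketch below is an illustration only, assuming Python 3.8 or later for math.comb; the names binom_pmf, binom_cdf, and binom_stats are hypothetical helpers that mirror what binompdf, binomcdf, and the formulas np and npq compute, and they are not the calculator's own implementation. The inputs are the free-throw setting (n = 12, p = 0.72) and the graduation setting (n = 2000, p = 0.6) from the notes.

```python
from math import comb, sqrt


def binom_pmf(n, p, r):
    """P(X = r) = C(n, r) * p^r * q^(n - r), with q = 1 - p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def binom_cdf(n, p, r):
    """P(X <= r): add up P(X = 0) through P(X = r)."""
    return sum(binom_pmf(n, p, k) for k in range(r + 1))


def binom_stats(n, p):
    """Return (mean, variance, standard deviation) = (np, npq, sqrt(npq))."""
    q = 1 - p
    variance = n * p * q
    return n * p, variance, sqrt(variance)


# Free-throw setting: 12 shots, each made with probability 0.72.
exactly_7 = binom_pmf(12, 0.72, 7)
at_most_7 = binom_cdf(12, 0.72, 7)
at_least_7 = 1 - binom_cdf(12, 0.72, 6)  # complement of "at most 6"
print(f"P(X = 7) = {exactly_7:.4f}")
print(f"P(X <= 7) = {at_most_7:.4f}")
print(f"P(X >= 7) = {at_least_7:.4f}")

# Graduation setting: 2000 freshmen, each graduating with probability 0.6.
mu, var, sigma = binom_stats(2000, 0.6)
print(f"mean = {mu:.1f}, variance = {var:.1f}, std dev = {sigma:.4f}")
```

The "at least" case is written as a complement on purpose: there is no separate function for it, just as there is no separate calculator command.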