AP Statistics
Chapter 6: Random Variables

What is a random variable?
• A random variable is a variable whose value is a numerical outcome of a random phenomenon.

EXAMPLE
If we toss four coins, how would we record the results? We could record the result as a string of heads and tails like “HTTH” or “HTHH”. This is not a random variable because it has no numerical value to work with. Instead, we may elect to record the number of heads in the four tosses. This would make our sample space {0, 1, 2, 3, 4}: all numerical outcomes.

Discrete vs. Continuous Variables
• A discrete random variable has a countable number of possible values.
• A continuous random variable can take any possible value over an interval.

EXAMPLES
Discrete: the number of heads in four coin tosses.
Continuous: a number generated by a spinner that covers the numbers between 0 and 1.

Discrete Random Variables
• The probability distribution of a discrete variable lists the values and their probabilities.

Value of X:     $x_1$   $x_2$   $x_3$   …   $x_k$
Probability:    $p_1$   $p_2$   $p_3$   …   $p_k$

• The probabilities must satisfy two requirements:
1. Every probability is between 0 and 1.
2. $p_1 + p_2 + \cdots + p_k = 1$
• Find the probability of any event by adding the individual probabilities that make up that event.

EXAMPLE
Determine the probability distribution of the discrete random variable X that counts the number of heads in four coin tosses. We can do this if we make two reasonable assumptions:
1. The coin is balanced, so each toss is equally likely to give an H or T.
2. The coin has no memory, so each toss is independent.
Since each outcome is equally likely, what is the probability of each combination?

Continued…
The number X represents the number of heads in four tosses. These values are NOT equally likely. Use this information to complete your probability distribution. What is the probability of getting 2 or more heads?

Means and Variances
• The mean of a set of observations is their ordinary average, $\bar{x}$.
• The mean of a random variable X is also an average of the possible values of X.
• This average must take into account that some values of X may occur more frequently than others.
• We can handle this adjustment by multiplying each outcome by its probability.

Value of X:     $x_1$   $x_2$   $x_3$   …   $x_k$
Probability:    $p_1$   $p_2$   $p_3$   …   $p_k$

$\mu_X = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k = \sum x_i p_i$

EXAMPLE
According to Benford’s Law, the distribution of the first digit V in a set of legitimate business records is:

First digit V:   1      2      3      4      5      6      7      8      9
P(V):            0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046

Use this information to compute the expected value of a randomly selected first digit. (expected value = mean)
The mean of V is:
$\mu_V = 1(0.301) + 2(0.176) + 3(0.125) + 4(0.097) + 5(0.079) + 6(0.067) + 7(0.058) + 8(0.051) + 9(0.046)$
$\mu_V = 3.441$

EXAMPLE Continued…
• While the mean of 3.441 is not a possible outcome of V, it still gives us an idea of where the values are centered.
• If each digit from 1 to 9 were equally likely, we would have a uniform distribution.
• What would the mean be in this case?
• Notice how this compares to the distribution of Benford’s Law.

Variance
• In a set of discrete values, the variance is based on how much each value “varies” from the expected amount.
• In the case of a random variable’s distribution, we must account for the differences in frequency among outcomes.

Value of X:     $x_1$   $x_2$   $x_3$   …   $x_k$
Probability:    $p_1$   $p_2$   $p_3$   …   $p_k$

$\sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + \cdots + (x_k - \mu_X)^2 p_k = \sum (x_i - \mu_X)^2 p_i$

• The standard deviation $\sigma_X$ is the square root of the variance.

EXAMPLE
Gain Communications sells aircraft communication units to both military and civilian markets. Gain follows the modern practice of using probability estimates to forecast sales for the upcoming year. The military division of the company estimates its sales as follows:

Units sold (X):   1000   3000   5000   10,000
P(X):             0.1    0.3    0.4    0.2

Calculate the expected number of sales and the standard deviation.

HOMEWORK
Complete the problems: pg. 353 (#1 – 16). This assignment is due at the start of the next class session.
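The mean and variance formulas for a discrete random variable can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the helper name `mean_and_sd` is hypothetical, and the data are the Benford's Law and Gain Communications tables given above.

```python
from math import sqrt

def mean_and_sd(values, probs):
    """Return (mean, standard deviation) of a discrete random variable.

    Hypothetical helper for illustration: values are the outcomes x_i,
    probs are the matching probabilities p_i.
    """
    # A valid probability distribution must have probabilities summing to 1.
    assert abs(sum(probs) - 1.0) < 1e-9
    mu = sum(x * p for x, p in zip(values, probs))               # sum of x_i * p_i
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # sum of (x_i - mu)^2 * p_i
    return mu, sqrt(var)

# Benford's Law first-digit distribution from the example
benford_digits = list(range(1, 10))
benford_probs = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
benford_mean, benford_sd = mean_and_sd(benford_digits, benford_probs)

# Gain Communications military sales estimates
units = [1000, 3000, 5000, 10000]
unit_probs = [0.1, 0.3, 0.4, 0.2]
sales_mean, sales_sd = mean_and_sd(units, unit_probs)

print(round(benford_mean, 3))
print(round(sales_mean), round(sales_sd, 2))
```

Run as written, this should reproduce the Benford mean of 3.441 and give an expected 5000 military units sold, with a standard deviation of roughly 2793 units.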
Continuous Random Variables
• As mentioned before, continuous random variables deal with an infinite number of possible outcomes over a pre-determined interval.
• Since there are infinitely many possible values, the probability of any individual value is zero.
• Suppose we wanted to make a probability distribution for an event like $0.3 \le x \le 0.7$.
• What would be the theoretical probability assigned to 0.47?
$P(X = 0.47) = 0$ (informally, one value out of infinitely many)

Density Curves
• In order to assign probabilities to events, we can use density curves to describe a distribution.
• The horizontal axis of the density curve represents all of the possible values, and the height of the curve over each value represents its relative likelihood.
• The area under the curve over an interval represents the probability of an outcome falling within that interval.
• The total area under the curve equals 1.

EXAMPLE
Let’s revisit the spinner that generates a random number between 0 and 1. What would be the probability of generating a number X between 0.3 and 0.7?
$P(0.3 \le X \le 0.7)$

EXAMPLE Continued
• Since each number on the spinner has an equal chance of being generated, we will call this a uniform distribution.
• The area under the curve is 1. Since this is uniform, the curve will be rectangular in shape (height 1 over the interval from 0 to 1).
• The probability of getting a value between 0.3 and 0.7 will be the area between those two values.
$P(0.3 \le X \le 0.7) = 0.4$

Taking it further…
• With the same example in mind, what would be the following:
$P(X \le 0.5) = 0.5$
$P(X > 0.8) = 0.2$
$P(X \le 0.5 \text{ or } X > 0.8) = 0.7$
Is there a difference between $P(X > 0.8)$ and $P(X \ge 0.8)$?

The Normal Distribution
• We have discussed a density curve in prior chapters. It was the NORMAL CURVE.
• The normal distribution is considered a probability distribution.
• Recall that N(μ, σ) is our shorthand way of referring to the normal distribution having a mean of μ and a standard deviation of σ.
• To standardize our values and use our normal distribution table, we must use a z-score.
$Z = \dfrac{X - \mu}{\sigma}$

EXAMPLE
An opinion poll asks an SRS of 1500 American adults what the biggest issue facing schools is. Based on the sample data, 30% of the adults said drugs. We will learn how to analyze this later, but for now, we will say that the sample proportion $\hat{p}$ follows a normal distribution with a mean of 0.3 and a standard deviation of 0.0118. In other words…
N(0.3, 0.0118)
What is the probability that the result differs from the truth by more than two percentage points? In other words…
$P(\hat{p} < 0.28 \text{ or } \hat{p} > 0.32)$
Hint: Start off by “standardizing” the data.

EXAMPLE Continued…
$P(\hat{p} < 0.28) = P\left(Z < \dfrac{0.28 - 0.3}{0.0118}\right) = P(Z < -1.69) = 0.0455$
$P(\hat{p} > 0.32) = P\left(Z > \dfrac{0.32 - 0.3}{0.0118}\right) = P(Z > 1.69) = 0.0455$
$P(\hat{p} < 0.28 \text{ or } \hat{p} > 0.32) = 0.0455 + 0.0455 = 0.0910$

HOMEWORK
Complete the problems: pg. 355 (#17 – 30). This assignment is due at the start of the next class session.

Rules for Means
• If a fixed number is added to or subtracted from every value of a random variable X, then the mean of X increases or decreases by that same amount.
• If every value of a random variable X is multiplied by a fixed number, then the mean of X is multiplied by that same number.
• In other words, $\mu_{a + bX} = a + b\mu_X$

Rules for Means
• If we have two random variables, X and Y, then the sum of those two variables will have a mean that is equal to the sum of their individual means.
• In other words, $\mu_{X + Y} = \mu_X + \mu_Y$

EXAMPLE
Gain Communications sells aircraft communication units to both military and civilian markets. Gain follows the modern practice of using probability estimates to forecast sales for the upcoming year. The military division of the company estimates its sales as follows:

Units sold (X):   1000   3000   5000   10,000
P(X):             0.1    0.3    0.4    0.2

The civilian division of the company estimates its sales as follows:

Units sold (Y):   300    500    750
P(Y):             0.4    0.5    0.1

Compute the mean sales of each.
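The mean sales of the two divisions, and the rule that the mean of a sum is the sum of the means, can be sketched in a few lines of Python. This is an illustration rather than part of the original slides: the helper `mean` is hypothetical, and the direct check of the sum rule assumes X and Y are independent only so that a joint distribution is available to enumerate (the rule itself does not require independence).

```python
def mean(values, probs):
    """Mean of a discrete random variable: sum of x_i * p_i (hypothetical helper)."""
    return sum(x * p for x, p in zip(values, probs))

# Sales estimates from the Gain Communications example
military_units = [1000, 3000, 5000, 10000]
military_probs = [0.1, 0.3, 0.4, 0.2]
civilian_units = [300, 500, 750]
civilian_probs = [0.4, 0.5, 0.1]

mu_x = mean(military_units, military_probs)
mu_y = mean(civilian_units, civilian_probs)

# Rule for means: the mean of X + Y is mu_x + mu_y.
mu_total_by_rule = mu_x + mu_y

# Direct check: enumerate every (x, y) pair. ASSUMPTION: X and Y are
# independent, so each pair has probability px * py.
mu_total_direct = sum(
    (x + y) * px * py
    for x, px in zip(military_units, military_probs)
    for y, py in zip(civilian_units, civilian_probs)
)

print(round(mu_x), round(mu_y))
print(round(mu_total_by_rule), round(mu_total_direct))
```

Both routes to the mean of the total should agree, with a mean of 5000 military units and 445 civilian units.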
EXAMPLE
• Gain makes a profit of $2000 on each military unit and $3500 on each civilian unit that is sold.
• The mean military sales profit, in dollars, is:
\[ \mu_{2000X} = 2000\mu_X = 2000(5000) = 10{,}000{,}000 \]
• The mean civilian sales profit, in dollars, is:
\[ \mu_{3500Y} = 3500\mu_Y = 3500(445) = 1{,}557{,}500 \]
• The total profit, Z, is the sum of all sales profits: Z = 2000X + 3500Y.
• The mean value of Z, in dollars, would be:
\[ \mu_Z = 2000\mu_X + 3500\mu_Y = 10{,}000{,}000 + 1{,}557{,}500 = 11{,}557{,}500 \]

Rules for Variance
• We can apply similar rules to the variances of random variables.
• In order to do this, we must know whether the two random variables are independent of one another.
• Independent variables have a correlation of ZERO between them.
• If there is a correlation between them, we must account for that correlation when we try to combine variances.
• It should also be noted that we are working with variances here and not standard deviations.

Rules for Variance
• If X is a random variable and a and b are fixed numbers, then:
\[ \sigma^2_{a + bX} = b^2 \sigma^2_X \]
• Notice that adding a constant to X does not affect the variation. Only multiplication does.
• If X and Y are independent random variables (no correlation):
\[ \sigma^2_{X + Y} = \sigma^2_X + \sigma^2_Y \]
\[ \sigma^2_{X - Y} = \sigma^2_X + \sigma^2_Y \]

EXAMPLE
A college uses SAT scores as one criterion for admission. Experience has shown that the distribution of SAT scores among its entire population of applicants is:

SAT Math Score (X):     μX = 625    σX = 90
SAT Verbal Score (Y):   μY = 590    σY = 100

What are the mean and standard deviation of the total score X + Y among students applying to this college?
\[ \mu_{X + Y} = 625 + 590 = 1215 \]
\[ \sigma_{X + Y} = \sqrt{90^2 + 100^2} = \sqrt{18100} = 134.54 \]
NOTE: This is based on the assumption that the scores are independent, which many would argue they are not.

EXAMPLE
• A large auto dealership keeps track of sales and lease agreements made during each hour of the day. Let X = the number of cars sold, and let Y = the number of cars leased during the first hour of a randomly selected Friday.
• Based on previous records, the distributions of X and Y are:

Cars sold X:      0     1     2     3
Probability:      0.3   0.4   0.2   0.1

Cars leased Y:    0     1     2
Probability:      0.4   0.5   0.1

CONTINUED…
• Find the mean and standard deviation of both X and Y.
$\mu_X = 1.1$, $\sigma_X = 0.943$
$\mu_Y = 0.7$, $\sigma_Y = 0.64$
• Now let’s define the total number of deals as T. (T = X + Y)
• Find and interpret the mean of T.
$\mu_T = \mu_X + \mu_Y = 1.8$
• Now compute the standard deviation of T.

CONTINUED
• Remember that you must deal with variances instead of standard deviations.
$\sigma_T^2 = 0.943^2 + 0.64^2 = 1.2988$
$\sigma_T = 1.14$
• The dealership’s manager receives a $500 bonus for each car sold and a $300 bonus for each car leased. Find the mean and standard deviation of the manager’s total bonus, B, in dollars.
\[ \mu_B = 500(1.1) + 300(0.7) = 760 \]
\[ \sigma_B = \sqrt{500^2 (0.943)^2 + 300^2 (0.64)^2} = 509.09 \]

Check Your Understanding
Complete the Check Your Understanding problem on the top of pg. 372. We will discuss the answers in a moment.

HOMEWORK
Complete the problems: pg. 378 (#37 – 51). This assignment is due at the start of the next class session.

The Binomial Setting
• We have a binomial situation when the following things are in place:
1. Each observation will fall into one of two categories, usually considered “success” or “failure”.
2. There is a fixed number of observations, “n”.
3. All of the n observations are independent.
4. The probability of success is the same for each observation.

Binomial Distributions
• In a binomial setting, the random variable X is equal to the number of successes.
• The probability distribution of X in this case is considered a binomial distribution.
• The parameters of the distribution are n and p.
• n represents the number of observations.
• p is the probability of success on any observation.
• As an abbreviation, we say that X is B(n, p).

EXAMPLE
Blood type is a trait that is passed through heredity. If both parents carry the genes for both O and A blood types, there is a probability of 0.25 of having a child with Type O blood.
If these parents have 5 children, how many children would have Type O blood? The number of children with Type O blood has the binomial distribution B(5, 0.25).
Deal 10 cards and let X be the count of the number of red cards. This would not be a binomial distribution because the cards are dealt without replacement, so the observations are not independent.

Computing Binomial Probabilities
• If X has the binomial distribution with n observations, having a probability of p for success on each, then the possible values of X are 0, 1, 2, …, n. If k is any of these values,
$P(X = k) = \dfrac{n!}{k!\,(n - k)!}\, p^k (1 - p)^{n - k}$
• This formula can be applied to find the probability of k successes in the situation described.

EXAMPLE
A quality engineer selects an SRS of 10 switches from a large shipment for detailed inspection. Unknown to the engineer, 10% of the switches in the shipment fail to meet the specifications. What is the probability that exactly 1 of the ten switches in the sample will fail inspection?
This is a distribution defined as B(10, 0.1). In this situation, k = 1.
$P(X = 1) = \dfrac{10!}{1!\,(10 - 1)!}\,(0.1)^1 (0.9)^9 = 0.3874$
What would be the probability of the engineer finding 1 or fewer defective switches?
$P(X \le 1) = 0.7361$

EXAMPLE 2
Each child of a particular pair of parents has a probability of 0.25 of having Type O blood. If they have 5 children, what is the probability that exactly 3 of the children have Type O blood?
$P(X = 3) = \dfrac{5!}{3!\,(5 - 3)!}\,(0.25)^3 (0.75)^2 = 10\,(0.25)^3 (0.75)^2 = 0.08789$
There is about an 8.8% chance that this could happen!
What is the probability that MORE THAN 3 of the children have Type O blood?

HOMEWORK
Complete the problems: pg. 403 (#69 – 80). This assignment is due at the start of the next class session.

Geometric Probability
• We have a geometric setting when the following characteristics are in place:
1. Each observation will fall into one of two categories, usually considered “success” or “failure”.
2. The probability of success is the same for each observation.
3. All of the observations are independent.
4. The variable of interest, X, is the number of trials required to obtain the first success.

EXAMPLE
If we are rolling a single die, and we want to roll a “5”, then how many rolls would it take to get a five for the first time? GEOMETRIC DISTRIBUTION
If we are rolling a die four times, and we want to count the number of fives that we roll … BINOMIAL DISTRIBUTION

Calculating Geometric Probabilities
• If each trial has a probability p of success and a probability q = 1 − p of failure, the possible values of X are 1, 2, 3, …
• If n is any of these values, the probability that the first success occurs on the nth trial is:
$P(X = n) = q^{\,n - 1}\, p$
• What would be the probability that it would take 3 rolls to get our first five? 6 rolls?

Using the TI-84: pdf
• Just as with binomial probabilities, we can use the calculator to quickly compute geometric probabilities.
• The geometpdf function computes the probability that a given number of trials is required to achieve the first success.
• To compute the probability that it would take five rolls to roll a “5” for the first time, we would use:
geometpdf(1/6, 5) = 0.0804

The Geometric Distribution
• The geometric probability distribution also has a mean and standard deviation.
• The mean, or expected value, of a geometric random variable is:
$\mu_X = \dfrac{1}{p}$
• The standard deviation of a geometric random variable is:
$\sigma_X = \sqrt{\dfrac{q}{p^2}} = \dfrac{\sqrt{q}}{p}$

HOMEWORK
Complete the problems 8.37 – 8.46. This assignment is due at the start of the next class session.
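The two die examples (first five on a given roll versus the number of fives in four rolls) can be compared with a short Python sketch. This is an illustration under stated assumptions, not the calculator's implementation: `geometric_pmf` and `binomial_pmf` are hypothetical helper names, and `geometric_pmf(1/6, 5)` is intended to mirror the `geometpdf(1/6, 5)` computation shown on the TI-84 slide.

```python
from math import comb, sqrt

def geometric_pmf(p, n):
    """P(X = n): the first success occurs on trial n (hypothetical helper)."""
    q = 1 - p
    return q ** (n - 1) * p

def binomial_pmf(n, k, p):
    """P(X = k): exactly k successes in n independent trials (hypothetical helper)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p = 1 / 6  # probability of rolling a "5" on a fair die

# Geometric: how many rolls until the first five?
prob_first_five_on_3rd = geometric_pmf(p, 3)
prob_first_five_on_5th = geometric_pmf(p, 5)  # mirrors geometpdf(1/6, 5)
mean_rolls = 1 / p                            # mean of a geometric variable
sd_rolls = sqrt(1 - p) / p                    # sqrt(q) / p

# Binomial: how many fives in a fixed four rolls?
prob_one_five_in_four = binomial_pmf(4, 1, p)

print(round(prob_first_five_on_5th, 4))
print(round(mean_rolls, 2), round(sd_rolls, 2))
print(round(prob_one_five_in_four, 4))
```

The fifth-roll probability should match the 0.0804 from the calculator slide, and the mean works out to 6 rolls. The geometric helper takes no fixed number of trials, while the binomial helper needs n up front, which is the distinction the two settings draw.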