Chapter 4 Random Variables and Discrete Probability Distributions

A random variable is a function that assigns a unique numerical value to each outcome of the sample space. Its values vary according to the rules of probability. It differs from the variables you saw in algebra because the value it takes when an experiment is run is unknown in advance (non-deterministic).

The probability distribution of a random variable gives all the values the random variable can take on and the probability that it takes on each of those values.

Discrete RVs count the number of some object, occurrence or subject.

Ex: A package of 5 pens. Let Y be the RV that counts the number of pens that are defective in the package (Y = number of defective pens in the package). The probability distribution is given as follows:

Y       0     1     2     3     4     5
P(Y)   .75   .10   .06   .04   .03   .02

That is, P(Y=0) = .75 (0 pens of 5 are defective), P(Y=1) = .10, P(Y=2) = .06, P(Y=3) = .04, P(Y=4) = .03 and P(Y=5) = .02.

Note that all probabilities are between 0 and 1 and the sum of the probabilities = 1. These two properties will be true of all DRVs.

What is the probability that at most 2 pens are defective?
P(Y ≤ 2) = P(Y=0) + P(Y=1) + P(Y=2) = .75 + .10 + .06 = .91

What is the probability that at least 1 pen is defective?
P(Y ≥ 1) = 1 - P(Y = 0) = 1 - .75 = .25

What is the probability that between 1 and 4 (inclusive) pens are defective?
P(1 ≤ Y ≤ 4) = P(Y ≤ 4) - P(Y < 1) = .98 - .75 = .23
Or directly: P(Y=1) + P(Y=2) + P(Y=3) + P(Y=4) = .10 + .06 + .04 + .03 = .23
Note that P(Y ≤ 4) - P(Y ≤ 1) = .98 - .85 = .13, which is not the correct answer, because it also subtracts P(Y = 1).

Chapter 4 Discrete Random Variables
Expected Value, Variance and Standard Deviation

Expected Values
Expected value = mean (weighted by probabilities)
E(X) = μ = μ_X = ∑ x·P(X = x) = ∑ x·p(x)

The mean of a DRV is a weighted average of its values, with the probabilities as weights. It is interpreted as the value that will happen on average. It does not have to be a value that the DRV can actually take.
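The pen calculations above, together with the expected value formula E(Y) = ∑ y·P(Y = y), can be checked with a short Python sketch. The names dist and prob are made up for this illustration and are not part of the notes; the probabilities are copied from the table.

```python
# Probability distribution of Y = number of defective pens in a package of 5
# (values and probabilities copied from the table in the notes).
dist = {0: 0.75, 1: 0.10, 2: 0.06, 3: 0.04, 4: 0.03, 5: 0.02}

def prob(event):
    """Add up P(Y = y) over the values y for which event(y) is true."""
    return sum(p for y, p in dist.items() if event(y))

# The two properties every discrete probability distribution must have
assert all(0 <= p <= 1 for p in dist.values())
assert abs(sum(dist.values()) - 1) < 1e-9

at_most_2 = prob(lambda y: y <= 2)              # P(Y <= 2)
at_least_1 = 1 - dist[0]                        # P(Y >= 1), complement rule
between_1_and_4 = prob(lambda y: 1 <= y <= 4)   # P(1 <= Y <= 4)
mean = sum(y * p for y, p in dist.items())      # E(Y) = sum of y * P(Y = y)

print(round(at_most_2, 2), round(at_least_1, 2),
      round(between_1_and_4, 2), round(mean, 2))
# prints: 0.91 0.25 0.23 0.56
```

Writing each event as a condition on y makes it harder to slip between ≤ and <, which is exactly the mistake behind the incorrect .98 - .85 calculation.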
For the pen example, recall the probability distribution function (pdf). There is a package of 5 pens, and Y is the RV that counts the number of defective pens in the package. The probability distribution is:

Y       0     1     2     3     4     5
P(Y)   .75   .10   .06   .04   .03   .02

E(Y) = 0*.75 + 1*.10 + 2*.06 + 3*.04 + 4*.03 + 5*.02 = 0.56

Does this mean that there are 0.56 defective pens in each package of 5? No. It means that on average there are 0.56 defective pens per package of 5. So if you had 100 packages of 5 pens, you would expect about 56 defective pens in total.

Variance(X) = V(X) = σ² = E[(X - μ)²] = E(X²) - μ², where μ = E(X)
Standard Deviation(X) = σ = √(σ²)

For the pen example, recall E(Y) = .56. Using the definition σ² = E[(Y - μ)²]:

Y    P(Y)    Y-µ      (Y-µ)²     (Y-µ)²*p(y)
0    0.75   -0.56     0.3136     0.2352
1    0.10    0.44     0.1936     0.01936
2    0.06    1.44     2.0736     0.124416
3    0.04    2.44     5.9536     0.238144
4    0.03    3.44    11.8336     0.355008
5    0.02    4.44    19.7136     0.394272

Variance = sum of the last column = 1.3664
Stdev = √1.3664 = 1.168931

Using the shortcut σ² = E(Y²) - μ²:

Y    P(Y)    Y²    Y²*P(Y=y)
0    0.75     0    0
1    0.10     1    0.10
2    0.06     4    0.24
3    0.04     9    0.36
4    0.03    16    0.48
5    0.02    25    0.50

E(Y²) = sum of the last column = 1.68
Variance = 1.68 - 0.56² = 1.68 - 0.3136 = 1.3664
Standard deviation = 1.168931

The Binomial Probability Distribution

Ex: Test of 3 multiple choice questions. The probability of getting any one question right is ¼ = .25. Assume that the questions are independent. Let Y = # of questions right.

S = {RRR, RRW, RWR, WRR, WWR, WRW, RWW, WWW}

Note: P(Y=3) = P(RRR) ≠ 1/8. The outcomes are not equally likely! A tree diagram may help.

P(Y=0) = P(WWW) = ¾*¾*¾ = 27/64 = .42
P(Y=1) = P(RWW or WRW or WWR) = 3 * (¾*¾*¼) = 27/64 = .42
P(Y=2) = P(WRR or RWR or RRW) = 3 * (¾*¼*¼) = 9/64 = .14
P(Y=3) = P(RRR) = ¼*¼*¼ = 1/64 = .02
Note: 27/64 + 27/64 + 9/64 + 1/64 = 1

A Binomial Random Variable counts the number of "successes" in n trials. In the last example it counts the number of correct answers in 3 questions.

Five Characteristics of a Binomial RV
1. Fixed number n of identical trials. Ex: 3 multiple choice questions, or selecting 10 people from a large population.
2.
The outcome of each trial can be classified as a Success or a Failure. A Success is not necessarily a good thing.
3. The trials of the experiment are independent. Outcomes of previous trials do not affect future trials.
4. The probability of a Success, p, is the same for all trials. This also means that the probability of a Failure, 1 - p = q, is the same for all trials.
5. The random variable counts the number of Successes in the n trials.

Clues that the RV is Binomial:
1. Random sample of size n.
2. Sample comes from a large population, or sampling is done with replacement.
3. Each trial can be classified as a Success or Failure.

Binomial Formula
P(Y = x) = nCx p^x q^(n-x), for x = 0, 1, 2, … , n

For the multiple choice test example, let Y be Bin(n = 3, p = ¼). This behaves like sampling with replacement: if you answer the first question correctly, you can answer correctly again (R is replaced).

x = 0 means 0 questions correct.
P(Y=0) = 3C0 (¼)^0 (¾)^3 = 1 (1) (27/64) = 27/64
x = 1 means 1 question correct.
P(Y=1) = 3C1 (¼)^1 (¾)^2 = 3 (¼) (9/16) = 27/64
x = 2 means 2 questions correct.
P(Y=2) = 3C2 (¼)^2 (¾)^1 = 3 (1/16) (3/4) = 9/64
x = 3 means 3 questions correct.
P(Y=3) = 3C3 (¼)^3 (¾)^0 = 1 (1/64) (1) = 1/64

On the TI83/84: [2nd] [DISTR] (VARS key), 0: binompdf(n, p, x)
binompdf(n, p, x) gives P(Y = x).
If Y is Bin(n = 3, p = ¼): P(Y = 2) = binompdf(3, ¼, 2) = 9/64 = 0.141

On the TI83/84: [2nd] [DISTR] (VARS key), binomcdf(n, p, x)
binomcdf(n, p, x) gives P(Y ≤ x).
If Y is Bin(n = 10, p = .6): P(Y ≤ 8) = binomcdf(10, .6, 8) = 0.9536

What if you were asked for P(Y > 8)?
P(Y > 8) = 1 - P(Y ≤ 8) = 1 - .9536 = .0464

Use pdf for the probability that Y is exactly equal to a number, and use cdf for probabilities involving <, >, ≤ or ≥.

Ex: It is known that 25% of the population are bald. A random sample of 20 people is taken.
1. What is the probability exactly 5 people of the sample are bald?
2. What is the probability at most 4 people of the sample are bald?
3.
What is the probability more than 6 people of the sample are bald?
4. What is the probability at least 1 person of the sample is bald?
5. What is the probability between 3 and 10 people of the sample are bald?

Clues that it is binomial: we are sampling from a large population, so the trials will be (in effect) independent, and each trial is a success or a failure. A success = the person is bald. Here n = 20, p = .25 and Y counts the number of bald people in the sample.

1. P(Y = 5) = binompdf(20, .25, 5) = .202
2. P(Y ≤ 4) = binomcdf(20, .25, 4) = .415
3. P(Y > 6) = 1 - P(Y ≤ 6) = 1 - binomcdf(20, .25, 6) = 1 - .786 = .214
4. P(Y ≥ 1) = 1 - P(Y = 0) = 1 - binompdf(20, .25, 0) = 1 - .003 = .997
5. Reading "between 3 and 10" as excluding the endpoints: P(3 < Y < 10) = P(4 ≤ Y ≤ 9) = P(Y ≤ 9) - P(Y ≤ 3) = binomcdf(20, .25, 9) - binomcdf(20, .25, 3) = .986 - .225 = .761

The mean or expected value of a Binomial RV is μ = np.
The standard deviation of a Binomial RV is σ = √(npq). Recall q = 1 - p.

In the above example (bald people):
µ = 20 * .25 = 5
σ = √(20 * .25 * .75) = √3.75 = 1.936

A general note: In most disciplines 0.05 is considered the cutoff value between rare / unusual events and non-rare events. When the probability of an event is less than 0.05, the event is considered rare or unusual.

Ex2. It is known that 10% of the US population is left-handed. A random sample of 15 people is taken.
1. What is the probability exactly 0 people are left-handed?
2. What is the probability exactly 1 person is left-handed?
3. What is the probability exactly 2 people are left-handed?
4. What is the probability less than 3 people are left-handed?
5. What is the probability of at least one left-handed person?
6. What are the mean and the standard deviation of the number of left-handed people in the sample?

Answers: Let X count the number of left-handed people in the sample. Since we have a large population and each trial can be classified as a success or failure, X is a binomial random variable. Its parameters are n = 15 and p = 0.10.
Then the answers to the questions are:
1. P(X = 0) = binompdf(15, .10, 0) = .2059
2. P(X = 1) = .3432
3. P(X = 2) = .2669
4. P(X < 3) = P(X ≤ 2) = binomcdf(15, .1, 2) = .8159 (used cdf). Adding the three answers above instead gives P(X < 3) = .2059 + .3432 + .2669 = .8160; the small difference is round-off error.
5. P(X ≥ 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - .2059 = .7941
6. μ = n*p = 15 * .1 = 1.5
   σ = √(npq) = √(15 * .1 * .9) = √1.35 = 1.162

Sample Problem
1. An allergist claims that 20% of her patients are allergic to dandelions. Find the following:
a. What is the probability exactly 2 of her next 5 patients are allergic to dandelions?
b. What is the probability none of her next 5 patients will be allergic to dandelions?
c. What is the probability at least 1 of her next 5 patients will be allergic to dandelions?

Answers: Let X = the number of the 5 patients who are allergic to dandelions. X has a Binomial Distribution with n = 5 and p = .20.
a. P(X = 2) = .2048
b. P(X = 0) = .3277
c. P(X ≥ 1) = 1 - P(X = 0) = .6723

Another Example: There are 10 pens in a bag. Two of the pens do not work. Three pens are randomly selected. What is the probability that all three pens work?

Your first instinct might be to try the Binomial distribution. Let Y be the number of pens that work in the sample, so n = 3 and p = .8, and you want to find P(Y = 3), so you would calculate binompdf(3, .8, 3) = .512.

This, however, is incorrect: you have a small population (10) and the pens are drawn without replacement, so the trials are not independent and p changes from draw to draw. Draw a tree diagram!

The good news is you have already seen problems like this before in chapter 4. The correct answer is the number of ways you can get 3 pens that work divided by the number of ways you can select 3 pens in total. We can still let Y be the number of pens that work in the sample, so
P(Y = 3) = 8C3 / 10C3 = 56/120 = .467
which is not too far from .512.

The above is an example of the Hyper-Geometric Distribution.
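The two pen calculations can be compared in a few lines of Python. This is only a sketch: binom_pmf and binom_cdf are hypothetical helper names written directly from the binomial formula to play the role of the calculator's binompdf and binomcdf, and math.comb(n, k) computes nCk.

```python
from math import comb

def binom_pmf(n, p, x):
    """P(Y = x) for Y ~ Bin(n, p); same role as binompdf(n, p, x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(n, p, x):
    """P(Y <= x) for Y ~ Bin(n, p); same role as binomcdf(n, p, x)."""
    return sum(binom_pmf(n, p, k) for k in range(x + 1))

# Binomial shortcut: treats the three draws as independent with p = .8
approx = binom_pmf(3, 0.8, 3)

# Exact answer by counting: ways to pick 3 of the 8 working pens,
# divided by ways to pick any 3 of the 10 pens
exact = comb(8, 3) / comb(10, 3)

print(round(approx, 3), round(exact, 3))
# prints: 0.512 0.467
```

The same helpers reproduce the earlier calculator answers, for example binom_cdf(15, 0.1, 2) for the left-handed sample.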
It is much like the binomial distribution, except that now we have a small population, so our trials are not independent and p, the probability of a success at each trial, changes.

Let n be the sample size, N be the population size and M be the number of successes in the population. Then Y has a hyper-geometric distribution when Y counts the number of successes in the sample. The pdf of Y is given by:

P(Y = x) = [ MCx * (N - M)C(n - x) ] / NCn

where max(0, n - N + M) ≤ x ≤ min(n, M).

Note that M + (N - M) = N and x + (n - x) = n.

Ex. In a lot of 28 gun cartridges, 8 were found to be contaminated and 20 were "clean." A random sample of 6 cartridges is taken from the lot.
a. Find the probability that all 6 are clean.
b. Find the probability that at least one is contaminated.
c. Find the probability that exactly 4 are clean.
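The hyper-geometric pdf above translates almost directly into code. The sketch below uses a hypothetical function name, hypergeom_pmf, and checks it against the earlier pens example (N = 10 pens, M = 8 that work, n = 3 drawn) rather than against the cartridge exercise.

```python
from math import comb

def hypergeom_pmf(N, M, n, x):
    """P(Y = x): x successes in a sample of n drawn without replacement
    from a population of N that contains M successes."""
    if x < max(0, n - N + M) or x > min(n, M):
        return 0.0  # x is outside the range of possible values
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Pens example: all 3 sampled pens work
all_work = hypergeom_pmf(10, 8, 3, 3)

# Sanity check: the probabilities over x = 0, 1, 2, 3 add up to 1
total = sum(hypergeom_pmf(10, 8, 3, x) for x in range(4))

print(round(all_work, 3), round(total, 3))
# prints: 0.467 1.0
```

For the cartridge lot, taking "clean" as the success gives N = 28, M = 20 and n = 6, so part a corresponds to hypergeom_pmf(28, 20, 6, 6).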