Chapter 3: Random Variables

A random variable (RV) is a rule that associates a number with each outcome of the sample space S. Although it is called a variable, an RV is really a function whose domain is S and whose range is a set of real numbers. Traditionally, capital letters denote RVs and lowercase letters denote specific values of an RV.

Ex. Roll 2 fair dice. Let Y = the sum of the dice. Note that Y itself is not a number, but the values Y can take are numbers.

S = {(1,1), (1,2), …, (6,6)}
Possible values for Y: {2, 3, 4, …, 12}

P(Y = 2) = 1/36 — "The probability that the random variable Y takes on the value 2 is 1/36."

P(2 < Y ≤ 5) = P(Y = 3) + P(Y = 4) + P(Y = 5) = 2/36 + 3/36 + 4/36 = 9/36

This could also be written as P(2 < Y ≤ 5) = p(3) + p(4) + p(5) = 2/36 + 3/36 + 4/36 = 9/36.

There are two types of RVs: discrete and continuous. For a discrete RV, the set of possible values is at most countably infinite. An RV is continuous if both of the following are true:
1. The set of possible values consists of all the numbers in an interval, or in a disjoint union of such intervals. [e.g., (0,1), or (0,1) U (2,3)]
2. No single value has positive probability; that is, P(X = c) = 0 for every possible c.

The probability distribution, or probability mass function (pmf), of a discrete random variable gives the probability that the variable takes on each of its values. The pmf of a discrete RV X is defined for every x by p(x) = P(X = x) = P(all s in S such that X(s) = x). The pmf is usually given as a table or as a function. Discrete RVs count the number of some object, occurrence, or subject.

For every discrete random variable (probability distribution):
1. ∑ P(Y = y) = 1, where the sum is over all possible values y.
2. P(Y = y) ≥ 0 for all possible values y.

Ex. A package contains 5 pens. Let Y be the RV that counts the number of defective pens in the package.
The probability distribution is given as follows:

Y      0    1    2    3    4    5
p(y)  .75  .10  .06  .04  .03  .02

P(Y = 0) = .75 means that with probability .75, 0 of the 5 pens are defective; P(Y = 1) = .10, P(Y = 2) = .06, P(Y = 3) = .04, P(Y = 4) = .03, P(Y = 5) = .02. Note that all the probabilities are between 0 and 1 and that they sum to 1.

What is the probability that at most 2 pens are defective?
P(Y ≤ 2) = P(Y = 0) + P(Y = 1) + P(Y = 2) = .75 + .10 + .06 = .91

What is the probability that at least 1 pen is defective?
P(Y ≥ 1) = 1 – P(Y = 0) = 1 – .75 = .25

Ex. Let X be a discrete RV such that p(x) = .1x for x = 1, 2, 3, 4. The pmf as a table is:

X     1   2   3   4
p(x) .1  .2  .3  .4

Note that every p(x) > 0 and ∑ p(x) = 1.

[Probability histogram: p(x) vs. x, with bars at x = 1, 2, 3, 4 of heights .1, .2, .3, .4]

The cumulative distribution function (cdf), labeled F(x), of a discrete RV X with pmf p(x) is defined for every x by

F(x) = P(X ≤ x) = ∑ p(y), summed over all possible values y ≤ x.

F(x) is the probability that X is less than or equal to x. For an integer-valued RV and integers a and b with a < b:

P(a ≤ X ≤ b) = F(b) – F(a – 1)

Ex. For the pen distribution above:

F(0) = P(Y ≤ 0) = .75
F(1) = P(Y ≤ 1) = p(0) + p(1) = .85
F(1.5) = P(Y ≤ 1.5) = .85, while p(1.5) = P(Y = 1.5) = 0
P(2 ≤ Y ≤ 4) = F(4) – F(1) = .98 – .85 = .13 = p(2) + p(3) + p(4)

The full cdf:

y          F(y) = P(Y ≤ y)
y < 0       0
0 ≤ y < 1  .75
1 ≤ y < 2  .85
2 ≤ y < 3  .91
3 ≤ y < 4  .95
4 ≤ y < 5  .98
y ≥ 5      1.0

There are a few special discrete RVs we will mention by name shortly.

In class: 11, 15, 24, 16

#11.
X     4    6    8
p(x) .45  .40  .15

[Probability histogram: p(x) vs. x, with bars at x = 4, 6, 8]

c. P(X ≥ 6) = .55    P(X > 6) = .15

Expected Values

The expected value is the mean, weighted by the probabilities:

E(X) = μ = μ_X = ∑ x P(X = x) = ∑ x p(x)

E(X) is the mean of the conceptual population, the long-run average.

Pen example:
E(Y) = 0(.75) + 1(.10) + 2(.06) + 3(.04) + 4(.03) + 5(.02) = .56

Does this mean there are 0.56 defective pens in each package of 5? No. It means that on average there are 0.56 defective pens per package of 5. So if you had 100 packages of 5 pens, there would be about 56 defective pens in total.
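The pen-package arithmetic above can be double-checked with a few lines of Python (a quick sketch; the dictionary and function names are my own, not part of the notes):

```python
# pmf of Y = number of defective pens in a package of 5, from the table above.
pmf = {0: .75, 1: .10, 2: .06, 3: .04, 4: .03, 5: .02}

# The two defining properties of a discrete distribution:
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1) < 1e-9

def F(x):
    """cdf: F(x) = P(Y <= x) = sum of p(y) over all y <= x."""
    return sum(p for y, p in pmf.items() if y <= x)

print(round(pmf[0] + pmf[1] + pmf[2], 2))    # P(Y <= 2) = 0.91
print(round(1 - pmf[0], 2))                  # P(Y >= 1) = 0.25
print(round(F(4) - F(1), 2))                 # P(2 <= Y <= 4) = F(4) - F(1) = 0.13
```

Note that F accepts any real x, so F(1.5) = F(1) = .85, just as in the example.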
Geometric RV: counts the number of "trials" until the first "success." Here "trials" means the number of times the experiment is run, and a "success" is not necessarily a good thing — just the outcome you are counting. If X has a geometric distribution, then the pmf is

p(x) = p(1 – p)^(x–1) for x = 1, 2, 3, …, and p(x) = 0 otherwise,

where p is the probability of a success on each trial. So if p = .75, then P(X = 2) = p(2) = .75(.25) = .1875.

Prove that E(X) = 1/p for X a geometric RV with parameter p.

Proof. Since d/dp (1 – p)^x = –x(1 – p)^(x–1),

E(X) = ∑_{x=1}^∞ x p (1 – p)^(x–1) = –p ∑_{x=1}^∞ d/dp (1 – p)^x = –p d/dp [ ∑_{x=1}^∞ (1 – p)^x ].

We know the geometric series sums to

∑_{x=1}^∞ (1 – p)^x = (1 – p) / (1 – (1 – p)) = (1 – p)/p,

so

E(X) = –p d/dp [(1 – p)/p] = –p · [–p – (1 – p)]/p² = –p(–1/p²) = 1/p.

Properties of Expected Values. Let X and Y be discrete RVs and c a constant.
1. E(c) = c.
2. E(cX) = cE(X).
3. E(X + Y) = E(X) + E(Y) = μ_X + μ_Y.
4. In general, E(XY) ≠ E(X)E(Y); if X and Y are independent, then E(XY) = E(X)E(Y).
5. E(h(X)) = ∑ h(x) p(x), summed over all possible x. Ex. E(X²) = ∑ x² p(x).
6. Variance(X) = V(X) = σ² = E[(X – μ)²], where μ = E(X). Standard deviation(X) = σ = √(σ²).

Exercise: Show that σ² = E(X²) – μ².
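Before working the exercise, the geometric-mean result E(X) = 1/p derived above can be sanity-checked numerically by truncating the infinite series (a sketch in Python; the cutoff of 10,000 terms is an arbitrary choice of mine, more than enough for p = .75):

```python
# Numerically verify E(X) = sum over x >= 1 of x * p * (1-p)^(x-1) = 1/p
# for a geometric RV, by truncating the infinite series.
p = 0.75
mean = sum(x * p * (1 - p) ** (x - 1) for x in range(1, 10_001))
print(round(mean, 4))   # 1.3333, i.e. 1/p = 4/3

# Also check the pmf value computed above: P(X = 2) = p(1 - p)
print(p * (1 - p))      # 0.1875
```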
σ² = E[(X – μ)²] = E[X² – 2μX + μ²] = E(X²) – 2μE(X) + μ² = E(X²) – 2μ² + μ² = E(X²) – μ².

For the pen example, recall E(Y) = .56. Using the definition:

Y   p(y)   Y – μ    (Y – μ)² p(y)
0   .75    –0.56    0.2352
1   .10     0.44    0.01936
2   .06     1.44    0.124416
3   .04     2.44    0.238144
4   .03     3.44    0.355008
5   .02     4.44    0.394272

Variance = 1.366, stdev = 1.169.

Using the shortcut formula:

Y   p(y)   y² p(y)
0   .75    0
1   .10    .10
2   .06    .24
3   .04    .36
4   .03    .48
5   .02    .50

E(Y²) = 1.68, so V(Y) = E(Y²) – μ² = 1.68 – .56² = 1.366 and σ = 1.169.

Let X be an RV with mean μ and variance σ², and let a and b be real numbers. Then

V(aX + b) = σ²_(aX+b) = a²σ², and σ_(aX+b) = |a|σ.

#29.
x     0    1    2    3    4
p(x) .08  .15  .45  .27  .05

E(X) = 0(.08) + 1(.15) + 2(.45) + 3(.27) + 4(.05) = 0 + .15 + .90 + .81 + .20 = 2.06
V(X) = .08(2.06)² + .15(1.06)² + .45(.06)² + .27(.94)² + .05(1.94)² = .9364
σ = .9677
By the shortcut: E(X²) = .08(0) + .15(1) + .45(4) + .27(9) + .05(16) = 5.18, so V(X) = 5.18 – 2.06² = .9364.

Chebyshev's Inequality

Let X be an RV with mean μ and standard deviation σ, and let k ≥ 1. Then

P(|X – μ| ≥ kσ) ≤ 1/k².

#44. The probability that X is at least k standard deviations from the mean is at most 1/k².

P(|X – μ| ≥ 2σ) ≤ 1/4 = .25
P(|X – μ| ≥ 3σ) ≤ 1/9 = .111
P(|X – μ| ≥ 4σ) ≤ 1/16 = .0625
P(|X – μ| ≥ 5σ) ≤ 1/25 = .04
P(|X – μ| ≥ 10σ) ≤ 1/100 = .01

What does P(|X – μ| ≥ 3σ) mean?
P(|X – μ| ≥ 3σ) = P(X – μ ≤ –3σ) + P(X – μ ≥ 3σ) = P(X ≤ μ – 3σ) + P(X ≥ μ + 3σ).

For exercise 13, μ = 2.64, V(X) = 2.3074, σ = 1.5440.
P(|X – μ| ≥ 3σ) = P(X ≤ 2.64 – 3(1.544)) + P(X ≥ 2.64 + 3(1.544))
               = P(X ≤ 2.64 – 4.632) + P(X ≥ 2.64 + 4.632)
               = P(X ≤ –1.992) + P(X ≥ 7.272) = 0, which is ≤ 1/9.
The Chebyshev bound is conservative.

c.
x     –1     0     1
p(x)  1/18  8/9   1/18

E(X) = –1/18 + 0 + 1/18 = 0
E(X²) = 1/18 + 0 + 1/18 = 1/9
V(X) = 1/9, σ = 1/3
P(|X – μ| ≥ 3σ) = P(X ≤ –1) + P(X ≥ 1) = 1/18 + 1/18 = 1/9.
By Chebyshev's inequality, P(|X – μ| ≥ 3σ) ≤ 1/9, so here the bound is attained exactly.

d.
x     –1     0      1
p(x)  1/50  24/25  1/50

E(X) = 0, E(X²) = V(X) = 1/25, σ = 1/5, and P(|X – μ| ≥ 5σ) = 2/50 = 1/25 = the Chebyshev bound, again attained exactly.

The Binomial Probability Distribution

Ex. A test of 3 multiple-choice questions.
The probability of getting any one question right is 1/4 = .25. Assume the questions are independent. Let Y = number of questions right.

S = {RRR, RRW, RWR, WRR, WWR, WRW, RWW, WWW}

Note: P(Y = 3) = P(RRR) ≠ 1/8. The outcomes are not equally likely! A tree diagram may help.

P(Y = 0) = P(WWW) = 3/4 · 3/4 · 3/4 = 27/64 = .42
P(Y = 1) = P(RWW) + P(WRW) + P(WWR) = 3(1/4 · 3/4 · 3/4) = 27/64 = .42
P(Y = 2) = P(WRR) + P(RWR) + P(RRW) = 3(3/4 · 1/4 · 1/4) = 9/64 = .14
P(Y = 3) = P(RRR) = 1/4 · 1/4 · 1/4 = 1/64 = .02

Note: 27/64 + 27/64 + 9/64 + 1/64 = 1.

A binomial random variable counts the number of "successes" in n trials. Ex. The number of correct answers in 3 questions.

Five characteristics of a binomial RV:
1. A fixed number n of identical trials. Ex. 3 true-false questions, or selecting 10 people from a large population.
2. The outcome of each trial can be classified as a success or a failure. A success is not necessarily a good thing.
3. The probability of a success, p, is the same for all trials. This also means the probability of a failure is the same for all trials: q = 1 – p.
4. The trials of the experiment are independent: outcomes of previous trials do not affect future trials.
5. The random variable counts the number of successes in the n trials.

Clues that an RV is binomial:
1. A random sample of size n.
2. The sample comes from a large population, or sampling is done with replacement (this makes the trials independent).
3. Each trial can be classified as a success or a failure.

Binomial formula:

P(X = x) = b(x; n, p) = nCx p^x q^(n–x), for x = 0, 1, 2, …, n.

For the multiple-choice test example, Y is Bin(n = 3, p = 1/4). Sampling is with replacement in the sense that if you answer T to question 1, you can answer T again — T is "replaced."

x = 0 means 0 questions correct:
P(Y = 0) = 3C0 (1/4)^0 (3/4)^3 = 1(1)(27/64) = 27/64

x = 1 means 1 question correct:
P(Y = 1) = 3C1 (1/4)^1 (3/4)^2 = 3(1/4)(9/16) = 27/64

On the TI-83/84: [2nd] [DISTR] (VARS key), 0: binompdf(n, p, x).
binompdf(n, p, x) gives P(Y = x) = b(x; n, p).
If Y is Bin(n = 3, p = 1/4): P(Y = 2) = binompdf(3, 1/4, 2) = 9/64 = 0.1406.

On the TI-83/84: [2nd] [DISTR], binomcdf(n, p, x).
binomcdf(n, p, x) gives P(Y ≤ x) = B(x; n, p).
If Y is Bin(n = 10, p = .6): P(Y ≤ 8) = binomcdf(10, .6, 8) = 0.9536 = B(8; 10, .6).

What if you were asked for P(Y > 8)?
P(Y > 8) = 1 – P(Y ≤ 8) = 1 – .9536 = .0464

Use the pdf for the probability that Y exactly equals a number, and use the cdf for probabilities involving <, >, ≤, ≥.

Ex. In the NBA, Dolph Schayes makes 90% of his free throws. In a particular game he shoots 10 free throws. Let X count the number of free throws made. What must you assume for X to have a binomial distribution? Specify n and p. Find the probability that he made all 10 free throws. Find the probability that he made 9 free throws.

Clues that it is binomial: sampling is effectively with replacement, and each trial is a success or failure. A success = a made free throw.
a. We need to assume that the free throws are independent.
b. n = 10 and p = .9.
c. P(X = 10) — use the pdf: P(X = 10) = .3487. P(X = 9) — use the pdf: P(X = 9) = .3874.

Extra questions: what is the probability that Dolph makes
d. at most 9 free throws? P(X ≤ 9) — use the cdf: P(X ≤ 9) = .6513. Or P(X ≤ 9) = 1 – P(X > 9) = 1 – P(X = 10) = 1 – .3487 = .6513.
e. at least 8 free throws? P(X ≥ 8) = 1 – P(X < 8) = 1 – P(X ≤ 7) = 1 – .0702 = .9298.
f. 0 free throws? P(X = 0) = .1^10 = 1E–10 = 0.0000000001 ≈ 0.
g. at least 1 free throw? P(X ≥ 1) = 1 – P(X = 0) ≈ 1 – 0 ≈ 1.

The mean or expected value of a binomial RV is μ = np. The standard deviation of a binomial RV is σ = √(npq). Recall q = 1 – p.

Ex2. It is known that 10% of the US population is left-handed. A random sample of 15 people is taken.
1. What is the probability that exactly 0 people are left-handed?
2. What is the probability that exactly 1 person is left-handed?
3. What is the probability that exactly 2 people are left-handed?
4. What is the probability that fewer than 3 people are left-handed?
5. What is the probability of at least one left-handed person?
6. What are the mean and the standard deviation of the number of left-handed people in the sample?

Answers: Let X count the number of left-handed people in the sample. Since we have a large population and each trial can be classified as a success or failure, X is a binomial random variable with parameters n = 15 and p = 0.10. Then:
1. P(X = 0) = .2059
2. P(X = 1) = .3432
3. P(X = 2) = .2669
4. P(X < 3) = P(X ≤ 2) = .8159 (using the cdf). Equivalently, P(X < 3) = .2059 + .3432 + .2669 = .816 (round-off error).
5. P(X ≥ 1) = 1 – P(X < 1) = 1 – P(X = 0) = 1 – .2059 = .7941
6. μ = np = 15(.1) = 1.5; σ = √(npq) = √(15 · .1 · .9) = √1.35 = 1.162

In class: 48, 55, 57, 62

48. Circuit boards. X = number of defective circuit boards in the sample; n = 25, p = .05. X is binomial because we are sampling from a large population.
a. P(X ≤ 2) = .8729 = probability that there are at most 2 defectives.
b. P(X ≥ 5) = 1 – P(X ≤ 4) = 1 – .9928 = .0072 = probability that there are at least 5 defectives.
c. P(1 ≤ X ≤ 4) = .7154 = probability that there are between 1 and 4 defectives, inclusive.
d. P(X = 0) = .2774
e. μ = 25(.05) = 1.25; σ = √(25 · .05 · .95) = 1.090

Prove that b(x; n, p) = b(n – x; n, 1 – p).
Let q = 1 – p. Then
b(x; n, p) = nCx p^x q^(n–x), and
b(n – x; n, q) = nC(n–x) q^(n–x) (1 – q)^(n–(n–x)).
But 1 – q = p, n – (n – x) = x, and we have previously shown that nCx = nC(n–x), so
b(n – x; n, q) = nCx q^(n–x) p^x = b(x; n, p).

Ex. Suppose we have 20 computers. Unknown to you, 20% of them — 4 computers — are defective and will explode when you turn them on. You are to test a random sample of 3 computers. What is the probability that exactly 1 of the computers explodes? Exactly 2? Exactly 3? Exactly 0?
Your first thought may be to use the binomial distribution with n = 3 and p = 4/20 = 1/5 = .2. But this is not correct, because the population (20) is not large. Sure, the probability of selecting a defective computer on the first selection is 4/20, but what about the second selection? If you selected a defective computer on the first try, the probability of selecting a defective computer on the second selection is 3/19 = .158; but if you did not, it is 4/19 = .211. This violates the binomial assumptions of independence and constant p.

So how do we do this problem? Use the hypergeometric distribution. Y is the random variable that counts the number of successes in a sample of size n from a population of size N, where M is the total number of successes in the population. It is much like the binomial, but is used when n > .05N. Each outcome can be characterized as a success or failure, and there is a sample of size n.

The pmf of a hypergeometric RV is:

P(Y = x) = h(x; n, M, N) = MCx · (N–M)C(n–x) / NCn, for max[0, n – (N – M)] ≤ x ≤ min[M, n].

In the computer example above: N = 20, n = 3, M = 4, where Y counts the number of defective computers in the sample.

P(Y = 1) = h(1; 3, 4, 20) = 4C1 · 16C2 / 20C3 = 480/1140 = .421

x        0      1      2      3
P(Y=x)  0.491  0.421  0.084  0.004

The probabilities above were found using the HYPGEOMDIST function in Excel. Sad to say, your calculators do not have a hypergeometric function.

Note that as N increases, the hypergeometric distribution approaches the binomial distribution. In fact, most problems we use the binomial for are theoretically hypergeometric; however, we rarely know the population size N, and the approximation is very good. How good? We shall see in lab. Note also that we did problems like this already when we did combination problems.
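The hypergeometric probabilities above (found with Excel's HYPGEOMDIST) can also be reproduced directly from the formula, along with a quick look at how the binomial approximation improves as N grows (a sketch in Python; the function names are mine):

```python
from math import comb

def hypergeom_pmf(x, n, M, N):
    """P(Y = x) = C(M, x) * C(N - M, n - x) / C(N, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Exploding-computers example: N = 20, n = 3, M = 4 defective.
for x in range(4):
    print(x, round(hypergeom_pmf(x, n=3, M=4, N=20), 3))
# prints 0 0.491, 1 0.421, 2 0.084, 3 0.004 -- matching the table above.

# As N grows with M/N held at .2, the pmf approaches Binomial(n=3, p=.2):
binom = lambda x: comb(3, x) * 0.2**x * 0.8**(3 - x)
print(hypergeom_pmf(1, 3, 4000, 20000), binom(1))   # both are close to .384
```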
There are 15 students in a class: 8 women and 7 men. 4 are randomly selected to be on a committee. Let Y be the number of women on the committee. Find the probability distribution of Y.

Y is hypergeometric with N = 15, n = 4, M = 8. The possible values for Y are 0, 1, 2, 3, 4.

x        0      1      2      3      4
P(Y=x)  0.026  0.205  0.431  0.287  0.051

Also note that if Y is hypergeometric with parameters N, n, and M, then

E(Y) = nM/N
Var(Y) = [(N – n)/(N – 1)] · n · (M/N) · (1 – M/N)

So for the previous example:
E(Y) = 4(8)/15 = 2.133
V(Y) = (11/14) · 4 · (8/15)(7/15) = 8(7)(4)(11)/(225 · 14) = .783
σ = .884

In class: 68, 71

68. Cameras. X = number of 3-megapixel cameras in the sample. X ~ HG(n = 5, M = 6, N = 15).
P(X = 2) = 1260/3003 = .4196
P(X ≤ 2) = (126 + 756 + 1260)/3003 = 2142/3003 = .7133
P(X ≥ 2) = 1 – (P(0) + P(1)) = 1 – 882/3003 = 2121/3003 = .7063

The Negative Binomial Distribution
1. A sequence of independent trials (with replacement, or from a large population).
2. Each trial results in a success (S) or failure (F).
3. The probability of a success is constant from trial to trial: P(S on any trial) = p.
4. The experiment continues until r successes are observed, where r is a preset positive integer.

X counts the number of failures that precede the rth success: X ~ NB(r, p), with pmf

nb(x; r, p) = (x + r – 1)C(r – 1) p^r (1 – p)^x, for x = 0, 1, 2, …

E(X) = r(1 – p)/p    V(X) = r(1 – p)/p²

Notes: for a negative binomial RV, there is no fixed sample size. Some other books count the total number of trials, not the failures.

75. Sex of children: p = .5, 1 – p = .5. X = number of male children until 2 females, so r = 2.
P(X = x) = (x + 1)C1 (.5²)(.5^x) = (x + 1)(.5)^(x+2), for x = 0, 1, 2, …
P(X = 2) = 3C1 (.5)^4 = 3/16 = .1875
P(at most 4 children) = P(X ≤ 2) = P(0) + P(1) + P(2) = .25 + .25 + .1875 = .6875
E(males) = 2(.5)/.5 = 2, so E(number of children) = 2 + 2 = 4.

The geometric distribution is a special case of the negative binomial with r = 1, except that it usually counts the number of trials, not failures.

Ex. A door-to-door encyclopedia salesman is required to document 5 in-home visits each day.
Suppose that he has a 30% chance of being invited into a home and that each home is independent. What is the probability that he requires fewer than 8 houses to achieve the 5 required visits?

Here X ~ nb(x; r = 5, p = .3) counts the number of failures (houses that do not invite him in) before the 5th success. Fewer than 8 houses in total means at most 2 failures:

P(X < 3) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
P(X = 0) = 4C4 (.3)^5 (.7)^0 = .0024
P(X = 1) = 5C4 (.3)^5 (.7)^1 = .0085
P(X = 2) = 6C4 (.3)^5 (.7)^2 = .0178
P(X < 3) = .0288

The Poisson Distribution

Characteristics:
1. The experiment consists of counting the number of times a certain event occurs during a specific unit of time or in a specific area.
2. The probability that the event occurs in a given unit of time or area is the same for all units.
3. The number of events that occur in one unit is independent of the number in other units.
4. The mean or expected number of events per unit is denoted by the Greek letter lambda, λ.

If Y has a Poisson distribution with mean λ, then the pmf of Y is

P(Y = x) = p(x; λ) = λ^x e^(–λ) / x!, for x = 0, 1, 2, …

E(Y) = λ and Var(Y) = λ.

Ex. The average number of occurrences of a certain disease in a certain population is 2 per 1000 people. In a random sample of 1000 people from this population, what is the probability that exactly 1 person in the sample has the disease? What is the probability that at least 1 person in the sample has the disease?

Let Y count the number of people in the sample with the disease. Y has a Poisson distribution with λ = 2.
P(Y = 1) = 2^1 e^(–2) / 1! = .2707
P(Y ≥ 1) = 1 – P(Y = 0) = 1 – .1353 = .865

On the TI-83/84: [2nd] [DISTR] (VARS key), C: poissonpdf(λ, k).
poissonpdf(λ, k) gives P(Y = k). For the last example: poissonpdf(2, 1) = .2707 and poissonpdf(2, 0) = .1353.

On the TI-83/84: [2nd] [DISTR], poissoncdf(λ, k).
poissoncdf(λ, k) gives P(Y ≤ k).

In class: 75, 83

79. X ~ Poisson(λ = 5)
P(X ≤ 8) = .9319
P(X = 8) = .932 – .867 = .065
P(X ≥ 9) = 1 – .932 = .068
P(5 ≤ X ≤ 8) = .932 – .440 = .492
P(5 < X < 8) = P(X ≤ 7) – P(X ≤ 5) = .867 – .616 = .251
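The Poisson calculations above can be reproduced directly from the pmf, mirroring the calculator's poissonpdf and poissoncdf (a sketch in Python; the function names are mine):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(Y = x) = lam^x * e^(-lam) / x!  (the calculator's poissonpdf)."""
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(k, lam):
    """P(Y <= k)  (the calculator's poissoncdf)."""
    return sum(poisson_pmf(x, lam) for x in range(k + 1))

# Disease example, lambda = 2:
print(round(poisson_pmf(1, 2), 4))                      # P(Y = 1)  = 0.2707
print(round(1 - poisson_pmf(0, 2), 4))                  # P(Y >= 1) = 0.8647

# Exercise 79, lambda = 5:
print(round(poisson_cdf(8, 5), 4))                      # P(X <= 8) = 0.9319
print(round(poisson_cdf(7, 5) - poisson_cdf(5, 5), 3))  # P(5 < X < 8) = 0.251
```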