STAT 401 Week 3 Lab 1
Geoffrey Thompson
6/3/2013

1 Example Problems

The first problem is about binomial and hypergeometric distributions, the second problem is about calculating the expectation and variance of random variables, and the third problem is about continuous uniform and normal distributions.

1.1 Zombie Outbreak

Unfortunately, there has been an outbreak of zombie-ism or something like it in the country. Everybody in the city has an independent probability \(p\) of being infected. Everybody who is infected becomes a zombie very quickly, so we will simply refer to infected people as zombies even if they are not yet presenting with symptoms. This is horribly inconvenient, but we will have lab today anyway.

1. There are 50000 people in Ames and \(p = 0.04\). If we are interested in the number of zombies in town, what probability distribution is this? If you are only interested in the probability that one specific person is a zombie, what probability distribution is this?

This is a binomial distribution. If you are only interested in one student, this is a Bernoulli distribution.

2. Given the numbers in (1), what is the expected number of zombies in Ames?

The expectation for a binomial is \(np\), so this is \(50000 \cdot 0.04 = 2000\).

3. What is the variance of the number of zombies in Ames?

The variance of a binomial is \(np(1 - p)\), so this is \(50000 \cdot 0.04 \cdot 0.96 = 1920\).

4. There are 40 students in this class. What is the expected number of zombies in this class? What is the variance of the number of zombies in this class?

This is still a binomial problem because we have a number of students, \(n = 40\), a probability that a given student is a zombie, \(p = 0.04\), and we are interested in counting zombies. The expectation of a binomial is \(np\), so here it is \(np = 40 \cdot 0.04 = 1.6\). The variance of a binomial is \(np(1 - p)\), so here it is \(np(1 - p) = 40 \cdot 0.04 \cdot 0.96 = 1.536\).

5. Suppose the class breaks into small groups to survive the zombie apocalypse. Suppose, for these questions, \(p = 0.1\).
(a) In a group of two people, what is the probability that none are zombies? What is the probability that 1 is a zombie?

This is still a binomial. Since each student is independently considered, the size of the class is irrelevant. \(p = 0.1\) no matter what.

\[ P(X = 0) = \binom{2}{0} p^0 (1 - p)^2 = (0.9)^2 = 0.81 \]
\[ P(X = 1) = \binom{2}{1} p (1 - p) = 2(0.1)(0.9) = 0.18 \]

(b) What about in a group of 5 people?

\[ P(X = 0) = \binom{5}{0} p^0 (1 - p)^5 = (0.9)^5 = 0.5905 \]
\[ P(X = 1) = \binom{5}{1} p (1 - p)^4 = 5(0.1)(0.9)^4 = 0.3281 \]

(c) What about in a group of 10 people?

\[ P(X = 0) = \binom{10}{0} p^0 (1 - p)^{10} = (0.9)^{10} = 0.3487 \]
\[ P(X = 1) = \binom{10}{1} p (1 - p)^9 = 10(0.1)(0.9)^9 = 0.3874 \]

6. Suppose we know that 5 people in class are zombies, but we do not know which.

(a) In a group of two people, what is the probability that 1 is a zombie? HINT: hypergeometric.

This is hypergeometric because we have a population, the class, we know exactly how many zombies are in the class, and we're taking groups out of the class and trying to figure out how many zombies are in those groups. From the definition of a hypergeometric in our notes, we have the following:

\[ N = 40; \quad M = 5; \quad n = 2; \quad x = 1 \]

From the definition, then:

\[ P(X = 1) = \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}} = \frac{\binom{5}{1} \binom{35}{1}}{\binom{40}{2}} = 0.2244 \]

(b) In a group of 5 people, what is the probability that all 5 are zombies?

\[ P(X = 5) = \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}} = \frac{\binom{5}{5} \binom{35}{0}}{\binom{40}{5}} = 1.5197 \times 10^{-6} \]

(c) In a group of 5 people, what is the probability that exactly 2 people are zombies?

\[ P(X = 2) = \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}} = \frac{\binom{5}{2} \binom{35}{3}}{\binom{40}{5}} = 0.0995 \]

(d) In a group of 5 people, what is the probability that none are zombies?

\[ P(X = 0) = \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}} = \frac{\binom{5}{0} \binom{35}{5}}{\binom{40}{5}} = 0.4934 \]

(e) In a group of 10 people, what is the probability that all 10 are zombies?

Since there are only 5 zombies, there cannot be 10 zombies in a group. The probability of any impossible event is 0. Of course, zombies themselves are impossible, but let's ignore that for the sake of argument.

(f) In a group of 10 people, what is the probability that exactly 2 are zombies?
Here \(N - M = 40 - 5 = 35\) and \(n - x = 10 - 2 = 8\), so:

\[ P(X = 2) = \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}} = \frac{\binom{5}{2} \binom{35}{8}}{\binom{40}{10}} = 0.2777 \]

1.2 Rental Property

Mr. Brocklehurst owns three rental properties. All three have leases expiring in July and he still has not found new tenants. However, he knows that, for each property, he has a probability \(p = 0.8\) of finding a new tenant for July. For each property, he has a fixed cost of $475. For each property that he rents out, he receives a rent of $750. He receives $0 for each property that he does not rent out. For each property he does not rent out, he has an additional maintenance expense of $50.

x       P(X = x)   Profit(x)   P(x) · Profit(x)   P(x) · Profit(x)^2
0       0.008      -1575       -12.6              1.9845 × 10^4
1       0.096      -775        -74.4              5.766 × 10^4
2       0.384      25          9.6                240
3       0.512      825         422.4              3.4848 × 10^5
Total   1          -           345                4.2622 × 10^5

1. What is the expected number of rented properties?

This is a binomial distribution, so the expected number of rented properties is \(np = 3 \cdot 0.8 = 2.4\).

2. In the table above, X is the number of rented properties. Fill out P(x).

The pmf is:

\[ P(X = x) = \binom{3}{x} p^x (1 - p)^{3 - x} \]

3. Profit is a random variable that is a function of X. Profit(x) denotes the profit at a particular value x of X. Calculate the profit for each scenario.

\[ \mathrm{Profit}(x) = -475 \cdot 3 + 750x - 50(3 - x) = -1575 + 800x \]

4. After calculating the profit for each scenario, multiply the profits by the probability of that outcome. Sum them up to get the expected profit, which is the Total entry of the P(x) · Profit(x) column.

You can either solve this by that method or note that

\[ E(\mathrm{Profit}) = -1575 + 800 E(X) = -1575 + 800 \cdot 2.4 = 345 \]

5. Fill in the bottom row and calculate Var(Profit).

\[ \mathrm{Var}(\mathrm{Profit}) = E(\mathrm{Profit}^2) - E(\mathrm{Profit})^2 = 4.2622 \times 10^5 - 345^2 = 3.072 \times 10^5 \]

Alternatively, note that profit is a linear function of X, and \(\mathrm{Var}(X) = np(1 - p) = 3 \cdot 0.8 \cdot 0.2 = 0.48\), therefore

\[ \mathrm{Var}(\mathrm{Profit}) = 800^2 \, \mathrm{Var}(X) = 3.072 \times 10^5 \]

1.3 Continuous Zombie Problems

More bad news: there has been another zombie outbreak and it somehow involves continuous probability distributions.

1. The number of zombies in Ames is uniformly distributed between 1000 and 9000.
(a) What is the expected number of zombies in Ames?

The endpoints of the distribution are 1000 and 9000. By definition, the expectation is

\[ \frac{9000 + 1000}{2} = 5000 \]

(b) What is the variance in the number of zombies in Ames?

From the formula for uniform distributions:

\[ \mathrm{Var}(X) = \frac{1}{12}(B - A)^2 = \frac{8000^2}{12} = 5.3333 \times 10^6 \]

(c) What is the probability that between 3000 and 4000 zombies are in Ames?

There are two ways of doing this: either an integral of the pdf or using what we know about the uniform distribution.

Integral: The pdf is \(f_X(x) = \frac{1}{8000}\) for \(x \in (1000, 9000)\) and 0 otherwise (from the definitions).

\[ \int_{3000}^{4000} f_X(x)\,dx = \int_{3000}^{4000} \frac{dx}{8000} = \frac{4000 - 3000}{8000} = 0.125 \]

Quick way: If you are trying to find \(P(a \le X \le b)\) for a uniform r.v. on the interval \((A, B)\) with \(A \le a < b \le B\), then:

\[ P(a \le X \le b) = \frac{b - a}{B - A} \]

(d) What is the probability that fewer than 6000 zombies are in Ames?

Using the shortcut above with \(a = A = 1000\) and \(b = 6000\), we have:

\[ \frac{b - a}{B - A} = \frac{6000 - 1000}{9000 - 1000} = \frac{5}{8} = 0.625 \]

Note that the numerator is 6000 − 1000 instead of 6000, because the distribution starts at 1000 rather than at 0.

(e) The zombie outbreak will cost the city $1,000,000 plus an additional $17,000 per zombie. What is the expected cost of the zombie outbreak? What is the standard deviation of the cost of the zombie outbreak?

This is a linear function of X. Therefore, we can use the tools we already know.

\[ Y = 1000000 + 17000X \]
\[ E(Y) = E(1000000 + 17000X) = 1000000 + 17000 E(X) = 8.6 \times 10^7 \]
\[ \mathrm{Var}(Y) = \mathrm{Var}(1000000 + 17000X) = 17000^2 \, \mathrm{Var}(X) = \sigma_Y^2 \approx 1.5413 \times 10^{15} \]
\[ \sigma_Y = \sqrt{\sigma_Y^2} = 3.926 \times 10^7 \]

(f) In the file http://gzt.public.iastate.edu/stat401/data/unifzombie.txt, I have simulated the draws from this distribution. In JMP, load this data set and calculate the mean and variance. Plot a histogram.

Sorry, this isn't JMP, but it's the easiest way to show a histogram.

2. (only if we've gotten to the normal distribution) The number of zombies in Ames is normally distributed with mean \(\mu = 5000\) and standard deviation \(\sigma = 2000\).

(a) Write out the formula for the pdf for the number of zombies in Ames.
\[ f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) = \frac{1}{2000\sqrt{2\pi}} \exp\left(-\frac{(x - 5000)^2}{2 \cdot 2000^2}\right) \]

(b) What is the probability fewer than 5000 zombies are in Ames?

Here is a standard method for calculating probabilities in the normal distribution. The idea is that you transform it to a standard normal.

\[ P(X < 5000) = P(X - \mu < 0) = P\left(\frac{X - \mu}{\sigma} < 0\right) = P(Z < 0) = \Phi(0) = 0.5 \]

It is helpful when doing this to keep the "X" terms as symbols (e.g., \(\mu\)) while substituting numeric values for everything else. This lets you know when you have gotten to Z.

(c) What is the probability fewer than 7000 zombies are in Ames?

We use the standard method above:

\[ P(X < 7000) = P(X - \mu < 7000 - 5000 = 2000) = P\left(\frac{X - \mu}{\sigma} < \frac{2000}{2000}\right) = P(Z < 1) = \Phi(1) = 0.8413 \]

The idea is to subtract off the mean and then divide by the standard deviation to get a standard normal. Then look up the answer from a table.

(d) What is the probability fewer than 3000 zombies are in Ames?

\[ P(X < 3000) = P(X - \mu < 3000 - 5000 = -2000) = P\left(\frac{X - \mu}{\sigma} < \frac{-2000}{2000}\right) = P(Z < -1) = \Phi(-1) = 0.1587 \]

(e) How would we calculate the probability between 3256 and 8821 zombies are in Ames? Set up the equations; we do not need to evaluate them.

I would have demonstrated this in lab if lecture had covered normal distributions. This one is tricky! I don't know if you've done this in class. There's a hard way: doing an integral. We do not want to do that. There are a couple of easier ways. One is to calculate \(F_X(8821)\) and \(F_X(3256)\) and then find \(F_X(8821) - F_X(3256)\), where \(F_X\) is the cdf of X. A better way is to translate the first easier way into a problem involving standard normals. Here is how that works:

\[ P(3256 < X < 8821) = P(3256 - 5000 < X - \mu < 8821 - 5000) \]
\[ = P(-1744 < X - \mu < 3821) \]
\[ = P\left(\frac{-1744}{2000} < \frac{X - \mu}{\sigma} < \frac{3821}{2000}\right) \]
\[ = P\left(\frac{-1744}{2000} < Z < \frac{3821}{2000}\right) \]
\[ = \Phi\left(\frac{3821}{2000}\right) - \Phi\left(\frac{-1744}{2000}\right) = 0.7804 \]

(f) The zombie outbreak will cost the city $1,000,000 plus an additional $17,000 per zombie. What is the expected cost of the zombie outbreak?
What is the standard deviation of the cost of the zombie outbreak?

The idea here is that this is a linear function of X, so the usual tricks apply.

\[ E(Y) = E(1000000 + 17000X) = 1000000 + 17000 E(X) = 8.6 \times 10^7 \]
\[ \mathrm{Var}(Y) = \mathrm{Var}(1000000 + 17000X) = 17000^2 \, \mathrm{Var}(X) = 1.156 \times 10^{15} \]
\[ \mathrm{Var}(Y) = \sigma_Y^2; \quad \sigma_Y = \sqrt{\sigma_Y^2} = 3.4 \times 10^7 \]

(g) In the file http://gzt.public.iastate.edu/stat401/data/normalzombie.txt, I have simulated the draws from this distribution. In JMP, load this data set and calculate the mean and variance. Plot a histogram. Make a normal quantile plot. There is something wrong with this data, what is it? Look at the histogram or a scatter plot to see.

Sorry once again for using something besides JMP. To do the same from JMP, look under Analyze > Distribution.

The normal quantile plot looks fine - it's mostly along a straight line. The JMP output is more helpful, actually. However, looking at the histogram, there is an obvious problem: there are data points less than 0. This is bad! You can have 0 zombies, but you can't have less than 0 zombies. So this is a bad simulation.

2 References

• Mathematical Modeling of an Outbreak of Zombie Infection
• STAT 401
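
The hand calculations in Section 1 can be cross-checked numerically. The following is only a minimal sketch, assuming Python 3.8 or later is available (the lab itself uses JMP, so this is a convenience and not the course's method). The helper names binom_pmf and hypergeom_pmf are made up for this illustration; math.comb and statistics.NormalDist come from the Python standard library. The last normal calculation estimates how much of a Normal(5000, 2000^2) distribution falls below zero, which relates to the negative values seen in the simulated data in 1.3, problem 2(g).

```python
from math import comb
from statistics import NormalDist


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p). Hypothetical helper for this sketch."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def hypergeom_pmf(x: int, N: int, M: int, n: int) -> float:
    """P(X = x) zombies in a group of n drawn from N people, M of them zombies.

    Hypothetical helper; uses the same N, M, n, x notation as Section 1.1.
    """
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# Section 1.1, problem 5: P(X = 0) and P(X = 1) for groups of 2, 5, 10 (p = 0.1).
for n in (2, 5, 10):
    print(n, binom_pmf(0, n, 0.1), binom_pmf(1, n, 0.1))

# Section 1.1, problem 6: 5 zombies among the 40 students.
print(round(hypergeom_pmf(1, 40, 5, 2), 4))   # 0.2244
print(hypergeom_pmf(10, 40, 5, 10))           # 0.0, since comb(5, 10) is 0
print(hypergeom_pmf(2, 40, 5, 10))

# Section 1.3, problem 2: X ~ Normal(mu = 5000, sigma = 2000).
X = NormalDist(mu=5000, sigma=2000)
print(round(X.cdf(7000), 4))                  # 0.8413
print(X.cdf(8821) - X.cdf(3256))              # P(3256 < X < 8821)
print(round(X.cdf(0), 4))                     # 0.0062, mass below zero

# Cost Y = 1,000,000 + 17,000 X is linear in X, so the mean shifts and
# scales while the standard deviation only scales.
cost_mean = 1_000_000 + 17_000 * X.mean
cost_sd = 17_000 * X.stdev
print(cost_mean, cost_sd)
```

The probability below zero is small but not negligible, so a simulation with many draws from this normal distribution should be expected to produce some negative counts.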