251y0241 12/10/02
ECO251 QBA1 FINAL EXAM DECEMBER 10, 2002
Name KEY    Class ________________

Part I. Do all the Following (16 Points) Make Diagrams! Show your work!

$x \sim N(5, 6)$.

1. $P(-3.50 \le x \le 3.50) = P\left(\frac{-3.50-5}{6} \le z \le \frac{3.50-5}{6}\right) = P(-1.42 \le z \le -0.25) = P(-1.42 \le z \le 0) - P(-0.25 \le z \le 0) = .4222 - .0987 = .3235$

2. $P(6 \le x \le 24) = P\left(\frac{6-5}{6} \le z \le \frac{24-5}{6}\right) = P(0.17 \le z \le 3.17) = P(0 \le z \le 3.17) - P(0 \le z \le 0.17) = .4992 - .0675 = .4317$

3. $P(4.10 \le x \le 6.10) = P\left(\frac{4.10-5}{6} \le z \le \frac{6.10-5}{6}\right) = P(-0.15 \le z \le 0.18) = P(-0.15 \le z \le 0) + P(0 \le z \le 0.18) = .0596 + .0714 = .1310$

4. $F(2.83)$ (Cumulative Probability): $F(2.83) = P(x \le 2.83) = P\left(z \le \frac{2.83-5}{6}\right) = P(z \le -0.36) = P(z \le 0) - P(-0.36 \le z \le 0) = .5 - .1406 = .3594$

5. $P(x \ge 6.01) = P\left(z \ge \frac{6.01-5}{6}\right) = P(z \ge 0.17) = P(z \ge 0) - P(0 \le z \le 0.17) = .5 - .0675 = .4325$

6. $x_{.43}$ (Find $z_{.43}$ first). (3 points) We want a point $x_{.43}$, so that $P(x > x_{.43}) = .43$. Make a diagram for $z$, showing zero in the middle, 50% below zero, and the area above zero divided into 7% between zero and $z_{.43}$ and 43% above $z_{.43}$. From the diagram, $P(0 \le z \le z_{.43}) = .0700$. The closest we can come is $P(0 \le z \le 0.18) = .0714$. So $z_{.43} = 0.18$, and $x = \mu + z_{.43}\sigma = 5 + 0.18(6) = 5 + 1.08$, or 6.08. To check this note that $P(x \ge 6.08) = P\left(z \ge \frac{6.08-5}{6}\right) = P(z \ge 0.18) = P(z \ge 0) - P(0 \le z \le 0.18) = .5 - .0714 = .4286 \approx .43$.

7. A symmetrical region around the mean with a probability of 43%. (3 points) We want two points $x_{.715}$ and $x_{.285}$, so that $P(x_{.715} \le x \le x_{.285}) = .4300$. Make a diagram for $z$, showing zero in the middle and an area in the middle of 43%, split in two by zero so that 21.5% is below zero and 21.5% is above zero. Since .5 - .215 = .285, the points we want are $-z_{.285}$ and $z_{.285}$. From the diagram, $P(0 \le z \le z_{.285}) = .2150$. The closest we can come is $P(0 \le z \le 0.57) = .2157$. So use $z_{.285} = 0.57$, and $x = \mu \pm z_{.285}\sigma = 5 \pm 0.57(6) = 5 \pm 3.42$, or 1.58 to 8.42. To check this note that $P(1.58 \le x \le 8.42) = P\left(\frac{1.58-5}{6} \le z \le \frac{8.42-5}{6}\right) = P(-0.57 \le z \le 0.57) = 2P(0 \le z \le 0.57) = 2(.2157) = .4314 \approx .43$.

II. (10 points, 2 point penalty for not trying part a.) Show your work!

We are investigating the reliability of a machine that fills 16-ounce bottles. A sample of six is taken with the following results (Assume that it is correct to do a confidence interval from such a small sample): Be careful!
There is a lot of chance for rounding error in this problem!

16.06 16.08 15.95 16.04 16.07 16.18

a. Compute the sample standard deviation, $s$, for the number of ounces. Show your work. (4)
b. Compute a 99% Confidence interval for the mean. (6)
c. (extra credit) Can you say that the mean that you got is significantly different from 16? You must give a reason for your answer to be read. (2)

Solution:
a.
           x        x^2
  1      16.06    257.924
  2      16.08    258.566
  3      15.95    254.402
  4      16.04    257.282
  5      16.07    258.245
  6      16.18    261.792
  Sum    96.38   1548.211

So $\sum x = 96.38$, $\sum x^2 = 1548.211$ and $n = 6$.
$\bar{x} = \frac{\sum x}{n} = \frac{96.38}{6} = 16.0633$
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{1548.211 - 6(16.0633)^2}{5} = \frac{0.033359}{5} = 0.00667$
$s = \sqrt{0.00667} = 0.0817$.

b. $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{0.00667}{6}} = 0.0333$. The degrees of freedom are $n - 1 = 6 - 1 = 5$. $\alpha = .01$, so $\frac{\alpha}{2} = .005$. From the t-table, $t^{(n-1)}_{\alpha/2} = t^{(5)}_{.005} = 4.032$. But what compelled some of you to decide that 0.0817 was $s$ in a) but $\sigma$ in b)? Putting this all together, $\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}} = 16.06 \pm 4.032(0.0333) = 16.06 \pm 0.13$ or 15.93 to 16.19. More formally, $P(15.93 \le \mu \le 16.19) = .99$.

c. Since 16 is included in this interval, we cannot say that the mean is significantly different from 16. Note that, because the differences involved are so small that rounding in the hand computation matters, Minitab gets a somewhat different solution, but the mean is still not significantly different from 16.

MTB > TInterval 99.0 c1.

Confidence Intervals

Variable   N     Mean    StDev   SE Mean        99.0 % C.I.
C1         6  16.0633   0.0739    0.0302   (15.9416, 16.1850)

III. Do at least 4 of the following 6 Problems (at least 12 each) (or do sections adding to at least 48 points. Anything extra you do helps, and grades wrap around). Show your work! Please indicate clearly what sections of the problem you are answering! If you are following a rule like $E(ax) = aE(x)$ please state it! If you are using a formula, state it! If you answer a 'yes' or 'no' question, explain why! If you are using the Poisson or Binomial table, state things like $n$, $p$ or the mean. Avoid crossing out answers that you think are inappropriate - you might get partial credit.
Choose the problems that you do carefully; most of us are unlikely to be able to do more than half of the entire possible credit in this section!

1. 25 employees are randomly drawn from a large corporation and their sick days checked. Note that the sample size is 25 throughout this problem!
a. The sample mean is 5.3. Assume that the population standard deviation is known to be 1.6 and create a 90% confidence interval for the average number of sick days. (4)
b. Assume that the facts in part a are correct but that the corporation only has 245 employees. Create a 90% confidence interval for the average number of sick days again, highlighting what has changed. (4)
c. Assume that the sample mean is 5.3 and the corporation has 245 employees, but that 1.6 is a sample standard deviation. Create a 90% confidence interval for the average number of sick days again, highlighting what has changed. (4)
d. The sample mean is 5.3. Assume that the population standard deviation is known to be 1.6 and create a 91% confidence interval for the average number of sick days. (3)
e. Explain the effect of the following on the size of a confidence interval (keep the reasons brief) (3):
(i) A larger sample size.
(ii) A larger population size.
(iii) A higher significance level.

Solution: Note that the sample size is 25 throughout this problem. Since the confidence level is $1 - \alpha = .90$, the significance level is $\alpha = .10$. Repeat after me! "$z$ goes with $\sigma$ (sigma - population variance); $t$ goes with $s$ (sample variance)!" Most of you seem to have been so hypnotized by the old exams that you had that you never noticed that most of this question concerned population variances.

a) $\sigma = 1.6$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.6}{\sqrt{25}} = 0.32$. From the t-table $z_{\alpha/2} = z_{.05} = 1.645$. $\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}} = 5.3 \pm 1.645(0.32) = 5.3 \pm 0.53$ or 4.77 to 5.83. More formally $P(4.77 \le \mu \le 5.83) = 90\%$.

b) The population size, $N = 245$, is less than 20 times the sample size $n = 25$, so
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{1.6}{\sqrt{25}}\sqrt{\frac{245-25}{245-1}} = 0.32\sqrt{0.90164} = 0.304$.
From the t-table $z_{\alpha/2} = z_{.05} = 1.645$.
$\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}} = 5.3 \pm 1.645(0.304) = 5.3 \pm 0.50$ or 4.80 to 5.80. More formally $P(4.80 \le \mu \le 5.80) = 90\%$.

c) $s = 1.6$. The degrees of freedom are $n - 1 = 25 - 1 = 24$. Since we are given $s$ rather than $\sigma$, we must use $t$. From the t-table, $t^{(n-1)}_{\alpha/2} = t^{(24)}_{.05} = 1.711$. The population size, $N = 245$, is less than 20 times the sample size $n = 25$, so
$s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{1.6}{\sqrt{25}}\sqrt{\frac{245-25}{245-1}} = 0.32\sqrt{0.90164} = 0.304$.
Putting this all together, $\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}} = 5.3 \pm 1.711(0.304) = 5.3 \pm 0.52$ or 4.78 to 5.82. More formally, $P(4.78 \le \mu \le 5.82) = .90$.

d) $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.6}{\sqrt{25}} = 0.32$. Since the confidence level is $1 - \alpha = .91$, the significance level is $\alpha = .09$. We need $z_{\alpha/2} = z_{.045}$. Make a diagram for $z$. It should show 50% above zero, divided into 4.5% above $z_{.045}$ and 50% - 4.5% = 45.5% below $z_{.045}$. So $P(0 \le z \le z_{.045}) = .4550$. From the Normal table, the closest we can come is $P(0 \le z \le 1.70) = .4554$. $\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}} = 5.3 \pm 1.70(0.32) = 5.3 \pm 0.54$ or 4.76 to 5.84. More formally $P(4.76 \le \mu \le 5.84) = 91\%$. It's remarkable how many of you could find something like $z_{.045}$ on page one but not here.

e) The basic formula for a confidence interval is $\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}}$ or $\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}}$.
(i) A larger sample size. If we use the formula $\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}}$ and $s_{\bar{x}} = \frac{s}{\sqrt{n}}$, a larger sample size will make the standard error smaller because the sample size is in the denominator and will also make $t$ smaller, as we can see from the t table.
(ii) A larger population size. The formula for the standard error is $s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$. As the population size grows, $\sqrt{\frac{N-n}{N-1}}$ approaches one from below, making the standard error, and so the confidence interval, larger.
(iii) A higher significance level. If we look at the t table, we see the value of $t$ gets smaller as the significance level gets larger. So a higher significance level makes the interval smaller.

2. A sample of 5 is taken from a large Normal population with a mean of 3 and a standard deviation of 11.
a. What is the probability that an individual item $x$ in the sample lies between 2 and 4? (2)
b. What is the probability that the sample mean $\bar{x}$ lies between 2 and 4? (2)
c.
What is the probability that at least one of the 5 measurements lies between 2 and 4? (2)
d. What is the probability that all the 5 measurements lie between 2 and 4? (2)
e. Do b) again assuming that the sample of 5 is taken from a population of 200 (1)
f. Find the 95th percentile of the distribution of $x$ (2)
g. Find the 95th percentile of the distribution of $\bar{x}$ (1)

Solution: $x \sim N(\mu, \sigma) = N(3, 11)$. Since $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{11}{\sqrt{5}} = 4.9193$, $\bar{x} \sim N(\mu, \sigma_{\bar{x}}) = N(3, 4.9193)$.

a) $P(2 \le x \le 4) = P\left(\frac{2-3}{11} \le z \le \frac{4-3}{11}\right) = P(-0.09 \le z \le 0.09) = 2P(0 \le z \le 0.09) = 2(.0359) = .0718$

b) $P(2 \le \bar{x} \le 4) = P\left(\frac{2-3}{4.9193} \le z \le \frac{4-3}{4.9193}\right) = P(-0.20 \le z \le 0.20) = 2P(0 \le z \le 0.20) = 2(.0793) = .1586$

c) This problem and the next problem are Binomial problems with $n = 5$, $p = .0718$ and $q = 1 - .0718 = .9282$. Remember $P(x) = C^n_x p^x q^{n-x}$.
$P(x \ge 1) = 1 - P(0) = 1 - C^5_0 (.9282)^5 = 1 - .68898 = .31102$

d) $P(x = 5) = C^5_5 (.0718)^5 = .00000191$.

e) Because the population size is more than 20 times the sample size, the answer is essentially the same as b).

f) Since the 95th percentile of $z$ is $z_{.05} = 1.645$, the 95th percentile of $x$ is $\mu + z_{.05}\sigma = 3 + 1.645(11) = 21.095$.

g) Since the 95th percentile of $z$ is $z_{.05} = 1.645$, the 95th percentile of $\bar{x}$ is $\mu + z_{.05}\sigma_{\bar{x}} = 3 + 1.645(4.9193) = 3 + 8.09 = 11.09$.

3. An eight-sided die is rolled five times. It has sides numbered one through 8, all of which are equally likely. A one, two or three wins $5.00, so that the probability of winning on each roll is 3/8. Let $x$ represent the number of wins and $y$ represent the amount won. Find the following:
a. The complete distribution of $x$ and $y$ (6). Note that you do not have to do this part to answer the remainder of the question.
b. The mean and variance of $x$. (2)
c. The mean and variance of $y$. (2)
d. The chance of at least one win (2 points if you did not do a) (1)
e. Assume that the die is rolled 40 times, find the probability of 20 or more wins. (Note that your binomial tables won't help and that if you try to do this without using a table you will be here until Christmas)
(i) Can this problem be done using the Poisson distribution? Why?
(1)
(ii) Can this problem be done using the Normal distribution? Why? (1)
(iii) Decide which of the two distributions is correct and answer the question. (2 or 2.5)

Solution: This problem is a Binomial problem with $n = 5$, $p = \frac{3}{8}$ and $q = 1 - \frac{3}{8} = \frac{5}{8}$. Remember $P(x) = C^n_x p^x q^{n-x} = C^5_x \left(\frac{3}{8}\right)^x \left(\frac{5}{8}\right)^{5-x}$.

a. The distribution is as follows:

  x    y    P(x)
  0    0    $C^5_0 (3/8)^0 (5/8)^5 = .09537$
  1    5    $C^5_1 (3/8)^1 (5/8)^4 = 5(.057220) = .28610$
  2   10    $C^5_2 (3/8)^2 (5/8)^3 = 10(.034332) = .34332$
  3   15    $C^5_3 (3/8)^3 (5/8)^2 = 10(.020599) = .20599$
  4   20    $C^5_4 (3/8)^4 (5/8)^1 = 5(.012360) = .06180$
  5   25    $C^5_5 (3/8)^5 (5/8)^0 = .00741$

There is a minimal rounding error since these add to .99999.

b. The mean is $\mu = np = 5\left(\frac{3}{8}\right) = 1.875$ and the variance is $\sigma^2 = npq = 1.875\left(\frac{5}{8}\right) = 1.171875$.

c. The prize is $y = 5x$ so, using the formulas $E(ax) = aE(x)$ and $Var(ax) = a^2 Var(x)$, we get $E(5x) = 5E(x) = 5(1.875) = 9.375$ and $Var(5x) = 5^2 Var(x) = 25(1.171875) = 29.2969$.

d. $P(x \ge 1) = 1 - P(0) = 1 - .09537 = .90463$.

e. (i) $n = 40$, $p = \frac{3}{8}$. If we test to see if the Poisson Distribution can be used, we find $\frac{n}{p} = \frac{40}{3/8} = 106.67$. Since this is not above 500, we cannot use the Poisson Distribution.
(ii) $np = 40\left(\frac{3}{8}\right) = 15 > 5$. $nq = 40\left(\frac{5}{8}\right) = 40 - 15 = 25 > 5$. Since these are both above 5, we can use the Normal distribution with $\mu = np = 40\left(\frac{3}{8}\right) = 15$ and $\sigma^2 = npq = 15\left(\frac{5}{8}\right) = 9.375$.
(iii) With the continuity correction, $P(x \ge 19.5) = P\left(z \ge \frac{19.5-15}{\sqrt{9.375}}\right) = P(z \ge 1.47) = .5 - .4292 = .0708$.

4. a. I buy two mainframes from a computer manufacturer. One computer has a 21% chance of breaking down during the first year, the second has a 25% chance of breaking down during the same period. Let $A$ represent the event that the first computer breaks down in the first year and $B$ represent the event that the second computer breaks down. Tell how you will note the complement of these events and assume that the events are independent. Find the following, noting if each is an event like $A \cap B$, $A \cup B$, or similar events involving the complement. Assume $A$ and $B$ are independent.
(i) The probability that both computers break down in the first year. (2)
(ii) The probability that one computer breaks down in the first year.
(2)
(iii) The probability that no computer breaks down in the first year. (2)
(iv) If a breakdown costs you $1000 (and no machine will break down more than once in a year), what is the mean and variance of your breakdown costs? (4)
b. I use an average of 2 boxes of paper in a day, but it takes 5 days to get a delivery. Given the average usage over 5 days, do the following:
(i) What is the probability of using at least 5 boxes in the 5 day period? (1)
(ii) What is the probability of using more than 5 boxes? (1)
(iii) If $x$ is the number of boxes used, what is $P(3 \le x \le 7)$? (2)
(iv) If I want to keep the probability of running out of paper below 5%, what is the minimum number of boxes I should have on hand when I reorder? (Why?) (2)

Solution: $\bar{A}$ is the complement of $A$.
a. (i) $P(A \cap B) = P(A)P(B) = .21(.25) = .0525$
(ii) $P(A \cap \bar{B}) + P(\bar{A} \cap B) = P(A)P(\bar{B}) + P(\bar{A})P(B) = .21(.75) + .79(.25) = .1575 + .1975 = .3550$. (This is $P[(A \cap \bar{B}) \cup (\bar{A} \cap B)]$.)
(iii) $P(\bar{A} \cap \bar{B}) = P(\bar{A})P(\bar{B}) = .79(.75) = .5925$. These probabilities add to one.
(iv) If $x$ represents the number of breakdowns, $y = 1000x$ is the cost.

  x      P(x)     xP(x)    x^2 P(x)
  0     .5925     0        0
  1     .3550     .3550    .3550
  2     .0525     .1050    .2100
  sum  1.0000     .4600    .5650

$\mu_x = E(x) = \sum xP(x) = .46$
$\sigma^2_x = Var(x) = E(x^2) - \mu_x^2 = .565 - (.46)^2 = .3534$
$E(ax) = aE(x)$ and $Var(ax) = a^2 Var(x)$, so $E(1000x) = 1000E(x) = 1000(.46) = 460$ and $Var(1000x) = 1000^2 Var(x) = 353400$.

b. Poisson distribution with mean $2(5) = 10$.
(i) $P(x \ge 5) = 1 - P(x \le 4) = 1 - .02925 = .97075$
(ii) $P(x > 5) = 1 - P(x \le 5) = 1 - .06709 = .93291$
(iii) $P(3 \le x \le 7) = P(x \le 7) - P(x \le 2) = .22022 - .00277 = .21745$
(iv) You must have 15 boxes. $P(x \le 15) = .95126$. This means $P(x > 15)$ is below 5%.

5. (Lee et. al.) The following joint probability table measures the satisfaction that customers have with local food stores. $x$ is the satisfaction level, with 4 representing the highest value. $y$ represents the number of years that the consumer has lived in the area, with 2 representing long term residents.

                 x
           1     2     3     4    Total
  y   1   .04   .14   .23   .07    .48
      2   .07   .17   .23   .05    .52
  Total   .11   .31   .46   .12

a. Are $x$ and $y$ independent? Why? (2)
b. How do we know that this is a valid distribution? (1)
c.
Compute the mean and standard deviation of $y$ (2)
d. Compute the covariance between $x$ and $y$ and interpret it. Does this mean that long term residents are more satisfied? (3.5)
e. Compute the correlation. What does this tell us that we couldn't learn from the covariance? (2.5)
f. Remember that these are joint probabilities. Find the conditional probability that a long-term resident is very satisfied, $P(x = 4 \mid y = 2)$. (2)
g. Use the same method to find the complete conditional probability of $x$ for long term residents, show that this is a valid distribution and compute the conditional mean for long-term residents. (5.5)
h. If you are ready for some real thinking, find the conditional mean for short-term residents and use it to compare the satisfaction levels for the two kinds of residents. (To do this correctly we would need variances as well. Have a nice break and we'll worry about that next semester.)

Solution:
a. $x$ and $y$ are not independent. For example $P(x = 1 \cap y = 1) = .04 \ne P(x = 1)P(y = 1) = .11(.48) = .0528$.

b. The probabilities add to one and are not negative or above 1.

c.
  y      P(y)    yP(y)   y^2 P(y)
  1      .48     0.48    0.48
  2      .52     1.04    2.08
  sum   1.00     1.52    2.56

$\mu_y = E(y) = \sum yP(y) = 1.52$
$\sigma^2_y = E(y^2) - \mu_y^2 = 2.56 - (1.52)^2 = 0.2496$
$\sigma_y = \sqrt{.2496} = .4996$.

d.
                        x
                1      2      3      4     P(y)   yP(y)  y^2 P(y)
  y       1    .04    .14    .23    .07    .48    0.48    0.48
          2    .07    .17    .23    .05    .52    1.04    2.08
  P(x)         .11    .31    .46    .12   1.00    1.52    2.56
  xP(x)        .11    .62   1.38    .48   2.59
  x^2 P(x)     .11   1.24   4.14   1.92   7.41

To summarize, $\sum P(x) = 1$, $\mu_x = E(x) = \sum xP(x) = 2.59$, $E(x^2) = \sum x^2 P(x) = 7.41$, $\sum P(y) = 1$, $\mu_y = E(y) = \sum yP(y) = 1.52$ and $E(y^2) = \sum y^2 P(y) = 2.56$.

$E(xy) = .04(1)(1) + .14(2)(1) + .23(3)(1) + .07(4)(1) + .07(1)(2) + .17(2)(2) + .23(3)(2) + .05(4)(2)$
$= 0.04 + 0.28 + 0.69 + 0.28 + 0.14 + 0.68 + 1.38 + 0.40 = 3.89$

$\sigma_{xy} = Cov(x, y) = E(xy) - \mu_x \mu_y = 3.89 - 2.59(1.52) = -0.0468$. Negative, so $x$ and $y$ move oppositely. This means that long term residents are less satisfied.

e.
$\sigma^2_x = E(x^2) - \mu_x^2 = 7.41 - (2.59)^2 = 0.7019$ and $\sigma^2_y = E(y^2) - \mu_y^2 = 2.56 - (1.52)^2 = 0.2496$. So that
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{-0.0468}{\sqrt{.7019}\sqrt{.2496}} = \frac{-.0468}{0.8378(0.4996)} = -.111812$.
Since the square of the correlation is between .01 and .02, on a zero-one scale it is quite weak.

f. By the multiplication rule $P(x = 4 \mid y = 2) = \frac{P(x = 4 \cap y = 2)}{P(y = 2)} = \frac{.05}{.52} = .0962$.

g, h. If we divide the whole top row by .48, and the bottom row by .52, we get

                 x
           1      2      3      4     Total
  y   1   .083   .292   .479   .146   1.000
      2   .135   .327   .442   .096   1.000

Each row is a valid distribution: the conditional probabilities lie between zero and one and add to one.
For short-term residents .083(1) + .292(2) + .479(3) + .146(4) = .083 + .584 + 1.437 + .584 = 2.688.
For long-term residents .135(1) + .327(2) + .442(3) + .096(4) = .135 + .654 + 1.326 + .384 = 2.499.
Anyway long-term residents seem less satisfied.

6. Answers to a-c can be left in factorial form.
a. A bridge hand consists of 13 cards. A deck is a population of 52 cards, with a total of 4 kings. What is the probability of 2 kings? What is the distribution you are using? (3)
b. What are the mean and variance of the number of kings in a hand? (2)
c. Now comes the fun. Let's say we take a load of decks (you do not need to know how many cards we have) and we deal you 13 cards. What is the probability of 2 kings now? (3)
d. Are the mean and variance of the number of kings in your hand in c) the same as the values in b)? What changes and why? (1)
e. (Keller, Warrack) The amount of gasoline sold by one of your service stations daily is uniformly distributed between a minimum of 2000 and a maximum of 5000 gallons.
(i) What is the mean and standard deviation of sales? (1.5)
(ii) What is the probability that sales on a given day will fall between 2500 and 3000 gallons? (2)
(iii) What is the probability that sales will be over 4000 gallons? (1)
(iv) What is the probability of sales between 1800 and 2800 gallons? (1)
(v) If you did not do (iii) using cumulative distributions, do it now (2)
(vi) If you own 10 identical service stations, what is the probability that at least one has sales over 4000 gallons?
(3)

Solution:
a. There are 4 kings in a deck of 52. So this is a Hypergeometric distribution with $M = 4$, $N = 52$, $n = 13$ and $x = 2$. $P(x) = \frac{C^M_x C^{N-M}_{n-x}}{C^N_n}$. So
$P(2) = \frac{C^4_2 C^{48}_{11}}{C^{52}_{13}} = \frac{\frac{4!}{2!2!} \cdot \frac{48!}{37!11!}}{\frac{52!}{39!13!}} = \frac{\frac{4 \cdot 3}{2 \cdot 1} \cdot \frac{48 \cdot 47 \cdots 38}{11 \cdot 10 \cdots 1}}{\frac{52 \cdot 51 \cdots 40}{13 \cdot 12 \cdots 1}} = \frac{\frac{4 \cdot 3}{2 \cdot 1} \cdot 39 \cdot 38}{\frac{52 \cdot 51 \cdot 50 \cdot 49}{13 \cdot 12}} = \frac{6(39)(38)(13)(12)}{52 \cdot 51 \cdot 50 \cdot 49} = .21349$

b. $p = \frac{M}{N} = \frac{4}{52} = \frac{1}{13} = .07692$, so $\mu = np = 13\left(\frac{1}{13}\right) = 1$ and
$\sigma^2 = npq\,\frac{N-n}{N-1} = 13\left(\frac{4}{52}\right)\left(\frac{48}{52}\right)\frac{52-13}{52-1} = \frac{39}{51}(13)\left(\frac{4}{52}\right)\left(\frac{48}{52}\right) = .76471(13)(.07101) = .70588$,
so $\sigma = \sqrt{0.70588} = 0.84017$.

c. Binomial. $P(x) = C^n_x p^x q^{n-x}$. So
$P(2) = C^{13}_2 p^2 q^{11} = \frac{13!}{2!11!}\left(\frac{1}{13}\right)^2\left(\frac{12}{13}\right)^{11} = \frac{13 \cdot 12}{2 \cdot 1}\left(\frac{1}{13}\right)^2\left(\frac{12}{13}\right)^{11} = 78(.00591716)(.4145881) = .19135$

d. Because the population is now infinite, we remove the .76471 finite population adjustment from the variance formula. The mean is unchanged.

e. This is a continuous uniform distribution with $c = 2000$ and $d = 5000$, so $\frac{1}{d-c} = \frac{1}{5000-2000} = \frac{1}{3000}$.
(i) $\mu = \frac{c+d}{2} = \frac{2000+5000}{2} = 3500$, $\sigma^2 = \frac{(d-c)^2}{12} = \frac{(5000-2000)^2}{12} = 750000$, so $\sigma = \sqrt{750000} = 866.025$.
(ii) $P(2500 \le x \le 3000) = \frac{500}{3000} = .1667$
(iii) $P(x \ge 4000) = \frac{5000-4000}{3000} = .3333$
(iv) $P(1800 \le x \le 2800) = \frac{2800-2000}{3000} = .2667$
(v) Recall that $F(x) = \frac{x-c}{d-c}$ for $c \le x \le d$, $F(x) = 0$ for $x < c$ and $F(x) = 1$ for $x > d$, and that $F(500)$ means $P(x \le 500)$. $P(x \ge 4000) = 1 - F(4000) = 1 - \frac{4000-2000}{3000} = 1 - \frac{2}{3} = \frac{1}{3} = .3333$.
(vi) This is your basic Binomial problem, $P(x) = C^n_x p^x q^{n-x}$, with $p = \frac{1}{3}$, $q = \frac{2}{3}$ and $n = 10$.
$P(x \ge 1) = 1 - P(0) = 1 - C^{10}_0 \left(\frac{1}{3}\right)^0 \left(\frac{2}{3}\right)^{10} = 1 - .01734 = .9827$

6. (ctd)
f. (Keller, Warrack) (Extra credit) The number of hours an alkaline battery lasts is exponentially distributed with a parameter $c$ of 0.05.
(i) What are the mean and standard deviation of a battery's life? (2)
(ii) What is the probability that the battery will last between 10 and 15 hours? (2)
(iii) What is the probability that it will last for more than 20 hours? (1)
(iv) Are you surprised that a jorcillator has two such batteries and that it works as long as one of the batteries works? What is the probability that the jorcillator lasts more than 20 hours? (2)
(v) (Difficult) What is the probability that the jorcillator lasts between 10 and 15 hours? (The answer to this will not be published!) (7)
g. The Muggle detector in front of my tower will identify a Muggle correctly as a Muggle 98% of the time and will identify a Wizard correctly as a Wizard 94% of the time. Assume that 4% of the population are Wizards and the rest Muggles. The Muggle detector says that you are a Wizard. What is the probability that you really are a Wizard? I only admit Wizards to my tower. (Hint: To do this you need decent notation or a good tree. Let $Y$ (yes!) be the event that it says you are a Wizard and $N$ (no!) be the event that it says you are a Muggle. $W$ is the event that you really are a Wizard and $M$ is the event that you are a Muggle. You need conditional probability to do this.) (6)

Solution:
f) You were told in advance that this section would use the exponential distribution. In the exponential distribution from the outline, $F(x) = 1 - e^{-cx}$, when the mean time to a success is $\frac{1}{c}$. Note that this is only for $x \ge 0$. There is no probability below zero.
(i) $\mu = \sigma = \frac{1}{c} = \frac{1}{.05} = 20$.
(ii) $F(10) = P(x \le 10) = 1 - e^{-.05(10)} = 1 - e^{-0.5} = 1 - .60653$ and $F(15) = P(x \le 15) = 1 - e^{-.05(15)} = 1 - e^{-.75} = 1 - .47237$.
So $P(10 \le x \le 15) = P(x \le 15) - P(x \le 10) = .60653 - .47237 = .13416$.
(iii) $P(x \ge 20) = 1 - F(20) = 1 - \left(1 - e^{-.05(20)}\right) = e^{-1} = .36788$
(iv) The probability that one component lasts less than 20 hours is 1 - .36788 = .63212. The probability that both fail before 20 hours is $(.63212)^2 = .39958$. The complement of this is .60042.
(v) Hah!

g. Not a hard problem. You are given $P(N \mid M) = .98$, $P(Y \mid W) = .94$, $P(M) = .96$ and $P(W) = .04$. This implies that $P(Y \mid M) = .02$ and $P(N \mid W) = .06$. You are asked for $P(W \mid Y)$. By Bayes' Rule $P(W \mid Y) = \frac{P(Y \mid W)P(W)}{P(Y)}$.
$P(Y) = P(Y \mid W)P(W) + P(Y \mid M)P(M) = .94(.04) + .02(.96) = .0376 + .0192 = .0568$
So $P(W \mid Y) = \frac{P(Y \mid W)P(W)}{P(Y)} = \frac{.0376}{.0568} = .6620$.
Another way to do this is to say that of 10000 people who come to my tower, 9600 will be Muggles and 400 will be Wizards. Of the Muggles, 98% or 9408 will be identified as Muggles. The remaining 2% or 192 will be wrongly identified as Wizards. Of the 400 Wizards, 94% or 376 will be correctly identified. So 192 + 376 = 568 are identified as Wizards. Of the 568, 376 or 66.2% actually are Wizards.
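
As a check on the Bayes' Rule arithmetic in 6g, the short Python sketch below recomputes $P(W \mid Y)$ from the three numbers given in the problem. The function name posterior_wizard and its parameter names are made up for this illustration and are not part of the exam; the code is only one possible way to organize the calculation.

```python
def posterior_wizard(p_wizard, p_yes_given_wizard, p_no_given_muggle):
    """Return P(W | Y) by Bayes' Rule (hypothetical helper for problem 6g)."""
    p_muggle = 1.0 - p_wizard                     # P(M) = 1 - P(W)
    p_yes_given_muggle = 1.0 - p_no_given_muggle  # P(Y | M) = 1 - P(N | M)
    # Total probability that the detector says "Wizard": P(Y)
    p_yes = p_yes_given_wizard * p_wizard + p_yes_given_muggle * p_muggle
    return p_yes_given_wizard * p_wizard / p_yes


# Numbers from the problem: P(W) = .04, P(Y | W) = .94, P(N | M) = .98
result = posterior_wizard(0.04, 0.94, 0.98)
print(round(result, 4))
```

The printed value should agree with the .6620 found above, up to rounding.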