251y0741 5/20/07
ECO 251 QBA1 FINAL EXAM, Version 1
MAY 7 and 10, 2007
Name: KEY
Class ________________

Part I. Do all the Following (14 Points) Make Diagrams! Show your work! Illegible and poorly presented sections will be penalized. Exam is normed on 75 points. There are actually 123+ possible points.

If you haven’t done it lately, take a fast look at ECO 251 - Things That You Should Never Do on a Statistics Exam (or Anywhere Else).

$x \sim N(10, 7.9)$

Material in italics below is a description of the diagrams you were asked to make or a general explanation and will not be part of your written solution. The x and z diagrams should look similar. If you know what you are doing, you only need one diagram for each problem.

General comment - I can't give you much credit for an answer with a negative probability or a probability above one, because there is no such thing!!!

In all these problems we must find values of $z = \frac{x - \mu}{\sigma}$ corresponding to our values of x before we do anything else. A diagram for z will help us because, if areas on both sides of zero are shaded, we will add probabilities, while, if an area on one side of zero is shaded and it does not begin at zero, we will subtract probabilities.

Note: All the graphs shown here are missing a vertical line. They are also to scale. A hand drawn graph should exaggerate the distances of the points from the mean. Note also that, because of the rounding error necessary to use a conventional normal table, the results for x will disagree with results for z, especially for small values of z. There may also be discrepancies of .0001 between the computer generated z results and those taken from the Normal table due to the fact that Minitab carries its probabilities to more than 5 places.

1. $P(x > 14) = P\left(z > \frac{14 - 10}{7.9}\right) = P(z > 0.51) = .5 - .1950 = .3050$

For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the entire area above 0.51.
Because this is entirely on one side of zero, we must subtract the area between zero and 0.51 from the entire area above zero. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 10. Indicate the mean by a vertical line! Shade the area above 14. This area is entirely to the right of the mean (10), so we subtract the area between the mean and 14 from the half of the distribution that is above the mean.

2. $P(0 < x < 9) = P\left(\frac{0 - 10}{7.9} < z < \frac{9 - 10}{7.9}\right) = P(-1.27 < z < -0.13) = P(-1.27 < z < 0) - P(-0.13 < z < 0) = .3980 - .0517 = .3463$

For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! Shade the entire area between -1.27 and -0.13. Because this is on one side of zero, we must subtract the area between -0.13 and zero from the larger area from -1.27 to zero. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 10. Indicate the mean by a vertical line! Shade the area from 0 to 9. This area is on one side of the mean (10), so we subtract the area between 9 and the mean from the larger area between 0 and the mean.

3. $F(2.00)$ (Cumulative Probability) $= P\left(z \le \frac{2 - 10}{7.9}\right) = P(z \le -1.01) = P(z \le 0) - P(-1.01 \le z \le 0) = .5 - .3438 = .1562$

A cumulative distribution is represented by the entire area below a point. For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! Shade the entire area below -1.01. Because this is on one side of zero, we must subtract the area between -1.01 and zero from the entire area below zero. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 10. Indicate the mean by a vertical line! Shade the entire area below 2. This area is on one side of the mean (10), so we subtract the area between 2 and the mean from the larger area below the mean. Because the Normal distribution is symmetrical around the mean and the entire area under a Normal curve is 1, the area below the mean must be .5.
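The arithmetic in problems 1 to 3 can be checked by machine. The sketch below is not part of the exam; it assumes Python's standard-library `statistics.NormalDist` in place of the printed Normal table, and it rounds each z to two places (the helper name `z_score` is made up for this illustration) so that the results line up with table lookups to within about .0001.

```python
from statistics import NormalDist

MU, SIGMA = 10.0, 7.9          # x ~ N(10, 7.9) as stated in Part I
std_normal = NormalDist()      # standard Normal: mean 0, standard deviation 1


def z_score(x, mu=MU, sigma=SIGMA):
    """Standardize x and round to 2 places, as one would before using a table."""
    return round((x - mu) / sigma, 2)


# Problem 1: area above z = 0.51
p1 = 1 - std_normal.cdf(z_score(14))

# Problem 2: area between z = -1.27 and z = -0.13 (one side of zero, so subtract)
p2 = std_normal.cdf(z_score(9)) - std_normal.cdf(z_score(0))

# Problem 3: cumulative probability below z = -1.01
p3 = std_normal.cdf(z_score(2))

print(round(p1, 4), round(p2, 4), round(p3, 4))
```

Small differences in the last digit against the table values (.3050, .3463, .1562) are the rounding discrepancies already mentioned in the general comments.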
4. $P(-2 < x < 36) = P\left(\frac{-2 - 10}{7.9} < z < \frac{36 - 10}{7.9}\right) = P(-1.52 < z < 3.29) = P(-1.52 < z < 0) + P(0 < z < 3.29) = .4357 + .4995 = .9352$

Note that $P(0 < z < 3.29)$ must be read from the area below the main table. For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the entire area between -1.52 and 3.29. Because this is on both sides of zero, we must add the area between -1.52 and zero to the area from zero to 3.29. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 10. Indicate the mean by a vertical line! Shade the area from -2 to 36. This area is on both sides of the mean (10), so we add the area between -2 and the mean to the area between the mean and 36.

5. $P(-10 < x < 10) = P\left(\frac{-10 - 10}{7.9} < z < \frac{10 - 10}{7.9}\right) = P(-2.53 < z < 0) = .4943$

For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the area between -2.53 and zero. Because this area begins at zero, we may read the probability directly from the standard Normal distribution table. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 10. Indicate the mean by a vertical line! Shade the area from -10 to 10. This area starts at the mean (10), so we will not need to add or subtract areas.

6. $x_{.34}$

Solution: $z_{.34}$ is, by definition, the value of z with a probability of 34% above it. Make a diagram. The diagram for z will show an area with a probability of 100% - 34% = 66% below $z_{.34}$. It is split by a vertical line at zero into two areas. The lower one has a probability of 50% and the upper one a probability of 66% - 50% = 16%. The upper tail of the distribution above $z_{.34}$ has a probability of 34%, so that the entire area below $z_{.34}$ adds to 50% + 16% = 66%. From the diagram, we want one point $z_{.34}$ so that $P(z \le z_{.34}) = .66$ or $P(0 \le z \le z_{.34}) = .1600$. If we try to find this point on the Normal table, the closest we can come is $P(0 \le z \le 0.41) = .1591$.
So we will use $z_{.34} = 0.41$, though 0.42 might be acceptable. Since $x \sim N(10, 7.9)$, if we bother with a diagram for $x_{.34}$, it would show 66% probability below $x_{.34}$ split in two regions on either side of the mean (10) with probabilities of 50% below 10 and 16% above 10 and below $x_{.34}$, and with 34% above $x_{.34}$. We already know that $z_{.34} = 0.41$, so the value of x can then be written $x_{.34} = \mu + z_{.34}\sigma = 10 + 0.41(7.9) = 10 + 3.239 = 13.239$.

Check: $P(x \ge 13.239) = P\left(z \ge \frac{13.239 - 10}{7.9}\right) = P(z \ge 0.41) = P(z \ge 0) - P(0 \le z \le 0.41) = .5 - .1591 = .3409 \approx 34\%$

7. A symmetrical region around the mean with a probability of 32%.

Solution: This is something of a gift since you can reuse $z_{.34} = 0.41$. But let’s assume that you don’t know that. Make a diagram. The diagram for z will show a central area with a probability of 32%. It is split by a vertical line at zero into two areas with probabilities of 16%. The tails of the distribution each have a probability of 50% - 16% = 34%. From the diagram, we want two points $z_{.34}$ and $z_{.66} = -z_{.34}$ so that $P(z_{.66} \le z \le z_{.34}) = .66 - .34 = .3200$. The upper point, $z_{.34}$, will have $P(0 \le z \le z_{.34}) = \frac{32\%}{2} = .1600$, and by symmetry $z_{.66} = -z_{.34}$. From the interior of the Normal table the closest we can come to .1600 is $P(0 \le z \le 0.41) = .1591$, which is slightly too low. The next best point would be 0.42 since $P(0 \le z \le 0.42) = .1628$, but 0.41 is closer so we can say $z_{.34} = 0.41$, and our 32% symmetrical interval for z is -0.41 to 0.41. Since $x \sim N(10, 7.9)$, the diagram for x (if we bother) will show 32% probability split in two 16% areas on either side of 10, with 16% above 10 and 16% below 10. The interval for x can then be written $x = \mu \pm z_{.34}\sigma = 10 \pm 0.41(7.9) = 10 \pm 3.239$ or 6.761 to 13.239.

To check this: $P(6.761 \le x \le 13.239) = P\left(\frac{6.761 - 10}{7.9} \le z \le \frac{13.239 - 10}{7.9}\right) = P(-0.41 \le z \le 0.41) = 2P(0 \le z \le 0.41) = 2(.1591) = .3182 \approx 32\%$

II. (10 points+, 2 point penalty for not trying parts a) and b)) Show your work! Mark individual sections clearly. Answers without work and/or reasons cannot be accepted!
(Webster) A local bar sells ’16 oz’ glasses of beer. A group of students buys 22 glasses of beer and, using their own measuring cup, tries to estimate the mean contents. The measurements are below.

13.5   15.3
13.8   15.4
14.2   15.4
14.3   15.6
14.6   15.6
14.6   15.8
14.6   15.8
14.9   15.9
15.0   16.1
15.1   16.7
15.3   16.9

Using Minitab, I have computed the following: $\sum_{i=1}^{22} x = 334.4$ and $\sum_{i=1}^{20} x^2 = 4533.84$.

a) Note that the sum of squares is the sum of only the first 20 numbers. To show me that you know how to compute a sum of squares, compute $\sum x^2$ using the total that I have given to you. (1)
b) Compute the sample variance of the contents of the 22 glasses given above. (2)
c) Compute a 99% confidence interval for the mean contents (3)
d) Is the mean significantly different from 16 oz.? Why? (1)
e) Because of all the hoopla around the students’ experiment only 100 glasses of beer were sold that night. Assuming that 100 is the population size from which the sample of 22 was taken, recompute the confidence interval in c). (2) [9]
f) Assume that the costs to the bar owner in drawing a glass of beer are $0.15 per glass plus $0.015 per oz, use the mean and variance that you found in b) to compute the mean and variance of the costs to the bar owner. There will be no credit in this section unless you use the mean and variance from b). (2.5) [11.5]

Solution: The entire computation spreadsheet appears in Problem IIIA below.

a) $\bar{x} = \frac{\sum x}{n} = \frac{334.4}{22} = 15.20$

$\sum_{i=1}^{22} x^2 = \sum_{i=1}^{20} x^2 + x_{21}^2 + x_{22}^2 = 4533.84 + 278.89 + 285.61 = 5098.34$

b) $s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{5098.34 - 22(15.20)^2}{21} = \frac{15.46}{21} = 0.7362$, so $s_x = \sqrt{0.7362} = 0.8580$.

Of course, many people tried to use the definitional formula $s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$. But since they had forgotten what the formula meant, they computed an incorrect expression instead. This is doubly self-defeating, since they had already computed $\sum x^2$.

c) Because we do not know the population standard deviation of x, we must use t. Since our confidence level is 99%, the significance level is 1%.
Since our sample size is $n = 22$, we have $n - 1 = 21$ degrees of freedom. According to Table 18, $t_{\alpha/2}^{(n-1)} = t_{.005}^{(21)} = 2.831$. We also need the standard error of the mean $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{0.7362}{22}} = 0.1829$. The interval is thus $\mu = \bar{x} \pm t_{\alpha/2}^{(n-1)} s_{\bar{x}} = 15.20 \pm 2.831(0.1829) = 15.20 \pm 0.52$ or 14.68 to 15.72.

d) Since 16 is not on the confidence interval we can say with 99% confidence that our calculated mean is significantly different from 16.

e) If we must assume that the population size is 100, our standard error becomes $s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{0.7362}{22}}\sqrt{\frac{100 - 22}{100 - 1}} = 0.1829\sqrt{0.787879} = 0.1829(0.887625) = 0.1623$. This means that the interval is $\mu = \bar{x} \pm t_{\alpha/2}^{(n-1)} s_{\bar{x}} = 15.20 \pm 2.831(0.1623) = 15.20 \pm 0.46$ or 14.74 to 15.66.

f) The formula relating cost to ounces is $w = 0.015x + 0.15$. We know $\bar{x} = 15.20$ and $s_x^2 = 0.7362$. Since $w = ax + b$, and $E(w) = aE(x) + b$ and $Var(w) = a^2 Var(x)$ apply to sample data, $\bar{w} = 0.015(15.2) + 0.15 = 0.378$ and $s_w^2 = (0.015)^2(.7362) = .000166$.

III. Do at least 5 of the following 6 sections (at least 12 each) (or do items adding to at least 48 points - Anything extra you do helps, and grades wrap around). Show your work! Please indicate clearly what sections of the problem you are answering! If you are following a rule like $E(ax) = aE(x)$ please state it! If you are using a formula, state it! If you answer a 'yes' or 'no' question, explain why! If you are using the Poisson or Binomial table, state things like n, p or the mean. Avoid crossing out answers that you think are inappropriate - you might get partial credit. Choose the problems that you do carefully – most of us are unlikely to be able to do more than half of the entire possible credit in this section! This is not an opinion questionnaire. Answers without reasons or supporting calculations or table references will not be accepted (except in multiple choice questions)!!!! Answers that are hard to follow will be penalized. Note that some sections extend over more than one page.

A. Remember the problem on the previous page. The data is repeated here.
x is volume in ounces. y is simply the order in which the beers were drawn. The students believe that the bartender got more generous as the evening passed. To test this they computed the correlation of the volume with the order. You have $\sum x$ and $\sum x^2$ from the previous problem. It is quite easy to compute $\sum y = 253$ and $\sum y^2 = 3795$, and Minitab says $\sum_{i=1}^{20} xy = 3237.3$.

Row    x      y        Row    x      y
 1    13.5    1        12    15.3   12
 2    13.8    2        13    15.4   13
 3    14.2    3        14    15.4   14
 4    14.3    4        15    15.6   15
 5    14.6    5        16    15.6   16
 6    14.6    6        17    15.8   17
 7    14.6    7        18    15.8   18
 8    14.9    8        19    15.9   19
 9    15.0    9        20    16.1   20
10    15.1   10        21    16.7   21
11    15.3   11        22    16.9   22

1) Note that the sum $\sum xy$ is the sum of only the first 20 numbers. To show me that you know how to compute a sum of this type, compute $\sum xy$ using the total that I have given to you. (1)
2) Compute the sample covariance $s_{xy}$ between x and y. (2) Explain what this covariance tells us. (0.5)
3) Compute the sample correlation $r_{xy}$ between x and y. (2) Explain what this correlation tells us about the relationship between the size of the beer and the order in which it was drawn. (1) [6.5]
4) Using the correlation and covariance you computed in 2) and 3) and the equation relating the size of the beer and its cost that you used on the last page, find the covariance and correlation between cost of the beer to the bar owner and the order in which it was drawn. (4) [10.5]
5) The true test of a correlation is a significance test which not only looks at the size of the correlation, but the size of the sample. To do such a test compute $t = \frac{r}{s_r} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$. On the assumption that there is no significant correlation and using a 99% confidence level, 99% of the values of this t-ratio will fall between $-t_{.005}^{(n-2)}$ and $t_{.005}^{(n-2)}$. If your value of the t-ratio does not fall between these two values, you can say that there is significant correlation. Find the appropriate values of t, compute the t-ratio and tell me if the correlation is significant.
(5) [15.5]

1) You have already found $\sum_{i=1}^{22} x^2 = \sum_{i=1}^{20} x^2 + x_{21}^2 + x_{22}^2 = 4533.84 + 278.89 + 285.61 = 5098.34$, $\bar{x} = 15.20$ and $s_x^2 = 0.7362$ ($s_x = \sqrt{0.7362} = 0.8580$). You have been given $\sum y^2 = 3795$. You also have $\sum x = 334.4$, $\sum x^2 = 5098.34$ and $\sum y = 253$ (which implies that $\bar{y} = \frac{253}{22} = 11.50$). Now you need $\sum_{i=1}^{22} xy = \sum_{i=1}^{20} xy + x_{21}y_{21} + x_{22}y_{22} = 3237.3 + 350.7 + 371.8 = 3959.8$. The spreadsheet for computing these is at the end of this section.

2) $s_{xy} = \frac{\sum xy - n\bar{x}\bar{y}}{n - 1} = \frac{3959.8 - 22(15.2)(11.5)}{21} = \frac{114.2}{21} = 5.438095$. Since the covariance is positive, this implies that x and y move together. It is impossible to judge the strength or significance of the relationship from the covariance.

3) We need a variance and/or a standard deviation for y. $s_y^2 = \frac{\sum y^2 - n\bar{y}^2}{n - 1} = \frac{3795 - 22(11.50)^2}{21} = \frac{885.50}{21} = 42.16667$ ($s_y = \sqrt{42.16667} = 6.493587$).

$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{5.438095}{\sqrt{0.7362}\sqrt{42.16667}} = \sqrt{\frac{(5.438095)^2}{0.7362(42.16667)}} = \sqrt{0.952639} = .9760$. Since the square of the correlation is quite close to 1, this is quite strong. Note that $-1 \le r \le 1$. A value not in this range is a fatal error.

4) The formula relating cost to ounces is $w = 0.015x + 0.15$. We know that if $w = ax + b$ and $v = cy + d$, then $s_{wv} = ac\,s_{xy}$ and $r_{wv} = sign(ac)\,r_{xy}$. We can say that $a = 0.015$ and $c = 1$ (since $v = y = 1y + 0$). So $s_{wv} = 0.015(1)(5.438095) = 0.0816$ and $r_{wv} = sign(0.015(1))(.9760) = .9760$.

5) $t = \frac{r}{s_r} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{.9760}{\sqrt{\frac{1 - .952639}{20}}} = \frac{.9760}{\sqrt{.002368}} = \sqrt{\frac{.952639}{.002368}} = \sqrt{402.3} = 20.06$. The values of t that we compare this to are $\pm t_{.005}^{(n-2)} = \pm t_{.005}^{(20)} = \pm 2.845$. Obviously 20.06 is not between these two values, so we reject the hypothesis that there is no significant correlation between x and y.

The entire spreadsheet for computing the means, variances, covariances and correlations follows. You should have only done a very few numbers, as specified.
Row     x      y     x^2      y^2     xy
  1   13.5     1   182.25      1     13.5
  2   13.8     2   190.44      4     27.6
  3   14.2     3   201.64      9     42.6
  4   14.3     4   204.49     16     57.2
  5   14.6     5   213.16     25     73.0
  6   14.6     6   213.16     36     87.6
  7   14.6     7   213.16     49    102.2
  8   14.9     8   222.01     64    119.2
  9   15.0     9   225.00     81    135.0
 10   15.1    10   228.01    100    151.0
 11   15.3    11   234.09    121    168.3
 12   15.3    12   234.09    144    183.6
 13   15.4    13   237.16    169    200.2
 14   15.4    14   237.16    196    215.6
 15   15.6    15   243.36    225    234.0
 16   15.6    16   243.36    256    249.6
 17   15.8    17   249.64    289    268.6
 18   15.8    18   249.64    324    284.4
 19   15.9    19   252.81    361    302.1
 20   16.1    20   259.21    400    322.0
 21   16.7    21   278.89    441    350.7
 22   16.9    22   285.61    484    371.8
Sum  334.4   253  5098.34   3795   3959.8

B. Let us define the following events. $A_1: y = 3$, $A_2: y = 5$, $B_1: x = -1$, $B_2: x = 2$.

        B1    B2
A1
A2

1) Create a joint probability table like the one above on the assumption that the four events are the only ones that can occur, $P(A_1) = .4$, $P(B_1) = .3$ and that $A_1$ and $B_1$ are independent. (3)
2) Compute the variance of y and the covariance and correlation between x and y in 1). (3)
3) Repeat 1) on the assumption that $A_1$ and $B_1$ are mutually exclusive. (2)
4) Compute the variance of y and the covariance and correlation between x and y in 3). (3)
5) Use the addition rule to show that if $P(A_1) = .4$ and $P(B_1) = .3$, these two events cannot be collectively exhaustive no matter what the relationship between these two events. (1) [12]

Solution:
1) If $P(A_1) = .4$ and $P(B_1) = .3$ and these two events are independent, $P(A_1 \cap B_1) = P(A_1)P(B_1) = .4(.3) = .12$. If we fill in the table so that the probabilities of the events add to 1 we get

        B1    B2
A1     .12   .28    .4
A2     .18   .42    .6
       .30   .70   1.0

2) If we create a tableau as was done in Section K of the course, we get the following.

             y = 3   y = 5    P(x)   xP(x)   x^2 P(x)
x = -1        .12     .18     .30    -0.30     0.30
x =  2        .28     .42     .70     1.40     2.8
P(y)          .4      .6      1.0     1.1      3.1
yP(y)         1.2     3.0     4.2
y^2 P(y)      3.6    15.0    18.6

To summarize: $\sum P(x) = 1$, $\mu_x = E(x) = \sum xP(x) = 1.1$, $E(x^2) = \sum x^2 P(x) = 3.1$, $\sum P(y) = 1$, $\mu_y = E(y) = \sum yP(y) = 4.2$ and $E(y^2) = \sum y^2 P(y) = 18.6$.

$\sigma_y^2 = Var(y) = E(y^2) - \mu_y^2 = 18.6 - (4.2)^2 = 18.6 - 17.64 = 0.96$.
Note that we do not need $\sigma_x^2 = E(x^2) - \mu_x^2 = 3.1 - (1.1)^2 = 3.1 - 1.21 = 1.89$. We also do not need to compute the covariance or correlation, since, if x and y are independent, the covariance and correlation are zero. But, if you don’t believe me, $E(xy) = \sum\sum xyP(x, y) = (-1)(3)(.12) + (2)(3)(.28) + (-1)(5)(.18) + (2)(5)(.42) = -0.36 + 1.68 - 0.90 + 4.20 = 4.62$. So $\sigma_{xy} = E(xy) - \mu_x\mu_y = 4.62 - 1.1(4.2) = 4.62 - 4.62 = 0$ and $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = \frac{0}{\sigma_x\sigma_y} = 0$.

3) If $P(A_1) = .4$ and $P(B_1) = .3$ and these two events are mutually exclusive, $P(A_1 \cap B_1) = 0$. If we fill in the table so that the probabilities of the events add to 1 we get

        B1    B2
A1      0    .4     .4
A2     .3    .3     .6
       .30   .70   1.0

4)
             y = 3   y = 5    P(x)   xP(x)   x^2 P(x)
x = -1         0      .3      .30    -0.30     0.30
x =  2        .4      .3      .70     1.40     2.8
P(y)          .4      .6      1.0     1.1      3.1
yP(y)         1.2     3.0     4.2
y^2 P(y)      3.6    15.0    18.6

To summarize: $\sum P(x) = 1$, $\mu_x = E(x) = \sum xP(x) = 1.1$, $E(x^2) = \sum x^2 P(x) = 3.1$, $\sum P(y) = 1$, $\mu_y = E(y) = \sum yP(y) = 4.2$ and $E(y^2) = \sum y^2 P(y) = 18.6$.

$\sigma_y^2 = Var(y) = E(y^2) - \mu_y^2 = 18.6 - (4.2)^2 = 18.6 - 17.64 = 0.96$.

$\sigma_x^2 = Var(x) = E(x^2) - \mu_x^2 = 3.1 - (1.1)^2 = 3.1 - 1.21 = 1.89$.

$E(xy) = \sum\sum xyP(x, y) = (-1)(3)(0) + (2)(3)(.40) + (-1)(5)(.30) + (2)(5)(.30) = 0 + 2.4 - 1.5 + 3.0 = 3.90$. So $\sigma_{xy} = E(xy) - \mu_x\mu_y = 3.9 - 1.1(4.2) = 3.90 - 4.62 = -0.72$.

$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = \frac{-0.72}{\sqrt{0.96}\sqrt{1.89}} = -\sqrt{\frac{(0.72)^2}{0.96(1.89)}} = -\sqrt{0.285714} = -0.5345$.

5) If the events $A_1$ and $B_1$ are collectively exhaustive, $P(A_1 \cup B_1) = 1$. But the addition rule says $P(A_1 \cup B_1) = P(A_1) + P(B_1) - P(A_1 \cap B_1)$. This means $P(A_1) + P(B_1) - P(A_1 \cap B_1) = 1$ or $P(A_1) + P(B_1) = 1 + P(A_1 \cap B_1)$. Since $P(A_1 \cap B_1) \ge 0$, it must be true that if the two events are collectively exhaustive $P(A_1) + P(B_1) \ge 1$. So if $P(A_1) = .4$ and $P(B_1) = .3$, since their probabilities do not add to 1 or more, they cannot be collectively exhaustive.

C. Answer the following 6 multiple choice questions. (These should be 2 each, but to discourage guessing, how about 2.5 each for right answers and 0.5 penalty for wrong answers.)

1) Which of the following is a major difference between the binomial and the hypergeometric distributions?
a) The sum of the outcomes can be greater than 1 for the hypergeometric.
b) *The probability of a success changes in the hypergeometric distribution.
c) The number of trials changes in the hypergeometric distribution
d) The outcomes cannot be whole numbers in the hypergeometric distribution
e) None of the above is correct.

2) The continuity correction factor is used when
a) The sample size is at least 5.
b) Both np and nq are at least 30.
c) *A continuous distribution is used to approximate a discrete distribution
d) A discrete distribution is used to approximate a continuous distribution
e) A binomial distribution is used to approximate the hypergeometric distribution
f) None of the above.

3) Suppose a population consisted of 20 items. How many different samples of n = 3 are possible?
a) 20
b) 40
c) 120
d) $20^3 = 8000$
e) *1140
f) 6840
g) None of the above is correct.
$C_3^{20} = \frac{20!}{17!\,3!} = \frac{20 \cdot 19 \cdot 18}{3 \cdot 2 \cdot 1} = 1140$

4) The finite population correction factor is used when
a) * n/N is more than .05.
b) N is more than 1000
c) np is greater than 5.
d) n is more than 30.
e) None of the above

5) In the formula $\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}}$, the $\alpha/2$ is
a) The confidence level
b) *The area of one tail of the sampling distribution of the mean
c) The probability that the interval would not include the mean
d) The proportion of confidence intervals that will contain the mean
e) None of the above.

6) All Normal distributions have
a) A finite range
b) The same coefficient of variation
c) The same probability density function $f(x)$
d) *The same area between $\mu - \sigma$ and $\mu + 2\sigma$
e) All of the above must be true.
Explanation: $P(\mu - \sigma \le x \le \mu + 2\sigma) = P\left(\frac{\mu - \sigma - \mu}{\sigma} \le z \le \frac{\mu + 2\sigma - \mu}{\sigma}\right) = P(-1 \le z \le 2)$. This computation does not depend on the value of the mean or standard deviation.

D. 1) We are baking chocolate chip cookies again. Assume that all cookies are discarded that have less than 4 chips, but the mean is only 4.5, what proportion of cookies will be discarded? (1)
2) In view of the results of problem 1 and using only the distributions for which you have tables, what is the lowest value the mean can take if we want to be sure that no more than 1% of the cookies are discarded?
3) Find $P(1 \le x \le 15)$ for the following distributions. If you must substitute one distribution for another, show that the substitution criterion is satisfied.
a) Poisson with parameter of 2.5 (1)
b) Poisson with parameter of 36 (2)
c) Binomial with $n = 35$ and $p = .02$. (2)
d) Binomial with $n = 35$ and $p = .2$ (2)
e) Binomial with $n = 50$ and $p = .2$ (1)
f) Binomial with $n = 50$ and $p = .55$ (2) [13]
g) Continuous uniform $c = 8$ and $d = 13$ (1)
4) Find $P(x \ge 1)$ for the following.
a) Hypergeometric $N = 16$, $p = .25$, $n = 5$ (2)
b) Hypergeometric $N = 600$, $p = .25$, $n = 5$ (2)
c) Geometric $p = .25$ (1) [19 – 58.5]

Solution:
1) We are baking chocolate chip cookies again. Assume that all cookies are discarded that have less than 4 chips, but the mean is only 4.5, what proportion of cookies will be discarded? (1)
For the Poisson distribution with a parameter of 4.5, $P(x < 4) = P(x \le 3) = .34230$.

2) In view of the results of problem 1 and using only the distributions for which you have tables, what is the lowest value the mean can take if we want to be sure that no more than 1% of the cookies are discarded?
If we just wander through the tables, as the mean rises, the probability $P(x \le 3)$ falls. If the mean is 10, $P(x \le 3) = .01034$. But if the mean is 10.5, $P(x \le 3) = .00715$, which is below 1%. So the lowest value the mean can take is somewhere between 10 and 10.5.

3) Find $P(1 \le x \le 15)$ for the following distributions. If you must substitute one distribution for another, show that the substitution criterion is satisfied.
The substitution table in ‘Great Distributions I Have Known’ is copied below.

Replace           With                                                             If
Binomial          Poisson with m = np                                              n/p > 500
Hypergeometric    Binomial with p = M/N                                            N > 20n
Binomial          Normal with mu = np, sigma = sqrt(npq)                           np > 5 and nq > 5
Poisson           Normal with mu = m, sigma = sqrt(m)                              m > 25
Hypergeometric    Normal with p = M/N, mu = np, sigma = sqrt(npq(N - n)/(N - 1))   np > 5 and nq > 5, but think about Binomial if N > 20n

There seems to be a fairly large minority that has decided that one can use the Normal distribution to approximate any distribution without justification. Wake up!
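The table lookups for the cookie problem and the binomial rows of the substitution table can be mimicked in a few lines. This is only a sketch under stated assumptions: the function names `poisson_cdf` and `binomial_substitute` are invented for the illustration, the thresholds are read as strict inequalities (n/p > 500, np > 5 and nq > 5), and the Poisson probabilities are computed directly from the formula rather than read from the exam's tables.

```python
from math import exp, factorial


def poisson_cdf(k, m):
    """P(x <= k) for a Poisson distribution with mean m."""
    return sum(exp(-m) * m**i / factorial(i) for i in range(k + 1))


def binomial_substitute(n, p):
    """Name the approximation the substitution table permits for a binomial.

    Assumed reading of the table: Poisson if n/p > 500,
    Normal if np > 5 and nq > 5, otherwise no substitution.
    """
    q = 1 - p
    if n / p > 500:
        return "Poisson"
    if n * p > 5 and n * q > 5:
        return "Normal"
    return "none"


# Cookie problem: proportion with fewer than 4 chips, i.e. P(x <= 3)
for mean in (4.5, 10, 10.5):
    print(mean, round(poisson_cdf(3, mean), 5))

# Criteria for parts c) and d) of problem 3
print(binomial_substitute(35, 0.02))
print(binomial_substitute(35, 0.2))
```

The function reports what the table allows, not what is required; where a binomial table covers the case directly, the table itself is still the simplest route.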
Note that in all solutions below the continuity correction is used. No credit was lost for solutions that did not use it. Credit was added at my discretion when it was used.

a) Poisson with parameter of 2.5 (1)
$P(1 \le x \le 15) = P(x \le 15) - P(x \le 0) = 1 - P(x \le 0) = 1 - .08208 = .9179$
For no good reason, as soon as they saw the word ‘Poisson,’ people started writing $\frac{e^{-m}m^x}{x!}$. This very rarely gets you any credit. If you don’t know how to use the tables, you shouldn’t be doing this section.

b) Poisson with parameter of 36 (2). Since the mean is over 25, we can replace the Poisson distribution with the Normal. If we use the continuity correction,
$P(1 \le x \le 15) \approx P_N(0.5 \le x \le 15.5) = P\left(\frac{0.5 - 36}{\sqrt{36}} \le z \le \frac{15.5 - 36}{\sqrt{36}}\right) = P(-5.92 \le z \le -3.41) = .5 - .4997 = .0003$

c) Binomial with $n = 35$ and $p = .02$. (2). Since $\frac{n}{p} = \frac{35}{.02} = 1750 > 500$, we can use the Poisson distribution with a mean of $np = .02(35) = 0.7$.
$P(1 \le x \le 15) = P(x \le 15) - P(x \le 0) = 1 - .49659 = .5034$.

d) Binomial with $n = 35$ and $p = .2$ (2). Since the expected number of successes is $np = .2(35) = 7 > 5$ and the expected number of failures is $n - np = 35 - 7 = 28 > 5$, we can use the Normal distribution with standard deviation $\sqrt{npq} = \sqrt{7(.8)} = \sqrt{5.60} = 2.366$.
$P(1 \le x \le 15) \approx P_N(0.5 \le x \le 15.5) = P\left(\frac{0.5 - 7}{2.366} \le z \le \frac{15.5 - 7}{2.366}\right) = P(-2.74 \le z \le 3.60) = .4969 + .4998 = .9967$

e) Binomial with $n = 50$ and $p = .2$ (1)
$P(1 \le x \le 15) = P(x \le 15) - P(x \le 0) = .96920 - .00001 = .9692$

f) Binomial with $n = 50$ and $p = .55$. (2) The probability of failure is .45. 1 success corresponds to 49 failures and 15 successes corresponds to 35 failures. Counting failures, $P(35 \le x \le 49) = P(x \le 49) - P(x \le 34) = 1 - .99670 = .0033$

g) Continuous uniform $c = 8$ and $d = 13$. (1) Make a diagram. You will have a rectangle with a height of $\frac{1}{d - c} = \frac{1}{13 - 8} = \frac{1}{5}$. Since the entire area of the rectangle is between 1 and 15, shade the whole bloody thing. The area of the rectangle is 1, so $P(1 \le x \le 15) = 1$.

4) Find $P(x \ge 1)$ for the following.
a) Hypergeometric $N = 16$, $p = .25$, $n = 5$ (2) $M = Np = 4$.
$1 - P(0) = 1 - \frac{C_0^4 C_5^{12}}{C_5^{16}} = 1 - \frac{1 \cdot \frac{12!}{7!\,5!}}{\frac{16!}{11!\,5!}} = 1 - \frac{\frac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}}{\frac{16 \cdot 15 \cdot 14 \cdot 13 \cdot 12}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}} = 1 - \frac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8}{16 \cdot 15 \cdot 14 \cdot 13 \cdot 12} = 1 - .1813 = .8187$
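The combinations in 4a) are easy to mistype by hand, so a direct check may be useful. The sketch below assumes Python 3.8 or later for `math.comb`; the function name `hypergeom_pmf` is made up for this example and is not part of the course materials.

```python
from math import comb


def hypergeom_pmf(x, N, M, n):
    """P(x successes) when drawing n items without replacement
    from a population of N items that contains M successes."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


N, n = 16, 5
M = 4                      # M = Np = 16(.25)
p_zero = hypergeom_pmf(0, N, M, n)
p_at_least_one = 1 - p_zero

print(round(p_zero, 4), round(p_at_least_one, 4))
```

Summing `hypergeom_pmf(x, 16, 4, 5)` over x = 0, ..., 4 should give 1, which is a quick way to confirm that the formula has been entered correctly.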
b) Hypergeometric $N = 600$, $p = .25$, $n = 5$. (2) Since the sample is much less than 5% of the population, use the binomial distribution with $p = .25$, $n = 5$. $1 - P(0) = 1 - .23730 = .7627$. The Normal distribution is not appropriate here because the mean is too small.

c) Geometric $p = .25$ (1). There are a number of ways to approach this like computing $1 - P(0)$. However, the easiest way to do this is to note that if the first success must fall on a try after the first one, there must be a failure on the first try. $P(x > 1) = q = .75$. [19 – 58.5]

E. 1) Do the following confidence intervals. (Webster) Assume that a real estate developer wants to estimate the mean family income in an area where a mall is proposed. The developer assumes that the population standard deviation for income is $7200
a) If a survey of 100 families yields a sample mean of $35500, create a 90% confidence interval for the mean. (3)
b) Find a confidence interval using the data in a) with a 32% confidence level. (2)
c) Assume a 90% confidence level again and that the sample of 100 families comes from a neighborhood of only 1000 families. Repeat a) (3)
d) (Extra Credit) Assume that your results in a) are correct and that there are 30000 families in the target area. Can you make the 90% confidence interval in a) into a confidence interval for total income? (2)
d) Is the income found in a) significantly different from $36000? Why? (1) [9]
e) A Business Week article claims that 25% of CEOs are ‘outsiders.’ A survey of 350 corporations is taken to check this ‘fact’ and the survey finds that 77 of the 350 firms have outsider CEOs. Create a 90% confidence interval for the proportion of firms that have outsider CEOs. Does this conflict with the statement in Business Week? Why? (4)

2) The amount of time a bank teller spends with each customer has a population mean $\mu = 3.10$ minutes and a population standard deviation $\sigma = 0.40$ minute.
a) If a random sample of 16 customers is selected from a large normally distributed population, what is the probability that the average (mean) time spent per customer is less than 3 minutes? (2.5)
b) How would this change if the sample was taken from a population of only 144 customers? Give a specific answer! (2)
c) If the teller puts in a 420 minute day, and the conditions in 2a) prevail, what is the probability of serving 150 customers or more? (2.5) [20]
d) Under the circumstances in a), what is the probability that the time spent with 1 of the 16 customers is less than 3 minutes? (2)
e) Under the circumstances in a), what is the probability that the time spent with each of the 16 customers in 2a) is less than 3 minutes apiece? (2) [26 – 84.5]

Solution: 1) Assume that a real estate developer wants to estimate the mean family income in an area where a mall is proposed. The developer assumes that the population standard deviation for income is $7200
a) If a survey of 100 families yields a sample mean of $35500, create a 90% confidence interval for the mean. (3)
The relevant formula for this section is $\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}}$. If $1 - \alpha = .90$, $\alpha = .10$ and $z_{\alpha/2} = z_{.05} = 1.645$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{7200}{\sqrt{100}} = 720$.
$\mu = 35500 \pm 1.645(720) = 35500 \pm 1184$ or 34316 to 36684.

b) Find a confidence interval using the data in a) with a 32% confidence level. (2)
If $1 - \alpha = .32$, $\alpha = .68$ and $z_{\alpha/2} = z_{.34}$. We found $z_{.34} = 0.41$ on the first page of the exam.
$\mu = 35500 \pm 0.41(720) = 35500 \pm 295$

c) Assume a 90% confidence level again and that the sample of 100 families comes from a neighborhood of only 1000 families. Repeat a) (3)
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = \frac{7200}{\sqrt{100}}\sqrt{\frac{1000 - 100}{1000 - 1}} = 720\sqrt{.9009} = 720(.9492) = 683.40$
$\mu = 35500 \pm 1.645(683.40) = 35500 \pm 1124$

d) (Extra Credit) Assume that your results in a) are correct and that there are 30000 families in the target area. Can you make the 90% confidence interval in a) into a confidence interval for total income? (2)
We have $\mu = 35500 \pm 1.645(720) = 35500 \pm 1184$ or 34316 to 36684. Just multiply each number by 30000.
This gives us 1,029,480,000 to 1,100,520,000.

d) Is the income found in a) significantly different from $36000? Why? (1) [9]
Since 36000 is included in the confidence interval, we cannot say that the average income is significantly different from 36000.

e) A Business Week article claims that 25% of CEOs are ‘outsiders.’ A survey of 350 corporations is taken to check this ‘fact’ and the survey finds that 77 of the 350 firms have outsider CEOs. Create a 90% confidence interval for the proportion of firms that have outsider CEOs. Does this conflict with the statement in Business Week? Why? (4)
The observed proportion is $\bar{p} = \frac{77}{350} = 0.22$. A confidence interval for a proportion is
$p = \bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}} = 0.22 \pm 1.645\sqrt{\frac{.22(.78)}{350}} = 0.22 \pm 1.645\sqrt{.00049} = 0.22 \pm 1.645(.0221) = 0.22 \pm 0.036$ or .184 to .256. Since 25% is included in this interval, there is no conflict.

2) The amount of time a bank teller spends with each customer has a population mean $\mu = 3.10$ minutes and a population standard deviation $\sigma = 0.40$ minute. This is essentially Exercise 7.71 on CD and the solution was posted.

a) If a random sample of 16 customers is selected from a large normally distributed population, what is the probability that the average (mean) time spent per customer is less than 3 minutes? (2.5)
The sample mean has a Normal distribution with a mean of 3.10 and a standard deviation of $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.40}{\sqrt{16}} = 0.10$. So $x \sim N(3.10, 0.40)$ and $\bar{x} \sim N(3.1, 0.10)$.
$P(\bar{x} < 3) = P\left(z < \frac{3 - 3.10}{.10}\right) = P(z < -1.00) = P(z < 0) - P(-1.00 < z < 0) = .5 - .3413 = .1587$

b) How would this change if the sample was taken from a population of only 144 customers? Give a specific answer! (2)
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = \frac{0.40}{\sqrt{16}}\sqrt{\frac{144 - 16}{144 - 1}} = \sqrt{\frac{(.40)^2}{16} \cdot \frac{128}{143}} = \sqrt{0.008951} = .09461$
$P(\bar{x} < 3) = P\left(z < \frac{3 - 3.10}{.09461}\right) = P(z < -1.06) = P(z < 0) - P(-1.06 < z < 0) = .5 - .3554 = .1446$

c) If the teller puts in a 420 minute day, and the conditions in 2a) prevail, what is the probability of serving 150 customers or more? (2.5) [20]
In order to do this, the teller must spend an average of $\frac{420}{150} = 2.80$ minutes or less with a customer.
$P(\bar{x} \le 2.80) = P\left(z \le \frac{2.80 - 3.10}{.10}\right) = P(z \le -3.00) = P(z \le 0) - P(-3.00 \le z \le 0) = .5 - .4987 = .0013$

d) Under the circumstances in a), what is the probability that the time spent with 1 of the 16 customers is less than 3 minutes? (2)
$x \sim N(3.10, 0.40)$
$P(x < 3) = P\left(z < \frac{3 - 3.10}{.40}\right) = P(z < -0.25) = P(z < 0) - P(-0.25 < z < 0) = .5 - .0987 = .4013$

e) Under the circumstances in a), what is the probability that the time spent with each of the 16 customers in 2a) is less than 3 minutes apiece? (2) [26 – 84.5]
If we can assume that the times are independent, we have $(.4013)^{16} = .000000452$

F. 1) (Dunleavy) Assume that for every 100,000 people treated in an emergency room, 5,000 are misdiagnosed, and 1,000 misdiagnosed patients die. Overall, 90,000 of the 100,000 live through the experience. Define the following events: $MD$ misdiagnosed; $S$ survives; $\overline{MD}$ not misdiagnosed and $\overline{S}$ dies. Don’t just guess, make a table. Distinguish clearly between conditional and joint probabilities.
a) What is the probability that a patient will be diagnosed correctly? (1)
b) What is the probability that a patient will be diagnosed correctly and live? (1)
c) What is the probability that a patient will die given that they were diagnosed correctly? (2)
d) Are the diagnosis and survival independent? (1)
e) If a patient dies, what is the probability that the patient was not misdiagnosed? (2) [7]
f) What is the value of $P(MD \cup \overline{S})$?

2) OK. I ran out of ideas, so here is a mini-Jorcillator problem. Here is the joint probability table and the table relating joint events to failure times. Failure of the phillinx in period 1, 2, 3, 4 are events $A_1, A_2, A_3$, and $A_4$. Failure of the flubberall in period 1, 2, 3, 4 are events $B_1, B_2, B_3$, and $B_4$. Failure of the jorcillator in period 1, 2, 3, 4 are events $C_1, C_2, C_3$, and $C_4$.
a) Assuming that the Jorcillator lives as long as one of its components lives, fill in the second table with failure times of 1, 2, 3 and 4.
(1)

         A1    A2    A3    A4    Total
B1      .20   .12   .04   .04    .40
B2      .15   .09   .03   .03    .30
B3      .10   .06   .02   .02    .20
B4      .05   .03   .01   .01    .10
Total   .5    .3    .1    .1    1.0

         A1    A2    A3    A4
B1
B2
B3
B4

b) Find \(P(C_1)\) (1)
c) Find \(P(C_2)\) (1)
d) Find \(P(C_3)\) (1)
e) Find \(P(C_4)\) (1)
f) Find \(P(C_3 \mid A_3)\) (1) [13 – 97.5]

Solution:
1) (Dunleavy) Assume that for every 100,000 people treated in an emergency room, 5,000 are misdiagnosed, and 1,000 misdiagnosed patients die. Overall, 90,000 of the 100,000 live through the experience. Define the following events: \(MD\) = misdiagnosed; \(S\) = survives; \(\overline{MD}\) = not misdiagnosed and \(\bar{S}\) = dies. Don’t just guess, make a table. Distinguish clearly between conditional and joint probabilities.

a) What is the probability that a patient will be diagnosed correctly? (1)
Facts: \(P(\overline{MD}) = 1 - .05 = .95\).
As always, the easiest way to do this type of problem is a box. Let us assume that there are only 100 people available. The probability of being diagnosed correctly is \(\frac{95000}{100000} = .95\). This means that we can divide our group of 100 into 95 diagnosed correctly and 5 that were not.

           S    not S   Total
MD                         5
not MD                    95
Total                    100

We also know that \(P(S) = \frac{90000}{100000} = .90\), so 90 out of 100 survive.

           S    not S   Total
MD                         5
not MD                    95
Total     90     10      100

If someone is misdiagnosed the probability of dying is \(P(\bar{S} \mid MD) = \frac{1000}{5000} = .2\). So out of the 5 misdiagnosed, 20% or 1 die.

           S    not S   Total
MD                1        5
not MD                    95
Total     90     10      100

Fill in the rest of the table.

           S    not S   Total
MD         4      1        5
not MD    86      9       95
Total     90     10      100

This can be converted into a joint probability table by dividing by 100.

           S    not S   Total
MD        .04    .01     .05
not MD    .86    .09     .95
Total     .90    .10    1.00

b) What is the probability that a patient will be diagnosed correctly and live? (1)
\(P(\overline{MD} \cap S) = .86\)

c) What is the probability that a patient will die given that they (?) were diagnosed correctly? (2)
\(P(\bar{S} \mid \overline{MD}) = \frac{.09}{.95} = .09474\). No! You still cannot read a conditional probability directly from a joint probability table.

d) Are the diagnosis and survival independent?
\(P(\bar{S} \mid \overline{MD}) = \frac{.09}{.95} = .09474\). \(P(\bar{S}) = .10\). Since these are not identical, they are not independent.
Better, just note that the joint probabilities inside the table are not the products of the probabilities outside the table. (1)

e) If a patient dies, what is the probability that the patient was not misdiagnosed? (2) [7]
According to Bayes’ rule
\[ P(\overline{MD} \mid \bar{S}) = \frac{P(\bar{S} \mid \overline{MD})\,P(\overline{MD})}{P(\bar{S})} = \frac{\frac{.09}{.95}(.95)}{.10} = .9 \]

f) What is the value of \(P(MD \cup \bar{S})\)?
I’m feeling lazy. Look at the table. 1 - .86 = .14.

2) OK. I ran out of ideas, so here is a mini-Jorcillator problem. Here is the joint probability table and the table relating joint events to failure times. Failure of the phillinx in period 1, 2, 3, 4 are events \(A_1, A_2, A_3\), and \(A_4\). Failure of the flubberall in period 1, 2, 3, 4 are events \(B_1, B_2, B_3\), and \(B_4\). Failure of the jorcillator in period 1, 2, 3, 4 are events \(C_1, C_2, C_3\), and \(C_4\).

a) Assuming that the Jorcillator lives as long as one of its components lives, fill in the second table with failure times of 1, 2, 3 and 4. (1)

         A1    A2    A3    A4    Total
B1      .20   .12   .04   .04    .40
B2      .15   .09   .03   .03    .30
B3      .10   .06   .02   .02    .20
B4      .05   .03   .01   .01    .10
Total   .5    .3    .1    .1    1.0

Since whatever component lasts longest determines the result, an event like \(B_1 \cap A_4\) will down it in the 4th period. The table reads

         A1    A2    A3    A4
B1        1     2     3     4
B2        2     2     3     4
B3        3     3     3     4
B4        4     4     4     4

b) Find \(P(C_1)\) (1)
.20. This section is very simple logic. If you were unwilling to figure it out, maybe you are allergic to logical thinking.

c) Find \(P(C_2)\) (1)
.12 + .09 + .15 = .36

d) Find \(P(C_3)\) (1)
.04 + .03 + .02 + .06 + .10 = .25

e) Find \(P(C_4)\) (1)
.1 + .1 - .01 = .19
These add to 1.

f) Find \(P(C_3 \mid A_3)\) (1)
\[ P(C_3 \mid A_3) = \frac{P(C_3 \cap A_3)}{P(A_3)} = \frac{.04 + .03 + .02}{.1} = .9 \]
Incidentally,
\[ P(A_3 \mid C_3) = \frac{P(C_3 \cap A_3)}{P(C_3)} = \frac{.04 + .03 + .02}{.25} = .36 \]
So Bayes’ rule says
\[ P(C_3 \mid A_3) = \frac{P(A_3 \mid C_3)\,P(C_3)}{P(A_3)} = \frac{.36(.25)}{.1} = .9 \]
[13 – 97.5]
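The Jorcillator answers can be checked numerically. The following is a minimal Python sketch, not part of the original key: it assumes the joint table is laid out with rows B1..B4 and columns A1..A4, and that the failure period is the larger of the two component failure periods. The names `joint`, `failure_distribution` and `conditional_c_given_a` are made up for this illustration.

```python
# Joint probability table from the problem.
# Assumed layout: rows are B1..B4, columns are A1..A4.
joint = [
    [0.20, 0.12, 0.04, 0.04],
    [0.15, 0.09, 0.03, 0.03],
    [0.10, 0.06, 0.02, 0.02],
    [0.05, 0.03, 0.01, 0.01],
]


def failure_distribution(table):
    """Return {period: P(C_period)} when the device fails with its last component."""
    dist = {}
    for b, row in enumerate(table, start=1):
        for a, p in enumerate(row, start=1):
            period = max(a, b)  # the longer-lived component decides
            dist[period] = dist.get(period, 0.0) + p
    return dist


def conditional_c_given_a(table, c, a):
    """Return P(C_c | A_a) = P(C_c and A_a) / P(A_a)."""
    p_a = sum(row[a - 1] for row in table)  # marginal P(A_a), a column total
    p_joint = sum(
        row[a - 1]
        for b, row in enumerate(table, start=1)
        if max(a, b) == c
    )
    return p_joint / p_a


p_c = failure_distribution(joint)
p_c3_given_a3 = conditional_c_given_a(joint, 3, 3)

for period in sorted(p_c):
    print(f"P(C{period}) = {p_c[period]:.2f}")
print(f"P(C3 | A3) = {p_c3_given_a3:.2f}")
```

Rounded to two places, this reproduces .20, .36, .25 and .19 for parts b) through e) and .9 for part f).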