252y0753 10/19/07 (Open in ‘Print Layout’ format)

ECO252 QBA2
FIRST EXAM
October 4 and 8, 2007
Version 3

Name: KEY    Class hour: _____________    Student number: __________

Show your work! Make Diagrams! Include a vertical line in the middle! Exam is normed on 50 points. Answers without reasons are not usually acceptable.

I. (8 points) Do all the following. $x \sim N(4, 11)$

1. $P(x \ge 0) = P\left(z \ge \frac{0-4}{11}\right) = P(z \ge -0.36) = P(-0.36 \le z \le 0) + P(z \ge 0) = .1406 + .5 = .6406$

2. $P(-33 \le x \le 2) = P\left(\frac{-33-4}{11} \le z \le \frac{2-4}{11}\right) = P(-3.36 \le z \le -0.18) = P(-3.36 \le z \le 0) - P(-0.18 \le z \le 0) = .4996 - .0714 = .4282$

3. $P(-4 \le x \le 4) = P\left(\frac{-4-4}{11} \le z \le \frac{4-4}{11}\right) = P(-0.73 \le z \le 0) = .2673$

4. $x_{.075}$ (Do not try to use the t table to get this.) For $z$ make a diagram. Draw a Normal curve with a mean at 0. $z_{.075}$ is the value of $z$ with 7.5% of the distribution above it. Since 100 - 7.5 = 92.5, it is also the .925 fractile. Since 50% of the standardized Normal distribution is below zero, your diagram should show that the probability between $z_{.075}$ and zero is 92.5% - 50% = 42.5%, or $P(0 \le z \le z_{.075}) = .4250$. If we check this against the Normal table, the closest we can come to .4250 is $P(0 \le z \le 1.44) = .4251$. So $z_{.075} = 1.44$. This is the value of $z$ that you need for an 85% confidence interval.

To get from $z_{.075}$ to $x_{.075}$, use the formula $x = \mu + z\sigma$, which is the opposite of $z = \frac{x - \mu}{\sigma}$. $x_{.075} = 4 + 1.44(11) = 19.84$. If you wish, make a completely separate diagram for $x$. Draw a Normal curve with a mean at 4. Show that 50% of the distribution is below the mean (4). If 7.5% of the distribution is above $x_{.075}$, it must be above the mean and have 42.5% of the distribution between it and the mean.

Check: $P(x \ge 19.84) = P\left(z \ge \frac{19.84 - 4}{11}\right) = P(z \ge 1.44) = P(z \ge 0) - P(0 \le z \le 1.44) = .5 - .4251 = .0749 \approx .075$. This is identical to the way you normally get a p-value for a right-sided test.

II. (9 points-2 point penalty for not trying part a.) Monthly incomes (in thousands) of 6 randomly picked individuals in the little town of Rough Corners are shown below.
2.5  7.3  3.1  2.6  2.4  3.0

a. Compute the sample standard deviation, $s$, of expenditures. Show your work! (2)
b. Assuming that the underlying distribution is Normal, compute a 99% confidence interval for the mean. (2)
c. Redo b) when you find out that there were only 50 people living in Rough Corners. (2)
d. Assume that the population standard deviation is 2 and create an 85% two-sided confidence interval for the mean. (2)
e. Use your results in a) to test the hypothesis that the mean income is above 2.3 (thousand) at the 99% level. (3) State your hypotheses clearly!
f. (Extra Credit) Given the data, test the hypothesis that the population standard deviation is below 2. (3)

Solution:
a) Compute the sample standard deviation, $s$, of expenditures. The first two columns are needed for the computational (shortcut) method. The first, third and fourth are needed for the definitional method. Using (both methods or) the definitional method wastes time.

Row      x     x^2    x - xbar   (x - xbar)^2
1      2.5    6.25    -0.98333         0.9669
2      7.3   53.29     3.81667        14.5669
3      3.1    9.61    -0.38333         0.1469
4      2.6    6.76    -0.88333         0.7803
5      2.4    5.76    -1.08333         1.1736
6      3.0    9.00    -0.48333         0.2336
Sum   20.9   90.67     0.0            17.8682

$\sum x = 20.9$, $\sum x^2 = 90.67$, $\sum (x - \bar{x}) = 0$ (a check), $\sum (x - \bar{x})^2 = 17.8682$ and $n = 6$.

$\bar{x} = \frac{\sum x}{n} = \frac{20.9}{6} = 3.4833$

$s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{90.67 - 6(3.4833)^2}{5} = \frac{17.8697}{5} = 3.5739$

$s_x = \sqrt{3.5739} = 1.8905$. If you used the definitional method, you would have gotten 1.8904. There seems to be a lot of potential for rounding error here. Note that the $x - \bar{x}$ column, even though it carries an extra place, does not quite add to the expected zero but to .00002.

b) Assuming that the underlying distribution is Normal, compute a 99% confidence interval for the mean. (2)

$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{1.8905}{\sqrt{6}} = \sqrt{\frac{3.5739}{6}} = \sqrt{0.59565} = 0.77178$ and $t^{(n-1)}_{\alpha/2} = t^{(5)}_{.005} = 4.032$.

$\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}} = 3.4833 \pm 4.032(0.77178) = 3.483 \pm 3.112$ or 0.371 to 6.595.

c) Redo b) when you find out that there were only 50 people living in Rough Corners.
(2) Because the sample is more than 5% of the population, use the finite population correction.

$s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{1.8905}{\sqrt{6}}\sqrt{\frac{50-6}{50-1}} = \sqrt{\frac{3.5739}{6}\cdot\frac{44}{49}} = \sqrt{0.59565(0.89796)} = \sqrt{0.53487} = 0.73135$

$\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}} = 3.4833 \pm 4.032(0.73135) = 3.4833 \pm 2.9488$ or 0.5345 to 6.4321.

d) Assume that the population standard deviation is 2 and create an 85% two-sided confidence interval for the mean. (2)

We found $z_{.075} = 1.44$ on the last page. We have $\sigma = 2$, $n = 6$, $\bar{x} = 3.4833$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{6}} = \sqrt{\frac{4}{6}} = \sqrt{0.66667} = 0.8165$.

$\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}} = 3.4833 \pm 1.44(0.8165) = 3.4833 \pm 1.1758$ or 2.3075 to 4.6591.

e) Use your results in a) to test the hypothesis that the mean income is above 2.3 (thousand) at the 99% level. (3) State your hypotheses clearly!

The statement that the mean is above 2.3 does not contain an equality, so it must be an alternate hypothesis. We have the following information: $\alpha = .01$, $\bar{x} = 3.4833$, $n = 6$ and $s_{\bar{x}} = \frac{s}{\sqrt{n}} = 0.77178$. Since this is a one-sided hypothesis we will use $t^{(n-1)}_{\alpha} = t^{(5)}_{.01} = 3.365$. Needless to say, because of the small sample size, we are assuming that the parent distribution is Normal.

Our hypotheses are $H_0: \mu \le 2.3$ and $H_1: \mu > 2.3$, so $\mu_0 = 2.3$. Since we are worrying about the mean being too large, this is a right-sided test. There are three ways to do this. Do only one of them.

(i) Test Ratio: $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{3.4833 - 2.3}{0.77178} = 1.5332$. This is a right-sided test: the larger the sample mean is, the more positive this ratio will be. We will reject the null hypothesis if the ratio is larger than $t^{(n-1)}_{\alpha} = t^{(5)}_{.01} = 3.365$. Make a diagram showing a Normal curve with a mean at 0 and a shaded 'reject' zone above 3.365. Since the test ratio is below 3.365, we cannot reject $H_0$.

If you wish to find a p-value for your hypothesis, note that the t-ratio is 1.5332. The p-value will be the probability that $t$ is above 1.5332. The line of the t table for 5 degrees of freedom is below.
{ttable}
df    .45    .40    .35    .30    .25    .20    .15    .10    .05   .025    .01   .005   .001
5   0.132  0.267  0.408  0.559  0.727  0.920  1.156  1.476  2.015  2.571  3.365  4.032  5.893

What this tells us, among other things, is that $P(t \ge 1.476) = .10$ and $P(t \ge 2.015) = .05$. Since 1.5332 lies between 1.476 and 2.015, the probability that $t$ lies above 1.5332 must be between .05 and .10: $.05 < p\text{-value} < .10$. This is above our significance level of .01, so we will not reject the null hypothesis.

(ii) Critical value: We need a critical value for $\bar{x}$ above 2.3. Common sense says that if the sample mean is too far above 2.3, we will not believe $H_0: \mu \le 2.3$. The formula for a critical value for the sample mean is $\bar{x}_{cv} = \mu_0 \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}}$, but we want a single value above 2.3, so use $\bar{x}_{cv} = \mu_0 + t^{(n-1)}_{\alpha} s_{\bar{x}} = 2.3 + 3.365(0.77178) = 2.3 + 2.5970 = 4.8970$. Make a diagram showing an almost Normal curve with a mean at 2.3 and a shaded 'reject' zone above 4.8970. Since $\bar{x} = 3.4833$ is not above 4.8970, we do not reject $H_0$.

(iii) Confidence interval: $\mu = \bar{x} \pm t_{\alpha/2} s_{\bar{x}}$ is the formula for a two-sided interval. The rule for a one-sided confidence interval is that it should always go in the same direction as the alternate hypothesis. Since the alternative hypothesis is $H_1: \mu > 2.3$, the confidence interval is $\mu \ge \bar{x} - t^{(n-1)}_{\alpha} s_{\bar{x}}$ or $\mu \ge 3.4833 - 3.365(0.77178) = 0.8863$. Make a diagram showing an almost Normal curve with a mean at $\bar{x} = 3.4833$ and, to represent the confidence interval, shade the area above 0.8863 in one direction. Then, on the same diagram, to represent the null hypothesis, $H_0: \mu \le 2.3$, shade the area below 2.3 in the opposite direction. Notice that these overlap. What the diagram is telling you is that it is possible for $\mu \ge 0.8863$ and $H_0: \mu \le 2.3$ to both be true. (If you follow my more recent suggestions, it is actually enough to show that 2.3 is on the confidence interval.) So we do not reject $H_0$.

f) (Extra Credit) Given the data, test the hypothesis that the population standard deviation is below 2. (3)

This is an alternate hypothesis, $H_1: \sigma < 2$.
The null hypothesis is $H_0: \sigma \ge 2$. Remember $n = 6$, $\alpha = .01$ and $s_x^2 = 3.5739$. Table 3 says that the test ratio is $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{5(3.5739)}{2^2} = 4.4674$.

Recall $df = n - 1 = 5$. The first paragraph of the chi-squared table appears below. If we look at the 5 column, we see that the lower 1% of values of chi-squared are cut off by $\chi^{2(5)}_{.99} = 0.5543$, so that the reject region is below 0.5543. Since 4.4674 is not below 0.5543, we do not reject $H_0$.

Degrees of
Freedom        1        2        3        4        5        6        7        8        9
0.005    7.87946  10.5966  12.8382  14.8603  16.7496  18.5476  20.2778  21.9550  23.5893
0.010    6.63491   9.2103  11.3449  13.2767  15.0863  16.8119  18.4753  20.0902  21.6660
0.025    5.02389   7.3778   9.3484  11.1433  12.8325  14.4494  16.0128  17.5346  19.0228
0.050    3.84146   5.9915   7.8147   9.4877  11.0705  12.5916  14.0671  15.5073  16.9190
0.100    2.70554   4.6052   6.2514   7.7794   9.2364  10.6446  12.0170  13.3616  14.6837
0.900    0.01579   0.2107   0.5844   1.0636   1.6103   2.2041   2.8331   3.4895   4.1682
0.950    0.00393   0.1026   0.3518   0.7107   1.1455   1.6354   2.1674   2.7326   3.3251
0.975    0.00098   0.0506   0.2158   0.4844   0.8312   1.2373   1.6899   2.1797   2.7004
0.990    0.00016   0.0201   0.1148   0.2971   0.5543   0.8721   1.2390   1.6465   2.0879
0.995    0.00004   0.0100   0.0717   0.2070   0.4117   0.6757   0.9893   1.344    1.7349

Computer output for parts b) d) e) and f) follows

---------- 10/5/2007 1:39:55 PM ----------

Welcome to Minitab, press F1 for help.
MTB > WOpen "C:\Documents and Settings\RBOVE\My Documents\Minitab\252x0751-23.MTW".
Retrieving worksheet from file: 'C:\Documents and Settings\RBOVE\My Documents\Minitab\252x0751-23.MTW'
Worksheet was saved on Wed Oct 03 2007

Results for: 252x0751-23.MTW

MTB > describe x

Descriptive Statistics: x
Variable  N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
x         6   0  3.483    0.772  1.890    2.400  2.475   2.800  4.150    7.300

Part b)
MTB > onet c1;
SUBC> conf 99.

One-Sample T: x
Variable  N     Mean    StDev  SE Mean  99% CI
x         6  3.48333  1.89041  0.77176  (0.37149, 6.59517)

Part d)
MTB > onez c1;
SUBC> sigma 2;
SUBC> conf 85.

One-Sample Z: x
The assumed standard deviation = 2
Variable  N     Mean    StDev  SE Mean  85% CI
x         6  3.48333  1.89041  0.81650  (2.30796, 4.65871)

Part e)
MTB > Onet c1;
SUBC> Test 2.3;
SUBC> Confidence 99;
SUBC> Alternative 1.

One-Sample T: x
Test of mu = 2.3 vs > 2.3
Variable  N     Mean    StDev  SE Mean  99% Lower Bound     T      P
x         6  3.48333  1.89041  0.77176          0.88642  1.53  0.093

MTB > Save "C:\Documents and Settings\RBOVE\My Documents\Minitab\252x0751-23.MTW";
SUBC> Replace.
Saving file as: 'C:\Documents and Settings\RBOVE\My Documents\Minitab\252x0751-23.MTW'
Existing file replaced.

Part f)
MTB > %sigtest c1 4
Tests data in column 1 for variance of 4. Packaged Minitab Macro. This is a 2-sided test.
Executing from file: sigtest.MAC
The value of the test statistic is 4.4671.
If the test statistic is less than 0.8312 or greater than 12.8325 then there is statistical evidence indicating that your variance does not equal to 4.0000, at alpha = 0.0500.

Since I didn’t have time to input a 1-sided test, I ran two-sided tests with a confidence level of 98% and 90% because their lower critical values would be the same as for 1-sided tests with confidence levels of 99% and 95%.

MTB > %sigtest c1 4;
SUBC> alpha 98.
Executing from file: sigtest.MAC
The value of the test statistic is 4.4671.
If the test statistic is less than 4.2789 or greater than 4.4249 then there is statistical evidence indicating that your variance does not equal to 4.0000, at alpha = 0.9800.

MTB > %sigtest c1 4;
SUBC> alpha 90.
Executing from file: sigtest.MAC
The value of the test statistic is 4.4671.
If the test statistic is less than 3.9959 or greater than 4.7278 then there is statistical evidence indicating that your variance does not equal to 4.0000, at alpha = 0.9000.

III.
Do as many of the following problems as you can. (2 points each unless marked otherwise adding to 13+ points). Show your work except in multiple choice questions. (Actually, it doesn’t hurt there either.) If the answer is ‘None of the above,’ put in the correct answer if possible.

1) If I want to test to see if the mean of $x$ is smaller than the given population mean $\mu_0$ my null hypothesis is:
i) $\mu$ ___ $\mu_0$
ii) $\mu$ ___ $\mu_0$
iii) *$\mu \ge \mu_0$
iv) $\mu$ ___ $\mu_0$
v) Could be any of the above. We need more information.
vi) None of the above
Explanation: $\mu < \mu_0$ is our alternate hypothesis since it doesn’t contain an equality. $\mu \ge \mu_0$ is the opposite, so it must be the null hypothesis.

2) Assuming that you have a sample mean of 100 based on a sample of 36 taken from a population of 300 with a sample standard deviation of 80, the 99% confidence interval for the population mean is
a) $100 \pm 2.576\,\frac{80}{\sqrt{36}}$
b) $100 \pm 2.576\,\frac{80}{\sqrt{36}}\sqrt{\frac{300-36}{300-1}}$
c) $100 \pm 2.576\,\frac{80}{\sqrt{300}}$
d) $100 \pm 2.724\,\frac{80}{\sqrt{300}}$
e) *$100 \pm 2.724\,\frac{80}{\sqrt{36}}\sqrt{\frac{300-36}{300-1}}$
f) $100 \pm 2.724\,\frac{80}{\sqrt{36}}$
g) $100 \pm 2.438\,\frac{80}{\sqrt{300}}$
h) $100 \pm 2.438\,\frac{80}{\sqrt{36}}\sqrt{\frac{300-36}{300-1}}$
i) $100 \pm 2.438\,\frac{80}{\sqrt{36}}$
j) $100 \pm 2.438\,\frac{80}{\sqrt{300}}\sqrt{\frac{300-36}{300-1}}$
k) None of the above. Fill in a correct answer.
Explanation: The formula for a confidence interval when the variance is unknown and the sample is more than 5% of the population was given in the solution to problem A2 as $\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}}$, where $\bar{x} = 100$, $n = 36$, $N = 300$, $s = 80$, $1 - \alpha = 99\%$ and $\alpha = .01$. Here $s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{80}{\sqrt{36}}\sqrt{\frac{300-36}{300-1}}$ and $t^{(n-1)}_{\alpha/2} = t^{(35)}_{.005} = 2.724$. So $\mu = 100 \pm 2.724\,\frac{80}{\sqrt{36}}\sqrt{\frac{300-36}{300-1}}$.

3) Which of the following is a Type 2 error?
a) Rejecting the null hypothesis when the null hypothesis is true.
b) Not rejecting the null hypothesis when the null hypothesis is true.
c) *Not rejecting the null hypothesis when the null hypothesis is false.
d) Rejecting the null hypothesis when the null hypothesis is false.
e) All of the above
f) None of the above.

4) If a random sample is gathered to get information about a population proportion, what do we mean by a p-value?
a) P-value is the probability that, if the null hypothesis was false, that, if we were to repeat the experiment many times, we would get a sample proportion as extreme as or more extreme than the sample proportion actually observed.
b) *P-value is the probability that, if the null hypothesis was true, that, if we were to repeat the experiment many times, we would get a sample proportion as extreme as or more extreme than the sample proportion actually observed.
c) P-value is the population proportion in the null hypothesis.
d) P-value is the population proportion in the alternate hypothesis.
e) P-value is the probability of a type 2 error.
f) P-value is the probability that the alternate hypothesis is true, given the sample proportion actually observed.
g) None of the above is true.

5) If a difference in proportions (in a business-related problem) is called statistically significant at the 1% significance level, this means that
a) *If the null hypothesis is true, the difference in proportions is surprisingly large.
b) There is a 99% chance that the null hypothesis is true.
c) The difference in proportions is large enough so that we must take account of it in our business decisions.
d) All of the above

6) (Wonnacott & Wonnacott) When an industrial process is in control, it produces bolts with a hardness that has a mean of at least 80 and a (population) standard deviation of 8. If the hardness is too far below 80, you must shut down the process. Every hour you take a sample of 16 bolts. How low must the average hardness of these bolts be before we shut the process down? (Use a 10% significance level, and don’t forget to state your hypotheses) (3)
Solution: We have $\mu_0 = 80$, $\sigma = 8$, $n = 16$, $H_1: \mu < 80$, $H_0: \mu \ge 80$, $\alpha = .10$, $z_{.10} = 1.282$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{16}} = 2$, $\bar{x}_{cv} = \mu_0 - z_{\alpha}\sigma_{\bar{x}} = 80 - 1.282(2) = 77.436$.

7) Your boss, who doesn’t know any statistics, tells you to shut down the process in 6) if the hardness level from a sample of 16 bolts is 79 or lower.
It is known that the population standard deviation is 8. If we assume that the process is producing bolts with an average hardness of 80, what is the probability that it will be shut down? (Think p-value?) (2) [15]
Solution: $P(\bar{x} \le 79) = P\left(z \le \frac{79 - 80}{2}\right) = P(z \le -0.5) = .5 - .1915 = .3085$

8) A 1989 Gallup poll revealed that 59% of women believed that the Republican Party was more likely than the Democratic Party to keep the country prosperous (My, how things change!). We were already sure that 68% of men believed that this was true.

a) How many women had to be polled before we could state that the proportion for women is $.59 \pm .03$? (Use a 5% significance level.) (2)
Solution: The outline says “The usually suggested formula is $n = \frac{pqz^2}{e^2}$ ….. This is the formula everyone forgets that we covered.” So, we have $p = .59$, $q = 1 - p = 1 - .59 = .41$, $z = 1.960$, $e = .03$ and $n = \frac{.59(.41)(1.960)^2}{(.03)^2} = 1032.54$. So use a sample of at least 1033.

b) If we wished to test our belief that women were less likely to believe that the Republicans were more likely than the Democratic Party to keep the country prosperous and we were already sure that 68% of men believed that this was true, let $p$ represent the proportion of women. What are our null and alternative hypotheses? (1)
Solution: The statement implied in the problem, $p < .68$, does not contain an equality, so it is the alternative hypothesis and the null hypothesis is the opposite, $p \ge .68$.

c) The actual poll covered a sample of 750 women. Using a 95% confidence level and assuming that your hypothesis in b) is correct, test the hypothesis. (2) [20]
Solution: From Table 3.

Interval for: Proportion
Confidence Interval: $p = \bar{p} \pm z_{\alpha/2} s_p$, where $s_p = \sqrt{\frac{\bar{p}\bar{q}}{n}}$ and $\bar{q} = 1 - \bar{p}$
Hypotheses: $H_0: p = p_0$, $H_1: p \ne p_0$
Test Ratio: $z = \frac{\bar{p} - p_0}{\sigma_p}$
Critical Value: $p_{cv} = p_0 \pm z_{\alpha/2}\sigma_p$, where $\sigma_p = \sqrt{\frac{p_0 q_0}{n}}$ and $q_0 = 1 - p_0$

$\alpha = .05$, $z_{\alpha} = z_{.05} = 1.645$, $p_0 = .68$, $q_0 = 1 - p_0 = 1 - .68 = .32$, $\bar{p} = .59$ and $n = 750$. First, for the test ratio or critical value we need $\sigma_p = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.68(.32)}{750}} = \sqrt{.000290} = .0170333$. For the confidence interval we need $\bar{p} = .59$, $\bar{q} = 1 - \bar{p} = 1 - .59 = .41$ and $s_p = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{.59(.41)}{750}} = \sqrt{.000323} = .01796$.

Use one of the following 3 methods. $H_0: p \ge .68$, $H_1: p < .68$

Critical Value Method: Since we have $H_1: p < .68$, this is a left-sided test. We use a critical value for the proportion of $p_{cv} = p_0 - z_{\alpha}\sigma_p = .68 - 1.645(.0170333) = .6520$. Make a diagram showing a normal curve with a center at $p_0 = .68$ and a rejection region below .6520. Since $\bar{p} = .59$ is below .6520, we reject the null hypothesis.

Test Ratio Method: $z = \frac{\bar{p} - p_0}{\sigma_p} = \frac{.59 - .68}{.0170333} = -5.284$. Make a diagram showing a normal curve with a center at zero and a rejection region below -1.645. Since $z = -5.284$ is below -1.645, we reject the null hypothesis. The p-value would be $P(\bar{p} \le .59) = P(z \le -5.284) \approx .5 - .5000 = 0$. This would lead to rejection of the null hypothesis for most values of $\alpha$, since the p-value would be below the significance level.

Confidence interval method: Since we have $H_1: p < .68$, we need a one-sided ‘$\le$’ interval. This would be $p \le \bar{p} + z_{\alpha} s_p = .59 + 1.645(0.01796) = .6195$. The null hypothesis $H_0: p \ge .68$ is contradicted by the confidence interval $p \le .6195$.

ECO252 QBA2
FIRST EXAM
October 8, 2007
TAKE HOME SECTION

Name: _________________________
Student Number and class: _________________________

IV. Do at least 3 problems (at least 7 each) (or do sections adding to at least 20 points - Anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where appropriate. You have not done a hypothesis test unless you have stated your hypotheses, run the numbers and stated your conclusion. (Use a 95% confidence level unless another level is specified.) Answers without reasons usually are not acceptable. Neatness and clarity of explanation are expected. This must be turned in when you take the in-class exam. Note that answers without reasons and citation of appropriate statistical tests receive no credit. Failing to be transparent about which section of which problem you are doing can lose you credit.
Many answers require a statistical test, that is, stating or implying a hypothesis and showing why it is true or false by citing a table value or a p-value. If you haven’t done it lately, take a fast look at ECO 252 - Things That You Should Never Do on a Statistics Exam (or Anywhere Else).

A group of 30 employees are interviewed to determine the minimum amount that they will take to give up a vacation day. After careful interviewing, a psychologist reports the following amounts.

479 616 627 648 488 622 522 557 512 595
621 631 547 628 657 511 578 634 539 625
553 612 520 509 499 633 606 616 612 598

My calculations say that the sum of these 30 numbers is $\sum x = 17395$ and that the sum of squares is $\sum x^2 = 10171575$. This is a sample of 30.

Personalize these data as follows. Take the second to last digit of your student number and multiply it by 5. Add this quantity to each of the 30 numbers. If the second to last digit of your student number is 0, add 50. Label your exam by version number as follows. If the second to last digit of your student number is 1, you are doing Version 1. If the second to last digit is 2, you are doing Version 2. Etc. If the second to last digit is zero you are doing Version 10.

Last term's exam said the following. If you add a quantity $a$ to a column of numbers, $\sum (x + a) = \sum x + na$ and $\sum (x + a)^2 = \sum x^2 + 2a\sum x + na^2$. For example, if $a = 60$, $\sum (x + 60) = \sum x + 30(60)$, 17395 + 1800 = ? and $\sum (x + 60)^2 = \sum x^2 + 2(60)\sum x + 30(60)^2 = 10171575 + 120(17395) + 30(3600)$.

Problem 1: Count the number of people in your sample that demand more than $602.50 and make it into a sample proportion. Test the following 3 hypotheses: I) that 60% demand more than $602.50, II) that more than 60% demand more than $602.50 and III) that less than 60% demand more than $602.50, using a 98% confidence level. For each of these three tests a) state your null and alternative hypotheses (2), b) test each one using a test ratio or a critical value for the proportion (2) and c) find a p-value for the null hypotheses (3).
Label each part clearly so that I know which is I, II and III and a), b), c). Make sure that I know where the ‘reject’ zone is. d) Using the proportion you found above, how large a sample would you need to estimate a 2-sided 98% confidence interval for the proportion with an error of at most .001? Assume that your sample is of that size and show that the confidence interval has an error of at most .001. (3) [10] e) (Extra credit) Assume that you are testing the hypothesis that (II) more than 60% demand over $602.50. Find the power of the test if you use a sample of 30 and the true proportion is 70%. (3)

Problem 2: Assume that the underlying data for problem 1 is not Normal and using the data for problem 1 test the following three hypotheses: I) that the median demand is $602.50, II) that median demand is more than $602.50 and III) that the median demand is less than $602.50, using a 98% confidence level.
a) State your null and alternative hypotheses and the hypotheses that you will actually test for each of the 3 tests (3)
b) Test each one using a test ratio or a critical value (3)
c) Find a p-value for the 2-sided test and explain whether and why it would lead to a rejection of the null hypothesis at the 95% confidence level (1)
d) (Extra credit) Show explicitly what the conclusion in c) would be if the sample of 30 came from a population of 60. (1)
e) (Extra credit) Find a two-sided confidence interval for the median (2) [17]

Problem 3:
a) Find the sample mean and sample standard deviation for the data in Problem 1 (1)
b) Test the hypothesis that the mean is 602.50 using critical values for the sample mean, first stating your hypotheses clearly. Use a 98% confidence level (2)
c) Test the hypothesis in b) using a test ratio. Find an approximate p-value and state and explain whether this will lead to a rejection of the null hypothesis if we continue to use a 98% confidence level.
(2)
d) Using the test ratio you found in c) find a p-value for the null hypothesis that the mean is at most 602.50 (1)
e) Using the test ratio you found in c) find a p-value for the null hypothesis that the mean is at least 602.50 (1)
f) Test the null hypothesis that the mean is at most 602.50 using an appropriate confidence interval (1)
g) Test the null hypothesis that the mean is at least 602.50 using an appropriate confidence interval (1) [26]

Problem 4: Assume that the population standard deviation is known to be 30 but that we are still working with a problem like Problem 3. (98% confidence level, sample of 30.) Do either Problem 4.1 or Problem 4.2. Make sure that I know which one!

Problem 4.1.
a) Find a critical value for the sample mean if we are testing whether the population mean is below 30. Clearly state your null and alternative hypotheses (2)
b) Assume that the sample mean is 30 minus the second to last digit of your student number. (Use 10 if this digit is zero.) Find a p-value for your null hypothesis. (1)
c) Create a power curve for the test (6)

Problem 4.2.
a) Find critical values for the sample mean if we are testing whether the population mean is 30. Clearly state your null and alternative hypotheses (2)
b) Assume that the sample mean is 30 minus the second to last digit of your student number. (Use 10 if this digit is zero.) Find a p-value for your null hypothesis. (1)
c) Create a power curve for the test (8) [37]

Problem 5: In problem 4 we assumed that the population standard deviation is 30.
a) Do a 98% confidence interval for the mean using the mean that you found in Problem 3 and assuming that our sample of 30 came from a population of 300. (2)
b) How large a sample would we need if we wanted to make the error term no more than 1 and the sample came from an infinite population?
(2)
c) Using a 98% confidence level and a sample size of 30 create a confidence interval for the population standard deviation using your sample variance or standard deviation from Problem 3. (2)
d) Repeat c) assuming that you had a sample of 300. (2)
e) Can we say that the standard deviation is significantly different from 30 on the basis of c) and d)? (1)
f) Using the data and sample size from problem 3 can we say that the standard deviation is above 30? State your hypotheses and do an appropriate hypothesis test. (3) [49]

Problem 1: Count the number of people in your sample that demand more than $602.50 and make it into a sample proportion. Test the following 3 hypotheses: I) that 60% demand more than $602.50, II) that more than 60% demand more than $602.50 and III) that less than 60% demand more than $602.50, using a 98% confidence level. For each of these three tests a) state your null and alternative hypotheses (2), b) test each one using a test ratio or a critical value for the proportion (2) and c) find a p-value for the null hypotheses (3). Label each part clearly so that I know which is I, II and III and a), b), c). Make sure that I know where the ‘reject’ zone is. d) Using the proportion you found above, how large a sample would you need to estimate a 2-sided 98% confidence interval for the proportion with an error of at most .001? Assume that your sample is of that size and show that the confidence interval has an error of at most .001. (3) [10] e) (Extra credit) Assume that you are testing the hypothesis that (II) more than 60% demand over $602.50. Find the power of the test if you use a sample of 30 and the true proportion is 70%. (3)

Solution: The data sets that you had are presented in order. A line divides the numbers above $602.50 from those below. $x_b$ is the number below 602.50. $x = 30 - x_b$ is the number above 602.50.
Ver    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30 x_b   x
V1   484 493 504 514 516 517 525 527 544 552 558 562 583 600|603 611 617 617 621 621 626 627 630 632 633 636 638 639 653 662  14  16
V2   489 498 509 519 521 522 530 532 549 557 563 567 588|605 608 616 622 622 626 626 631 632 635 637 638 641 643 644 658 667  13  17
V3   494 503 514 524 526 527 535 537 554 562 568 572 593|610 613 621 627 627 631 631 636 637 640 642 643 646 648 649 663 672  13  17
V4   499 508 519 529 531 532 540 542 559 567 573 577 598|615 618 626 632 632 636 636 641 642 645 647 648 651 653 654 668 677  13  17
V5   504 513 524 534 536 537 545 547 564 572 578 582|603 620 623 631 637 637 641 641 646 647 650 652 653 656 658 659 673 682  12  18
V6   509 518 529 539 541 542 550 552 569 577 583 587|608 625 628 636 642 642 646 646 651 652 655 657 658 661 663 664 678 687  12  18
V7   514 523 534 544 546 547 555 557 574 582 588 592|613 630 633 641 647 647 651 651 656 657 660 662 663 666 668 669 683 692  12  18
V8   519 528 539 549 551 552 560 562 579 587 593 597|618 635 638 646 652 652 656 656 661 662 665 667 668 671 673 674 688 697  12  18
V9   524 533 544 554 556 557 565 567 584 592 598 602|623 640 643 651 657 657 661 661 666 667 670 672 673 676 678 679 693 702  12  18
V10  529 538 549 559 561 562 570 572 589 597|603 607 628 645 648 656 662 662 666 666 671 672 675 677 678 681 683 684 698 707  10  20

Let $p$ be the proportion that demand more than $602.50.

a) State your null and alternative hypotheses (2)
I) 60% demand more than $602.50: $H_0: p = .6$, $H_1: p \ne .6$
II) More than 60% demand more than $602.50: $H_0: p \le .6$, $H_1: p > .6$
III) Less than 60% demand more than $602.50: $H_0: p \ge .6$, $H_1: p < .6$

b) Test each hypothesis using a test ratio or a critical value for the proportion (2)
The relevant formulas are in Table 3.
$n = 30$, $\alpha = .02$, $z_{\alpha} = z_{.02} = 2.054$ was found in Grass1 and the t table says $z_{\alpha/2} = z_{.01} = 2.327$. $\sigma_p = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.6(.4)}{30}} = \sqrt{.008} = .08944$

Version              V1     V2     V3     V4     V5     V6     V7     V8     V9    V10
x                    16     17     17     17     18     18     18     18     18     20
pbar = x/n        .5333  .5667  .5667  .5667  .6000  .6000  .6000  .6000  .6000  .6667

Here is the slice of Table 3 for proportions.

Interval for: Proportion
Confidence Interval: $p = \bar{p} \pm z_{\alpha/2} s_p$, where $s_p = \sqrt{\frac{\bar{p}\bar{q}}{n}}$ and $\bar{q} = 1 - \bar{p}$
Hypotheses: $H_0: p = p_0$, $H_1: p \ne p_0$
Test Ratio: $z = \frac{\bar{p} - p_0}{\sigma_p}$
Critical Value: $p_{cv} = p_0 \pm z_{\alpha/2}\sigma_p$, where $\sigma_p = \sqrt{\frac{p_0 q_0}{n}}$ and $q_0 = 1 - p_0$

I) 60% demand more than $602.50: $H_0: p = .6$, $H_1: p \ne .6$
Critical Value: $p_{cv} = p_0 \pm z_{\alpha/2}\sigma_p = .6 \pm 2.327(.08944) = .6 \pm .2081$. Make a diagram. Draw a Normal curve centered at .6 with rejection zones below .3919 and above .8081. None of the values of $\bar{p}$ falls into the rejection region.
Test Ratio: $z = \frac{\bar{p} - p_0}{\sigma_p} = \frac{\bar{p} - .6}{.08944}$. We reject the null hypothesis unless $z$ falls between $-z_{\alpha/2} = -z_{.01} = -2.327$ and $z_{\alpha/2} = z_{.01} = 2.327$.
V1: $z = \frac{.5333 - .6}{.08944} = -0.7458$
V2-4: $z = \frac{.5667 - .6}{.08944} = -0.3723$
V5-9: $z = \frac{.6000 - .6}{.08944} = 0$
V10: $z = \frac{.6667 - .6}{.08944} = 0.7458$
None of these falls in the ‘reject’ region.

II) More than 60% demand more than $602.50: $H_0: p \le .6$, $H_1: p > .6$
Critical Value: $p_{cv} = p_0 + z_{\alpha}\sigma_p = .6 + 2.054(.08944) = .6 + 0.1837 = .7837$. Make a diagram. Draw a Normal curve centered at .6 with a rejection zone above .7837. None of our values of $\bar{p}$ falls into the rejection zone.
Test Ratio: $z = \frac{\bar{p} - p_0}{\sigma_p} = \frac{\bar{p} - .6}{.08944}$. We reject the null hypothesis if $z$ falls above $z_{.02} = 2.054$. None of our values of $z$ falls into the rejection zone.

III) Less than 60% demand more than $602.50: $H_0: p \ge .6$, $H_1: p < .6$
Critical Value: $p_{cv} = p_0 - z_{\alpha}\sigma_p = .6 - 2.054(.08944) = .6 - 0.1837 = .4163$. Make a diagram. Draw a Normal curve centered at .6 with a rejection zone below .4163. None of our values of $\bar{p}$ falls into the rejection zone.
Test Ratio: $z = \frac{\bar{p} - p_0}{\sigma_p} = \frac{\bar{p} - .6}{.08944}$. We reject the null hypothesis if $z$ falls below $-z_{.02} = -2.054$. None of our values of $z$ falls into the rejection zone.
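The critical-value and test-ratio calculations in part b) can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function name `proportion_z_test` and its arguments are hypothetical, and it takes exact Normal quantiles from the standard library instead of the table values 2.327 and 2.054, so results can differ from the hand calculation in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0, alpha, tail):
    """One-sample z test for a proportion (hypothetical helper).

    tail is "two" (H1: p != p0), "right" (H1: p > p0) or "left" (H1: p < p0).
    Returns the test ratio, the critical proportion(s) and the decision.
    """
    p_bar = x / n
    sigma_p = sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_bar - p0) / sigma_p          # test ratio
    std = NormalDist()                  # standard Normal distribution
    if tail == "two":
        z_crit = std.inv_cdf(1 - alpha / 2)
        cv = (p0 - z_crit * sigma_p, p0 + z_crit * sigma_p)
        reject = abs(z) > z_crit
    elif tail == "right":
        z_crit = std.inv_cdf(1 - alpha)
        cv = (p0 + z_crit * sigma_p,)
        reject = z > z_crit
    elif tail == "left":
        z_crit = std.inv_cdf(1 - alpha)
        cv = (p0 - z_crit * sigma_p,)
        reject = z < -z_crit
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, cv, reject

# Counts above $602.50 by version, taken from the data table (n = 30).
counts = {"V1": 16, "V2-4": 17, "V5-9": 18, "V10": 20}
for label, x in counts.items():
    for tail in ("two", "right", "left"):
        z, cv, reject = proportion_z_test(x, 30, 0.6, 0.02, tail)
        print(label, tail, round(z, 4), [round(c, 4) for c in cv], reject)
```

Every combination of version and tail should report that the null hypothesis is not rejected, in line with the conclusions of part b).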
14 252y0753 10/19/07 (Open in ‘Print Layout’ format) c) Find a p-value for the null hypotheses In response to a student inquiry, I wrote the following paragraph about p-value. We could say that to compute a value for z or t, if it is a left sided test, find the probability below your value of z using what you know about finding Normal probabilities (if it is t approximate the probability using the t table.) If it is a right sided test find the probability above your value of z. If it is a 2-sided test and z is negative, proceed as you would in a left sided test and double the probability. If it is a 2 sided test and z is positive, proceed as you would in a right sided test and double the probability. H : p .6 I) 60% demand more than $602.50 0 H 1 : p .6 V1: z 0.7458 , p value 2Pz 0.7458 2.5 .2734 .4532 V2-4: z 0.3723 , p value 2Pz 0.3723 2.5 .1443 .7114 V5-9: z 0 , p value 2Pz 0 2.5 1.0000 V10: z 0.7458 , p value 2Pz 0.7458 2.5 .2734 .4532 H : p .6 II) More than 60% demand more than $602.50 0 H 1 : p .6 V1: z 0.7458 , p value Pz 0.7458 .5 .2734 .7734 V2-4: z 0.3723 , p value Pz 0.3723 .5 .1443 .6443 V5-9: z 0 , p value Pz 0 .5 V10: z 0.7458 , p value Pz 0.7458 .5 .2734 .2266 H : p .6 III) Less than 60% demand more than $602.50 0 H 1 : p .6 V1: z 0.7458 , p value Pz 0.7458 .5 .2734 .2266 V2-4: z 0.3723 , p value Pz 0.3723 .5 .1443 .3557 V5-9: z 0 , p value Pz 0 .5 V10: z 0.7458 , p value Pz 0.7458 .5 .2734 .7734 d) Using the proportion you found above, how large a sample would you need to estimate a 2-sided 98% confidence interval for the proportion with and error of at most .001? Assume that your sample is of that size and show that the confidence interval has an error of at most .001. (3) [10] n pqz 2 . I have not worked this out for all versions, and it is up to you to decide what e2 confidence level you will use. A solution for Version 1 with .01 is probably the largest result that you could get. n .5333 .6667 2.328 2 .0012 1203 .3 . 
The sample should be at least 1204.

e) (Extra credit) Assume that you are testing the hypothesis that (II) more than 60% demand over $602.50. Find the power of the test if you use a sample of 30 and the true proportion is 70%. (3)
Find $P(\bar p \ge .7842 \mid p = .7) = P\left(z \ge \frac{.7842 - .7}{\sqrt{\frac{.7(.3)}{30}}}\right) = P(z \ge 1.01) = .5 - .3438 = .1562$. If your answer was close to this and I didn’t give you credit, complain.

Problem 2: Assume that the underlying data for problem 1 is not Normal and, using the data for problem 1, test the following three hypotheses: I) that the median demand is $602.50, II) that median demand is more than $602.50 and III) that the median demand is less than $602.50, using a 98% confidence level.
a) State your null and alternative hypotheses and the hypotheses that you will actually test for each of the 3 tests (3)
b) Test each one using a test ratio or a critical value (3)
c) Find a p-value for the 2-sided test and explain whether and why it would lead to a rejection of the null hypothesis at the 95% confidence level (1)
d) (Extra credit) Show explicitly what the conclusion in c) would be if the sample of 30 came from a population of 60. (1)
e) (Extra credit) Find a two-sided confidence interval for the median (2) [17]

Let $p$ be the proportion that demand more than $602.50. The data has been exhibited in Problem 1. We have calculated the following.

Version            V1     V2     V3     V4     V5     V6     V7     V8     V9     V10
x                  16     17     17     17     18     18     18     18     18     20
$\bar p = x/n$     .5333  .5667  .5667  .5667  .6000  .6000  .6000  .6000  .6000  .6667

The relevant formulas are in Table 3. $n = 30$, $\alpha = .02$; $z_{\alpha} = z_{.02} = 2.054$ was found in Grass1, and the t table says $z_{\alpha/2} = z_{.01} = 2.327$. $\sigma_p = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.5(.5)}{30}} = \sqrt{.00833} = .091287$.
If we check the table in the outline {252ones}, we have the correspondences below. We will use the hypotheses about a proportion on the left.
Here $\eta$ denotes the population median and $\eta_0$ its hypothesized value.

Hypotheses about a median            Hypotheses about a proportion
                                     If p is the proportion above $\eta_0$    If p is the proportion below $\eta_0$
$H_0: \eta = \eta_0$                 $H_0: p = .5$                            $H_0: p = .5$
$H_1: \eta \ne \eta_0$               $H_1: p \ne .5$                          $H_1: p \ne .5$
$H_0: \eta \le \eta_0$               $H_0: p \le .5$                          $H_0: p \ge .5$
$H_1: \eta > \eta_0$                 $H_1: p > .5$                            $H_1: p < .5$
$H_0: \eta \ge \eta_0$               $H_0: p \ge .5$                          $H_0: p \le .5$
$H_1: \eta < \eta_0$                 $H_1: p < .5$                            $H_1: p > .5$

a) State your null and alternative hypotheses and the hypotheses that you will actually test for each of the 3 tests (3)
b) Test each one using a test ratio or a critical value (3)

I) The median demand is $602.50. $H_0: \eta = 602.50$, $H_1: \eta \ne 602.50$, which we test as $H_0: p = .5$, $H_1: p \ne .5$.
Critical Value: $\bar p_{cv} = p_0 \pm z_{\alpha/2} \sigma_p = .5 \pm 2.327(.091287) = .5 \pm .2124$. Make a diagram. Draw a Normal curve centered at .5 with rejection zones below .2876 and above .7124. None of the values of $\bar p$ fall into the rejection region.
Test Ratio: $z = \frac{\bar p - p_0}{\sigma_p} = \frac{\bar p - .5}{.091287}$. We reject the null hypothesis unless $z$ falls between $-z_{\alpha/2} = -z_{.01} = -2.327$ and $z_{\alpha/2} = z_{.01} = 2.327$.
V1: $z = \frac{.5333 - .5}{.091287} = 0.3648$, V2-4: $z = \frac{.5667 - .5}{.091287} = 0.7307$, V5-9: $z = \frac{.6000 - .5}{.091287} = 1.0954$, V10: $z = \frac{.6667 - .5}{.091287} = 1.8261$. None of these falls in the ‘reject’ region.

II) The median demand is more than $602.50. $H_0: \eta \le 602.50$, $H_1: \eta > 602.50$, which we test as $H_0: p \le .5$, $H_1: p > .5$.
Critical Value: This is a right-sided test, so our critical value is $\bar p_{cv} = p_0 + z_{\alpha} \sigma_p = .5 + 2.054(.091287) = .5 + .1875 = .6875$. Make a diagram. Draw a Normal curve centered at .5 with a rejection zone above .6875. None of the values of $\bar p$ fall into the rejection region.
Test Ratio: $z = \frac{\bar p - p_0}{\sigma_p} = \frac{\bar p - .5}{.091287}$. We reject the null hypothesis if $z$ falls above $z_{.02} = 2.054$. V1: $z = 0.3648$, V2-4: $z = 0.7307$, V5-9: $z = 1.0954$, V10: $z = 1.8261$. None of these falls in the ‘reject’ region.

III) The median demand is less than $602.50. $H_0: \eta \ge 602.50$, $H_1: \eta < 602.50$, which we test as $H_0: p \ge .5$, $H_1: p < .5$.
Critical Value: This is a left-sided test, so our critical value is $\bar p_{cv} = p_0 - z_{\alpha} \sigma_p = .5 - 2.054(.091287) = .5 - .1875 = .3125$. Make a diagram. Draw a Normal curve centered at .5 with a rejection zone below .3125.
None of the values of $\bar p$ fall into the rejection region.
Test Ratio: $z = \frac{\bar p - p_0}{\sigma_p} = \frac{\bar p - .5}{.091287}$. We reject the null hypothesis if $z$ falls below $-z_{.02} = -2.054$. V1: $z = 0.3648$, V2-4: $z = 0.7307$, V5-9: $z = 1.0954$, V10: $z = 1.8261$. None of these falls in the ‘reject’ region.

Alternate formulas for this section include those below.
i. Test Ratio: With continuity correction, $z = \frac{\bar p \pm \frac{1}{2n} - p_0}{\sigma_p}$, where $\sigma_p = \sqrt{\frac{p_0 q_0}{n}}$, or $z = \frac{2x \pm 1 - n}{\sqrt{n}}$. This is the same as testing $z$ against $z_{\alpha/2}$. Without continuity correction, $z = \frac{\bar p - .5}{\sigma_p} = \frac{2x - n}{\sqrt{n}}$. To allow for a finite population, divide these by $\sqrt{\frac{N - n}{N - 1}}$.
ii. Critical Value: $\bar p_{cv} = p_0 \pm \frac{1}{2n} \pm z_{\alpha/2} \sigma_p$. To make a finite population correction, multiply $\sigma_p$ by $\sqrt{\frac{N - n}{N - 1}}$.
iii. Confidence Interval: $p = \bar p \pm \frac{1}{2n} \pm z_{\alpha/2} s_p$. To make a finite population correction, multiply $s_p$ by $\sqrt{\frac{N - n}{N - 1}}$.

c) Find a p-value for the 2-sided test and explain whether and why it would lead to a rejection of the null hypothesis at the 95% confidence level (1). See problem 1 for an explanation of p-value.

I) The median demand is $602.50. $H_0: \eta = 602.50$, $H_1: \eta \ne 602.50$, tested as $H_0: p = .5$, $H_1: p \ne .5$.
V1: $z = 0.3648$, p-value $= 2P(z \ge 0.3648) = 2(.5 - .1406) = .7188$
V2-4: $z = 0.7307$, p-value $= 2P(z \ge 0.7307) = 2(.5 - .2673) = .4654$
V5-9: $z = 1.0954$, p-value $= 2P(z \ge 1.0954) = 2(.5 - .3621) = .2758$
V10: $z = 1.8261$, p-value $= 2P(z \ge 1.8261) = 2(.5 - .4664) = .0672$
Since none of these are below the significance level of 5%, none of these lead to a rejection of the null hypothesis at a 95% confidence level.

II) The median demand is more than $602.50. $H_0: \eta \le 602.50$, $H_1: \eta > 602.50$, tested as $H_0: p \le .5$, $H_1: p > .5$.
V1: $z = 0.3648$, p-value $= P(z \ge 0.3648) = .5 - .1406 = .3594$
V2-4: $z = 0.7307$, p-value $= P(z \ge 0.7307) = .5 - .2673 = .2327$
V5-9: $z = 1.0954$, p-value $= P(z \ge 1.0954) = .5 - .3621 = .1379$
V10: $z = 1.8261$, p-value $= P(z \ge 1.8261) = .5 - .4664 = .0336$
Since only the p-value for Version 10 is below the significance level of 5%, only in Version 10 do we reject the null hypothesis at a 95% confidence level.
III) The median demand is less than $602.50. $H_0: \eta \ge 602.50$, $H_1: \eta < 602.50$, tested as $H_0: p \ge .5$, $H_1: p < .5$.
V1: $z = 0.3648$, p-value $= P(z \le 0.3648) = .5 + .1406 = .6406$
V2-4: $z = 0.7307$, p-value $= P(z \le 0.7307) = .5 + .2673 = .7673$
V5-9: $z = 1.0954$, p-value $= P(z \le 1.0954) = .5 + .3621 = .8621$
V10: $z = 1.8261$, p-value $= P(z \le 1.8261) = .5 + .4664 = .9664$
Since none of these are below the significance level of 5%, none of these lead to a rejection of the null hypothesis at a 95% confidence level.

d) (Extra credit) Show explicitly what the conclusion in c) would be if the sample of 30 came from a population of 60. (1)
$\sigma_p = \sqrt{\frac{p_0 q_0}{n}} \sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{.5(.5)}{30} \cdot \frac{60 - 30}{60 - 1}} = \sqrt{0.50847(.00833)} = \sqrt{.0042373} = .065094$
The test ratio is $z = \frac{\bar p - p_0}{\sigma_p} = \frac{\bar p - .5}{.065094}$ and is now larger in absolute value than it was in c). We can put the p-values for the one-sided hypotheses under the hypotheses in the table below.

Version   z-score                                       $H_0: \eta \ge 602.50$, $H_1: \eta < 602.50$      $H_0: \eta \le 602.50$, $H_1: \eta > 602.50$
                                                        ($H_0: p \ge .5$, $H_1: p < .5$)                  ($H_0: p \le .5$, $H_1: p > .5$)
V1        $z = \frac{.5333 - .5}{.065094} = 0.5116$     $P(z \le 0.5116) = .5 + .1950 = .6950$            $P(z \ge 0.5116) = .5 - .1950 = .3050$
V2-4      $z = \frac{.5667 - .5}{.065094} = 1.0247$     $P(z \le 1.0247) = .5 + .3461 = .8461$            $P(z \ge 1.0247) = .5 - .3461 = .1539$
V5-9      $z = \frac{.6000 - .5}{.065094} = 1.5362$     $P(z \le 1.5362) = .5 + .4382 = .9382$            $P(z \ge 1.5362) = .5 - .4382 = .0618$
V10       $z = \frac{.6667 - .5}{.065094} = 2.5609$     $P(z \le 2.5609) = .5 + .4948 = .9948$            $P(z \ge 2.5609) = .5 - .4948 = .0052$

If we look at these, mentally double the smaller of the two probabilities to get the two-sided p-value and compare the p-values with $\alpha = .05$, we see that, though some of these p-values have fallen, the only change to our results is that for Version 10 we will now reject the null hypothesis for the two-sided test ($H_0: \eta = 602.50$, $H_1: \eta \ne 602.50$, tested as $H_0: p = .5$, $H_1: p \ne .5$) as well as for the right-sided test.

e) (Extra credit) Find a two-sided confidence interval for the median (2). At this point I’m unsure if there is any significance level implied. The easiest way for me to do this is to copy out the first part of the binomial table for $n = 30$. {bin} Recall that we take the $k$th number from both the top and the bottom as our interval.
We get a significance level of $2P(x \le k - 1)$ when $p = .5$. For example, if $n = 50$, $2P(x \le 19 - 1) = 2(.03245) = .06490$, $2P(x \le 18 - 1) = 2(.01642) = .03284$, $2P(x \le 17 - 1) = 2(.00767) = .01534$ and $2P(x \le 16 - 1) = 2(.00330) = .00660$. If the confidence level is to be at least $1 - \alpha$, the significance level must be at most $\alpha$. So, if we want a 95% confidence interval we need $k = 18$ (and $50 - k + 1 = 33$), for a 98% confidence level we need $k = 17$ (and 34) and for a 99% confidence level we need $k = 16$ (and 35).

Unfortunately, we do not have a binomial table for $n = 30$, so we must try $k = \frac{n + 1 - z_{\alpha/2}\sqrt{n}}{2}$. For 95% this would be $k = \frac{30 + 1 - 1.960\sqrt{30}}{2} = 10.13$, for 98% this would be $k = \frac{30 + 1 - 2.327\sqrt{30}}{2} = 9.13$ and for 99% this would be $k = \frac{30 + 1 - 2.576\sqrt{30}}{2} = 8.44$. These are all rounded down and are paired with the number with index $30 - k + 1$, which takes the values 21, 22 and 23. The confidence intervals are given in the following table.

Confidence intervals for the median
Version   95% (index 10 to 21)   98% (index 9 to 22)   99% (index 8 to 23)
V1        552 to 626             544 to 627            527 to 630
V2        557 to 631             549 to 632            532 to 635
V3        562 to 636             554 to 637            537 to 640
V4        567 to 641             559 to 642            542 to 645
V5        572 to 646             564 to 647            547 to 650
V6        577 to 651             569 to 652            552 to 655
V7        582 to 656             574 to 657            557 to 660
V8        587 to 661             579 to 662            562 to 665
V9        592 to 666             584 to 667            567 to 670
V10       597 to 671             589 to 672            572 to 675

Problem 3:
a) Find the sample mean and sample standard deviation for the data in Problem 1 (1)
b) Test the hypothesis that the mean is 602.50 using critical values for the sample mean, first stating your hypotheses clearly. Use a 98% confidence level (2)
c) Test the hypothesis in b) using a test ratio. Find an approximate p-value and state and explain whether this will lead to a rejection of the null hypothesis if we continue to use a 98% confidence level.
(2) d) Using the test ratio you found in c) find a p-value for the null hypothesis that the mean is at most 602.50 (1) e) Using the test ratio you found in c) find a p-value for the null hypothesis that the mean is at least 602.50 (1) f) Test the null hypothesis that the mean is at most 602.50 using an appropriate confidence interval (1) g) Test the null hypothesis that the mean is at least 602.50 using an appropriate confidence interval (1) [26] The data in use is as below. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 V1 484 653 527 600 552 662 583 544 558 525 504 611 617 621 493 562 626 633 516 639 630 617 514 638 621 603 632 627 517 636 V2 489 658 532 605 557 667 588 549 563 530 509 616 622 626 498 567 631 638 521 644 635 622 519 643 626 608 637 632 522 641 V3 494 663 537 610 562 672 593 554 568 535 514 621 627 631 503 572 636 643 526 649 640 627 524 648 631 613 642 637 527 646 V4 499 668 542 615 567 677 598 559 573 540 519 626 632 636 508 577 641 648 531 654 645 632 529 653 636 618 647 642 532 651 V5 504 673 547 620 572 682 603 564 578 545 524 631 637 641 513 582 646 653 536 659 650 637 534 658 641 623 652 647 537 656 V6 509 678 552 625 577 687 608 569 583 550 529 636 642 646 518 587 651 658 541 664 655 642 539 663 646 628 657 652 542 661 V7 514 683 557 630 582 692 613 574 588 555 534 641 647 651 523 592 656 663 546 669 660 647 544 668 651 633 662 657 547 666 V8 519 688 562 635 587 697 618 579 593 560 539 646 652 656 528 597 661 668 551 674 665 652 549 673 656 638 667 662 552 671 V9 524 693 567 640 592 702 623 584 598 565 544 651 657 661 533 602 666 673 556 679 670 657 554 678 661 643 672 667 557 676 V10 529 698 572 645 597 707 628 589 603 570 549 656 662 666 538 607 671 678 561 684 675 662 559 683 666 648 677 672 562 681 Minitab offers the summary statistics below. 
Version 1 2 3 4 5 6 7 8 9 10 n 30 30 30 30 30 30 30 30 30 30 x 584.83 589.83 594.83 599.83 604.83 609.83 614.83 619.83 624.83 629.83 sx 9.91 9.91 9.91 9.91 9.91 9.91 9.91 9.91 9.91 9.91 sx Q1 Median Q3 54.26 54.26 54.26 54.26 54.26 54.26 54.26 54.26 54.26 54.26 526.50 531.50 536.50 541.50 546.50 551.50 556.50 561.50 566.50 571.50 607.00 612.00 617.00 622.00 627.00 632.00 637.00 642.00 647.00 652.00 630.50 635.50 640.50 645.50 650.50 655.50 660.50 665.50 670.50 675.50 x x 17545 17695 17845 17995 18145 18295 18445 18595 18745 18895 2 10346275 10522475 10700175 10879375 11060075 11242275 11425975 11611175 11797875 11986075 20 252y0753 10/19/07 (Open in ‘Print Layout’ format) a) Find the sample mean and sample standard deviation for the data in Problem 1 (1) Solution: There isn’t a good reason to repeat the calculations here for more than one example. So I will stick to Version 1 Index x 1 484 234256 2 653 426409 3 527 277729 4 600 360000 5 552 304704 6 662 438244 7 583 339889 8 544 295936 9 558 311364 10 525 275625 11 504 254016 12 611 373321 13 617 380689 14 621 385641 15 493 243049 16 562 315844 17 626 391876 18 633 400689 19 516 266256 20 639 408321 21 630 396900 22 617 380689 23 514 264196 24 638 407044 25 621 385641 26 603 363609 27 632 399424 28 627 393129 29 517 267289 30 636 404496 Sum 17545 10346275 For these numbers x x 17545, x 2 x x 2 xx x2 -100.833 68.167 -57.833 15.167 -32.833 77.167 -1.833 -40.833 -26.833 -59.833 -80.833 26.167 32.167 36.167 -91.833 -22.833 41.167 48.167 -68.833 54.167 45.167 32.167 -70.833 53.167 36.167 18.167 47.167 42.167 -67.833 51.167 0.000 10167.4 4646.7 3344.7 230.0 1078.0 5954.7 3.4 1667.4 720.0 3580.0 6534.0 684.7 1034.7 1308.0 8433.4 521.4 1694.7 2320.0 4738.0 2934.0 2040.0 1034.7 5017.4 2826.7 1308.0 330.0 2224.7 1778.0 4601.4 2618.0 85374.2 10346275 and n 30 . This means that x 17545 584 .3333 . If we subtract this mean from all 30 numbers in the first column, we get n 30 the 3 column which has the sum rd x x 2 x x 0 . 
If we square the 3rd column, we get $\sum (x - \bar x)^2 = 85374.2$. Using the computational or definitional formula, we get the following.
$s_x^2 = \frac{\sum x^2 - n\bar x^2}{n - 1} = \frac{10346275 - 30(584.8333)^2}{29} = \frac{\sum (x - \bar x)^2}{n - 1} = \frac{85374.17}{29} = 2943.9367$
So $s_x = \sqrt{2943.9367} = 54.25806$ and $s_{\bar x} = \frac{s_x}{\sqrt{n}} = \sqrt{\frac{2943.9367}{30}} = \sqrt{98.1312} = 9.9061$.
You could have gotten this using the shortcut at the beginning of the Takehome document as follows.
$\sum (x + a) = \sum x + na$: $17395 + 30(5) = 17545$.
$\sum (x + a)^2 = \sum x^2 + 2a\sum x + na^2$: $10171575 + 2(5)(17395) + 30(5^2) = 10346275$.

b) Test the hypothesis that the mean is 602.50 using critical values for the sample mean, first stating your hypotheses clearly. Use a 98% confidence level (2)
Solution: As usual, we go back to Table 3.

Interval for: Mean ($\sigma$ unknown)
Confidence Interval: $\mu = \bar x \pm t_{\alpha/2} s_{\bar x}$, with $DF = n - 1$
Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
Test Ratio: $t = \frac{\bar x - \mu_0}{s_{\bar x}}$
Critical Value: $\bar x_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar x}$, where $s_{\bar x} = \frac{s}{\sqrt{n}}$

Note that $\alpha = .02$ and $n = 30$, so that $t^{(n-1)}_{\alpha/2} = t^{(29)}_{.01} = 2.462$. Recall that $s_{\bar x} = 9.9061$ and that the hypotheses are $H_0: \mu = 602.50$, $H_1: \mu \ne 602.50$.
$\bar x_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar x} = 602.50 \pm 2.462(9.9061) = 602.50 \pm 24.39$, or 578.11 to 626.89.
Make a diagram. Show an approximately Normal curve with a mean at 602.50 and shaded ‘reject’ zones above 626.89 and below 578.11. None of the means below will fall into a ‘reject’ zone except the sample mean for Version 10.

Version     1       2       3       4       5       6       7       8       9       10
$\bar x$    584.83  589.83  594.83  599.83  604.83  609.83  614.83  619.83  624.83  629.83

c) Test the hypothesis in b) using a test ratio. Find an approximate p-value and state and explain whether this will lead to a rejection of the null hypothesis if we continue to use a 98% confidence level. (2)
$t = \frac{\bar x - \mu_0}{s_{\bar x}} = \frac{\bar x - 602.50}{9.9061}$. If you wish, make an approximately Normal curve with a mean at zero and ‘reject’ zones above $t^{(29)}_{.01} = 2.462$ and below $-2.462$ and compare your value of $t$ with the ratios computed below. In order to find the p-value, we look at the t table to find the following for 29 degrees of freedom.
df    .45    .40    .35    .30    .25    .20    .15    .10    .05    .025   .01    .005   .001
29    0.127  0.256  0.389  0.530  0.683  0.854  1.055  1.311  1.699  2.045  2.462  2.756  3.396

Values of $t$ appear in the table below. If we compare $t = -1.78375$, the ratio for Version 1, with the table values we find $t^{(29)}_{.05} = 1.699 < 1.78375 < t^{(29)}_{.025} = 2.045$. Since $t^{(29)}_{.05} = 1.699$ means $P(t > 1.699) = .05$, we can conclude that $.025 < P(t > 1.78375) < .05$ or, by symmetry, $.025 < P(t < -1.78375) < .05$. For a two-sided test the p-value, the probability of getting something as extreme as or more extreme than $\bar x = 584.83$, is twice the probability $P(t < -1.78375)$, so we can say that $.05 <$ p-value $< .10$. The rest are shown in the table below. All table values of $t$ are for 29 degrees of freedom.

Version   $\bar x$   t comp      Location of |t comp|                    Approximate p-value
1         584.83     -1.78375    t.05 = 1.699 < |t| < t.025 = 2.045      .05 < p-value < .10
2         589.83     -1.27901    t.15 = 1.055 < |t| < t.10 = 1.311       .20 < p-value < .30
3         594.83     -0.77427    t.25 = 0.683 < |t| < t.20 = 0.854       .40 < p-value < .50
4         599.83     -0.26953    t.40 = 0.256 < |t| < t.35 = 0.389       .70 < p-value < .80
5         604.83      0.23521    t.45 = 0.127 < |t| < t.40 = 0.256       .80 < p-value < .90
6         609.83      0.73995    t.25 = 0.683 < |t| < t.20 = 0.854       .40 < p-value < .50
7         614.83      1.24469    t.15 = 1.055 < |t| < t.10 = 1.311       .20 < p-value < .30
8         619.83      1.74943    t.05 = 1.699 < |t| < t.025 = 2.045      .05 < p-value < .10
9         624.83      2.25417    t.025 = 2.045 < |t| < t.01 = 2.462      .02 < p-value < .05
10        629.83      2.75891    t.005 = 2.756 < |t| < t.001 = 3.396     .002 < p-value < .01

If $\alpha = .02$, only Version 10 would give a rejection of the null hypothesis.

d) Using the test ratio you found in c) find a p-value for the null hypothesis that the mean is at most 602.50 (1)
We are now testing $H_0: \mu \le 602.50$, $H_1: \mu > 602.50$. This is a right-sided test.
Values of $t$ are repeated in the table below. If we compare $t = -1.78375$, the ratio for Version 1, with the table values we find $t^{(29)}_{.05} = 1.699 < 1.78375 < t^{(29)}_{.025} = 2.045$.
Since $t^{(29)}_{.05} = 1.699$ means $P(t > 1.699) = .05$, we can conclude that $.025 < P(t > 1.78375) < .05$ or, by symmetry, $.025 < P(t < -1.78375) < .05$. For a right-sided test the p-value, the probability of getting something as high as or higher than $\bar x = 584.83$, is the probability $P(t > -1.78375)$, so we can say that $.95 <$ p-value $< .975$. The rest are shown in the table below. Note that if the significance level is .02, we will definitely reject the null hypothesis in Version 10 and probably in Version 9.

Version   $\bar x$   t comp      Location of |t comp|                    Approximate p-value
1         584.83     -1.78375    t.05 = 1.699 < |t| < t.025 = 2.045      .95 < p-value < .975
2         589.83     -1.27901    t.15 = 1.055 < |t| < t.10 = 1.311       .85 < p-value < .90
3         594.83     -0.77427    t.25 = 0.683 < |t| < t.20 = 0.854       .75 < p-value < .80
4         599.83     -0.26953    t.40 = 0.256 < |t| < t.35 = 0.389       .60 < p-value < .65
5         604.83      0.23521    t.45 = 0.127 < |t| < t.40 = 0.256       .40 < p-value < .45
6         609.83      0.73995    t.25 = 0.683 < |t| < t.20 = 0.854       .20 < p-value < .25
7         614.83      1.24469    t.15 = 1.055 < |t| < t.10 = 1.311       .10 < p-value < .15
8         619.83      1.74943    t.05 = 1.699 < |t| < t.025 = 2.045      .025 < p-value < .05
9         624.83      2.25417    t.025 = 2.045 < |t| < t.01 = 2.462      .01 < p-value < .025
10        629.83      2.75891    t.005 = 2.756 < |t| < t.001 = 3.396     .001 < p-value < .005

e) Using the test ratio you found in c) find a p-value for the null hypothesis that the mean is at least 602.50 (1)
We are now testing $H_0: \mu \ge 602.50$, $H_1: \mu < 602.50$. This is a left-sided test.
Values of $t$ are repeated in the table below. If we compare $t = -1.78375$, the ratio for Version 1, with the table values we find $t^{(29)}_{.05} = 1.699 < 1.78375 < t^{(29)}_{.025} = 2.045$. Since $t^{(29)}_{.05} = 1.699$ means $P(t > 1.699) = .05$, we can conclude that $.025 < P(t > 1.78375) < .05$ or, by symmetry, $.025 < P(t < -1.78375) < .05$. For a left-sided test the p-value, the probability of getting something as low as or lower than $\bar x = 584.83$, is the probability $P(t < -1.78375)$, so we can say that $.025 <$ p-value $< .05$. The rest are shown in the table below.
Note that your p-values for d) and e) should add to 1.

Version   $\bar x$   t comp      Location of |t comp|                    Approximate p-value
1         584.83     -1.78375    t.05 = 1.699 < |t| < t.025 = 2.045      .025 < p-value < .05
2         589.83     -1.27901    t.15 = 1.055 < |t| < t.10 = 1.311       .10 < p-value < .15
3         594.83     -0.77427    t.25 = 0.683 < |t| < t.20 = 0.854       .20 < p-value < .25
4         599.83     -0.26953    t.40 = 0.256 < |t| < t.35 = 0.389       .35 < p-value < .40
5         604.83      0.23521    t.45 = 0.127 < |t| < t.40 = 0.256       .55 < p-value < .60
6         609.83      0.73995    t.25 = 0.683 < |t| < t.20 = 0.854       .75 < p-value < .80
7         614.83      1.24469    t.15 = 1.055 < |t| < t.10 = 1.311       .85 < p-value < .90
8         619.83      1.74943    t.05 = 1.699 < |t| < t.025 = 2.045      .95 < p-value < .975
9         624.83      2.25417    t.025 = 2.045 < |t| < t.01 = 2.462      .975 < p-value < .99
10        629.83      2.75891    t.005 = 2.756 < |t| < t.001 = 3.396     .995 < p-value < .999

Note that if the significance level is .02, we will never reject the null hypothesis.

f) Test the null hypothesis that the mean is at most 602.50 using an appropriate confidence interval (1)
I’m surprised that no one called me on this. To do this correctly you need $t^{(29)}_{.02} = 2.150$, which is not on any of your tables. Here you get points for thinking, so I’ll see what you did. The hypotheses are $H_0: \mu \le 602.50$, $H_1: \mu > 602.50$. Recall $\alpha = .02$, $n = 30$, $s_{\bar x} = 9.9061$ and the two-sided formula is $\mu = \bar x \pm t_{\alpha/2} s_{\bar x}$, which becomes $\mu \ge \bar x - t_{\alpha} s_{\bar x} = \bar x - 2.150(9.9061) = \bar x - 21.30$. For the results see the table after g).

g) Test the null hypothesis that the mean is at least 602.50 using an appropriate confidence interval (1)
The hypotheses are $H_0: \mu \ge 602.50$, $H_1: \mu < 602.50$. Recall $\alpha = .02$, $t^{(29)}_{.02} = 2.150$, $n = 30$, $s_{\bar x} = 9.9061$ and the two-sided formula is $\mu = \bar x \pm t_{\alpha/2} s_{\bar x}$, which becomes $\mu \le \bar x + t_{\alpha} s_{\bar x} = \bar x + 2.150(9.9061) = \bar x + 21.30$. The intervals in both f) and g) contain the sample mean.
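The one-sided intervals in f) and g) can be checked with a few lines of Python. This is a sketch only: the value $t^{(29)}_{.02} = 2.150$ is taken from the text rather than computed, and the variable names are illustrative.

```python
T_ALPHA = 2.150        # t(29) at alpha = .02, as quoted in the text
STD_ERROR = 9.9061     # standard error of the mean from Problem 3
MU_0 = 602.50          # hypothesized mean

# Sample means for Versions 1 through 10 (each version adds 5 to the previous one)
sample_means = [round(584.83 + 5 * k, 2) for k in range(10)]

results = []
for version, xbar in enumerate(sample_means, start=1):
    lower = xbar - T_ALPHA * STD_ERROR   # f) interval: mu >= lower
    upper = xbar + T_ALPHA * STD_ERROR   # g) interval: mu <= upper
    reject_at_most = lower > MU_0        # interval contradicts H0: mu <= 602.50
    reject_at_least = upper < MU_0       # interval contradicts H0: mu >= 602.50
    results.append((version, lower, upper, reject_at_most, reject_at_least))
    print(f"V{version}: mu >= {lower:.2f} (reject: {reject_at_most}), "
          f"mu <= {upper:.2f} (reject: {reject_at_least})")
```

Only the lower bounds for Versions 9 and 10 exceed 602.50, so only those two intervals contradict a null hypothesis.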
Version   $\bar x$   f) $H_0: \mu \le 602.50$               g) $H_0: \mu \ge 602.50$
                     $\mu \ge \bar x - t_{\alpha} s_{\bar x}$   $\mu \le \bar x + t_{\alpha} s_{\bar x}$
1         584.83     563.53                                 606.13
2         589.83     568.53                                 611.13
3         594.83     573.53                                 616.13
4         599.83     578.53                                 621.13
5         604.83     583.53                                 626.13
6         609.83     588.53                                 631.13
7         614.83     593.53                                 636.13
8         619.83     598.53                                 641.13
9         624.83     603.53 *                               646.13
10        629.83     608.53 *                               651.13

Note that the two starred confidence intervals contradict the null hypothesis and thus imply rejection.

Problem 4: Assume that the population standard deviation is known to be 30 but that we are still working with a problem like Problem 3. (98% confidence level, sample of 30.) Do either Problem 4.1 or Problem 4.2. Make sure that I know which one!
Let’s start with Table 3. $\alpha = .02$, $n = 30$, $\sigma = 30$ and $\sigma_{\bar x} = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{30}} = 5.4772$.

Interval for: Mean ($\sigma$ known)
Confidence Interval: $\mu = \bar x \pm z_{\alpha/2} \sigma_{\bar x}$
Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
Test Ratio: $z = \frac{\bar x - \mu_0}{\sigma_{\bar x}}$
Critical Value: $\bar x_{cv} = \mu_0 \pm z_{\alpha/2} \sigma_{\bar x}$, where $\sigma_{\bar x} = \frac{\sigma}{\sqrt{n}}$

Problem 4.1.
a) Find a critical value for the sample mean if we are testing whether the population mean is below 30. Clearly state your null and alternative hypotheses (2)
b) Assume that the sample mean is 30 minus the second to last digit of your student number. (Use 10 if this digit is zero.) Find a p-value for your null hypothesis. (1)
c) Create a power curve for the test (6)

Solution:
a) We find a critical value for the sample mean if we are testing whether the population mean is below 30. We state our null and alternative hypotheses: $H_0: \mu \ge 30$, $H_1: \mu < 30$. $z_{.02} = 2.054$. We need a critical value for the mean that is below 30. We use $\bar x_{cv} = \mu_0 - z_{\alpha} \sigma_{\bar x} = 30 - 2.054(5.4772) = 30 - 11.250 = 18.750$.
b) We assume that the sample mean is 30 minus the second to last digit of our student number. (Use 10 if this digit is zero.) We find a p-value for our null hypothesis. $z_{calc} = \frac{\bar x - \mu_0}{\sigma_{\bar x}} = \frac{\bar x - 30}{5.4772}$ will be our test ratio and we will calculate p-value $= P(z \le z_{calc})$. For example, if the mean is 29, we compute $z_{calc} = \frac{29 - 30}{5.4772} = -0.18$. Using the Normal table we find p-value $= P(z \le -0.18) = .5 - .0714 = .4286$.
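The critical value in a) and the left-sided p-values in b) can also be computed directly. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the function and variable names are illustrative, and because it uses unrounded Normal quantiles its results differ from table-based hand calculations in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 30.0      # hypothesized mean
SIGMA = 30.0     # known population standard deviation
N = 30           # sample size
ALPHA = 0.02     # significance level

std_error = SIGMA / sqrt(N)                      # about 5.4772
z_alpha = NormalDist().inv_cdf(1 - ALPHA)        # about 2.054
critical_mean = MU_0 - z_alpha * std_error       # reject H0 if xbar falls below this

def left_sided_p_value(sample_mean):
    """P(z <= z_calc) for H0: mu >= 30 against H1: mu < 30."""
    z_calc = (sample_mean - MU_0) / std_error
    return NormalDist().cdf(z_calc)

# Sample means 29 down to 20 correspond to Versions 1 through 10
p_values = {xbar: left_sided_p_value(xbar) for xbar in range(20, 30)}

print(f"critical value for the sample mean: {critical_mean:.3f}")
for xbar in sorted(p_values, reverse=True):
    print(f"xbar = {xbar}: p-value = {p_values[xbar]:.6f}")
```

None of these p-values is below .02, which matches the fact that every sample mean from 20 to 29 lies above the critical value.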
The values below were computer generated. Yours should be close.

Version   $\bar x$   $z_{calc}$   $P(z \le z_{calc})$
1         29         -0.18258     0.427566
2         28         -0.36515     0.357500
3         27         -0.54773     0.291940
4         26         -0.73030     0.232603
5         25         -0.91288     0.180654
6         24         -1.09545     0.136660
7         23         -1.27803     0.100620
8         22         -1.46060     0.072063
9         21         -1.64318     0.050173
10        20         -1.82575     0.033944

c) We create a power curve for the test. We do not reject the null hypothesis if our sample mean is above $\bar x_{cv} = 18.750$. Remember that our hypotheses are $H_0: \mu \ge 30$, $H_1: \mu < 30$ and that we need a power curve for all possible values of $\mu_1$ that are below 30. The distance between 30 and the critical value is 30 – 18.750 = 11.25; half of that is 5.62, which we can round to 6. Let’s try using 30, 24, 18.75, 12 and 6 as $\mu_1$. We will compute $\beta = P(\bar x \ge 18.750 \mid \mu = \mu_1) = P\left(z \ge \frac{18.750 - \mu_1}{5.4772}\right)$. Make a diagram. Show a Normal curve with a mean of 30 and shade a ‘reject’ zone below 18.750. On the same diagram make a second Normal curve of the same size as the first one with a mean at a value of $\mu_1$ and shade a ‘do not reject’ zone that includes the entire area under the second curve above 18.750. For $\mu_1 = 24$ this becomes $P(\bar x \ge 18.750 \mid \mu_1 = 24) = P\left(z \ge \frac{18.750 - 24}{5.4772}\right) = P(z \ge -0.96) = .5 + .3315 = .8315$. If we let the computer do the dirty work, we get the following.

Point   $\mu_1$   $z_{calc} = \frac{18.750 - \mu_1}{5.4772}$   $\beta = P(z \ge z_{calc})$   power $= 1 - \beta$
1       30.00     -2.05397                                     0.980011                      0.019989
2       24.00     -0.95852                                     0.831099                      0.168901
3       18.75      0.00000                                     0.500000                      0.500000
4       12.00      1.23238                                     0.108903                      0.891097
5        6.00      2.32783                                     0.009961                      0.990039

As was explained in class, you do not need to do the calculations for points 1 and 3, since the power for $\mu_1 = \mu_0$ is always equal to the significance level and the power at $\mu_1 = \bar x_{cv}$ is always .5. Graph the power on your y-axis against $\mu_1$ on your x-axis.

Problem 4.2.
a) Find critical values for the sample mean if we are testing whether the population mean is 30.
Clearly state your null and alternative hypotheses (2)
b) Assume that the sample mean is 30 minus the second to last digit of your student number. (Use 10 if this digit is zero.) Find a p-value for your null hypothesis. (1)
c) Create a power curve for the test (8) [37]

Solution:
a) We find critical values for the sample mean if we are testing whether the population mean is 30. We state our null and alternative hypotheses: $H_0: \mu = 30$, $H_1: \mu \ne 30$. $\sigma_{\bar x} = 5.4772$, $\alpha = .02$ and $z_{.01} = 2.327$. We need critical values for the sample mean that are both above and below 30. These are $\bar x_{cv} = \mu_0 \pm z_{\alpha/2} \sigma_{\bar x} = 30 \pm 2.327(5.4772) = 30 \pm 12.745$, or 17.255 and 42.745. We reject the null hypothesis if the sample mean does not fall between these values.
b) We assume that the sample mean is 30 minus the second to last digit of our student number (we use 10 if this digit is zero) and find a p-value for our null hypothesis. $z_{calc} = \frac{\bar x - \mu_0}{\sigma_{\bar x}} = \frac{\bar x - 30}{5.4772}$ will be our test ratio and we will calculate p-value $= 2P(z \le z_{calc})$ (since all our values of $\bar x$ are to the left of the alleged mean). For example, if the mean is 29, we compute $z_{calc} = \frac{29 - 30}{5.4772} = -0.18$. Using the Normal table we find p-value $= 2P(z \le -0.18) = 2(.5 - .0714) = 2(.4286) = .8572$. If we let the computer do the work, we get the table below. Your results should be similar.

Version   $\bar x$   $z_{calc}$   $2P(z \le z_{calc})$
1         29         -0.18258     0.855131
2         28         -0.36515     0.714999
3         27         -0.54773     0.583881
4         26         -0.73030     0.465207
5         25         -0.91288     0.361308
6         24         -1.09545     0.273319
7         23         -1.27803     0.201241
8         22         -1.46060     0.144125
9         21         -1.64318     0.100347
10        20         -1.82575     0.067888

c) We create a power curve for the test. We do not reject the null hypothesis if the sample mean lies between 17.255 and 42.745. The alleged mean is 30 and the distance between 30 and the critical values is 12.745, half of which we can round to 6.5. We need the power for every value of the mean.
Let’s try using 30, 36.5, 42.745, 49.5 and 56 for $\mu_1$ on the top side of 30 and 30, 23.5, 17.255, 10.5 and 4 on the bottom side of 30. We will compute $\beta = P(17.255 \le \bar x \le 42.745 \mid \mu = \mu_1) = P\left(\frac{17.255 - \mu_1}{5.4772} \le z \le \frac{42.745 - \mu_1}{5.4772}\right)$. For example, if $\mu_1 = 36.5$, we find $P(17.255 \le \bar x \le 42.745 \mid \mu_1 = 36.5) = P\left(\frac{17.255 - 36.5}{5.4772} \le z \le \frac{42.745 - 36.5}{5.4772}\right) = P(-3.51 \le z \le 1.14) = .4998 + .3729 = .8727$. The table that follows is computer generated. Because of rounding error in the standard deviation only the first four significant figures of the operating characteristic and the power columns should be taken seriously, but your results should be very close to these.

Here $z_{calc1} = \frac{17.255 - \mu_1}{5.4772}$, $z_{calc2} = \frac{42.745 - \mu_1}{5.4772}$ and $\beta = P(z_{calc1} \le z \le z_{calc2})$.

Point   $\mu_1$   $z_{calc1}$   $z_{calc2}$   $\beta$     power $= 1 - \beta$
1        4.000     2.42003       7.07387      0.007760    0.992240
2       10.500     1.23329       5.88713      0.108733    0.891267
3       17.255     0.00000       4.65384      0.499998    0.500002
4       23.500    -1.14018       3.51366      0.872674    0.127326
5       30.000    -2.32692       2.32692      0.980030    0.019970
6       36.500    -3.51366       1.14018      0.872674    0.127326
7       42.745    -4.65384       0.00000      0.499998    0.500002
8       49.500    -5.88713      -1.23329      0.108733    0.891267
9       56.000    -7.07387      -2.42003      0.007760    0.992240

Of course, this is much less work than it looks like. Only points 1, 2 and 4 need to be computed. Note that points 3 and 7 are at critical values and give powers of .5 and that point 5 is the null hypothesis mean and gives a power equal to the significance level (2%). Also the power for the points 9 through 6 is identical to the power for points 1 through 4, so that only three computations are necessary to compute the operating characteristic curve.

Problem 5: In problem 4 we assumed that the population standard deviation is 30.
a) Do a 98% confidence interval for the mean using the mean that you found in Problem 3 and assuming that our sample of 30 came from a population of 300.
(2)
b) How large a sample would we need if we wanted to make the error term no more than 1 and the sample came from an infinite population? (2)
c) Using a 98% confidence level and a sample size of 30 create a confidence interval for the population standard deviation using your sample variance or standard deviation from Problem 3. (2)
d) Repeat c) assuming that you had a sample of 300. (2)
e) Can we say that the standard deviation is significantly different from 30 on the basis of c) and d)? (1)
f) Using the data and sample size from problem 3 can we say that the standard deviation is above 30? State your hypotheses and do an appropriate hypothesis test. (3) [49]

Solution:

Interval for: Mean ($\sigma$ unknown)
Confidence Interval: $\mu = \bar x \pm t_{\alpha/2} s_{\bar x}$, with $DF = n - 1$
Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
Test Ratio: $t = \frac{\bar x - \mu_0}{s_{\bar x}}$
Critical Value: $\bar x_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar x}$, where $s_{\bar x} = \frac{s}{\sqrt{n}}$

a) Do a 98% confidence interval for the mean using the mean that you found in Problem 3 and assuming that our sample of 30 came from a population of 300. (2)
In problem 3 we found $t^{(29)}_{.01} = 2.462$, $s_x^2 = 2943.9367$, $s_x = 54.25806$, $s_{\bar x} = 9.9061$ and used $\mu = \bar x \pm t_{\alpha/2} s_{\bar x}$. With the finite population correction, we have the following.
$s_{\bar x} = \frac{s_x}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = (9.9061)\sqrt{\frac{270}{299}} = 9.9061(.95027) = 9.4134$, so that $\mu = \bar x \pm 2.462(9.4134) = \bar x \pm 23.18$.
If we use the population variance at the beginning of this problem, $z_{.01} = 2.327$, $\sigma_x^2 = 900$, $\sigma_x = 30$ and $\mu = \bar x \pm z_{\alpha/2} \sigma_{\bar x}$. With the finite population correction we have the following.
$\sigma_{\bar x} = \frac{\sigma_x}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = (5.4772)\sqrt{\frac{270}{299}} = 5.4772(.95027) = 5.2048$, so that $\mu = \bar x \pm 2.327(5.2048) = \bar x \pm 12.11$.
Using the means for the various versions, we can get our intervals easily.

Version   $\bar x$   $\bar x \pm 23.18$     $\bar x \pm 12.11$
1         584.83     561.65 to 608.01       572.72 to 596.94
2         589.83     566.65 to 613.01       577.72 to 601.94
3         594.83     571.65 to 618.01       582.72 to 606.94
4         599.83     576.65 to 623.01       587.72 to 611.94
5         604.83     581.65 to 628.01       592.72 to 616.94
6         609.83     586.65 to 633.01       597.72 to 621.94
7         614.83     591.65 to 638.01       602.72 to 626.94
8         619.83     596.65 to 643.01       607.72 to 631.94
9         624.83     601.65 to 648.01       612.72 to 636.94
10        629.83     606.65 to 653.01       617.72 to 641.94

b) How large a sample would we need if we wanted to make the error term no more than 1 and the sample came from an infinite population? (2)
Solution: $n = \frac{z^2 \sigma^2}{e^2}$. Depending on what we believe, we can use $\sigma^2 = 900$ or $s_x^2 = 2943.9367$ in the $\sigma^2$ slot. If the confidence level is 98%, we will use $z_{.01} = 2.327$ and, since $e^2 = 1$, we can leave it out of the equation. We have either $n = 2.327^2(900) = 4873.44$ or $n = 2.327^2(2943.9367) = 15941.21$ and use 4874 or 15942.

Problems c-f concern the variance and standard deviation and use formulas from Table 3.

Interval for: Variance, Small Sample
Confidence Interval: $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
Hypotheses: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$
Test Ratio: $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$
Critical Value: $s^2_{cv} = \frac{\chi^2_{1-\alpha/2}\,\sigma_0^2}{n-1}$ and $\frac{\chi^2_{\alpha/2}\,\sigma_0^2}{n-1}$

Interval for: Variance, Large Sample
Confidence Interval: $\sigma = \frac{s\sqrt{2DF}}{\pm z_{\alpha/2} + \sqrt{2DF}}$
Hypotheses: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$
Test Ratio: $z = \sqrt{2\chi^2} - \sqrt{2DF - 1}$
Critical Value: $s_{cv} = \frac{\sigma_0\sqrt{2DF}}{\pm z_{\alpha/2} + \sqrt{2DF}}$

c) Using a 98% confidence level and a sample size of 30 create a confidence interval for the population standard deviation using your sample variance or standard deviation from Problem 3. (2)
Solution: Recall the following. $n = 30$, $s_x^2 = 2943.9367$ or $s_x = 54.25806$. For (30 – 1) degrees of freedom, $\chi^{2(29)}_{.99} = 14.2565$ and $\chi^{2(29)}_{.01} = 49.5881$. Take the formula $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$ and substitute $s_x^2 = 2943.9367$ to get $\frac{29(2943.9367)}{49.5881} \le \sigma^2 \le \frac{29(2943.9367)}{14.2565}$ or $1721.6663 \le \sigma^2 \le 5988.4379$. Or, if we take square roots, $41.49 \le \sigma \le 77.38$.

d) Repeat c) assuming that you had a sample of 300.
(2) For $\alpha = .02$, the appropriate value of $z$ is $z_{.01} = 2.327$ and the square root of twice the degrees of freedom is $\sqrt{2DF} = \sqrt{2(299)} = 24.45404$. Take the formula $\frac{s\sqrt{2DF}}{z_{\alpha/2} + \sqrt{2DF}} \le \sigma \le \frac{s\sqrt{2DF}}{-z_{\alpha/2} + \sqrt{2DF}}$ and substitute $s_x = 54.25806$ to get $\frac{54.25806(24.45404)}{2.327 + 24.45404} \le \sigma \le \frac{54.25806(24.45404)}{-2.327 + 24.45404}$ or $49.54 \le \sigma \le 59.96$.

e) Can we say that the standard deviation is significantly different from 30 on the basis of c) and d)? (1)
It is enough to check our results from the confidence intervals, though a more formal test of $H_0: \sigma = 30$ could be done using $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$ and/or $z = \sqrt{2\chi^2} - \sqrt{2DF - 1}$. Simply put, since 30 falls in neither interval, there is a significant difference between 30 and the standard deviation from our sample.

f) Using the data and sample size from problem 3 can we say that the standard deviation is above 30? State your hypotheses and do an appropriate hypothesis test. (3)
The pair of hypotheses $H_0: \sigma \le 30$, $H_1: \sigma > 30$ is equivalent to $H_0: \sigma^2 \le 900$, $H_1: \sigma^2 > 900$. If $n = 30$, so that there are 29 degrees of freedom, we can use the test ratio $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{29(2943.9367)}{900} = 94.8602$. If we maintain a 98% confidence level, our ‘reject’ zone will be the area above $\chi^{2(29)}_{.02}$. Our table does not give us this value, but we can say that $\chi^{2(29)}_{.01} = 49.5881$ and $\chi^{2(29)}_{.025} = 45.7224$, so that $\chi^{2(29)}_{.02}$ must lie between them and 94.8602 must be in the ‘reject’ zone. A p-value approach would observe that the largest number in the df = 29 column is $\chi^{2(29)}_{.005} = 52.3360$ and, since 94.8602 is above this, p-value $= P(\chi^2 \ge 94.8602) < .005 < .02$. So we reject $H_0$.
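The reasoning in f) can be retraced with a small script. This sketch uses only the chi-squared table values quoted in the text for 29 degrees of freedom (the standard library has no chi-squared distribution), so it brackets the p-value instead of computing it; the names used are illustrative.

```python
N = 30
SAMPLE_VARIANCE = 2943.9367    # s^2 from Problem 3
SIGMA0_SQUARED = 900.0         # hypothesized variance (sigma = 30)

# Upper-tail chi-squared values for 29 degrees of freedom, as quoted in the text
CHI2_29 = {0.025: 45.7224, 0.01: 49.5881, 0.005: 52.3360}

# Test ratio for H0: sigma^2 <= 900 against H1: sigma^2 > 900
chi_sq = (N - 1) * SAMPLE_VARIANCE / SIGMA0_SQUARED

# The .02 critical value lies between the .025 and .01 table values,
# so exceeding the .01 value is sufficient to reject at alpha = .02.
reject = chi_sq > CHI2_29[0.01]

# Smallest tabulated tail probability whose critical value is exceeded:
# the p-value is below this bound.
exceeded = [tail for tail, value in CHI2_29.items() if chi_sq > value]
p_value_bound = min(exceeded) if exceeded else None

print(f"chi-squared = {chi_sq:.4f}, reject H0: {reject}, p-value < {p_value_bound}")
```

The test ratio is far above every tabulated value, so the script reaches the same conclusion as the table lookup: reject $H_0$.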