252 1takehome081 2/22/08

ECO252 QBA2
FIRST EXAM
February, 2008
TAKE HOME SECTION

Name: _________________________
Student Number and class time: _________________________

IV. Do at least 3 problems (at least 7 points each), or do sections adding to at least 20 points. Anything extra you do helps, and grades wrap around. Show your work! State H0 and H1 where appropriate. You have not done a hypothesis test unless you have stated your hypotheses, run the numbers and stated your conclusion. (Use a 95% confidence level unless another level is specified.) Answers without reasons usually are not acceptable. Neatness and clarity of explanation are expected. This must be turned in when you take the in-class exam. Note that answers without reasons and citation of appropriate statistical tests receive no credit. Failing to be transparent about which section of which problem you are doing can lose you credit. Many answers require a statistical test, that is, stating or implying a hypothesis and showing why it is true or false by citing a table value or a p-value. If you haven’t done it lately, take a fast look at ECO 252 - Things That You Should Never Do on a Statistics Exam (or Anywhere Else).

Problem 1: (Doane and Seward) A fast food restaurant has just started serving hot cocoa. The management wishes to serve cocoa at an average temperature of 142 degrees. In each of 10 stores, 24 measurements of the temperature are taken. You are the manager of store a and will use the corresponding column, where a is the second to last digit of your student number. (For example, Seymour Butz’ student number is 543987, so he uses column x8.) If that number is zero, use column 10. You are testing to see if the mean for your store is 142. There will be a penalty if you do not make it clear what column you are using.

Row   x1   x2   x3   x4   x5   x6   x7   x8   x9  x10
  1  140  142  142  143  144  142  143  146  143  143
  2  142  143  143  138  139  142  144  145  144  139
  3  141  141  141  140  144  142  144  144  141  145
  4  142  142  142  140  143  140  145  144  144  144
  5  141  139  142  142  141  141  146  145  142  145
  6  141  142  140  139  142  141  142  144  140  142
  7  145  144  143  142  141  137  145  141  142  141
  8  142  145  142  139  138  142  142  141  141  140
  9  142  143  141  145  144  139  145  144  146  142
 10  142  143  136  139  145  141  143  144  140  142
 11  137  141  142  142  139  141  142  139  143  142
 12  139  139  138  138  141  143  142  142  144  146
 13  139  143  142  142  145  141  141  141  142  142
 14  144  144  139  141  142  142  147  142  143  143
 15  140  140  140  142  144  140  141  144  142  142
 16  141  138  143  141  145  142  137  145  141  140
 17  140  141  139  141  142  142  142  139  142  144
 18  140  140  139  140  144  142  140  142  144  143
 19  140  142  139  142  136  139  144  143  144  141
 20  139  140  141  141  138  142  142  146  145  144
 21  146  138  143  143  141  143  147  142  145  143
 22  138  139  141  141  142  143  146  144  141  141
 23  139  142  140  140  140  142  140  144  143  139
 24  142  141  143  140  141  140  143  146  142  142

Assume that the Normal distribution applies to the data and use a 98% confidence level.
a. Find the sample mean and sample standard deviation of the temperatures in your data, showing your work. (1) (Your mean should be between 140 and 146 and your sample standard deviation should be around 2.)
b. State your null and alternative hypotheses. (1)
c. Test the hypothesis using a test ratio. (1)
d. Test the hypothesis using a critical value for a sample mean. (1)
e. Test the hypothesis using a confidence interval. (1)
f. Find an approximate p-value for the null hypothesis. (1)
g. On the basis of your tests, is the mean temperature correct in your restaurant? Why? (1)
h. How do your conclusions change if the random sample of 24 temperatures is taken on a day in which only 48 cups of cocoa are sold? (2)
i. Assume that the Normal distribution does not apply and test to see if the median is 142. Be careful!
What should you do with numbers that are exactly 142? (2) [12]
j. (Extra Credit) Do a 98% confidence interval for the median. (2)

Problem 2: Once again assume that the Normal distribution applies to the data in Problem 1, but that we know that the population standard deviation is 2. Our confidence level remains 98%, but we are now testing the hypothesis that the mean is below 143 degrees.
a. State your null and your alternative hypotheses. (1)
b. Find the value of z that you need for a critical value for a 1-sided test if the confidence level is 98%. You may use a confidence level of 99% if you wish for slightly less credit.
c. Find a critical value for the sample mean to test if the mean is below 143 degrees. (1)
d. Test the hypothesis that the mean is below 143 degrees using an appropriate confidence interval. (2)
e. Using your critical value from 2b, create a power curve for your test. (6)
f. Assume that the population standard deviation is 2. How large a sample do you need to get a two-sided 98% confidence interval with an error not exceeding 0.5 degrees? (2) [22]

Problem 3: According to Doane and Seward, about 13% of goods bought at a department store are returned. An organization called Return Exchange will sell you a software product called Verify-1, for which it makes the claims below.

Verify-1® is quickly operational. And it authorizes returns even quicker
Verify-1® identifies fraud and abuse at the point of return before they become liabilities to your brand equity or profits. In stand-alone mode, this easy-to-use, turnkey solution can be operational in 30 days and will reduce your return rate immediately, without disrupting your business or IT configuration. Verify-1® also integrates easily into your existing POS platform.

You set the policy, Verify-1® enforces it
With Verify-1®, your returns are dealt with consistently utilizing advanced statistical modeling in combination with state return laws and your existing return policies.
At the point of return, using the customer’s driver’s license or other valid identification, Verify-1® automatically checks prior return behavior and authorizes or declines the transaction. Customers identified as risks for presenting fraudulent returns are declined, while legitimate returns are speedily accepted.

You take a sample of n items and find that there were x returns (about 9%). You are the manager of store a, where a is the last digit of your student number. (For example, Seymour Butz’ student number is 543987, so he manages store 7.) The sample size and number of returns for your store are given below. On the basis of this sample, can you say that the return rate is now below 13%? Use a confidence level of 95%.

Store    1    2    3    4    5    6    7    8    9   10
n      275  250  225  200  175  150  125  100   75   50
x       25   22   20   18   16   13   11    9    7    4

a) State your null and alternative hypotheses. (1) Make sure I know which store you manage.
b) Test the hypothesis using a test ratio or a critical value for the observed proportion. (1) Make a diagram showing clearly where your ‘reject’ region is. (Do not round excessively. If you compute proportions, carry at least 3 significant figures.)
c) Find a p-value for your null hypothesis. (1)
d) Test your hypothesis using an appropriate confidence interval. (2) [5]
e) Using the 13% proportion as an estimate of the true proportion, find out how large a sample you need to create a 95% confidence interval with an error of no more than 1%. (2)
f) (Extra credit) Remember that the method that you have been using to deal with proportions substitutes the Normal distribution for the binomial distribution. In general the p-values that you have computed are lower than you would get if you used the binomial distribution. Verify this by making a continuity correction as described in the outline and repeating your test in c).
(2)
g) (Extra credit) Put together a power curve for your test using the following points: 13%, your critical value, a point between your critical value and 13%, and one or two other points on the side of the critical value implied by the alternative hypothesis (only one point on this side may give a reasonable value for a proportion). Remember that your standard error will change if the true proportion changes. (8)
h) Go back to the test in parts a), b) and c) of this problem. Take your values of n and x and multiply them by 1.6, rounding your values to the nearest whole number (or numbers) if necessary. Find the new value of the test ratio and get a p-value. What does the change in p-value between parts c) and h) suggest about the effect of increased sample size on the power of the test? (3) [32]

Problem 4: According to Doane and Seward, both the mean and the standard deviation of pH (a measure of acidity) are of interest to winemakers. Assume that your firm (store from the last problem) has gotten into the wine business. A sample of 16 wine bottles is taken. Your column has the same number as your store. Minitab has calculated all sorts of sample statistics on your data. These are listed below. Use them.

Row    C1    C2    C3    C4    C5    C6    C7    C8    C9   C10
  1  3.41  3.44  3.61  3.39  3.41  3.43  3.40  3.56  3.53  3.17
  2  3.45  3.42  3.59  3.37  3.39  3.41  3.38  3.53  3.56  3.21
  3  3.51  3.45  3.63  3.41  3.43  3.45  3.42  3.59  3.63  3.27
  4  3.52  3.48  3.65  3.44  3.46  3.47  3.45  3.63  3.65  3.28
  5  3.68  3.68  3.87  3.66  3.69  3.69  3.68  3.95  3.82  3.44
  6  3.29  3.45  3.62  3.41  3.43  3.44  3.42  3.58  3.39  3.05
  7  3.39  3.42  3.59  3.37  3.39  3.41  3.38  3.53  3.50  3.15
  8  3.57  3.50  3.67  3.45  3.48  3.49  3.47  3.65  3.70  3.33
  9  3.38  3.41  3.58  3.36  3.38  3.40  3.37  3.52  3.49  3.14
 10  3.14  3.36  3.52  3.30  3.32  3.34  3.31  3.43  3.23  2.90
 11  3.61  3.69  3.87  3.66  3.70  3.70  3.68  3.95  3.75  3.37
 12  3.23  3.40  3.57  3.35  3.37  3.39  3.36  3.51  3.32  2.99
 13  3.48  3.48  3.66  3.44  3.46  3.48  3.45  3.63  3.59  3.24
 14  3.39  3.48  3.65  3.43  3.45  3.47  3.44  3.62  3.51  3.15
 15  3.49  3.45  3.62  3.40  3.42  3.44  3.41  3.57  3.61  3.25
 16  3.50  3.63  3.81  3.60  3.63  3.64  3.62  3.87  3.62  3.26

Variable   N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
C1        16   0  3.4400   0.0347  0.1387   3.1400  3.3825  3.4650  3.5175   3.6800
C2        16   0  3.4837   0.0245  0.0980   3.3600  3.4200  3.4500  3.4950   3.6900
C3        16   0  3.6569   0.0259  0.1037   3.5200  3.5900  3.6250  3.6675   3.8700
C4        16   0  3.4400   0.0268  0.1072   3.3000  3.3700  3.4100  3.4475   3.6600
C5        16   0  3.4631   0.0281  0.1124   3.3200  3.3900  3.4300  3.4750   3.7000
C6        16   0  3.4781   0.0265  0.1061   3.3400  3.4100  3.4450  3.4875   3.7000
C7        16   0  3.4525   0.0278  0.1110   3.3100  3.3800  3.4200  3.4650   3.6800
C8        16   0  3.6325   0.0388  0.1553   3.4300  3.5300  3.5850  3.6450   3.9500
C9        16   0  3.5562   0.0382  0.1528   3.2300  3.4925  3.5750  3.6450   3.8200
C10       16   0  3.2000   0.0347  0.1387   2.9000  3.1425  3.2250  3.2775   3.4400

You must state H0 and H1 where applicable to get credit for any of the tests below. Make sure that I know which column you are using!
a) The acceptable standard deviation for wine pH is 0.10. Using the data for your store, test the hypothesis that the standard deviation is 0.10 using a 95% confidence level.
(2)
b) Test the hypothesis that the standard deviation is below .14. (1)
c) Repeat a) and b) using the sample (mean and) variance you used in a) and b) but assuming a sample size of 100. Find p-values. (4)
d) Find a 2-sided 95% confidence interval for the standard deviation using data from your store and assuming a sample size of 16. (2)
e) Repeat d) for a sample size of 100. (1) [41]
f) Here’s the easiest question on the exam. By now you should have figured out that you don’t have to understand a statistical test at all if you know i) what it assumes, ii) what the null hypothesis is and iii) what the p-value is associated with the null hypothesis. So, I am going to do a test that the standard deviation is 0.1 on the following data set.

C11  3.53 3.49 3.51 3.57 3.54 3.57 3.57 3.54 3.78 3.72 3.54 3.51 3.59 3.50 3.44 3.78

Then I am going to run a Lilliefors test on these data using Minitab. The null hypothesis of the Lilliefors test is that the sample comes from the Normal distribution. The test makes no assumptions about the mean and standard deviation of the population and computes these as sample statistics from the data. After it printed ‘Probability plot of C11,’ the computer printed a graph of the data, but the only thing I looked at was the p-value, which was less than .01. After the Lilliefors test, the computer printed out the results of two versions of a statistical test on the standard deviation. The ‘Standard’ version is the method that you learned and is only applicable if the data comes from a Normal distribution. The ‘Adjusted’ version is for all other cases. So explain what p-value I look at and what it tells me.

MTB > NormTest c11;
SUBC>   KSTest.

Probability Plot of C11

MTB > OneVariance c11;
SUBC>   Test .1;
SUBC>   Confidence 95.0;
SUBC>   Alternative 0;
SUBC>   StDeviation.

Test and CI for One Standard Deviation: C11

Method
Null hypothesis         Sigma = 0.1
Alternative hypothesis  Sigma not = 0.1

The standard method is only for the normal distribution.
The adjusted method is for any continuous distribution.

Statistics
Variable   N  StDev  Variance
C11       16  0.100    0.0100

95% Confidence Intervals
Variable  Method    CI for StDev    CI for Variance
C11       Standard  (0.074, 0.155)  (0.0055, 0.0240)
          Adjusted  (0.071, 0.170)  (0.0050, 0.0288)

Tests
Variable  Method    Chi-Square     DF  P-Value
C11       Standard       15.06  15.00    0.895
          Adjusted       11.12  11.07    0.880
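
As a cross-check on the printout, the ‘Standard’ chi-square statistic and its two-sided p-value can be recomputed from the C11 data. What follows is a minimal Python sketch, not Minitab's own procedure. The helper names chi2_cdf and standard_sd_test are hypothetical, the chi-square CDF is evaluated with a plain series for the regularized incomplete gamma function, and the two-sided p-value is taken as twice the smaller tail, which is an assumption about how the printed value was obtained. The ‘Adjusted’ method is not reproduced here.

```python
import math
import statistics

# C11 data as listed in part f)
c11 = [3.53, 3.49, 3.51, 3.57, 3.54, 3.57, 3.57, 3.54,
       3.78, 3.72, 3.54, 3.51, 3.59, 3.50, 3.44, 3.78]


def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x).

    Uses the series for the regularized lower incomplete gamma
    function P(df/2, x/2); hypothetical helper for illustration.
    """
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))


def standard_sd_test(data, sigma0):
    """Chi-square test of H0: sigma = sigma0, assuming Normal data."""
    df = len(data) - 1
    chi2 = df * statistics.variance(data) / sigma0 ** 2
    lower = chi2_cdf(chi2, df)
    # Assumption: two-sided p-value = twice the smaller tail area.
    p_two_sided = 2 * min(lower, 1 - lower)
    return chi2, df, p_two_sided


chi2, df, p = standard_sd_test(c11, 0.1)
print(f"Chi-Square = {chi2:.2f}, DF = {df}, P-Value = {p:.3f}")
```

The printed values can be compared with the Standard row of the Tests table. Whether that row is the one to read depends on the outcome of the Lilliefors test, which is the point of part f).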