Practice Problems for Exam2 Paraphrased problems from EMBS. 30. (p 484) A large auto insurance company selected random samples of single and married male policyholders and recorded the number who made a claim over the preceding three-year period. Single Male Policy-Holders n=400 Number making claims = 76 Married Male Policy-Holders n=900 Number making claims = 90 Is the observed difference in claim rates statistically significant? claim single none 76 married 90 166 324 400 810 900 1134 1300 EXPECTED 51.1 114.9 348.9 785.1 12.2 5.4 1.8 0.8 DISTANCES Calculated Chi-squared 20.1 Pvalue = chidist(20.1,1) 7.20624E-06 pvalue = chitest(observed,expected) 7.20624E-06 YES, we reject the null hypothesis of independence because the p-value is less than 0.05. The difference in sample claim rates are statistically significant. 40. (p 486) The Wall Street Journal Subscriber Study gathered data on the employment status of a random sample of subscribers. Sample results were broken out for subscribers of the eastern and western editions. Region EMPLOYMENT STATUS Eastern Edition Western Edition Full-Time 1105 574 Part-Time 31 15 Self Employed 229 186 Not employed 485 344 Test the hypothesis that employment status is independent of region. EXPECTED 1046.2 28.7 258.6 516.6 1850 632.8 17.3 156.4 312.4 1119 3.31 0.19 3.39 1.93 5.46 0.32 5.60 3.19 1679 46 415 829 2969 DISTANCES Calculated Chi-square 23.4 p-vaue=chidist(2969,3) 3.3759E-05 p-value=chitest(observed,expected) 3.3759E-05 We reject the null hypothesis that employment status and region are independent. PEP Problem NOT in the Book. THE MARK and UPTOWN KITCHEN are separate restaurants owned by the same firm. The weekend revenue generated by THE MARK is normally distributed with mean $80,000 and standard deviation $10,000. The UPTOWN KITCHEN revenues are normally distributed with mean $100,000 and standard deviation of $20,000. The following questions refer to the total weekend revenue enjoyed by the firm. a. What is the mean? b. What is the standard deviation? c. What is the shape of the distribution? e. Which of your answers to a,b,c requires revenues from the two restaurants to be independent? f. Which of your answers to a,b,c requires that revenues at each store are normal? g. (Difficult?) Next weekend, which restaurant will bring in more revenue? a. mean of the sum is the sum of the means, always. The mean total revenue is $180,000. b. The variance of the sum is the sum of the variances IF independent. So if independent revenues, the standard deviation of total revenue is (10000^2+20000^2)^.5 = $22,361 c. Normal. Sums of normal are always normal. d. only the answer to b requires independence. e. only the answer to c. requires normality of individual revenues. The central limit theorem does not apply because n is only 2. f. The answer to b. requires independence so that we can add the variances. g. The trick to answering this is to realize it is a question about the difference in revenues. The difference in revenues (UPTOWN – MARK) will be normal with mean $20,000 and standard deviation $22,361 (the variance of a difference is the sum of the variances given independence….so the difference is just as unpredictable as the sum). So the difference will be negative (MARK will bring in more revenue) with probability NORMDIST(0,20000,22361,true) = 0.186. So the Mark will bring in more revenue than UPTOWN with probability 0.186 (given independence…which is NOT a reasonable assumption…but without it, we can’t do much…) 46 (p 341) AARP estimated that the average annual expenditure on restaurants and carryout food was $1,873 for people 50 and over. Suppose this estimate is based on a random sample of 80 people and that the sample standard deviation is $550. b. What is the 95% confidence interval for the population mean amount spent on restaurants and carryout. sample mean t.inv.2t(.05,79) s s/n^.5 margin of error Lower limit Upper limit 1873 1.99045 550 61.5 122.4 1750.6 1995.4 d. If the distribution of amounts is positively skewed (not normal), would you expect the sample median amount to be greater or less than $1,873? We would expect the sample median to be less than the sample mean for a positively skewed distribution. e. (not in the book). If the distribution of amounts is positively skewed (not normal), does this make your answer to b.) invalid? Briefly explain. A normal population is not required for our confidence intervals to be valid. This is because of the central limit theorem saying sample means are normal when n is big regardless of the shape of the population distribution. Here n is 80 and big enough for the central limit theorem to apply. 52 (p 390). The chamber of commerce of a Florida Gulf Coast community advertises that residential property is available at a mean cost of $125,000 or less per acre. A random sample of 32 properties has a sample mean cost per acre of $130,000 and sample standard deviation of $12,500. Comment on the validity of the advertising statement. H0 is mean=125. Ha is mean > 125. T-stat is (130-125)/(12.5/32^.5) = 2.27. Pvalue = t.dist.rt(2.27,31) = 0.015. This is statistically significant. We can reject the hypothesis that the claim is truthful in favor of an alternative hypothesis that the mean is greater than $125K. 56 (p391). Virtual call centers are staffed by individuals working from home. Regional Airways is considering switching from its traditional call center to a virtual one but only if a level of customer satisfaction greater than 80% can be maintained. In a test of home agents, 252 out of 300 randomly chosen customers reported being satisfied with their experience with the call center. a. Formulate a relevant null and alternative hypothesis. b. What is the sample proportion of satisfied customers? c. What is the p-value of your test of hypothesis? d. What is your conclusion? Let H0 be P (the satisfaction probability) = 0.8 and Ha: P>0.8. Here we hope to reject the null hypothesis. We have a variety of ways to calculate the p-value: Using the binomial p-value = Pr(X>=252 given H0) = 1-BINOMDIST(251,300,.8,true) = 0.046. Using the normal approximation to the binomial p-value = Pr(X>=252 gien Ho) = 1-NORMDIST(252,300*.8,(300*.8*.2)^.5,true) = 0.042. Using the Z-statistic calculated using number satisfied p-value = Pr(Z>=(252-240)/(300*.8*.2)^.5) = Pr(Z>=1.73) = 1-NORM.S.DIST(1.73,true) = 0.042. Using the Z-statistic calculated using sample proportion satisfied p-value = Pr(Z>= (252/300-.8)/(.8*.2/300)^.5) = 1-NORM.S.DIST(1.73,true) = 0.042. The sample proportion satisfied is 252/300 and our conclusion is that this sample proportion is statistically significant higher than 0.8. 42 (p446). Mutual funds are either load or no-load. Because load mutual funds charge fees not charged by no load funds, the question is whether load funds provide a higher mean return. Returns from a random sample of 30 load and 30 no load mutual funds are provided in an accompanying Excel spreadsheet. a. Formulate a null and alternative hypothesis such that rejection of the null leads to the conclusion that load funds have higher mean returns. b. Use the data to test your hypothesis. What is the p-value and your conclusion? H0: mean returns are equal. Ha: mean return is greater for load fund. This will be a two-sample, onetailed t-test of differences in means. The resulting p-value is 0.28 which means we cannot reject the null in favor of load funds having a higher mean return. The difference in sample means is not statistically significant. 44 (p 446). Typical prices of single-family homes in Florida are shown for a random sample of 15 metropolitan areas (Naples Daily News, February 23, 2003). Data are in thousands of dollars and are in the spreadsheet. Metro Area Daytona Beach Fort Lauderdale Fort Myers Fort Walton Beach Gainesville Jacksonville Lakeland Miami Naples Ocala Orlando Pensacola Sarasota-Bradenton Tallahassee Tampa-St. Petersburg Jan-03 117 207 143 139 131 128 91 193 263 86 134 111 168 140 139 Jan-02 96 169 129 134 119 119 85 165 233 90 121 105 141 130 129 Have mean prices changed across the two years? Formulate and test an appropriate hypothesis. H0: means are equal. Ha: they are not. Use a two sample paired test. The p-value is 0.000166, so we reject the null hypothesis. There is a statistically significant difference in means. 46 (p 448). A study claimed that self-employed individuals do not experience greater job satisfaction than individuals who are not self-employed. Job satisfaction was measured using 18 questions with answers ranging from 1 to 5. The total score was the measure of job satisfaction. Scores for individuals in four separate professions are given below and in the spreadsheet. Lawyer Physical Therapist Cabinetmaker Systems Analyst 44 55 54 44 42 78 65 73 74 80 79 71 42 86 69 60 53 60 79 64 50 59 64 66 45 62 59 41 48 52 78 55 64 55 84 76 38 50 60 62 Are the differences in sample mean job satisfaction scores across the four professions statistically significant? Yes, the p-value from the ANOVA single factor test of equality of the four means is only 0.0061. The differences in sample means ARE statistically significant.